Question


#########################PANDAS LANGUAGE##################

#########################MATPLOT LIB#########################

In [40]: #importing file

users = pd.read_table('u.user', sep='|', index_col='user_id')

Describe and show the dataframe

In [ ]:

 
# describe information of all columns
# describe information of all numeric columns only
# describe information of all object columns only
# show first 10 rows of users dataframe

detecting duplicate rows

In [10]:

 
# check whether a row is identical to a previous row
# count all duplicate rows in the dataframe
# show only duplicate rows in the dataframe
# drop all duplicate rows in the dataframe
# check a single specific column for duplicates
# check more than one column for duplicates

In [11]:

 
# display the 3 most frequent occupations in 'users'
# change the data type of the column named age from int to float
# for each occupation, calculate the minimum and maximum ages

In [12]:

 
# for each occupation in 'users', count the number of occurrences
# plot a bar chart of the above output for each occupation

In [13]:

 
# for each occupation, calculate the mean age
# plot a pie chart of the above output

In [14]:

 
# for each combination of occupation and gender, calculate the mean age
# plot a bar chart of the above output for each occupation and gender

In [15]:

 
# sort 'users' by 'occupation' and then by 'age' (in a single command)

Solutions

Expert Solution

Describe and show the dataframe:
users.describe()

Describe information of all columns:
users.describe(include='all')

Describe information of all numeric columns only (requires import numpy as np):
users.describe(include=[np.number])

Describe information of all object columns only:
users.describe(include=[object])

Show first 10 rows of users dataframe:
users.head(10)
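
As an illustration of the describe calls above, here is a minimal sketch. The five sample rows are hypothetical stand-ins for u.user (only the column names age, gender and occupation come from the question), so the printed statistics say nothing about the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for u.user (not the real data).
users = pd.DataFrame(
    {
        "age": [24, 53, 23, 24, 33],
        "gender": ["M", "F", "M", "M", "F"],
        "occupation": ["technician", "other", "writer", "technician", "other"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

all_cols = users.describe(include="all")            # numeric and object columns together
numeric_only = users.describe(include=[np.number])  # count, mean, std, min, quartiles, max
object_only = users.describe(include=[object])      # count, unique, top, freq
first_rows = users.head(10)                         # at most 10 rows; fewer if the frame is shorter

print(all_cols)
print(numeric_only)
print(object_only)
print(first_rows)
```

With include='all', statistics that do not apply to a column (for example mean for gender) show up as NaN.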

Show duplicated rows:
users[users.duplicated()]


Check whether a row is identical to a previous row:
#duplicated() returns True for every row that is identical to any earlier row.
users.duplicated()

OR, to compare a single column (here occupation) with the immediately preceding row only:

users.occupation.eq(users.occupation.shift())

OR

def compare_previous(a):
    return np.concatenate(([False], a[1:] == a[:-1]))

users['match'] = compare_previous(users.occupation.values)

Count all duplicate rows in the dataframe:
users.duplicated().sum()

Show only duplicate rows in the dataframe:
users[users.duplicated()]

Drop all duplicate rows in the dataframe (returns a new dataframe; users itself is unchanged):
users.drop_duplicates()

Check a single specific column for duplicates:
users.duplicated(subset=['occupation'])

Check more than one column for duplicates:
users.duplicated(subset=['occupation', 'gender'], keep=False)
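
The duplicate checks above can be tried on a small frame. This is a sketch under stated assumptions: the rows are hypothetical (not the real u.user data) and were chosen so that rows 3 and 4 repeat row 0.

```python
import pandas as pd

# Hypothetical sample rows (not the real u.user data); rows 3 and 4 repeat row 0.
users = pd.DataFrame(
    {
        "age": [24, 53, 23, 24, 24],
        "gender": ["M", "F", "M", "M", "M"],
        "occupation": ["technician", "other", "writer", "technician", "technician"],
    }
)

is_repeat = users.duplicated()                    # True where a row equals an earlier row
n_repeats = int(is_repeat.sum())                  # number of repeated rows
repeats = users[is_repeat]                        # only the repeated rows
all_copies = users[users.duplicated(keep=False)]  # repeats plus their first occurrence
deduped = users.drop_duplicates()                 # new frame; `users` is unchanged

# Restrict the comparison to one column, or to several columns.
occupation_repeats = users.duplicated(subset=["occupation"])
pair_repeats = users.duplicated(subset=["age", "gender"], keep=False)

print(n_repeats)
print(repeats)
print(deduped)
```

The keep argument decides which copies are flagged: the default 'first' leaves the first occurrence unflagged, while keep=False flags every member of a duplicated group.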

Display the 3 most frequent occupations in 'users':
users.occupation.value_counts().head(3)
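
One detail worth seeing in a small example: head(3) cuts the list at exactly three entries even when the third place is tied. The sketch below uses hypothetical occupations (not the real u.user data) and shows nlargest with keep='all' as one possible way to keep the ties.

```python
import pandas as pd

# Hypothetical occupations (not the real u.user data); educator and writer tie for third place.
users = pd.DataFrame(
    {
        "occupation": [
            "student", "other", "student", "educator",
            "student", "other", "writer",
        ]
    }
)

counts = users.occupation.value_counts()          # sorted from most to least frequent
top3 = counts.head(3)                             # exactly three entries
top3_with_ties = counts.nlargest(3, keep="all")   # keeps every entry tied with the third

print(top3)
print(top3_with_ties)
```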

Change the data type of the column named age from int to float:
#astype returns a new column, so assign it back to keep the change.
users['age'] = users.age.astype(float)

For each occupation, calculate the minimum and maximum ages:
users.groupby('occupation').age.agg(['min', 'max'])

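The type conversion and the min/max aggregation can be checked together on a few rows. This is a minimal sketch on hypothetical data (not the real u.user file); the variable names dtype_before, converted and age_range are illustrative only.

```python
import pandas as pd

# Hypothetical sample rows (not the real u.user data).
users = pd.DataFrame(
    {
        "age": [24, 53, 23, 31, 42],
        "occupation": ["technician", "other", "writer", "technician", "other"],
    }
)

dtype_before = users.age.dtype.kind   # "i" for an integer column
converted = users.age.astype(float)   # astype returns a new Series
still_int = users.age.dtype.kind      # the original column has not changed yet
users["age"] = converted              # assign back to make the change stick

# One row per occupation, one column per aggregate.
age_range = users.groupby("occupation").age.agg(["min", "max"])

print(users.dtypes)
print(age_range)
```
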
For each occupation in 'users', count the number of occurrences:
users.occupation.value_counts()

Plot a bar chart of the above output for each occupation:
counts = users.occupation.value_counts()
plt.bar(counts.index, counts.values)
plt.xticks(rotation=90)
plt.show()
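
A self-contained version of the bar chart step might look like the sketch below. The sample occupations are hypothetical (not the real u.user data), and the Agg backend is an assumption made so the code also runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical occupations (not the real u.user data).
users = pd.DataFrame(
    {"occupation": ["student", "student", "student", "other", "other", "writer"]}
)

counts = users.occupation.value_counts()  # one count per occupation, largest first

fig, ax = plt.subplots()
bars = ax.bar(list(counts.index), counts.values)  # one bar per occupation
ax.set_xlabel("Occupation")
ax.set_ylabel("Number of users")
ax.tick_params(axis="x", rotation=90)  # long occupation names stay readable
```

In an interactive session, plt.show() after the last line displays the figure.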

For each occupation, calculate the mean age:
users.groupby('occupation').age.mean()

Plot a pie chart of the above output:
mean_age = users.groupby('occupation').age.mean()
fig = plt.figure(figsize=(15, 12))
plt.pie(mean_age, labels=mean_age.index)
plt.show()
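
The mean-age pie chart can be sketched end to end as follows. The rows are hypothetical (not the real u.user data), and the Agg backend is again only an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows (not the real u.user data).
users = pd.DataFrame(
    {
        "age": [24, 32, 50, 40, 23],
        "occupation": ["technician", "technician", "other", "other", "writer"],
    }
)

mean_age = users.groupby("occupation").age.mean()  # one mean per occupation

fig, ax = plt.subplots(figsize=(6, 6))
# Each slice is sized by an occupation's mean age relative to the sum of all means.
ax.pie(mean_age.values, labels=list(mean_age.index))
ax.set_title("Mean age by occupation")
```

Since mean ages are not parts of a whole, the slices only compare the means with each other.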

For each combination of occupation and gender, calculate the mean age:
users.groupby(['occupation', 'gender']).age.mean()

Plot a bar chart of the above output for each occupation and gender:
users.groupby(['occupation', 'gender']).age.mean().unstack().plot(kind='bar')
plt.xlabel("Occupation")
plt.ylabel("Mean age")
plt.show()
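
One way to draw grouped bars is to unstack the gender level so that each gender becomes a column and each occupation a row. The sketch below assumes hypothetical sample rows (not the real u.user data) and a non-interactive backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample rows (not the real u.user data).
users = pd.DataFrame(
    {
        "age": [24, 30, 26, 50, 40, 23],
        "gender": ["M", "F", "M", "F", "M", "F"],
        "occupation": ["technician", "technician", "technician", "other", "other", "writer"],
    }
)

mean_age = users.groupby(["occupation", "gender"]).age.mean()  # Series with a two-level index
table = mean_age.unstack("gender")  # rows: occupation, columns: gender

ax = table.plot(kind="bar")  # one group of bars per occupation, one bar per gender
ax.set_xlabel("Occupation")
ax.set_ylabel("Mean age")

print(table)
```

Combinations that do not occur in the data (here a male writer) become NaN in the table and get no visible bar.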

Sort 'users' by 'occupation' and then by 'age' (in a single command):
#DataFrame.sort was removed from pandas; sort_values is its replacement.
users.sort_values(['occupation', 'age'])

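The multi-column sort can be checked on a few rows. This is a sketch on hypothetical data (not the real u.user file); the second call, with a per-column ascending list, is an optional variation that the question does not ask for.

```python
import pandas as pd

# Hypothetical sample rows (not the real u.user data).
users = pd.DataFrame(
    {
        "age": [33, 24, 53, 23, 42],
        "occupation": ["writer", "other", "technician", "other", "technician"],
    }
)

# Sort by occupation first; rows with the same occupation are ordered by age.
sorted_users = users.sort_values(["occupation", "age"])

# Variation: occupation ascending, age descending within each occupation.
mixed = users.sort_values(["occupation", "age"], ascending=[True, False])

print(sorted_users)
print(mixed)
```

sort_values returns a new dataframe and leaves users in its original order.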
