Question

Read the movies.csv file into a DataFrame named movies, display the first 5 rows, and answer the 10 questions below.

url = 'https://raw.githubusercontent.com/PacktPublishing/Pandas-Cookbook/master/data/movie.csv'
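A minimal sketch of the setup step, assuming pandas is installed. In the notebook you would load the file with movies = pd.read_csv(url). So that the snippet runs offline, a tiny hypothetical CSV stands in for movie.csv. Its titles, names and numbers are made up, but it uses the column names the questions below refer to.

```python
import io

import pandas as pd

# Hypothetical stand-in for movie.csv: made-up rows, but the same column
# names that questions 6-10 use. Empty fields are parsed as NaN.
csv_text = """movie_title,director_name,director_facebook_likes,duration
Movie A,Director X,100,120
Movie B,Director Y,50,
Movie C,Director X,300,180
Movie D,,,95
Movie E,Director Z,1000,101
Movie F,Director X,,88
"""

# With network access you would instead write: movies = pd.read_csv(url)
movies = pd.read_csv(io.StringIO(csv_text))

# Display the first 5 rows.
print(movies.head())
```

In a Jupyter or Colab notebook, ending the cell with movies.head() (without print) renders the rows as a table.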

6) Use the count method to find the number of non-missing values for each column.

7) Display the count of missing values for each column

8) List the frequency for the top ten directors

9) List the top ten director_name values with the highest average director_facebook_likes

10) List the ten movie_title values with the longest duration

Solutions

Find the solutions below.

# 6) Use the count method to find the number of non-missing values for each column.

The count() method counts the non-missing (non-null) values. axis=0, which is also the default, makes it count down each column, so the result has one number per column.

movies.count(axis=0)
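To see what count() skips, here is a small sketch on a hypothetical four-row DataFrame (made-up values, not taken from movie.csv) where two columns each have one missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature movies DataFrame with a few missing values.
movies = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "director_name": ["Director X", "Director Y", None, "Director X"],
    "duration": [120.0, np.nan, 95.0, 101.0],
})

# count() ignores NaN/None; axis=0 (the default) gives one number per column.
non_missing = movies.count(axis=0)
print(non_missing)
```

Here movie_title has 4 non-missing values, while director_name and duration have 3 each.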

# 7) Display the count of missing values for each column

isnull() returns a Boolean DataFrame that is True wherever a value is missing. Summing it with sum(axis=0) adds up the True values in each column, which gives the number of missing values per column.

movies.isnull().sum(axis=0)
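A short sketch on the same kind of hypothetical data (made-up values) showing that the missing counts from isnull().sum() and the non-missing counts from count() always add up to the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature movies DataFrame with a few missing values.
movies = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "director_name": ["Director X", "Director Y", None, "Director X"],
    "duration": [120.0, np.nan, 95.0, 101.0],
})

# isnull() marks missing cells as True; summing down each column counts them.
missing = movies.isnull().sum(axis=0)
print(missing)

# Sanity check: missing + non-missing equals the row count in every column.
print((missing + movies.count() == len(movies)).all())
```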

# 8) List the frequency for the top ten directors

First, group the DataFrame by director_name and count the rows in each group with size(). Rows with a missing director_name are left out, because groupby drops missing keys by default. reset_index(name='frequency') turns the result into a DataFrame whose count column is named frequency. Next, sort by frequency in descending order with sort_values(), discard the now out-of-order index with reset_index(drop=True), and keep the top 10 rows with head(10).

movies.groupby('director_name').size().reset_index(name='frequency').sort_values('frequency', ascending=False).reset_index(drop=True).head(10)
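pandas also provides Series.value_counts(), which counts each distinct value, excludes missing values by default and sorts by count in descending order, so the same ranking can be produced in one step. The sketch below uses hypothetical director names. rename_axis and reset_index(name=...) are used so the output columns match the groupby version regardless of how a given pandas version names the value_counts result. Directors tied on frequency may come out in a different order than with the groupby approach.

```python
import pandas as pd

# Hypothetical director column, including one missing name.
movies = pd.DataFrame({
    "director_name": ["Director X", "Director Y", "Director X", "Director Z",
                      "Director X", "Director Y", None],
})

top_directors = (
    movies["director_name"]
    .value_counts()                   # missing names excluded, sorted high to low
    .head(10)
    .rename_axis("director_name")     # name the index explicitly
    .reset_index(name="frequency")    # Series -> DataFrame with a 'frequency' column
)
print(top_directors)
```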

# 9) List the top ten director_name values with the highest average director_facebook_likes

First, select only the director_name and director_facebook_likes columns from the DataFrame and group by director_name. mean() computes each director's average director_facebook_likes, reset_index() turns director_name back into a regular column, and sort_values() orders the rows by that average in descending order. Finally, the auto-generated index is dropped with reset_index(drop=True) and the top 10 rows are shown with head(10).

movies[['director_name', 'director_facebook_likes']].groupby('director_name').mean().reset_index().sort_values('director_facebook_likes', ascending=False).reset_index(drop=True).head(10)
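An equivalent, slightly shorter sketch selects the likes column right after grouping and uses Series.nlargest() in place of sort_values() followed by head(). The data are hypothetical; note that mean() skips missing values, so Director Z's average below is based on a single number.

```python
import numpy as np
import pandas as pd

# Hypothetical directors and like counts (one missing value).
movies = pd.DataFrame({
    "director_name": ["Director X", "Director Y", "Director X",
                      "Director Z", "Director Z"],
    "director_facebook_likes": [100.0, 50.0, 300.0, 1000.0, np.nan],
})

# Average likes per director (NaN skipped), then keep the 10 largest averages.
top_by_likes = (
    movies.groupby("director_name")["director_facebook_likes"]
    .mean()
    .nlargest(10)
)
print(top_by_likes)
```

The result is a Series indexed by director_name; call reset_index() on it if you need a DataFrame.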

# 10) List the ten movie_title values with the longest duration

Sort the DataFrame by the duration column in descending order (missing durations go last by default), reset the index so it is numbered from 0, and select the first 10 values of the movie_title column.

movies.sort_values('duration', ascending=False).reset_index(drop=True)['movie_title'].head(10)
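DataFrame.nlargest(n, column) is a more direct alternative: it returns the n rows with the largest values in that column, already sorted in descending order. A sketch on hypothetical titles and durations:

```python
import pandas as pd

# Hypothetical titles and durations in minutes.
movies = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "duration": [120, 95, 180, 101],
})

# Asking for 10 rows when only 4 exist simply returns all 4, longest first.
longest = movies.nlargest(10, "duration")["movie_title"].reset_index(drop=True)
print(longest.tolist())  # ['Movie C', 'Movie A', 'Movie D', 'Movie B']
```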

