#########################PANDAS LANGUAGE##################
#########################MATPLOT LIB#########################
# read movie.csv into a DataFrame called 'movie'
# describe the dataframe
# rename the column Runtime (Minutes) to Runtime_Minutes, and Revenue (Millions) to Revenue_Millions
# show if any column has null value
# count the total number of null values in the dataframe
# print those rows which have null values
# fill null values:
# if a column is numerical, then fill with the mean (if there is no numerical missing value in
# the data frame, then no code is needed for this)
# if a column is categorical, then fill with the most frequent value (if there is no categorical missing value in
# the data frame, then no code is needed for this)
# plot a histogram of the Year column in the movie dataframe, which shows how many movies were released in each year.
# print the movie detail with title 'Grumpier Old Men'.
# show those movies which were released after 1995-01-01
# sort the movie DataFrame in descending order based on release_date
# for each year, display the total number of movies in each genre, for example Action=1000, Adventure=400
# plot a histogram of the total counts calculated above
# filter the movies with a specific genre,
# for example show only those movies which have the Action genre
# for each Director, display all the movies with detail.
# count the movies and plot a bar chart of the top 10 directors by number of movies.
# for each Actor, display all the movies with detail.
# count the movies and visualize the top 10 actors by number of movies in a plot
data file
Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore | |
1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe. | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana | 2014 | 121 | 8.1 | 757074 | 333.13 | 76 | |
2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a team finds a structure on a distant moon, but they soon realize they are not alone. | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fassbender, Charlize Theron | 2012 | 124 | 7 | 485820 | 126.46 | 65 | |
3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diagnosed 23 distinct personalities. They must try to escape before the apparent emergence of a frightful new 24th. | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richardson, Jessica Sula | 2016 | 117 | 7.3 | 157606 | 138.12 | 62 | |
4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling theater impresario's attempt to save his theater with a singing competition becomes grander than he anticipates even as its finalists' find that their lives will never be the same. | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth MacFarlane, Scarlett Johansson | 2016 | 108 | 4.2 | 60545 | 270.32 | 59 | |
5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of the most dangerous incarcerated super-villains to form a defensive task force. Their first mission: save the world from the apocalypse. | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola Davis | 2015 | 123 | 3.2 | 393727 | 325.02 | 40 | |
6 | The Great Wall | Action,Adventure,Fantasy | European mercenaries searching for black powder become embroiled in the defense of the Great Wall of China against a horde of monstrous creatures. | Yimou Zhang | Matt Damon, Tian Jing, Willem Dafoe, Andy Lau | 2014 | 103 | 6.1 | 56036 | 45.13 | 42 | |
7 | La La Land | Comedy,Drama,Music | A jazz pianist falls for an aspiring actress in Los Angeles. | Damien Chazelle | Ryan Gosling, Emma Stone, Rosemarie DeWitt, J.K. Simmons | 2013 | 128 | 5.3 | 258682 | 151.06 | 93 | |
8 | Mindhorn | Comedy | A has-been actor best known for playing the title character in the 1980s detective series "Mindhorn" must work with the police when a serial killer says that he will only speak with Detective Mindhorn, whom he believes to be a real person. | Sean Foley | Essie Davis, Andrea Riseborough, Julian Barratt,Kenneth Branagh | 2010 | 89 | 6.4 | 2490 | 71 |
# import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# read movie.csv into a DataFrame called 'movie'
movie = pd.read_csv('movie.csv')
# describe the dataframe
movie.describe()
# rename the column Runtime (Minutes) with Runtime_Minutes, and Revenue (Millions) with Revenue_Millions
movie.rename(columns = {'Runtime (Minutes)':'Runtime_Minutes','Revenue (Millions)':'Revenue_Millions'}, inplace = True)
# show if any column has null value
movie.columns[movie.isna().any()].tolist()
# count the total number of null values in the dataframe
movie.isna().sum().sum()
# print those rows which have null values
movie[movie.isna().any(axis=1)]
# Fill null values: if a column is numerical, then fill with the mean
# Here only 'Revenue_Millions' and 'Metascore' have null values, and both are numerical.
# numeric_only=True restricts the mean to numeric columns (recent pandas versions raise an error on text columns otherwise).
movie.fillna(movie.mean(numeric_only=True), inplace=True)
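The question also asks for categorical columns to be filled with the most frequent value. The data file above has no missing categorical values, so the answer does not need that branch, but a minimal sketch of the general rule may help. The frame `toy`, its values, and the helper `fill_missing` are hypothetical and only for illustration; the helper assumes every column has at least one non-null value.

```python
import pandas as pd

# Hypothetical toy frame (not the real movie.csv) with one numeric and one text gap.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Genre": ["Action", None, "Action", "Comedy"],
    "Metascore": [60.0, None, 80.0, 70.0],
})

def fill_missing(df):
    """Return a copy: numeric gaps get the column mean, other gaps the column mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            # mode() ignores nulls; iloc[0] picks the first of the most frequent values
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

filled = fill_missing(toy)
print(filled)
```

Working on a copy keeps the original frame available for comparison; the answer's `inplace=True` call is the shorter choice when that is not needed.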
# plot a histogram of the Year column in the movie dataframe, which shows how many movies were released in each year.
fig, ax = plt.subplots(figsize =(8, 6))
# draw on the axes created above
movie['Year'].hist(ax=ax)
plt.show()
# print the movie detail with title 'Grumpier Old Men'.
movie[movie.Title=='Grumpier Old Men']
# It seems there is no record with this title.
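# show those movies which were released after 1995-01-01
# The data file has only a Year column (no full release date), so the comparison is made on the year.
movie[movie.Year>=1995]
# sort the movie DataFrame in descending order based on release date (Year is the only date column available)
movie.sort_values('Year', ascending=False)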
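The question refers to a `release_date` column and a full date (1995-01-01), while the data file shown only has `Year`. If a file did carry full dates, the filter and sort could look roughly like the sketch below. The frame `toy` and its `release_date` values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a full release date; the real data file has only Year.
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "release_date": ["1994-12-25", "1995-01-01", "1995-06-30"],
})

# Parse the text into real datetimes so comparisons and sorting are chronological.
toy["release_date"] = pd.to_datetime(toy["release_date"])

# Strictly after 1 January 1995
after = toy[toy["release_date"] > pd.Timestamp("1995-01-01")]

# Newest first
newest_first = toy.sort_values("release_date", ascending=False)

print(after)
print(newest_first)
```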
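# for each year, display the total number of movies in each genre, for example Action=1000, Adventure=400
# Genre holds comma-separated values, so split it into one row per (movie, genre) before counting.
genres = movie.assign(Genre=movie['Genre'].str.split(',')).explode('Genre')
count = genres.groupby(['Year', 'Genre']).size().unstack(fill_value=0)
print(count)
# plot the total counts calculated above (the counts are already aggregated, so a stacked bar chart per year is used)
fig, ax = plt.subplots(figsize =(10, 7))
count.plot.bar(stacked=True, ax=ax)
plt.show()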
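Counting genres per year depends on how the comma-separated Genre strings are handled. The sketch below shows one way to count each individual genre per year on a small hypothetical frame (`toy`, with made-up rows), by splitting the string and giving every genre its own row.

```python
import pandas as pd

# Hypothetical rows in the same shape as the data file (Genre is comma-separated).
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genre": ["Action,Adventure", "Action", "Comedy,Action"],
    "Year": [2014, 2014, 2016],
})

# One row per (movie, genre): split the string into a list, then explode the list.
long_df = toy.assign(Genre=toy["Genre"].str.split(",")).explode("Genre", ignore_index=True)
long_df["Genre"] = long_df["Genre"].str.strip()  # guard against 'Action, Adventure' style spacing

# Rows are years, columns are genres, cells are movie counts.
per_year = long_df.groupby(["Year", "Genre"]).size().unstack(fill_value=0)
print(per_year)
```

The resulting table can be passed to `per_year.plot.bar(stacked=True)` to get one bar per year.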
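# filter the movies with a specific genre, for example show only those movies which have the Action genre
# Genre is comma-separated (e.g. 'Action,Adventure,Sci-Fi'), so look for 'Action' inside the string instead of testing for equality.
movie[movie['Genre'].str.contains('Action')]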
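Because Genre is a comma-separated string, an equality test, a substring test, and a whole-genre test can return different rows. The comparison below is a small sketch on a hypothetical frame; the label 'Live-Action' is invented purely to show how a substring test can over-match.

```python
import pandas as pd

# Hypothetical rows; 'Live-Action' is a made-up label used to show substring over-matching.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Genre": ["Action,Adventure", "Action", "Comedy,Drama", "Live-Action,Family"],
})

# 1) Equality: only rows whose whole Genre string is exactly 'Action'.
exact = toy[toy["Genre"] == "Action"]

# 2) Substring: any row containing the text 'Action' anywhere.
substring = toy[toy["Genre"].str.contains("Action", regex=False)]

# 3) Whole genre: split on commas and test membership in the resulting list.
tokens = toy["Genre"].str.split(",")
whole = toy[tokens.apply(lambda gs: "Action" in [g.strip() for g in gs])]

print(exact["Title"].tolist(), substring["Title"].tolist(), whole["Title"].tolist())
```

For the genre labels visible in the data file, the substring and whole-genre tests would likely agree; the third form is simply the stricter option.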
# for each Director, display all the movies with detail.
list_director = movie['Director'].unique().tolist()
for director in list_director:
    # select every movie made by this director
    df = movie[movie['Director'] == director]
    print('---------------------------------'+director+'---------------------------------------')
    print(df)
# count the movies and plot a bar chart of the top 10 directors by number of movies.
top_10_directors = movie.groupby('Director')['Director'].count().sort_values(ascending=False).head(10)
print(top_10_directors)
fig, ax = plt.subplots(figsize =(8, 6))
top_10_directors.plot.bar()
plt.show()
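# for each Actor, display all the movies with detail.
import itertools
# Actors is a comma-separated string, so split it and strip the spaces around each name.
list_actor = movie['Actors'].unique().tolist()
l=[]
for actors in list_actor:
    l.append([name.strip() for name in actors.split(',')])
list_flat = set(itertools.chain(*l))
#print(list_flat)
for actor in list_flat:
    # regex=False matches the name as plain text (names such as 'J.K. Simmons' contain '.')
    df = movie[movie['Actors'].str.contains(actor, regex=False)]
    print('---------------------------------'+actor+'---------------------------------------')
    print(df)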
# count the movies and visualize the top 10 actors by number of movies in a plot
actor_list = []
count_list = []
for actor in list_flat:
    # number of rows whose Actors string contains this name (plain-text match)
    length = movie[movie['Actors'].str.contains(actor, regex=False)].shape[0]
    actor_list.append(actor)
    count_list.append(length)
actor_dict = {'Actor':actor_list, 'Count':count_list}
actor_df = pd.DataFrame.from_dict(actor_dict)
fig, ax = plt.subplots(figsize =(8, 6))
# pass ax so the bars are drawn on the figure created above instead of a new one
actor_df.sort_values('Count', ascending=False).head(10).plot.bar(x='Actor', y='Count', ax=ax)
plt.show()
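The actor counts can also be computed without explicit loops. The sketch below is an alternative, not the method used in the answer above: it splits the Actors string, gives each name its own row, and counts the rows. The frame `toy` and its rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the same shape as the data file (Actors is comma-separated,
# sometimes with a space after the comma and sometimes without).
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Actors": ["Chris Pratt, Vin Diesel", "Vin Diesel,Zoe Saldana", "Vin Diesel, Chris Pratt"],
})

# One row per (movie, actor), with surrounding spaces removed from each name.
actors = toy["Actors"].str.split(",").explode().str.strip()

# value_counts() sorts from most to least frequent, so head(10) gives the top 10.
actor_counts = actors.value_counts()
top_10_actors = actor_counts.head(10)
print(top_10_actors)
```

`top_10_actors.plot.bar()` followed by `plt.show()` would then draw the chart, in the same way as the director plot.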
All the questions are answered above, with a comment for each step.