1. Find an example of a data analytics project that has published its data publicly, and state the source.
2. Explain the project and why it is interesting to you.
3. Identify and explain the objective of the data analytics.
4. Identify and explain the data types.
5. Explain the data mining techniques involved.
6. State the insights discovered in the project.
Answer 1) IMDB movie database: analysis of 100 years of movies. IMDB publishes the data on its official website, and the dataset can be downloaded from IMDB.com or from GitHub.com. Using Python, the project explores a selection of movies released between 1916 and 2016 to find insights into the movies, actors, directors, and box-office collections.
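As a minimal sketch of the first exploration step, the code below reads movie rows with Python's standard `csv` module and keeps only those released between 1916 and 2016. The column names `movie_title` and `title_year` are assumptions about the downloaded file, and the inline rows are placeholders rather than real IMDB data; with the real file you would pass an open file object instead of the `StringIO` sample.

```python
import csv
import io

# Placeholder rows standing in for the downloaded file. The column names
# movie_title and title_year are assumptions; the values are not real data.
SAMPLE_CSV = """movie_title,title_year
Movie A,1915
Movie B,1916.0
Movie C,1999
Movie D,2016
Movie E,
"""


def load_movies(stream, first_year=1916, last_year=2016):
    """Return rows whose title_year lies in [first_year, last_year].

    Rows with a missing or non-numeric year are skipped. Years written
    as floats (e.g. "1916.0") are accepted and converted to int.
    """
    movies = []
    for row in csv.DictReader(stream):
        try:
            year = int(float(row["title_year"]))
        except (KeyError, TypeError, ValueError):
            continue  # no usable year for this row
        if first_year <= year <= last_year:
            row["title_year"] = year
            movies.append(row)
    return movies


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(len(movies), [m["movie_title"] for m in movies])
```

Skipping rows without a usable year, instead of failing, is a design choice: real movie datasets often have gaps, and those rows can be counted separately if needed.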
Answer 2) The project uses a movie dataset covering 100 years, from 1916 to 2016, with attributes such as ratings, budgets, directors, and collections, along with the top 250 movies of all time. It is interesting because it gives a view of the movie database that is not obvious from individual movie pages. For example, it shows that a movie can have a large number of Facebook likes but poor ratings, and it lets us compare ratings by country, by actor, and by director.
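The observation that a movie can have many Facebook likes but poor ratings can be checked with a simple filter, and the country-wise comparison with a group-and-average step. This is a sketch only: the column names `movie_facebook_likes`, `imdb_score` and `country`, the thresholds, and the rows are all hypothetical placeholders, not values from the project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; column names and values are illustrative placeholders.
movies = [
    {"movie_title": "Movie A", "country": "USA", "movie_facebook_likes": 150000, "imdb_score": 5.1},
    {"movie_title": "Movie B", "country": "USA", "movie_facebook_likes": 90000, "imdb_score": 8.4},
    {"movie_title": "Movie C", "country": "UK", "movie_facebook_likes": 2000, "imdb_score": 7.9},
    {"movie_title": "Movie D", "country": "UK", "movie_facebook_likes": 120000, "imdb_score": 6.0},
]


def liked_but_low_rated(rows, min_likes=100000, max_score=6.0):
    """Titles with at least min_likes Facebook likes but a score <= max_score."""
    return [r["movie_title"] for r in rows
            if r["movie_facebook_likes"] >= min_likes and r["imdb_score"] <= max_score]


def mean_score_by(rows, key):
    """Mean imdb_score for each distinct value of the column named by key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["imdb_score"])
    return {group: mean(scores) for group, scores in groups.items()}


print(liked_but_low_rated(movies))
print(mean_score_by(movies, "country"))
```

If the data has a director column (for example a hypothetical `director_name`), passing that column name to `mean_score_by` gives director-wise ratings in the same way.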
Answer 3) The objective of the data analysis was to get insights from the data by:
Answer 4) Since the project was built in Python 3.8, we discuss Python's built-in data types.
Python has the following built-in data types by default, grouped into these categories:
Text type: str
Numeric types: int, float, complex
Sequence types: list, tuple, range
Mapping type: dict
Set types: set, frozenset
Boolean type: bool
Binary types: bytes, bytearray, memoryview
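To make the categories concrete, the snippet below asks Python for the type name of one arbitrary example value per category. In a movie dataset, a title would typically be held as a `str`, a vote count as an `int`, and a rating as a `float`.

```python
# One arbitrary example value (or more) for each built-in category listed above.
examples = {
    "Text": ["IMDB"],
    "Numeric": [100, 7.5, 2 + 3j],
    "Sequence": [[1916, 2016], ("Drama", "Comedy"), range(1916, 2017)],
    "Mapping": [{"title": "La La Land"}],
    "Set": [{"USA", "UK"}, frozenset({"USA"})],
    "Boolean": [True],
    "Binary": [b"imdb", bytearray(b"imdb"), memoryview(b"imdb")],
}

for category, values in examples.items():
    # type(value).__name__ gives the short type name, e.g. 'int' or 'frozenset'
    print(category, [type(value).__name__ for value in values])
```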
Answer 5) The data mining technique used here is OUTLIER DETECTION / ANOMALY DETECTION.
This refers to identifying data items in a dataset that do not match an expected pattern or expected behavior. Anomalies are also known as outliers, novelties, noise, deviations, and exceptions, and they often provide critical, actionable information. An anomaly is an item that deviates considerably from the typical values in a dataset, either in a single attribute or in a combination of attributes. Such items are statistically distant from the rest of the data, which suggests that something out of the ordinary has happened and needs further attention. The technique is used in many domains, such as intrusion detection, system health monitoring, fraud detection, fault detection, event detection in sensor networks, and detecting ecosystem disturbances. Analysts often remove anomalous data from the dataset to obtain more accurate results.
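The source does not say which outlier test the project used. As one common and simple choice, the sketch below applies Tukey's interquartile-range (IQR) rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged as outliers and then removed, as described above. It uses `statistics.quantiles`, available from Python 3.8; the budget figures are placeholders.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the IQR rule."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Placeholder budgets in millions; 4200 mimics a data-entry error.
budgets = [10, 12, 15, 18, 20, 22, 25, 30, 4200]
inliers, outliers = split_outliers(budgets)
print(outliers)  # [4200]
```

The IQR rule is used here because quartiles are barely affected by the extreme value itself, whereas a mean-and-standard-deviation rule can be pulled toward the outlier it is trying to detect.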
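Answer 6)
Damien Chazelle (director of Whiplash and La La Land) is in the top 10 directors list.
Mean number of critic reviews per movie, by lead actor (actor_1_name):
Leonardo DiCaprio 330.190476
Brad Pitt 245.000000
Meryl Streep 181.454545
Name: num_critic_for_reviews, dtype: float64
Mean number of user reviews per movie, by lead actor (actor_1_name):
Leonardo DiCaprio 914.476190
Brad Pitt 742.352941
Meryl Streep 297.181818
Name: num_user_for_reviews, dtype: float64
Among these three lead actors, Leonardo DiCaprio's movies receive the most critic and user reviews on average, followed by Brad Pitt and then Meryl Streep.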