The purpose of this assignment is to plot data using Matplotlib.
Description
Complete the Jupyter notebook named main.ipynb, which reads the file diamonds.csv into a Pandas DataFrame. Information about the file can be found here:
-------
diamonds (R Documentation)
Prices of over 50,000 round cut diamonds
Description
A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:
Usage
diamonds
Format
A data frame with 53940 rows and 10 variables:
price: price in US dollars (\$326–\$18,823)
carat: weight of the diamond (0.2–5.01)
cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color: diamond colour, from D (best) to J (worst)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x: length in mm (0–10.74)
y: width in mm (0–58.9)
z: depth in mm (0–31.8)
depth: total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table: width of top of diamond relative to widest point (43–95)
------
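The depth definition in the documentation above can be checked numerically. The following is a minimal sketch, assuming the percentage is the ratio multiplied by 100 (which matches the listed 43–79 range). The function name depth_percentage and the sample measurements are illustrative and are not taken from the dataset. Because the listed ranges allow x and y to be 0, the sketch returns NaN for such rows instead of dividing by zero.

```python
import numpy as np


def depth_percentage(x, y, z):
    """Total depth percentage: 100 * z / mean(x, y) = 100 * 2 * z / (x + y)."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    total = x + y
    # Avoid dividing by zero: rows with x + y == 0 yield NaN.
    safe_total = np.where(total == 0, np.nan, total)
    return 200.0 * z / safe_total


# Illustrative measurements in mm (hypothetical values, not dataset rows).
print(depth_percentage(3.95, 3.98, 2.43))
print(depth_percentage([4.0, 0.0], [4.0, 0.0], [2.5, 0.0]))
```

With the real DataFrame, the same function could be called as depth_percentage(df['x'], df['y'], df['z']) and compared against the depth column.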
There are two figures that you need to create:
Figure 1:
Figure 2:
The contents of main.ipynb are:
Setup
The following code imports the required libraries and loads a dataset containing information about diamonds into a Pandas DataFrame. Information about the dataset can be found here.
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline

def normal_distribution(x, mu, sigma):
    return 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2/(2*sigma**2))

df = pd.read_csv('diamonds.csv')
df.pop('Unnamed: 0');
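The setup code removes a column named 'Unnamed: 0', presumably because diamonds.csv was exported together with its row index, leaving the first header field empty. The sketch below illustrates that behaviour on a tiny in-memory stand-in (the rows are made up) and shows one alternative, reading the first column as the index. It is an illustration under those assumptions, not part of the assignment's required code.

```python
import io

import pandas as pd

# Tiny stand-in for diamonds.csv (hypothetical rows): the first header
# field is empty because the file was written with its row index.
csv_text = ",carat,cut,price\n1,0.30,Ideal,500\n2,0.70,Fair,2100\n"

# Default read: the unnamed index column shows up as 'Unnamed: 0'.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())

# Option 1 (as in the setup code): drop the column after loading.
df.pop('Unnamed: 0')

# Option 2: treat the first column as the index while loading.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```

Both options leave the same data columns. The second keeps the original row numbers as the index instead of discarding them.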
Bar chart of average price per cut
Make a plot that meets the following criteria:
diamonds.csv is available at:
https://forge.scilab.org/index.php/p/rdataset/source/tree/master/csv/ggplot2/diamonds.csv
1. The required source code is given below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

def normal_distribution(x, mu, sigma):
    return 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2/(2*sigma**2))

def plotFirst(df):
    # Prices of the fair-cut diamonds only
    data = df.loc[df['cut'] == 'Fair', 'price'].to_numpy()
    # Calculating Normal Distribution (vectorized over all prices)
    mean = np.mean(data)
    stdv = np.std(data)
    arr = normal_distribution(data, mean, stdv)
    # Plotting the Graph: bins=30 gives 30 equal-width bins that include
    # both the smallest and the largest value
    fig, axs = plt.subplots(1, 1, figsize=(20, 20))
    axs.hist(arr, bins=30)
    axs.set_ylabel("Norm. Distrib. of fair-cut Diamonds")
    axs.set_xlabel("Bins")
    plt.show()

def plotSecond(df):
    # Finding the Averages; sort=False keeps the cuts in order of first appearance
    averages = df.groupby('cut', sort=False)['price'].mean()
    # Plotting the Graph
    plt.bar(averages.index.tolist(), averages.values)
    plt.xlabel("Cut")
    plt.ylabel("Average price (US dollars)")
    plt.show()

if __name__ == "__main__":
    df = pd.read_csv('diamonds.csv')
    df.pop('Unnamed: 0')
    plotFirst(df)
    plotSecond(df)
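Figure 1 is not reproduced here, so the intended plot is not known. One plausible reading, given that normal_distribution is supplied in the setup, is a density histogram of the prices themselves with the fitted normal curve drawn on top. The following is a sketch of that alternative under this assumption. The function plot_price_histogram and the synthetic prices are hypothetical and stand in for the real data.

```python
import numpy as np
import matplotlib.pyplot as plt


def normal_distribution(x, mu, sigma):
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))


def plot_price_histogram(prices, bins=30):
    """Density histogram of prices with a fitted normal curve overlaid."""
    prices = np.asarray(prices, dtype=float)
    mu, sigma = prices.mean(), prices.std()

    fig, ax = plt.subplots(figsize=(8, 5))
    # density=True scales the bar areas to sum to 1, matching the pdf's scale.
    ax.hist(prices, bins=bins, density=True, alpha=0.6, label="prices")
    grid = np.linspace(prices.min(), prices.max(), 200)
    ax.plot(grid, normal_distribution(grid, mu, sigma), label="normal fit")
    ax.set_xlabel("Price (US dollars)")
    ax.set_ylabel("Density")
    ax.legend()
    return fig, ax, mu, sigma


# Synthetic stand-in for the fair-cut prices (not real data).
rng = np.random.default_rng(0)
fake_prices = rng.normal(loc=4000, scale=1500, size=1000)
fig, ax, mu, sigma = plot_price_histogram(fake_prices)
```

With the real data, the call would be plot_price_histogram(df.loc[df['cut'] == 'Fair', 'price']), followed by plt.show() when running outside a notebook.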
2. Screenshots of the output are below: