5.232608753 | 51.33997 | 1 |
4.559347708 | 3.047033 | 0 |
4.550088246 | 11.71957 | 1 |
3.386566659 | 28.04548 | 1 |
0.989064618 | 0.202602 | 0 |
4.555668273 | 67.83218 | 1 |
4.186405129 | 53.06328 | 1 |
1.207150769 | 78.43352 | 0 |
3.445792543 | 14.46725 | 1 |
2.962975266 | 23.10411 | 0 |
0.173612404 | 65.70817 | 0 |
2.768815371 | 65.28198 | 1 |
2.747367434 | 97.82201 | 1 |
4.486882933 | 77.4523 | 1 |
4.824678695 | 0.743551 | 0 |
5.586206724 | 48.65186 | 1 |
2.755386381 | 73.45392 | 1 |
1.787901977 | 97.36504 | 1 |
5.951385802 | 90.85691 | 1 |
2.737556923 | 15.44293 | 0 |
5.408894983 | 4.157112 | 0 |
1.715859824 | 0.937882 | 0 |
1.278844906 | 74.59771 | 0 |
2.514277044 | 97.32341 | 1 |
3.187058008 | 38.67714 | 1 |
4.949777159 | 87.91089 | 1 |
5.948802076 | 99.45704 | 1 |
4.58854855 | 73.22006 | 1 |
4.944593251 | 2.002865 | 0 |
4.095092929 | 30.82503 | 1 |
1.580255616 | 81.42979 | 1 |
5.582168688 | 77.37155 | 1 |
1.409875297 | 73.8556 | 1 |
4.173571574 | 10.78412 | 0 |
3.405384527 | 76.08957 | 1 |
5.303746588 | 91.13028 | 1 |
2.646338619 | 30.76739 | 0 |
5.648448558 | 24.47563 | 0 |
5.460162608 | 6.448907 | 1 |
2.530400279 | 92.75311 | 1 |
5.282410782 | 26.05696 | 1 |
4.798709185 | 42.12116 | 1 |
4.300055705 | 57.20119 | 1 |
4.729502404 | 6.523547 | 0 |
2.476612604 | 55.6309 | 1 |
3.190133005 | 67.05927 | 1 |
1.021463153 | 77.07357 | 1 |
0.733750098 | 95.86227 | 1 |
2.724156232 | 4.533329 | 0 |
4.232730005 | 96.12467 | 1 |
m = r * std(y)/std(x)
In the equation above, m is the slope of the line of best fit, r is the Pearson correlation coefficient between x and y, std(y) is the standard deviation of the y column of data, and std(x) is the standard deviation of the x column of data.
Use the Pandas .corr() and .std() methods to compute the slope of the line of best fit between Diameter and Pigment (the first and second columns).
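A minimal sketch of the slope step on a small made-up dataset (the five values are illustrative only and are not taken from the table). It assumes the two columns are named x and y. Calling .corr() on one column with the other as its argument returns a single number, whereas calling .corr() on the whole DataFrame returns a correlation matrix.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [2.1, 3.9, 6.2, 8.1, 9.8]})

# Series.corr returns the Pearson correlation as a single float
r = df["x"].corr(df["y"])

# Series.std uses the sample standard deviation (ddof=1) by default
std_x = df["x"].std()
std_y = df["y"].std()

# Slope of the line of best fit: m = r * std(y) / std(x)
m = r * std_y / std_x
print(f"r = {r:.5f}, m = {m:.5f}")
```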
Next, compute the y-intercept of the line of best fit using:
b = ybar - m*xbar
Here xbar and ybar are the means of the x and y columns.
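The intercept formula can be sanity-checked against a standard least-squares fit. The sketch below uses a made-up five-point dataset (illustrative only) with assumed column names x and y, and compares the hand-computed m and b with numpy.polyfit of degree 1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [2.1, 3.9, 6.2, 8.1, 9.8]})

# Slope from the correlation and the standard deviations
m = df["x"].corr(df["y"]) * df["y"].std() / df["x"].std()

# Intercept: b = ybar - m * xbar
xbar = df["x"].mean()
ybar = df["y"].mean()
b = ybar - m * xbar

# Cross-check against an ordinary least-squares fit of degree 1
m_ref, b_ref = np.polyfit(df["x"].to_numpy(), df["y"].to_numpy(), 1)
print(f"formula: m = {m:.5f}, b = {b:.5f}")
print(f"polyfit: m = {m_ref:.5f}, b = {b_ref:.5f}")
```

Rearranging b = ybar - m*xbar gives ybar = m*xbar + b, so the fitted line always passes through the point (xbar, ybar).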
Lastly, plot the line of best fit using matplotlib.pyplot.
The code below assumes the data is already stored in data.csv, with the Diameter column named x and the Pigment column named y.
Code
# import pandas for data handling and matplotlib.pyplot for plotting
import pandas as pd
import matplotlib.pyplot as plt
# make a data frame from the csv file
df = pd.read_csv("data.csv")
x = df["x"]
y = df["y"]
# Pearson correlation between the x and y columns (a single number)
r = x.corr(y, method="pearson")
# standard deviation of every column (axis=0 works column-wise, axis=1 row-wise)
sr = df.std(axis=0, skipna=True)
# standard deviation of each specific column
stdy = y.std()
stdx = x.std()
# slope of the line of best fit
m = r * stdy / stdx
# means of the y and x values
ybar = y.mean()
xbar = x.mean()
# y-intercept of the line of best fit
b = ybar - m * xbar
# predicted y values along the line
Y_pred = m * x + b
# plot the line of best fit
fig, ax = plt.subplots()
ax.plot(x, Y_pred, linestyle="-")
plt.show()
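The fitted line is easier to judge when it is drawn over the raw points. The following is a sketch only: the helper name plot_best_fit and the toy data are hypothetical, the column names x and y are assumed as before, and the Agg backend is selected so the example runs without a display (remove that line to use plt.show() interactively).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_best_fit(df):
    """Scatter the data, overlay the line of best fit, and return (fig, m, b)."""
    x, y = df["x"], df["y"]
    m = x.corr(y) * y.std() / x.std()
    b = y.mean() - m * x.mean()

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="data")
    # Two endpoints are enough to draw a straight line
    x_line = [x.min(), x.max()]
    y_line = [m * v + b for v in x_line]
    ax.plot(x_line, y_line, linestyle="-", label=f"y = {m:.3f}x + {b:.3f}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig, m, b


# Hypothetical toy data, for illustration only
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "y": [2.1, 3.9, 6.2, 8.1, 9.8]})
fig, m, b = plot_best_fit(toy)
# fig.savefig("best_fit.png") would write the figure to a file of your choosing
```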
Output
r = -0.18959
std(y) = 33.68728
std(x) = 1.55705
m = r * std(y) / std(x) = -0.18959 * 33.68728 / 1.55705 = -4.1019
ybar = average of y values = 51.32871
xbar = average of x values = 3.54132
b = ybar - m*xbar = 51.32871 - (-4.1019) * 3.54132 = 65.85485
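The arithmetic can be re-checked by plugging the reported values of r, std(y), std(x), ybar and xbar back into the two formulas. The sketch below takes those reported numbers at face value (the variable names are illustrative). Because the inputs are already rounded to five decimals, the recomputed m and b may differ from the reported ones in the last digit or two.

```python
# Values as reported in the output above
r = -0.18959
std_y = 33.68728
std_x = 1.55705
ybar = 51.32871
xbar = 3.54132

# Slope and intercept from the two formulas
m = r * std_y / std_x
b = ybar - m * xbar

print(f"m = {m:.4f}")
print(f"b = {b:.4f}")
print(f"line of best fit: y = {m:.4f}x + {b:.4f}")
```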