Predict Y from X using the following data set.
X | Y
31 | 65
39 | 55
41 | 32
44 | 60
47 | 78
48 | 59
55 | 61
65 | 60
15 | 23
19 | 52
a) Construct a scatter plot. Describe the relation between the two variables.
b) Calculate and interpret the correlation coefficient value.
c) Find the equation of the least-squares regression line.
d) What would you predict when the independent variable equals 25?
e) Find and interpret the value of r^2.
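Sol:
Use the lm function in R to fit the regression of y on x,
the coefficients function to extract the coefficients,
and the plot function to draw the scatterplot.
a) Construct a scatter plot. Describe the relation between the two variables.
R code: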
df1 =read.table(header = TRUE, text ="
x y
31 65
39 55
41 32
44 60
47 78
48 59
55 61
65 60
15 23
19 52
"
)
df1
plot(y ~ x, data = df1, xlab = "x", ylab = "y",pch=16)
The scatterplot suggests a positive linear relationship between x and y:
as x increases, y tends to increase.
b) Calculate and interpret the correlation coefficient value.
x y xbar ybar x_xbar y_ybar x_xbary_ybar x_xbarsq y_ybarsq
1 31 65 40.4 54.5 -9.4 10.5 -98.7 88.36 110.25
2 39 55 40.4 54.5 -1.4 0.5 -0.7 1.96 0.25
3 41 32 40.4 54.5 0.6 -22.5 -13.5 0.36 506.25
4 44 60 40.4 54.5 3.6 5.5 19.8 12.96 30.25
5 47 78 40.4 54.5 6.6 23.5 155.1 43.56 552.25
6 48 59 40.4 54.5 7.6 4.5 34.2 57.76 20.25
7 55 61 40.4 54.5 14.6 6.5 94.9 213.16 42.25
8 65 60 40.4 54.5 24.6 5.5 135.3 605.16 30.25
9 15 23 40.4 54.5 -25.4 -31.5 800.1 645.16 992.25
10 19 52 40.4 54.5 -21.4 -2.5 53.5 457.96 6.25
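Column sums: sum((x-xbar)*(y-ybar)) = 1180, sum((x-xbar)^2) = 2126.4, sum((y-ybar)^2) = 2290.5
r = sum((x-xbar)*(y-ybar)) / sqrt(sum((x-xbar)^2) * sum((y-ybar)^2))
r = 1180 / sqrt(2126.4 * 2290.5)
r = 0.5347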
Interpretation:
There is a moderate positive linear relationship between x and y:
as x increases, y tends to increase.
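The hand calculation of r can be cross-checked in R. The sketch below recomputes the three sums of the deviation table and compares the result with the built-in cor(); the variable names sxy, sxx, syy and r_manual are chosen here for illustration only.

```r
# Data from the question
x <- c(31, 39, 41, 44, 47, 48, 55, 65, 15, 19)
y <- c(65, 55, 32, 60, 78, 59, 61, 60, 23, 52)

# Column totals of the deviation table
sxy <- sum((x - mean(x)) * (y - mean(y)))  # sum of cross-products
sxx <- sum((x - mean(x))^2)                # sum of squared x deviations
syy <- sum((y - mean(y))^2)                # sum of squared y deviations

# Pearson correlation from its definition
r_manual <- sxy / sqrt(sxx * syy)
r_manual

# TRUE when the manual value agrees with the built-in function
isTRUE(all.equal(r_manual, cor(x, y)))
```

Computing the sums separately makes it easy to spot an arithmetic slip in any single column of the table.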
Solution-c:
c) Find the equation of the least-squares regression line.
R code:
df1 =read.table(header = TRUE, text ="
x y
31 65
39 55
41 32
44 60
47 78
48 59
55 61
65 60
15 23
19 52
"
)
df1
plot(y ~ x, data = df1, xlab = "x", ylab = "y",pch=16)
cor(df1$x,df1$y)
linreg=lm(y~x,data=df1)
coefficients(linreg)
abline(coef(linreg)[1:2])
## rounded coefficients for better output
cf <- round(coef(linreg), 4)
## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("y = ", cf[1],
             ifelse(sign(cf[2]) == 1, " + ", " - "), abs(cf[2]), " x ")
## printing of the equation
mtext(eq, 3, line=-2)
Output from regression of y on x
summary(linreg)
Call:
lm(formula = y ~ x, data = df1)
Residuals:
     Min       1Q   Median       3Q      Max
-22.8330  -6.5139   0.7797   7.9072  19.8375
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.0809    13.3185   2.409   0.0426 *
x             0.5549     0.3101   1.790   0.1113
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.3 on 8 degrees of freedom
Multiple R-squared: 0.2859, Adjusted R-squared: 0.1966
F-statistic: 3.203 on 1 and 8 DF, p-value: 0.1113
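The estimated coefficients give the regression equation:
(Intercept) x
32.0808879 0.5549285
y^ = 32.0808879 + 0.5549285 * x
Each one-unit increase in x raises the predicted y by about 0.555.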
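The slope and intercept reported by lm can also be obtained from the least-squares formulas b1 = Sxy / Sxx and b0 = ybar - b1 * xbar. The following is a minimal sketch of that check; the names b0, b1 and fit are illustrative and not part of the original solution.

```r
# Data from the question
x <- c(31, 39, 41, 44, 47, 48, 55, 65, 15, 19)
y <- c(65, 55, 32, 60, 78, 59, 61, 60, 23, 52)

# Least-squares slope: cross-product sum over squared x deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (xbar, ybar)
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# TRUE when the formulas reproduce the lm coefficients
fit <- lm(y ~ x)
isTRUE(all.equal(unname(coef(fit)), c(b0, b1)))
```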
Solution-d:
d) What would you predict when the independent variable equals 25?
For x = 25:
y^ = 32.0808879 + 0.5549285 * x
   = 32.0808879 + 0.5549285 * 25
   = 45.9541
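The same prediction can be obtained from the fitted model with predict(), which avoids retyping rounded coefficients. This sketch rebuilds the data frame so that it runs on its own; requesting a 95% prediction interval is an addition made here to show the uncertainty around the point estimate, not something the question asks for.

```r
# Rebuild the data and the fitted model
df1 <- data.frame(
  x = c(31, 39, 41, 44, 47, 48, 55, 65, 15, 19),
  y = c(65, 55, 32, 60, 78, 59, 61, 60, 23, 52)
)
linreg <- lm(y ~ x, data = df1)

# Point prediction at x = 25
new_x <- data.frame(x = 25)
pred <- predict(linreg, newdata = new_x)
pred

# Optional: 95% prediction interval (columns fit, lwr, upr)
pred_int <- predict(linreg, newdata = new_x,
                    interval = "prediction", level = 0.95)
pred_int
```

The column name in newdata must match the predictor name used in the model formula, here x.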
Solution-e:
e) Find and interpret the value of r^2.
R-squared = 0.2859
About 28.59% of the variation in Y is explained by X.
This is not a good model, since the proportion of variance it explains is low.
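R-squared can be verified as 1 - SSE/SST, the share of total variation in y not left in the residuals. The sketch below computes it that way and compares it with the squared correlation and with the value stored in the model summary; the names sse, sst and r2 are illustrative.

```r
# Data from the question
x <- c(31, 39, 41, 44, 47, 48, 55, 65, 15, 19)
y <- c(65, 55, 32, 60, 78, 59, 61, 60, 23, 52)
fit <- lm(y ~ x)

# Residual and total sums of squares
sse <- sum(residuals(fit)^2)
sst <- sum((y - mean(y))^2)

# Proportion of variance explained
r2 <- 1 - sse / sst
r2

# In simple linear regression this equals the squared correlation
isTRUE(all.equal(r2, cor(x, y)^2))

# It also matches the value reported by summary()
isTRUE(all.equal(r2, summary(fit)$r.squared))
```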