* Descriptive statistics:
o Find the mean, median, mode, range, and standard deviation for the data set.
o Create a scatter plot for the data set.
* Regression Analysis:
o Perform a linear regression analysis on the data set.
o Report the correlation coefficient and the equation of the regression function, and make a few predictions based on hypothetical input values.
o Write down a summary of your conclusions (How well does the regression fit the values? How correlated are the independent and dependent variables? What does the prediction tell you?).
X | Y |
108 | 392.5 |
19 | 46.2 |
13 | 15.7 |
124 | 422.2 |
40 | 119.4 |
57 | 170.9 |
23 | 56.9 |
14 | 77.5 |
45 | 214 |
10 | 65.3 |
5 | 20.9 |
48 | 248.1 |
11 | 23.5 |
23 | 39.6 |
7 | 48.8 |
2 | 6.6 |
24 | 134.9 |
6 | 50.9 |
3 | 4.4 |
23 | 113 |
6 | 14.8 |
9 | 48.7 |
9 | 52.1 |
3 | 13.2 |
29 | 103.9 |
7 | 77.5 |
4 | 11.8 |
20 | 98.1 |
7 | 27.9 |
###R code
x=c(108,19,13,124,40,57,23,14,45,10,5,48,11,23,7,2,24,6,3,23,6,9,9,3,29,7,4,20,7)
x
y=c(392.5,46.2,15.7,422.2,119.4,170.9,56.9,77.5,214,65.3,20.9,248.1,23.5,39.6,48.8,6.6,134.9,50.9,4.4,113,14.8,48.7,52.1,13.2,103.9,77.5,11.8,98.1,27.9)
y
####Descriptive Statistics#########
###For X
summary(x)
mean(x)
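### mode(x) only returns the storage mode ("numeric"), so tabulate x to get the statistical mode
tx=table(x)
as.numeric(names(tx)[tx==max(tx)])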
median(x)
range(x)
a=var(x)
sd=sqrt(a)###standard deviation
sd
####For Y
summary(y)
mean(y)
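### mode(y) only returns the storage mode ("numeric"), so tabulate y to get the statistical mode
ty=table(y)
as.numeric(names(ty)[ty==max(ty)])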
median(y)
range(y)
a=var(y)
sd=sqrt(a) ###standard deviation
sd
### a scatter plot
plot(x,y)
###a linear regression analysis#############
fit=lm(y~x)
summary(fit)
###the correlation coefficient, the equation of the regression function
cor(x,y)
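The assignment also asks for the regression equation and a few predictions, which the script above does not print explicitly. A minimal sketch of one way to get them with `coef()` and `predict()` follows. The input values 15, 50, and 100 are hypothetical, chosen only for illustration and kept inside the observed range of x (2 to 124).

```r
# Data copied from the table in the problem statement
x <- c(108,19,13,124,40,57,23,14,45,10,5,48,11,23,7,2,24,6,3,23,6,9,9,3,29,7,4,20,7)
y <- c(392.5,46.2,15.7,422.2,119.4,170.9,56.9,77.5,214,65.3,20.9,248.1,23.5,39.6,
       48.8,6.6,134.9,50.9,4.4,113,14.8,48.7,52.1,13.2,103.9,77.5,11.8,98.1,27.9)

fit <- lm(y ~ x)

# Intercept and slope of the fitted line: y = intercept + slope * x
b <- coef(fit)
print(round(b, 4))

# Hypothetical inputs (an assumption for illustration, not part of the assignment data)
new_x <- data.frame(x = c(15, 50, 100))

# Point predictions from the fitted line
pred <- predict(fit, newdata = new_x)
print(cbind(new_x, predicted_y = pred))
```

With the rounded coefficients reported by `summary(fit)` (intercept 10.0193, slope 3.4746), these predictions come out at roughly 62, 184, and 357. Inputs far outside the observed range of x would be extrapolation and should be treated with caution.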
#####answer###
> x=c(108,19,13,124,40,57,23,14,45,10,5,48,11,23,7,2,24,6,3,23,6,9,9,3,29,7,4,20,7)
> x
[1] 108 19 13 124 40 57 23 14 45 10 5 48 11 23 7 2 24 6 3
[20] 23 6 9 9 3 29 7 4 20 7
> y=c(392.5,46.2,15.7,422.2,119.4,170.9,56.9,77.5,214,65.3,20.9,248.1,23.5,39.6,48.8,6.6,134.9,50.9,4.4,113,14.8,48.7,52.1,13.2,103.9,77.5,11.8,98.1,27.9)
> y
[1] 392.5 46.2 15.7 422.2 119.4 170.9 56.9 77.5 214.0 65.3 20.9 248.1
[13] 23.5 39.6 48.8 6.6 134.9 50.9 4.4 113.0 14.8 48.7 52.1 13.2
[25] 103.9 77.5 11.8 98.1 27.9
> ####Descriptive Statistics#########
> ###For X
>
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.0 7.0 13.0 24.1 24.0 124.0
> mean(x)
[1] 24.10345
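> mode(x)
[1] "numeric"
### Note: mode() reports R's storage mode, not the statistical mode. Counting the values of x shows the most frequent are 7 and 23 (three occurrences each).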
> median(x)
[1] 13
> range(x)
[1] 2 124
> a=var(x)
> sd=sqrt(a)
> sd
[1] 29.37728
> ####For Y
>
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.40 23.50 52.10 93.77 113.00 422.20
> mean(y)
[1] 93.76897
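> mode(y)
[1] "numeric"
### Note: mode() reports R's storage mode, not the statistical mode. Counting the values of y shows the most frequent is 77.5 (two occurrences).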
> median(y)
[1] 52.1
> range(y)
[1] 4.4 422.2
> a=var(y)
> sd=sqrt(a)
> sd
[1] 106.1344
> ### a scatter plot
> plot(x,y)
> fit=lm(y~x)
> summary(fit)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-50.335 -18.669 -6.492 18.836 71.300
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.0193 7.1630 1.399 0.173
x 3.4746 0.1905 18.242 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 29.61 on 27 degrees of freedom
Multiple R-squared: 0.925, Adjusted R-squared: 0.9222
F-statistic: 332.8 on 1 and 27 DF, p-value: < 2.2e-16
> cor(x,y)
[1] 0.9617441
###conclusions:
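The R-squared value is 0.925, which means the model explains 92.5% of the variation in y,
so the regression fits the data well.
The correlation coefficient is 0.9617, which indicates a strong (but not perfect) positive linear relationship between x and y.
The fitted regression equation is y = 10.0193 + 3.4746x. The slope is highly significant (p < 2e-16), while the intercept is not significantly different from zero (p = 0.173).
Each one-unit increase in x is associated with an increase of about 3.47 in y, and predictions are most trustworthy for inputs inside the observed range of x (2 to 124).
Thus, our model is adequate.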
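The conclusions quote both R-squared and the correlation coefficient. In a simple linear regression with one predictor these are linked, since R-squared equals the square of r. The short check below is a sketch that recomputes both from the same data to confirm they agree. The variable names `r` and `r2` are introduced here only for illustration.

```r
# Data copied from the table in the problem statement
x <- c(108,19,13,124,40,57,23,14,45,10,5,48,11,23,7,2,24,6,3,23,6,9,9,3,29,7,4,20,7)
y <- c(392.5,46.2,15.7,422.2,119.4,170.9,56.9,77.5,214,65.3,20.9,248.1,23.5,39.6,
       48.8,6.6,134.9,50.9,4.4,113,14.8,48.7,52.1,13.2,103.9,77.5,11.8,98.1,27.9)

fit <- lm(y ~ x)

r  <- cor(x, y)                # Pearson correlation coefficient
r2 <- summary(fit)$r.squared   # R-squared reported by the regression

print(c(r = r, r_squared = r^2, model_r2 = r2))

# With a single predictor, r^2 and the model's R-squared agree up to rounding error
stopifnot(isTRUE(all.equal(r^2, r2)))
```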