A study is conducted to determine the relationship between a driver's age and the number of accidents he or she has over a 1-year period. The data are shown here. If there is a significant relationship, predict the number of accidents of a driver who is 28.
Driver's age x | No. of accidents y |
16 | 3 |
24 | 2 |
18 | 5 |
17 | 2 |
23 | 0 |
27 | 1 |
32 | 1 |
Predict y' for a specific value of x, if the relationship is valid. Here x = 28 (the driver's age). Find the value of y'.
Select one:
a. Since the regression equation is not significant, the value cannot be determined for y'.
b. Since the regression equation is significant, the value of y' is equal to 1.0532.
c. 11.8823.
d. 16.1667.
Answer is
a. Since the regression equation is not significant, the value cannot be determined for y'.
Detailed explanation is given below.
Define
X: Age of a driver.
Y: Number of accidents a driver has over a 1-year period.
First, we compute the correlation coefficient between X and Y using the usual formula. It comes out to r = -0.61, so X and Y are negatively correlated, but the relationship is only moderate (as a rule of thumb, |r| of about 0.8 or more indicates a strong relationship).
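As a rough check of the value above, the correlation can also be computed by hand from the sums of squares. This is only a sketch; the names Sxx, Syy, Sxy and r are labels chosen here for illustration and do not come from the R output.

```r
# Data from the table in the question
x <- c(16, 24, 18, 17, 23, 27, 32)  # driver's age
y <- c(3, 2, 5, 2, 0, 1, 1)         # accidents in 1 year

# Sums of squares and cross-products about the means
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Pearson correlation coefficient
r <- Sxy / sqrt(Sxx * Syy)
round(r, 2)  # -0.61
```

The result should agree with what cor(x, y) returns.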
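We will fit a simple linear regression line, either by the least squares method by hand or using R. Here we use R. The code is given below.
R code for simple linear regression
x <- c(16, 24, 18, 17, 23, 27, 32)  # driver's age
y <- c(3, 2, 5, 2, 0, 1, 1)         # accidents in 1 year
cor(x, y)                           # correlation coefficient
m <- lm(y ~ x)                      # fit the simple linear regression
summary(m)                          # coefficients, p-values, R-squared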
cor(x,y) is the command for the correlation coefficient between X and Y, and lm(y~x) is the command for simple linear regression. The R output for the above code is given in the image below.
In the output, the p-value associated with x is 0.1458, which is greater than the 0.05 level of significance, so x is not a significant predictor in the model. In addition, the Multiple R-squared is 0.3722, meaning the regression line explains only 37.22% of the variation in y, so the fit is poor. The answer is therefore (a): since the regression equation is not significant, the value cannot be determined for y'.
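The significance decision above can be reproduced without reading it off the summary table. The sketch below assumes the usual t test for a simple regression slope (equivalently, for the correlation) with n - 2 degrees of freedom and a 0.05 level of significance; the names t_stat, p_value, alpha, significant and y_hat_28 are chosen here for illustration.

```r
# Data from the table in the question
x <- c(16, 24, 18, 17, 23, 27, 32)  # driver's age
y <- c(3, 2, 5, 2, 0, 1, 1)         # accidents in 1 year

n <- length(x)
r <- cor(x, y)

# t statistic for H0: no linear relationship, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

alpha <- 0.05                   # assumed level of significance
significant <- p_value < alpha  # FALSE here, so no prediction is reported

# For reference only: the point on the fitted line at age 28
m <- lm(y ~ x)
y_hat_28 <- unname(predict(m, newdata = data.frame(x = 28)))
```

Here y_hat_28 comes out to about 1.05, close to the value in option b, but because the relationship is not significant it should not be reported as a prediction.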