On the last page of this file you will find the Excel output for a multiple linear regression model. The model was built in an attempt to better understand why students at area high schools perform differently on the state high school mathematics test. The average test score for a class of students is what we are trying to predict. In our attempt to understand why these test scores differ, we use 3 independent variables: a rating (0-100) for the quality of the math degree obtained by the instructor, the age of the instructor, and the salary (in thousands) of the instructor.
SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.597512233 |
| R Square | 0.357020869 |
| Adjusted R Square | 0.303439274 |
| Standard Error | 7.724526046 |
| Observations | 40 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 1192.732105 | 397.5774 | 6.663125 | 0.001076925 |
| Residual | 36 | 2148.058895 | 59.6683 | | |
| Total | 39 | 3340.791 | | | |

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | 35.67761801 | 7.278849159 | 4.901547 | 2.03E-05 | 20.9154278 |
| Math Degree | 0.247481581 | 0.069845662 | 3.543263 | 0.001115 | 0.105828014 |
| Age | 0.244830604 | 0.185213036 | 1.321886 | 0.194545 | -0.130798841 |
| Income | 0.133296712 | 0.152818937 | 0.872253 | 0.388851 | -0.176634456 |
We are given the regression summary and the ANOVA output for the model.
Here, the average test score for a class is the response variable.
The predictors are the quality rating of the instructor's math degree, the instructor's age, and the instructor's salary in thousands of dollars (labeled Income in the output).
The estimated regression line is given as,
Avg test score = 35.67761801 + 0.247481581 Math degree rating + 0.244830604 Age + 0.133296712 Income
We want to estimate the average math score for a class of students whose instructor is 52 years old, earns $48,000, and got her degree in a math program rated 72.
Put Math degree rating = 72, Age = 52, Income = 48 (salary is measured in thousands of dollars, so $48,000 enters as 48).
Avg test score = 35.67761801 + 0.247481581*72 + 0.244830604*52 + 0.133296712*48 = 72.6257
The estimated average math score for a class of students whose instructor is 52 years old, earns $48,000, and got her degree in a math program rated 72 is 72.6257.
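As a quick check of the arithmetic, the substitution can be sketched in Python. The coefficients are copied from the Excel output; the dictionary and function names are only illustrative, and the income argument is assumed to be in thousands of dollars, as the problem statement describes.

```python
# Coefficients copied from the Excel regression output.
COEFFICIENTS = {
    "Intercept": 35.67761801,
    "Math Degree": 0.247481581,
    "Age": 0.244830604,
    "Income": 0.133296712,  # per thousand dollars of salary
}


def predict_avg_score(math_degree, age, income_thousands):
    """Evaluate the fitted regression line for one instructor.

    income_thousands is the salary in thousands of dollars (48 for $48,000).
    """
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Math Degree"] * math_degree
        + COEFFICIENTS["Age"] * age
        + COEFFICIENTS["Income"] * income_thousands
    )


predicted = predict_avg_score(math_degree=72, age=52, income_thousands=48)
print(round(predicted, 4))  # 72.6257
```

Passing 48000 instead of 48 would add roughly 6,398 points to the prediction, which is why the unit of the Income variable matters here.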
We are given R-squared = 0.357020869 and adjusted R-squared = 0.303439274.
We know that 100·R² % of the total variation is explained by the model.
Hence, about 35.70% of the variation in math scores is explained by the model. After adjusting for the number of predictors, the figure is about 30.34%.
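Both R Square and Adjusted R Square in the output can be recomputed from the ANOVA table, which makes the difference between them explicit. The sketch below uses only numbers copied from the output; the variable names are illustrative.

```python
# Values copied from the ANOVA table and regression statistics.
ss_regression = 1192.732105
ss_total = 3340.791
n_obs = 40          # Observations
n_predictors = 3    # Math Degree, Age, Income

# R-squared: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared: penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R-squared          = {r_squared:.6f}")
print(f"Adjusted R-squared = {adj_r_squared:.6f}")
```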
To test: H0: The regression model is not significant.
H1: The regression model is significant.
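Test statistic: F = MSR / MSE, where MSR is the regression mean square and MSE is the residual mean square.
Under H0, the test statistic follows an F-distribution with (3, 36) degrees of freedom.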
According to the ANOVA output, F = 6.663125
And P-value for the test is 0.001076925.
For 0.01 level of significance, P-value < 0.01
Hence, we reject H0 at 0.01 level of significance.
Hence, the regression model is significant. That is, at least one of the predictors is linearly related to the average test score, so the model should be retained for further analysis.
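The F statistic can be reproduced from the sums of squares and degrees of freedom in the ANOVA table. In this minimal sketch the p-value is copied from the Significance F column rather than recomputed, and the variable names are illustrative.

```python
# Values copied from the ANOVA table.
ss_regression, df_regression = 1192.732105, 3
ss_residual, df_residual = 2148.058895, 36

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic for H0: the regression model is not significant.
f_statistic = ms_regression / ms_residual

significance_f = 0.001076925  # copied from the output, not recomputed here
alpha = 0.01
reject_h0 = significance_f < alpha

print(f"F = {f_statistic:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")
```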
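Let us test whether the ith regression coefficient is significant.
To test: H0: β_i = 0 versus H1: β_i ≠ 0
Test statistic: t = b_i / SE(b_i), where b_i is the estimated coefficient and SE(b_i) is its standard error.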
Under H0 the test statistic follows t distribution with (n-k-1)=36 degrees of freedom.
If the p-value< 0.01 then, we reject H0 at 0.01 level of significance.
Here, the p-value for Math Degree (0.001115) is less than 0.01, hence the math degree rating is significant in the model.
But the p-values for Age (0.194545) and Income (0.388851) are greater than 0.01, hence they are not significant in the model at the 0.01 level.
The quality of the math degree appears to be significant to the model.
The age and income of the instructor appear to be insignificant.
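The t statistics in the output are the coefficients divided by their standard errors, and the decision for each term follows from comparing its p-value with 0.01. The sketch below recomputes the t statistics and applies that rule; the p-values are copied from the output, and the names used are illustrative.

```python
# (coefficient, standard error, p-value) copied from the Excel output.
TERMS = {
    "Intercept": (35.67761801, 7.278849159, 2.03e-05),
    "Math Degree": (0.247481581, 0.069845662, 0.001115),
    "Age": (0.244830604, 0.185213036, 0.194545),
    "Income": (0.133296712, 0.152818937, 0.388851),
}

ALPHA = 0.01  # significance level used in the answer

results = {}
for name, (coefficient, std_error, p_value) in TERMS.items():
    t_stat = coefficient / std_error   # test statistic under H0: beta_i = 0
    significant = p_value < ALPHA      # reject H0 when the p-value is below alpha
    results[name] = (t_stat, significant)
    print(f"{name:12s} t = {t_stat:.4f}  significant: {significant}")
```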