The standard project is to use multiple regression analysis to analyze a data set. The data set comes from a study of student persistence in enrolling in the next semester, with the variables Gender, Age, GPA, the total score on a 22-question self-efficacy questionnaire, and student enrollment status.
The educational researcher wants to study how student enrollment status relates to gender, age, GPA, and the total score on the 22-question survey.
a. Give the estimated multiple regression equation.
b. Does the model work? Answer using the Significance F value, which is the p-value of the overall F test.
c. How well does the model work? Answer using the p-values and R Square.
d. Which variables contribute to the model? Answer using the p-values.
e. Give a general interpretation of the data and the data analysis.
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.422451381 |
| R Square | 0.178465169 |
| Adjusted R Square | 0.148037953 |
| Standard Error | 0.431974891 |
| Observations | 113 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 4.377924324 | 1.094481081 | 5.86531378 | 0.000260992 |
| Residual | 108 | 20.15304913 | 0.186602307 | | |
| Total | 112 | 24.53097345 | | | |

Coefficients

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.327601267 | 0.33188857 | 0.9870821 | 0.325808653 | -0.330259456 | 0.985461991 | -0.330259456 | 0.985461991 |
| Gender | -0.30076921 | 0.083982457 | -3.581333792 | 0.000513969 | -0.467237009 | -0.134301411 | -0.467237009 | -0.134301411 |
| Age | 0.006147102 | 0.09460487 | 0.064976589 | 0.94831276 | -0.181376164 | 0.193670367 | -0.181376164 | 0.193670367 |
| GPA | 0.165452103 | 0.054477398 | 3.037077944 | 0.002995123 | 0.05746845 | 0.273435756 | 0.05746845 | 0.273435756 |
| Total Q | 0.000365667 | 0.003539872 | 0.103299547 | 0.917916809 | -0.006650973 | 0.007382307 | -0.006650973 | 0.007382307 |
Part a)
The estimated regression equation is:
Predicted enrollment status = 0.3276 - 0.3008*Gender + 0.006147*Age + 0.1655*GPA + 0.000366*TotalQ
The coefficient values are taken from the Coefficients column of the regression output above, rounded.
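To make the equation concrete, here is a minimal sketch that evaluates it for a single student. The function name `predict_enrollment` and the input values are hypothetical: the data description does not say how Gender, Age, or Total Q are coded, so the example student is illustrative only.

```python
# Rounded coefficients from the estimated regression equation in part a.
COEFFICIENTS = {
    "Intercept": 0.3276,
    "Gender": -0.3008,
    "Age": 0.006147,
    "GPA": 0.1655,
    "TotalQ": 0.000366,
}


def predict_enrollment(gender: float, age: float, gpa: float, total_q: float) -> float:
    """Return the predicted enrollment status for one student (hypothetical helper)."""
    c = COEFFICIENTS
    return (c["Intercept"]
            + c["Gender"] * gender
            + c["Age"] * age
            + c["GPA"] * gpa
            + c["TotalQ"] * total_q)


# Hypothetical student: the codings of Gender and Age are assumptions here.
print(round(predict_enrollment(gender=0, age=1, gpa=3.0, total_q=80), 4))  # → 0.8595
```

Because the equation is linear, changing one input while holding the others fixed shifts the prediction by exactly that variable's coefficient per unit.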
Part b)
The model has F = 5.8653, and its Significance F (the p-value of the overall F test) is 0.000261.
Since p = 0.000261 < alpha = 0.05, we reject the null hypothesis that all four slope coefficients are zero. The model is therefore statistically significant in predicting enrollment status.
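The Significance F value can be reproduced from the ANOVA table. The sketch below is an assumption-laden illustration in plain Python rather than Excel's own routine. It computes F = MS Regression / MS Residual and then the upper-tail p-value using the closed form of the regularized incomplete beta function. That closed form holds only when the numerator degrees of freedom are even, which is true here (df = 4). The helper name `f_test_p_value` is hypothetical.

```python
def f_test_p_value(f_stat: float, df1: int, df2: int) -> float:
    """Upper-tail p-value P(F > f_stat) for an F(df1, df2) distribution.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1*f) and the
    finite-series form of I_x(a, m), valid only when m = df1/2 is a whole number.
    """
    if df1 % 2 != 0:
        raise ValueError("this closed form requires an even numerator df")
    a = df2 / 2
    m = df1 // 2
    x = df2 / (df2 + df1 * f_stat)
    term = 1.0   # j = 0 term of the series
    total = term
    for j in range(1, m):
        # next term: multiply by (a + j - 1) / j * (1 - x)
        term *= (a + j - 1) / j * (1 - x)
        total += term
    return x ** a * total


# ANOVA values from the regression output
ss_reg, df_reg = 4.377924324, 4
ss_res, df_res = 20.15304913, 108
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res
p_value = f_test_p_value(f_stat, df_reg, df_res)

alpha = 0.05
print(f"F = {f_stat:.4f}, p = {p_value:.6f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these inputs the sketch gives F of about 5.8653 and p of about 0.000261, matching the Significance F in the output. For an odd numerator df, a library routine such as an F-distribution survival function would be needed instead.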
Part c)
The p-value obtained (0.000261) shows that the model is statistically significant in predicting the values of the response variable.
However, R Square = 0.1785 (rounded to four decimal places) is small.
An R Square of 0.1785 * 100 = 17.85% means that the model explains only 17.85% of the variation in the response variable. The Adjusted R Square, 0.1480, is lower still because it penalizes predictors that add little explanatory power.
Hence, although the model is statistically significant, it has limited practical value: about 82% of the variation in student enrollment status remains unexplained.
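R Square and Adjusted R Square both follow directly from the ANOVA sums of squares. Here is a short sketch using the standard formulas R^2 = SS Regression / SS Total and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), with n = 113 observations and k = 4 predictors taken from the output.

```python
# Sums of squares and sample sizes from the regression output
ss_reg = 4.377924324
ss_total = 24.53097345
n, k = 113, 4  # observations, predictors

r_square = ss_reg / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"R Square          = {r_square:.4f} ({r_square:.2%} explained)")
print(f"Adjusted R Square = {adj_r_square:.4f}")
print(f"Unexplained       = {1 - r_square:.2%}")
```

This prints 0.1785 (17.85% explained), 0.1480, and 82.15% unexplained. These values agree with the Regression Statistics block.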
Part d)
Variables with a p-value < 0.05 contribute significantly to the model.
Gender (p = 0.000514) and GPA (p = 0.002995) both have p-values < 0.05, so they have a significant influence on enrollment status. Their 95% confidence intervals also exclude zero.
Age (p = 0.9483) and Total Q (p = 0.9179) have p-values > 0.05, so they do not have a significant influence on the model. Their 95% confidence intervals include zero.
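The screening rule in part d can be written as a small check over the Coefficients table. In this sketch, the helper name `significant_predictors` and the tuple layout are assumptions. The sketch also recomputes each t Stat as coefficient / standard error and reports whether the 95% interval excludes zero. For a two-sided test at alpha = 0.05, that condition is equivalent to p < 0.05.

```python
# (coefficient, standard error, p-value, lower 95%, upper 95%) from the output
PREDICTORS = {
    "Gender": (-0.30076921, 0.083982457, 0.000513969, -0.467237009, -0.134301411),
    "Age": (0.006147102, 0.09460487, 0.94831276, -0.181376164, 0.193670367),
    "GPA": (0.165452103, 0.054477398, 0.002995123, 0.05746845, 0.273435756),
    "Total Q": (0.000365667, 0.003539872, 0.917916809, -0.006650973, 0.007382307),
}
ALPHA = 0.05


def significant_predictors(table: dict, alpha: float = ALPHA) -> list:
    """Return the names of predictors with p < alpha, printing a summary line each."""
    significant = []
    for name, (coef, se, p, lower, upper) in table.items():
        t_stat = coef / se                      # reproduces the t Stat column
        excludes_zero = lower > 0 or upper < 0  # CI lies entirely on one side of 0
        print(f"{name:8s} t = {t_stat:7.3f}  p = {p:.4f}  CI excludes 0: {excludes_zero}")
        if p < alpha:
            significant.append(name)
    return significant


print(significant_predictors(PREDICTORS))  # → ['Gender', 'GPA']
```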
Part e)
A total of 113 observations were used to model the response variable, enrollment status, from the independent variables Gender, Age, GPA, and Total Q. The overall F test gives F = 5.8653 with p = 0.000261, indicating that the model is statistically significant. Looking at the variables individually, the p-values for Gender (0.0005) and GPA (0.0030) are below 0.05, so these variables have a significant influence on enrollment status, while Age and Total Q do not. Holding the other variables constant, each one-point increase in GPA is associated with a 0.1655 increase in predicted enrollment status, and a one-unit increase in the Gender code is associated with a 0.3008 decrease. However, the R Square value of the model is only 17.85%, which means that the model explains just 17.85% of the variation in enrollment status.