Consider the data in the Excel file Olympic Track and Field Results. The Olympic records in Discus Throw, High Jump, and Long Jump all show a clear increasing trend. Can one be predicted from the other(s)? Run a regression analysis using Discus Throw as the dependent variable and High Jump as the independent variable. Interpret the key regression results.
Dataset - https://easyupload.io/wc1oz2
The regression analysis with Discus Throw as the dependent variable and High Jump as the independent variable is given in the attached images.
Analysis from the Summary Output
1. Multiple R value is 0.964358
Multiple R is the correlation coefficient, which measures the strength of the linear relationship between the independent and dependent variables. Its values range from -1 to 1 inclusive.
The closer the absolute value of R is to 1, the stronger the relationship.
Our Multiple R value of 0.96 indicates a very strong linear relationship between our variables Discus Throw and High Jump.
2. R² value is 0.929987
R² is the proportion of the variation in the dependent variable that is explained by the independent variable(s).
Here about 93.0% of the variation in Discus Throw is explained by High Jump, which indicates a good fit.
3. Standard error is 128.1227, which represents the typical distance of the observed points from the regression line. This is low compared with the dependent variable values, which are each more than 1000.
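As a quick consistency check on the summary output, R² should equal the square of Multiple R, up to rounding. The following is a minimal Python sketch using only the two values reported above; the variable names are illustrative and not part of the Excel output.

```python
import math

# Values copied from the summary output above.
multiple_r = 0.964358
r_squared = 0.929987

# R squared should match the square of Multiple R (up to rounding).
r_squared_from_r = multiple_r ** 2
multiple_r_from_r2 = math.sqrt(r_squared)

print(f"Multiple R squared: {r_squared_from_r:.6f}")
print(f"sqrt(R squared):    {multiple_r_from_r2:.6f}")
print(f"Share of variation explained: {r_squared:.1%}")
```

If the two printed values disagreed beyond the last reported digit, that would suggest a transcription error in one of the figures.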
ANOVA analysis
1. SS is the sum of squares. The residual SS should be low relative to the total SS for the model to be a good fit. Our residual SS is 393970.2, which is only about 7% of the total SS of 5627121, indicating a good fit.
2. Significance F value is 2.31E-15 (that is, 2.31 × 10^-15). If the Significance F value is less than 0.05 (our significance level), the results are statistically significant. Our Significance F value is far less than 0.05, hence the results are statistically significant.
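The ANOVA figures can be tied back to the summary output. The sketch below recomputes R² from the sums of squares, backs out the residual degrees of freedom from the standard error, and derives an F statistic. It rests on two assumptions: that the standard error is defined in the usual way as sqrt(residual SS / residual df), and that there is a single predictor, so the regression df is 1. The F statistic it prints is derived here, not read from the Excel output.

```python
# Values copied from the regression output above.
ss_residual = 393970.2
ss_total = 5627121.0
standard_error = 128.1227

# Regression SS is the part of the total SS not left in the residuals.
ss_regression = ss_total - ss_residual

# R squared is the explained share of the total sum of squares.
r_squared = 1 - ss_residual / ss_total

# Assumption: standard error = sqrt(residual SS / residual df),
# so the residual degrees of freedom can be backed out.
df_residual = ss_residual / standard_error ** 2

# Assumption: one predictor (High Jump), so regression df = 1.
df_regression = 1
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R squared:   {r_squared:.4f}")
print(f"Residual df: {df_residual:.2f}")
print(f"F statistic: {f_statistic:.1f}")
```

Under these assumptions the recomputed R² agrees with the reported 0.929987, and the residual df comes out at essentially a whole number, which is what one would expect if the figures are mutually consistent.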
Coefficient analysis
The regression equation is
Discus Throw = (High Jump * 58.71) - 2712.54
The P-values for the Intercept and High Jump are 7.08704E-10 and 2.3118E-15, both far less than 0.05 (our significance level), indicating that both the intercept and the predictor (independent) variable High Jump are statistically significant.
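Read literally, the regression equation says that each one-unit increase in High Jump is associated with a 58.71-unit increase in predicted Discus Throw. A minimal Python sketch of using the equation for prediction follows. The function name and the input values 80 and 81 are hypothetical, chosen only for illustration; they are not taken from the Excel file, and the units are whatever the dataset uses.

```python
# Coefficients copied from the regression equation above.
SLOPE = 58.71
INTERCEPT = -2712.54


def predict_discus(high_jump: float) -> float:
    """Return the predicted Discus Throw for a given High Jump value."""
    return SLOPE * high_jump + INTERCEPT


# Hypothetical High Jump values, for illustration only.
for high_jump in (80.0, 81.0):
    print(high_jump, round(predict_discus(high_jump), 2))

# The gap between consecutive predictions is the slope.
step = predict_discus(81.0) - predict_discus(80.0)
print(f"Change per unit of High Jump: {step:.2f}")
```

Predictions are only trustworthy for High Jump values within the range observed in the data; the large negative intercept shows that extrapolating toward zero gives meaningless results.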
Residuals
The actual values and the predicted values (from the regression) will not be equal, and the difference between them is the residual. Residuals tell how far the actual values are from the values predicted by the regression equation. The difference arises because the independent variable does not perfectly predict the dependent variable.
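The definition of a residual as actual minus predicted can be sketched in Python as follows. The coefficients come from the regression equation above, but the (High Jump, Discus Throw) pairs are hypothetical placeholders, not values from the Excel file; substitute the real observations to reproduce the residual output.

```python
# Coefficients copied from the regression equation above.
SLOPE = 58.71
INTERCEPT = -2712.54

# Hypothetical (High Jump, Discus Throw) pairs, NOT taken from the dataset.
observations = [(75.0, 1700.0), (80.0, 1950.0), (85.0, 2300.0)]


def residuals(pairs):
    """Return actual minus predicted Discus Throw for each pair."""
    result = []
    for high_jump, actual in pairs:
        predicted = SLOPE * high_jump + INTERCEPT
        result.append(actual - predicted)
    return result


for (high_jump, actual), res in zip(observations, residuals(observations)):
    print(f"High Jump {high_jump}: actual {actual}, residual {res:.2f}")
```

A positive residual means the actual value lies above the regression line, and a negative residual means it lies below.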