A Minnesota school district has a novel idea about a factor influencing school attendance: they think that the more the school cafeteria serves “junkie,” kid-preferred food, the more students want to come to school. To test this, they take a sample of eight school days and measure the % of saturated fat in the school lunch and the % school attendance. The students are told in advance what the school lunch will be.
Evaluate the data and then explain if serving tasty, yet lousy,
food is related to attendance.
% Saturated Fat in School Lunch | % Attendance |
13.3 | 35 |
24.9 | 80 |
9.0 | 10 |
34.5 | 75 |
36.1 | 85 |
22.1 | 75 |
22.7 | 70 |
24.5 | 80 |
1. State your null formally and in lay terms for a simple regression.
2. Calculate r and the regression line (y = a + bx) and reject or fail to reject the null at α = .05.
3. Explain your findings in lay terms using r-square, r, and b.
4. Calculate a 95% confidence interval for the slope and explain in lay terms.
5. Calculate a 95% confidence interval for Y when X is “25” and explain in lay terms.
1. Null Hypothesis, H0: β = 0, where β is the population slope of the regression of % Attendance on % Saturated Fat in School Lunch. In lay terms: the % Saturated Fat in School Lunch has no linear relationship with Attendance in school.
The null hypothesis is the default position, assumed true until the data give evidence against it. In this case, the null hypothesis is that the % Saturated Fat in School Lunch doesn't affect Attendance in school. We run a regression test to check whether the data contradict it.
Alternate Hypothesis, Ha: β ≠ 0. In lay terms: the % Saturated Fat in School Lunch is related to Attendance in school.
The alternate hypothesis is the claim we are looking for evidence of in the test.
2.
% Saturated Fat in School Lunch | % Attendance |
13.3 | 35 |
24.9 | 80 |
9.0 | 10 |
34.5 | 75 |
36.1 | 85 |
22.1 | 75 |
22.7 | 70 |
24.5 | 80 |
Correlation coefficient, r = 0.8573
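As a cross-check on r, here is a minimal Python sketch that computes the Pearson correlation by hand from the eight data pairs. The variable and function names (`fat`, `attendance`, `pearson_r`) are illustrative choices, not part of the original spreadsheet work.

```python
import math

# Data from the table above: % saturated fat and % attendance on eight school days.
fat = [13.3, 24.9, 9.0, 34.5, 36.1, 22.1, 22.7, 24.5]
attendance = [35, 80, 10, 75, 85, 75, 70, 80]


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(fat, attendance)
print(round(r, 4))
```

The result should agree with the "Multiple R" value in the regression output.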
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.857272 |
R Square | 0.734916 |
Adjusted R Square | 0.690735 |
Standard Error | 14.84423 |
Observations | 8 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 1 | 3665.394 | 3665.394 | 16.63434 | 0.006513 |
Residual | 6 | 1322.106 | 220.351 | | |
Total | 7 | 4987.5 | | | |
Coefficients
Term | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 5.92091 | 15.11906 | 0.391619 | 0.70887 | -31.0741 | 42.91592 |
% Saturated Fat in School Lunch | 2.472649 | 0.606261 | 4.078522 | 0.006513 | 0.989182 | 3.956117 |
The regression equation is:
Y (% Attendance) = 5.92091 + 2.472649 * X (% Saturated Fat in School Lunch).
Since the p-value for the slope of % Saturated Fat in School Lunch is 0.0065, i.e. less than 0.05, we reject the null hypothesis and conclude that % Saturated Fat in School Lunch is significantly related to % Attendance.
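The fitted line and the test decision can be reproduced with a short Python sketch. It uses only the standard library, so instead of computing a p-value it compares the t statistic with the two-sided 5% critical value for 6 degrees of freedom (2.447, taken from a standard t table). The names `fit_line` and `T_CRIT` are illustrative assumptions.

```python
import math

# Data from the table above.
fat = [13.3, 24.9, 9.0, 34.5, 36.1, 22.1, 22.7, 24.5]
attendance = [35, 80, 10, 75, 85, 75, 70, 80]


def fit_line(x, y):
    """Least-squares intercept a, slope b, and standard error of the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual standard error uses n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)
    return a, b, se_b


a, b, se_b = fit_line(fat, attendance)
t_stat = b / se_b

# Assumption: two-sided 5% critical value of t with 6 df, from a t table.
T_CRIT = 2.447
reject_null = abs(t_stat) > T_CRIT

print(f"y = {a:.4f} + {b:.4f} x, t = {t_stat:.3f}, reject H0: {reject_null}")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with 0.05.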
3. In this case, r² = 0.7349 = 73.49%.
It means that 73.49% of the variation in % Attendance is explained by the variation in % Saturated Fat in School Lunch, which is a large share for a single predictor.
Also, the slope coefficient of % Saturated Fat in School Lunch is 2.472649. It implies two things:
1. The slope is positive, which means attendance tends to be higher on days with more saturated fat in the school lunch, as the district hypothesized.
2. For every 1 percentage point increase in saturated fat in the school lunch, attendance is predicted to rise by about 2.47 percentage points.
The correlation between the two variables is 0.8573, which indicates a strong positive linear relationship between them.
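To see where the "variation explained" figure comes from, the sketch below splits the total variation in attendance into a regression part and a residual part. The variable names are illustrative and the code is a hand calculation, not the spreadsheet's own routine.

```python
# Data from the table above.
fat = [13.3, 24.9, 9.0, 34.5, 36.1, 22.1, 22.7, 24.5]
attendance = [35, 80, 10, 75, 85, 75, 70, 80]

n = len(fat)
mean_x = sum(fat) / n
mean_y = sum(attendance) / n

# Least-squares slope and intercept.
sxx = sum((x - mean_x) ** 2 for x in fat)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, attendance))
b = sxy / sxx
a = mean_y - b * mean_x

# Total, residual, and regression sums of squares.
sst = sum((y - mean_y) ** 2 for y in attendance)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(fat, attendance))
ssr = sst - sse

# Share of the variation in attendance accounted for by the fitted line.
r_squared = ssr / sst
print(round(sst, 1), round(ssr, 1), round(sse, 1), round(r_squared, 4))
```

The three sums of squares correspond to the Total, Regression, and Residual rows of the ANOVA table.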
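4. Slope estimate, b = 2.47265
SE of slope = 0.60626
With only 8 observations, the critical value comes from the t distribution with n - 2 = 6 degrees of freedom (not the large-sample Z = 1.96). For 95% confidence, t = 2.447.
Upper limit of the confidence interval for the slope = b + 2.447 * SE of slope = 2.47265 + 2.447 * 0.60626 = 3.9562
Lower limit of the confidence interval for the slope = b - 2.447 * SE of slope = 2.47265 - 2.447 * 0.60626 = 0.9891
These limits match the Lower 95% and Upper 95% values in the regression output (0.989182, 3.956117) up to rounding. In lay terms, we are 95% confident that each additional percentage point of saturated fat in the school lunch goes with roughly 1.0 to 4.0 more percentage points of attendance. Because the interval does not include 0, it agrees with the test result in part 2.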
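The slope interval can be checked with a short Python sketch that uses the t critical value for 6 degrees of freedom (2.447, from a standard t table), which reproduces the Lower 95% and Upper 95% values in the regression output. The same quantities also give an interval for part 5 of the question, interpreted here (an assumption) as the mean attendance at X = 25 rather than the attendance on one particular day. All names are illustrative.

```python
import math

# Data from the table above.
fat = [13.3, 24.9, 9.0, 34.5, 36.1, 22.1, 22.7, 24.5]
attendance = [35, 80, 10, 75, 85, 75, 70, 80]

n = len(fat)
mean_x = sum(fat) / n
mean_y = sum(attendance) / n
sxx = sum((x - mean_x) ** 2 for x in fat)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, attendance))

b = sxy / sxx
a = mean_y - b * mean_x

# Residual standard error with n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(fat, attendance))
s = math.sqrt(sse / (n - 2))

# Assumption: two-sided 95% critical value of t with 6 df, from a t table.
T_CRIT = 2.447

# 95% confidence interval for the slope.
se_b = s / math.sqrt(sxx)
slope_lo = b - T_CRIT * se_b
slope_hi = b + T_CRIT * se_b

# 95% confidence interval for the MEAN attendance at X = 25.
x0 = 25
y_hat = a + b * x0
se_mean = s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
mean_lo = y_hat - T_CRIT * se_mean
mean_hi = y_hat + T_CRIT * se_mean

print(f"slope: ({slope_lo:.3f}, {slope_hi:.3f})")
print(f"mean attendance at X=25: {y_hat:.2f} ({mean_lo:.2f}, {mean_hi:.2f})")
```

If the interval is wanted for a single day's attendance instead, the standard error would use `1 + 1/n + ...` under the square root, which gives a much wider prediction interval.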