Question 1: A study is conducted to examine the influence of ‘screen time’ on student performance on Statistics exams. A class of 12 students is observed over a period of time, with the independent variable being the average amount of time per day each student spends on TV/internet, and the dependent variable being their subsequent Statistics exam score, in %. The data is shown in the table below:
Student | Hrs/Day Watching TV or on Internet | Exam Score (%) |
1 | 0 | 100 |
2 | 4.6 | 79 |
3 | 2.5 | 65 |
4 | 7.2 | 26 |
5 | 3.0 | 41 |
6 | 3.9 | 94 |
7 | 4.2 | 94 |
8 | 3.9 | 52 |
9 | 8.6 | 13 |
10 | 3.9 | 84 |
11 | 2.8 | 66 |
12 | 4.0 | 61 |
(a) Determine the equation of the line of best fit, relating Y = exam score (%) to X = hrs/day spent watching TV or on the internet.
(b) Use the line of best fit calculated in Part (a) to calculate an estimate for the exam score that a student would get, to the nearest %, if they spent an average of 2.5 hrs/day watching TV or on the internet. Repeat for an estimate of the exam score that would result after a student spent an average of 5.0 hrs/day watching TV or on the internet.
(c) Plot the raw data from the table on an x-y graph, and then draw the line of best fit showing at least 2 calculated points that are on that line (Hint: your answers for Parts (a) and (b) provide you with 4 such points).
(d) What is the predicted exam score, to the nearest %, for a student who completely avoids the TV or internet? Repeat for a predicted exam score for a student who spends an average of 12.0 hrs/day watching TV or on the internet. Comment briefly on your answers for these two estimates, and what it implies about the limitations of the linear regression model generated in Part (a).
(e) Calculate the covariance between X and Y.
(f) Calculate the correlation coefficient for this data set.
(g) Calculate the coefficient of determination for this data set, and explain what its value means with respect to the line of best fit calculated in Part (a).
(h) Conduct a hypothesis test on the significance of correlation between hrs/day spent watching TV or on the internet, and exam performance, using the critical-value method at LOC = 95%.
(i) Use the p-value method to determine the common values of LOC (if any) for which your decision in Part (h) would be that there is no significant correlation between X and Y, and the common LOC values (if any) for which the opposite decision would be
a)
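The regression equation is defined as,
$$\hat{Y} = b_0 + b_1 X$$
The least-squares estimates of the slope and intercept are,
$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
From the data values,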
Student | X | Y | X^2 | Y^2 | X*Y |
1 | 0 | 100 | 0 | 10000 | 0 |
2 | 4.6 | 79 | 21.16 | 6241 | 363.4 |
3 | 2.5 | 65 | 6.25 | 4225 | 162.5 |
4 | 7.2 | 26 | 51.84 | 676 | 187.2 |
5 | 3 | 41 | 9 | 1681 | 123 |
6 | 3.9 | 94 | 15.21 | 8836 | 366.6 |
7 | 4.2 | 94 | 17.64 | 8836 | 394.8 |
8 | 3.9 | 52 | 15.21 | 2704 | 202.8 |
9 | 8.6 | 13 | 73.96 | 169 | 111.8 |
10 | 3.9 | 84 | 15.21 | 7056 | 327.6 |
11 | 2.8 | 66 | 7.84 | 4356 | 184.8 |
12 | 4 | 61 | 16 | 3721 | 244 |
Sum | 48.6 | 775 | 249.32 | 58501 | 2668.5 |
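The Sum row can be double-checked with a few lines of Python. This is a minimal sketch that re-enters the raw data from the table; the helper name `column_sums` is hypothetical and written only for this check.

```python
# Raw data from the table: X = hrs/day of screen time, Y = exam score (%).
X = [0, 4.6, 2.5, 7.2, 3.0, 3.9, 4.2, 3.9, 8.6, 3.9, 2.8, 4.0]
Y = [100, 79, 65, 26, 41, 94, 94, 52, 13, 84, 66, 61]

def column_sums(xs, ys):
    """Return the Sum row of the table: (sum X, sum Y, sum X^2, sum Y^2, sum XY)."""
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sx, sy, sxx, syy, sxy

sums = column_sums(X, Y)
# Round to 2 decimals to hide floating-point noise when printing.
print([round(s, 2) for s in sums])  # -> [48.6, 775, 249.32, 58501, 2668.5]
```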
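Substituting the totals from the Sum row (n = 12):
$$b_1 = \frac{12(2668.5) - (48.6)(775)}{12(249.32) - (48.6)^2} = \frac{-5643}{629.88} \approx -8.95885$$
$$b_0 = \bar{Y} - b_1\bar{X} = \frac{775}{12} + 8.95885\left(\frac{48.6}{12}\right) = 64.5833 + 36.2833 \approx 100.867$$
The least-squares regression model is,
$$\hat{Y} = 100.867 - 8.959X$$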
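As a check on the hand calculation, the same least-squares formulas can be evaluated from the Sum row alone. A minimal sketch in Python; `least_squares` is a hypothetical helper written for this example, not a library function.

```python
# Column totals from the "Sum" row of the table (n = 12 students).
n = 12
sum_x, sum_y = 48.6, 775
sum_x2, sum_xy = 249.32, 2668.5

def least_squares(n, sum_x, sum_y, sum_x2, sum_xy):
    """Intercept b0 and slope b1 of Y-hat = b0 + b1*X from summary totals."""
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = Y-bar - b1 * X-bar
    return b0, b1

b0, b1 = least_squares(n, sum_x, sum_y, sum_x2, sum_xy)
print(f"Y-hat = {b0:.3f} + ({b1:.5f})X")
```

Working from the totals rather than the raw data mirrors the hand method, so each intermediate number can be compared directly with the written solution.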
b)
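The least-squares regression model is,
$$\hat{Y} = 100.867 - 8.959X$$
For x = 2.5 hrs/day
$$\hat{Y} = 100.867 - 8.959(2.5) = 78.47 \approx 78\%$$
For x = 5.0 hrs/day
$$\hat{Y} = 100.867 - 8.959(5.0) = 56.07 \approx 56\%$$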
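The two estimates can be reproduced by substituting into the fitted line. This sketch assumes the fitted coefficients b0 ≈ 100.867 and b1 ≈ -8.959 (computed from the Sum row); `predict` is a hypothetical helper.

```python
# Fitted coefficients of the line of best fit (assumed, rounded to 3 decimals).
B0, B1 = 100.867, -8.959

def predict(x, b0=B0, b1=B1):
    """Predicted exam score (%) for x hrs/day of screen time."""
    return b0 + b1 * x

for hours in (2.5, 5.0):
    y_hat = predict(hours)
    print(f"x = {hours} hrs/day -> {y_hat:.2f}% (about {round(y_hat)}%)")
```

The pairs (2.5, 78.47) and (5.0, 56.07) are the calculated points that can be plotted on the line of best fit in Part (c).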
c)
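The scatter plot is created in Excel with the following steps:
Step 1: Enter the data values in Excel, and add the two points on the line of best fit calculated in Part (b), (2.5, 78.47) and (5.0, 56.07).
Step 2: Create the scatter plot.
Select the data values, then INSERT > Recommended Charts > X Y (Scatter) > OK.
Step 3: Add the trendline, its linear equation, and the R-squared value to the plot.
Click Add Chart Element > Trendline > More Trendline Options, then tick "Display Equation on chart" and "Display R-squared value on chart".
[Screenshot: resulting Excel scatter plot of exam score (%) against hrs/day, with the linear trendline, its equation, and the R² value]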
d)
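The least-squares regression model is,
$$\hat{Y} = 100.867 - 8.959X$$
For x = 0 hrs/day
$$\hat{Y} = 100.867 - 8.959(0) = 100.867 \approx 101\%$$
For x = 12.0 hrs/day
$$\hat{Y} = 100.867 - 8.959(12) = -6.641 \approx -7\%$$
Both estimates are unrealistic, because an exam score must lie between 0% and 100%. The prediction for x = 12 hrs/day is also an extrapolation well beyond the observed range of the independent variable (0 to 8.6 hrs/day), where the linear relationship cannot be assumed to hold. This shows that the linear regression model is only a reasonable approximation within the range of the observed data, and that it does not respect the natural 0% to 100% limits of the dependent variable.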
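These limitations can be made explicit in code by flagging any prediction that falls outside 0 to 100% or uses an X value outside the observed range (0 to 8.6 hrs/day). This is an illustrative sketch, assuming the fitted coefficients b0 ≈ 100.867 and b1 ≈ -8.959; `predict_with_checks` and its warning messages are hypothetical.

```python
# Fitted coefficients (assumed, rounded) and the observed range of X in the table.
B0, B1 = 100.867, -8.959
X_OBSERVED = (0.0, 8.6)  # smallest and largest hrs/day in the data

def predict_with_checks(x):
    """Return the predicted score and a list of warnings about the model's limits."""
    y_hat = B0 + B1 * x
    warnings = []
    if not X_OBSERVED[0] <= x <= X_OBSERVED[1]:
        warnings.append("extrapolation beyond observed X range")
    if not 0 <= y_hat <= 100:
        warnings.append("prediction outside the 0-100% score range")
    return y_hat, warnings

for hours in (0.0, 12.0):
    y_hat, warnings = predict_with_checks(hours)
    print(f"x = {hours}: {y_hat:.2f}% {warnings}")
```

Note that x = 0 lies inside the observed range (student 1 reported 0 hrs/day), so only the score-range check fires there; x = 12 triggers both checks.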
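e)
The sample covariance between X and Y is,
$$s_{XY} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{n-1} = \frac{2668.5 - \frac{(48.6)(775)}{12}}{11} = \frac{-470.25}{11} \approx -42.75$$
(If the population formula with divisor n is used instead, the covariance is -470.25/12 ≈ -39.19.) The negative covariance indicates that students who spend more time on TV/internet tend to have lower exam scores.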
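The covariance can be computed from the Sum row of the table. This sketch prints both the sample version (divisor n - 1) and the population version (divisor n), since courses differ on which one they use.

```python
# Totals from the "Sum" row of the table (n = 12 students).
n = 12
sum_x, sum_y, sum_xy = 48.6, 775, 2668.5

# Corrected sum of cross-products: S_xy = sum(XY) - sum(X) * sum(Y) / n
s_xy = sum_xy - sum_x * sum_y / n
sample_cov = s_xy / (n - 1)   # sample covariance, divisor n - 1
population_cov = s_xy / n     # population covariance, divisor n
print(round(s_xy, 2), round(sample_cov, 2), round(population_cov, 4))
```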
f)
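The correlation coefficient is obtained using the formula,
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{-5643}{\sqrt{(629.88)(101387)}} \approx -0.7061$$
This indicates a moderately strong negative linear relationship between screen time and exam score.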
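A quick check of r with the same computational formula, using only the Sum row and Python's standard library (a minimal sketch):

```python
import math

# Totals from the "Sum" row of the table (n = 12 students).
n = 12
sum_x, sum_y = 48.6, 775
sum_x2, sum_y2, sum_xy = 249.32, 58501, 2668.5

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))
```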
g)
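The coefficient of determination (R Square) is,
$$R^2 = r^2 = (-0.7061)^2 \approx 0.4986$$
This means that about 49.9% of the variation in exam scores is explained by the line of best fit from Part (a), i.e. by the linear relationship with hrs/day spent watching TV or on the internet. The remaining 50.1% of the variation is due to other factors not captured by the model.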
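For simple linear regression, R² can also be obtained directly from the corrected sums of squares, R² = S_xy² / (S_xx · S_yy), which equals r². A minimal sketch using the Sum row:

```python
# Totals from the "Sum" row of the table (n = 12 students).
n = 12
sum_x, sum_y = 48.6, 775
sum_x2, sum_y2, sum_xy = 249.32, 58501, 2668.5

s_xx = sum_x2 - sum_x ** 2 / n       # corrected sum of squares of X
s_yy = sum_y2 - sum_y ** 2 / n       # corrected sum of squares of Y
s_xy = sum_xy - sum_x * sum_y / n    # corrected sum of cross-products

r_squared = s_xy ** 2 / (s_xx * s_yy)
unexplained = 1 - r_squared
print(f"R^2 = {r_squared:.4f}; explained {r_squared:.1%}, unexplained {unexplained:.1%}")
```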
h)
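The hypothesis test for the significance of the correlation coefficient is performed in the following steps:
Step 1: The null and alternative hypotheses are,
$$H_0: \rho = 0 \ \text{(no significant correlation)} \qquad H_1: \rho \neq 0 \ \text{(significant correlation)}$$
Step 2: The significance level is $\alpha = 1 - 0.95 = 0.05$ (two-tailed test).
The critical value of the t-statistic for degrees of freedom = n - 2 = 10 is $t_{0.025,\,10} = 2.228$, so the rejection region is $|t| > 2.228$.
Step 3: The t-statistic is obtained using the formula,
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{(-0.70614)\sqrt{10}}{\sqrt{1-(-0.70614)^2}} \approx -3.154$$
Step 4: Decision
Since |t| = 3.154 is greater than the critical value 2.228, the null hypothesis is rejected. Hence there is a significant correlation between X and Y at LOC = 95%.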
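The test statistic and the critical-value decision can be reproduced in a few lines. This sketch hard-codes the t-table value 2.228 (two-tailed, α = 0.05, df = 10) instead of calling a statistics library, and assumes r ≈ -0.70614 from Part (f).

```python
import math

n = 12
r = -0.70614          # correlation coefficient from Part (f) (assumed, rounded)
t_critical = 2.228    # t-table value for a two-tailed test, alpha = 0.05, df = 10

df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_stat) > t_critical  # two-tailed: compare |t| with the critical value
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Comparing |t| rather than t itself is what makes the two-tailed rule work for a negative correlation.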
i)
P-value method
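The two-tailed p-value for the t-statistic t = -3.154 with degrees of freedom = n - 2 = 10 is obtained from the t-distribution table (or software). From the table, $t_{0.01,\,10} = 2.764 < 3.154 < t_{0.005,\,10} = 3.169$, so $0.01 < p < 0.02$; more precisely, $p \approx 0.0103$.
Since the p-value is less than 0.05, the null hypothesis is rejected at the 5% significance level (LOC = 95%), in agreement with Part (h). The same decision (significant correlation) holds at LOC = 90%, since p < 0.10. At LOC = 99%, however, p ≈ 0.0103 is greater than α = 0.01, so the null hypothesis is not rejected and the decision would be that there is no significant correlation between X and Y.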
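Without a statistics library, the two-tailed p-value can be approximated by numerically integrating the t density. This is a sketch using only the Python standard library; it swaps in Simpson's rule for a table lookup, and `t_pdf` and `two_tailed_p` are hypothetical helpers. With SciPy available, `2 * scipy.stats.t.sf(3.154, 10)` computes the same quantity.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the pdf from 0 to |t|), via Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - 2 * (h / 3) * total

p = two_tailed_p(-3.154, df=10)
print(f"p = {p:.4f}")

# True means H0 is rejected, i.e. a significant correlation at that LOC.
decisions = {}
for loc in (0.90, 0.95, 0.99):
    alpha = 1 - loc
    decisions[loc] = p < alpha
    label = "significant correlation" if decisions[loc] else "no significant correlation"
    print(f"LOC = {loc:.0%}: {label}")
```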