The SAT and the ACT are the two major standardized tests that colleges use to evaluate candidates. Most students take just one of these tests. However, some students take both. The data file data311.dat gives the scores of 60 students who did this. How can we relate the two tests?
(a) Plot the data with SAT on the x axis and ACT on the y axis. Describe the overall pattern and any unusual observations.
(b) Find the least-squares regression line and draw it on your plot. Give the results of the significance test for the slope. (Round your regression slope and intercept to three decimal places, your test statistic to two decimal places, and your P-value to four decimal places.)
ACT = ___ + ___ (SAT), t = ___, P = ___
(c) What is the correlation between the two tests? (Round your answer to three decimal places.)
obs   sat  act
  1  1031   23
  2   801   17
  3   663   12
  4  1096   27
  5   693   17
  6   906   22
  7   708   17
  8  1180   26
  9   914   19
 10  1099   25
 11   775   20
 12  1194   27
 13  1009   21
 14   899   22
 15   833   18
 16  1087   22
 17   802   18
 18   901   18
 19   877   21
 20  1049   20
 21   868   17
 22   792   17
 23  1008   17
 24  1167   25
 25   554   10
 26  1045   20
 27  1206   28
 28   875   22
 29   798   19
 30  1060   21
 31  1124   26
 32  1176   25
 33  1068   23
 34   732   12
 35   741   14
 36   969   22
 37   593   12
 38   613   19
 39   619   14
 40  1122   24
 41   911   18
 42   787   16
 43  1033   26
 44   781   14
 45   941   26
 46   989   24
 47   756   15
 48  1043   24
 49   647   10
 50   817   17
 51   357    9
 52  1157   27
 53  1115   25
 54   904   19
 55  1094   27
 56   837   19
 57   573   12
 58   749   18
 59  1203   25
 60   895   23
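(a) The scatter plot, with SAT on the x-axis and ACT on the y-axis, shows a strong, positive, roughly linear association: students with higher SAT scores tend to have higher ACT scores. The most unusual observation is student 51 (SAT 357, ACT 9), whose SAT score lies far below all the others, although the point still follows the overall trend.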
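(b) Let y = ACT and x = SAT. The fitted least-squares regression line can be expressed as:

\[ \hat{y} = a + b\,x \]

where the estimated slope coefficient b and the intercept estimate a are

\[ b = \frac{\sum_{i=1}^{60} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{60} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x} \]

and \(\bar{y}\), \(\bar{x}\) are the means of the ACT and SAT values respectively.

Substituting the data values,

\[ b = 0.023 \quad \text{and} \quad a = -0.641 \]

The intercept is negative: the unrounded slope is about 0.02271 and the mean SAT score is about 904, so \(b\,\bar{x}\) exceeds \(\bar{y} \approx 19.9\).

Hence, the fitted regression line is obtained as:

\[ \widehat{\text{ACT}} = -0.641 + 0.023\,(\text{SAT}) \]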
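The slope and intercept formulas above can be checked with a short script. This is a minimal sketch in plain Python; the names `RAW`, `load_pairs` and `least_squares` are illustrative choices, not part of the original answer, and the data are typed in from the table in the question.

```python
# SAT/ACT scores of the 60 students, in observation order: sat act sat act ...
RAW = """
1031 23 801 17 663 12 1096 27 693 17 906 22 708 17 1180 26 914 19 1099 25
775 20 1194 27 1009 21 899 22 833 18 1087 22 802 18 901 18 877 21 1049 20
868 17 792 17 1008 17 1167 25 554 10 1045 20 1206 28 875 22 798 19 1060 21
1124 26 1176 25 1068 23 732 12 741 14 969 22 593 12 613 19 619 14 1122 24
911 18 787 16 1033 26 781 14 941 26 989 24 756 15 1043 24 647 10 817 17
357 9 1157 27 1115 25 904 19 1094 27 837 19 573 12 749 18 1203 25 895 23
"""


def load_pairs(raw):
    """Split the flat 'sat act sat act ...' text into two lists."""
    nums = [int(tok) for tok in raw.split()]
    return nums[0::2], nums[1::2]


def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


sat, act = load_pairs(RAW)
intercept, slope = least_squares(sat, act)
print(f"ACT = {intercept:.3f} + {slope:.3f} (SAT)")
```

The intercept is computed from the unrounded slope; using the rounded value 0.023 in a = ȳ − b x̄ would give a noticeably different intercept.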
The line is then drawn on the scatter plot from (a); like every least-squares line, it passes through the point of means \((\bar{x}, \bar{y})\).
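To test the significance of the slope coefficient, we test

\[ H_0: \beta_1 = 0 \quad \text{vs} \quad H_1: \beta_1 \neq 0 \]

where \(\beta_1\) is the population slope. The appropriate test statistic is given by the formula:

\[ t = \frac{b}{SE(b)} \]

which follows a t distribution with n − 2 = 58 degrees of freedom under \(H_0\), where

\[ SE(b) = \frac{s}{\sqrt{\sum_{i=1}^{60} (x_i - \bar{x})^2}}, \qquad s = \sqrt{\frac{\sum_{i=1}^{60} (y_i - \hat{y}_i)^2}{n - 2}} \]

and \(\hat{y}_i\) is the predicted ACT value, obtained by substituting the given SAT value \(x_i\) in the fitted least-squares regression line, for i = 1, 2, ..., 60.

Substituting the values from the given data, we get

\[ SE(b) = 0.00151 \]

Substituting the values in the test statistic:

\[ t = \frac{0.023}{0.00151} \approx 15.23 \]

This figure uses the rounded slope and standard error. Using Excel, without rounding the intermediate values, the computed answer is 15.072, so the test statistic to report is t = 15.07.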
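To see where the unrounded value of the test statistic comes from, the standard error and t can be computed directly from the data. The sketch below is plain Python under the same assumptions as the formulas above (residual standard deviation with n − 2 degrees of freedom); the function name `slope_t_test` is hypothetical.

```python
import math

# SAT/ACT scores of the 60 students, in observation order: sat act sat act ...
RAW = """
1031 23 801 17 663 12 1096 27 693 17 906 22 708 17 1180 26 914 19 1099 25
775 20 1194 27 1009 21 899 22 833 18 1087 22 802 18 901 18 877 21 1049 20
868 17 792 17 1008 17 1167 25 554 10 1045 20 1206 28 875 22 798 19 1060 21
1124 26 1176 25 1068 23 732 12 741 14 969 22 593 12 613 19 619 14 1122 24
911 18 787 16 1033 26 781 14 941 26 989 24 756 15 1043 24 647 10 817 17
357 9 1157 27 1115 25 904 19 1094 27 837 19 573 12 749 18 1203 25 895 23
"""


def slope_t_test(x, y):
    """Return (slope, SE of slope, t statistic, degrees of freedom)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                      # two estimated parameters: a and b
    s = math.sqrt(sse / df)         # residual standard deviation
    se_b = s / math.sqrt(s_xx)
    return b, se_b, b / se_b, df


nums = [int(tok) for tok in RAW.split()]
sat, act = nums[0::2], nums[1::2]
slope, se_slope, t_stat, df = slope_t_test(sat, act)
print(f"SE(b) = {se_slope:.5f}, t = {t_stat:.2f}, df = {df}")
```

Because nothing is rounded before the division, the t value from this script should agree with the Excel figure rather than with the ratio of the rounded numbers.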
We choose a 5% level of significance. The p-value is the two-sided tail probability of the t distribution with n − 2 = 60 − 2 = 58 degrees of freedom (two parameters, the intercept and the slope, are estimated). A t table only brackets the p-value between two tabulated significance levels; since t = 15.07 is far beyond the largest tabulated critical value, the table shows only that the p-value is below the smallest tabulated level. To obtain an exact p-value, we may use an Excel function such as =T.DIST.2T(15.072, 58).
We get a p-value smaller than 0.0001, that is, P = 0.0000 to four decimal places.
Since the p-value is less than 0.05, we reject H0 at the 5% level of significance and conclude that the slope coefficient is significantly different from zero.
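For readers without Excel, the two-sided p-value can be approximated with the standard library alone. The sketch below is one possible approach, not the method used in the answer: it relies on the identity P(|T| ≥ |t|) = I_x(df/2, 1/2) with x = df/(df + t²), where I is the regularized incomplete beta function, and evaluates the integral with composite Simpson's rule. The function name and the step count are assumptions, and the routine is only intended for df ≥ 2.

```python
import math


def two_sided_t_pvalue(t, df, steps=2000):
    """Approximate P(|T| >= |t|) for Student's t with df >= 2.

    Uses P = I_x(df/2, 1/2), x = df / (df + t^2), integrating the beta
    density kernel on [0, x] with composite Simpson's rule (steps is even).
    """
    if t == 0:
        return 1.0
    a, b = df / 2.0, 0.5
    x = df / (df + t * t)
    # log of the complete beta function B(a, b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def kernel(w):
        return w ** (a - 1.0) * (1.0 - w) ** (b - 1.0)

    h = x / steps
    total = kernel(0.0) + kernel(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * kernel(i * h)
    return (h / 3.0) * total / math.exp(log_beta)


# Test statistic and degrees of freedom from the slope test in part (b).
p_value = two_sided_t_pvalue(15.072, 58)
print(f"P = {p_value:.4f}")
```

The result is many orders of magnitude below 0.0001, which is why it is reported as 0.0000 at four decimal places.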
(c) The correlation between two continuous variables is best measured using Pearson's correlation coefficient r (−1 ≤ r ≤ 1), given by the formula:

\[ r = \frac{\sum_{i=1}^{60} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{60} (x_i - \bar{x})^2 \, \sum_{i=1}^{60} (y_i - \bar{y})^2}} = 0.893 \]

From the coefficient obtained, we may conclude that there exists a strong positive linear relationship between the variables ACT and SAT.
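The correlation can be verified from the same data with a few lines of plain Python. This is a sketch only; `pearson_r` is an illustrative name. As a consistency check it also recomputes the slope test statistic from r through the identity t = r√(n − 2)/√(1 − r²), which holds for simple linear regression.

```python
import math

# SAT/ACT scores of the 60 students, in observation order: sat act sat act ...
RAW = """
1031 23 801 17 663 12 1096 27 693 17 906 22 708 17 1180 26 914 19 1099 25
775 20 1194 27 1009 21 899 22 833 18 1087 22 802 18 901 18 877 21 1049 20
868 17 792 17 1008 17 1167 25 554 10 1045 20 1206 28 875 22 798 19 1060 21
1124 26 1176 25 1068 23 732 12 741 14 969 22 593 12 613 19 619 14 1122 24
911 18 787 16 1033 26 781 14 941 26 989 24 756 15 1043 24 647 10 817 17
357 9 1157 27 1115 25 904 19 1094 27 837 19 573 12 749 18 1203 25 895 23
"""


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


nums = [int(tok) for tok in RAW.split()]
sat, act = nums[0::2], nums[1::2]
r = pearson_r(sat, act)
n = len(sat)
# Same t statistic as the slope test, expressed through r.
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
print(f"r = {r:.3f}, t from r = {t_from_r:.2f}")
```

The agreement between this t value and the one from the slope test reflects the fact that, with a single predictor, testing the slope and testing the correlation are the same test.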