A psychology professor is interested in the relationship between his students’ performance on two separate quizzes from his research methods course. He looks at the quiz scores for 16 different students, and notes each student's scores on the two quizzes (where each quiz was based on a 30-point scale). If you wanted to test the hypothesis to see if there was any linear relationship between these two quantitative variables, what would be the appropriate analysis? What would be the null and alternative hypotheses for this test? Would the alternative hypothesis be one-tailed (directional) or two-tailed (non-directional)? Why? If you wanted to fit a straight line through a scatterplot of these two variables and use that line for purposes of prediction, what analysis would provide you with that information?
Here we have two quantitative variables measured on n = 16 students: X, the score on the first quiz, and Y, the score on the second quiz.
To test whether there is any linear relationship between X and Y, the appropriate analysis is a test of significance of the correlation coefficient, carried out as follows:
1. Assuming the relationship between X and Y is linear, we compute Karl Pearson's product-moment coefficient of correlation (r) between X and Y.
2. We test the statistical significance of r by framing the following hypotheses about the population correlation coefficient ρ:
3. Null hypothesis H0: ρ = 0, that is, there is no linear relationship between X and Y. (vs)
4. Alternative hypothesis Ha: ρ ≠ 0, that is, there is a linear relationship between X and Y.
5. Since Ha: ρ ≠ 0 allows either ρ > 0 or ρ < 0, the test is two-tailed (non-directional). This is appropriate because the question asks about any linear relationship and does not specify a direction in advance.
6. To test H0 (vs) Ha, we compute the test statistic t = r * sqrt[(n-2)/(1-r^2)], which follows a t distribution with n-2 = 14 degrees of freedom under H0.
7. If the computed value of |t| is less than the two-tailed critical value of t with (n-2) degrees of freedom at the α level of significance, we fail to reject H0; otherwise we reject H0 in favor of Ha.
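The steps above can be sketched in a few lines of Python. This is only an illustration: the quiz scores below are hypothetical (the question gives no data), the function names pearson_r and correlation_t are made up for this sketch, and the critical value 2.145 is the standard two-tailed 5% table value of t for 14 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical scores for 16 students (illustration only, not real data).
quiz1 = [22, 25, 18, 27, 30, 15, 21, 24, 19, 28, 26, 17, 23, 20, 29, 16]
quiz2 = [24, 23, 20, 26, 28, 17, 19, 25, 21, 27, 24, 18, 22, 23, 30, 15]

T_CRIT_DF14 = 2.145  # two-tailed critical t, alpha = 0.05, df = 14 (table value)

r = pearson_r(quiz1, quiz2)
t = correlation_t(r, len(quiz1))
reject_h0 = abs(t) > T_CRIT_DF14  # two-tailed decision rule from step 7

print("r =", round(r, 3), "t =", round(t, 3), "reject H0:", reject_h0)
```

Note that correlation_t is undefined when r is exactly +1 or -1, since 1 - r^2 is then zero; real quiz data will almost never hit that case.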
To fit a straight line through a scatterplot of X and Y and use that line for prediction, the appropriate analysis is simple linear regression, fitted by the method of least squares and carried out as follows:
1. First we draw a scatterplot of the data, taking X on the horizontal axis and Y on the vertical axis with a suitable scale, and sketch an approximate straight line through the plotted points. As a rough visual guide, if most of the points (in our case, say, 9 or more of the 16) cluster closely around the line, we expect a fairly good linear relationship between X and Y.
2. If the scatterplot suggests a fairly good linear relationship between X and Y, we find the equation of the line, Y = a + bX, using the principle of least squares.
3. The intercept (a) and the slope (b) of the line are chosen so that the sum of squared differences between the observed values of Y and the estimated values of Y is as small as possible.
4. The fitted equation Y = a + bX can then be used to predict the value of Y (score on the second quiz) for a given value of X (score on the first quiz).
5. In addition, if we want to check the goodness of the straight-line fit between X and Y, we may conduct an F test and t tests.
6. For the F test, H0 states that the line does not explain the variation in Y (the population slope is 0). The F statistic is calculated, and if it exceeds the critical value of F with 1 and (n-2) degrees of freedom, H0 is rejected; otherwise we fail to reject H0.
7. We may also conduct two separate t tests, one for H0: the population intercept is 0 and one for H0: the population slope is 0, each with (n-2) degrees of freedom. If the calculated |t| exceeds its critical value, we reject that H0; otherwise we fail to reject it.
8. Rejecting H0 in the F test or in the t test for the slope indicates a significant linear relationship between X and Y, so the line is useful for prediction. The t test for the intercept only indicates whether the line passes through the origin; it says nothing about how well the line fits.
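As a minimal sketch of the least-squares steps above, the following Python computes a and b, predicts Y for a new X, and computes the F statistic and the slope's t statistic. The scores are hypothetical, and the names fit_line, predict and fit_tests are made up for this illustration; critical values would still have to be looked up in F and t tables.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x_new):
    """Predicted Y for a given X from the fitted line."""
    return a + b * x_new

def fit_tests(x, y, a, b):
    """F statistic (1 and n-2 df) and t statistic for the slope (n-2 df)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_resid          # variation explained by the line
    mse = ss_resid / (n - 2)              # residual mean square
    f_stat = ss_reg / mse
    t_slope = b / math.sqrt(mse / sxx)    # slope divided by its standard error
    return f_stat, t_slope

# Hypothetical scores for 16 students (illustration only, not real data).
quiz1 = [22, 25, 18, 27, 30, 15, 21, 24, 19, 28, 26, 17, 23, 20, 29, 16]
quiz2 = [24, 23, 20, 26, 28, 17, 19, 25, 21, 27, 24, 18, 22, 23, 30, 15]

a, b = fit_line(quiz1, quiz2)
f_stat, t_slope = fit_tests(quiz1, quiz2, a, b)
print("a =", round(a, 3), "b =", round(b, 3))
print("predicted quiz 2 score for quiz 1 = 20:", round(predict(a, b, 20), 2))
print("F =", round(f_stat, 3), "t (slope) =", round(t_slope, 3))
```

With a single predictor, the F statistic equals the square of the slope's t statistic, so the F test and the slope t test always lead to the same decision.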