Consider a study design in which we have collected multiple response measurements at each value of the predictor. Suppose we have $n_i$ observed responses at the predictor value $x_i$, for $i = 1, \dots, m$, and let $y_{ij}$ denote the $j$-th observed response, $j = 1, \dots, n_i$, at the $i$-th value of the predictor. This means we have $m$ unique predictor values and $n_i$ response measurements at the $i$-th of them. In this situation, it is possible to construct a test of how poorly the regression line captures the relationship between the predictor and the response.
(a) (4 points) Consider the traditional variance decomposition of a simple regression model: SST = SSReg + RSS. Show that we can further decompose the residual sum of squares into the pure error (i.e., deviations of the individual responses from the average response at each unique value of the predictor), denoted by SSPure, and the lack-of-fit error (i.e., deviations of the average response at each x value from the regression line), denoted by SSLack.
(b) (1 point) Determine the degrees of freedom for the pure error and the lack-of-fit error.
(c) (3 points) Determine the expected values of the mean squares of the pure error (MSPure) and the lack-of-fit error (MSLack). You may assume that the model assumptions are satisfied.
(d) (2 points) The test statistic for this test is F = MSLack / MSPure. Explain why this should follow an F distribution.
(e) (2 points) Based on the test statistic in (d) and the expected values in (c), explain why a large value of the test statistic implies that the true regression function is not linear, and thus the fit of our regression model is poor.
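The quantities named in parts (a) through (d) can be checked numerically. The following is a minimal sketch, not a solution to the exercise: the function name `lack_of_fit`, the dictionary keys, and the small data set are all hypothetical and chosen only for illustration. It assumes a simple linear regression with an intercept, so the lack-of-fit degrees of freedom are taken as m - 2 and the pure-error degrees of freedom as n - m, where n is the total number of observations.

```python
import numpy as np


def lack_of_fit(x, y):
    """Split the residual sum of squares of a straight-line fit into
    pure error and lack of fit (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Ordinary least-squares fit of y = b0 + b1 * x.
    b1, b0 = np.polyfit(x, y, 1)
    rss = np.sum((y - (b0 + b1 * x)) ** 2)

    # Identify the m unique predictor values and the replicate counts n_i.
    levels, inverse = np.unique(x, return_inverse=True)
    m = levels.size
    counts = np.bincount(inverse)
    group_means = np.bincount(inverse, weights=y) / counts

    # Pure error: responses around their own group mean.
    ss_pure = np.sum((y - group_means[inverse]) ** 2)
    # Lack of fit: group means around the fitted line, weighted by n_i.
    ss_lack = np.sum(counts * (group_means - (b0 + b1 * levels)) ** 2)

    # Degrees of freedom assumed for a two-parameter (intercept + slope) model.
    df_pure = int(n - m)
    df_lack = int(m - 2)

    f_stat = (ss_lack / df_lack) / (ss_pure / df_pure)
    return {
        "rss": rss,
        "ss_pure": ss_pure,
        "ss_lack": ss_lack,
        "df_pure": df_pure,
        "df_lack": df_lack,
        "f_stat": f_stat,
    }


# Made-up data: m = 4 predictor values with n_i = 2 replicates each.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [2.1, 1.9, 4.2, 3.8, 5.1, 5.5, 8.3, 7.9]

result = lack_of_fit(x, y)
# The two components should add up to the residual sum of squares.
print(np.isclose(result["rss"], result["ss_pure"] + result["ss_lack"]))
print(result["df_lack"], result["df_pure"], result["f_stat"])
```

The sketch requires at least one predictor value with replicates (so that n > m) and more than two unique predictor values (so that m > 2); otherwise one of the mean squares is undefined.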