Question

In: Statistics and Probability

Anscombe's Data
Observation x1 y1 x2 y2 x3 y3 x4 y4
1 10 8.04 10 9.14 10 7.46 8 6.58
2 8 6.95 8 8.14 8 6.77 8 5.76
3 13 7.58 13 8.74 13 12.74 8 7.71
4 9 8.81 9 8.77 9 7.11 8 8.84
5 11 8.33 11 9.26 11 7.81 8 8.47
6 14 9.96 14 8.1 14 8.84 8 7.04
7 6 7.24 6 6.13 6 6.08 8 5.25
8 4 4.26 4 3.1 4 5.39 19 12.5
9 12 10.84 12 9.13 12 8.15 8 5.56
10 7 4.82 7 7.26 7 6.42 8 7.91
11 5 5.68 5 4.74 5 5.73 8 6.89
  • Fit a simple linear regression model to each set of (x, y) data, i.e., one model fit to (x1, y1), one model fit to (x2, y2), one model fit to (x3, y3), and one model fit to (x4, y4).

  • Write down the estimated regression equation for each fitted model, together with the values of the coefficient of determination, r², and the standard error of the estimate, s = √MSE.

  • For each set of (x, y) data, create a scatterplot of y (vertical) versus x (horizontal) with the estimated regression line added to the plot.

  • For each set of (x, y) data, create a scatterplot of the residuals (vertical) versus x (horizontal). Based on each plot, do the zero-mean and constant-variance assumptions about the simple linear regression model error seem reasonable?

  • For each set of (x, y) data, create a normal probability plot of the standardized residuals. Based on each plot, does the normality assumption about the simple linear regression model error seem reasonable?

  • For each set of (x, y) data, are there any outliers?

  • For each set of (x, y) data, are there any high leverage points?

  • For each set of (x, y) data, are there any influential points?

    Post a summary of your group’s analysis. What important “big picture” conclusions can you draw from your analysis?

Solutions

Expert Solution

1) For the set (x1, y1)

Regression Equation

ŷ1 = 3.00 + 0.500 x1

Coefficient of determination: R² = 66.65%

Standard error of the estimate: s = 1.2366
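
The values above can be reproduced with a short script; the following is a minimal sketch (not part of the original solution), assuming Python with statsmodels and the data entered from the table above:

# Minimal sketch: fit the simple linear regression for (x1, y1) and report the
# estimated equation, R^2, and the standard error of the estimate s = sqrt(MSE).
import numpy as np
import statsmodels.api as sm

x1 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

fit1 = sm.OLS(y1, sm.add_constant(x1)).fit()
b0, b1 = fit1.params                   # intercept and slope
r2 = fit1.rsquared                     # coefficient of determination
s = np.sqrt(fit1.mse_resid)            # standard error of the estimate

print(f"y1-hat = {b0:.2f} + {b1:.3f} x1")   # about 3.00 + 0.500 x1
print(f"R^2 = {r2:.4f}, s = {s:.4f}")       # about 0.6665 and 1.2366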

Scatterplot of y1 (vertical) versus x1 (horizontal) with the estimated regression line added.

Scatterplot of the residuals (vertical) versus x1 (horizontal):

From the plot, the zero-mean and constant-variance assumptions about the simple linear regression model error seem reasonable.

Normal probability plot of the standardized residuals:

From the plot, the normality assumption about the simple linear regression model error seems reasonable.
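
As a sketch of how the three plots for (x1, y1) could be produced, continuing from the snippet above (x1, y1, b0, b1, and fit1 are assumed to be defined there) and using matplotlib and scipy:

# Sketch of the three diagnostic plots for (x1, y1): fitted line, residuals vs x,
# and a normal probability plot of the standardized residuals.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

std_resid = fit1.get_influence().resid_studentized_internal  # standardized residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# (a) y1 versus x1 with the estimated regression line
axes[0].scatter(x1, y1)
xs = np.linspace(x1.min(), x1.max(), 100)
axes[0].plot(xs, b0 + b1 * xs, color="red")
axes[0].set(xlabel="x1", ylabel="y1", title="y1 vs x1 with fitted line")

# (b) residuals versus x1, to judge the zero-mean and constant-variance assumptions
axes[1].scatter(x1, fit1.resid)
axes[1].axhline(0, linestyle="--")
axes[1].set(xlabel="x1", ylabel="residual", title="Residuals vs x1")

# (c) normal probability plot of the standardized residuals
stats.probplot(std_resid, dist="norm", plot=axes[2])
axes[2].set_title("Normal probability plot")

plt.tight_layout()
plt.show()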

2) For the set (x2, y2)

Regression Equation

ŷ2 = 3.00 + 0.500 x2

Coefficient of determination: R² = 66.62%

Standard error of the estimate: s = 1.2372

Scatterplot of y2 (vertical) versus x2 (horizontal) with the estimated regression line added.

Scatterplot of the residuals (vertical) versus x2 (horizontal):

The residuals follow a clear curved pattern, so the zero-mean and constant-variance assumptions about the simple linear regression model error do not seem reasonable.

Normal probability plot of the standardized residuals:

From the plot, the normality assumption about the simple linear regression model error seems reasonable.

3) For the set (x3, y3)

Regression Equation

ŷ3 = 3.00 + 0.500 x3

Coefficient of determination: R² = 66.6%

Standard error of the estimate: s = 1.2357

Scatterplot of y3 (vertical) versus x3 (horizontal) with the estimated regression line added.

Scatterplot of the residuals (vertical) versus x3 (horizontal):

A single large outlying residual (observation 3) stands apart from the rest, so the zero-mean and constant-variance assumptions about the simple linear regression model error do not seem reasonable.

Normal probability plot of the standardized residuals:

From the plot, the normality assumption about the simple linear regression model error seems reasonable.

4) For the set (x4, y4)

Regression Equation

ŷ4 = 3.00 + 0.500 x4

Coefficient of determination: R² = 66.67%

Standard error of the estimate: s = 1.2361

Scatterplot of y4 (vertical) versus x4 (horizontal) with the estimated regression line added.

Scatterplot of the residuals (vertical) versus x4 (horizontal):

All observations but one share the same x value (x4 = 8), with a single point at x4 = 19, so the zero-mean and constant-variance assumptions about the simple linear regression model error do not seem reasonable.

Normal probability plot of the standardized residuals:

From the plot, the normality assumption about the simple linear regression model error seems reasonable.
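
The questions about outliers, high leverage points, and influential points can be checked numerically. The following sketch applies common rule-of-thumb cutoffs (|standardized residual| > 2, hat value > 2p/n, Cook's distance > 4/n) to all four data sets, with the data entered from the table above; the cutoffs are illustrative conventions, not the only possible choices:

# Sketch: standardized residuals (outliers), hat values (leverage), and Cook's
# distance (influence) for all four Anscombe data sets, using common cutoffs.
import numpy as np
import statsmodels.api as sm

x_common = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
data = {
    1: (x_common, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x_common, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x_common, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: (np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float),
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

n, p = 11, 2  # 11 observations, 2 regression parameters
# Note: in set 4 the point at x4 = 19 has leverage exactly 1, so its standardized
# residual and Cook's distance are undefined (0/0) and show up as NaN.
for k, (x, y) in data.items():
    infl = sm.OLS(np.asarray(y), sm.add_constant(x)).fit().get_influence()
    outliers = np.where(np.abs(infl.resid_studentized_internal) > 2)[0] + 1   # 1-based obs. numbers
    leverage = np.where(infl.hat_matrix_diag > 2 * p / n)[0] + 1
    influential = np.where(infl.cooks_distance[0] > 4 / n)[0] + 1
    print(f"set {k}: outliers {outliers}, high leverage {leverage}, influential {influential}")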

