The data in the accompanying table represent the population of a certain country every 10 years for the years 1900-2000. An ecologist is interested in finding an equation that describes the population of the country over time. Complete parts (a) through (f) below.
Year, x | Population, y
1900 | 76,212
1910 | 92,228
1920 | 104,021
1930 | 123,202
1940 | 132,164
1950 | 151,325
1960 | 179,323
1970 | 203,302
1980 | 226,542
1990 | 248,709
2000 | 281,421
(A) Determine the least-squares regression equation, treating year as the explanatory variable. Choose the correct answer below.
a. y = -3,782,357x + 2,024
b. y = 1,236,375x - 3,782,357
c. y = 2,024x - 1,547,901
d. y = 2,024x - 3,782,357
(B) A normal probability plot of the residuals indicates that the residuals are approximately normally distributed. Test whether a linear relation exists between year and population. Use the α = 0.01 level of significance. State the null and alternative hypotheses. Determine the P-value of this hypothesis test. State whether to reject or not reject the null hypothesis, and why.
(C) Draw a scatter diagram, treating year as the explanatory variable.
(D) Plot the residuals against the explanatory variable, year. Does a linear model seem appropriate based on the scatter diagram and residual plot?
(E) Which of the following is the moral?
a) The moral is that inferential procedures may indicate that a linear relation between the two variables exists even though diagnostic tools (such as residual plots) indicate that a linear model is inappropriate.
b) The moral is that inferential procedures may indicate that a nonlinear relation between the two variables exists even though diagnostic tools (such as residual plots) indicate that a linear model is appropriate.
c) The moral is that explanatory variables may indicate that a linear relation between the two variables does not exist even though diagnostic tools (such as residual plots) indicate that a linear model is appropriate.
Using Excel's Data Analysis tool for regression, the following output is obtained:
Regression Statistics
Multiple R | 0.9907
R Square | 0.9816
Adjusted R Square | 0.9795
Standard Error | 9701.9621
Observations | 11

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 45082213871.65 | 45082213872 | 478.9455 | 0.0000
Residual | 9 | 847152623.08 | 94128069 | |
Total | 10 | 45929366494.73 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95%
Intercept | -3782357 | 180407.6473 | -20.9656 | 0.0000 | -4368652
X | 2024 | 92.5046 | 21.8848 | 0.0000 | 1723.822
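The summary figures in this output can be reproduced from the data with the usual sums of squares. The following is only a sketch: it assumes the 1930 population is 123,202, and the variable names are illustrative rather than anything Excel exposes.

```python
import math

# Data from the problem statement (1930 value assumed to be 123,202).
years = list(range(1900, 2001, 10))
population = [76212, 92228, 104021, 123202, 132164, 151325,
              179323, 203302, 226542, 248709, 281421]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(population) / n

# Least-squares slope from the centered sums.
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, population))
slope = s_xy / s_xx

# Fitted values written in centered form to limit rounding error.
fitted = [y_bar + slope * (x - x_bar) for x in years]

# ANOVA decomposition: SS total = SS regression + SS residual.
ss_total = sum((y - y_bar) ** 2 for y in population)
ss_resid = sum((y - f) ** 2 for y, f in zip(population, fitted))
ss_reg = ss_total - ss_resid

r_squared = ss_reg / ss_total              # "R Square"
std_error = math.sqrt(ss_resid / (n - 2))  # "Standard Error"
f_stat = (ss_reg / 1) / (ss_resid / (n - 2))  # "F" = MS regression / MS residual

print(f"R^2 = {r_squared:.4f}, s = {std_error:.4f}, F = {f_stat:.4f}")
```

With one explanatory variable, F is the square of the slope's t statistic, which is why both rows of the output lead to the same test.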
a)
The correct answer is option d.
ŷ = 2024x - 3,782,357
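The coefficients can be checked by hand with the standard formulas slope = Sxy/Sxx and intercept = ȳ - slope·x̄. This is a minimal sketch assuming the 1930 population is 123,202; the function name `least_squares` is hypothetical.

```python
# Data from the problem statement (1930 value assumed to be 123,202).
years = list(range(1900, 2001, 10))
population = [76212, 92228, 104021, 123202, 132164, 151325,
              179323, 203302, 226542, 248709, 281421]

def least_squares(x, y):
    """Return (slope, intercept) of the simple least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares(years, population)
print(f"slope = {slope:.1f}, intercept = {intercept:.0f}")
```

Rounded to whole numbers, the result should agree with the X and Intercept coefficients in the Excel output.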
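b)
H0: β1 = 0 (no linear relation between year and population)
H1: β1 ≠ 0 (a linear relation exists)
n = 11, so df = n - 2 = 9
α = 0.01
Sxx = Σ(x - x̄)² = 11000.00
estimated standard error of the slope: Se(β1) = s/√Sxx = 9701.9621/√11000 = 92.5046
t stat = β1/Se(β1) = 21.885
p-value = 0.0000 to four decimal places (that is, p-value < 0.0001)
decision: p-value < α = 0.01, so reject H0
Conclude that there is sufficient evidence of a linear relation between year (x) and population (y).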
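Excel reports the P-value only as 0.0000. One way to see its actual size is to integrate the tail of the t distribution numerically. The sketch below takes the t statistic and df = 9 from the output above; the function names are hypothetical, and Simpson's rule stands in for a statistics library purely to keep the example dependency-free.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value 2 * P(T > |t|) by Simpson's rule (steps must be even).

    The substitution u = |t| / v maps the infinite tail onto v in (0, 1].
    Assumes df > 1, so the transformed integrand tends to 0 as v -> 0.
    """
    t_abs = abs(t_stat)
    if t_abs == 0:
        return 1.0

    def g(v):
        if v == 0:
            return 0.0  # limit of the integrand for df > 1
        return t_pdf(t_abs / v, df) * t_abs / (v * v)

    h = 1.0 / steps
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return 2 * total * h / 3

alpha = 0.01
t_stat = 21.8848   # from the regression output
df = 11 - 2        # n - 2 for simple linear regression

p_value = two_sided_p(t_stat, df)
print(f"p-value = {p_value:.3e}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

As a sanity check, `two_sided_p(2.262, 9)` should come out close to 0.05, since 2.262 is the familiar two-sided 5% critical value for 9 degrees of freedom.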
c)
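d)
The residuals are positive for 1900-1910 and 1980-2000 and negative for 1920-1970. This is a U-shaped pattern rather than a random scatter about zero, so a linear model does not seem appropriate.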
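The residual pattern can be inspected without drawing the plot by listing the residuals and their signs. This is a sketch under the assumption that the 1930 population is 123,202; the variable names are illustrative.

```python
# Data from the problem statement (1930 value assumed to be 123,202).
years = list(range(1900, 2001, 10))
population = [76212, 92228, 104021, 123202, 132164, 151325,
              179323, 203302, 226542, 248709, 281421]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(population) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, population))
slope = s_xy / s_xx

# Residual = observed - fitted, with the fitted line in centered form.
residuals = [y - (y_bar + slope * (x - x_bar))
             for x, y in zip(years, population)]

for x, r in zip(years, residuals):
    print(f"{x}: {r:+10.0f}")

# A run of like signs (rather than a random mix) suggests curvature.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
```

A long run of negative residuals flanked by positive ones is the kind of systematic pattern that a residual plot is meant to reveal.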
e)
The correct answer is option a. The moral is that inferential procedures may indicate that a linear relation between the two variables exists even though diagnostic tools (such as residual plots) indicate that a linear model is inappropriate.