The data in the accompanying table represent the population of a certain country every 10 years for the years 1900-2000. An ecologist is interested in finding an equation that describes the population of the country over time. Complete parts (a) through (f) below.
year, x | population, y | year, x | population, y |
1900 | 73212 | 1960 | 179323 |
1910 | 92228 | 1970 | 203302 |
1920 | 104021 | 1980 | 226542 |
1930 | 123202 | 1990 | 248709 |
1940 | 132164 | 2000 | 281421 |
1950 | 151325 |
Determine the P-value of this hypothesis test.
P-value = __
c) Draw a scatter diagram, treating year as the explanatory variable.
d) Plot the residuals against the explanatory variable, year.
e) Does a linear model seem appropriate based on the scatter diagram and residual plot?
f) What is the moral?
Here are all the calculations of the regression equation:
Year, x | y | A = x - x-bar | B = y - y-bar | C = A*B | A^2 | D = y - y-pred | D^2 |
1900 | 73212 | -50 | -91828.8 | 4591441 | 2500 | 10075.3 | 101512036 |
1910 | 92228 | -40 | -72812.8 | 2912513 | 1600 | 8710.49 | 75872651.9 |
1920 | 104021 | -30 | -61019.8 | 1830595 | 900 | 122.664 | 15046.3677 |
1930 | 123202 | -20 | -41838.8 | 836776.4 | 400 | -1077.164 | 1160281.5 |
1940 | 132164 | -10 | -32876.8 | 328768.2 | 100 | -12495.99 | 156149789 |
1950 | 151325 | 0 | -13715.8 | 0 | 0 | -13715.82 | 188123668 |
1960 | 179323 | 10 | 14282.18 | 142821.8 | 100 | -6098.645 | 37193476.4 |
1970 | 203302 | 20 | 38261.18 | 765223.6 | 400 | -2500.473 | 6252363.86 |
1980 | 226542 | 30 | 61501.18 | 1845035 | 900 | 358.7 | 128665.69 |
1990 | 248709 | 40 | 83668.18 | 3346727 | 1600 | 2144.87 | 4600479.02 |
2000 | 281421 | 50 | 116380.2 | 5819009 | 2500 | 14476 | 209555892 |
Average | x-bar = 1950 | y-bar = 165040.8 |
Sum | SUM(C) = 22418910 | SUM(A^2) = 11000 | SUM(D^2) = 780564350 |
b1 | 2038.083 |
b0 | -3809221 |
Standard error of the estimate, s_e | 9312.861 |
Standard error of b1 | 88.79464 |
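The table arithmetic can be cross-checked with a few lines of code. The following is a minimal sketch in plain Python, assuming the usual least-squares formulas (slope = Sxy / Sxx, intercept = y-bar - slope * x-bar, residual standard error with n - 2 degrees of freedom); the variable names are illustrative and not part of the original answer.

```python
import math

# Data from the table: year (x) and population (y)
years = [1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]
pop = [73212, 92228, 104021, 123202, 132164, 151325,
       179323, 203302, 226542, 248709, 281421]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(pop) / n

# Sums of cross-products and squares (columns C and A^2 in the table)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, pop))
s_xx = sum((x - x_bar) ** 2 for x in years)

# Least-squares slope and intercept
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals (observed minus predicted) and their sum of squares
residuals = [y - (b0 + b1 * x) for x, y in zip(years, pop)]
sse = sum(e ** 2 for e in residuals)

# Residual standard error and standard error of the slope
s_e = math.sqrt(sse / (n - 2))
se_b1 = s_e / math.sqrt(s_xx)

print(f"b1 = {b1:.3f}, b0 = {b0:.1f}")
print(f"s_e = {s_e:.3f}, SE(b1) = {se_b1:.5f}")
```

Working from the raw data rather than the rounded table entries means the last digit of each result may differ slightly from the table.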
Null hypothesis: the population slope for the variable 'year' is zero, i.e. H0: β1 = 0, tested against the two-sided alternative H1: β1 ≠ 0.
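The formulas used are (n = 11 observations; A, C and D are the table columns):
b1 = SUM(C) / SUM(A^2) = 22418910 / 11000 = 2038.083
b0 = y-bar - b1 * x-bar = 165040.8 - 2038.083 * 1950 = -3809221
s_e = sqrt(SUM(D^2) / (n - 2)) = sqrt(780564350 / 9) = 9312.861
Standard error of b1 = s_e / sqrt(SUM(A^2)) = 9312.861 / sqrt(11000) = 88.79464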
We therefore run a t-test on the slope coefficient:
t = Coefficient / Standard Error
t = 2038.083 / 88.79464
t = 22.953
With n - 2 = 9 degrees of freedom, the two-sided P-value is 2.69E-09 (p < 0.001), so the slope is significant at the 0.05 level.
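The P-value can be reproduced without statistical tables. The sketch below uses only the Python standard library and relies on the identity P(|T| > t) = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the regularized incomplete beta function; the integral is approximated with Simpson's rule. The helper name `t_two_sided_p` is hypothetical, and the routine is only meant for df > 2 and fairly large |t|, as here.

```python
import math

def t_two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided P-value for Student's t (assumes df > 2, |t_stat| well above 0).

    Evaluates the regularized incomplete beta function I_x(df/2, 1/2),
    x = df / (df + t_stat^2), by Simpson's rule with an even number of steps.
    """
    a, b = df / 2.0, 0.5
    x = df / (df + t_stat ** 2)

    def integrand(u: float) -> float:
        return u ** (a - 1) * (1 - u) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3

    # Complete beta function B(a, b) via gamma functions
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta

b1 = 2038.083       # slope from the regression table
se_b1 = 88.79464    # standard error of the slope
df = 11 - 2         # n - 2 degrees of freedom

t_stat = b1 / se_b1
p_value = t_two_sided_p(t_stat, df)

print(f"t = {t_stat:.3f}")
print(f"two-sided P-value = {p_value:.2e}")
```

If SciPy is available, `2 * scipy.stats.t.sf(t_stat, df)` should give the same two-sided value and is the more robust choice for general use.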
c) Scatter diagram:
d) Residuals plot:
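As a rough numerical stand-in for reading a residual plot, the sketch below recomputes the residuals (taken here as observed minus predicted, which is an assumption about the sign convention) and lists their signs in year order. The variable names are illustrative only.

```python
# Data from the table: year (x) and population (y)
years = [1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]
pop = [73212, 92228, 104021, 123202, 132164, 151325,
       179323, 203302, 226542, 248709, 281421]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(pop) / n

# Least-squares line
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, pop))
      / sum((x - x_bar) ** 2 for x in years))
b0 = y_bar - b1 * x_bar

# Residual = observed - predicted, in year order
residuals = [y - (b0 + b1 * x) for x, y in zip(years, pop)]
signs = ["+" if e > 0 else "-" for e in residuals]

for x, e, s in zip(years, residuals, signs):
    print(f"{x}  {e:>10.1f}  {s}")

# Number of times the residual sign flips between consecutive years
sign_changes = sum(s1 != s2 for s1, s2 in zip(signs, signs[1:]))
print("sign pattern:", "".join(signs))
print("sign changes:", sign_changes)
```

Randomly scattered residuals tend to change sign often, whereas long runs of same-sign residuals point to curvature that a straight line does not capture.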
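e) No. The scatter diagram looks close to linear, but the residuals are not randomly dispersed: they have one sign for 1900-1920, the opposite sign for 1930-1970, and the first sign again for 1980-2000. This systematic curved pattern indicates that a straight line misses part of the trend, so a linear model does not seem appropriate.
f) The moral is that a very small P-value for the slope does not by itself justify a linear model. The test shows a significant association between year and population, but the model requirements should be checked with a scatter diagram and residual plot before the fitted line is used to predict the population over time.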