2. Let’s use the data from the sea ice extent by year.
a. Do a t-test to determine if the slope = 0. Give the null and alternative hypotheses, test statistic, p-value, decision, and interpretation.
b. Construct a plot of residuals vs. fitted values.
c. Look at a histogram of the residuals.
d. Are there any obvious outliers? Find the most glaring observation and determine how many standard deviations it is from the mean. Can its removal be justified?
e. Are the assumptions for regression met (linearity, constant standard deviation, and normality of errors)? If not, which one is violated?
data:
Year Extent
1980 9.18
1981 8.86
1982 9.42
1983 9.33
1984 8.56
1985 8.55
1986 9.48
1987 9.05
1988 9.13
1989 8.83
1990 8.48
1991 8.54
1992 9.32
1993 8.79
1994 8.92
1995 7.83
1996 9.16
1997 8.34
1998 8.45
1999 8.6
2000 8.38
2001 8.3
2002 8.16
2003 7.85
2004 7.93
2005 7.35
2006 7.54
2007 6.04
2008 7.35
2009 6.92
2010 6.98
2011 6.46
2012 5.89
2013 7.45
2014 7.23
2015 6.97
2016 6.08
2017 6.77
2018 6.13
2019 5.66
a.
R output:
Call:
lm(formula = Extent ~ Year, data = Data)
Residuals:
    Min      1Q  Median      3Q     Max
-1.2816 -0.3173  0.0742  0.3701  0.9072
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 177.212487  13.806202   12.84 2.16e-15 ***
Year         -0.084649   0.006905  -12.26 8.90e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5041 on 38 degrees of freedom
Multiple R-squared: 0.7982, Adjusted R-squared: 0.7929
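F-statistic: 150.3 on 1 and 38 DF, p-value: 8.897e-15
Hypotheses: H0: slope = 0 (no linear relationship between Extent and Year) vs. Ha: slope ≠ 0.
Test statistic: t = -12.26 with 38 degrees of freedom.
p-value: 8.90e-15.
Decision: the p-value is far below any conventional significance level (e.g. 0.05), so we reject H0.
Interpretation: there is very strong evidence of a linear trend in sea ice extent over time. The estimated slope is -0.0846, so extent decreases by about 0.085 units per year on average.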
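The slope, its standard error, and the t statistic in the R output can be cross-checked by hand from the least-squares formulas. The following is a minimal sketch in Python using only the standard library (the original analysis was run in R); the variable names `extent`, `year`, `sxx`, and `sxy` are chosen here for illustration. It does not compute the p-value, which would need a t-distribution function.

```python
import math

# Sea ice extent for 1980..2019, copied from the data table above.
extent = [
    9.18, 8.86, 9.42, 9.33, 8.56, 8.55, 9.48, 9.05, 9.13, 8.83,
    8.48, 8.54, 9.32, 8.79, 8.92, 7.83, 9.16, 8.34, 8.45, 8.60,
    8.38, 8.30, 8.16, 7.85, 7.93, 7.35, 7.54, 6.04, 7.35, 6.92,
    6.98, 6.46, 5.89, 7.45, 7.23, 6.97, 6.08, 6.77, 6.13, 5.66,
]
year = list(range(1980, 2020))
n = len(year)

x_bar = sum(year) / n
y_bar = sum(extent) / n
sxx = sum((x - x_bar) ** 2 for x in year)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, extent))

# Least-squares estimates.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard error on n - 2 degrees of freedom.
residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(year, extent)]
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Standard error of the slope and the t statistic for H0: slope = 0.
se_slope = s / math.sqrt(sxx)
t_stat = slope / se_slope

print(f"slope = {slope:.6f}, SE = {se_slope:.6f}, t = {t_stat:.2f}, df = {n - 2}")
```

The printed values should agree with the `Year` row of the coefficient table up to rounding.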
b.
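The fitted values and residuals behind this plot can be recomputed from the data. The sketch below is Python rather than the R used for the output above, and the helper `plot_residuals_vs_fitted` is a hypothetical name that assumes matplotlib is installed; the numeric part needs only the standard library.

```python
# Sea ice extent for 1980..2019, copied from the data table above.
extent = [
    9.18, 8.86, 9.42, 9.33, 8.56, 8.55, 9.48, 9.05, 9.13, 8.83,
    8.48, 8.54, 9.32, 8.79, 8.92, 7.83, 9.16, 8.34, 8.45, 8.60,
    8.38, 8.30, 8.16, 7.85, 7.93, 7.35, 7.54, 6.04, 7.35, 6.92,
    6.98, 6.46, 5.89, 7.45, 7.23, 6.97, 6.08, 6.77, 6.13, 5.66,
]
year = list(range(1980, 2020))
n = len(year)

x_bar = sum(year) / n
y_bar = sum(extent) / n
sxx = sum((x - x_bar) ** 2 for x in year)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, extent)) / sxx

# Fitted value and residual for every year.
fitted = [y_bar + slope * (x - x_bar) for x in year]
residuals = [y - f for y, f in zip(extent, fitted)]

for yr, f, e in zip(year, fitted, residuals):
    print(f"{yr}  fitted = {f:.3f}  residual = {e:+.3f}")


def plot_residuals_vs_fitted():
    """Scatter residuals against fitted values (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.scatter(fitted, residuals)
    plt.axhline(0, linestyle="--")  # reference line at zero residual
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.title("Residuals vs. fitted values")
    plt.show()
```

Calling `plot_residuals_vs_fitted()` draws the scatter plot; the printed table alone is enough to see where the residuals are positive or negative along the fitted line.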
c.
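A rough text histogram of the residuals can be produced without any plotting library. This is a sketch in Python, not the R used for the output above; the bin width of 0.5 and the range from -1.5 to 1.0 are assumptions chosen to cover the minimum and maximum residuals reported in the R output.

```python
import math

# Sea ice extent for 1980..2019, copied from the data table above.
extent = [
    9.18, 8.86, 9.42, 9.33, 8.56, 8.55, 9.48, 9.05, 9.13, 8.83,
    8.48, 8.54, 9.32, 8.79, 8.92, 7.83, 9.16, 8.34, 8.45, 8.60,
    8.38, 8.30, 8.16, 7.85, 7.93, 7.35, 7.54, 6.04, 7.35, 6.92,
    6.98, 6.46, 5.89, 7.45, 7.23, 6.97, 6.08, 6.77, 6.13, 5.66,
]
year = list(range(1980, 2020))
n = len(year)

x_bar = sum(year) / n
y_bar = sum(extent) / n
sxx = sum((x - x_bar) ** 2 for x in year)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, extent)) / sxx
residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(year, extent)]

# Assumed binning: five bins of width 0.5 covering [-1.5, 1.0).
low, width, n_bins = -1.5, 0.5, 5
counts = [0] * n_bins
for e in residuals:
    counts[math.floor((e - low) / width)] += 1

for k, c in enumerate(counts):
    left = low + k * width
    print(f"[{left:+.1f}, {left + width:+.1f})  {'#' * c}  ({c})")
```

A longer bar on one side of zero than the other would point to skewness in the residuals.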
d.
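Yes. The most glaring outlier is the 28th observation (Year = 2007, Extent = 6.04). Its residual is -1.2816 (the minimum residual in the R output), so it lies -1.2816/0.5041 ≈ -2.54, i.e. about 2.5 residual standard errors below the fitted line. Since this is less than 3 standard deviations and there is no indication that the value was recorded incorrectly, removing it is hard to justify.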
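To see how far the most extreme observation sits from the fitted line, its residual can be divided by the residual standard error. The sketch below does this in Python (standard library only; the original work was done in R). Dividing by the residual standard error is a simplifying assumption; a studentized residual would also account for leverage.

```python
import math

# Sea ice extent for 1980..2019, copied from the data table above.
extent = [
    9.18, 8.86, 9.42, 9.33, 8.56, 8.55, 9.48, 9.05, 9.13, 8.83,
    8.48, 8.54, 9.32, 8.79, 8.92, 7.83, 9.16, 8.34, 8.45, 8.60,
    8.38, 8.30, 8.16, 7.85, 7.93, 7.35, 7.54, 6.04, 7.35, 6.92,
    6.98, 6.46, 5.89, 7.45, 7.23, 6.97, 6.08, 6.77, 6.13, 5.66,
]
year = list(range(1980, 2020))
n = len(year)

x_bar = sum(year) / n
y_bar = sum(extent) / n
sxx = sum((x - x_bar) ** 2 for x in year)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, extent)) / sxx
residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(year, extent)]

# Residual standard error on n - 2 degrees of freedom.
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Index of the observation with the largest absolute residual.
worst = max(range(n), key=lambda i: abs(residuals[i]))
z = residuals[worst] / s

print(f"observation {worst + 1}: year {year[worst]}, "
      f"residual {residuals[worst]:.4f}, {z:.2f} residual standard errors")
```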
e.
From the residuals vs. fitted plot, the linearity assumption does not appear to hold, since the residuals are not randomly scattered about the horizontal line at zero. The same plot suggests that the constant-variance assumption holds, and the Q-Q plot suggests that the normality assumption holds. The Cook's distance plot shows that two outliers are present.
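Cook's distance for a simple linear regression can be computed directly from the residuals and leverages, D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2 with p = 2 coefficients. The following Python sketch (standard library only, not the original R code) applies that formula. The cutoff of 4/n is a common rule of thumb and an assumption here; the source does not say which cutoff its Cook's distance plot used.

```python
# Sea ice extent for 1980..2019, copied from the data table above.
extent = [
    9.18, 8.86, 9.42, 9.33, 8.56, 8.55, 9.48, 9.05, 9.13, 8.83,
    8.48, 8.54, 9.32, 8.79, 8.92, 7.83, 9.16, 8.34, 8.45, 8.60,
    8.38, 8.30, 8.16, 7.85, 7.93, 7.35, 7.54, 6.04, 7.35, 6.92,
    6.98, 6.46, 5.89, 7.45, 7.23, 6.97, 6.08, 6.77, 6.13, 5.66,
]
year = list(range(1980, 2020))
n = len(year)
p = 2  # number of estimated coefficients: intercept and slope

x_bar = sum(year) / n
y_bar = sum(extent) / n
sxx = sum((x - x_bar) ** 2 for x in year)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, extent)) / sxx
residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(year, extent)]

# Residual variance estimate on n - p degrees of freedom.
s2 = sum(e * e for e in residuals) / (n - p)

# Leverage of each observation in simple linear regression.
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in year]

# Cook's distance for each observation.
cooks = [
    (e * e) / (p * s2) * h / (1 - h) ** 2
    for e, h in zip(residuals, leverage)
]

threshold = 4 / n  # assumed rule-of-thumb cutoff
flagged = [yr for yr, d in zip(year, cooks) if d > threshold]

for yr, d in zip(year, cooks):
    marker = "  <-- above 4/n" if d > threshold else ""
    print(f"{yr}  D = {d:.4f}{marker}")
```

With this cutoff the sketch flags two years, 2007 and 2012, which matches the count of two noted above. A different cutoff could flag a different set.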