Estimate a linear time trend for the Netflix data using data for the period 2000Q1 to 2008Q4.
a. Carry out a preliminary analysis of the data, including a line fit plot, and interpret your results.
b. Test whether the slope is significantly different from zero, using alpha = 0.05.
c. Compute S and R2 and interpret the results.
d. Compute point forecasts for sales in each quarter of 2009.
e. Comment on your results.
Note: you can use Regression under the Data Analysis tool in Microsoft Excel.
Year | Quarter | QtrNum | Quarterly Sales |
2000 | 1 | 1 | 5.17 |
2000 | 2 | 2 | 7.15 |
2000 | 3 | 3 | 10.18 |
2000 | 4 | 4 | 13.39 |
2001 | 1 | 5 | 17.06 |
2001 | 2 | 6 | 18.36 |
2001 | 3 | 7 | 18.88 |
2001 | 4 | 8 | 21.62 |
2002 | 1 | 9 | 30.53 |
2002 | 2 | 10 | 36.36 |
2002 | 3 | 11 | 40.73 |
2002 | 4 | 12 | 45.19 |
2003 | 1 | 13 | 55.67 |
2003 | 2 | 14 | 63.19 |
2003 | 3 | 15 | 72.20 |
2003 | 4 | 16 | 81.19 |
2004 | 1 | 17 | 99.82 |
2004 | 2 | 18 | 119.71 |
2004 | 3 | 19 | 140.41 |
2004 | 4 | 20 | 140.66 |
2005 | 1 | 21 | 152.45 |
2005 | 2 | 22 | 164.03 |
2005 | 3 | 23 | 172.74 |
2005 | 4 | 24 | 193.00 |
2006 | 1 | 25 | 224.13 |
2006 | 2 | 26 | 239.35 |
2006 | 3 | 27 | 255.95 |
2006 | 4 | 28 | 277.23 |
2007 | 1 | 29 | 305.32 |
2007 | 2 | 30 | 303.69 |
2007 | 3 | 31 | 293.97 |
2007 | 4 | 32 | 302.36 |
2008 | 1 | 33 | 326.18 |
2008 | 2 | 34 | 337.61 |
2008 | 3 | 35 | 341.27 |
2008 | 4 | 36 | 359.94 |
2009 | 1 | 37 | 394.1 |
2009 | 2 | 38 | 408.59 |
2009 | 3 | 39 | 423.12 |
2009 | 4 | 40 | 444.19 |
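The R session that follows reads its data from an object called `time_series`, which the transcript never defines. A minimal sketch of one way to build it from the table above, assuming the sales column is named `Quarterly.Sales` (the name the session uses) and the remaining columns are named after the table headers; `estimation` is a hypothetical helper name for the 2000Q1 to 2008Q4 sample:

```r
# Quarterly sales copied from the table above (2000Q1 to 2009Q4).
sales <- c(
    5.17,   7.15,  10.18,  13.39,   # 2000
   17.06,  18.36,  18.88,  21.62,   # 2001
   30.53,  36.36,  40.73,  45.19,   # 2002
   55.67,  63.19,  72.20,  81.19,   # 2003
   99.82, 119.71, 140.41, 140.66,   # 2004
  152.45, 164.03, 172.74, 193.00,   # 2005
  224.13, 239.35, 255.95, 277.23,   # 2006
  305.32, 303.69, 293.97, 302.36,   # 2007
  326.18, 337.61, 341.27, 359.94,   # 2008
  394.10, 408.59, 423.12, 444.19    # 2009 (held out for checking forecasts)
)

# Column names are an assumption, chosen to match d$Quarterly.Sales in the session.
time_series <- data.frame(
  Year            = rep(2000:2009, each = 4),
  Quarter         = rep(1:4, times = 10),
  QtrNum          = seq_along(sales),
  Quarterly.Sales = sales
)

# Keep only the estimation window, 2000Q1 to 2008Q4 (the first 36 quarters).
estimation <- subset(time_series, Year <= 2008)
```

The 2009 rows stay in `time_series` but are excluded from `estimation`, so they can later be compared with the forecasts.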
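> library(forecast)   # provides tslm() and forecast()
> # time_series is assumed to hold the table above, with the sales in Quarterly.Sales
> d = data.frame(time_series)
> # quarterly series restricted to the estimation window, 2000Q1 to 2008Q4
> ts_data = ts(d$Quarterly.Sales, frequency = 4,
+              start = c(2000, 1), end = c(2008, 4))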
> ts_data
       Qtr1   Qtr2   Qtr3   Qtr4
2000   5.17   7.15  10.18  13.39
2001  17.06  18.36  18.88  21.62
2002  30.53  36.36  40.73  45.19
2003  55.67  63.19  72.20  81.19
2004  99.82 119.71 140.41 140.66
2005 152.45 164.03 172.74 193.00
2006 224.13 239.35 255.95 277.23
2007 305.32 303.69 293.97 302.36
2008 326.18 337.61 341.27 359.94
> plot.ts(ts_data)
> fit = tslm(ts_data~trend)
> summary(fit)
Call:
tslm(formula = ts_data ~ trend)
Residuals:
    Min      1Q  Median      3Q     Max
-37.628 -22.162   2.554  16.872  54.558
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -60.6013     8.7324   -6.94 5.33e-08 ***
trend        11.2137     0.4116   27.25  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 25.65 on 34 degrees of freedom
Multiple R-squared: 0.9562, Adjusted R-squared: 0.9549
F-statistic: 742.3 on 1 and 34 DF, p-value: < 2.2e-16
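# From the output above, R^2 is 0.9562 (adjusted R^2 is 0.9549), i.e. the linear trend explains 95.62% of the variation in quarterly sales. The residual standard error is S = 25.65 on 34 degrees of freedom.
# The slope is also significantly different from zero at the 5% level of significance: t = 27.25 with a p-value below 2e-16, far smaller than alpha = 0.05.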
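The slope test and the S and R2 values can also be worked out step by step. The sketch below uses base R `lm()` on a time index 1 to 36 rather than `tslm()`, which should give the same fit because the `trend` term in `tslm()` is that same index. The names `qtr`, `fit_lm`, `t_crit` and `reject_h0` are illustrative, not from the session above.

```r
# Estimation sample: quarterly sales for 2000Q1 to 2008Q4 from the table.
sales <- c(
    5.17,   7.15,  10.18,  13.39,  17.06,  18.36,  18.88,  21.62,
   30.53,  36.36,  40.73,  45.19,  55.67,  63.19,  72.20,  81.19,
   99.82, 119.71, 140.41, 140.66, 152.45, 164.03, 172.74, 193.00,
  224.13, 239.35, 255.95, 277.23, 305.32, 303.69, 293.97, 302.36,
  326.18, 337.61, 341.27, 359.94
)
qtr <- seq_along(sales)            # time index 1..36, same as QtrNum

fit_lm <- lm(sales ~ qtr)          # ordinary least squares linear trend
coefs  <- summary(fit_lm)$coefficients

# (b) Two-sided t test of H0: slope = 0 at alpha = 0.05.
alpha     <- 0.05
t_stat    <- coefs["qtr", "t value"]
t_crit    <- qt(1 - alpha / 2, df = df.residual(fit_lm))
reject_h0 <- abs(t_stat) > t_crit

# (c) S is the residual standard error; R2 is the share of variation explained.
sse <- sum(residuals(fit_lm)^2)           # unexplained (residual) sum of squares
sst <- sum((sales - mean(sales))^2)       # total sum of squares
S   <- sqrt(sse / df.residual(fit_lm))
R2  <- 1 - sse / sst

print(c(slope = coef(fit_lm)[["qtr"]], t = t_stat, t_crit = t_crit, S = S, R2 = R2))
```

Comparing `abs(t_stat)` with `t_crit` gives the same decision as comparing the p-value with alpha; the critical-value form is the one usually shown in a hand calculation.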
> fcast <- forecast(fit, h = 4, level = c(80, 95))
> plot(fcast)
> lines(fitted(fit))
# Below are the point forecasts for all four quarters of 2009, with 80% and 95% prediction intervals.
> fcast
        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2009 Q1       354.3063 318.8894 389.7232 299.2348 409.3778
2009 Q2       365.5200 329.9482 401.0918 310.2076 420.8324
2009 Q3       376.7337 340.9996 412.4678 321.1690 432.2985
2009 Q4       387.9474 352.0437 423.8512 332.1189 443.7760
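For part (e), the forecasts can be set against the 2009 sales listed in the data table. The sketch below is one way to do that, assuming base R `lm()` and `predict()` in place of `tslm()` and `forecast()`; the names `actual_2009`, `new_qtr` and `comparison` are illustrative.

```r
# Estimation sample (2000Q1 to 2008Q4) and held-out 2009 actuals from the table.
sales <- c(
    5.17,   7.15,  10.18,  13.39,  17.06,  18.36,  18.88,  21.62,
   30.53,  36.36,  40.73,  45.19,  55.67,  63.19,  72.20,  81.19,
   99.82, 119.71, 140.41, 140.66, 152.45, 164.03, 172.74, 193.00,
  224.13, 239.35, 255.95, 277.23, 305.32, 303.69, 293.97, 302.36,
  326.18, 337.61, 341.27, 359.94
)
actual_2009 <- c(394.10, 408.59, 423.12, 444.19)

qtr    <- seq_along(sales)         # time index 1..36
fit_lm <- lm(sales ~ qtr)          # linear trend fitted to the estimation sample

# Point forecasts and 95% prediction intervals for quarters 37 to 40 (2009 Q1 to Q4).
new_qtr <- data.frame(qtr = 37:40)
pred    <- predict(fit_lm, newdata = new_qtr, interval = "prediction", level = 0.95)

comparison <- data.frame(
  quarter   = paste0("2009 Q", 1:4),
  forecast  = pred[, "fit"],
  actual    = actual_2009,
  error     = actual_2009 - pred[, "fit"],   # positive means the forecast was too low
  inside_95 = actual_2009 >= pred[, "lwr"] & actual_2009 <= pred[, "upr"]
)
print(comparison)
```

In the tables above, every 2009 actual is higher than its point forecast, and the 2009 Q4 value (444.19) lies just above the 95% upper bound (443.7760). This suggests the straight-line trend understates sales at the end of the sample, which is worth noting when commenting on the results.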