Question

Given some data, estimate a simple Exponential Smoother.

Solutions

Expert Solution

The simplest of the exponential smoothing methods is naturally called simple exponential smoothing (SES). This method is suitable for forecasting data with no clear trend or seasonal pattern. For example, the data in Figure 7.1 do not display any clear trending behaviour or any seasonality. (There is a rise in the last few years, which might suggest a trend. We will consider whether a trended method would be better for this series later in this chapter.) We have already considered the naïve and the average as possible methods for forecasting such data (Section 3.1).

library(fpp2)  # provides the oil series and loads forecast and ggplot2
oildata <- window(oil, start=1996)
autoplot(oildata) +
  ylab("Oil (millions of tonnes)") + xlab("Year")

Figure 7.1: Oil production in Saudi Arabia from 1996 to 2013.

Using the naïve method, all forecasts for the future are equal to the last observed value of the series,
$$\hat{y}_{T+h|T} = y_T,$$
for $h = 1, 2, \dots$. Hence, the naïve method assumes that the most recent observation is the only important one, and all previous observations provide no information for the future. This can be thought of as a weighted average where all of the weight is given to the last observation.

Using the average method, all future forecasts are equal to a simple average of the observed data,
$$\hat{y}_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t,$$
for $h = 1, 2, \dots$. Hence, the average method assumes that all observations are of equal importance, and gives them equal weights when generating forecasts.
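Both extremes can be written as weighted averages of the observations. The following base R sketch illustrates this using the oil observations listed in Table 7.1 of this section; the variable names (`w_naive`, `w_mean`, and so on) are illustrative only.

```r
# Oil production observations for 1996-2013, copied from Table 7.1
y <- c(445.36, 453.20, 454.41, 422.38, 456.04, 440.39, 425.19, 486.21, 500.43,
       521.28, 508.95, 488.89, 509.87, 456.72, 473.82, 525.95, 549.83, 542.34)
n <- length(y)
h <- 5  # forecast horizon

# Naive method: all of the weight on the last observation
w_naive <- c(rep(0, n - 1), 1)
fc_naive <- rep(sum(w_naive * y), h)

# Average method: equal weight 1/T on every observation
w_mean <- rep(1 / n, n)
fc_mean <- rep(sum(w_mean * y), h)

print(fc_naive)
print(fc_mean)
```

In both cases the weights sum to one; the methods differ only in how that weight is spread over the history.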

We often want something between these two extremes. For example, it may be sensible to attach larger weights to more recent observations than to observations from the distant past. This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages, where the weights decrease exponentially as observations come from further in the past, so the smallest weights are associated with the oldest observations:
$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots, \tag{7.1}$$
where $0 \le \alpha \le 1$ is the smoothing parameter. The one-step-ahead forecast for time $T+1$ is a weighted average of all of the observations in the series $y_1, \dots, y_T$. The rate at which the weights decrease is controlled by the parameter $\alpha$.

The table below shows the weights attached to observations for four different values of $\alpha$ when forecasting using simple exponential smoothing. Note that the sum of the weights even for a small value of $\alpha$ will be approximately one for any reasonable sample size.

| Observation | $\alpha = 0.2$ | $\alpha = 0.4$ | $\alpha = 0.6$ | $\alpha = 0.8$ |
|-------------|--------|--------|--------|--------|
| $y_T$       | 0.2000 | 0.4000 | 0.6000 | 0.8000 |
| $y_{T-1}$   | 0.1600 | 0.2400 | 0.2400 | 0.1600 |
| $y_{T-2}$   | 0.1280 | 0.1440 | 0.0960 | 0.0320 |
| $y_{T-3}$   | 0.1024 | 0.0864 | 0.0384 | 0.0064 |
| $y_{T-4}$   | 0.0819 | 0.0518 | 0.0154 | 0.0013 |
| $y_{T-5}$   | 0.0655 | 0.0311 | 0.0061 | 0.0003 |
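The weights in the table follow directly from Equation (7.1): the weight on $y_{T-j}$ is $\alpha(1-\alpha)^j$. A minimal base R sketch that recomputes them (object names are illustrative):

```r
alphas <- c(0.2, 0.4, 0.6, 0.8)
lags <- 0:5  # j = 0 corresponds to y_T, j = 1 to y_{T-1}, and so on

# Weight on y_{T-j} is alpha * (1 - alpha)^j; one column per alpha
weights <- sapply(alphas, function(a) a * (1 - a)^lags)
dimnames(weights) <- list(c("y_T", paste0("y_T-", 1:5)),
                          paste0("alpha=", alphas))
print(round(weights, 4))

# Total weight carried by the 18 most recent observations:
# sum_{j=0}^{17} alpha (1 - alpha)^j = 1 - (1 - alpha)^18
print(round(1 - (1 - alphas)^18, 4))
```

The second print shows how close the weights come to summing to one for a sample of 18 observations, the length of the oil series used in this section.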

For any $\alpha$ between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If $\alpha$ is small (i.e., close to 0), more weight is given to observations from the more distant past. If $\alpha$ is large (i.e., close to 1), more weight is given to the more recent observations. For the extreme case where $\alpha = 1$, $\hat{y}_{T+1|T} = y_T$, and the forecasts are equal to the naïve forecasts.

We present two equivalent forms of simple exponential smoothing, each of which leads to the forecast Equation (7.1).

Weighted average form

The forecast at time $T+1$ is equal to a weighted average between the most recent observation $y_T$ and the previous forecast $\hat{y}_{T|T-1}$:
$$\hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1},$$
where $0 \le \alpha \le 1$ is the smoothing parameter. Similarly, we can write the fitted values as
$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1},$$
for $t = 1, \dots, T$. (Recall that fitted values are simply one-step forecasts of the training data.)

The process has to start somewhere, so we let the first fitted value at time 1 be denoted by $\ell_0$ (which we will have to estimate). Then
$$\begin{aligned}
\hat{y}_{2|1} &= \alpha y_1 + (1-\alpha)\ell_0 \\
\hat{y}_{3|2} &= \alpha y_2 + (1-\alpha)\hat{y}_{2|1} \\
\hat{y}_{4|3} &= \alpha y_3 + (1-\alpha)\hat{y}_{3|2} \\
&\;\;\vdots \\
\hat{y}_{T|T-1} &= \alpha y_{T-1} + (1-\alpha)\hat{y}_{T-1|T-2} \\
\hat{y}_{T+1|T} &= \alpha y_T + (1-\alpha)\hat{y}_{T|T-1}.
\end{aligned}$$
Substituting each equation into the following equation, we obtain
$$\begin{aligned}
\hat{y}_{3|2} &= \alpha y_2 + (1-\alpha)\left[\alpha y_1 + (1-\alpha)\ell_0\right] \\
&= \alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0 \\
\hat{y}_{4|3} &= \alpha y_3 + (1-\alpha)\left[\alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0\right] \\
&= \alpha y_3 + \alpha(1-\alpha) y_2 + \alpha(1-\alpha)^2 y_1 + (1-\alpha)^3 \ell_0 \\
&\;\;\vdots \\
\hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_0.
\end{aligned}$$
The last term becomes tiny for large $T$. So, the weighted average form leads to the same forecast Equation (7.1).
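The equivalence between the recursion and the expanded sum can be checked numerically. The base R sketch below uses the oil observations from Table 7.1, together with the rounded values $\alpha = 0.83$ and $\ell_0 = 446.59$ from the oil example in this section; any other values in the permitted range would serve equally well for the check.

```r
# Oil production observations for 1996-2013, copied from Table 7.1
y <- c(445.36, 453.20, 454.41, 422.38, 456.04, 440.39, 425.19, 486.21, 500.43,
       521.28, 508.95, 488.89, 509.87, 456.72, 473.82, 525.95, 549.83, 542.34)
alpha <- 0.83   # rounded estimate reported for the oil example
l0 <- 446.59    # initial level shown in Table 7.1
n <- length(y)

# Recursive form: y_hat[t + 1] = alpha * y[t] + (1 - alpha) * y_hat[t]
y_hat <- numeric(n + 1)
y_hat[1] <- l0  # first fitted value
for (t in seq_len(n)) {
  y_hat[t + 1] <- alpha * y[t] + (1 - alpha) * y_hat[t]
}
recursive <- y_hat[n + 1]  # one-step forecast for time T + 1

# Expanded form: sum_{j=0}^{T-1} alpha (1-alpha)^j y_{T-j} + (1-alpha)^T l0
j <- 0:(n - 1)
closed <- sum(alpha * (1 - alpha)^j * y[n - j]) + (1 - alpha)^n * l0

print(c(recursive = recursive, closed = closed))
print((1 - alpha)^n)  # weight remaining on l0
```

The two numbers should agree up to floating-point error, and the last line shows how little weight remains on $\ell_0$ after 18 observations.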

Component form

An alternative representation is the component form. For simple exponential smoothing, the only component included is the level, $\ell_t$. (Other methods which are considered later in this chapter may also include a trend $b_t$ and a seasonal component $s_t$.) Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation for each of the components included in the method. The component form of simple exponential smoothing is given by:
$$\begin{aligned}
\text{Forecast equation} && \hat{y}_{t+h|t} &= \ell_t \\
\text{Smoothing equation} && \ell_t &= \alpha y_t + (1-\alpha)\ell_{t-1},
\end{aligned}$$
where $\ell_t$ is the level (or the smoothed value) of the series at time $t$. Setting $h = 1$ gives the fitted values, while setting $t = T$ gives the true forecasts beyond the training data.

The forecast equation shows that the forecast value at time $t+1$ is the estimated level at time $t$. The smoothing equation for the level (usually referred to as the level equation) gives the estimated level of the series at each period $t$.

If we replace $\ell_t$ with $\hat{y}_{t+1|t}$ and $\ell_{t-1}$ with $\hat{y}_{t|t-1}$ in the smoothing equation, we will recover the weighted average form of simple exponential smoothing.

The component form of simple exponential smoothing is not particularly useful, but it will be the easiest form to use when we start adding other components.

Flat forecasts

Simple exponential smoothing has a “flat” forecast function:
$$\hat{y}_{T+h|T} = \hat{y}_{T+1|T} = \ell_T, \qquad h = 2, 3, \dots.$$
That is, all forecasts take the same value, equal to the last level component. Remember that these forecasts will only be suitable if the time series has no trend or seasonal component.
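The component form and its flat forecast function can be sketched in a few lines of base R. The function `ses_component()` below is a hypothetical helper written for illustration, not part of the forecast package; it takes $\alpha$ and $\ell_0$ as given rather than estimating them.

```r
# Hypothetical helper: component form of SES for fixed alpha and l0
ses_component <- function(y, alpha, l0, h = 5) {
  n <- length(y)
  level <- numeric(n)
  prev <- l0
  for (t in seq_len(n)) {
    level[t] <- alpha * y[t] + (1 - alpha) * prev  # smoothing equation
    prev <- level[t]
  }
  list(
    level = level,
    fitted = c(l0, level[-n]),   # one-step fitted value at t is the level at t - 1
    forecast = rep(level[n], h)  # flat forecasts: every horizon equals l_T
  )
}

# Oil production observations for 1996-2013, copied from Table 7.1
y <- c(445.36, 453.20, 454.41, 422.38, 456.04, 440.39, 425.19, 486.21, 500.43,
       521.28, 508.95, 488.89, 509.87, 456.72, 473.82, 525.95, 549.83, 542.34)

res <- ses_component(y, alpha = 0.83, l0 = 446.59)
print(res$forecast)

# With alpha = 1 the level tracks the data exactly (naive forecasts)
print(ses_component(y, alpha = 1, l0 = 446.59)$forecast)
```

Because $\alpha = 0.83$ is a rounded value, the levels from this sketch can differ slightly in the second decimal place from those in Table 7.1.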

Optimisation

The application of every exponential smoothing method requires the smoothing parameters and the initial values to be chosen. In particular, for simple exponential smoothing, we need to select the values of $\alpha$ and $\ell_0$. All forecasts can be computed from the data once we know those values. For the methods that follow there is usually more than one smoothing parameter and more than one initial component to be chosen.

In some cases, the smoothing parameters may be chosen in a subjective manner — the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more reliable and objective way to obtain values for the unknown parameters is to estimate them from the observed data.

In Section 5.2, we estimated the coefficients of a regression model by minimising the sum of the squared residuals (usually known as SSE or “sum of squared errors”). Similarly, the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimising the SSE. The residuals are specified as $e_t = y_t - \hat{y}_{t|t-1}$ for $t = 1, \dots, T$. Hence, we find the values of the unknown parameters and the initial values that minimise
$$\text{SSE} = \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2 = \sum_{t=1}^{T} e_t^2. \tag{7.2}$$

Unlike the regression case (where we have formulas which return the values of the regression coefficients that minimise the SSE), this involves a non-linear minimisation problem, and we need to use an optimisation tool to solve it.
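As a rough illustration of this minimisation, the base R sketch below minimises the SSE of Equation (7.2) over $\alpha$ and $\ell_0$ with the general-purpose optimiser `optim()`. It is not the routine used inside `ses()`; the starting values and the choice of the `"L-BFGS-B"` method are assumptions made for this example, and the data are the oil observations from Table 7.1.

```r
# Oil production observations for 1996-2013, copied from Table 7.1
y <- c(445.36, 453.20, 454.41, 422.38, 456.04, 440.39, 425.19, 486.21, 500.43,
       521.28, 508.95, 488.89, 509.87, 456.72, 473.82, 525.95, 549.83, 542.34)

# Sum of squared one-step-ahead errors for par = c(alpha, l0)
sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]  # starts at l0, which is also the first fitted value
  total <- 0
  for (obs in y) {
    total <- total + (obs - level)^2            # squared one-step error
    level <- alpha * obs + (1 - alpha) * level  # update the level
  }
  total
}

# Box-constrained minimisation: 0 <= alpha <= 1, l0 unrestricted
start <- c(0.5, y[1])  # assumed starting values
fit <- optim(par = start, fn = sse, y = y, method = "L-BFGS-B",
             lower = c(0, -Inf), upper = c(1, Inf))

print(fit$par)                      # estimated alpha and l0
print(sqrt(fit$value / length(y)))  # training RMSE at the optimum
```

If the optimiser converges, the estimates should be close to those reported for the oil example, although small differences are possible because the optimiser, starting values, and rounding of the data differ from those used by `ses()`.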

Example: Oil production

In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia.

oildata <- window(oil, start=1996)
# Estimate parameters
fc <- ses(oildata, h=5)
# Accuracy of one-step-ahead training errors
round(accuracy(fc),2)
#>               ME  RMSE   MAE MPE MAPE MASE  ACF1
#> Training set 6.4 28.12 22.26 1.1 4.61 0.93 -0.03

This gives parameter estimates $\hat{\alpha} = 0.83$ and $\hat{\ell}_0 = 446.6$, obtained by minimising SSE over periods $t = 1, 2, \dots, 18$, subject to the restriction that $0 \le \alpha \le 1$.

In Table 7.1 we demonstrate the calculation using these parameters. The Level column shows the estimated level for times $t = 0$ to $t = 18$; the last few rows of the Forecast column show the forecasts for $h = 1, 2, 3, 4, 5$.

Table 7.1: Forecasting the total oil production in millions of tonnes for Saudi Arabia using simple exponential smoothing.

| Year | Time $t$ | Observation $y_t$ | Level $\ell_t$ | Forecast $\hat{y}_{t \mid t-1}$ |
|------|----------|--------|--------|--------|
| 1995 | 0        |        | 446.59 |        |
| 1996 | 1        | 445.36 | 445.57 | 446.59 |
| 1997 | 2        | 453.20 | 451.93 | 445.57 |
| 1998 | 3        | 454.41 | 454.00 | 451.93 |
| 1999 | 4        | 422.38 | 427.63 | 454.00 |
| 2000 | 5        | 456.04 | 451.32 | 427.63 |
| 2001 | 6        | 440.39 | 442.20 | 451.32 |
| 2002 | 7        | 425.19 | 428.02 | 442.20 |
| 2003 | 8        | 486.21 | 476.54 | 428.02 |
| 2004 | 9        | 500.43 | 496.46 | 476.54 |
| 2005 | 10       | 521.28 | 517.15 | 496.46 |
| 2006 | 11       | 508.95 | 510.31 | 517.15 |
| 2007 | 12       | 488.89 | 492.45 | 510.31 |
| 2008 | 13       | 509.87 | 506.98 | 492.45 |
| 2009 | 14       | 456.72 | 465.07 | 506.98 |
| 2010 | 15       | 473.82 | 472.36 | 465.07 |
| 2011 | 16       | 525.95 | 517.05 | 472.36 |
| 2012 | 17       | 549.83 | 544.39 | 517.05 |
| 2013 | 18       | 542.34 | 542.68 | 544.39 |
| 2014 | $h = 1$  |        |        | 542.68 |
| 2015 | $h = 2$  |        |        | 542.68 |
| 2016 | $h = 3$  |        |        | 542.68 |
| 2017 | $h = 4$  |        |        | 542.68 |
| 2018 | $h = 5$  |        |        | 542.68 |

The rows for 2014 to 2018 give the forecasts $\hat{y}_{T+h \mid T}$ for horizons $h = 1, \dots, 5$.
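The training-set accuracy measures reported by `accuracy(fc)` can be cross-checked from the Observation and Forecast columns of Table 7.1. The base R sketch below recomputes ME, RMSE, and MAE from the rounded table values; the vector names are illustrative.

```r
# Observations y_t and one-step forecasts for 1996-2013, copied from Table 7.1
y <- c(445.36, 453.20, 454.41, 422.38, 456.04, 440.39, 425.19, 486.21, 500.43,
       521.28, 508.95, 488.89, 509.87, 456.72, 473.82, 525.95, 549.83, 542.34)
fitted <- c(446.59, 445.57, 451.93, 454.00, 427.63, 451.32, 442.20, 428.02,
            476.54, 496.46, 517.15, 510.31, 492.45, 506.98, 465.07, 472.36,
            517.05, 544.39)

e <- y - fitted  # one-step-ahead training errors

acc <- c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)))
print(round(acc, 2))
```

Up to the rounding in the table, these values should match the ME, RMSE, and MAE shown in the `accuracy(fc)` output for the training set.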

The black line in Figure 7.2 is a plot of the data, which shows a changing level over time.

autoplot(fc) +
  autolayer(fitted(fc), series="Fitted") +
  ylab("Oil (millions of tonnes)") + xlab("Year")

Figure 7.2: Simple exponential smoothing applied to oil production in Saudi Arabia (1996–2013).

The forecasts for the period 2014–2018 are plotted in Figure 7.2. Also plotted are one-step-ahead fitted values alongside the data over the period 1996–2013. The large value of $\alpha$ in this example is reflected in the large adjustment that takes place in the estimated level $\ell_t$ at each time. A smaller value of $\alpha$ would lead to smaller changes over time, and so the series of fitted values would be smoother.

The prediction intervals shown here are calculated using the methods described in Section 7.7. The prediction intervals show that there is considerable uncertainty in the future values of oil production over the five-year forecast period. So interpreting the point forecasts without accounting for the large uncertainty can be very misleading.

Note: In some books, simple exponential smoothing is called “single exponential smoothing”.

