Autocorrelation is the correlation of a time series with its own past and future values. It is also sometimes called lagged correlation or serial correlation, meaning the correlation between members of a series of numbers arranged in time. Positive autocorrelation can be considered a specific form of persistence: a tendency for a system to remain in the same state from one observation to the next.
Autocorrelation refers to how correlated a time series is with its past values, whereas the autocorrelation function (ACF) plot shows that correlation at each lag, up to and including a chosen maximum lag. In an ACF plot, the number of lags is on the x-axis and the correlation coefficient is on the y-axis.
The ACF plot shows how the given time series is correlated with itself.
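As a rough illustration of what each bar in an ACF plot represents, the sketch below computes the sample autocorrelation at a given lag in plain Python. The function names `autocorrelation` and `acf` and the two toy series are assumptions made for this example; they do not come from any library.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Denominator: sum of squared deviations over the whole series.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: products of deviations for observations `lag` steps apart.
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def acf(series, max_lag):
    """Autocorrelations at lags 0, 1, ..., max_lag (the values an ACF plot shows)."""
    return [autocorrelation(series, k) for k in range(max_lag + 1)]


# Toy data (hypothetical): a steadily rising series is persistent,
# while an alternating series flips state at every step.
trend = [float(t) for t in range(20)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]

print(acf(trend, 3))
print(acf(alternating, 3))
```

The rising series gives a positive value at lag 1 and the alternating series a negative one, which is the persistence idea described above. A constant series has zero variance, so this sketch would fail with a division by zero on it.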
As a rule of thumb, an ARIMA model normally uses either AR terms or MA terms; both are used together only on rare occasions. The ACF plot helps decide which of the two to use for a given time series:
If there is positive autocorrelation at lag 1, use AR terms.
If there is negative autocorrelation at lag 1, use MA terms.
After the ACF plot, we move on to the partial autocorrelation function (PACF) plot. A partial autocorrelation summarizes the relationship between an observation in a time series and observations at prior time steps, with the relationships of intervening observations removed.
The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.
If the PACF plot drops off at lag n, use an AR(n) model; if the drop in the PACF is more gradual, use MA terms.
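To make "removing the effect of shorter lags" concrete, here is a minimal sketch that turns a list of autocorrelations into partial autocorrelations using the Durbin-Levinson recursion. The function name `pacf_from_acf` is an assumption for this example, and the two inputs are textbook theoretical ACFs (an AR(1) with coefficient 0.6 and an MA(1) with coefficient 0.8), not estimates from real data.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations rho[0..K], with rho[0] == 1.

    Uses the Durbin-Levinson recursion. Returns a list of the same length
    whose entry k is the partial autocorrelation at lag k (entry 0 is 1.0).
    """
    pacf = [1.0]
    prev = []  # coefficients of the best linear predictor that uses k-1 lags
    for k in range(1, len(rho)):
        # Part of rho[k] that the shorter lags do not already explain.
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the shorter-lag coefficients, then append the new one.
        curr = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.6: rho_k = 0.6**k.
ar1_pacf = pacf_from_acf([0.6 ** k for k in range(6)])

# Theoretical ACF of an MA(1) process with coefficient 0.8:
# rho_1 = 0.8 / (1 + 0.8**2), and zero at every longer lag.
ma1_pacf = pacf_from_acf([1.0, 0.8 / (1 + 0.8 ** 2), 0.0, 0.0, 0.0, 0.0])

print(ar1_pacf)
print(ma1_pacf)
```

Under these assumptions, `ar1_pacf` is 0.6 at lag 1 and numerically zero afterwards (a sharp drop, pointing to AR(1)), while the entries of `ma1_pacf` alternate in sign and shrink gradually (the tapering pattern associated with MA terms).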
Autoregressive component: A purely AR model forecasts using only a linear combination of past values, much like a linear regression, where the number of AR terms equals the number of previous periods taken into consideration for the forecast.
Autocorrelation complicates the application of statistical tests by reducing the number of independent observations. It can also complicate the identification of significant covariance or correlation between time series. On the other hand, an autocorrelated series is predictable, probabilistically, because future values depend on current and past values.
Three tools for assessing the autocorrelation of a time series are:
(1) the time series plot
(2) the lagged scatter plot
(3) the autocorrelation function
For an MA model, the clearer pattern is in the ACF.
The theoretical ACF of an MA model can have non-zero autocorrelations only at lags involved in the model, so for an MA(q) model it is zero at every lag beyond q.
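The cut-off in the ACF of an MA model can be checked directly from the model coefficients. The sketch below assumes the sign convention x_t = e_t + theta_1 e_(t-1) + ... + theta_q e_(t-q) with uncorrelated errors of equal variance; the function name `ma_theoretical_acf` and the coefficient values are chosen only for illustration.

```python
def ma_theoretical_acf(thetas, max_lag):
    """Theoretical ACF of x_t = e_t + thetas[0]*e_{t-1} + ... + thetas[q-1]*e_{t-q}."""
    weights = [1.0] + list(thetas)  # weights on e_t, e_{t-1}, ..., e_{t-q}
    q = len(thetas)
    variance = sum(w * w for w in weights)  # in units of the error variance
    result = []
    for k in range(max_lag + 1):
        if k > q:
            # x_t and x_{t-k} share no error terms once k exceeds q.
            result.append(0.0)
        else:
            cov = sum(weights[i] * weights[i + k] for i in range(q - k + 1))
            result.append(cov / variance)
    return result


# Hypothetical coefficients, for illustration only.
print(ma_theoretical_acf([0.8], 4))       # MA(1): non-zero at lag 1 only
print(ma_theoretical_acf([0.5, 0.4], 4))  # MA(2): non-zero at lags 1 and 2 only
```

A sample ACF computed from real data will not be exactly zero beyond lag q, so in practice the cut-off is judged against the plot's confidence bands rather than against exact zeros.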
PACF takes into consideration the correlation between a time series and each of its intermediate lagged values.
The identification of an MA model is done with the ACF rather than the PACF. For an MA model, the theoretical PACF does not shut off, but instead tapers toward 0.
The partial autocorrelation function (PACF) gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags. It contrasts with the autocorrelation function, which does not control for other lags. This is useful for detecting the order of an autoregressive model: for example, the theoretical PACF of an AR(1) series is non-zero only at lag 1.
Identification of an AR model is often best done with the PACF. For an AR model, the theoretical PACF shuts off past the order of the model. The phrase "shuts off" means that, in theory, the partial autocorrelations are equal to 0 beyond that point. Put another way, the number of non-zero partial autocorrelations gives the order of the AR model.
By the order of the model we mean the most extreme lag of x that is used as a predictor.
ARIMA stands for AutoRegressive Integrated Moving Average. There are seasonal and non-seasonal ARIMA models that can be used for forecasting.
Non-seasonal ARIMA model: This method has three parameters to account for:
P = the number of periods to lag. For example, if P = 3, we use the three previous periods of our time series in the autoregressive portion of the calculation. P helps adjust the line that is being fitted to forecast the series.
Purely autoregressive models resemble a linear regression where the predictor variables are the P previous periods.
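The regression analogy can be sketched as a one-step forecast: an intercept plus a weighted sum of the P most recent values. The function name `ar_one_step_forecast`, the coefficients, and the data below are all hypothetical; in practice the coefficients would be estimated from the series by a fitting routine.

```python
def ar_one_step_forecast(history, coefficients, intercept=0.0):
    """Forecast the next value of a series from its last P observations.

    coefficients[0] multiplies the most recent value (lag 1),
    coefficients[1] the value before it (lag 2), and so on.
    """
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("need at least P past observations")
    # Reverse the tail of the series so that index 0 is lag 1.
    recent = history[-1:-p - 1:-1]
    return intercept + sum(c * x for c, x in zip(coefficients, recent))


# Hypothetical AR(3) coefficients and data, for illustration only.
history = [10.0, 12.0, 11.0, 13.0]
forecast = ar_one_step_forecast(history, [0.5, 0.3, 0.2], intercept=1.0)
print(forecast)  # 1.0 + 0.5*13.0 + 0.3*11.0 + 0.2*12.0
```

With P = 3, only the last three values (13.0, 11.0, 12.0) enter the forecast; the oldest observation, 10.0, is ignored.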
D = the number of differencing transformations the time series needs in order to become stationary. In an ARIMA model, we transform a time series into a stationary one (a series without trend or seasonality) using differencing.
A time series is stationary when its mean and variance are constant over time. A stationary series is easier to forecast.
The first difference is the difference between the value in the current time period and the value in the previous time period. If the first differences do not fluctuate around a constant mean with a constant variance, we take the second difference by differencing the first differences. We repeat this until we get a stationary series.
The best way to determine whether or not the series is sufficiently differenced is to plot the differenced series and check to see if there is a constant mean and variance.
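Repeated differencing is simple enough to sketch in a few lines. The helper name `difference` and the quadratic toy series are assumptions for this example; the series is chosen so that the effect of each pass is easy to see.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces x_t with x_t - x_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


# Toy series with a curved upward trend (hypothetical data): 0, 1, 4, 9, 16, 25.
quadratic = [t * t for t in range(6)]

print(difference(quadratic, d=1))  # still trending upward
print(difference(quadratic, d=2))  # constant, so no trend remains
```

Each pass shortens the series by one observation. Here the first differences still rise steadily, while the second differences are constant, so D = 2 would be the choice for this toy series.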
Q = the number of lags of the error component, where the error component is the part of the time series not explained by trend or seasonality.