An ARIMA \((p,d,q)\) model is a univariate time-series model that forecasts a series using a combination of (1) its own lagged values, (2) lagged shocks, and (3) differencing to handle persistent trends or unit-root behavior.
Model Form (Lag-Operator Notation)
One common compact representation is:
\[ \phi(L)(1-L)^d y_t = c + \theta(L)\varepsilon_t \]
where:
- \(L\) is the lag operator (\(Ly_t = y_{t-1}\)),
- \(d\) is the number of differences applied to achieve approximate stationarity,
- \(\phi(L)\) is an autoregressive polynomial of order \(p\),
- \(\theta(L)\) is a moving-average polynomial of order \(q\),
- \(\varepsilon_t\) is an unpredictable shock (innovation).
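As a worked instance, consider ARIMA\((1,1,1)\). Assuming the common sign convention \(\phi(L) = 1 - \phi_1 L\) and \(\theta(L) = 1 + \theta_1 L\) (some texts and software write the moving-average polynomial with minus signs instead), the compact form becomes:

\[ (1 - \phi_1 L)(1 - L) y_t = c + (1 + \theta_1 L)\varepsilon_t \]

Writing \(\Delta y_t = y_t - y_{t-1}\) for the first difference and expanding the lag operators gives an equation in the differenced series:

\[ \Delta y_t = c + \phi_1 \Delta y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} \]

Under this convention, the change in the series depends on the previous change (the AR term) and on the previous shock (the MA term).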
The Three Components (Plain Language)
- AR \((p)\): today’s value depends on past values.
- I \((d)\): differences remove persistent level effects (for example, trends or unit roots).
- MA \((q)\): today’s value depends on past shocks (forecast errors).
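To make the three components concrete, here is a minimal simulation sketch of an ARIMA(1,1,1) process using only the Python standard library. The function name `simulate_arima_111`, the default parameter values, and the choice to start the level at zero are illustrative assumptions, not part of any library.

```python
import random

def simulate_arima_111(n, phi=0.6, theta=0.3, c=0.0, sigma=1.0, seed=0):
    """Simulate n observations from an ARIMA(1,1,1) process (illustrative sketch).

    The differenced series w_t follows an ARMA(1,1):
        w_t = c + phi * w_{t-1} + eps_t + theta * eps_{t-1}
    and the level y_t is the running sum of the differences.
    """
    rng = random.Random(seed)
    y = []
    level = 0.0     # assumption: the series starts at zero
    w_prev = 0.0    # previous difference
    eps_prev = 0.0  # previous shock
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        ar_part = phi * w_prev      # AR: dependence on the past (differenced) value
        ma_part = theta * eps_prev  # MA: dependence on the past shock
        w = c + ar_part + eps + ma_part
        level += w                  # I: cumulate differences back into levels
        y.append(level)
        w_prev, eps_prev = w, eps
    return y

def difference(series):
    """First difference: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

y = simulate_arima_111(200)
w = difference(y)  # recovers the stationary ARMA(1,1) part
print(len(y), len(w))
```

Differencing the simulated levels undoes the cumulative sum, which is why `w` has one fewer observation than `y`.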
How ARIMA Is Used (Box-Jenkins Workflow)
- Make the series stationary (choose \(d\); optionally apply a log transform).
- Identify \(p\) and \(q\) from the ACF and PACF of the differenced series, together with diagnostics.
- Estimate the model parameters.
- Diagnose the residuals (they should look like white noise).
- Forecast and evaluate out-of-sample performance.
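The workflow above can be sketched end to end on synthetic data. The example below is a deliberately simplified stand-in: it fits only an AR(1) to the once-differenced series by least squares, whereas real ARIMA software typically uses maximum likelihood and also handles MA terms. All function names (`difference`, `acf`, `fit_ar1`, `forecast_levels`) are hypothetical helpers written for this illustration, and the data are simulated rather than real.

```python
import random

def difference(x):
    """First difference: x_t - x_{t-1}."""
    return [b - a for a, b in zip(x, x[1:])]

def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def fit_ar1(w):
    """Least-squares fit of w_t = c + phi * w_{t-1} + e_t.

    Returns (c, phi, residuals). A simplification of full ARIMA estimation.
    """
    x, y = w[:-1], w[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    resid = [b - c - phi * a for a, b in zip(x, y)]
    return c, phi, resid

def forecast_levels(y_last, w_last, c, phi, steps):
    """Iterate the AR(1) on differences and cumulate back to levels."""
    out, level, w = [], y_last, w_last
    for _ in range(steps):
        w = c + phi * w
        level += w
        out.append(level)
    return out

# Synthetic ARIMA(1,1,0) data with phi = 0.5 (illustrative only).
rng = random.Random(42)
y, level, w = [], 0.0, 0.0
for _ in range(600):
    w = 0.5 * w + rng.gauss(0.0, 1.0)
    level += w
    y.append(level)

train, test = y[:-20], y[-20:]              # hold out the last 20 points

w_train = difference(train)                 # step 1: difference once (d = 1)
acf_w = acf(w_train, 5)                     # step 2: inspect the ACF to pick orders
c_hat, phi_hat, resid = fit_ar1(w_train)    # step 3: estimate parameters
acf_resid = acf(resid, 5)                   # step 4: residuals should show little autocorrelation
band = 2.0 / len(resid) ** 0.5              # rough +/- 2/sqrt(n) reference band
preds = forecast_levels(train[-1], w_train[-1], c_hat, phi_hat, len(test))  # step 5
mae = sum(abs(p - a) for p, a in zip(preds, test)) / len(test)

print("phi_hat:", round(phi_hat, 3))
print("residual ACF:", [round(r, 3) for r in acf_resid], "band:", round(band, 3))
print("out-of-sample MAE:", round(mae, 3))
```

Residual autocorrelations well outside the reference band would suggest returning to the identification step with different orders.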
Practical Notes And Limitations
- ARIMA is primarily a forecasting tool; it does not by itself identify causal effects.
- Structural breaks (policy regime changes, measurement changes, crises) can make historical ARIMA relationships unstable.
- Over-differencing can remove useful long-run information and create artificial moving-average behavior.
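The over-differencing point can be illustrated with a small experiment. Differencing a series that is already white noise produces \(\varepsilon_t - \varepsilon_{t-1}\), a moving-average process whose theoretical lag-1 autocorrelation is \(-0.5\). The sketch below checks this on simulated data; the helper name `lag1_autocorr` and the sample size are illustrative choices.

```python
import random

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return c1 / c0

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # already stationary
over_diff = [b - a for a, b in zip(noise, noise[1:])]  # unnecessary difference

r_noise = lag1_autocorr(noise)      # expected to be close to 0
r_over = lag1_autocorr(over_diff)   # expected to be close to -0.5

print("lag-1 autocorrelation, original:", round(r_noise, 3))
print("lag-1 autocorrelation, over-differenced:", round(r_over, 3))
```

A strongly negative lag-1 autocorrelation near \(-0.5\) after differencing is a common hint that one difference too many has been applied.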
Common Extensions
- SARIMA: adds seasonal AR/MA terms and seasonal differencing.
- ARIMAX: includes exogenous regressors (not purely univariate).
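The seasonal differencing used by SARIMA corresponds to the operator \((1 - L^s)\), which subtracts the value from one season earlier. The sketch below applies it to a made-up quarterly series (period \(s = 4\)) consisting of a repeating pattern plus a linear trend. The function name `seasonal_difference` and the data are assumptions for illustration.

```python
def seasonal_difference(x, period):
    """Apply (1 - L^s): x_t - x_{t-s}, dropping the first `period` points."""
    if period <= 0:
        raise ValueError("period must be a positive integer")
    return [x[t] - x[t - period] for t in range(period, len(x))]

# Made-up quarterly data: fixed seasonal pattern plus a trend of 2 per quarter.
pattern = [10.0, 20.0, 15.0, 5.0]
series = [pattern[t % 4] + 2.0 * t for t in range(16)]

sd = seasonal_difference(series, 4)
print(sd)
```

Here the seasonal pattern cancels exactly and the trend leaves a constant, so every seasonally differenced value equals the trend slope times the period.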