Autoregressive Integrated Moving Average (ARIMA) Model

An ARIMA \((p,d,q)\) model is a univariate time-series model that forecasts a series using a combination of (1) its own lagged values, (2) lagged shocks, and (3) differencing to handle persistent trends or unit-root behavior.

Model Form (Lag-Operator Notation)

One common compact representation is:

\[ \phi(L)(1-L)^d y_t = c + \theta(L)\varepsilon_t \]

where:

\(L\) is the lag operator (\(Ly_t = y_{t-1}\)),
\(d\) is the number of differences applied to achieve approximate stationarity,
\(\phi(L)\) is an autoregressive polynomial of order \(p\),
\(\theta(L)\) is a moving-average polynomial of order \(q\),
\(c\) is a constant term,
\(\varepsilon_t\) is an unpredictable shock (innovation).
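
To make the compact form concrete, the two polynomials can be written out. The sign convention used here (minus signs in the AR polynomial, plus signs in the MA polynomial) is an assumption; it is a common choice, but some texts and software flip the MA signs.

\[ \phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p, \qquad \theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q \]

Under that convention, an ARIMA \((1,1,1)\) model reads

\[ (1 - \phi_1 L)(1 - L)\, y_t = c + (1 + \theta_1 L)\,\varepsilon_t \]

and, since \((1 - \phi_1 L)(1 - L) = 1 - (1 + \phi_1)L + \phi_1 L^2\), it expands to

\[ y_t = c + (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + \varepsilon_t + \theta_1\, \varepsilon_{t-1} \]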

The Three Components (Plain Language)

AR (\(p\)): today's value depends on past values.
I (\(d\)): differences remove persistent level effects (for example, trends or unit roots).
MA (\(q\)): today's value depends on past shocks (forecast errors).
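
One way to see how the three components fit together is to simulate a small ARIMA \((1,1,1)\) series. The sketch below is illustrative only: the function name `simulate_arima_111`, the parameter values, and the Gaussian shocks are assumptions, not part of the model definition.

```python
import random


def simulate_arima_111(n, phi=0.5, theta=0.3, c=0.0, sigma=1.0, seed=0):
    """Simulate n observations from an ARIMA(1,1,1) process.

    The differenced series w_t = y_t - y_{t-1} follows an ARMA(1,1):
        w_t = c + phi * w_{t-1} + eps_t + theta * eps_{t-1}
    and the level y_t is rebuilt by cumulatively summing w_t.
    Start-up values (w_0, eps_0, y_0) are set to zero for simplicity.
    """
    rng = random.Random(seed)
    y = []
    w_prev, eps_prev, level = 0.0, 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # fresh shock (innovation)
        # AR part (phi * w_prev) plus MA part (theta * eps_prev)
        w = c + phi * w_prev + eps + theta * eps_prev
        level += w  # I part: integrate the difference back to a level
        y.append(level)
        w_prev, eps_prev = w, eps
    return y


series = simulate_arima_111(200)
print(series[:5])
```

Setting `sigma=0.0` turns off the shocks, which makes the AR and I parts easy to trace by hand.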

How ARIMA Is Used (Box-Jenkins Workflow)

Make the series stationary (choose \(d\); optionally apply a log transform first to stabilize the variance).
Identify \(p\) and \(q\) using the ACF/PACF and diagnostics.
Estimate the model parameters.
Check the residuals, which should look like white noise (no remaining autocorrelation).
Forecast and evaluate out-of-sample performance.
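
The five steps above can be sketched end to end on synthetic data. This is a deliberately simplified illustration using only the standard library: it assumes \(d = 1\), fits an AR(1) to the differenced series by ordinary least squares (so the model is an ARIMA \((1,1,0)\)), and uses hypothetical helper names (`difference`, `acf`, `fit_ar1`, `forecast`). In practice a statistics library would typically handle estimation and order selection.

```python
import random


def difference(xs, d=1):
    """Apply first differencing d times."""
    for _ in range(d):
        xs = [b - a for a, b in zip(xs, xs[1:])]
    return xs


def acf(xs, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def fit_ar1(w):
    """OLS fit of w_t = c + phi * w_{t-1} + e_t; returns (c, phi, residuals)."""
    x, y = w[:-1], w[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sxy / sxx
    c = my - phi * mx
    resid = [b - c - phi * a for a, b in zip(x, y)]
    return c, phi, resid


def forecast(y, c, phi, steps):
    """Forecast levels by iterating the AR(1) on differences and re-integrating."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        out.append(level)
    return out


# Synthetic ARIMA(1,1,0) data with a true AR coefficient of 0.6 (an assumption).
rng = random.Random(42)
y, w = [0.0], 0.0
for _ in range(600):
    w = 0.6 * w + rng.gauss(0.0, 1.0)
    y.append(y[-1] + w)

train, test = y[:-10], y[-10:]                # hold out the last 10 points
w_train = difference(train, d=1)              # 1. difference to stationarity
print("ACF of differences:", acf(w_train, 3)) # 2. inspect the ACF
c_hat, phi_hat, resid = fit_ar1(w_train)      # 3. estimate parameters
resid_acf = acf(resid, 5)                     # 4. residuals should be near white noise
preds = forecast(train, c_hat, phi_hat, len(test))  # 5. forecast out of sample
mae = sum(abs(p - a) for p, a in zip(preds, test)) / len(test)
print("phi_hat:", phi_hat, "out-of-sample MAE:", mae)
```

Residual autocorrelations that stay close to zero suggest the chosen orders are adequate; large values would send the analyst back to the identification step.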

Practical Notes And Limitations

ARIMA is primarily a forecasting tool; it does not by itself identify causal effects.
Structural breaks (policy regime changes, measurement changes, crises) can make historical ARIMA relationships unstable.
Over-differencing can remove useful long-run information and create artificial moving-average behavior.
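
The over-differencing point can be illustrated with a small experiment. If a series is already white noise and is differenced anyway, the result \(w_t = \varepsilon_t - \varepsilon_{t-1}\) is an MA(1) process whose theoretical lag-1 autocorrelation is \(-0.5\). The sketch below assumes Gaussian noise and uses a hypothetical helper `lag1_autocorr`.

```python
import random


def lag1_autocorr(xs):
    """Sample autocorrelation at lag 1."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(v * v for v in dev)


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # already stationary
over_diff = [b - a for a, b in zip(noise, noise[1:])]  # unnecessary difference

r_noise = lag1_autocorr(noise)      # expected to be close to 0
r_over = lag1_autocorr(over_diff)   # expected to be close to -0.5
print("lag-1 ACF, original:", r_noise)
print("lag-1 ACF, over-differenced:", r_over)
```

A strongly negative lag-1 autocorrelation after differencing is therefore a common warning sign that \(d\) was chosen too large.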

Common Extensions

SARIMA: adds seasonal AR/MA terms and seasonal differencing.
ARIMAX: includes exogenous regressors (not purely univariate).
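
As a small illustration of the seasonal differencing used by SARIMA, the sketch below subtracts the value one full season earlier, \(y_t - y_{t-s}\). The function name `seasonal_difference`, the period of 4, and the toy series are assumptions chosen for the example.

```python
def seasonal_difference(xs, period):
    """Return y_t - y_{t-period} for every t with a full season of history."""
    return [xs[t] - xs[t - period] for t in range(period, len(xs))]


# Toy quarterly series: a repeating seasonal pattern plus a linear trend.
pattern = [10.0, 20.0, 15.0, 5.0]
series = [pattern[t % 4] + 0.5 * t for t in range(16)]

deseasonalized = seasonal_difference(series, period=4)
print(deseasonalized)
```

Here the repeating pattern cancels exactly, leaving only the trend's contribution over one season (0.5 per step times 4 steps).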