Autoregressive Integrated Moving Average (ARIMA) Model

A time-series forecasting model that combines autoregression, differencing, and moving-average shocks.

An ARIMA\((p,d,q)\) model is a univariate time-series model that forecasts a series using a combination of (1) its own lagged values, (2) lagged shocks, and (3) differencing to handle persistent trends or unit-root behavior.

Model Form (Lag-Operator Notation)

One common compact representation is:

\[ \phi(L)(1-L)^d y_t = c + \theta(L)\varepsilon_t \]

where:

  • \(L\) is the lag operator (\(Ly_t = y_{t-1}\)),
  • \(d\) is the number of differences applied to achieve approximate stationarity,
  • \(\phi(L)\) is an autoregressive polynomial of order \(p\),
  • \(\theta(L)\) is a moving-average polynomial of order \(q\),
  • \(c\) is a constant term,
  • \(\varepsilon_t\) is an unpredictable shock (innovation).
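
For reference, the two lag polynomials can be written out explicitly. The sign convention used here (minus signs in \(\phi\), plus signs in \(\theta\)) is a common one but is an assumption, since some texts and software packages flip the signs of the moving-average coefficients:

\[ \phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p, \qquad \theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q \]

Under that convention, an ARIMA\((1,1,1)\) model expands as follows, writing \(\Delta y_t = (1-L)y_t = y_t - y_{t-1}\):

\[ (1 - \phi_1 L)(1 - L) y_t = c + (1 + \theta_1 L)\varepsilon_t \quad\Longrightarrow\quad \Delta y_t = c + \phi_1 \Delta y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} \]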

The Three Components (Plain Language)

  • AR (\(p\)): today’s value depends on past values.
  • I (\(d\)): differences remove persistent level effects (for example, trends or unit roots).
  • MA (\(q\)): today’s value depends on past shocks (forecast errors).
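
To see how the three components interact, the following is a minimal simulation sketch of an ARIMA\((1,1,1)\) series in plain Python. The function names (`simulate_arima_111`, `difference`), the coefficient values, and the Gaussian shocks are illustrative assumptions, not part of any library.

```python
import random


def simulate_arima_111(n, phi=0.5, theta=0.3, c=0.0, seed=0):
    """Simulate n steps of y_t whose first difference follows an ARMA(1,1).

    Hypothetical helper: phi, theta, c and the unit-variance Gaussian
    shocks are illustrative choices.
    """
    rng = random.Random(seed)
    levels = [0.0]          # y_0 is fixed at zero for simplicity
    diffs = []              # the differenced (ARMA) series
    prev_diff, prev_eps = 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)                   # new shock
        diff = c + phi * prev_diff                  # AR part: past value
        diff += eps + theta * prev_eps              # MA part: past shock
        levels.append(levels[-1] + diff)            # I part: cumulate
        diffs.append(diff)
        prev_diff, prev_eps = diff, eps
    return levels, diffs


def difference(series):
    """First difference: (1 - L) applied to the series."""
    return [b - a for a, b in zip(series, series[1:])]


y, arma_part = simulate_arima_111(200)
recovered = difference(y)   # differencing undoes the integration step
```

Differencing the simulated levels once returns the underlying ARMA series (up to floating-point rounding), which is what choosing \(d = 1\) amounts to in this setup.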

How ARIMA Is Used (Box-Jenkins Workflow)

  1. Make the series stationary (choose \(d\); optionally log-transform the series first, for example to stabilize its variance).
  2. Identify \(p\) and \(q\) using the ACF/PACF and diagnostics.
  3. Estimate the model parameters.
  4. Check the residuals, which should resemble white noise (no remaining autocorrelation).
  5. Forecast and evaluate out-of-sample performance.
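
The five steps above can be sketched end to end in plain Python. This is a deliberately narrow illustration, not a general implementation: it assumes \(d = 1\), \(p = 1\), \(q = 0\), uses synthetic data, and estimates the AR coefficient from the lag-1 sample autocorrelation (the Yule-Walker estimate for an AR(1)) rather than by maximum likelihood. All function names are hypothetical.

```python
import random


def difference(x):
    """First difference: (1 - L) applied to the series."""
    return [b - a for a, b in zip(x, x[1:])]


def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    mean = sum(x) / len(x)
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, len(x)))
    return num / denom


def fit_ar1(w):
    """Estimate mean and AR(1) coefficient (Yule-Walker: phi = r_1)."""
    return sum(w) / len(w), acf(w, 1)


def ar1_residuals(w, mu, phi):
    """One-step-ahead errors of the fitted AR(1) on the differenced series."""
    return [(w[t] - mu) - phi * (w[t - 1] - mu) for t in range(1, len(w))]


def forecast_levels(y, mu, phi, steps):
    """Forecast differences recursively, then cumulate back to levels."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = mu + phi * (diff - mu)
        level += diff
        out.append(level)
    return out


# Synthetic ARIMA(1,1,0) data with a true AR coefficient of 0.6 (assumed).
rng = random.Random(42)
y, d = [0.0], 0.0
for _ in range(2000):
    d = 0.6 * d + rng.gauss(0.0, 1.0)
    y.append(y[-1] + d)

train, test = y[:-10], y[-10:]
w = difference(train)                       # step 1: difference once
r1, r2 = acf(w, 1), acf(w, 2)               # step 2: inspect the ACF
mu, phi = fit_ar1(w)                        # step 3: estimate parameters
resid = ar1_residuals(w, mu, phi)
resid_r1 = acf(resid, 1)                    # step 4: should be near zero
preds = forecast_levels(train, mu, phi, 10) # step 5: forecast 10 steps
rmse = (sum((p - a) ** 2 for p, a in zip(preds, test)) / len(test)) ** 0.5
```

In practice a statistics library would handle estimation and diagnostics; the point here is only to show where each workflow step sits.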

Practical Notes And Limitations

  • ARIMA is primarily a forecasting tool; it does not by itself identify causal effects.
  • Structural breaks (policy regime changes, measurement changes, crises) can make historical ARIMA relationships unstable.
  • Over-differencing can remove useful long-run information and create artificial moving-average behavior.
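
The over-differencing point can be checked numerically. If a series is already white noise, differencing it gives \(\varepsilon_t - \varepsilon_{t-1}\), a moving-average process whose theoretical lag-1 autocorrelation is \(-0.5\). The sketch below uses simulated Gaussian noise; the helper name `acf`, the sample size, and the seed are arbitrary choices.

```python
import random


def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    mean = sum(x) / len(x)
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, len(x)))
    return num / denom


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # already stationary
over_diffed = [b - a for a, b in zip(noise, noise[1:])]

r_noise = acf(noise, 1)        # close to 0: no structure to model
r_over = acf(over_diffed, 1)   # close to -0.5: structure created by differencing
```

A strongly negative lag-1 autocorrelation after differencing is therefore a common warning sign that \(d\) was set one too high.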

Common Extensions

  • SARIMA: adds seasonal AR/MA terms and seasonal differencing.
  • ARIMAX: includes exogenous regressors (not purely univariate).
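
As a small illustration of the seasonal differencing used in SARIMA, the sketch below applies \((1 - L^s)\), that is \(y_t - y_{t-s}\), to a toy quarterly series (\(s = 4\)). The series, built from a linear trend plus a repeating four-period pattern, and the function name `seasonal_difference` are made up for the example.

```python
def seasonal_difference(series, period):
    """Apply (1 - L^s): subtract the value one full season earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Toy quarterly data: linear trend (2 per period) plus a fixed seasonal pattern.
pattern = [10, -5, 3, -8]
y = [2 * t + pattern[t % 4] for t in range(24)]

sdiff = seasonal_difference(y, period=4)
```

Here the seasonal pattern cancels exactly and the linear trend collapses to a constant (the trend slope times the period), so the differenced series has no seasonality left to model.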