Omitted Variable Bias

Why regression estimates become biased when a relevant explanatory variable is left out.

Omitted variable bias happens when a model leaves out a factor that affects the outcome and is correlated with an included regressor. In plain language, the model gives credit to the wrong variable because an important driver is missing.

Core Mechanics

Suppose the true data-generating process is:

[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u ]

but we estimate:

[ y = \alpha_0 + \alpha_1 x_1 + e ]

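If (x_2) is omitted, is correlated with (x_1), and has a nonzero effect on (y), then the slope estimate (\hat{\alpha}_1) differs systematically from (\beta_1) by: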

[ \text{Bias}(\hat{\alpha}_1) = \beta_2 \cdot \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)} ]
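
One way to see where this expression comes from is sketched here, under the assumption that the relationship between the two regressors can be summarized by a linear projection. The symbols (\delta_0), (\delta_1), and (v) are introduced only for this illustration:

[ x_2 = \delta_0 + \delta_1 x_1 + v, \qquad \delta_1 = \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)} ]

Substituting into the true model gives:

[ y = (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1) x_1 + (u + \beta_2 v) ]

Because (v) is uncorrelated with (x_1) by construction, and assuming (u) is as well, the short regression recovers (\alpha_1 = \beta_1 + \beta_2 \delta_1), so the gap between (\alpha_1) and (\beta_1) is (\beta_2 \delta_1).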

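The sign of the bias is the product of two signs: that of the omitted factor's effect, (\beta_2), and that of its covariance with (x_1). When the two signs agree the bias is positive, and when they differ it is negative.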
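
The bias formula can be checked numerically. The following is a minimal simulation sketch, not a definitive implementation: the parameter values, the 0.5 link between the regressors, the sample size, and the helper name `ols` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 100_000

# Hypothetical "true" parameters, chosen only for illustration.
beta0, beta1, beta2 = 1.0, 2.0, 3.0

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # x2 is correlated with x1 by construction
u = rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + u


def ols(y, *regressors):
    """Least-squares coefficients, with the intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


alpha1_short = ols(y, x1)[1]     # slope on x1 when x2 is omitted
beta1_long = ols(y, x1, x2)[1]   # slope on x1 when x2 is included

# Bias predicted by the formula: beta2 * Cov(x1, x2) / Var(x1).
cov = np.cov(x1, x2)
predicted_bias = beta2 * cov[0, 1] / cov[0, 0]

print(f"short-regression slope: {alpha1_short:.3f}")
print(f"long-regression slope:  {beta1_long:.3f}")
print(f"observed bias:          {alpha1_short - beta1:.3f}")
print(f"predicted bias:         {predicted_bias:.3f}")
```

Under these assumed values the population slope of the short regression is 2.0 + 3.0 × 0.5 = 3.5, so the observed and predicted bias should both land close to 1.5, while the long regression should stay close to 2.0.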

Why It Matters In Economics

Omitted variable bias (OVB) is one of the main reasons naive correlations are often misread as causal effects. For example, estimating the return to education without controlling for ability can overstate or understate the true return, depending on how ability and schooling move together. If ability raises wages and is positively correlated with schooling, the estimated return is biased upward.

In policy work, omitted variables can produce expensive mistakes: a subsidy may look effective simply because the treated group had better initial conditions that were not modeled.

Common Mitigation Strategies

  • Include the most relevant controls based on economic theory, not just statistical fit.
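  • Use fixed effects when panel data are available and the unobserved heterogeneity is constant over time within each unit.
  • Use instrumental variables when key regressors remain endogenous and an instrument is available that is correlated with the regressor but affects the outcome only through it.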
  • Check robustness across alternative specifications.
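
The fixed-effects strategy in the list above can be illustrated with a small simulated panel. This is a sketch under stated assumptions: the panel dimensions, the true slope of 1.0, the coefficient of 2.0 on the unobserved unit effect, and the helper name `slope` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is reproducible
n_units, n_periods = 2_000, 5
beta = 1.0  # hypothetical true effect of x on y

# Unobserved, time-invariant unit effect that also drives x.
a = rng.normal(size=n_units)
x = a[:, None] + rng.normal(size=(n_units, n_periods))
y = beta * x + 2.0 * a[:, None] + rng.normal(size=(n_units, n_periods))


def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


# Pooled OLS ignores the unit effect, so it suffers from omitted variable bias.
pooled = slope(x.ravel(), y.ravel())

# Within transformation: subtract each unit's own time average,
# which removes anything constant over time for that unit.
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
within = slope(x_within.ravel(), y_within.ravel())

print(f"pooled OLS slope:    {pooled:.3f}")
print(f"fixed-effects slope: {within:.3f}")
```

In this setup the pooled slope should sit near 2.0, because the omitted unit effect contributes a bias of 2.0 × 0.5, while the within estimate should sit near the assumed true value of 1.0. The same demeaning does nothing for omitted factors that vary over time, which is why the other strategies in the list remain relevant.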