## In one sentence
Adjusted R-squared (often written \(\bar{R}^2\)) is a version of \(R^2\) that accounts for model complexity by penalizing you for adding predictors that do not meaningfully improve fit.
## Definition and formula
For an OLS regression with \(n\) observations and \(k\) predictors (excluding the intercept):
\[ \bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k - 1} \]
Intuition: if you add a predictor that only increases \(R^2\) a tiny amount, \(\bar{R}^2\) can fall because you used up degrees of freedom.
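As a quick numeric check, the formula can be computed directly. A minimal sketch (the \(R^2\) values, \(n\), and \(k\) below are made-up illustrations):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept not counted in k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# A tiny gain in R-squared can still lower adjusted R-squared:
print(adjusted_r2(0.500, n=30, k=2))  # ≈ 0.4630
print(adjusted_r2(0.505, n=30, k=3))  # ≈ 0.4479 — lower, despite the higher R-squared
```

Here the third predictor raised \(R^2\) by only 0.005 but consumed a degree of freedom, so \(\bar{R}^2\) fell.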
## How to interpret it
- Values closer to 1 imply better in-sample fit, after accounting for the number of predictors.
- It is mainly useful for comparing models estimated on the same dataset with the same dependent variable.
## What it is good for (and not good for)
Good for:
- A quick sanity check when comparing nested models (same \(y\), different sets of \(x\)’s).
- Detecting “added predictors for no real gain”.
Not good for:
- Comparing models with different dependent variables.
- Predictive performance evaluation (use out-of-sample tests / cross-validation for that).
## Model selection overview

```mermaid
flowchart TD
    A["Fit baseline model"] --> B["Add candidate predictor(s)"]
    B --> C{"Does Adj R2 increase?"}
    C -- Yes --> D["Keep predictor<br/>(if signs, units, and theory make sense)"]
    C -- No --> E["Drop predictor<br/>(or rethink transformation/specification)"]
    D --> F["Also check: AIC/BIC, residual plots,<br/>multicollinearity, and out-of-sample error"]
    E --> F
```
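The flow above can be sketched as a greedy forward-selection loop. This is illustrative only (the data, seed, and candidate set are made up), and a real workflow should also consult AIC/BIC, residual diagnostics, and out-of-sample error as the final step of the chart says:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X_all = rng.normal(size=(n, 4))    # 4 candidate predictors
y = 1.5 * X_all[:, 0] - 0.8 * X_all[:, 1] + rng.normal(size=n)  # only two matter

def adj_r2(cols):
    """Adjusted R-squared of an OLS fit of y on the selected columns plus intercept."""
    k = len(cols)
    X = np.column_stack([np.ones(n)] + [X_all[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

selected = []
best = adj_r2([])                  # intercept-only baseline (adjusted R2 = 0)
for j in range(X_all.shape[1]):    # try each candidate in turn
    trial = adj_r2(selected + [j])
    if trial > best:               # keep the predictor only if adjusted R2 rises
        selected.append(j)
        best = trial
print(selected, round(best, 3))
```

The loop keeps the two genuinely informative columns and tends to reject the noise columns, mirroring the Yes/No branch of the flowchart.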
## Related Terms with Definitions
- Coefficient of Determination (R-squared): A measure that indicates the proportion of the variance in the dependent variable predictable from the independent variable(s).
- T-test: A hypothesis test used to determine the significance of individual regression coefficients.
- F-test: A statistical test used to determine the joint significance of group(s) of variables in a model.
- Degrees of Freedom: The number of values in the final calculation of a statistic that are free to vary.
## Quiz
### Adjusted R-squared accounts for:
- [ ] Only the total number of observations
- [x] The number of predictors in the model
- [ ] Only the goodness of fit
- [ ] The intercept of the model
> **Explanation:** Adjusted R-squared adjusts for the number of predictors, so the statistic does not rise merely because more variables were added to the model.
### Which test helps determine the significance of adding a new predictor?
- [x] t-Test
- [ ] z-Test
- [ ] Shapiro-Wilk Test
- [ ] Cochran's Q Test
> **Explanation:** The t-test assesses the individual significance of a single coefficient, such as a newly added regressor; a partial F-test gives the same answer for one variable and generalizes to groups of variables.
### True or False: Adjusted R-squared can be higher than R-squared?
- [ ] True
- [x] False
> **Explanation:** Because adjusted R-squared multiplies \((1 - R^2)\) by \((n-1)/(n-k-1) \ge 1\), it can never exceed R-squared; it is at most equal (and, unlike R-squared, it can even be negative).
### Adjusted R-squared will decrease when:
- [x] A new variable fails to improve the model.
- [ ] A model loses accuracy.
- [ ] The sample size is large.
- [ ] The variance increases.
> **Explanation:** Adjusted R-squared will decrease if an added variable does not contribute much to the model’s explanatory power.
### Which of the following is better for choosing a model?
- [ ] Highest R-squared
- [x] Highest adjusted R-squared
- [ ] Lowest p-value
- [ ] Highest t-test statistic
> **Explanation:** The highest adjusted R-squared is the better in-sample criterion among these because it accounts for the number of predictors; p-values and t-statistics test individual coefficients, not overall model fit.
### Degrees of freedom in the adjusted R-squared context refers to:
- [ ] The total number of observations
- [ ] The total number of variables
- [x] The difference between the number of observations and the number of predictors
- [ ] The number of independent variables
> **Explanation:** The relevant quantity is the residual degrees of freedom, \(n - k - 1\): the number of observations minus the number of predictors, minus one for the intercept.
### Adjusted R-squared vs R-squared: Adjusted R-squared is:
- [ ] Often higher
- [ ] Usually equal
- [x] Adjusted for the number of predictors
- [ ] Dependent on the intercept
> **Explanation:** Adjusted R-squared corrects for the number of predictors, providing a more accurate measure of model fitness.
### What does a low adjusted R-squared indicate?
- [ ] The model is a good fit.
- [x] The model does not explain much of the variability in the dependent variable.
- [ ] There is multicollinearity.
- [ ] The sample size is large.
> **Explanation:** A low adjusted R-squared indicates that the model does not explain a significant portion of the variance in the dependent variable.
### If an added variable reduces the adjusted R-squared, it implies:
- [ ] Improved model fit
- [x] The variable does not add explanatory power.
- [ ] Higher p-value
- [ ] Independent observations
> **Explanation:** A decrease in adjusted R-squared after adding a variable suggests that the new variable does not add significant explanatory power to the model.
### To increase adjusted R-squared:
- [x] Add relevant variables.
- [ ] Increase sample size.
- [ ] Remove degrees of freedom.
- [ ] Lower the intercept.
> **Explanation:** Adjusted R-squared increases when relevant variables that significantly contribute to the model's explanatory power are added.