Analysis of Variance

A statistical technique to decompose total population variance into parts to test significance of differences among groups.

In one sentence

Analysis of variance (ANOVA) tests whether group means differ more than you would expect from within-group variability.

Core idea

ANOVA decomposes total variation into:

  • Between-group variation (differences in group means)
  • Within-group variation (noise/dispersion inside groups)

If the between-group component is large relative to within-group variation, at least one group mean likely differs.

The F statistic (one-way ANOVA)

For a one-way ANOVA with \(k\) groups:

\[ F = \frac{MS_{between}}{MS_{within}} \]

Where \(MS\) are mean squares (sum of squares divided by degrees of freedom).

Assumptions (standard one-way)

  • Independent observations (no correlation across units)
  • Approximately normal errors within groups (less critical in large samples)
  • Homogeneity of variances (or use robust alternatives when violated)

If observations are correlated (panel/repeated measures), you typically use repeated-measures ANOVA or mixed models.

Visual map

    flowchart TD
	  Data["Outcome data + group labels"] --> Split["Split variation"]
	  Split --> Between["Between-group variation"]
	  Split --> Within["Within-group variation"]
	  Between --> F["F = MS_between / MS_within"]
	  Within --> F
	  F --> Decision{"p-value small?"}
	  Decision -- Yes --> Result["Reject equal means"]
	  Decision -- No --> Result2["Fail to reject"]
  • Variance: A measure of the dispersion of a set of data points around their mean.
  • Statistical Significance: A determination that a relationship between variables is caused by something other than chance.
  • Sampling Error: The discrepancy between the sample statistic and the population parameter due to random selection.

Quiz

### What does ANOVA stand for? - [ ] Analysis of Values - [ ] Annual Number of Variations - [x] Analysis of Variance - [ ] Average Numerical Output of Values > **Explanation:** ANOVA stands for Analysis of Variance. It is used to determine if there are any statistically significant differences between the means of independent groups. ### One-way ANOVA is primarily used to compare: - [ ] Variance between two groups - [x] Variance among three or more groups - [ ] Variance within a single group - [ ] Variance in a dependent variable only > **Explanation:** One-way ANOVA is used to compare the means across three or more independent groups to find out if at least one of the group means is statistically different from the others. ### True or False: ANOVA can be used for correlated groups. - [ ] True - [x] False > **Explanation:** Standard one-way ANOVA assumes independent observations; correlated designs typically use repeated-measures ANOVA or mixed models. ### Which of the following assumptions is not required for ANOVA? - [ ] Homogeneity of variances - [ ] Independence of observations - [ ] Normal distribution - [x] Linear relationship between variables > **Explanation:** A linear relationship between variables is not one of the assumptions of ANOVA. ### When is the F-test relevant in the context of ANOVA? - [x] When comparing variances between groups - [ ] When analyzing a single sample - [ ] When performing a regression analysis - [ ] When assessing a linear trend > **Explanation:** The F-test is used within ANOVA to compare variances between groups to determine if there are significant differences. ### In one-way ANOVA, the null hypothesis is typically: - [x] All group means are equal - [ ] All group variances are equal - [ ] The dependent variable has no mean - [ ] There are exactly two groups > **Explanation:** The test asks whether mean differences are larger than expected from within-group noise. ### One-way ANOVA can be written as a regression with: - [x] Group indicator (dummy) variables - [ ] Only an intercept and no regressors - [ ] Only a time trend - [ ] Only lagged dependent variables > **Explanation:** ANOVA and regression are algebraically equivalent for many designs. ### If the homogeneity of variances assumption is strongly violated, a common alternative is: - [x] Welch’s ANOVA (or other heteroskedasticity-robust approaches) - [ ] Always use a coin flip - [ ] Drop all observations - [ ] Assume variances are equal anyway > **Explanation:** Welch’s ANOVA relaxes equal-variance assumptions. ### After rejecting equal means, a common next step is: - [x] Post-hoc multiple-comparison tests (e.g., Tukey) to see which groups differ - [ ] Conclude every pair differs by the same amount - [ ] Ignore the result and keep the null - [ ] Replace ANOVA with a CPI calculation > **Explanation:** Post-hoc tests control error rates when comparing many pairs. ### True or False: With only two groups, one-way ANOVA is closely related to a two-sample t-test. - [x] True - [ ] False > **Explanation:** For two groups, the ANOVA F-test and the squared t-test coincide under standard assumptions.