Bootstrap

A resampling technique that approximates a statistic’s sampling distribution by repeatedly sampling with replacement from observed data.

Background

Introduced by Bradley Efron in 1979, the bootstrap made it practical to estimate sampling distributions without heavy parametric assumptions. Modern computing allows thousands of resamples to approximate confidence intervals and standard errors even for complex estimators.

Definitions and Concepts

  • Bootstrap sample: A sample of size *n* drawn with replacement from the original dataset of size *n*.
  • Bootstrap replication: The statistic (e.g., mean, median, regression coefficient) computed on a bootstrap sample.
  • Bootstrap distribution: The empirical distribution of many replications, used to estimate standard errors and confidence intervals.
  • Percentile interval: Interval between chosen percentiles (e.g., 2.5th and 97.5th) of the bootstrap distribution.
```mermaid
flowchart LR
  data["Original data (n observations)"]
  resample["Resample with replacement<br/>many times"]
  stat["Compute statistic<br/>on each resample"]
  dist["Bootstrap distribution<br/>of statistic"]
  ci["Standard errors & CIs"]
  data --> resample --> stat --> dist --> ci
```
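The percentile interval defined above can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not a library API; the function and variable names are hypothetical, and the seed is fixed only for reproducibility.

```python
import random

def percentile_interval(data, stat, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: sort the bootstrap replications and
    take the alpha/2 and 1 - alpha/2 quantiles as interval endpoints."""
    rng = random.Random(seed)
    n = len(data)
    # One replication = the statistic computed on one resample of size n.
    reps = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

sample = [2.1, 2.4, 2.9, 3.3, 3.7, 4.0, 4.4, 5.1]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = percentile_interval(sample, mean)  # 95% percentile interval
```

With `alpha=0.05` the endpoints are the 2.5th and 97.5th percentiles of the bootstrap distribution, exactly as the definition states.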

Practical Steps

  1. Draw a bootstrap sample of size *n* with replacement from the observed data.
  2. Compute the statistic of interest.
  3. Repeat steps 1–2 many times (e.g., 1,000–10,000).
  4. Use the empirical distribution of the replications to estimate standard errors, bias, and intervals.
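The four steps above can be sketched as a single helper that returns a bootstrap standard error and bias estimate. The names and the choice of 2,000 replications are illustrative assumptions, not part of any standard API.

```python
import random
import statistics

def bootstrap_summary(data, stat, n_boot=2000, seed=1):
    """Steps 1-4: resample with replacement, recompute the statistic on
    each resample, and summarize the replications."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)]
    se = statistics.stdev(reps)                 # bootstrap standard error
    bias = statistics.fmean(reps) - stat(data)  # bootstrap bias estimate
    return se, bias

data = [5.0, 6.2, 7.1, 7.9, 8.4, 9.0, 9.8, 11.3]
se, bias = bootstrap_summary(data, statistics.median)
```

The same loop works for any statistic — swap `statistics.median` for a mean, a correlation, or a regression coefficient without changing the resampling logic.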

Notes and Limitations

  • Works well when the sample represents the population and when observations are i.i.d.
  • Time-series or clustered data require adaptations (block bootstrap, cluster bootstrap).
  • Heavy computation can be a constraint with very large datasets, though modern hardware mitigates this.

Related Terms

  • Jackknife: Resampling by systematically leaving out observations.
  • Monte Carlo simulation: Random sampling to study models or estimators, often from known distributions.
  • Bias correction: Adjusting estimates using the bootstrap-estimated bias.
  • Confidence interval: Range expected to contain the true parameter with a stated probability.
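The block bootstrap mentioned under the limitations above can be sketched as follows. This is a minimal moving-block variant; the block length of 3 is an arbitrary illustrative choice (in practice it is tuned to the data's dependence structure), and the function name is hypothetical.

```python
import random

def moving_block_bootstrap(series, block_len=3, seed=0):
    """One moving-block bootstrap resample: draw overlapping blocks with
    replacement and concatenate them, so short-range time dependence is
    preserved within each block."""
    rng = random.Random(seed)
    n = len(series)
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    out = []
    while len(out) < n:
        out.extend(rng.choice(blocks))
    return out[:n]  # truncate to the original length

ts = [1.0, 1.3, 0.9, 1.6, 2.0, 1.8, 2.3, 2.1, 2.6, 2.4]
resample = moving_block_bootstrap(ts)
```

Repeating this resampling step and recomputing the statistic each time proceeds exactly as in the i.i.d. case; only the resampling unit changes from single observations to blocks.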

Quiz

1. The bootstrap resamples:
   - [x] With replacement from the observed data
   - [ ] Without replacement
   - [ ] From a theoretical distribution only
   - [ ] Only residuals
   > **Explanation:** Standard bootstrap draws with replacement from the original sample.

2. A bootstrap replication is:
   - [x] The statistic computed on one resample
   - [ ] The original sample size
   - [ ] A parametric assumption
   - [ ] A hypothesis test
   > **Explanation:** Each resample yields one replication of the statistic.

3. The bootstrap distribution is used to estimate:
   - [x] Standard errors and confidence intervals
   - [ ] Tax rates
   - [ ] Exchange rates
   - [ ] Inventory turnover
   > **Explanation:** It approximates the sampling distribution of the statistic.

4. The percentile method takes:
   - [x] Selected percentiles of the bootstrap distribution as interval endpoints
   - [ ] Only the mean of the bootstrap distribution
   - [ ] Only the maximum
   - [ ] Only the minimum
   > **Explanation:** Percentiles define interval bounds (e.g., 2.5th and 97.5th).

5. Which assumption is typical for the simple bootstrap?
   - [x] Observations are independent and identically distributed
   - [ ] Data are perfectly normal
   - [ ] Samples must be infinite
   - [ ] Parameters are known
   > **Explanation:** i.i.d. data underpin the classic bootstrap; variants handle dependence.

6. What is a block bootstrap used for?
   - [x] Dependent data such as time series
   - [ ] Perfectly independent samples
   - [ ] Cross-sectional surveys only
   - [ ] Parametric simulations
   > **Explanation:** Blocks preserve dependence structures when resampling.

7. Bootstrap bias is:
   - [x] The difference between the average bootstrap statistic and the original estimate
   - [ ] Always zero
   - [ ] Set by the sample mean
   - [ ] The same as variance
   > **Explanation:** Bias estimates how far the estimator tends to drift from the original sample estimate.

8. Increasing the number of bootstrap replications generally:
   - [x] Yields more stable interval estimates
   - [ ] Increases estimator bias
   - [ ] Makes the method invalid
   - [ ] Eliminates computation time
   > **Explanation:** More resamples reduce Monte Carlo error in estimated intervals.

9. A key advantage of the bootstrap is:
   - [x] Minimal reliance on parametric distributional assumptions
   - [ ] It replaces all data collection
   - [ ] It ensures exact answers for small samples
   - [ ] It never requires computation
   > **Explanation:** The bootstrap is non-parametric and flexible across many estimators.

10. In regression, bootstrapping residuals can:
    - [x] Preserve the design matrix while resampling model errors
    - [ ] Eliminate heteroskedasticity automatically
    - [ ] Guarantee normality
    - [ ] Remove the need for diagnostics
    > **Explanation:** Residual bootstrap keeps predictors fixed and resamples errors to assess estimator variability.