Regression to the Mean in Small-Area Health Estimates
Abstract
Why We're Publishing This
We spent months building prediction models that don't work. Machine learning, gradient boosting, 47 features, careful cross-validation—and the result was indistinguishable from random guessing.
This matters because "early warning systems" are tempting. Health departments want to identify communities that will decline so they can intervene early. Funders want to predict which investments will pay off. The appeal is obvious.
But with CDC PLACES data, these systems cannot work as intended. We're publishing this so others don't waste resources chasing the same dead end—and so practitioners know to focus on burden levels rather than trajectory labels.
Introduction
The promise of predictive analytics in public health has generated substantial interest in developing "early warning systems" that could identify communities at risk of health decline before deterioration occurs. Such systems could theoretically enable proactive resource allocation, allowing health departments to intervene in communities predicted to decline rather than reacting after problems emerge.
We undertook a systematic effort to develop such a prediction system. Using five years of CDC PLACES data (2020-2024) covering 72,161 census tracts, we constructed a Composite Health Burden Index (CHBI) and attempted to predict which communities would improve, decline, or remain stable. We employed gradient-boosted decision trees with 47 features including prior trajectories, spatial context, and demographic covariates.
Despite this comprehensive approach, our best models achieved a macro F1 of only 0.26, indistinguishable from the random baseline of 0.25. Feature importance analysis revealed that prior trajectory features contributed negatively to model performance: adding these features made predictions worse, not better.
This counterintuitive finding demanded explanation.
The Stakes of Trajectory Prediction
The practical stakes are considerable. If trajectory prediction were reliable, it could enable anticipatory intervention—deploying resources before health deterioration becomes entrenched.
However, unreliable trajectory prediction could cause harm. Labeling a community as "declining" based on noisy data could trigger unnecessary intervention while diverting resources from communities with genuine need. Labeling a community as "improving" could justify inaction when intervention is warranted.
Beyond resource allocation, trajectory labels affect community narratives. Being labeled a "declining" community may discourage investment, affect property values, or stigmatize residents—even if the label reflects measurement error rather than genuine health trends.
Methods
Data Source
We obtained CDC PLACES data for release years 2020-2024, which provide model-based small-area estimates of health outcomes for all U.S. census tracts. CDC PLACES uses multilevel regression and poststratification (MRP), combining BRFSS survey data with American Community Survey demographics.
Composite Health Burden Index
We constructed a CHBI as the arithmetic mean of seven standardized PLACES measures: obesity, diabetes, coronary heart disease, high blood pressure, smoking, lack of insurance, and physical inactivity.
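As a rough illustration of the construction described above, the following sketch standardizes each measure across tracts and averages the resulting z-scores. The measure keys in `MEASURES`, the use of the population standard deviation, and the choice to standardize within a single release year are assumptions made for this example; the toy values are invented and are not PLACES estimates.

```python
from statistics import mean, pstdev

# Hypothetical keys for the seven measures; real PLACES identifiers may differ.
MEASURES = ["obesity", "diabetes", "chd", "high_bp",
            "smoking", "uninsured", "inactivity"]

def zscores(values):
    """Standardize a list of numbers to mean 0 and SD 1 (population SD)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def chbi(tracts):
    """Return one index value per tract: the mean of its standardized measures.

    `tracts` is a list of dicts mapping each name in MEASURES to a prevalence.
    """
    standardized = {m: zscores([t[m] for t in tracts]) for m in MEASURES}
    return [mean(standardized[m][i] for m in MEASURES)
            for i in range(len(tracts))]

# Invented values for three tracts, for illustration only.
toy = [
    {"obesity": 30, "diabetes": 9, "chd": 5, "high_bp": 28,
     "smoking": 14, "uninsured": 8, "inactivity": 20},
    {"obesity": 38, "diabetes": 14, "chd": 8, "high_bp": 36,
     "smoking": 22, "uninsured": 16, "inactivity": 30},
    {"obesity": 34, "diabetes": 11, "chd": 6, "high_bp": 31,
     "smoking": 17, "uninsured": 11, "inactivity": 24},
]
scores = chbi(toy)
```

Because every measure is centered before averaging, the index values sum to zero across tracts, and a higher value indicates a higher composite burden.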
Trajectory Classification
We assigned each tract to one of three classes based on its year-over-year CHBI change:
- Improving: CHBI decreased by >0.3 SD
- Declining: CHBI increased by >0.3 SD
- Stable: CHBI change within ±0.3 SD
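The three-way rule above can be sketched as a small function. It assumes the change is already expressed in the SD units used for the thresholds, and the name `classify_change` is hypothetical.

```python
def classify_change(delta, threshold=0.3):
    """Label a year-over-year CHBI change given in SD units.

    A lower CHBI means lower burden, so a decrease counts as improvement.
    Changes of exactly +/- threshold fall in the stable class.
    """
    if delta < -threshold:
        return "improving"
    if delta > threshold:
        return "declining"
    return "stable"

# Illustrative changes, not real data.
labels = [classify_change(d) for d in (-0.45, 0.10, 0.31, 0.30)]
```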
Distinguishing RTM from True Dynamics
Under regression to the mean (RTM), negative autocorrelation should be stronger for extreme prior changes, because extreme observations are more likely to reflect measurement error. We stratified tracts into quintiles by the magnitude of their prior year-over-year change and, within each quintile, computed the correlation between the prior change and the subsequent change.
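To make the stratified test concrete, here is a minimal simulation sketch, not the analysis code used for this study. It assumes a deliberately simple world in which every tract has a fixed true burden and each year's estimate adds independent noise. The noise SD of 0.05, the sample size, and the function names are all illustrative assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def quintile_autocorr(prior, nxt):
    """Correlate prior and next change within quintiles of |prior change|."""
    pairs = sorted(zip(prior, nxt), key=lambda p: abs(p[0]))
    size = len(pairs) // 5
    result = []
    for q in range(5):
        end = (q + 1) * size if q < 4 else len(pairs)
        chunk = pairs[q * size:end]
        result.append(pearson([p[0] for p in chunk], [p[1] for p in chunk]))
    return result

# Simulated tracts: fixed true burden plus independent noise in each year.
rng = random.Random(0)
prior, nxt = [], []
for _ in range(20000):
    level = rng.gauss(0, 1)
    y1, y2, y3 = (level + rng.gauss(0, 0.05) for _ in range(3))
    prior.append(y2 - y1)
    nxt.append(y3 - y2)

overall_r = pearson(prior, nxt)            # theory: -0.5 under pure noise
by_quintile = quintile_autocorr(prior, nxt)  # ordered Q1 (smallest) to Q5
```

In this simulated setting, where no tract's true burden changes at all, the within-quintile correlation still strengthens from Q1 to Q5. That is the pattern the stratified test looks for.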
Results
Prediction Performance
| Metric | Value | Random Baseline |
|---|---|---|
| Macro F1 | 0.26 | 0.25 |
| Balanced Accuracy | 0.33 | 0.33 |
Performance was indistinguishable from chance. Prior trajectory features had negative SHAP values, and including them made predictions worse.
The Quintile Gradient
| Prior Change Quintile | Correlation (r) | Variance Explained |
|---|---|---|
| Q1 (smallest changes) | -0.05 | 0.3% |
| Q2 | -0.16 | 2.4% |
| Q3 | -0.29 | 8.1% |
| Q4 | -0.42 | 17.7% |
| Q5 (largest changes) | -0.61 | 37.0% |
This gradient is the signature of regression to the mean. Only extreme prior changes show strong negative autocorrelation. Near-zero correlation for small changes indicates no genuine mean reversion among stable tracts.
Level Persistence
Despite unstable year-over-year changes, CHBI levels were highly persistent:
Current burden explained 99.7% of the variance in next-year burden (R² = 99.7%).
High-burden communities remain high-burden. The "unpredictability" is confined to year-over-year noise, not to underlying health status.
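A stylized model shows how persistent levels and unpredictable changes can coexist. It is an illustrative assumption, not a model fitted in this study: each tract $i$ has a fixed true burden $\mu_i$, and each year's estimate adds independent noise $\varepsilon_{i,t}$.

```latex
\begin{align*}
y_{i,t} &= \mu_i + \varepsilon_{i,t},
  \qquad \operatorname{Var}(\mu_i) = \sigma_\mu^2,
  \quad \operatorname{Var}(\varepsilon_{i,t}) = \sigma_\varepsilon^2 \\
\operatorname{Corr}(y_{i,t},\, y_{i,t+1})
  &= \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\varepsilon^2} \\
\Delta_{i,t} &= y_{i,t+1} - y_{i,t} = \varepsilon_{i,t+1} - \varepsilon_{i,t} \\
\operatorname{Corr}(\Delta_{i,t},\, \Delta_{i,t+1})
  &= \frac{-\sigma_\varepsilon^2}{2\sigma_\varepsilon^2} = -\frac{1}{2}
\end{align*}
```

Under these assumptions, an R² of 0.997 between consecutive levels corresponds to a level correlation of about 0.9985, meaning noise accounts for only about 0.15% of total variance. The changes nevertheless consist entirely of noise and are negatively autocorrelated regardless of how small that noise is.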
Discussion
Principal Findings
Trajectory prediction failed because year-over-year changes in PLACES estimates contain substantial measurement error. The quintile gradient—where extreme changes show strong reversion but small changes show none—is consistent with regression to the mean as the dominant mechanism.
Implications for Practice
- Avoid trajectory labels. A tract classified as "improving" is more likely to be "declining" next year than to remain "improving."
- Focus on levels, not changes. Current CHBI explains 99.7% of the variance in next-year CHBI. High-burden tracts remain high-burden, and this is actionable.
- Use 3-year rolling averages. Multi-year averages smooth measurement noise when trend information is needed.
- Build monitoring systems, not early warning systems. Prediction cannot work; responsive monitoring can.
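The rolling-average recommendation above can be sketched as a trailing moving average. The function name and the example series are invented for illustration. If year-to-year noise is independent, averaging three years reduces the noise variance by a factor of three.

```python
def rolling_mean(values, window=3):
    """Trailing moving average; returns one value per complete window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Invented CHBI series for one tract across five annual releases.
series = [0.80, 1.10, 0.70, 1.00, 0.90]
smoothed = rolling_mean(series)  # three values, one per 3-year window
```

With five annual releases, a three-year window yields only three smoothed points, so any trend read from them should still be treated cautiously.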
What This Means for CDC PLACES
These findings do not impugn CDC PLACES—they identify appropriate and inappropriate uses. PLACES excels at cross-sectional snapshots: identifying high-burden areas for targeting. The limitation concerns longitudinal trajectory analysis, where model-based estimation may introduce systematic artifacts.
Conclusions
The failure of CDC PLACES-based trajectory prediction (macro F1 = 0.26) is consistent with regression to the mean in small-area health estimates. Annual trajectory labels are unreliable and should not be used for resource allocation.
Specific recommendations:
- Remove trajectory labels from Community Health Improvement Plans and dashboards
- Focus resource allocation on current burden levels, which are stable and reliable
- Use three-year rolling averages when trend information is needed
- Target communities with persistently high burden (3+ years above threshold)
- Build responsive monitoring systems rather than predictive early warning systems
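One way to operationalize the "persistently high burden" recommendation is sketched below. It interprets "3+ years above threshold" as three or more consecutive years, which is an assumption; the threshold value, tract identifiers, and function name are likewise hypothetical.

```python
def persistently_high(series, threshold, min_years=3):
    """True if `series` has at least `min_years` consecutive values above threshold."""
    run = 0
    for value in series:
        run = run + 1 if value > threshold else 0
        if run >= min_years:
            return True
    return False

# Invented annual CHBI values for two tracts.
tracts = {
    "A": [1.2, 1.3, 1.1, 0.4, 0.2],  # three consecutive years above 1.0
    "B": [1.2, 0.4, 1.3, 0.2, 1.1],  # three years above 1.0, never consecutive
}
flagged = [t for t, s in tracts.items() if persistently_high(s, threshold=1.0)]
```

Counting total rather than consecutive years would be a looser criterion and would also flag tract B in this example.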
We close by emphasizing what our findings do not mean. Community health is not chaotic: current burden levels explain 99.7% of the variance in next-year levels. The challenge is that annual changes in PLACES estimates are too noisy to support trajectory prediction. Better measurement could enable better prediction; in the interim, focusing on burden levels represents the most defensible use of available data.