Pre-print v1.0.0

Regression to the Mean in Small-Area Health Estimates

Corey Schuman, MS

Abstract

Background: We tried to predict which communities would get healthier or sicker, and failed completely. Our best machine learning models achieved a macro F1 of 0.26 against a random baseline of 0.25. This paper explains why: the year-over-year "changes" in CDC PLACES data are mostly measurement noise, not real health trends.

Objective: To investigate why trajectory prediction fails for CDC PLACES small-area health estimates and determine whether the failure reflects measurement error or genuine unpredictability in community health dynamics.

Methods: We analyzed the autocorrelation structure of year-over-year changes in a Composite Health Burden Index (CHBI) across 72,161 census tracts. We stratified tracts by prior change magnitude to distinguish regression to the mean from true mean-reverting dynamics.

Results: Year-over-year changes showed strong negative autocorrelation (r = -0.22 to -0.58). Critically, this correlation scaled with prior change magnitude: near-zero (r = -0.05) for the smallest prior changes, strongly negative (r = -0.61) for the largest. This quintile gradient is consistent with regression to the mean. However, CHBI levels were highly persistent: current burden explained 99.7% of the variance in next-year burden, so community health burden itself is highly stable.

Conclusions: Annual trajectory labels ("improving," "declining") based on PLACES data are unreliable and should not be used for resource allocation. Focus on burden levels, not year-over-year changes. Use 3-year rolling averages when trend information is needed.

Keywords: small-area estimation, regression to the mean, health disparities, CDC PLACES, trajectory prediction, measurement error

What we tried to do

We built machine learning models to predict which neighborhoods would get healthier or sicker next year. If this worked, health departments could intervene before problems develop instead of reacting after.

What happened

Total failure. Our predictions were no better than flipping a coin. A neighborhood labeled "improving" one year was actually more likely to be labeled "declining" the next year than to keep improving.

Why it failed

The year-to-year "changes" in CDC health data are mostly measurement noise, not real health trends. It's like weighing yourself on a wobbly scale—the number bounces around, but your actual weight hasn't changed.

The good news

While year-to-year changes are noisy, the underlying health levels are 99.7% stable. High-burden communities stay high-burden. This is actionable: focus on where the burden IS, not where you think it's going.

Why We're Publishing This

We spent months building prediction models that don't work. Machine learning, gradient boosting, 47 features, careful cross-validation—and the result was indistinguishable from random guessing.

This matters because "early warning systems" are tempting. Health departments want to identify communities that will decline so they can intervene early. Funders want to predict which investments will pay off. The appeal is obvious.

But with CDC PLACES data, these systems cannot work as intended. We're publishing this so others don't waste resources chasing the same dead end—and so practitioners know to focus on burden levels rather than trajectory labels.

Introduction

The promise of predictive analytics in public health has generated substantial interest in developing "early warning systems" that could identify communities at risk of health decline before deterioration occurs. Such systems could theoretically enable proactive resource allocation, allowing health departments to intervene in communities predicted to decline rather than reacting after problems emerge.

We undertook a systematic effort to develop such a prediction system. Using five years of CDC PLACES data (2020-2024) covering 72,161 census tracts, we constructed a Composite Health Burden Index (CHBI) and attempted to predict which communities would improve, decline, or remain stable. We employed gradient-boosted decision trees with 47 features including prior trajectories, spatial context, and demographic covariates.

Despite this comprehensive approach, our best models achieved only F1=0.26—performance indistinguishable from random classification. Feature importance analysis revealed that prior trajectory features contributed negatively to model performance. Adding these features made predictions worse, not better.

This counterintuitive finding demanded explanation.

The Stakes of Trajectory Prediction

The practical stakes are considerable. If trajectory prediction were reliable, it could enable anticipatory intervention—deploying resources before health deterioration becomes entrenched.

However, unreliable trajectory prediction could cause harm. Labeling a community as "declining" based on noisy data could trigger unnecessary intervention while diverting resources from communities with genuine need. Labeling a community as "improving" could justify inaction when intervention is warranted.

Beyond resource allocation, trajectory labels affect community narratives. Being labeled a "declining" community may discourage investment, affect property values, or stigmatize residents—even if the label reflects measurement error rather than genuine health trends.

Methods

Data Source

We obtained CDC PLACES data for release years 2020-2024, which provide model-based small-area estimates of health outcomes for all U.S. census tracts. CDC PLACES uses multilevel regression and poststratification (MRP), combining Behavioral Risk Factor Surveillance System (BRFSS) survey data with American Community Survey demographics.

Composite Health Burden Index

We constructed a CHBI as the arithmetic mean of seven standardized PLACES measures: obesity, diabetes, coronary heart disease, high blood pressure, smoking, lack of insurance, and physical inactivity.
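A minimal sketch of this construction follows. It assumes that "standardized" means z-scoring each measure across tracts within a single release year, which the text does not specify. The measure keys and the toy prevalence values are hypothetical.

```python
from statistics import mean, pstdev

# The seven PLACES measures named in the text; these keys are hypothetical labels.
MEASURES = ["obesity", "diabetes", "chd", "high_bp",
            "smoking", "uninsured", "inactivity"]

def zscores(values):
    """Standardize values to mean 0 and SD 1 (population SD; an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

def compute_chbi(tracts):
    """tracts: list of dicts mapping measure name -> prevalence (%).
    Returns one CHBI per tract: the arithmetic mean of the seven z-scores."""
    columns = {m: zscores([t[m] for t in tracts]) for m in MEASURES}
    return [mean(columns[m][i] for m in MEASURES) for i in range(len(tracts))]

# Toy data for three hypothetical tracts (values invented for illustration).
toy = [
    {"obesity": 30, "diabetes": 10, "chd": 5, "high_bp": 30,
     "smoking": 15, "uninsured": 10, "inactivity": 25},
    {"obesity": 35, "diabetes": 12, "chd": 6, "high_bp": 33,
     "smoking": 18, "uninsured": 14, "inactivity": 28},
    {"obesity": 40, "diabetes": 14, "chd": 7, "high_bp": 36,
     "smoking": 21, "uninsured": 18, "inactivity": 31},
]
chbi = compute_chbi(toy)  # higher value = higher burden
```

Because every component is a z-score, the index is centered near zero and expressed in SD units, which is what makes SD-based thresholds on its changes meaningful.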

Trajectory Classification

We defined three classes based on year-over-year CHBI change:

Improving: CHBI decreased by >0.3 SD
Declining: CHBI increased by >0.3 SD
Stable: CHBI change within ±0.3 SD

Distinguishing RTM from True Dynamics

Under regression to the mean (RTM), negative autocorrelation should be stronger for extreme prior changes—because extreme observations are more likely to reflect measurement error. We stratified tracts into quintiles by prior change magnitude and computed autocorrelation within each quintile.
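The stratified analysis can be illustrated with a simulation. This sketch is not our analysis pipeline. It assumes a pure-noise model in which each tract has a fixed true level and every annual estimate adds independent measurement error. The noise SD of 0.1 and the sample size are arbitrary choices. Under this model the theoretical correlation between consecutive changes is -0.5, because each pair of adjacent changes shares one error term with opposite sign.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def quintile_autocorrelation(prior_change, next_change):
    """Rank tracts by |prior change|, split into five equal groups,
    and return the prior/next correlation within each group (Q1..Q5)."""
    order = sorted(range(len(prior_change)), key=lambda i: abs(prior_change[i]))
    size = len(order) // 5
    results = []
    for q in range(5):
        idx = order[q * size:(q + 1) * size] if q < 4 else order[4 * size:]
        results.append(pearson([prior_change[i] for i in idx],
                               [next_change[i] for i in idx]))
    return results

# Simulated tracts: stable true level plus independent noise in each of 3 years.
rng = random.Random(0)
n_tracts = 20000          # arbitrary
noise_sd = 0.1            # assumed measurement-error SD
levels = [rng.gauss(0, 1) for _ in range(n_tracts)]
obs = [[lvl + rng.gauss(0, noise_sd) for _ in range(3)] for lvl in levels]
prior = [o[1] - o[0] for o in obs]   # change from year 1 to year 2
nxt = [o[2] - o[1] for o in obs]     # change from year 2 to year 3

overall_r = pearson(prior, nxt)
by_quintile = quintile_autocorrelation(prior, nxt)
```

In this simulated setting no tract's true health changes at all, yet the within-quintile correlation is weakest in Q1 and strongest in Q5. That is the qualitative pattern the stratification is designed to detect.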

Results

Prediction Performance

Metric	Value	Random Baseline
Macro F1	0.26	0.25
Balanced Accuracy	0.33	0.33

Performance was indistinguishable from chance. Prior trajectory features had negative SHAP values—including them made predictions worse.
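For readers unfamiliar with these metrics, the sketch below computes macro F1 and balanced accuracy from scratch and applies them to uniformly random guesses. The class shares (70% stable, 15% improving, 15% declining) are invented for illustration and are not the shares in our data. The point is only that with imbalanced classes the chance-level macro F1 falls below 1/3, while chance-level balanced accuracy stays near 1/3.

```python
import random

def macro_f1_and_balanced_accuracy(y_true, y_pred, classes):
    """Return (macro F1, balanced accuracy) for a multi-class problem.
    Macro F1 is the unweighted mean of per-class F1 scores;
    balanced accuracy is the unweighted mean of per-class recall."""
    f1s, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return sum(f1s) / len(f1s), sum(recalls) / len(recalls)

classes = ["improving", "stable", "declining"]
# Hypothetical, imbalanced label distribution (not taken from the paper).
y_true = ["stable"] * 21000 + ["improving"] * 4500 + ["declining"] * 4500
rng = random.Random(0)
y_pred = [rng.choice(classes) for _ in y_true]   # uniform random guessing

macro_f1, balanced_acc = macro_f1_and_balanced_accuracy(y_true, y_pred, classes)
```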

The Quintile Gradient

Prior Change Quintile	Correlation (r)	Variance Explained
Q1 (smallest changes)	-0.05	0.3%
Q2	-0.16	2.4%
Q3	-0.29	8.1%
Q4	-0.42	17.7%
Q5 (largest changes)	-0.61	37.0%

This gradient is consistent with regression to the mean. Negative autocorrelation strengthens steadily from Q1 to Q5 and is strong only for the more extreme prior changes. Near-zero correlation for small changes indicates no genuine mean reversion among stable tracts.

Level Persistence

Despite unstable year-over-year changes, CHBI levels were highly persistent:

Current burden explained 99.7% of the variance in next-year burden (R² = 0.997).

High-burden communities remain high-burden. The "unpredictability" is confined to year-over-year noise, not to underlying health status.
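The coexistence of stable levels and unpredictable changes can be reproduced in a toy simulation. This sketch assumes fixed true tract levels with unit variance and independent annual measurement noise with an SD of 0.05. Both values are assumptions chosen for illustration, not estimates from PLACES.

```python
import random

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

rng = random.Random(1)
noise_sd = 0.05  # assumed measurement-error SD
true_level = [rng.gauss(0, 1) for _ in range(10000)]
year1 = [t + rng.gauss(0, noise_sd) for t in true_level]
year2 = [t + rng.gauss(0, noise_sd) for t in true_level]
year3 = [t + rng.gauss(0, noise_sd) for t in true_level]

# Levels: how well does this year's burden explain next year's burden?
level_r2 = r_squared(year1, year2)

# Changes: how well does last year's change explain this year's change?
change_12 = [b - a for a, b in zip(year1, year2)]
change_23 = [c - b for b, c in zip(year2, year3)]
change_r2 = r_squared(change_12, change_23)
```

Under these assumptions the level regression explains nearly all of the variance, while the change regression explains only about a quarter of it, and that quarter comes entirely from the negative correlation induced by shared measurement error.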

Discussion

Principal Findings

Trajectory prediction failed because year-over-year changes in PLACES estimates contain substantial measurement error. The quintile gradient—where extreme changes show strong reversion but small changes show none—is consistent with regression to the mean as the dominant mechanism.

Implications for Practice

Avoid trajectory labels. A tract classified as "improving" is more likely to be "declining" next year than to remain "improving."
Focus on levels, not changes. CHBI levels are 99.7% persistent. High-burden tracts remain high-burden—this is actionable.
Use 3-year rolling averages. Multi-year averages smooth measurement noise when trend information is needed.
Build monitoring systems, not early warning systems. Trajectory prediction from annual PLACES changes did not work in our analysis; responsive monitoring can.
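The rolling-average recommendation above can be sketched as follows. The sketch assumes a trailing window (the current year and the two before it); a centered window would be an equally reasonable choice. The example series is hypothetical.

```python
def rolling_mean(series, window=3):
    """Trailing rolling mean. Positions without a full window are None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

# Hypothetical CHBI values for one tract across five release years.
chbi_by_year = [1.0, 1.4, 0.9, 1.3, 1.0]
smoothed = rolling_mean(chbi_by_year)
# The raw series swings by up to 0.5 between years;
# the smoothed series moves by at most about 0.13.
```

With five release years, a three-year window yields only three smoothed values per tract, so any trend read from it remains coarse.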

What This Means for CDC PLACES

These findings do not impugn CDC PLACES—they identify appropriate and inappropriate uses. PLACES excels at cross-sectional snapshots: identifying high-burden areas for targeting. The limitation concerns longitudinal trajectory analysis, where model-based estimation may introduce systematic artifacts.

Conclusions

The failure of CDC PLACES-based trajectory prediction (F1=0.26) is consistent with regression to the mean in small-area health estimates. Annual trajectory labels are unreliable and should not be used for resource allocation.

Specific recommendations:

Remove trajectory labels from Community Health Improvement Plans and dashboards
Focus resource allocation on current burden levels, which are stable and reliable
Use three-year rolling averages when trend information is needed
Target communities with persistently high burden (3+ years above threshold)
Build responsive monitoring systems rather than predictive early warning systems
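One way to operationalize "persistently high burden" is sketched below. It interprets "3+ years above threshold" as three or more consecutive years, which is an assumption; counting any three years would be an alternative reading. The tract identifiers, CHBI values, and the threshold of 1.0 are hypothetical.

```python
def longest_run_above(values, threshold):
    """Length of the longest run of consecutive values above threshold."""
    longest = current = 0
    for v in values:
        current = current + 1 if v > threshold else 0
        longest = max(longest, current)
    return longest

def persistently_high(values, threshold, min_years=3):
    """True if burden exceeded threshold for min_years consecutive years."""
    return longest_run_above(values, threshold) >= min_years

# Hypothetical annual CHBI values for three tracts.
tracts = {
    "A": [1.2, 1.3, 1.1, 1.4, 1.2],   # above threshold every year
    "B": [1.2, 0.8, 1.3, 0.9, 1.1],   # crosses the threshold back and forth
    "C": [0.2, 1.1, 1.2, 1.3, 0.4],   # three consecutive years above
}
flagged = sorted(t for t, v in tracts.items() if persistently_high(v, threshold=1.0))
```

Tract B illustrates why a persistence rule is more conservative than annual labels: it exceeds the threshold in three separate years but never stays there.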

We close by emphasizing what our findings do not mean. Community health is not chaotic—levels are 99.7% stable. The challenge is that annual changes in PLACES estimates are too noisy to support trajectory prediction. Better measurement could enable better prediction; in the interim, focusing on burden levels represents the most defensible use of available data.