Spatial Synchrony, Not Contagion: A Methodological Correction in Community Health Trajectory Prediction
Abstract
Why We're Publishing This
We got it wrong, and that matters. Our preliminary analysis suggested a dramatic finding: that health changes "spread" across community boundaries with neighbor trajectories being 3.8x more predictive than a community's own history. This would have been a paradigm shift in public health intervention design.
It was wrong. What we found was an artifact of a methodological error called temporal data leakage: we accidentally built predictive features from the same year we were trying to predict. When we fixed the error, the "spatial contagion" effect vanished completely.
We're publishing this correction prominently because:
- Science requires transparency about errors. Burying mistakes creates a misleading scientific record.
- This error is easy to make. Others working with spatial health data may benefit from seeing how temporal leakage manifests.
- The negative finding matters for policy. Interventions designed around "spillover effects" would waste resources if those effects don't exist.
- Trust requires honesty. We want policymakers to trust our findings—and that means being upfront about what we got wrong.
1. Introduction
1.1 The Spatial Contagion Hypothesis
A compelling hypothesis in spatial epidemiology posits that health changes "spread" across community boundaries—that a community's health trajectory is influenced not just by its own characteristics but by the trajectories of its neighbors. This "spatial contagion" framing draws on social network research demonstrating the spread of behaviors through connected individuals and neighborhood effects theory suggesting that place-based exposures propagate across geographic boundaries.
Initial analyses of CDC PLACES tract-level health data appeared to support this hypothesis dramatically. Preliminary models suggested that the average health trajectory of neighboring census tracts was 3.8 times more predictive of a focal tract's future trajectory than the tract's own historical trend—a finding that, if valid, would represent a paradigm shift in how we conceptualize community health interventions.
1.2 The Problem of Temporal Leakage
However, this striking finding warranted methodological scrutiny. A critical question emerged: When computing "neighbor trajectory," what time period is used?
In predictive modeling, features must be constructed using only information available before the outcome period. If predicting health trajectory from year T-1 to year T, legitimate predictive features can only use data through year T-1 (years T-1, T-2, and earlier). Using data from year T in feature construction constitutes temporal data leakage: the model appears predictive because it "sees" contemporaneous information, not because it captures genuine predictive signal.
This distinction matters enormously for the spatial contagion hypothesis:
- Contemporaneous correlation: Neighbors change together at the same time (spatial synchrony)
- Predictive contagion: Prior neighbor change predicts future focal change
Only the second supports policy interventions targeting spatial spillovers.
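The admissibility rule above can be sketched as a simple guard. This is a minimal illustration, not the authors' code: `FeatureSpec` and `leaked_features` are hypothetical names, and the feature names are borrowed from Section 2.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    latest_year_used: int  # most recent data year that enters the feature


def leaked_features(features, outcome_year):
    """Return names of features that use data from the outcome year or later."""
    return [f.name for f in features if f.latest_year_used >= outcome_year]


T = 2024  # outcome: change from T-1 to T
contemporaneous = [
    FeatureSpec("neighbor_avg_change", T),
    FeatureSpec("neighbor_avg_chbi", T),
]
lagged = [
    FeatureSpec("neighbor_avg_change", T - 1),
    FeatureSpec("neighbor_avg_chbi", T - 1),
]

flagged_contemporaneous = leaked_features(contemporaneous, T)  # both features flagged
flagged_lagged = leaked_features(lagged, T)                    # nothing flagged
```

Recording the latest data year alongside each feature makes this check cheap to run before any model is trained.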
2. Methods
2.1 Data Source and Sample
We analyzed CDC PLACES tract-level health estimates from 2020-2026 releases, harmonized to a consistent panel structure. The analytic sample comprised 189,566 tract-year observations from 72,161 unique census tracts across 51 states and territories, with prediction years 2022, 2023, and 2024.
2.2 Outcome Variable
We constructed a Composite Health Burden Index (CHBI) as a weighted average of seven PLACES measures: obesity (20%), diabetes (20%), coronary heart disease (15%), mental health days (15%), hypertension (10%), physical inactivity (10%), and physical health days (10%). Tracts were classified as DECLINE (CHBI increase of more than 0.3 SD), IMPROVE (CHBI decrease of more than 0.3 SD), or STABLE (otherwise).
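A minimal sketch of the index and the three-way labeling follows. The weights come from the text; everything else is an assumption for illustration: the dictionary keys are not PLACES column names, the measures are assumed to be on a common scale already, the SD is assumed to be that of the CHBI change distribution, and IMPROVE is read as a decrease of more than 0.3 SD.

```python
# Weights as stated in Section 2.2; keys are illustrative, not PLACES column names.
CHBI_WEIGHTS = {
    "obesity": 0.20,
    "diabetes": 0.20,
    "coronary_heart_disease": 0.15,
    "mental_health_days": 0.15,
    "hypertension": 0.10,
    "physical_inactivity": 0.10,
    "physical_health_days": 0.10,
}


def chbi(measures):
    """Weighted average of the seven measures (assumed already on a common scale)."""
    return sum(CHBI_WEIGHTS[k] * measures[k] for k in CHBI_WEIGHTS)


def classify(chbi_change, sd, threshold=0.3):
    """Label a one-year CHBI change; a higher CHBI means more health burden."""
    if chbi_change > threshold * sd:
        return "DECLINE"
    if chbi_change < -threshold * sd:
        return "IMPROVE"
    return "STABLE"


# The stated weights should sum to 1.
assert abs(sum(CHBI_WEIGHTS.values()) - 1.0) < 1e-9
```

For example, with `sd=1.0`, a change of `0.5` is labeled DECLINE, `-0.5` is IMPROVE, and `0.1` is STABLE.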
2.3 Spatial Feature Construction
We built a Queen contiguity neighbor graph from Census TIGER/Line tract geometries (mean: 6.2 neighbors per tract). The critical methodological comparison involved two specifications:
Specification A: Contemporaneous (Leaked)
- neighbor_avg_change: Mean neighbor CHBI change from year T-1 to year T
- neighbor_avg_chbi: Mean neighbor CHBI in year T
Problem: Uses year T data to predict year T outcome
Specification B: Properly Lagged
- neighbor_avg_change: Mean neighbor CHBI change from year T-2 to year T-1
- neighbor_avg_chbi: Mean neighbor CHBI in year T-1
Correct: Uses only pre-outcome information
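The two specifications differ only in where the feature window ends. The sketch below makes that explicit with plain dictionaries; the function name, data layout, and toy values are assumptions for illustration, and a real pipeline would draw the adjacency from the Queen contiguity graph.

```python
from statistics import mean


def neighbor_features(chbi_by_year, neighbors, tract, year_t, lagged=True):
    """Mean neighbor CHBI level and one-year change for a focal tract.

    chbi_by_year: {(tract_id, year): CHBI value}
    neighbors:    {tract_id: [neighbor tract_ids]}
    lagged=True   -> Specification B (window ends at T-1)
    lagged=False  -> Specification A (window ends at T, leaking the outcome year)
    """
    end = year_t - 1 if lagged else year_t
    ids = neighbors[tract]
    return {
        "neighbor_avg_change": mean(
            chbi_by_year[(n, end)] - chbi_by_year[(n, end - 1)] for n in ids
        ),
        "neighbor_avg_chbi": mean(chbi_by_year[(n, end)] for n in ids),
    }


# Toy data (made-up values): tract "A" has neighbors "B" and "C".
chbi_by_year = {
    ("B", 2022): 1.0, ("B", 2023): 1.2, ("B", 2024): 1.8,
    ("C", 2022): 2.0, ("C", 2023): 2.2, ("C", 2024): 2.4,
}
neighbors = {"A": ["B", "C"]}

spec_a = neighbor_features(chbi_by_year, neighbors, "A", 2024, lagged=False)
spec_b = neighbor_features(chbi_by_year, neighbors, "A", 2024, lagged=True)
```

In this toy case Specification A picks up the 2023 to 2024 jump in tract "B", which is exactly the period the outcome measures, while Specification B sees only the 2022 to 2023 change.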
2.4 Model Training and Evaluation
We trained XGBoost classifiers with temporal cross-validation (train: 2022-2023; test: 2024). To assess feature importance, we used:
- Permutation importance on the holdout test set (10 repeats, 95% bootstrap CIs)
- Ablation experiments: Model performance with (a) all features, (b) no spatial features, (c) spatial features only
- Linear regression baseline comparing R² with and without spatial features
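Permutation importance, the first of these diagnostics, can be sketched in a few lines. This is an illustration under stated assumptions rather than the study's pipeline: the model is a stand-in function instead of a fitted XGBoost classifier, the score is plain accuracy, the data are synthetic, and the bootstrap CI step is omitted.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, col, n_repeats=10, seed=0):
    """Mean drop in accuracy when column `col` is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


# Synthetic holdout set: the label depends only on column 0; column 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(500)]
y = [int(row[0] > 0.5) for row in X]


def model(row):
    """Stand-in for a fitted classifier."""
    return int(row[0] > 0.5)


imp_signal = permutation_importance(model, X, y, col=0)  # large drop
imp_noise = permutation_importance(model, X, y, col=1)   # no drop
```

Because the importance is measured on held-out data, a feature that merely restates the outcome period will still score highly, which is why this diagnostic alone cannot rule out leakage.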
3. Results
3.1 The Temporal Leakage Effect
Table 1. Permutation Importance on Holdout Data (2024)
| Feature | Contemporaneous (Leaked) | Properly Lagged |
|---|---|---|
| neighbor_avg_change | 0.0390 [0.037, 0.042] | 0.0010 [0.0009, 0.0012] |
| CHBI_change_1yr | 0.0023 [0.002, 0.003] | 0.0009 [0.0006, 0.0013] |
| Ratio | 16.7x | 1.12x |
| CIs Overlap? | No | Yes |
With contemporaneous features, neighbor_avg_change appeared 16.7 times more important than own-tract change, with non-overlapping confidence intervals suggesting statistical significance. After correction, the ratio dropped to 1.12x with overlapping CIs, indicating no significant difference.
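The overlap comparison in Table 1 can be reproduced directly from the reported intervals. The helper name is hypothetical, and CI overlap is only a conservative heuristic, not a formal hypothesis test.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% CIs copied from Table 1: (neighbor_avg_change, CHBI_change_1yr)
leaked = ((0.037, 0.042), (0.002, 0.003))
lagged = ((0.0009, 0.0012), (0.0006, 0.0013))

leaked_overlap = intervals_overlap(*leaked)  # False
lagged_overlap = intervals_overlap(*lagged)  # True
```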
3.2 Ablation Experiments
Table 2. Ablation Experiment Results (Macro-F1 with 95% Bootstrap CIs)
| Model | Contemporaneous | Properly Lagged [95% CI] |
|---|---|---|
| Full model (all features) | 0.320 | 0.260 [0.259, 0.261] |
| No spatial features | 0.261 | 0.261 [0.260, 0.263] |
| Spatial features only | 0.415 | 0.259 [0.258, 0.260] |
| Spatial contribution | +18.2% | -0.4% |
With leaked features, the spatial-only model (F1=0.415) dramatically outperformed the full model (F1=0.320), even though the full model includes the same spatial features. A feature subset beating its superset by this margin is a red flag for data leakage rather than a plausible result. After correction, the confidence intervals for all three model specifications overlap, indicating no statistically significant difference.
3.3 Spatial Autocorrelation
Table 3. Global Moran's I with Significance Testing
| Variable | Moran's I | Z-score | p-value | Interpretation |
|---|---|---|---|---|
| CHBI levels | 0.757 | 254.0 | <0.001 | Very strong positive autocorrelation |
| Trajectory outcomes | 0.211 | 69.7 | <0.001 | Moderate clustering |
Both statistics are highly significant (p<0.001), confirming strong spatial structure in health outcomes. Communities near each other have similar health burdens (I=0.76) and somewhat similar trajectory outcomes (I=0.21). However, this spatial clustering does not translate to predictive contagion—knowing neighbor levels does not help predict focal changes.
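For readers unfamiliar with the statistic, global Moran's I is I = (n / S0) * sum_ij w_ij (x_i - x̄)(x_j - x̄) / sum_i (x_i - x̄)^2, where S0 is the sum of all weights. A minimal sketch with binary contiguity weights follows; the weighting scheme and the toy values are assumptions, and the significance test behind the Z-scores in Table 3 is not reproduced here.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    values:    {unit_id: value}
    neighbors: {unit_id: [neighbor unit_ids]} (symmetric adjacency)
    """
    n = len(values)
    mean_value = sum(values.values()) / n
    dev = {k: v - mean_value for k, v in values.items()}
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    denom = sum(d * d for d in dev.values())
    return (n / s0) * cross / denom


# Four tracts in a row: a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

clustered = morans_i({"a": 1, "b": 1, "c": 5, "d": 5}, chain)    # similar values adjacent
alternating = morans_i({"a": 1, "b": 5, "c": 1, "d": 5}, chain)  # dissimilar values adjacent
```

The clustered layout yields a positive I and the alternating layout a negative one. Note that the statistic describes values at a single point in time, so a high I says nothing about whether one tract's past predicts another's future.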
4. Discussion
4.1 Principal Findings
Our central finding is methodological: the apparent "spatial contagion" in community health trajectories was an artifact of temporal data leakage.
When neighbor trajectory features were computed contemporaneously with the outcome period (year T-1 to T), they appeared 16.7 times more predictive than a community's own historical trend. After correcting to properly lagged features (year T-2 to T-1), this advantage vanished entirely.
This represents spatial synchrony, not spatial contagion. Communities near each other experience health changes at the same time—likely due to shared exposures, economic shocks, policy changes, or healthcare system factors—but prior neighbor trajectories do not predict future focal trajectories.
4.2 Why This Matters
The distinction between synchrony and contagion has critical policy implications:
If contagion were real:
- Interventions in one community could spill over to neighbors
- "Buffer zone" investments around improving areas would be justified
- Regional intervention units would be more efficient than community-level ones
Given synchrony (our finding):
- Apparent spatial patterns reflect shared causes, not causal spillovers
- Targeting clusters may improve efficiency, but not because of propagation effects
- Interventions must address root causes, not rely on geographic diffusion
5. Conclusions
We found no evidence that neighboring community health trajectories predict focal community trajectories beyond what the focal community's own history predicts. The striking "spatial contagion" finding from preliminary analyses was entirely attributable to temporal data leakage.
Communities exhibit spatial synchrony—they change together—but this reflects shared causes, not causal spillovers. Public health interventions cannot rely on geographic diffusion effects; they must address the root causes of community health trajectories directly.
This finding underscores the critical importance of methodological rigor in spatial health research. Apparent patterns must be tested against proper temporal specifications before informing policy.
Acknowledgments
We thank the anonymous peer reviewers whose rigorous critique identified the temporal leakage issue that fundamentally changed this paper's conclusions. This is how science is supposed to work.
Data Availability
All data are from publicly available CDC PLACES releases. Code for data processing, feature engineering, model training, and the methodological correction analysis is available at: github.com/cschuman/resilience-mapping