Pre-print v1.0.0

Spatial Synchrony, Not Contagion: A Methodological Correction in Community Health Trajectory Prediction

Corey Schuman

Abstract

Background: We found no evidence that neighboring community health trajectories predict focal community trajectories. The apparent "spatial contagion" reported in preliminary analyses—where neighbor trajectories appeared 3.8x more predictive than a community's own history—was entirely due to temporal data leakage in our feature construction.
Objective: To test whether the spatial contagion effect was genuine or an artifact of improper temporal alignment (using future data to predict the future).
Methods: We analyzed 189,566 tract-year observations from 72,161 U.S. census tracts using CDC PLACES data (2020-2024). We compared leaked features (using year T data to predict year T outcomes) versus properly lagged features (using only year T-1 and earlier). We used permutation importance with bootstrap confidence intervals on held-out 2024 data.
Results: The spatial contagion effect vanished when we fixed the temporal alignment. With leaked features, neighbor change was 16.7x more important than own-tract change. With proper lagging, both had negligible importance (ratio: 1.12x, overlapping CIs). Critical context: even our best model achieves only F1=0.26—meaning no features we tested enable meaningful trajectory prediction.
Conclusions: Communities exhibit spatial synchrony (changing together due to shared labor markets, healthcare systems, and policies) but NOT predictive contagion. Don't expect health improvements to "spread" to neighbors—address root causes directly.
Keywords: spatial epidemiology, temporal data leakage, health trajectories, methodological correction, CDC PLACES, spatial synchrony

Why We're Publishing This

We got it wrong, and that matters. Our preliminary analysis suggested a dramatic finding: that health changes "spread" across community boundaries with neighbor trajectories being 3.8x more predictive than a community's own history. This would have been a paradigm shift in public health intervention design.

It was wrong. What we found was an artifact of a methodological error called temporal data leakage—we accidentally used future data to predict the future. When we fixed the error, the "spatial contagion" effect vanished completely.

We're publishing this correction prominently because:

  • Science requires transparency about errors. Burying mistakes creates a misleading scientific record.
  • This error is easy to make. Others working with spatial health data may benefit from seeing how temporal leakage manifests.
  • The negative finding matters for policy. Interventions designed around "spillover effects" would waste resources if those effects don't exist.
  • Trust requires honesty. We want policymakers to trust our findings—and that means being upfront about what we got wrong.

1. Introduction

1.1 The Spatial Contagion Hypothesis

A compelling hypothesis in spatial epidemiology posits that health changes "spread" across community boundaries—that a community's health trajectory is influenced not just by its own characteristics but by the trajectories of its neighbors. This "spatial contagion" framing draws on social network research demonstrating the spread of behaviors through connected individuals and neighborhood effects theory suggesting that place-based exposures propagate across geographic boundaries.

Initial analyses of CDC PLACES tract-level health data appeared to support this hypothesis dramatically. Preliminary models suggested that the average health trajectory of neighboring census tracts was 3.8 times more predictive of a focal tract's future trajectory than the tract's own historical trend—a finding that, if valid, would represent a paradigm shift in how we conceptualize community health interventions.

1.2 The Problem of Temporal Leakage

However, this striking finding warranted methodological scrutiny. A critical question emerged: When computing "neighbor trajectory," what time period is used?

In predictive modeling, features must be constructed using only information available before the outcome period. When predicting the health trajectory from year T-1 to year T, legitimate predictive features can draw only on data from year T-1 and earlier. Using data from year T in feature construction constitutes temporal data leakage: the model appears predictive because it "sees" contemporaneous information, not because it captures genuine predictive signal.

This distinction matters enormously for the spatial contagion hypothesis:

  • Contemporaneous correlation: Neighbors change together at the same time (spatial synchrony)
  • Predictive contagion: Prior neighbor change predicts future focal change

Only the second supports policy interventions targeting spatial spillovers.

2. Methods

2.1 Data Source and Sample

We analyzed CDC PLACES tract-level health estimates from the 2020-2024 releases, harmonized to a consistent panel structure. The analytic sample comprised 189,566 tract-year observations from 72,161 unique census tracts across 51 states and territories, with prediction years 2022, 2023, and 2024.

2.2 Outcome Variable

We constructed a Composite Health Burden Index (CHBI) as a weighted average of seven PLACES measures: obesity (20%), diabetes (20%), coronary heart disease (15%), mental health days (15%), hypertension (10%), physical inactivity (10%), and physical health days (10%). Higher CHBI indicates greater burden. Tracts were classified as DECLINE (CHBI increase of more than 0.3 SD), IMPROVE (CHBI decrease of more than 0.3 SD), or STABLE (otherwise).
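The index and the three-way classification can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the measure names are hypothetical keys, the measures are assumed to be on a common scale already, the SD is taken over the distribution of CHBI changes, and the two thresholds are read symmetrically (an increase or a decrease of more than 0.3 SD).

```python
from statistics import pstdev

# Weights as listed in Section 2.2; the key names are hypothetical.
CHBI_WEIGHTS = {
    "obesity": 0.20,
    "diabetes": 0.20,
    "coronary_heart_disease": 0.15,
    "mental_health_days": 0.15,
    "hypertension": 0.10,
    "physical_inactivity": 0.10,
    "physical_health_days": 0.10,
}


def chbi(measures):
    """Weighted average of the seven measures (assumed comparable in scale)."""
    return sum(weight * measures[name] for name, weight in CHBI_WEIGHTS.items())


def classify_changes(changes, threshold_sd=0.3):
    """Label each CHBI change against the SD of all changes (assumed reference)."""
    cutoff = threshold_sd * pstdev(changes)
    labels = []
    for delta in changes:
        if delta > cutoff:
            labels.append("DECLINE")   # burden rose by more than 0.3 SD
        elif delta < -cutoff:
            labels.append("IMPROVE")   # burden fell by more than 0.3 SD
        else:
            labels.append("STABLE")
    return labels


print(classify_changes([-1.0, 0.0, 1.0]))  # ['IMPROVE', 'STABLE', 'DECLINE']
```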

2.3 Spatial Feature Construction

We built a Queen contiguity neighbor graph from Census TIGER/Line tract geometries (mean: 6.2 neighbors per tract). The critical methodological comparison involved two specifications:

Specification A: Contemporaneous (Leaked)

  • neighbor_avg_change: Mean neighbor CHBI change from year T-1 to year T
  • neighbor_avg_chbi: Mean neighbor CHBI in year T

Problem: Uses year T data to predict year T outcome

Specification B: Properly Lagged

  • neighbor_avg_change: Mean neighbor CHBI change from year T-2 to year T-1
  • neighbor_avg_chbi: Mean neighbor CHBI in year T-1

Correct: Uses only pre-outcome information
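The difference between the two specifications comes down to which year window feeds the neighbor features. A minimal sketch, assuming a hypothetical dictionary of CHBI values keyed by (tract, year) and a hand-written neighbor list in place of the Queen contiguity graph:

```python
def neighbor_features(chbi, neighbors, tract, t, lagged=True):
    """Neighbor features for predicting a tract's change from year t-1 to year t.

    lagged=True  -> Specification B: window (t-2, t-1), no year-t data.
    lagged=False -> Specification A: window (t-1, t), which leaks year-t data.
    """
    start, end = (t - 2, t - 1) if lagged else (t - 1, t)
    nbrs = neighbors[tract]
    return {
        "neighbor_avg_change": sum(chbi[(n, end)] - chbi[(n, start)] for n in nbrs) / len(nbrs),
        "neighbor_avg_chbi": sum(chbi[(n, end)] for n in nbrs) / len(nbrs),
    }


# Toy values, invented for illustration only.
chbi_by_year = {
    ("B", 2021): 10, ("B", 2022): 11, ("B", 2023): 14,
    ("C", 2021): 20, ("C", 2022): 21, ("C", 2023): 22,
}
toy_neighbors = {"A": ["B", "C"]}

lagged = neighbor_features(chbi_by_year, toy_neighbors, "A", 2023, lagged=True)
leaked = neighbor_features(chbi_by_year, toy_neighbors, "A", 2023, lagged=False)
print(lagged)  # {'neighbor_avg_change': 1.0, 'neighbor_avg_chbi': 16.0}
print(leaked)  # {'neighbor_avg_change': 2.0, 'neighbor_avg_chbi': 18.0}
```

In the leaked call, the 2023 values that define the outcome also define the features, so any same-year co-movement between a tract and its neighbors shows up as apparent predictive power.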

2.4 Model Training and Evaluation

We trained XGBoost classifiers with temporal cross-validation (train: 2022-2023; test: 2024). To assess feature importance, we used:

  1. Permutation importance on the holdout test set (10 repeats, 95% bootstrap CIs)
  2. Ablation experiments: Model performance with (a) all features, (b) no spatial features, (c) spatial features only
  3. Linear regression baseline comparing R² with and without spatial features
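Permutation importance with bootstrap intervals can be sketched in plain Python. This is not the study's pipeline: a toy threshold rule stands in for the XGBoost classifier, accuracy stands in for the scoring metric, and the interval is a percentile bootstrap over the per-repeat score drops, which is one of several reasonable ways to form such a CI.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, col, n_repeats=10, seed=0):
    """Score drop after shuffling one feature column; one value per repeat."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return drops


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values) for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]


# Toy held-out set: the label depends on column 0 only.
data_rng = random.Random(1)
X_test = [[data_rng.random(), data_rng.random()] for _ in range(200)]
toy_model = lambda row: int(row[0] > 0.5)
y_test = [toy_model(row) for row in X_test]

drops_used = permutation_importance(toy_model, X_test, y_test, col=0)
drops_ignored = permutation_importance(toy_model, X_test, y_test, col=1)
print(bootstrap_ci(drops_used))
print(bootstrap_ci(drops_ignored))  # (0.0, 0.0): the model never reads column 1
```

Scoring on held-out rows matters here: importance computed on training data can reward features the model has merely memorized.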

3. Results

3.1 The Temporal Leakage Effect

Table 1. Permutation Importance on Holdout Data (2024)

Feature               Contemporaneous (Leaked)   Properly Lagged
neighbor_avg_change   0.0390 [0.037, 0.042]      0.0010 [0.0009, 0.0012]
CHBI_change_1yr       0.0023 [0.002, 0.003]      0.0009 [0.0006, 0.0013]
Ratio                 16.7x                      1.12x
CIs Overlap?          No                         Yes

With contemporaneous features, neighbor_avg_change appeared 16.7 times more important than own-tract change, with non-overlapping confidence intervals suggesting statistical significance. After correction, the ratio dropped to 1.12x with overlapping CIs—no significant difference.
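The overlap row of Table 1 can be checked directly from the reported intervals. The helper below is a hypothetical convenience function; note that comparing confidence intervals for overlap is a conservative heuristic rather than a formal hypothesis test.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Intervals copied from Table 1.
leaked_neighbor, leaked_own = (0.037, 0.042), (0.002, 0.003)
lagged_neighbor, lagged_own = (0.0009, 0.0012), (0.0006, 0.0013)

print(intervals_overlap(leaked_neighbor, leaked_own))  # False
print(intervals_overlap(lagged_neighbor, lagged_own))  # True
```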

3.2 Ablation Experiments

Table 2. Ablation Experiment Results (Macro-F1 with 95% Bootstrap CIs)

Model                       Contemporaneous   Properly Lagged [95% CI]
Full model (all features)   0.320             0.260 [0.259, 0.261]
No spatial features         0.261             0.261 [0.260, 0.263]
Spatial features only       0.415             0.259 [0.258, 0.260]
Spatial contribution        +18.2%            -0.4%

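With leaked features, the spatial-only model (F1=0.415) dramatically outperformed the full model (F1=0.320), even though the full model contains the same spatial features. A properly specified model should not lose this much performance by gaining additional features, so the pattern is a strong warning sign of data leakage. After correction, the confidence intervals of all three model specifications overlap, indicating no statistically significant difference.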

3.3 Spatial Autocorrelation

Table 3. Global Moran's I with Significance Testing

Variable              Moran's I   Z-score   p-value   Interpretation
CHBI levels           0.757       254.0     <0.001    Very strong positive autocorrelation
Trajectory outcomes   0.211       69.7      <0.001    Moderate clustering

Both statistics are highly significant (p<0.001), confirming strong spatial structure in health outcomes. Communities near each other have similar health burdens (I=0.76) and somewhat similar trajectory outcomes (I=0.21). However, this spatial clustering does not translate to predictive contagion—knowing neighbor levels does not help predict focal changes.
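For readers unfamiliar with the statistic, global Moran's I can be computed in a few lines. The sketch below assumes binary contiguity weights (1 for each neighbor pair, 0 otherwise) and a toy four-tract chain; the weighting scheme used for Table 3 is not stated here, and the Z-scores and p-values require a separate inference step that is not shown.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights: w_ij = 1 if j is a neighbor of i."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {tract: v - mean for tract, v in values.items()}
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    total_weight = sum(len(js) for js in neighbors.values())
    return (n / total_weight) * cross / sum(d * d for d in dev.values())


# Four tracts in a row: a - b - c - d (symmetric neighbor lists).
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

clustered = morans_i({"a": 1, "b": 1, "c": 5, "d": 5}, chain)
alternating = morans_i({"a": 1, "b": 5, "c": 1, "d": 5}, chain)
print(clustered)    # positive: similar values sit next to each other
print(alternating)  # negative: neighbors are dissimilar
```

A high I on levels says that nearby tracts look alike at a point in time. It says nothing about whether one tract's past change anticipates another's future change, which is the question the lagged features test.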

4. Discussion

4.1 Principal Findings

Our central finding is methodological: the apparent "spatial contagion" in community health trajectories was an artifact of temporal data leakage.

When neighbor trajectory features were computed contemporaneously with the outcome period (year T-1 to T), they appeared 16.7 times more predictive than a community's own historical trend. After correcting to properly lagged features (year T-2 to T-1), this advantage vanished entirely.

This represents spatial synchrony, not spatial contagion. Communities near each other experience health changes at the same time—likely due to shared exposures, economic shocks, policy changes, or healthcare system factors—but prior neighbor trajectories do not predict future focal trajectories.

4.2 Why This Matters

The distinction between synchrony and contagion has critical policy implications:

If contagion were real:

  • Interventions in one community could spill over to neighbors
  • "Buffer zone" investments around improving areas would be justified
  • Regional intervention units would be more efficient than community-level ones

Given synchrony (our finding):

  • Apparent spatial patterns reflect shared causes, not causal spillovers
  • Targeting clusters may improve efficiency, but not because of propagation effects
  • Interventions must address root causes, not rely on geographic diffusion

5. Conclusions

We found no evidence that neighboring community health trajectories predict focal community trajectories beyond what the focal community's own history predicts. The striking "spatial contagion" finding from preliminary analyses was entirely attributable to temporal data leakage.

Communities exhibit spatial synchrony—they change together—but this reflects shared causes, not causal spillovers. Public health interventions cannot rely on geographic diffusion effects; they must address the root causes of community health trajectories directly.

This finding underscores the critical importance of methodological rigor in spatial health research. Apparent patterns must be tested against proper temporal specifications before informing policy.

Acknowledgments

We thank the anonymous peer reviewers whose rigorous critique identified the temporal leakage issue that fundamentally changed this paper's conclusions. This is how science is supposed to work.

Data Availability

All data are from publicly available CDC PLACES releases. Code for data processing, feature engineering, model training, and the methodological correction analysis is available at: github.com/cschuman/resilience-mapping