Key distributional patterns and correlations across 64,255 grid points and 3 cities.
Mean feature values — same index, different thermal regime. This gap is why transfer learning is hard.
From raw satellite imagery to classified UHI grid — 7 processing stages.
Queries Microsoft Planetary Computer STAC API for multi-source satellite imagery.
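A minimal sketch of such a query, assuming the `pystac-client` and `planetary-computer` Python packages. The collection IDs are Planetary Computer's public names for Sentinel-2 L2A, Landsat Collection 2 Level-2 and the Copernicus 30m DEM. Which DEM the project actually used is an assumption. The helper names and the cloud-cover filter are illustrative, not the project's code.

```python
"""Sketch: build and run a Planetary Computer STAC search (assumed packages: pystac-client, planetary-computer)."""

PC_STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"

# Public Planetary Computer collection IDs; the DEM choice is an assumption.
COLLECTIONS = {
    "optical": "sentinel-2-l2a",
    "thermal": "landsat-c2-l2",
    "elevation": "cop-dem-glo-30",
}


def search_params(collection, bbox, datetime=None, max_cloud=None):
    """Build keyword arguments for pystac_client's Client.search() (hypothetical helper)."""
    params = {"collections": [collection], "bbox": list(bbox)}
    if datetime is not None:
        params["datetime"] = datetime  # e.g. "2023-01-01/2023-03-31"
    if max_cloud is not None:
        # Illustrative scene-level cloud filter via the STAC query extension.
        params["query"] = {"eo:cloud_cover": {"lt": max_cloud}}
    return params


def search_items(params):
    """Run the search against the live API (requires network access)."""
    import planetary_computer
    import pystac_client

    # sign_inplace attaches short-lived SAS tokens so asset URLs can be read.
    catalog = pystac_client.Client.open(PC_STAC_URL, modifier=planetary_computer.sign_inplace)
    return catalog.search(**params).item_collection()
```

For example, `search_items(search_params("landsat-c2-l2", [-43.8, -23.1, -43.1, -22.7], "2023-01-01/2023-03-31", max_cloud=20))` would request Landsat scenes over a rough, hypothetical Rio bounding box for the Brazil acquisition window. A DEM query would omit `datetime` because the elevation product is static.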
13 optical bands at 10-60m native resolution. Atmospherically corrected Level 2A. Primary source for 25 spectral indices including NDVI, NDBI, NDMI, BSI, EVI.
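The indices named here follow standard band-ratio definitions. This sketch assumes Sentinel-2 surface reflectance that is already scaled to [0, 1]. For L2A scenes from processing baseline 04.00 onward, that scaling means subtracting the 1000 DN offset before dividing by 10000. It also assumes the band mapping given in the docstring. The function names are hypothetical.

```python
import numpy as np


def _nd(a, b):
    """Normalized difference (a - b) / (a + b); pixels with a zero denominator become NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def spectral_indices(blue, red, nir, swir1):
    """A few common indices from reflectance in [0, 1].

    Assumed Sentinel-2 band mapping: blue=B02, red=B04, nir=B08, swir1=B11.
    """
    blue, red, nir, swir1 = (np.asarray(x, dtype=float) for x in (blue, red, nir, swir1))
    return {
        "NDVI": _nd(nir, red),                    # vegetation greenness
        "NDBI": _nd(swir1, nir),                  # built-up surfaces
        "NDMI": _nd(nir, swir1),                  # moisture content
        "BSI": _nd(swir1 + red, nir + blue),      # bare soil
        # Standard EVI coefficients: G=2.5, C1=6, C2=7.5, L=1.
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
    }
```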
LWIR Band 10 (10.6-11.2 μm) at 100m. Raw thermal radiance — the single most important UHI signal. Quality-masked using QA_PIXEL bit flags.
30m global elevation. Critical for Santiago where altitude creates thermal gradients greater than any urban material effect.
Building footprints with height data. 15 morphology features: density, compactness, sky view, volume. Available for Rio & Santiago only.
Dual-source masking ensures clean imagery.
Scene Classification Layer flags: cloud, cloud shadow, snow, water automatically excluded. Only vegetation, soil, and built-up pixels retained.
Bit-mask quality flags. Cloud, dilated cloud, cirrus, and cloud shadow bits checked. Result: zero NaN LST values across all 64,255 points.
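Both masks reduce to simple integer tests. This sketch assumes the standard Sentinel-2 L2A Scene Classification Layer codes and the Landsat Collection 2 QA_PIXEL bit layout for Landsat 8/9, where bit 1 is dilated cloud, bit 2 is cirrus, bit 3 is cloud and bit 4 is cloud shadow. The constant and function names are hypothetical.

```python
import numpy as np

# SCL classes kept: 4 = vegetation, 5 = not vegetated (bare soil and built-up).
# Everything else (no-data, shadows, water, cloud, cirrus, snow) is dropped.
SCL_KEEP = (4, 5)

# QA_PIXEL bits rejected: 1 = dilated cloud, 2 = cirrus, 3 = cloud, 4 = cloud shadow.
QA_BAD_BITS = (1, 2, 3, 4)


def s2_clear_mask(scl):
    """True where the SCL class is vegetation or not-vegetated."""
    return np.isin(np.asarray(scl), SCL_KEEP)


def landsat_clear_mask(qa_pixel):
    """True where none of the cloud, cirrus or shadow bits are set."""
    qa = np.asarray(qa_pixel, dtype=np.uint16)
    bad = np.zeros(qa.shape, dtype=bool)
    for bit in QA_BAD_BITS:
        bad |= (qa & (1 << bit)) != 0
    return ~bad
```

Masked pixels can then be set to NaN in each scene before compositing, which keeps cloud-contaminated observations out of the median.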
Each point represents a 500m × 500m ground footprint.
Captures proportional mix of rooftops, streets, courtyards, and tree canopy. Median compositing is robust to outliers and cloud artifacts. Window size 5 was optimal after testing 0-10.
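This sketch shows the two median steps: a per-pixel temporal median across the masked scenes, followed by a square spatial median. It reads "window size 5" as a 5 × 5 pixel neighbourhood, which is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import median_filter


def median_composite(stack):
    """Per-pixel temporal median over scenes (axis 0), ignoring masked (NaN) observations."""
    return np.nanmedian(np.asarray(stack, dtype=float), axis=0)


def focal_median(image, window=5):
    """Spatial median over a window x window pixel neighbourhood, reflecting at the edges."""
    return median_filter(np.asarray(image, dtype=float), size=window, mode="reflect")
```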
39 features total: 5 spectral bands + 13 indices + 15 building metrics + 6 interactions.
lwir11, LST, SWIR2_NIR, blue, nir08
NDVI, NDBI, BSI, NDWI, Albedo, EVI, emissivity, SAVI, MNDWI, NDMI, ISA, BU, NDVI_Landsat
density, count, height_mean/max/std, area, volume, sky_view, coverage_ratio, compactness...
LST×NDVI, LST×NDBI, elev×LST, NDVI−NDBI, LST×Albedo, BU×LST
Interactions capture non-linear relationships: LST×NDVI reveals how vegetation cooling effectiveness varies with temperature; elev×LST identifies thermal inversion trap zones.
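The six interactions are plain column arithmetic. This sketch uses pandas and assumes input columns named as in the lists above (`LST`, `NDVI`, `NDBI`, `elev`, `Albedo`, `BU`). The output column names and the helper are hypothetical.

```python
import pandas as pd


def add_interactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the six interaction features appended."""
    out = df.copy()
    out["LST_x_NDVI"] = out["LST"] * out["NDVI"]      # vegetation cooling vs. temperature
    out["LST_x_NDBI"] = out["LST"] * out["NDBI"]      # built-up heat amplification
    out["elev_x_LST"] = out["elev"] * out["LST"]      # topographic thermal traps
    out["NDVI_minus_NDBI"] = out["NDVI"] - out["NDBI"]
    out["LST_x_Albedo"] = out["LST"] * out["Albedo"]
    out["BU_x_LST"] = out["BU"] * out["LST"]
    return out
```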
Per-city normalization prevents domain shift; shared PCA rotation preserves transferability.
Maps features to uniform [0,1] rank space independently per city. Robust to outliers. Prevents leaking cross-city absolute temperature scales: a pixel at a given within-city percentile maps to the same value whether it reads 49°C in Rio or 43°C in Santiago.
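This sketch shows per-city rank normalization with scikit-learn's `QuantileTransformer`. It assumes one feature matrix per city, Freetown included, since Freetown's own percentiles are what get matched to the source cities. Capping `n_quantiles` at the sample count avoids scikit-learn's warning on small inputs. The helper name is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer


def per_city_quantile(features_by_city, n_quantiles=1000):
    """Fit a separate QuantileTransformer per city; each feature becomes its within-city rank in [0, 1]."""
    transformed = {}
    for city, X in features_by_city.items():
        X = np.asarray(X, dtype=float)
        qt = QuantileTransformer(
            n_quantiles=min(n_quantiles, X.shape[0]),
            output_distribution="uniform",
        )
        transformed[city] = qt.fit_transform(X)
    return transformed
```

Because each city gets its own fitted transformer, a city maximum of 49°C in Rio and a city maximum of 43°C in Santiago would both land at 1.0.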
Fitted on concatenated labeled data (Rio + Santiago). 3 components retain 96% of the variance. The same rotation is applied to Freetown. Reduces overfitting on auxiliary features.
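This sketch shows the shared rotation with scikit-learn's `PCA`. It assumes the inputs are the per-city quantile-normalized five-band matrices. The key design choice is that `fit` sees only the labeled cities, while `transform` is applied unchanged to Freetown. The function name is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def shared_pca(labeled_blocks, target, n_components=3):
    """Fit one PCA on the stacked labeled cities, then project every city with that same rotation."""
    pca = PCA(n_components=n_components)
    pca.fit(np.vstack(labeled_blocks))           # Rio + Santiago only
    projected = [pca.transform(X) for X in labeled_blocks]
    return pca, projected, pca.transform(target)  # target (Freetown) is never used in fitting
```

`pca.explained_variance_ratio_.sum()` reports the retained variance for the chosen number of components.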
Each UHI class routed to the source city whose thermal signature best matches Freetown's pattern for that class.
Rio's dense tropical core (concrete, asphalt, tightly packed low-rise) matches Freetown's informal settlement hotspots. Corrugated metal + minimal vegetation = similar thermal signature.
Santiago's transitional peri-urban fringe: moderate-density housing + partial tree cover + exposed soil best matches Freetown's mixed-use areas.
Broadest cool-surface coverage: Santiago irrigated parks + Rio coastal mangroves + Rio forested hillsides. Average ensemble leverages both models' vegetation sensitivities.
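The source does not say how the class specialists are merged. One plausible reading, sketched here as an assumption, takes each class's probability column from its specialist and then renormalizes each row. High comes from the Brazil-trained model, Medium from the Chile-trained model, and Low from the average of both. The function name and class order are hypothetical.

```python
import numpy as np

CLASSES = ("Low", "Medium", "High")


def route_by_class(proba_brazil, proba_chile):
    """Assemble per-class probabilities from each class's specialist source model.

    Both inputs are (n_samples, 3) arrays with columns ordered as CLASSES.
    """
    pb = np.asarray(proba_brazil, dtype=float)
    pc = np.asarray(proba_chile, dtype=float)
    lo, med, hi = (CLASSES.index(c) for c in ("Low", "Medium", "High"))
    routed = np.empty_like(pb)
    routed[:, lo] = 0.5 * (pb[:, lo] + pc[:, lo])  # Low: average of both sources
    routed[:, med] = pc[:, med]                     # Medium: Chile specialist
    routed[:, hi] = pb[:, hi]                       # High: Brazil specialist
    # Columns come from different models, so rows no longer sum to 1; renormalize.
    return routed / routed.sum(axis=1, keepdims=True)
```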
Ensemble probabilities adjusted to match approximate known prevalence for Freetown.
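The exact calibration objective is not given. This sketch shows one common approach, prior-shift reweighting, under that assumption. Each class probability is multiplied by a positive weight, each row is renormalized, and SciPy's Nelder-Mead tunes the weights so the mean predicted class distribution matches a target prevalence. This swaps in a prevalence-matching loss. The function name and any target values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def prevalence_calibrate(proba, target_prevalence):
    """Reweight class probabilities so their column means match target_prevalence.

    Weights are exp(theta), with the last class fixed at weight 1, because only
    weight ratios affect the renormalized output.
    """
    proba = np.asarray(proba, dtype=float)
    target = np.asarray(target_prevalence, dtype=float)

    def reweight(theta):
        w = proba * np.exp(np.append(theta, 0.0))
        return w / w.sum(axis=1, keepdims=True)

    def loss(theta):
        return float(np.sum((reweight(theta).mean(axis=0) - target) ** 2))

    res = minimize(
        loss,
        x0=np.zeros(proba.shape[1] - 1),
        method="Nelder-Mead",
        options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000},
    )
    return reweight(res.x), np.exp(np.append(res.x, 0.0))
```

For instance, `prevalence_calibrate(proba, [0.3, 0.4, 0.3])` would pull the ensemble toward a hypothetical 30/40/30 Low/Medium/High split.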
Rio's UHI is fundamentally a material property problem. The city's heat distribution is driven by what surfaces are made of — concrete, asphalt, corrugated metal — not by topography or climate gradients. This creates sharp, non-linear decision boundaries that gradient boosting excels at learning.
XGBoost outperformed Random Forest in 5-fold CV (0.959 vs 0.947 F1) because boosting iteratively corrects misclassifications in the Medium class — the hardest to separate. Medium zones sit at the thermal boundary between hot commercial/residential core and cooler forested periphery. XGBoost's sequential error correction focuses specifically on these transitional pixels.
LWIR11 at 25% importance means raw thermal radiance is the single strongest signal. This captures the physical reality: dark impervious surfaces absorb shortwave solar radiation during the day and re-emit it as longwave IR. In Zona Norte, corrugated metal roofing on favelas creates peak emissions — the metal heats rapidly, has low thermal mass, and sits surrounded by other hot surfaces with no vegetation buffer. The 12°C LST range within Rio (37°C in Tijuca Forest to 49°C in Centro) is wider than Santiago's entire city range.
Moisture deficit is Rio's second most important signal — not vegetation greenness (NDVI), but actual moisture availability. Sealed impervious surfaces cannot evaporate water, eliminating the cooling effect of evapotranspiration. Where NDMI is lowest (Centro, industrial waterfront), temperatures are highest. Where NDMI is highest (Tijuca Forest, lagoon margins), the evaporative cooling effect suppresses LST by 8-10°C compared to adjacent built areas just hundreds of meters away.
Compactness at 6% importance quantifies the urban canyon effect. When buildings are tightly packed, longwave radiation bounces between walls and roofs rather than escaping to the sky. This is why commercial Centro (tall, compact, geometric) and favelas (dense, irregular, low-rise) both show high temperatures — for different architectural reasons. The 15 building features from 3D-GloBFP collectively add +1.1% F1 over spectral indices alone, confirming that 3D urban form contains information not visible from pure spectral signatures.
The model achieves 99.7% accuracy — only 91 points classified incorrectly. Most mismatches cluster at class boundaries: coastal edges where water adjacency creates micro-climate effects, and forest-urban transition zones where a 500m grid cell captures mixed land cover. These are genuinely ambiguous pixels where the ground truth label itself might be arguable.
Santiago presented the opposite challenge from Rio. The Medium class dominates at 49.9% of all samples, and it is spectrally ambiguous. Peri-urban transitional zones share reflectance signatures with both High (urban core) and Low (vegetated foothills). Unconstrained XGBoost models overfit to this majority class, sacrificing High and Low accuracy to push Medium F1 higher.
A constrained Random Forest (min_samples_split=25, ccp_alpha=0.001) was chosen specifically to handle this class imbalance gracefully. By requiring at least 25 samples before a node can split and by removing weak branches through cost-complexity pruning, the RF learns broader decision regions that don't memorize Medium-class noise. The result: 0.690 F1 with only a 0.023 train-test gap, a stable, honest model that acknowledges Santiago's fundamental classification difficulty.
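This sketch shows the constrained forest and the train-test gap check with scikit-learn. It assumes a stratified 80/20 split and macro-averaged F1, neither of which is stated in the source. `n_estimators` and `class_weight` are also assumptions. Only `min_samples_split=25` and `ccp_alpha=0.001` come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def fit_constrained_rf(X, y, random_state=0):
    """Fit the constrained RF; return the model, train F1, test F1 and their gap (macro F1)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    rf = RandomForestClassifier(
        n_estimators=200,         # assumption: tree count not stated
        min_samples_split=25,     # a node needs >= 25 samples before it may split
        ccp_alpha=0.001,          # cost-complexity pruning of weak branches
        class_weight="balanced",  # assumption: weighting scheme not stated
        n_jobs=-1,
        random_state=random_state,
    )
    rf.fit(X_tr, y_tr)
    f1_tr = f1_score(y_tr, rf.predict(X_tr), average="macro")
    f1_te = f1_score(y_te, rf.predict(X_te), average="macro")
    return rf, f1_tr, f1_te, f1_tr - f1_te
```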
At 20.4% importance, elevation is the #1 predictor, which does not happen in Rio, where elevation ranks only 5th. Santiago sits in an Andean basin with 500m of relief across the metropolitan area. Temperature drops roughly 6°C per 700m of elevation gain: Lo Barnechea at 900m+ is naturally 4-5°C cooler than Estación Central at 520m, regardless of what buildings are there. This makes Santiago's UHI partially a topographic artifact: heat concentrates where the city happens to occupy the lowest ground.
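The elevation arithmetic can be checked directly at the stated rate of about 6°C per 700m. This is a minimal sketch, and the function names are hypothetical.

```python
RATE_C_PER_M = 6.0 / 700.0  # "roughly 6°C per 700m of elevation gain", as stated above


def cooling_for_rise(dz_m, rate=RATE_C_PER_M):
    """Expected temperature drop (°C) for an elevation gain of dz_m metres."""
    return dz_m * rate


def rise_for_cooling(dt_c, rate=RATE_C_PER_M):
    """Elevation gain (m) needed for a temperature drop of dt_c °C."""
    return dt_c / rate
```

At that rate, the 380m minimum difference between Estación Central (520m) and Lo Barnechea (900m) gives about 3.3°C. A 4-5°C gap corresponds to roughly 470-580m of rise, which puts the relevant Lo Barnechea sites near 990-1,100m, consistent with the "900m+" figure.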
The Elev×LST interaction at 17.4% captures a phenomenon unique to basin cities: thermal inversions. At night, cold air drains from the Andes into the valley, creating a temperature inversion layer. During the day, the valley floor bakes under the sun. Points where BOTH elevation is low AND LST is high mark these thermal trap zones; Maipú, Puente Alto, and the industrial corridors of Renca experience this double thermal stress. The same interaction term carries little of this signal in Rio or Freetown because they lack basin topography.
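The product term lets the model learn the "low AND hot" combination. For mapping, the same idea can be made explicit as a boolean mask. This sketch uses illustrative quantile cut-offs (lowest 25% of elevation, hottest 25% of LST) rather than the project's thresholds. The function name is hypothetical.

```python
import numpy as np


def thermal_trap_mask(elevation, lst, elev_q=0.25, lst_q=0.75):
    """Flag points that are BOTH in the lowest-elevation quantile AND the highest-LST quantile."""
    elevation = np.asarray(elevation, dtype=float)
    lst = np.asarray(lst, dtype=float)
    low_ground = elevation <= np.quantile(elevation, elev_q)
    hot_surface = lst >= np.quantile(lst, lst_q)
    return low_ground & hot_surface
```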
Santiago's 69% F1 isn't a model failure — it reflects genuine physical ambiguity. Medium zones are transitional: moderate-density housing mixed with partial tree cover and exposed soil. In spectral space, these pixels overlap heavily with both High (commercial corridors) and Low (irrigated parks). The per-class breakdown shows the model actually classifies High (F1=0.68) and Low (F1=0.69) comparably to Medium (F1=0.70) — no single class is dramatically worse. The ceiling is set by the physical landscape, not the algorithm.
Unlike Rio where building compactness is a top feature, Santiago's building morphology features rank outside the top 10. The 3D-GloBFP data shows Santiago has lower building density (mean 0.004) than Rio (0.005), and building heights average only 7.4m vs. Rio's 15.8m. Santiago's thermal signature is written by its geology first, and its buildings second. This has direct policy implications: urban form interventions will have smaller returns here than vegetation and ventilation strategies that work with the natural airflow patterns.
Freetown has no ground-truth UHI labels. Every prediction comes from models trained on Rio and Santiago, cities with fundamentally different climates, topography, and urban form. The central challenge is domain shift: absolute LST values do not line up by class across cities, and Chile's High-class temperatures can sit near Brazil's Low-class values because of differences in latitude, season, and atmospheric conditions.
The solution was a three-part pipeline: (1) QuantileTransformer normalization removes the absolute temperature scale, so a pixel at the 90th percentile in Freetown maps to the same value as Rio's 90th percentile even if raw temperatures differ by 15°C; (2) PCA(3) on the 5 spectral bands reduces dimensionality while preserving cross-city patterns; and (3) Source-Specialist Routing assigns each class to the source city whose thermal signature best matches Freetown's.
Leave-One-City-Out revealed that neither city alone transfers well: Chile→Brazil achieves only F1=0.467 (gap 0.275), and Brazil→Chile drops to F1=0.360 (gap 0.541). But the per-class breakdown told a different story: Chile, not Brazil, scored higher on High (F1=0.587 vs 0.449). Looking deeper, we found that each source city excels at different classes: Brazil's tropical thermal core best matches Freetown's informal settlement hotspots, while Chile's peri-urban transitions better capture Medium-class patterns.
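This sketch shows the Leave-One-City-Out loop with scikit-learn's `LeaveOneGroupOut`. It assumes macro F1 and defines "gap" as train F1 minus held-out F1, since the source defines neither. With two labeled cities, the fold that holds out Brazil gives the Chile→Brazil result. The default Random Forest and the helper name are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut


def leave_one_city_out(X, y, cities, model=None):
    """Train on all cities but one, evaluate on the held-out city.

    Returns {held_out_city: {"f1": test macro F1, "gap": train F1 - test F1}}.
    """
    if model is None:
        model = RandomForestClassifier(n_estimators=100, random_state=0)
    X, y, cities = np.asarray(X), np.asarray(y), np.asarray(cities)
    results = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=cities):
        held_out = cities[test_idx][0]
        m = clone(model).fit(X[train_idx], y[train_idx])
        f1_train = f1_score(y[train_idx], m.predict(X[train_idx]), average="macro")
        f1_test = f1_score(y[test_idx], m.predict(X[test_idx]), average="macro")
        results[held_out] = {"f1": f1_test, "gap": f1_train - f1_test}
    return results
```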
Empirical search showed that only five spectral bands sustain cross-city transfer: lwir11, LST, SWIR2_NIR, blue, nir08. All 13 spectral indices (NDVI, NDBI, etc.) and all 15 building morphology features degraded transfer performance when included. These features encode city-specific patterns: Rio's NDVI distribution is tropical broadleaf while Santiago's is Mediterranean scrub — same index, completely different ecological meaning. The 3D-GloBFP building data isn't even available for Sierra Leone, removing 15 features entirely.
With no ground truth, we constructed proxy labels using KMeans clustering on PCA-transformed features, ordered by mean LST. The reported F1 of 0.58 is measured against these pseudo-labels — not true UHI classes. KMeans follows spectral similarity, not the same class boundary rules as human-defined UHI categories. The true accuracy could be higher or lower. This is an inherent limitation of cross-city transfer to unlabeled targets.
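This sketch shows the proxy-labelling step. It assumes 3 clusters, scikit-learn's `KMeans` with a fixed seed, and PCA features as input. Clusters are renamed so that ranks 0/1/2 follow increasing mean LST (Low/Medium/High). The function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_pseudo_labels(Z, lst, n_classes=3, random_state=0):
    """Cluster PCA features, then relabel clusters by increasing mean LST (0 = coolest)."""
    Z = np.asarray(Z, dtype=float)
    lst = np.asarray(lst, dtype=float)
    clusters = KMeans(n_clusters=n_classes, n_init=10, random_state=random_state).fit_predict(Z)
    # order[i] is the cluster with the i-th lowest mean LST; invert it into a lookup table.
    order = np.argsort([lst[clusters == c].mean() for c in range(n_classes)])
    rank = np.empty(n_classes, dtype=int)
    rank[order] = np.arange(n_classes)
    return rank[clusters]
```

The validation F1 is then computed against these ranks, not against true UHI classes.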
Despite transfer limitations, the spatial pattern is physically plausible: High (red) concentrates in the dense coastal lowlands around Kroo Bay and the peninsula tip. Low (green) dominates the forested hillsides above 150m elevation. Medium (amber) fills the transitional zone. This matches expected UHI physics — dense informal settlements with corrugated metal roofing at low elevation are hotter than vegetated uplands. The model may not be perfectly calibrated, but it's capturing real thermal geography.
n_estimators=300, max_depth=8, lr=0.1. Class-weighted. Only 91 mismatches out of 28,488 points.
min_samples_split=25, ccp_alpha=0.001. Constrained RF chosen over XGB — boosters overfit Medium class.
RF+XGB ensemble, class-level specialist routing, Nelder-Mead calibration on pseudo-labels.
5-fold stratified CV. First three bars show RF ablation — each feature group adds incremental value. XGBoost selected as final model (+1.2% over RF).
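This sketch shows a cumulative feature-group ablation under 5-fold stratified CV. It assumes macro F1, a default-sized Random Forest, and feature groups given as column-index lists. The group names and the helper are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cumulative_ablation(X, y, groups, n_splits=5, random_state=0):
    """Mean CV macro F1 as feature groups are added one at a time.

    groups: ordered list of (name, column_indices); each step uses all columns added so far.
    """
    X = np.asarray(X)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    cols, scores = [], {}
    for name, idx in groups:
        cols.extend(idx)
        rf = RandomForestClassifier(n_estimators=100, random_state=random_state, n_jobs=-1)
        scores[name] = cross_val_score(rf, X[:, cols], y, cv=cv, scoring="f1_macro").mean()
    return scores
```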
Thermal radiance dominates. Building morphology (compactness) confirms urban canyon effects. Material properties drive Rio's UHI — not topography.
Leave-One-City-Out diagnostics reveal significant domain mismatch — motivating the specialist routing approach.
Neither city alone transfers well → specialist routing by class achieves +8-12% improvement over global model.
Peri-urban zones share spectral signatures with both High and Low classes. Medium is a transitional category without distinct physical markers — it represents the boundary between urban core and vegetated periphery. This fundamentally limits RF to ~69% F1.
Freetown's informal settlement morphology (corrugated metal roofing) has no exact analogue in the training data. The 15 building features are unavailable. Specialist routing partially mitigates but can't eliminate this gap.
KMeans pseudo-labels follow spectral similarity, not ground-truth UHI. Validation F1 of 0.58 may overestimate or underestimate true performance depending on cluster-class alignment.
Brazil (Jan-Mar 2023), Chile (Jan 2024), Sierra Leone (Jan-Feb 2023). Different years and seasons mean phenological state differs — affecting NDVI and EVI signals.
Thermal radiance (LWIR11) and moisture deficit capture 35% of signal. Dark roofing, sealed surfaces, and building canyon effects overwhelm green space availability. Temperature is a material property problem in tropical megacities.
Elevation and elevation×LST interactions account for 50% of feature importance. The Andean basin creates thermal inversions that exceed any urban effect — 6-8°C variation over short horizontal distances. Urban planners must work with topography, not against it.
Extreme patchiness from corrugated metal roofing: highly reflective but with low thermal emissivity. Neither Rio nor Santiago training data fully captures this pattern. The city's rapid informal growth (3-5% annually) compounds the UHI effect year over year.
Specialist routing (High→Brazil, Medium→Chile, Low→Combined) achieves 8-12% improvement over a single global model. Source-city thermal similarity matters more than geographic proximity for effective transfer.
City-specific interventions derived from model feature importance and ground-truth analysis.