Urban Heat Island Explorer

Satellite-driven classification of urban heat intensity across three continents — from spectral extraction to cross-city transfer learning

3 Cities

64,255 Grid Points

4 Data Sources

39 Features

3 Specialist Models

Challenge: EY & Hult International Business School partnered to predict Urban Heat Island intensity across three cities using satellite data. Models were trained on labelled data from Rio de Janeiro (28,488 points) and Santiago (21,662 points), then transferred to classify Freetown (14,105 points) where no ground-truth labels exist.

Solution: Source-specialist One-vs-Rest ensemble with per-city QuantileTransformer normalization, PCA dimensionality reduction on five transferable spectral bands, and Nelder-Mead probability calibration.

64,255 Grid Points

39 Engineered Features

96% PCA Variance Retained

0.96 Best Within-City F1

Class Distribution

Explore Cities

Rio de Janeiro

Brazil — Tropical Megacity

XGBoost within-city model. Thermal radiance and building morphology dominate. Dark roofing and sealed surfaces in favelas create extreme heat signatures.

0.96 F1 Score

28,488 Points

39 Features

Santiago

Chile — Andean Valley

Random Forest model. Topography overwhelms urbanism — elevation and thermal inversions in the Andean basin create 6-8°C gradients over short distances.

0.690 F1 Score

21,662 Points

39 Features

Freetown

Sierra Leone — Transfer Target

RF+XGB specialist ensemble via cross-city transfer. No ground truth — calibrated using Nelder-Mead on pseudo-labels from KMeans clustering.

0.58 F1 Score

14,105 Points

5 Transfer Bands

Exploratory Data Analysis

Key distributional patterns and correlations across 64,255 grid points and 3 cities.

LST Distribution by Class & City (legend: High, Medium, Low)

Feature Means by UHI Class

Correlation Structure

Cross-City Feature Comparison

Mean feature values — same index, different thermal regime. This gap is why transfer learning is hard.

Data Pipeline

From raw satellite imagery to classified UHI grid — 7 processing stages.

Extract → Cloud Mask → Grid 100m → Engineer → QT+PCA → OVR Route → Calibrate

Satellite Data Extraction

Queries Microsoft Planetary Computer STAC API for multi-source satellite imagery.

Sentinel-2 L2A

13 optical bands at 10-60m native resolution. Atmospherically corrected Level 2A. Primary source for 25 spectral indices including NDVI, NDBI, NDMI, BSI, EVI.
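As a rough illustration of how normalized-difference indices such as NDVI, NDBI, and NDMI are formed, the sketch below applies their standard definitions to surface reflectance values. The function names and the sample reflectances are illustrative assumptions, not taken from the project's code.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(red: float, nir: float, swir1: float) -> dict:
    """Standard normalized-difference indices from reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),    # vegetation greenness
        "NDBI": normalized_difference(swir1, nir),  # built-up surfaces
        "NDMI": normalized_difference(nir, swir1),  # surface moisture
    }


# Hypothetical reflectances for a vegetated pixel and a built-up pixel
vegetated = spectral_indices(red=0.05, nir=0.45, swir1=0.20)
built_up = spectral_indices(red=0.20, nir=0.25, swir1=0.30)
```

With these definitions NDBI is simply the negative of NDMI, which is one reason highly correlated indices may add little independent signal.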

Landsat 8/9 Thermal

LWIR Band 10 (10.6-11.2 μm) at 100m. Raw thermal radiance — the single most important UHI signal. Quality-masked using QA_PIXEL bit flags.

Copernicus DEM GLO-30

30m global elevation. Critical for Santiago where altitude creates thermal gradients greater than any urban material effect.

3D-GloBFP

Building footprints with height data. 15 morphology features: density, compactness, sky view, volume. Available for Rio & Santiago only.

Cloud Masking

Dual-source masking ensures clean imagery.

Sentinel-2: SCL

Scene Classification Layer flags: cloud, cloud shadow, snow, water automatically excluded. Only vegetation, soil, and built-up pixels retained.

Landsat: QA_PIXEL

Bit-mask quality flags. Cloud, dilated cloud, cirrus, and cloud shadow bits checked. Result: zero NaN LST values across all 64,255 points.
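The two masking rules can be sketched in a few lines. This is a minimal illustration that assumes the standard Sentinel-2 SCL class codes (4 = vegetation, 5 = not vegetated) and the Landsat Collection 2 QA_PIXEL bit positions; the helper names are hypothetical and the project's actual masking code is not shown in the source.

```python
# Sentinel-2 SCL codes kept as clear land:
# 4 = vegetation, 5 = not vegetated (bare soil and built-up surfaces)
SCL_KEEP = {4, 5}

# Landsat Collection 2 QA_PIXEL bit positions for the flags named above
QA_BITS = {"dilated_cloud": 1, "cirrus": 2, "cloud": 3, "cloud_shadow": 4}


def scl_is_clear(scl_value: int) -> bool:
    """True when the SCL class is one of the retained land classes."""
    return scl_value in SCL_KEEP


def qa_pixel_is_clear(qa_value: int) -> bool:
    """True when none of the cloud-related QA_PIXEL bits are set."""
    mask = 0
    for bit in QA_BITS.values():
        mask |= 1 << bit
    return (qa_value & mask) == 0


clear_example = qa_pixel_is_clear(21824)             # a typical clear value
cloudy_example = qa_pixel_is_clear(21824 | 1 << 3)   # same value, cloud bit set
```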

100m Grid with 5×5 Neighborhood

Each point represents a 500m × 500m ground footprint.

5 × 5 pixel neighborhood at 100m = 500m × 500m on the ground

Captures proportional mix of rooftops, streets, courtyards, and tree canopy. Median compositing is robust to outliers and cloud artifacts. Window size 5 was optimal after testing 0-10.
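A minimal sketch of the neighborhood step, assuming each grid point is summarised by the median of the 5×5 block of 100m pixels around it. The function name, the edge handling (clipping rather than padding), and the toy raster are assumptions made for illustration.

```python
import numpy as np


def neighborhood_median(raster: np.ndarray, row: int, col: int,
                        window: int = 5) -> float:
    """Median of the window x window block centred on (row, col), ignoring NaNs.

    The block is clipped at the raster edge rather than padded.
    """
    half = window // 2
    r0, r1 = max(row - half, 0), min(row + half + 1, raster.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, raster.shape[1])
    return float(np.nanmedian(raster[r0:r1, c0:c1]))


# Hypothetical 7x7 LST raster (degrees C) with one contaminated pixel
lst = np.full((7, 7), 40.0)
lst[3, 3] = 5.0  # artifact at the centre pixel
centre_value = neighborhood_median(lst, 3, 3)
corner_value = neighborhood_median(lst, 0, 0)
```

The single outlier does not move the median, which is the robustness property the text refers to.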

Feature Engineering

39 features total: 5 spectral bands + 13 indices + 15 building metrics + 6 interactions.

Spectral (5)

lwir11, LST, SWIR2_NIR, blue, nir08

Indices (13)

NDVI, NDBI, BSI, NDWI, Albedo, EVI, emissivity, SAVI, MNDWI, NDMI, ISA, BU, NDVI_Landsat

Building (15)

density, count, height_mean/max/std, area, volume, sky_view, coverage_ratio, compactness...

Interaction Terms (6)

LST×NDVI, LST×NDBI, elev×LST, NDVI−NDBI, LST×Albedo, BU×LST

Interactions capture non-linear relationships: LST×NDVI reveals how vegetation cooling effectiveness varies with temperature; elev×LST identifies thermal inversion trap zones.
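The six interaction terms listed above are simple products and differences of base features. The sketch below shows one way to compute them for a single grid point; the function name, key names, and sample values are illustrative assumptions.

```python
def interaction_features(lst: float, ndvi: float, ndbi: float,
                         elevation: float, albedo: float, bu: float) -> dict:
    """Build the six interaction terms from already-computed base features."""
    return {
        "LST_x_NDVI": lst * ndvi,          # vegetation cooling vs temperature
        "LST_x_NDBI": lst * ndbi,          # built-up intensity vs temperature
        "elev_x_LST": elevation * lst,     # hot and low vs cool and high ground
        "NDVI_minus_NDBI": ndvi - ndbi,    # green vs built balance
        "LST_x_Albedo": lst * albedo,
        "BU_x_LST": bu * lst,
    }


# Hypothetical grid point
example = interaction_features(lst=42.0, ndvi=0.2, ndbi=0.1,
                               elevation=520.0, albedo=0.15, bu=-0.1)
```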

QuantileTransformer + PCA

Per-city normalization prevents domain shift; shared PCA rotation preserves transferability.

QuantileTransformer

Maps features to uniform [0,1] rank space independently per city. Robust to outliers. Prevents leaking cross-city absolute temperature scales — Rio's 49°C and Santiago's 43°C LST both map to same rank.

PCA(3)

Fitted on concatenated labeled data (Rio + Santiago). 3 components retain 96% variance. Same rotation applied to Freetown. Reduces overfitting on auxiliary features.

Cumulative variance retained: PC1 58% · +PC2 82% · +PC3 96%
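The normalization and rotation steps can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the project's code: the array sizes, `n_quantiles`, and random seeds are assumptions, and the synthetic bands will not reproduce the variance figures reported here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# Synthetic stand-ins for the five transfer bands (rows = grid points)
cities = {
    "rio": rng.normal(45.0, 3.0, size=(500, 5)),
    "santiago": rng.normal(38.0, 2.0, size=(400, 5)),
    "freetown": rng.normal(33.0, 4.0, size=(300, 5)),
}

# 1) Rank-normalise each city independently so absolute scales are discarded
ranked = {
    name: QuantileTransformer(n_quantiles=100, output_distribution="uniform",
                              random_state=0).fit_transform(values)
    for name, values in cities.items()
}

# 2) Fit one PCA rotation on the labelled cities only
pca = PCA(n_components=3).fit(np.vstack([ranked["rio"], ranked["santiago"]]))

# 3) Apply the same rotation to every city, including the unlabelled target
projected = {name: pca.transform(values) for name, values in ranked.items()}
```

Fitting a separate transformer per city while sharing one PCA is the design choice described above: ranks are local to each city, but the component axes are common to all three.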

Source-Specialist One-vs-Rest Routing

Each UHI class routed to the source city whose thermal signature best matches Freetown's pattern for that class.

● High → Brazil XGBoost

Rio's dense tropical core (concrete, asphalt, tightly packed low-rise) matches Freetown's informal settlement hotspots. Corrugated metal + minimal vegetation = similar thermal signature.

● Medium → Chile RF

Santiago's transitional peri-urban fringe: moderate-density housing + partial tree cover + exposed soil best matches Freetown's mixed-use areas.

● Low → Combined RF+XGB

Broadest cool-surface coverage: Santiago irrigated parks + Rio coastal mangroves + Rio forested hillsides. Average ensemble leverages both models' vegetation sensitivities.
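One way to express the routing is to take each class score from its assigned specialist and then renormalise. The source does not say how the routed scores are combined, so the renormalisation step, the column order, and the function name below are assumptions made for this sketch.

```python
import numpy as np

CLASSES = ["High", "Medium", "Low"]


def route_specialists(p_brazil: np.ndarray, p_chile: np.ndarray) -> np.ndarray:
    """Combine per-class probabilities from two source models.

    Each input has shape (n_points, 3) with columns ordered as CLASSES.
    """
    scores = np.column_stack([
        p_brazil[:, 0],                          # High from the Brazil model
        p_chile[:, 1],                           # Medium from the Chile model
        0.5 * (p_brazil[:, 2] + p_chile[:, 2]),  # Low from the average of both
    ])
    # Renormalise so each row sums to 1 again
    return scores / scores.sum(axis=1, keepdims=True)


# Hypothetical model outputs for two grid points
p_brazil = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
p_chile = np.array([[0.5, 0.4, 0.1], [0.2, 0.6, 0.2]])
routed = route_specialists(p_brazil, p_chile)
labels = [CLASSES[i] for i in routed.argmax(axis=1)]
```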

Nelder-Mead Probability Calibration

Ensemble probabilities adjusted to match approximate known prevalence for Freetown.

Nelder-Mead simplex optimization finds temperature scaling parameter
Minimizes Brier score on validation pseudo-labels
Final submission: calibrated probabilities → argmax → hard labels
Calibrated ensemble F1: 0.58
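The calibration steps above can be sketched with SciPy's Nelder-Mead optimizer. This is a minimal illustration under stated assumptions: a single temperature applied to log-probabilities, a multiclass Brier score as the objective, and toy probabilities and pseudo-labels invented for the example.

```python
import numpy as np
from scipy.optimize import minimize


def apply_temperature(probs: np.ndarray, temperature: float) -> np.ndarray:
    """Rescale probabilities by dividing log-probabilities by a temperature."""
    logits = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def brier_score(probs: np.ndarray, labels: np.ndarray) -> float:
    """Multiclass Brier score: mean squared error against one-hot labels."""
    one_hot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - one_hot) ** 2, axis=1)))


def fit_temperature(probs: np.ndarray, labels: np.ndarray) -> float:
    """Search for the temperature that minimises the Brier score."""
    # Optimise log(T) so the temperature stays positive
    def objective(x):
        return brier_score(apply_temperature(probs, np.exp(x[0])), labels)

    result = minimize(objective, x0=[0.0], method="Nelder-Mead")
    return float(np.exp(result.x[0]))


# Hypothetical ensemble outputs and pseudo-labels (0=High, 1=Medium, 2=Low)
probs = np.array([[0.90, 0.05, 0.05], [0.80, 0.10, 0.10],
                  [0.10, 0.85, 0.05], [0.05, 0.05, 0.90]])
pseudo_labels = np.array([0, 1, 1, 2])
temperature = fit_temperature(probs, pseudo_labels)
calibrated = apply_temperature(probs, temperature)
hard_labels = calibrated.argmax(axis=1)
```

A single temperature changes confidence but not the ranking of classes within a row, so the argmax labels are the same before and after this kind of calibration.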

Rio de Janeiro, Brazil

UHI Classification (legend: High, Medium, Low, Mismatch)

Mismatches: 91 / 28,488

XGBoost Performance

F1: 0.96 · Accuracy: 96% · CV F1: 0.959

Feature Importance

Key Finding: Urban morphology dominates Rio's heat distribution. LWIR11 thermal radiance alone captures 25% of signal. Favelas in Zona Norte show peak thermal signatures — dark corrugated roofing, zero green space, and high impervious surface ratio create persistent heat islands even after sunset.

Why XGBoost for Rio?

Rio's UHI is fundamentally a material property problem. The city's heat distribution is driven by what surfaces are made of — concrete, asphalt, corrugated metal — not by topography or climate gradients. This creates sharp, non-linear decision boundaries that gradient boosting excels at learning.

XGBoost outperformed Random Forest in 5-fold CV (0.959 vs 0.947 F1) because boosting iteratively corrects misclassifications in the Medium class — the hardest to separate. Medium zones sit at the thermal boundary between hot commercial/residential core and cooler forested periphery. XGBoost's sequential error correction focuses specifically on these transitional pixels.

What the Features Reveal About Rio

Thermal Core: Zona Norte & Centro

LWIR11 at 25% importance means raw thermal radiance is the single strongest signal. This captures the physical reality: dark impervious surfaces absorb shortwave solar radiation during the day and re-emit it as longwave IR. In Zona Norte, corrugated metal roofing on favelas creates peak emissions — the metal heats rapidly, has low thermal mass, and sits surrounded by other hot surfaces with no vegetation buffer. The 12°C LST range within Rio (37°C in Tijuca Forest to 49°C in Centro) is wider than Santiago's entire city range.

The Moisture Story: NDMI at 10%

Moisture deficit is Rio's second most important signal — not vegetation greenness (NDVI), but actual moisture availability. Sealed impervious surfaces cannot evaporate water, eliminating the cooling effect of evapotranspiration. Where NDMI is lowest (Centro, industrial waterfront), temperatures are highest. Where NDMI is highest (Tijuca Forest, lagoon margins), the evaporative cooling effect suppresses LST by 8-10°C compared to adjacent built areas just hundreds of meters away.

Building Morphology: 3D Urban Form

Compactness at 6% importance quantifies the urban canyon effect. When buildings are tightly packed, longwave radiation bounces between walls and roofs rather than escaping to the sky. This is why commercial Centro (tall, compact, geometric) and favelas (dense, irregular, low-rise) both show high temperatures — for different architectural reasons. The 15 building features from 3D-GloBFP collectively add +1.1% F1 over spectral indices alone, confirming that 3D urban form contains information not visible from pure spectral signatures.

Only 91 Mismatches in 28,488

Across all 28,488 points the model matches the label 99.7% of the time: only 91 points are classified incorrectly. This full-map figure includes points the model was trained on (train F1 0.998), so it sits above the 0.96 held-out F1. Most mismatches cluster at class boundaries: coastal edges where water adjacency creates micro-climate effects, and forest-urban transition zones where a 500m grid cell captures mixed land cover. These are genuinely ambiguous pixels where the ground truth label itself might be arguable.

Santiago, Chile

UHI Classification (legend: High, Medium, Low, Mismatch)

Random Forest Performance

F1: 0.690 · Accuracy: 69.0% · Train F1: 0.713

Feature Importance

Key Finding: Topography overwhelms urbanism. Elevation + elevation×LST account for 38% of feature importance. The Andean basin creates thermal inversions — temperature varies 6-8°C over just a few kilometers. The Medium class creates spectral ambiguity: peri-urban zones share reflectance with both High and Low.

Why Random Forest for Santiago?

Santiago presented the opposite challenge from Rio. The Medium class dominates at 49.9% of all samples — and it's spectrally ambiguous. Peri-urban transitional zones share reflectance signatures with both High (urban core) and Low (vegetated foothills). Unconstrained XGBoost boosters overfit to this majority class, sacrificing High and Low accuracy to push Medium F1 higher.

A constrained Random Forest (min_samples_split=25, ccp_alpha=0.001) was chosen specifically to handle this class imbalance gracefully. Requiring at least 25 samples before a node can split, and pruning weak branches through cost-complexity pruning, keeps the trees small, so the RF learns broader decision regions that don't memorize Medium-class noise. The result: 0.690 F1 with only a 0.023 train-test gap, a stable, honest model that acknowledges Santiago's fundamental classification difficulty.
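A hedged sketch of how such a constrained forest and its train-test gap might be set up in scikit-learn. Only `min_samples_split=25` and `ccp_alpha=0.001` come from the text; the synthetic data, the 75/25 split, and macro-averaged F1 are assumptions for illustration, so the scores will not match the reported ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data with a dominant middle class
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.25, 0.5, 0.25],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hyperparameters quoted above; everything else is left at library defaults
model = RandomForestClassifier(min_samples_split=25, ccp_alpha=0.001,
                               random_state=0)
model.fit(X_train, y_train)

train_f1 = f1_score(y_train, model.predict(X_train), average="macro")
test_f1 = f1_score(y_test, model.predict(X_test), average="macro")
gap = train_f1 - test_f1  # a smaller gap suggests less overfitting
```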

Santiago's Unique Geography

Elevation Dominates Everything

At 20.4% importance, elevation is the #1 predictor — something that would never happen in flat Rio (where elevation is only 5th). Santiago sits in an Andean basin with 500m of relief across the metropolitan area. Temperature drops roughly 6°C per 700m of elevation gain: Lo Barnechea at 900m+ is naturally 4-5°C cooler than Estación Central at 520m, regardless of what buildings are there. This makes Santiago's UHI partially a topographic artifact — heat concentrates where the city happens to occupy the lowest ground.

Thermal Inversions on the Valley Floor

The Elev×LST interaction at 17.4% captures a phenomenon unique to basin cities: thermal inversions. At night, cold air drains from the Andes into the valley, creating a temperature inversion layer. During the day, the valley floor bakes under the sun. Points where BOTH elevation is low AND LST is high mark these thermal trap zones — Maipú, Puente Alto, and the industrial corridors of Renca experience this double thermal stress. This interaction term doesn't exist in Rio or Freetown because they lack basin topography.

The Medium Class Problem

Santiago's 69% F1 isn't a model failure — it reflects genuine physical ambiguity. Medium zones are transitional: moderate-density housing mixed with partial tree cover and exposed soil. In spectral space, these pixels overlap heavily with both High (commercial corridors) and Low (irrigated parks). The per-class breakdown shows the model actually classifies High (F1=0.68) and Low (F1=0.69) comparably to Medium (F1=0.70) — no single class is dramatically worse. The ceiling is set by the physical landscape, not the algorithm.

Building Data Has Limited Impact

Unlike Rio where building compactness is a top feature, Santiago's building morphology features rank outside the top 10. The 3D-GloBFP data shows Santiago has lower building density (mean 0.004) than Rio (0.005), and building heights average only 7.4m vs. Rio's 15.8m. Santiago's thermal signature is written by its geology first, and its buildings second. This has direct policy implications: urban form interventions will have smaller returns here than vegetation and ventilation strategies that work with the natural airflow patterns.

Freetown, Sierra Leone (Transfer Target)

UHI Classification (Predicted; legend: High, Medium, Low)

Ensemble Performance

F1: 0.58 · Accuracy: 58% · Precision: 0.59

Specialist Routing

High → Brazil XGBoost — tropical thermal core

Medium → Chile RF — peri-urban transition

Low → Combined — pooled vegetation signal

Key Finding: Freetown's dense informal settlements (corrugated metal roofing, minimal green space) show extreme thermal heterogeneity. Kroo Bay represents a priority cool-roof intervention zone. No building morphology data available — transfer limited to 5 spectral bands only.

Transfer Learning Challenges

Domain Gap: Freetown's informal settlement morphology (corrugated metal) is only partly represented in the training cities

Missing Data: No 3D-GloBFP building footprints for Sierra Leone — 15 features unavailable

LOCO Result: Chile→Brazil F1=0.467, Brazil→Chile F1=0.360 — neither transfers well alone

Transfer Learning: Why This Is the Hard Problem

Freetown has no ground-truth UHI labels. Every prediction comes from models trained on Rio and Santiago, cities with fundamentally different climates, topography, and urban form. The central challenge is domain shift: absolute LST ranges barely overlap between cities, and Chile's High-class temperatures can sit near Brazil's Low-class values because of differences in latitude, season, and atmospheric conditions.

The solution was a three-part pipeline: QuantileTransformer normalization removes absolute temperature scale (a pixel at the 90th percentile in Freetown maps to the same value as Rio's 90th, regardless of whether raw temperatures differ by 15°C), PCA(3) on 5 spectral bands reduces dimensionality while preserving cross-city patterns, and Source-Specialist Routing assigns each class to the source city whose thermal signature best matches.

Why Specialist Routing Works

The LOCO Diagnostic

Leave-One-City-Out revealed that neither city alone transfers well: Chile→Brazil achieves only F1=0.467 (gap 0.275), and Brazil→Chile drops to F1=0.360 (gap 0.541). The per-class breakdown told a more nuanced story. On the High class, Chile actually scored higher than Brazil (F1=0.587 vs 0.449). Looking deeper, we found that each source city excels at different classes: Brazil's tropical thermal core best matches Freetown's informal settlement hotspots, while Chile's peri-urban transitions better capture Medium-class patterns.

Only 5 Bands Transfer

Empirical search showed that only five spectral bands sustain cross-city transfer: lwir11, LST, SWIR2_NIR, blue, nir08. All 13 spectral indices (NDVI, NDBI, etc.) and all 15 building morphology features degraded transfer performance when included. These features encode city-specific patterns: Rio's NDVI distribution is tropical broadleaf while Santiago's is Mediterranean scrub — same index, completely different ecological meaning. The 3D-GloBFP building data isn't even available for Sierra Leone, removing 15 features entirely.

Pseudo-Label Limitations

With no ground truth, we constructed proxy labels using KMeans clustering on PCA-transformed features, ordered by mean LST. The reported F1 of 0.58 is measured against these pseudo-labels — not true UHI classes. KMeans follows spectral similarity, not the same class boundary rules as human-defined UHI categories. The true accuracy could be higher or lower. This is an inherent limitation of cross-city transfer to unlabeled targets.
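The pseudo-labelling idea can be sketched as KMeans clustering followed by a relabelling step that orders clusters by mean LST. This is an illustration on synthetic blobs; the function name, `n_init`, seeds, and the 0/1/2 encoding from coolest to hottest are assumptions rather than the project's settings.

```python
import numpy as np
from sklearn.cluster import KMeans


def pseudo_labels_by_lst(features: np.ndarray, lst: np.ndarray,
                         n_classes: int = 3, seed: int = 0) -> np.ndarray:
    """Cluster points, then relabel clusters 0..n-1 from coolest to hottest."""
    clusters = KMeans(n_clusters=n_classes, n_init=10,
                      random_state=seed).fit_predict(features)
    mean_lst = np.array([lst[clusters == k].mean() for k in range(n_classes)])
    order = np.argsort(mean_lst)  # cluster ids, coolest first
    rank_of_cluster = np.empty(n_classes, dtype=int)
    rank_of_cluster[order] = np.arange(n_classes)
    return rank_of_cluster[clusters]


rng = np.random.default_rng(0)
# Three synthetic blobs in a 3-component space with different LST levels
centres = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 0.0], [6.0, 0.0, 3.0]])
features = np.vstack([c + rng.normal(0, 0.3, size=(100, 3)) for c in centres])
lst = np.concatenate([rng.normal(t, 0.5, size=100) for t in (30.0, 35.0, 40.0)])
labels = pseudo_labels_by_lst(features, lst)  # 0 = Low, 1 = Medium, 2 = High
```

Because the clusters follow feature similarity, any score computed against these labels measures agreement with the clustering, not with true UHI classes.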

What Freetown's Map Tells Us

Despite transfer limitations, the spatial pattern is physically plausible: High (red) concentrates in the dense coastal lowlands around Kroo Bay and the peninsula tip. Low (green) dominates the forested hillsides above 150m elevation. Medium (amber) fills the transitional zone. This matches expected UHI physics — dense informal settlements with corrugated metal roofing at low elevation are hotter than vegetated uplands. The model may not be perfectly calibrated, but it's capturing real thermal geography.

Modeling Deep Dive

Model Comparison

BEST

XGBoost — Rio de Janeiro

Within-city classifier · 39 features

F1: 0.96 · Train F1: 0.998 · Gap: 0.039

n_estimators=300, max_depth=8, lr=0.1. Class-weighted. Only 91 mismatches out of 28,488 points.

Random Forest — Santiago

Within-city classifier · 39 features

F1: 0.690 · Train F1: 0.713 · Gap: 0.023

min_samples_split=25, ccp_alpha=0.001. Constrained RF chosen over XGB — boosters overfit Medium class.

Specialist Ensemble — Freetown

Cross-city transfer · 5 bands · PCA(3)

F1: 0.58 · Train F1: N/A · Gap: N/A

RF+XGB ensemble, class-level specialist routing, Nelder-Mead calibration on pseudo-labels.

Feature Set Ablation — Rio de Janeiro

Spectral Only (RF, 5 bands): 0.934

+ Indices (RF, 18 features): 0.936

Full RF (39 features): 0.947

XGBoost (39 features): 0.959

5-fold stratified CV. The first three rows show the RF ablation: each feature group adds incremental value. XGBoost selected as final model (+0.012 F1, about 1.2 points, over RF).

Top Features — Rio

LWIR11 25% · NDMI 10% · LST 7% · Compactness 6% · NDVI 5%

Thermal radiance dominates. Building morphology (compactness) confirms urban canyon effects. Material properties drive Rio's UHI — not topography.

Cross-City Transfer (LOCO)

Leave-One-City-Out diagnostics reveal significant domain mismatch — motivating the specialist routing approach.

Chile → Brazil: F1 0.467 (Train: 0.742 · Gap: 0.275)

Brazil → Chile: F1 0.360 (Train: 0.901 · Gap: 0.541)

Neither city alone transfers well → specialist routing by class achieves +8-12% improvement over global model.
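The reported gaps are the difference between the source-city train F1 and the F1 on the held-out city. A short check using the figures quoted above (the dictionary layout and key names are illustrative):

```python
# LOCO figures as reported in this section
loco_results = {
    "Chile -> Brazil": {"train_f1": 0.742, "transfer_f1": 0.467},
    "Brazil -> Chile": {"train_f1": 0.901, "transfer_f1": 0.360},
}

# Gap = source-city train F1 minus F1 on the held-out city
gaps = {
    direction: round(scores["train_f1"] - scores["transfer_f1"], 3)
    for direction, scores in loco_results.items()
}

worst_direction = max(gaps, key=gaps.get)
```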

Model Challenges & Limitations

Santiago Medium Class Ambiguity HIGH

Peri-urban zones share spectral signatures with both High and Low classes. Medium is a transitional category without distinct physical markers — it represents the boundary between urban core and vegetated periphery. This fundamentally limits RF to ~69% F1.

Domain Shift in Transfer HIGH

Freetown's informal settlement morphology (corrugated metal roofing) has no close analogue in the training data; Rio's favelas are the nearest match. 15 building features unavailable. Specialist routing partially mitigates but can't eliminate this gap.

Pseudo-label Circularity MEDIUM

KMeans pseudo-labels follow spectral similarity, not ground-truth UHI. Validation F1 of 0.58 may overestimate or underestimate true performance depending on cluster-class alignment.

Temporal Mismatch MEDIUM

Brazil (Jan-Mar 2023), Chile (Jan 2024), Sierra Leone (Jan-Feb 2023). Different years and seasons mean phenological state differs — affecting NDVI and EVI signals.

Insights & Recommendations

Key Findings

Urban Morphology Dominates Rio

Thermal radiance (LWIR11) and moisture deficit capture 35% of signal. Dark roofing, sealed surfaces, and building canyon effects overwhelm green space availability. Temperature is a material property problem in tropical megacities.

Topography Overwhelms Santiago

Elevation and elevation×LST interactions account for 38% of feature importance (20.4% + 17.4%). The Andean basin creates thermal inversions that exceed any urban effect: 6-8°C variation over short horizontal distances. Urban planners must work with topography, not against it.

Freetown's Unique Thermal Regime

Extreme patchiness from corrugated metal roofing: highly reflective but poor emittance. Neither Rio nor Santiago training data fully captures this pattern. The city's rapid informal growth (3-5% annually) compounds UHI every year.

Class-Level Transfer Works

Specialist routing (High→Brazil, Medium→Chile, Low→Combined) achieves 8-12% improvement over a single global model. Source-city thermal similarity matters more than geographic proximity for effective transfer.

UHI Mitigation Strategies

City-specific interventions derived from model feature importance and ground-truth analysis.

Rio de Janeiro — Cool Roofing & Canyon Ventilation

LWIR11 thermal radiance (25%) and building compactness together explain why Zona Norte and Centro are the hottest zones. Dark corrugated metal roofing on informal settlements absorbs tropical solar radiation, then re-emits as longwave IR — trapped by tightly packed buildings creating urban canyons.

Cool Roofing

Reflective coatings on corrugated metal roofs can reduce local LST by 2-4°C. Prioritize favelas in Zona Norte — lowest cost per household, highest impact where AC infrastructure is absent.

Canyon Ventilation

Mandate building setbacks in new densification areas. Compactness at 6% feature importance means urban canyons measurably trap heat — even small gaps improve airflow and radiative cooling.

Santiago — Elevation-Aware Green Corridors

Elevation alone accounts for 20.4% of feature importance, more than any building-material feature. Santiago's basin geography means thermal inversions on the valley floor trap heat, while foothills stay naturally cool. Interventions must work with topography: vegetation is most effective in the transitional Medium-class zones (Providencia, Ñuñoa) where partial tree cover already moderates but doesn't eliminate surface heating.

Peri-Urban Tree Planting

Street trees in Medium zones break thermal corridors between hot valley core and cool foothills. Target 30% green fraction for 1-2°C LST reduction. NDVI's importance grows at urban edges.

Height Limits

Enforce building height limits below the thermal inversion layer (~500m elevation gain from valley floor). Tall structures block ventilation corridors that naturally channel cooler air downslope.

Freetown — Integrated Settlement Upgrading

Freetown is the most urgent case. Dense informal settlements with corrugated metal roofing create extreme thermal heterogeneity: highly reflective surfaces with poor emittance. With 3-5% annual population growth, every new informal structure compounds UHI. Kroo Bay is the highest-priority intervention zone: it combines peak thermal signatures, high population density, and a coastal position where mangrove restoration offers dual climate resilience.

Cool-Roof Grants

Integrate roofing material upgrades into informal settlement upgrading programs. Light-colored cement or coated metal panels reduce both LST and indoor temperatures at low per-unit cost.

Mangrove Restoration

Coastal position + informal density = maximum vegetation ROI. Riparian buffers along waterways cool adjacent neighborhoods and provide flood protection — dual climate resilience.

Team

Carolina Trovisco

Filippo Beni

João Ponte

Mickias Ambaye

Youness Yachruti

Yousra Sajjad