From Hazard Detection to Risk Intelligence: TerraMind’s Path Toward Predictive Modeling

by donia-metaplanet

Introduction

Natural hazards such as wildfires, landslides, and floods are among the greatest threats to societies and ecosystems, yet predicting them remains a daunting challenge. Traditional models are often tailored to specific regions and struggle to scale globally.

Today’s risk maps are frequently outdated or static. They often fail to account for climate change, urban expansion, or shifts in land use and soil properties, all factors that can radically alter exposure to hazards, so they quickly become obsolete. Satellite data, by contrast, provide continually refreshed signals of changing conditions, allowing risk assessments to stay relevant to the threats communities face today.

To address these challenges, our framework follows a three-step strategy:

  1. Hazard Detection (current focus): Lightweight binary segmentation decoders identify hazard footprints directly from satellite imagery. Each decoder is hazard-specific but built on a shared frozen backbone encoder that converts raw imagery into rich embeddings of the land surface.
  2. Hazard Prediction (future direction): Detection outputs serve as ground-truth for models trained on temporal windows of pre-event imagery, enabling forecasts of where hazards are likely to occur.
  3. Risk Assessment (long-term vision): Predictions are integrated with exposure and vulnerability data, turning hazard probabilities into actionable risk intelligence.

Both hazard detection and prediction models use TerraMind embeddings and Thinking-in-Modalities (TiM). Our framework applies most naturally to frequent hazards that leave repeated, observable signatures. These events supply the abundant training data required for machine learning. Rare events, in contrast, lack sufficient examples for effective predictive modeling.


Trained Models for Hazard Detection

At the heart of our framework lies a simple but powerful concept: specialized U-Net decoders trained for individual hazard types, all built on top of TerraMind’s frozen backbone. The frozen backbone acts as a feature extractor, while decoders specialize in detecting footprints such as flooded areas, burn scars, or unstable slopes.
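
To make this concrete, below is a minimal PyTorch sketch of the pattern. The `FrozenBackbone` class is only a stand-in for TerraMind's encoder (the real weights and API come from the TerraMind release), and all names, channel counts, and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for TerraMind's encoder: turns imagery into embeddings.
    Illustrative only; the real encoder comes from the TerraMind release."""
    def __init__(self, in_channels=12, embed_dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=16, stride=16)

    def forward(self, x):
        return self.proj(x)  # (B, embed_dim, H/16, W/16)

class HazardDecoder(nn.Module):
    """Lightweight binary segmentation head for one hazard type."""
    def __init__(self, embed_dim=768):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(embed_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=16, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 1, kernel_size=1),  # one logit per pixel
        )

    def forward(self, emb):
        return self.head(emb)

backbone = FrozenBackbone()
for p in backbone.parameters():  # freeze the shared feature extractor
    p.requires_grad = False

# One specialized decoder per hazard, all consuming the same embeddings
decoders = {h: HazardDecoder() for h in ("flood", "burn_scar", "landslide")}

x = torch.randn(2, 12, 256, 256)       # toy multispectral tiles
with torch.no_grad():
    emb = backbone(x)                  # embeddings computed once...
flood_logits = decoders["flood"](emb)  # ...reused by every hazard head
```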

For this phase, we focused on three hazards:

  • Floods: Water vs. non-water
  • Wildfires: Burned vs. unburned areas
  • Landslides: Recent landslide vs. unchanged areas

All models were trained with TerraMind_v1_base. For the flood detection model, we additionally leveraged TiM. Training used Dice loss, which is well suited to imbalanced segmentation tasks, and AdamW for stable convergence.
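
Continuing the sketch above, a single training step might pair a soft Dice loss over the decoder logits with AdamW; the hyperparameters shown are placeholders, not the values behind the reported results.

```python
def dice_loss(logits, target, eps=1.0):
    """Soft Dice loss for binary segmentation. It scores region overlap
    rather than per-pixel accuracy, which keeps rare positive classes
    (e.g. flooded pixels) from being drowned out by the background."""
    prob = torch.sigmoid(logits).flatten(1)
    target = target.flatten(1)
    intersection = (prob * target).sum(dim=1)
    union = prob.sum(dim=1) + target.sum(dim=1)
    return 1.0 - ((2.0 * intersection + eps) / (union + eps)).mean()

# Only the decoder is optimized; the frozen backbone contributes no gradients.
optimizer = torch.optim.AdamW(decoders["flood"].parameters(),
                              lr=1e-4, weight_decay=1e-2)

target = (torch.rand(2, 1, 256, 256) > 0.9).float()  # toy water mask
loss = dice_loss(flood_logits, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```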

Model Performance

| Hazard Type | Dataset / Inputs | mIoU | F1 Score | Loss | Remarks |
| --- | --- | --- | --- | --- | --- |
| Floods | Sen1Floods11 (S1 GRD + S2 L1C) | 0.884 | 0.936 | 0.131 | Without TiM |
| Floods | Sen1Floods11 (S1 GRD + S2 L1C) | 0.901 | 0.947 | 0.089 | Improved with TiM LULC |
| Burn Scars | HLS Burn Scars (Landsat + Sentinel) | 0.885 | 0.936 | 0.092 | Surpasses benchmark of 83.6% mIoU [1] |
| Landslides | Landslide4Sense (S2 L2A + DEM) | 0.662 | 0.751 | 0.276 | Outperforms Landslide4Sense competition winner (F1 = 74.54%) [2] |

Key Insights

  1. Despite different modalities (radar, optical, DEMs), the backbone-decoder design remains robust.
  2. Even with lightweight decoders, performance matches or surpasses existing benchmarks.
  3. TiM contribution: Flood detection already benefited from TiM LULC features, and future work will further extend TiM integration to enhance wildfire and landslide models (including DEMs).



From Hazard Detection to Predictive Modeling

While current models identify past hazard footprints, their value extends far beyond event mapping [3]. Accurate segmentation enables the creation of large-scale databases of historical events. Running detection pipelines across broad regions produces a catalog of when and where hazards occurred, a foundation for predictive modeling.

The aim is not to predict exact event timing but to provide risk assessments: estimating the likelihood of hazards occurring within a defined temporal window. This is particularly valuable for insurance, disaster planning, and infrastructure resilience.

Predictive models ingest temporal sequences of pre-event imagery, learning patterns that consistently precede hazards. Recurrent U-Net variants or time-series convolutional models, fed with sliding windows of days to months of imagery, can estimate probabilities of hazard occurrence at pixel or regional scale.
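
As a rough illustration of how such sequences can be assembled (the array shapes, the window length, and the use of monthly snapshots are assumptions, not the project's configuration):

```python
import numpy as np

def sliding_windows(embeddings, labels, window=6):
    """Pair each window of pre-event embeddings with the hazard label of
    the following date, so the model learns precursors, not the event.
    embeddings: (T, C, H, W) per-date embeddings for one region
    labels:     (T, H, W) detected hazard footprints from the decoders
    """
    X, y = [], []
    for t in range(window, embeddings.shape[0]):
        X.append(embeddings[t - window:t])  # the `window` dates before t
        y.append(labels[t])                 # did a hazard appear at t?
    return np.stack(X), np.stack(y)

# Toy example: 24 monthly snapshots of a small tile
emb = np.random.rand(24, 8, 32, 32).astype(np.float32)
lab = (np.random.rand(24, 32, 32) > 0.95).astype(np.float32)
X, y = sliding_windows(emb, lab, window=6)
print(X.shape, y.shape)  # (18, 6, 8, 32, 32) (18, 32, 32)
```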

Both detection and prediction remain modular, built on TerraMind’s frozen backbone. Detection decoders, temporal predictors, and future risk scorers all leverage the same foundational embeddings and are enhanced with Thinking-in-Modalities (TiM) to better exploit multimodal information. This creates a flexible “AI store” for geospatial risk intelligence where new models can be added seamlessly.



Hazard Dataset Generation

Scaling hazard detection globally requires systematic dataset generation. Below, we outline the steps that automate this process across different hazards and inputs.

Pipeline & Preprocessing

The pipeline begins with multimodal inputs (Sentinel-1 GRD, Sentinel-2 L1C/L2A, and DEMs), which undergo cropping, normalization, and co-registration before embedding extraction. Each hazard has tailored preprocessing:

  • Flood Mapping (Sen1Floods11): Complex preprocessing to align Sentinel-1 and Sentinel-2 at the pixel level.
  • Wildfires (Sentinel-2 L2A, HLS-equivalent): Resampling to 30 m resolution to match the HLS training data.
  • Landslides (S2 L2A + DEM): Co-registration of the DEM and optical layers.

This modularity ensures domain-specific accuracy while maintaining backbone consistency.
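
As an example of one such step, below is a minimal rasterio sketch of resampling a raster to a fixed 30 m grid, in the spirit of the wildfire preprocessing. The paths and resampling kernel are placeholders, and the actual pipeline also performs cropping, normalization, and co-registration.

```python
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

def resample_to_grid(src_path, dst_path, resolution=30.0):
    """Resample a raster to a fixed ground resolution (e.g. Sentinel-2
    10 m bands down to a 30 m HLS-style grid) so all inputs share one
    pixel grid before embedding extraction."""
    with rasterio.open(src_path) as src:
        transform, width, height = calculate_default_transform(
            src.crs, src.crs, src.width, src.height, *src.bounds,
            resolution=resolution)
        profile = src.profile.copy()
        profile.update(transform=transform, width=width, height=height)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for band in range(1, src.count + 1):
                reproject(
                    source=rasterio.band(src, band),
                    destination=rasterio.band(dst, band),
                    src_transform=src.transform, src_crs=src.crs,
                    dst_transform=transform, dst_crs=src.crs,
                    # bilinear shown here; Resampling.average is another
                    # common choice when downsampling reflectance bands
                    resampling=Resampling.bilinear)
```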

Deployment

Dataset generation is driven by a simple query interface:

  • Inputs: region coordinates, temporal window, hazard type
  • Process: automatic data retrieval (via AWS registry), preprocessing, temporal sequence construction
  • Outputs: segmentation masks of flood extents, burn scars, or unstable slopes

These outputs serve two purposes: immediate event detection and population of the historical hazard archive that fuels predictive modeling.
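
A minimal sketch of such a query-driven pipeline is shown below; the dataclass fields, stage names, and stub functions are illustrative, not the actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class HazardQuery:
    """The three inputs that drive dataset generation."""
    bbox: tuple   # (min_lon, min_lat, max_lon, max_lat)
    start: str    # ISO date, e.g. "2022-01-01"
    end: str      # ISO date, e.g. "2025-10-01"
    hazard: str   # "flood" | "burn_scar" | "landslide"

def run_query(q: HazardQuery, steps: Dict[str, Callable]):
    """Chain the stages: retrieval -> preprocessing -> embedding ->
    hazard-specific decoding. Stages are injected so one driver serves
    every hazard type."""
    scenes = steps["fetch"](q.bbox, q.start, q.end)   # AWS registry pull
    tiles = steps["preprocess"](scenes)               # crop/normalize/co-register
    emb = steps["encode"](tiles)                      # frozen backbone
    return steps[q.hazard](emb)                       # segmentation masks

# Wiring with trivial stand-ins just to show the flow:
steps = {
    "fetch": lambda bbox, start, end: ["scene_1", "scene_2"],
    "preprocess": lambda scenes: [f"tile({s})" for s in scenes],
    "encode": lambda tiles: [f"emb({t})" for t in tiles],
    "flood": lambda emb: [f"mask({e})" for e in emb],
}
q = HazardQuery((5.0, 44.2, 5.2, 44.4), "2022-01-01", "2025-10-01", "flood")
print(run_query(q, steps))
```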



Scalability and Generalization

A core strength of the architecture is scalability. New decoders can be added without retraining the entire system, allowing growth to new hazards or related geospatial tasks.

The framework also supports generalization. By relying on consistent embeddings across modalities, it adapts from detection (mapping past events) to prediction (anticipating the risk of future ones). This flexibility positions TerraMind as a data-driven compass for navigating risk, adaptable to emerging climate-driven threats.


Update since 1 October

We have continued to expand the hazard-dataset generation pipeline and refine the detection models. Updated quantitative metrics will be shared once we surpass previous performance benchmarks. Below are two key developments since the beginning of October.

  1. Operational Insights: Why storing TerraMind’s embeddings matters

Running our full pipeline on a new test region, the area surrounding the Usine hydroélectrique EDF de Salignac in France, highlighted a major structural advantage of using a foundation model.

Averaging across all runs, we observed that:

  • More than 95% of total processing time was spent on preprocessing:
     - 20.4%: data download
     - 78.6%: co-registration
  • Less than 5% was spent on actual model inference.

This confirms the importance of investing in reusable embeddings. Once the encoder has produced embeddings for a region, any number of additional hazard-specific decoders can be applied at minimal extra cost. In practice, deploying the encoder once and reusing its embeddings can reduce the total compute footprint by up to 95% when adding new detection models.


This reinforces the value of TerraMind as a foundation model. The key advantage is that these embeddings are fully reusable: once computed for a region, they can support not only flood detection but also any other hazard analysis, such as wildfires or landslides, without repeating the heavy preprocessing steps. Adding a new decoder therefore requires only a small fraction of the original computation time.
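
A small sketch of this caching pattern, reusing the `backbone` and `decoders` objects from the earlier architecture sketch (the cache location and file format are assumptions):

```python
import os
import torch

EMB_DIR = "embeddings_cache"  # illustrative cache location

def get_embeddings(region_id, tile, backbone):
    """Compute embeddings for a region once, then reload them from disk
    for every subsequent decoder: new hazard heads reuse the cached
    embeddings and skip the download / co-registration / encoding cost."""
    os.makedirs(EMB_DIR, exist_ok=True)
    path = os.path.join(EMB_DIR, f"{region_id}.pt")
    if os.path.exists(path):
        return torch.load(path)      # cheap path: reuse cached embeddings
    with torch.no_grad():
        emb = backbone(tile)         # expensive path, runs only once
    torch.save(emb, path)
    return emb

emb = get_embeddings("salignac", torch.randn(1, 12, 256, 256), backbone)
for hazard, decoder in decoders.items():  # each new decoder is near-free
    mask = torch.sigmoid(decoder(emb)) > 0.5
```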

  2. From Water Detection to True Flood Mapping

Running the flood model on the Salignac region (120 Sentinel-1 and Sentinel-2 pairs, January 2022 to October 2025) provided a second important insight: the model detects water, not floods. To extract actual flood events from these raw water detections, we applied a post-processing workflow that separates permanent water from unusual water presence.

Floods are defined by their exceptional nature: water counts as flooding only where it is historically rare. For this study, we followed a simple and widely used rule from global flood mapping: a pixel is considered flooded only if its historical water frequency is below 5–10%.

This thresholding approach is commonly used when water-depth data is unavailable. It avoids relying on static river or lake masks, which can be outdated or inaccurate.
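
A minimal NumPy sketch of this rule (the 5% threshold and the array shapes are illustrative):

```python
import numpy as np

def flood_mask(water_stack, current, freq_threshold=0.05):
    """Separate floods from permanent water.
    water_stack: (T, H, W) boolean water detections over the archive
    current:     (H, W) boolean water detection for the date of interest
    A pixel counts as flooded only if it is wet now AND wet in fewer
    than `freq_threshold` of historical observations."""
    frequency = water_stack.mean(axis=0)   # per-pixel historical water frequency
    return current & (frequency < freq_threshold)

# Toy example: a permanent river channel next to a normally dry plain
history = np.zeros((120, 4, 4), dtype=bool)
history[:, :, 0] = True            # column 0: permanent river (frequency 1.0)
today = np.zeros((4, 4), dtype=bool)
today[:, 0] = True                 # river is wet, as usual -> not a flood
today[:, 1] = True                 # dry plain is wet -> anomalous
print(flood_mask(history, today))  # only column 1 is flagged as flood
```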

This aligns with how flood datasets are defined operationally. For example, in the insurance sector, a flood is understood as: “The covering of normally dry land by water from rivers, lakes, the sea, or heavy precipitation, beyond typical seasonal fluctuations.”

Distinguishing normal water extent from abnormal expansion is therefore essential for any credible flood-risk product.

[Image: water detection over the Salignac region, showing minor river overflow]

No flood was detected here: minor overflow from the river is visible, but it does not qualify as a flood under the operational definitions above.


Current Stage and Future Integration

Much of the workflow is already automated. Pipelines for floods, wildfires, and landslides are operational, enabling event mapping. Temporal sequence generation is streamlined: users specify only the parameters, while the system manages acquisition, preprocessing, and decoding.

The risk prediction component has been conceptually designed but not yet trained. Immediate next steps are:

  1. Scaling hazard detection to broader regions to build a comprehensive training database.
  2. Improving training data fidelity by prioritizing higher-resolution satellite imagery. For example, the burn scars model initially relied on harmonized Landsat–Sentinel (HLS, 30 m) data because it provided the ground-truth labels, but future work will instead use native Sentinel-2 L2A imagery (10 m). By superposing Sentinel-2 inputs onto the HLS-derived ground truth, we retain validated labels while benefiting from higher resolution (see the sketch after this list).
  3. Leveraging TiM (Thinking-in-Modalities): Flood detection already benefited from TiM LULC features, and future work will further extend TiM integration to enhance wildfire and landslide models.
  4. Training and validating predictive models using the detection archive as ground truth.
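
Below is a minimal sketch of the label-superposition step from item 2, assuming the two grids are already co-registered and related by an exact 3x scale factor (30 m to 10 m); nearest-neighbor replication keeps the validated labels untouched.

```python
import numpy as np

def upsample_labels(labels_30m, factor=3):
    """Nearest-neighbor upsampling of 30 m HLS-derived labels onto a
    10 m Sentinel-2 grid (factor 3), so validated masks can supervise
    higher-resolution inputs without altering label values."""
    return np.repeat(np.repeat(labels_30m, factor, axis=0), factor, axis=1)

coarse = np.array([[0, 1], [1, 0]], dtype=np.uint8)  # toy 2x2 burn-scar mask
fine = upsample_labels(coarse)                       # (6, 6) mask on 10 m grid
print(fine.shape)
```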

Through this staged approach, TerraMind already delivers actionable hazard detection while evolving toward predictive, interactive risk intelligence.


Contact email

[email protected]


Models

donia-metaplanet/TerraMind-Blue-Sky-Challenge


References

  1. TerraMind: Large-Scale Generative Multimodality for Earth Observation
  2. The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery
  3. Mapping Global Floods with 10 Years of Satellite Radar Data
  4. A global multimodal flood event dataset with heterogeneous text and multi-source remote sensing images
  5. A high-resolution global flood hazard model
  6. Precipitation-triggered landslide prediction in Nepal using machine learning and deep learning
  7. Refined burned-area mapping protocol using Sentinel-2 data increases estimate of 2019 Indonesian burning
  8. Forest Fire Burn Scar Mapping Based on Modified Image Super-Resolution Reconstruction via Sparse Representation
  9. Space-time modeling of cascading hazards: Chaining wildfires, rainfall and landslide events through machine learning
  10. Floods in central-eastern Europe - September 2024 | Copernicus EMS
