Bridging Semantics and Geometry: A Decoupled LVLM–SAM Framework for Reasoning Segmentation in Remote Sensing

This is the 7B model of Think2Seg-RS, a decoupled framework for reasoning segmentation in remote sensing (RS) imagery.

The core idea is to separate high-level semantic reasoning from low-level geometric execution. Specifically, we train an LVLM prompter (e.g., Qwen2.5-VL) to control a frozen Segment Anything Model (SAM2) through structured geometric prompts. Trained with a result-oriented reinforcement learning objective, the LVLM learns to translate abstract semantic reasoning into spatially grounded actions, achieving state-of-the-art performance on the EarthReason dataset.
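The hand-off between the prompter and the frozen segmenter can be sketched as follows. This is a minimal illustration, not the released implementation: the `<answer>…</answer>` JSON payload, the `GeometricPrompt` type, and the helpers `parse_prompts` and `mask_iou` are all hypothetical names chosen for this example, and the IoU function only stands in for the kind of outcome-based signal a result-oriented objective might use. The actual prompt format and reward are defined in the GitHub repository.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class GeometricPrompt:
    """One structured prompt for the segmenter (hypothetical schema)."""
    box: tuple                      # (x1, y1, x2, y2) in pixel coordinates
    points: list = field(default_factory=list)  # optional (x, y) clicks


def parse_prompts(response: str) -> list:
    """Extract prompts from an assumed <answer>[...]</answer> JSON payload.

    Returns an empty list when the payload is missing or malformed, so a
    caller can treat unparseable LVLM output as "no prompt".
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(1))
        prompts = []
        for item in items:
            box = tuple(int(v) for v in item["box"])
            if len(box) != 4:
                continue  # skip boxes that are not (x1, y1, x2, y2)
            points = [tuple(int(v) for v in p) for p in item.get("points", [])]
            prompts.append(GeometricPrompt(box=box, points=points))
        return prompts
    except (json.JSONDecodeError, KeyError, TypeError, ValueError, AttributeError):
        return []


def mask_iou(pred, target) -> float:
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


# Example LVLM output in the assumed format: free-form reasoning, then prompts.
response = (
    "<think>The runway lies along the northern edge of the airport.</think>"
    '<answer>[{"box": [12, 30, 480, 96], "points": [[240, 60]]}]</answer>'
)
prompts = parse_prompts(response)
```

In such a setup, each parsed box and point set would be passed to the frozen segmenter, and the resulting mask would be scored against the ground truth (for example with `mask_iou`) to produce the training signal for the prompter alone.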

For more details, code, and the complete framework, please visit our GitHub repository.

Format: Safetensors · Model size: 8B params · Tensor type: BF16

