---
license: creativeml-openrail-m
datasets:
- DarthReca/crisislandmark
language:
- en
library_name: torchgeo
tags:
- remote-sensing
- text-to-image-retrieval
- multimodal
- geospatial
- SAR
- multispectral
- crisis-management
- earth-observation
- contrastive-learning
---
# CLOSP

CLOSP (Contrastive Language Optical SAR Pretraining) is a multimodal architecture designed for text-to-image retrieval. 
It creates a unified embedding space for text, Sentinel-2 (MSI), and Sentinel-1 (SAR) data.
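
As a rough illustration of what retrieval in a shared embedding space looks like, the sketch below ranks a small gallery of image embeddings against a text query by cosine similarity. The three-dimensional vectors, the tile names, and the `retrieve` helper are all hypothetical placeholders. In practice the embeddings would come from the CLOSP text and visual encoders, whose loading code is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, image_embeddings, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical): because both sensors share one space,
# Sentinel-1 and Sentinel-2 tiles can sit in the same gallery.
gallery = {
    "s2_tile_a": [0.9, 0.1, 0.0],
    "s1_tile_b": [0.1, 0.9, 0.1],
    "s2_tile_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for an encoded text query

results = retrieve(query, gallery, top_k=2)
print([name for name, _ in results])
```

Because SAR and MSI embeddings live in the same space as the text embeddings, a single query can be scored against tiles from either sensor without a separate index per modality.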

This repository provides each of the visual encoders separately, in PyTorch format.

## Model Details
The model uses three separate encoders: one for text, one for Sentinel-1 (SAR) data, and one for Sentinel-2 (MSI) data. 
During training, it uses a contrastive objective to align the textual embeddings with the corresponding visual embeddings (either SAR or MSI).
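
The contrastive objective can be sketched as a symmetric, CLIP-style InfoNCE loss over a batch of paired text and visual embeddings. This is a framework-free illustration under stated assumptions, not the training code from the repository: the symmetric form, the temperature of `0.07`, and the function names are all assumptions, and the paper may define the objective differently.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(text_embs, visual_embs, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form; temperature is an assumption).

    text_embs[i] and visual_embs[i] are a matching pair; each visual
    embedding may come from either the SAR or the MSI encoder.
    """
    text = [l2_normalize(t) for t in text_embs]
    visual = [l2_normalize(v) for v in visual_embs]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(t, v)) / temperature for v in visual]
        for t in text
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    text_to_visual = cross_entropy_diagonal(logits)
    visual_to_text = cross_entropy_diagonal(logits_transposed)
    return 0.5 * (text_to_visual + visual_to_text)


# Correctly paired embeddings should score a lower loss than mismatched ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing a loss of this kind pulls each caption toward its own image and pushes it away from the other images in the batch, which is what makes nearest-neighbor search in the embedding space usable for retrieval.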


- **Developed by:** Daniele Rege Cambrin
- **Model type:** CLOSP
- **Language(s) (NLP):** English
- **License:** CreativeML-OpenRAIL-M
- **Repository:** [GitHub](https://github.com/DarthReca/closp)
- **Paper:** [arXiv](https://arxiv.org/abs/2507.10403)

## Citation

```bibtex
@misc{cambrin2025texttoremotesensingimageretrievalrgbsources,
      title={Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources}, 
      author={Daniele Rege Cambrin and Lorenzo Vaiani and Giuseppe Gallipoli and Luca Cagliero and Paolo Garza},
      year={2025},
      eprint={2507.10403},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.10403}, 
}
```