lcybuaa committed on
Commit
63488e0
·
verified ·
1 Parent(s): 1c81950

Update README.md

  1. README.md +104 -3
README.md CHANGED
@@ -1,3 +1,104 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - Vision
+ - Multi-model
+ - Vision-Language
+ - Remote-sensing
+ widget:
+ - src: >-
+     https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
+   candidate_labels: playing music, playing sports
+   example_title: Cat & Dog
+ ---
+
+ # Git-RSCLIP-base
+
+ [Git-RSCLIP](https://arxiv.org/pdf/2501.00895) is pre-trained on the Git-10M dataset (a global-scale remote sensing image-text pair dataset consisting of 10 million image-text pairs) at a resolution of 256x256. It was first released in [this repository](https://github.com/chen-yang-liu/Text2Earth). Its architecture is similar to that of [google/siglip-base-patch16-224](https://huggingface.co/google/siglip-base-patch16-224).
+
+ This is the **base version**. The **large version** is available here: [**Git-RSCLIP-large**](https://huggingface.co/lcybuaa/Git-RSCLIP)
+
+ ## Intended uses & limitations
+
+ You can use the raw model for tasks such as zero-shot image classification and image-text retrieval.
+
+
+ ### How to use
+
+ #### Use Git-RSCLIP to get image features
+
+ ```python
+ from PIL import Image
+ import requests
+ from transformers import AutoProcessor, AutoModel
+ import torch
+
+ # NOTE: "lcybuaa/Git-RSCLIP" is the ID of the large checkpoint linked above;
+ # substitute this repository's ID to load the base model.
+ model = AutoModel.from_pretrained("lcybuaa/Git-RSCLIP")
+ processor = AutoProcessor.from_pretrained("lcybuaa/Git-RSCLIP")
+
+ # Download an example remote sensing image
+ url = "https://github.com/Chen-Yang-Liu/PromptCC/blob/main/Example/B/train_000051.png?raw=true"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # Resize, rescale, and normalize the image into a batch of tensors
+ inputs = processor(images=image, return_tensors="pt")
+
+ # Inference only, so gradient tracking is disabled
+ with torch.no_grad():
+     image_features = model.get_image_features(**inputs)
+ ```
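Features such as `image_features` can also drive image-text retrieval, the second use case mentioned under "Intended uses & limitations". The following is a minimal sketch rather than the authors' evaluation code. It assumes that text embeddings would come from `model.get_text_features(...)`, as with SigLIP models in `transformers`, and it substitutes small hand-written NumPy arrays for real embeddings so that it runs without downloading the model. The helper name `rank_by_cosine` is hypothetical.

```python
import numpy as np


def rank_by_cosine(query, gallery):
    """Return gallery indices sorted from most to least similar, plus the scores."""
    query = np.asarray(query, dtype=np.float64)
    gallery = np.asarray(gallery, dtype=np.float64)
    # L2-normalize so that the dot product equals the cosine similarity
    query = query / np.linalg.norm(query)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = gallery @ query
    order = np.argsort(-scores)  # descending similarity
    return order, scores


# Placeholder embeddings, made up for illustration (not model outputs)
text_query = [1.0, 0.0, 1.0]
image_gallery = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

order, scores = rank_by_cosine(text_query, image_gallery)
print(order.tolist())
```

Swapping the roles of the two arguments (one image as the query, many captions as the gallery) gives text retrieval for an image with the same function.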
+
+
+ #### Zero-shot image classification
+
+ ```python
+ from PIL import Image
+ import requests
+ from transformers import AutoProcessor, AutoModel
+ import torch
+
+ # NOTE: "lcybuaa/Git-RSCLIP" is the ID of the large checkpoint linked above;
+ # substitute this repository's ID to load the base model.
+ model = AutoModel.from_pretrained("lcybuaa/Git-RSCLIP")
+ processor = AutoProcessor.from_pretrained("lcybuaa/Git-RSCLIP")
+
+ url = "https://github.com/Chen-Yang-Liu/PromptCC/blob/main/Example/B/train_000051.png?raw=true"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # Candidate labels, phrased as captions
+ texts = ["a remote sensing image of river", "a remote sensing image of houses and roads"]
+ # Pad every caption to the fixed length described under "Preprocessing"
+ inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ logits_per_image = outputs.logits_per_image  # shape: (num_images, num_texts)
+ probs = torch.sigmoid(logits_per_image)  # independent probability per caption
+ # Rank the captions for each image and keep at most the five best
+ top5_indices = torch.argsort(probs, descending=True)[:, :5].cpu().numpy()
+ top1_indices = top5_indices[:, 0]
+ # Print the best-matching caption rather than its index
+ print(f"the image 0 is '{texts[top1_indices[0]]}'")
+ ```
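The `torch.sigmoid` call in the example scores each caption independently, so the resulting probabilities need not sum to 1, and every caption can receive a low score at the same time. The sketch below contrasts this with a softmax over the same logits, using only the standard library. The logit values are made up for illustration and are not model outputs.

```python
import math


def sigmoid(x):
    """Independent probability for a single logit."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Probabilities that compete with each other and sum to 1."""
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-4.0, -1.0]  # hypothetical logits for two captions

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)
best = max(range(len(logits)), key=lambda i: sigmoid_probs[i])

print("sigmoid:", [round(p, 3) for p in sigmoid_probs])
print("softmax:", [round(p, 3) for p in softmax_probs])
print("best caption index:", best)
```

Both functions preserve the order of the logits, so the top-ranked caption is the same either way. Only the interpretation of the scores differs.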
+
+ For more code examples, see the [SigLIP documentation](https://huggingface.co/transformers/main/model_doc/siglip.html#).
+
+
+ ## Training procedure
+
+ ### Training data
+
+ Git-RSCLIP is pre-trained on the Git-10M dataset, a global-scale remote sensing image-text pair dataset consisting of 10 million image-text pairs [(Liu et al., 2024)](https://github.com/chen-yang-liu/Text2Earth).
+
+ ### Preprocessing
+
+ Images are resized to the same resolution (224x224), and their pixel values are rescaled and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
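With a mean and standard deviation of 0.5 per channel, normalization maps pixel values from [0, 1] to [-1, 1]. The sketch below illustrates only the rescale and normalize steps with NumPy; resizing is omitted. It assumes that rescaling means dividing 8-bit pixel values by 255, which is the usual convention but is not stated in this README, and the function name `rescale_and_normalize` is hypothetical. In practice the `processor` object performs these steps.

```python
import numpy as np

MEAN = np.array([0.5, 0.5, 0.5])
STD = np.array([0.5, 0.5, 0.5])


def rescale_and_normalize(image_uint8):
    """Map an (H, W, 3) uint8 image to per-channel normalized float values."""
    pixels = image_uint8.astype(np.float32) / 255.0  # assumed rescale to [0, 1]
    return (pixels - MEAN) / STD  # MEAN and STD broadcast over the channel axis


# Synthetic 224x224 RGB image: top half white, bottom half black
image = np.zeros((224, 224, 3), dtype=np.uint8)
image[:112] = 255

normalized = rescale_and_normalize(image)
print(normalized.shape, normalized.min(), normalized.max())
```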
+
+ Texts are tokenized and padded to the same length (64 tokens).
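Padding to a fixed length means appending a padding ID until every sequence has 64 entries. The sketch below illustrates the idea with plain Python lists. The token IDs and the value `PAD_ID = 1` are made up and do not come from the model's tokenizer, and truncating longer sequences to 64 tokens is an assumption. In practice, passing `padding="max_length"` to the `processor` handles this step.

```python
MAX_LENGTH = 64
PAD_ID = 1  # hypothetical padding ID, not taken from the real tokenizer


def pad_to_max_length(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Truncate to max_length, then right-pad with pad_id up to max_length."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


# Two made-up token sequences of different lengths
batch = [[12, 845, 3], [7, 7, 90, 4, 511]]
padded = [pad_to_max_length(ids) for ids in batch]

print([len(ids) for ids in padded])
```

Equal lengths allow the sequences to be stacked into a single tensor of shape (batch size, 64).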
+
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @misc{liu2025text2earthunlockingtextdrivenremote,
+       title={Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model},
+       author={Chenyang Liu and Keyan Chen and Rui Zhao and Zhengxia Zou and Zhenwei Shi},
+       year={2025},
+       eprint={2501.00895},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2501.00895},
+ }
+ ```