Refactor the code to support hf model module format & support grefcoco dataset
Former-commit-id: 3dbe02e218af71e0ca6bcb8b6220247a617bfdcb
- README.md +11 -3
- requirements.txt +7 -4
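
The "hf model module format" in the commit title presumably means the refactored model loads through the standard Hugging Face `from_pretrained` interface. The snippet below is a minimal, illustrative sketch of what loading the released `xinlai/LISA-13B-llama2-v1` checkpoint could look like under that format; the `model.LISA.LISAForCausalLM` import path and the keyword arguments are assumptions for illustration, not something this diff confirms.

```python
# Illustrative sketch only: assumes the refactor exposes a LISAForCausalLM class
# in the repo's model package and that the v1 checkpoint loads through the
# standard Hugging Face from_pretrained interface.
import torch
from transformers import AutoTokenizer

from model.LISA import LISAForCausalLM  # assumed import path after the refactor

version = "xinlai/LISA-13B-llama2-v1"

tokenizer = AutoTokenizer.from_pretrained(version)
model = LISAForCausalLM.from_pretrained(
    version,
    torch_dtype=torch.float16,   # half precision so the 13B weights fit on one GPU
    low_cpu_mem_usage=True,      # avoid a second full copy of the weights in CPU RAM
)
model = model.cuda().eval()
```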
README.md
CHANGED
@@ -1,3 +1,6 @@
+[](http://103.170.5.190:7860/)
+[](https://openxlab.org.cn/apps/detail/openxlab-app/LISA)
+
# LISA: Reasoning Segmentation via Large Language Model

<font size=7><div align='center'><b>LISA</b>: Large <b>L</b>anguage <b>I</b>nstructed <b>S</b>egmentation <b>A</b>ssistant</div></font>
@@ -69,14 +72,18 @@
<p align="center"> <img src="imgs/fig_overview.jpg" width="100%"> </p>

## News
+<<<<<<< HEAD
- [x] [2023.8.23] Refactor code, and release new model [LISA-13B-llama2-v1](https://huggingface.co/xinlai/LISA-13B-llama2-v1). Welcome to check it out!
+=======
+- [x] [2023.8.14] Online demo of LISA is also in [OpenXLab apps](https://openxlab.org.cn/apps/detail/openxlab-app/LISA). Thanks for their support!
+>>>>>>> 0e26916dff58c9f2eaad981f127e1171a7c1e3e0
- [x] [2023.8.9] Training code is released!
- [x] [2023.8.4] [Online Demo](http://103.170.5.190:7860/) is released!
- [x] [2023.8.4] [*ReasonSeg* Dataset](https://drive.google.com/drive/folders/125mewyg5Ao6tZ3ZdJ-1-E3n04LGVELqy?usp=sharing) and the [LISA-13B-llama2-v0-explanatory](https://huggingface.co/xinlai/LISA-13B-llama2-v0-explanatory) model are released!
- [x] [2023.8.3] Inference code and the [LISA-13B-llama2-v0](https://huggingface.co/xinlai/LISA-13B-llama2-v0) model are released. Welcome to check out!
- [x] [2023.8.2] [Paper](https://arxiv.org/pdf/2308.00692.pdf) is released and GitHub repo is created.

-**LISA: Reasoning Segmentation
+**LISA: Reasoning Segmentation via Large Language Model [[Paper](https://arxiv.org/abs/2308.00692)]** <br />
[Xin Lai](https://scholar.google.com/citations?user=tqNDPA4AAAAJ&hl=zh-CN),
[Zhuotao Tian](https://scholar.google.com/citations?user=mEjhz-IAAAAJ&hl=en),
[Yukang Chen](https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en),
@@ -104,6 +111,7 @@ For more details, please refer to the [paper](https://arxiv.org/abs/2308.00692).
## Installation
```
pip install -r requirements.txt
+pip install flash-attn --no-build-isolation
```

## Training
@@ -116,7 +124,7 @@ The training data consists of 4 types of data:

3. Referring segmentation datasets: [refCOCO](https://web.archive.org/web/20220413011718/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip), [refCOCO+](https://web.archive.org/web/20220413011656/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip), [refCOCOg](https://web.archive.org/web/20220413012904/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip), [refCLEF](https://web.archive.org/web/20220413011817/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refclef.zip) ([saiapr_tc-12](https://web.archive.org/web/20220515000000/http://bvisionweb1.cs.unc.edu/licheng/referit/data/images/saiapr_tc-12.zip))

-Note: the
+Note: the original links of refCOCO series data are down, and we update them with new ones. If the download speed is super slow or unstable, we also provide a [OneDrive link](https://mycuhk-my.sharepoint.com/:f:/g/personal/1155154502_link_cuhk_edu_hk/Em5yELVBvfREodKC94nOFLoBLro_LPxsOxNV44PHRWgLcA?e=zQPjsc) to download. **You must also follow the rules that the original datasets require.**

4. Visual Question Answering dataset: [LLaVA-Instruct-150k](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_instruct_150k.json)

@@ -221,7 +229,7 @@ deepspeed --master_port=24999 train_ds.py \

## Inference

-To chat with [LISA-13B-llama2-v1](https://huggingface.co/xinlai/LISA-13B-llama2-v1) or [LISA-13B-llama2-v1-explanatory] (Coming Soon):
+To chat with [LISA-13B-llama2-v1](https://huggingface.co/xinlai/LISA-13B-llama2-v1) or [LISA-13B-llama2-v1-explanatory](https://huggingface.co/xinlai/LISA-13B-llama2-v1-explanatory) (Coming Soon):
(Note that `chat.py` currently does not support `v0` models (i.e., `LISA-13B-llama2-v0` and `LISA-13B-llama2-v0-explanatory`); please first check out the legacy version with `git checkout 0e26916` to use the `v0` models.)
```
CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1'
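
One practical note on the Installation hunk above: `flash_attn` is dropped from `requirements.txt` (see the next file) and installed afterwards with `pip install flash-attn --no-build-isolation`, since flash-attn compiles against the torch build that is already installed. A small, optional sanity check after running both install commands (assuming a CUDA-capable machine) could look like the following; it is illustrative only and not part of the repo:

```python
# Optional post-install check: confirms torch sees a GPU and that flash_attn
# imports against the installed torch build.
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
print("flash_attn:", flash_attn.__version__)
```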
requirements.txt
CHANGED
@@ -1,6 +1,11 @@
+--extra-index-url https://download.pytorch.org/whl/cu117
+torch==2.0.1
+torchvision==0.15.2
+packaging
+sentencepiece
+peft==0.4.0
einops==0.4.1
fastapi==0.100.1
-flash_attn==2.0.4
gradio==3.39.0
markdown2==2.4.10
numpy==1.24.2
@@ -11,9 +16,7 @@ pycocotools==2.0.6
ray==2.6.1
Requests==2.31.0
shortuuid==1.0.11
-torch==1.11.0+cu113
-torchvision==0.12.0+cu113
tqdm==4.64.1
-transformers==4.
+transformers==4.31.0
uvicorn==0.23.2
bitsandbytes==0.41.1
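
Net effect of the requirements change: the CUDA flavor is no longer baked into the version pins (the `+cu113` suffixes are gone); instead, the `--extra-index-url` line points pip at the CUDA 11.7 wheel index, so `pip install -r requirements.txt` resolves `torch==2.0.1` and `torchvision==0.15.2` as cu117 builds, `transformers` is pinned to 4.31.0, and `peft`, `sentencepiece`, and `packaging` are added. A quick, optional version check (illustrative only, not part of the repo):

```python
# Optional check that the pinned stack resolved as intended.
# Expected values follow the updated requirements.txt.
import torch
import torchvision
import transformers
import peft

print("torch:", torch.__version__)                 # expected 2.0.1 (+cu117 local tag)
print("torch CUDA build:", torch.version.cuda)     # expected 11.7 via the extra index
print("torchvision:", torchvision.__version__)     # expected 0.15.2
print("transformers:", transformers.__version__)   # expected 4.31.0
print("peft:", peft.__version__)                   # expected 0.4.0
```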