X-Lai committed
Commit a819952 · 2 Parent(s): e5c9ee0 4430259

Refactor the code to support the HF model module format & support the gRefCOCO dataset


Former-commit-id: 3dbe02e218af71e0ca6bcb8b6220247a617bfdcb
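For context on the commit message: "HF model module format" refers to packaging the model so that a checkpoint can be pulled straight from the Hugging Face Hub with `from_pretrained`. Below is a minimal, hedged sketch of what that enables; the `model.LISA` import path and the `LISAForCausalLM` class name are assumptions for illustration (they are not shown in this diff), while the repo id comes from the README changes below.

```python
# Minimal sketch: loading a LISA checkpoint in Hugging Face module format.
# Assumption: the refactored repo exposes a LISAForCausalLM class under `model`
# (import path and class name are illustrative, not confirmed by this diff).
import torch
from transformers import AutoTokenizer

from model.LISA import LISAForCausalLM  # assumed module layout after the refactor

version = "xinlai/LISA-13B-llama2-v1"   # repo id taken from the README diff below
tokenizer = AutoTokenizer.from_pretrained(version)
model = LISAForCausalLM.from_pretrained(
    version,
    torch_dtype=torch.float16,   # half precision so a 13B model fits on one GPU
    low_cpu_mem_usage=True,
)
model = model.cuda().eval()
```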

Files changed (2):
  1. README.md +11 -3
  2. requirements.txt +7 -4
README.md CHANGED
@@ -1,3 +1,6 @@
+[![Gradio](https://img.shields.io/badge/Gradio-Online%20Demo-blue)](http://103.170.5.190:7860/)
+[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/openxlab-app/LISA)
+
 # LISA: Reasoning Segmentation via Large Language Model
 
 <font size=7><div align='center'><b>LISA</b>: Large <b>L</b>anguage <b>I</b>nstructed <b>S</b>egmentation <b>A</b>ssistant</div></font>
@@ -69,14 +72,18 @@
 <p align="center"> <img src="imgs/fig_overview.jpg" width="100%"> </p>
 
 ## News
+<<<<<<< HEAD
 - [x] [2023.8.23] Refactor code, and release new model [LISA-13B-llama2-v1](https://huggingface.co/xinlai/LISA-13B-llama2-v1). welcome to check it out!
+=======
+- [x] [2023.8.14] Online demo of LISA is also in [OpenXLab apps](https://openxlab.org.cn/apps/detail/openxlab-app/LISA). Thanks for their support!
+>>>>>>> 0e26916dff58c9f2eaad981f127e1171a7c1e3e0
 - [x] [2023.8.9] Training code is released!
 - [x] [2023.8.4] [Online Demo](http://103.170.5.190:7860/) is released!
 - [x] [2023.8.4] [*ReasonSeg* Dataset](https://drive.google.com/drive/folders/125mewyg5Ao6tZ3ZdJ-1-E3n04LGVELqy?usp=sharing) and the [LISA-13B-llama2-v0-explanatory](https://huggingface.co/xinlai/LISA-13B-llama2-v0-explanatory) model are released!
 - [x] [2023.8.3] Inference code and the [LISA-13B-llama2-v0](https://huggingface.co/xinlai/LISA-13B-llama2-v0) model are released. Welcome to check out!
 - [x] [2023.8.2] [Paper](https://arxiv.org/pdf/2308.00692.pdf) is released and GitHub repo is created.
 
-**LISA: Reasoning Segmentation Via Large Language Model [[Paper](https://arxiv.org/abs/2308.00692)]** <br />
+**LISA: Reasoning Segmentation via Large Language Model [[Paper](https://arxiv.org/abs/2308.00692)]** <br />
 [Xin Lai](https://scholar.google.com/citations?user=tqNDPA4AAAAJ&hl=zh-CN),
 [Zhuotao Tian](https://scholar.google.com/citations?user=mEjhz-IAAAAJ&hl=en),
 [Yukang Chen](https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en),
@@ -104,6 +111,7 @@ For more details, please refer to the [paper](https://arxiv.org/abs/2308.00692).
 ## Installation
 ```
 pip install -r requirements.txt
+pip install flash-attn --no-build-isolation
 ```
 
 ## Training
@@ -116,7 +124,7 @@ The training data consists of 4 types of data:
 
 3. Referring segmentation datasets: [refCOCO](https://web.archive.org/web/20220413011718/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip), [refCOCO+](https://web.archive.org/web/20220413011656/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip), [refCOCOg](https://web.archive.org/web/20220413012904/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip), [refCLEF](https://web.archive.org/web/20220413011817/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refclef.zip) ([saiapr_tc-12](https://web.archive.org/web/20220515000000/http://bvisionweb1.cs.unc.edu/licheng/referit/data/images/saiapr_tc-12.zip))
 
-Note: the origianl links of refCOCO series data are down, and we update them with new ones
+Note: the original links of refCOCO series data are down, and we update them with new ones. If the download speed is super slow or unstable, we also provide a [OneDrive link](https://mycuhk-my.sharepoint.com/:f:/g/personal/1155154502_link_cuhk_edu_hk/Em5yELVBvfREodKC94nOFLoBLro_LPxsOxNV44PHRWgLcA?e=zQPjsc) to download. **You must also follow the rules that the original datasets require.**
 
 4. Visual Question Answering dataset: [LLaVA-Instruct-150k](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_instruct_150k.json)
 
@@ -221,7 +229,7 @@ deepspeed --master_port=24999 train_ds.py \
 
 ## Inference
 
-To chat with [LISA-13B-llama2-v1](https://huggingface.co/xinlai/LISA-13B-llama2-v1) or [LISA-13B-llama2-v1-explanatory] (Coming Soon):
+To chat with [LISA-13B-llama2-v1](https://huggingface.co/xinlai/LISA-13B-llama2-v1) or [LISA-13B-llama2-v1-explanatory](https://huggingface.co/xinlai/LISA-13B-llama2-v1-explanatory) (Coming Soon):
 (Note that `chat.py` currently does not support `v0` models (i.e., `LISA-13B-llama2-v0` and `LISA-13B-llama2-v0-explanatory`), please first checkout to the legacy version repo `git checkout 0e26916` to use the `v0` models.)
 ```
 CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1'
requirements.txt CHANGED
@@ -1,6 +1,11 @@
+--extra-index-url https://download.pytorch.org/whl/cu117
+torch==2.0.1
+torchvision==0.15.2
+packaging
+sentencepiece
+peft==0.4.0
 einops==0.4.1
 fastapi==0.100.1
-flash_attn==2.0.4
 gradio==3.39.0
 markdown2==2.4.10
 numpy==1.24.2
@@ -11,9 +16,7 @@ pycocotools==2.0.6
 ray==2.6.1
 Requests==2.31.0
 shortuuid==1.0.11
-torch==1.11.0+cu113
-torchvision==0.12.0+cu113
 tqdm==4.64.1
-transformers==4.29.0
+transformers==4.31.0
 uvicorn==0.23.2
 bitsandbytes==0.41.1
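After installing from the updated requirements.txt (which now pulls CUDA 11.7 wheels for torch via `--extra-index-url`) plus the separate `flash-attn` install added to the README, a quick check along these lines confirms the new pins took effect. This is an illustrative snippet, not part of the repo.

```python
# Sanity check for the environment pinned by this commit:
# torch 2.0.1 (cu117 wheels), transformers 4.31.0, and flash-attn
# installed separately with --no-build-isolation.
import torch
import transformers

print("torch:", torch.__version__)                 # expected: 2.0.1+cu117
print("cuda available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)   # expected: 4.31.0

try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except ImportError:
    print("flash_attn missing; run: pip install flash-attn --no-build-isolation")
```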