Commit 70fcd58 · Parent: e81e82a
Delete README.md

README.md — DELETED
@@ -1,182 +0,0 @@
# MiniGPT-V

<font size='5'>**MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning**</font>

Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong☨, Mohamed Elhoseiny☨

☨equal last author

<a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPTv2.pdf'><img src='https://img.shields.io/badge/Paper-PDF-red'></a> <a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Gradio-Demo-blue'></a> [](https://www.youtube.com/watch?v=atFCwV2hSY4)


<font size='5'>**MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models**</font>

Deyao Zhu*, Jun Chen*, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

*equal contribution

<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)

*King Abdullah University of Science and Technology*

## 💡 Get help - [Q&A](https://github.com/Vision-CAIR/MiniGPT-4/discussions/categories/q-a) or [Discord 💬](https://discord.gg/5WdJkjbAeE)

## News

[Oct.13 2023] Breaking! We release our first major update: MiniGPT-v2.

[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.

## Online Demo

Click the image to chat with MiniGPT-v2 about your images
[](https://minigpt-v2.github.io/)

Click the image to chat with MiniGPT-4 about your images
[](https://minigpt-4.github.io)

## MiniGPT-v2 Examples



## MiniGPT-4 Examples

|   |   |
:-------------------------:|:-------------------------:
 | 
 | 

More examples can be found on the [project page](https://minigpt-4.github.io).

## Getting Started
### Installation

**1. Prepare the code and the environment**

Git clone our repository, create a python environment, and activate it via the following commands:

```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
```

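Before downloading any weights, a quick sanity check like the one below can confirm the environment is usable. It assumes `environment.yml` installs PyTorch; the exact package set in your checkout may differ.

```bash
# Optional sanity check (assumes environment.yml installs PyTorch):
# prints the torch version and whether a CUDA device is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
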
**2. Prepare the pretrained LLM weights**

**MiniGPT-v2** is based on Llama 2 Chat 7B. For **MiniGPT-4**, we have both Vicuna V0 and Llama 2 versions.
Download the corresponding LLM weights from one of the following Hugging Face repositories by cloning it with git-lfs.

| Llama 2 Chat 7B | Vicuna V0 13B | Vicuna V0 7B |
:------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:
[Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main)

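For example, fetching the Llama 2 Chat 7B weights with git-lfs could look roughly like the sketch below. Note that the Llama 2 repository is gated, so you would need a Hugging Face account with access to the model before the clone succeeds.

```bash
# Clone one of the weight repositories listed above with git-lfs.
# Llama 2 is gated: log in first (e.g. `huggingface-cli login`) and accept
# the license on the model page, otherwise the clone will be refused.
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
```
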
Then, set the variable *llama_model* in the model config file to the LLM weight path (a sketch of this edit follows the list below).

* For MiniGPT-v2, set the LLM path [here](minigpt4/configs/models/minigpt_v2.yaml#L15) at Line 14.

* For MiniGPT-4 (Llama2), set the LLM path [here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.

* For MiniGPT-4 (Vicuna), set the LLM path [here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.

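As an illustration, pointing MiniGPT-v2 at a locally cloned Llama 2 Chat 7B checkout could be scripted as below. The `llama_model` key name comes from the note above, but the exact line and quoting in your copy of the config may differ, so editing the file by hand is equally fine.

```bash
# Hypothetical example: point the MiniGPT-v2 config at the local LLM weights.
# Adjust the path to wherever you cloned Llama-2-7b-chat-hf.
# (GNU sed shown; on macOS use `sed -i ''`.)
sed -i 's|^\( *llama_model:\).*|\1 "/path/to/Llama-2-7b-chat-hf"|' \
  minigpt4/configs/models/minigpt_v2.yaml
```
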
**3. Prepare the pretrained model checkpoints**

Download the pretrained model checkpoints:

| MiniGPT-v2 (LLaMA-2 Chat 7B) |
|------------------------------|
| [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |

For **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file [eval_configs/minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L10) at Line 8.

| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |
|----------------------------|---------------------------|---------------------------------|
| [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) | [Download](https://drive.google.com/file/d/11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk/view?usp=sharing) |

For **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file: [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 8 for the Vicuna version, or [eval_configs/minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L10) for the Llama 2 version.

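As a rough sketch of this step, the snippet below downloads the MiniGPT-v2 checkpoint from the Google Drive link above using the third-party `gdown` tool and then points the eval config at it. `gdown` is not part of the provided environment, and the `ckpt` key name is an assumption about the eval config, so verify both against your checkout (or simply download in a browser and edit the yaml by hand).

```bash
# Hypothetical sketch: fetch the MiniGPT-v2 checkpoint and reference it in
# the eval config. Requires `pip install gdown`; the `ckpt:` key name is an
# assumption -- check eval_configs/minigptv2_eval.yaml before running.
mkdir -p checkpoints
gdown 1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl -O checkpoints/minigptv2_checkpoint.pth
sed -i 's|^\( *ckpt:\).*|\1 "checkpoints/minigptv2_checkpoint.pth"|' \
  eval_configs/minigptv2_eval.yaml
```
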
### Launching Demo Locally

For MiniGPT-v2, run

```
python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Vicuna version), run

```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Llama2 version), run

```
python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
```

To save GPU memory, the LLM loads in 8-bit by default, with a beam search width of 1.
This configuration requires about 23G of GPU memory for the 13B LLM and 11.5G for the 7B LLM.
For more powerful GPUs, you can run the model
in 16-bit by setting `low_resource` to `False` in the relevant config file (a sketch of this edit follows the list):

* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6)

* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6)

* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6)

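For instance, switching MiniGPT-v2 to 16-bit could look like the line below. The `low_resource` key name comes from the note above; check the exact line in your config before scripting the change, or edit it manually.

```bash
# Hypothetical example: disable 8-bit loading for MiniGPT-v2 (needs more VRAM).
sed -i 's|^\( *low_resource:\).*|\1 False|' eval_configs/minigptv2_eval.yaml
```
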
Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).


### Training
For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).


## Acknowledgement

+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check out this great open-source work if you are not already familiar with it!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-source LLaMA 2 language model.

If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:

```bibtex
@article{Chen2023minigpt,
  title={MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning},
  author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechun and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
  journal={github},
  year={2023}
}

@article{zhu2023minigpt,
  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2304.10592},
  year={2023}
}
```

## License
This repository is under the [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which carries its own
BSD 3-Clause License [here](LICENSE_Lavis.md).