Update README.md
README.md
CHANGED
@@ -12,7 +12,25 @@ language:
|
|
12 |
<!-- Provide a quick summary of what the model is/does. -->
|
13 |
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
|
14 |
|
15 |
-
|
16 |
In recent years, the rapid development of multimodal large language models (MLLMs) has significantly advanced research toward artificial general intelligence (AGI).
|
17 |
By utilizing vast multimodal data from the internet and combining it with self-supervised learning techniques,
|
18 |
MLLMs have demonstrated exceptional capabilities in visual perception and understanding human language instructions.
|
@@ -44,13 +62,15 @@ RoboBrain consists of three key robotic capabilities for long-horizon manipulati
|
|
44 |
Based on the ShareRobot dataset we have constructed, RoboBrain has achieved state-of-the-art performance in multiple robotic benchmarks through a well-designed multi-stage training process,
|
45 |
realizing a cognitive leap from abstract instruction understanding to concrete action expression.
|
46 |
<p align="center">
|
47 |
-
<img src="https://superrobobrain.github.io/images/RoboBrain_teaser.png"
|
48 |
</p>
|
49 |
|
50 |
|
|
|
51 |
|
|
|
52 |
|
53 |
-
|
54 |
```python
|
55 |
import torch
|
56 |
from transformers import AutoProcessor, AutoModelForPreTraining
|
@@ -92,11 +112,55 @@ inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
|
|
92 |
print("Generating output...")
|
93 |
output = model.generate(**inputs, max_new_tokens=250)
|
94 |
print(processor.decode(output[0][2:], skip_special_tokens=True))
|
|
|
95 |
```
|
96 |
|
97 |
|
98 |
|
99 |
-
## Citation
|
100 |
|
101 |
```
|
102 |
@article{ji2025robobrain,
|
|
|
12 |
<!-- Provide a quick summary of what the model is/does. -->
|
13 |
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
|
14 |
|
15 |
+
<p align="center">
|
16 |
+
⭐️ <a href="https://superrobobrain.github.io/">Project</a>   |   🤗 <a href="https://huggingface.co/BAAI/RoboBrain/">Hugging Face</a>   |   🤖 <a href="https://superrobobrain.github.io/">ModelScope</a>   |   🌎 <a href="https://github.com/FlagOpen/ShareRobot">Dataset</a>   |   📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a>   |   💬 <a href="https://superrobobrain.github.io/">WeChat</a>
|
17 |
+
</p>
|
18 |
+
<p align="center">
|
19 |
+
🎯 <a href="">RoboOS (Coming Soon)</a>: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain.
|
20 |
+
</p>
|
21 |
+
<p align="center">
|
22 |
+
🎯 <a href="https://tanhuajie.github.io/ReasonRFT/">ReasonRFT</a>: Exploring a New RFT Paradigm to Enhance RoboBrain's Visual Reasoning Capabilities.
|
23 |
+
</p>
|
24 |
+
|
25 |
+
## 🤗 Checkpoints
|
26 |
+
| Models | Checkpoint | Description |
|
27 |
+
|----------------------|----------------------------------------------------------------|------------------------------------------------------------|
|
28 |
+
| Planning Model | [🤗 Planning CKPTs](https://huggingface.co/BAAI/RoboBrain/) | Used for Planning prediction in our paper |
|
29 |
+
| Affordance (A-LoRA) | [🤗 Affordance CKPTs](https://superrobobrain.github.io/) | Used for Affordance prediction in our paper *(Coming Soon)* |
|
30 |
+
| Trajectory (T-LoRA) | [🤗 Trajectory CKPTs](https://superrobobrain.github.io/) | Used for Trajectory prediction in our paper *(Coming Soon)* |
|
31 |
+
|
32 |
+
## 🔥 Introduction
|
33 |
+
|
34 |
In recent years, the rapid development of multimodal large language models (MLLMs) has significantly advanced research toward artificial general intelligence (AGI).
|
35 |
By utilizing vast multimodal data from the internet and combining it with self-supervised learning techniques,
|
36 |
MLLMs have demonstrated exceptional capabilities in visual perception and understanding human language instructions.
|
|
|
62 |
Based on the ShareRobot dataset we have constructed, RoboBrain has achieved state-of-the-art performance in multiple robotic benchmarks through a well-designed multi-stage training process,
|
63 |
realizing a cognitive leap from abstract instruction understanding to concrete action expression.
|
64 |
<p align="center">
|
65 |
+
<img src="https://superrobobrain.github.io/images/RoboBrain_teaser.png" />
|
66 |
</p>
|
67 |
|
68 |
|
69 |
+
## 🤖 Inference
|
70 |
|
71 |
+
### Option 1: HF inference
|
72 |
|
73 |
+
#### Run a Python script as an example:
|
74 |
```python
|
75 |
import torch
|
76 |
from transformers import AutoProcessor, AutoModelForPreTraining
|
|
|
112 |
print("Generating output...")
|
113 |
output = model.generate(**inputs, max_new_tokens=250)
|
114 |
print(processor.decode(output[0][2:], skip_special_tokens=True))
|
115 |
+
|
116 |
```
|
117 |
|
118 |
+
### Option 2: vLLM inference
|
119 |
+
#### Install and launch vLLM
|
120 |
+
```bash
|
121 |
+
# Install the vllm package
|
122 |
+
pip install vllm==0.6.6.post1
|
123 |
+
|
124 |
+
# Launch RoboBrain with vLLM
|
125 |
+
python -m vllm.entrypoints.openai.api_server --model BAAI/RoboBrain --served-model-name robobrain --max_model_len 16384 --limit_mm_per_prompt image=8
|
126 |
+
```
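
Loading the checkpoint can take a while, so it may help to wait until the server answers before sending requests. The sketch below is not part of the original README: it polls the OpenAI-compatible `/v1/models` endpoint with the standard library only. The helper name `wait_for_server` and its `timeout` and `interval` defaults are assumptions, and it assumes the server was started without an API key, as in the command above.

```python
import json
import time
import urllib.request


def wait_for_server(base_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `<base_url>/models` until it answers or `timeout` seconds elapse.

    `base_url` is expected to look like "http://127.0.0.1:8000/v1".
    Returns True once the endpoint responds with JSON, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                payload = json.load(resp)
            # The OpenAI-style response lists served models under "data".
            served = [m.get("id") for m in payload.get("data", [])]
            print("Server is up, serving:", served)
            return True
        except (OSError, ValueError, AttributeError):
            # Connection refused, timed out, or not valid JSON yet: retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_server("http://127.0.0.1:8000/v1", timeout=300)` would block for up to five minutes and should list `robobrain` once the launch command above has finished loading.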
|
127 |
|
128 |
+
#### Run python script as example:
|
129 |
+
```python
|
130 |
+
from openai import OpenAI
|
131 |
+
import base64
|
132 |
+
|
133 |
+
openai_api_key = "robobrain-123123"
|
134 |
+
openai_api_base = "http://127.0.0.1:8000/v1"
|
135 |
+
|
136 |
+
client = OpenAI(
|
137 |
+
api_key=openai_api_key,
|
138 |
+
base_url=openai_api_base,
|
139 |
+
)
|
140 |
+
|
141 |
+
response = client.chat.completions.create(
|
142 |
+
model="robobrain",
|
143 |
+
messages=[
|
144 |
+
{
|
145 |
+
"role": "user",
|
146 |
+
"content": [
|
147 |
+
{
|
148 |
+
"type": "image_url",
|
149 |
+
"image_url": {
|
150 |
+
"url": "http://images.cocodataset.org/val2017/000000039769.jpg"
|
151 |
+
},
|
152 |
+
},
|
153 |
+
{"type": "text", "text": "What is shown in this image?"},
|
154 |
+
],
|
155 |
+
},
|
156 |
+
]
|
157 |
+
)
|
158 |
+
|
159 |
+
content = response.choices[0].message.content
|
160 |
+
print(content)
|
161 |
+
```
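
The script above imports `base64` but sends a remote image URL. If the image is a local file, one common approach with OpenAI-compatible servers is to embed it as a base64 data URL. The following is a minimal sketch under that assumption, not the authors' code: the helper names `image_to_data_url` and `build_messages` are hypothetical, and the JPEG fallback MIME type is a guess for files with unknown extensions.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Read a local image and return it as a `data:<mime>;base64,...` URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "image/jpeg"  # assumption: fall back to JPEG when the type is unknown
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(image_path: str, question: str) -> list:
    """Build a single-turn chat payload with one local image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
                {"type": "text", "text": question},
            ],
        }
    ]
```

The result can be passed as the `messages` argument of `client.chat.completions.create(model="robobrain", ...)` in place of the remote-URL payload, for instance `build_messages("scene.jpg", "What is shown in this image?")`, where `scene.jpg` is a placeholder path.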
|
162 |
|
163 |
+
## 📑 Citation
|
164 |
|
165 |
```
|
166 |
@article{ji2025robobrain,
|