BAAI /

Safetensors · English · llava_onevision

Commit 1c7de27 (verified) by tanhuajie2001 · 1 parent: 40967c1

Update README.md

Files changed (1):
  1. README.md +68 -4
README.md CHANGED
@@ -12,7 +12,25 @@ language:
  <!-- Provide a quick summary of what the model is/does. -->
  [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.

- ## Introduction
  In recent years, the rapid development of multimodal large language models (MLLMs) has significantly advanced the research progress of artificial general intelligence (AGI).
  By utilizing vast multimodal data from the internet and combining it with self-supervised learning techniques,
  MLLMs have demonstrated exceptional capabilities in visual perception and understanding human language instructions.
@@ -44,13 +62,15 @@ RoboBrain consists of three key robotic capabilities for long-horizon manipulati
  Based on the ShareRobot dataset we have constructed, RoboBrain has achieved state-of-the-art performance in multiple robotic benchmarks through a well-designed multi-stage training process,
  realizing a cognitive leap from abstract instruction understanding to concrete action expression.
  <p align="center">
- <img src="https://superrobobrain.github.io/images/RoboBrain_teaser.png" width="80%"/>
  <p>




- ## Usage
  ```python
  import torch
  from transformers import AutoProcessor, AutoModelForPreTraining
@@ -92,11 +112,55 @@ inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
  print("Generating output...")
  output = model.generate(**inputs, max_new_tokens=250)
  print(processor.decode(output[0][2:], skip_special_tokens=True))
  ```



- ## Citation

  ```
  @article{ji2025robobrain,

  <!-- Provide a quick summary of what the model is/does. -->
  [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.

+ <p align="center">
+ </a>&nbsp&nbsp⭐️ <a href="https://superrobobrain.github.io/">Project</a></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/BAAI/RoboBrain/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://superrobobrain.github.io/">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp🌎 <a href="https://github.com/FlagOpen/ShareRobot">Dataset</a>&nbsp&nbsp | &nbsp&nbsp📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://superrobobrain.github.io/">WeChat</a>
+ </p>
+ <p align="center">
+ </a>&nbsp&nbsp🎯 <a href="">RoboOS (Coming Soon)</a>: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain.
+ </p>
+ <p align="center">
+ </a>&nbsp&nbsp🎯 <a href="https://tanhuajie.github.io/ReasonRFT/">ReasonRFT</a>: Exploring a New RFT Paradigm to Enhance RoboBrain's Visual Reasoning Capabilities.
+ </p>
+
+ ## 🤗 Checkpoints
+ | Models | Checkpoint | Description |
+ |----------------------|----------------------------------------------------------------|------------------------------------------------------------|
+ | Planning Model | [🤗 Planning CKPTs](https://huggingface.co/BAAI/RoboBrain/) | Used for Planning prediction in our paper |
+ | Affordance (A-LoRA) | [🤗 Affordance CKPTs](https://superrobobrain.github.io/) | Used for Affordance prediction in our paper *(Coming Soon)* |
+ | Trajectory (T-LoRA) | [🤗 Trajectory CKPTs](https://superrobobrain.github.io/) | Used for Trajectory prediction in our paper *(Coming Soon)* |
+
+ ## 🔥 Introduction
+
  In recent years, the rapid development of multimodal large language models (MLLMs) has significantly advanced the research progress of artificial general intelligence (AGI).
  By utilizing vast multimodal data from the internet and combining it with self-supervised learning techniques,
  MLLMs have demonstrated exceptional capabilities in visual perception and understanding human language instructions.

  Based on the ShareRobot dataset we have constructed, RoboBrain has achieved state-of-the-art performance in multiple robotic benchmarks through a well-designed multi-stage training process,
  realizing a cognitive leap from abstract instruction understanding to concrete action expression.
  <p align="center">
+ <img src="https://superrobobrain.github.io/images/RoboBrain_teaser.png" />
  <p>


+ ## 🤖 Inference

+ ### Option 1: HF inference

+ #### Run python script as example:
  ```python
  import torch
  from transformers import AutoProcessor, AutoModelForPreTraining

  print("Generating output...")
  output = model.generate(**inputs, max_new_tokens=250)
  print(processor.decode(output[0][2:], skip_special_tokens=True))
+
  ```

+ ### Option 2: VLLM inference
+ #### Install and launch VLLM
+ ```bash
+ # Install vllm package
+ pip install vllm==0.6.6.post1
+
+ # Launch Robobrain with vllm
+ python -m vllm.entrypoints.openai.api_server --model BAAI/RoboBrain --served-model-name robobrain --max_model_len 16384 --limit_mm_per_prompt image=8
+ ```
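The launch command above starts vLLM's OpenAI-compatible server. The server can take a while to download and load the model before it accepts requests. Below is a minimal sketch for waiting on it before running the client script. It assumes three things: the server listens on vLLM's default `http://127.0.0.1:8000/v1` (the same `openai_api_base` the client script uses), it answers `GET /v1/models` once ready, and the timeout and interval values are illustrative. `wait_for_server` is a hypothetical helper and is not part of vLLM or this repository.

```python
import http.client
import time
import urllib.request


def wait_for_server(base_url: str = "http://127.0.0.1:8000/v1",
                    timeout: float = 600.0,
                    interval: float = 5.0) -> bool:
    """Poll `{base_url}/models` until it returns HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: returns True once the server is ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, HTTP error status, or a slow/dropped response:
            # the model is most likely still loading, so try again later.
            pass
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` at the top of the client script, and exiting if it returns `False`, keeps the first request from failing with a connection error while the weights are still loading.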

+ #### Run python script as example:
+ ```python
+ from openai import OpenAI
+ import base64
+
+ openai_api_key = "robobrain-123123"
+ openai_api_base = "http://127.0.0.1:8000/v1"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ response = client.chat.completions.create(
+     model="robobrain",
+     messages=[
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "image_url",
+                     "image_url": {
+                         "url": "http://images.cocodataset.org/val2017/000000039769.jpg"
+                     },
+                 },
+                 {"type": "text", "text": "What is shown in this image?"},
+             ],
+         },
+     ]
+ )
+
+ content = response.choices[0].message.content
+ print(content)
+ ```
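The client example imports `base64` but only sends a public image URL. To ask RoboBrain about a local image, such as a frame from a robot camera, the OpenAI-compatible chat API also accepts images inlined as base64 `data:` URLs in the same `image_url` field. Below is a minimal sketch of building such a request. It assumes the server was started with `--limit_mm_per_prompt image=8` as in the launch command above. `image_to_data_url`, `build_messages`, and `MAX_IMAGES_PER_PROMPT` are hypothetical names and are not part of RoboBrain or vLLM.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: mirrors `--limit_mm_per_prompt image=8` from the launch command.
MAX_IMAGES_PER_PROMPT = 8


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for the `image_url` field."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # assumption: fall back to JPEG when the type is unknown
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"


def build_messages(images: list[str], prompt: str) -> list[dict]:
    """Build one user turn with the given images followed by a text prompt.

    Entries starting with http://, https://, or data: are passed through unchanged;
    anything else is treated as a local file path and inlined as base64.
    """
    if not 1 <= len(images) <= MAX_IMAGES_PER_PROMPT:
        raise ValueError(f"expected 1..{MAX_IMAGES_PER_PROMPT} images, got {len(images)}")
    content = []
    for img in images:
        is_remote = img.startswith(("http://", "https://", "data:"))
        url = img if is_remote else image_to_data_url(img)
        content.append({"type": "image_url", "image_url": {"url": url}})
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]
```

The result can be passed directly to the existing call, for example `client.chat.completions.create(model="robobrain", messages=build_messages(["scene.png"], "What is shown in this image?"))`, where `scene.png` is a placeholder path. The image-count check only mirrors the launch flag on the client side. The server still enforces its own limit, so adjust the constant if the server is launched with a different value.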

+ ## 📑 Citation

  ```
  @article{ji2025robobrain,