BAAI/RoboBrain

Safetensors

English

llava_onevision


tanhuajie2001 committed on Mar 28

Commit

1c7de27

verified ·

1 Parent(s): 40967c1

Update README.md


README.md +68 -4


@@ -12,7 +12,25 @@ language:
 <!-- Provide a quick summary of what the model is/does. -->
 [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
-## Introduction
 In recent years, the rapid development of multimodal large language models (MLLMs) has significantly advanced research toward artificial general intelligence (AGI).
 By utilizing vast multimodal data from the internet and combining it with self-supervised learning techniques,
 MLLMs have demonstrated exceptional capabilities in visual perception and understanding human language instructions.
@@ -44,13 +62,15 @@ RoboBrain consists of three key robotic capabilities for long-horizon manipulati
 Based on the ShareRobot dataset we have constructed, RoboBrain has achieved state-of-the-art performance in multiple robotic benchmarks through a well-designed multi-stage training process,
 realizing a cognitive leap from abstract instruction understanding to concrete action expression.
 <p align="center">
-    <img src="https://superrobobrain.github.io/images/RoboBrain_teaser.png" width="80%"/>
 </p>
-## Usage
 ```python
 import torch
 from transformers import AutoProcessor, AutoModelForPreTraining
@@ -92,11 +112,55 @@ inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
 print("Generating output...")
 output = model.generate(**inputs, max_new_tokens=250)
 print(processor.decode(output[0][2:], skip_special_tokens=True))
 ```
-## Citation
 ```
 @article{ji2025robobrain,

 <!-- Provide a quick summary of what the model is/does. -->
 [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
+<p align="center">
+        &nbsp;&nbsp;⭐️ <a href="https://superrobobrain.github.io/">Project</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/BAAI/RoboBrain/">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://superrobobrain.github.io/">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;🌎 <a href="https://github.com/FlagOpen/ShareRobot">Dataset</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="https://superrobobrain.github.io/">WeChat</a>
+</p>
+<p align="center">
+        &nbsp;&nbsp;🎯 <a href="">RoboOS (Coming Soon)</a>: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain.
+</p>
+<p align="center">
+&nbsp;&nbsp;🎯 <a href="https://tanhuajie.github.io/ReasonRFT/">ReasonRFT</a>: Exploring a New RFT Paradigm to Enhance RoboBrain's Visual Reasoning Capabilities.
+</p>
+## 🤗 Checkpoints
+| Models               | Checkpoint                                                     | Description                                                |
+|----------------------|----------------------------------------------------------------|------------------------------------------------------------|
+| Planning Model       | [🤗 Planning CKPTs](https://huggingface.co/BAAI/RoboBrain/)   | Used for Planning prediction in our paper                   |
+| Affordance (A-LoRA)  | [🤗 Affordance CKPTs](https://superrobobrain.github.io/)      | Used for Affordance prediction in our paper *(Coming Soon)* |
+| Trajectory (T-LoRA)  | [🤗 Trajectory CKPTs](https://superrobobrain.github.io/)      | Used for Trajectory prediction in our paper *(Coming Soon)* |
+## 🔥 Introduction
 In recent years, the rapid development of multimodal large language models (MLLMs) has significantly advanced research toward artificial general intelligence (AGI).
 By utilizing vast multimodal data from the internet and combining it with self-supervised learning techniques,
 MLLMs have demonstrated exceptional capabilities in visual perception and understanding human language instructions.
 Based on the ShareRobot dataset we have constructed, RoboBrain has achieved state-of-the-art performance in multiple robotic benchmarks through a well-designed multi-stage training process,
 realizing a cognitive leap from abstract instruction understanding to concrete action expression.
 <p align="center">
+    <img src="https://superrobobrain.github.io/images/RoboBrain_teaser.png" />
 </p>
+## 🤖 Inference
+### Option 1: HF inference
+#### Run a Python script as an example:
 ```python
 import torch
 from transformers import AutoProcessor, AutoModelForPreTraining
 print("Generating output...")
 output = model.generate(**inputs, max_new_tokens=250)
 print(processor.decode(output[0][2:], skip_special_tokens=True))
 ```
+### Option 2: vLLM inference
+#### Install and launch vLLM
+```bash
+# Install the vllm package
+pip install vllm==0.6.6.post1
+# Launch RoboBrain with vLLM (OpenAI-compatible API server)
+python -m vllm.entrypoints.openai.api_server --model BAAI/RoboBrain --served-model-name robobrain --max_model_len 16384 --limit_mm_per_prompt image=8
+```
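Loading the model can take a while after the launch command returns control to the logs, so it may help to wait until the server actually lists the model before sending requests. The following is a minimal sketch, not part of the original README: it polls the `/v1/models` route of the OpenAI-compatible server using only the standard library. The helper names `served_model_names` and `wait_for_model`, the timeout values, and the assumption that the server was started without `--api-key` (so no `Authorization` header is sent) are all illustrative choices.

```python
import json
import time
import urllib.error
import urllib.request


def served_model_names(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def wait_for_model(
    base_url: str,
    model_name: str,
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
) -> bool:
    """Poll {base_url}/models until model_name is listed or timeout_s elapses.

    Hypothetical helper: assumes the server needs no API key.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                payload = json.load(resp)
            if model_name in served_model_names(payload):
                return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not reachable yet, or the response was not valid JSON.
            pass
        time.sleep(poll_interval_s)
    return False
```

With the launch command above, a call such as `wait_for_model("http://127.0.0.1:8000/v1", "robobrain")` would return `True` once the served model name `robobrain` appears in the list, and `False` if the timeout expires first.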
+#### Run a Python script as an example:
+```python
+from openai import OpenAI
+import base64
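+# The server above is launched without --api-key, so any non-empty placeholder key is accepted
+openai_api_key = "robobrain-123123"
+# Default host and port of the vLLM OpenAI-compatible server
+openai_api_base = "http://127.0.0.1:8000/v1"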
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+response = client.chat.completions.create(
+    model="robobrain",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "http://images.cocodataset.org/val2017/000000039769.jpg"
+                    },
+                },
+                {"type": "text", "text": "What is shown in this image?"},
+            ],
+        },
+    ]
+)
+content = response.choices[0].message.content
+print(content)
+```
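The script above imports `base64` but only sends a remote image URL. For local images, one common approach with OpenAI-compatible servers is to embed the file as a base64 data URL in the same `image_url` field. The sketch below illustrates that idea under stated assumptions: the helper names (`image_to_data_url`, `build_user_message`, `ask_robobrain`) and the JPEG fallback are hypothetical and not from the original README, while the endpoint, API key placeholder, and model name are reused from the example above. Because the server is launched with `--limit_mm_per_prompt image=8`, a single prompt should carry at most 8 images.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        mime_type = "image/jpeg"  # assumption: fall back to JPEG when unknown
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_user_message(image_paths: list[str], prompt: str) -> dict:
    """Build one user message: all images first, then the text prompt."""
    content = [
        {"type": "image_url", "image_url": {"url": image_to_data_url(p)}}
        for p in image_paths
    ]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}


def ask_robobrain(image_paths: list[str], prompt: str) -> str:
    """Send local images and a prompt to the vLLM server launched above."""
    from openai import OpenAI  # imported lazily so the helpers work without it

    client = OpenAI(
        api_key="robobrain-123123",
        base_url="http://127.0.0.1:8000/v1",
    )
    response = client.chat.completions.create(
        model="robobrain",
        messages=[build_user_message(image_paths, prompt)],
    )
    return response.choices[0].message.content
```

For example, `ask_robobrain(["scene.jpg"], "What is shown in this image?")` would send one local image, where `scene.jpg` is a placeholder path.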
+## 📑 Citation
 ```
 @article{ji2025robobrain,