Update README.md

README.md (changed)
@@ -28,9 +28,9 @@ language:

## 🔥 Overview
Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise because current MLLMs lack three essential robotic brain capabilities: **(1) Planning Capability**, decomposing complex manipulation instructions into manageable sub-tasks; **(2) Affordance Perception**, recognizing and interpreting the affordances of interactive objects; and **(3) Trajectory Prediction**, anticipating the complete manipulation trajectory needed for successful execution. To strengthen the robotic brain's core capabilities from the abstract to the concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot's diversity and accuracy have been meticulously refined by three human annotators. Building on this dataset, we developed RoboBrain, an MLLM-based model that combines robotic and general multimodal data, uses a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across a range of robotic tasks, highlighting its potential to advance robotic brain capabilities.



## Features
This repository supports:
@@ -65,9 +65,10 @@ This repository supports:

- **[`A-LoRA for Affordance`](https://huggingface.co/BAAI/RoboBrain-LoRA-Affordance/)**: Based on the Base Planning Model, Stage 4 applies LoRA-based training on our Affordance dataset to predict affordances (a hedged loading sketch appears below the model table).
- **[`T-LoRA for Trajectory`](https://huggingface.co/BAAI/RoboBrain/)**: Based on the Base Planning Model, Stage 4 applies LoRA-based training on our Trajectory dataset to predict trajectories. *(Coming Soon)*


<!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/training.png" width="400"/>
</div> -->

| Models | Checkpoint | Description |
|----------------------|----------------------------------------------------------------|------------------------------------------------------------|
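For readers who want to experiment with the A-LoRA affordance adapter before the repository's own inference scripts are shown, the sketch below is one plausible way to load it. Only the checkpoint names come from the links above; everything else (the `transformers` Auto classes, the `peft` adapter loading, dtype and device settings) is an assumption rather than the repository's confirmed API, and the repo's actual loading utilities and prompt format may differ.

```python
# Hedged sketch only: one plausible way to attach the A-LoRA affordance adapter
# to the Base Planning Model. The model/adapter IDs are taken from the links above;
# the Auto classes and PEFT usage are assumptions, not the repo's documented API.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel

BASE_MODEL = "BAAI/RoboBrain"                       # Base Planning Model
AFFORDANCE_LORA = "BAAI/RoboBrain-LoRA-Affordance"  # Stage-4 A-LoRA adapter

processor = AutoProcessor.from_pretrained(BASE_MODEL)
base = AutoModelForVision2Seq.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)

# Attach the LoRA weights on top of the (frozen) base planning model.
model = PeftModel.from_pretrained(base, AFFORDANCE_LORA)
model.eval()
```

If the repository provides a dedicated loader or a merged checkpoint, prefer that entry point; the snippet above is only a starting point for the generic `transformers` + `peft` path.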
@@ -290,9 +291,11 @@ print(f"Prediction: {pred}")

```



<!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/demo/examples.png" />
</div> -->

### 3. Usage for Trajectory Prediction
*Coming Soon ...*
@@ -301,9 +304,12 @@ print(f"Prediction: {pred}")

## <a id="Evaluation">Evaluation</a>
*Coming Soon ...*



<!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/result.png" />
</div> -->

## Acknowledgement