Update README.md

README.md (changed)
@@ -28,9 +28,9 @@ language:

## 🔥 Overview
Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise because current MLLMs lack three essential robotic brain capabilities: **(1) Planning Capability**, decomposing complex manipulation instructions into manageable sub-tasks; **(2) Affordance Perception**, recognizing and interpreting the affordances of interactive objects; and **(3) Trajectory Prediction**, anticipating the complete manipulation trajectory needed for successful execution. To strengthen the robotic brain's core capabilities from the abstract to the concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot's diversity and accuracy have been meticulously refined by three human annotators. Building on this dataset, we developed RoboBrain, an MLLM-based model that combines robotic and general multimodal data, uses a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across a range of robotic tasks, highlighting its potential to advance robotic brain capabilities.



## Features
This repository supports:
@@ -65,9 +65,10 @@ This repository supports:

- **[`A-LoRA for Affordance`](https://huggingface.co/BAAI/RoboBrain-LoRA-Affordance/)**: Based on the Base Planning Model, Stage 4 applies LoRA-based training on our Affordance dataset to predict affordances (a hedged loading sketch appears below the model table).
- **[`T-LoRA for Trajectory`](https://huggingface.co/BAAI/RoboBrain/)**: Based on the Base Planning Model, Stage 4 applies LoRA-based training on our Trajectory dataset to predict trajectories. *(Coming Soon)*


<!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/training.png" width="400"/>
</div> -->

| Models | Checkpoint | Description |
|----------------------|----------------------------------------------------------------|------------------------------------------------------------|
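For readers who want to experiment with the A-LoRA affordance adapter before the repository's own inference scripts are shown, the sketch below is one plausible way to load it. Only the checkpoint names come from the links above; everything else (the `transformers` Auto classes, the `peft` adapter loading, dtype and device settings) is an assumption rather than the repository's confirmed API, and the repo's actual loading utilities and prompt format may differ.

```python
# Hedged sketch only: one plausible way to attach the A-LoRA affordance adapter
# to the Base Planning Model. The model/adapter IDs are taken from the links above;
# the Auto classes and PEFT usage are assumptions, not the repo's documented API.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel

BASE_MODEL = "BAAI/RoboBrain"                       # Base Planning Model
AFFORDANCE_LORA = "BAAI/RoboBrain-LoRA-Affordance"  # Stage-4 A-LoRA adapter

processor = AutoProcessor.from_pretrained(BASE_MODEL)
base = AutoModelForVision2Seq.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)

# Attach the LoRA weights on top of the (frozen) base planning model.
model = PeftModel.from_pretrained(base, AFFORDANCE_LORA)
model.eval()
```

If the repository provides a dedicated loader or a merged checkpoint, prefer that entry point; the snippet above is only a starting point for the generic `transformers` + `peft` path.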
@@ -290,9 +291,11 @@ print(f"Prediction: {pred}")

```



<!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/demo/examples.png" />
</div> -->

### 3. Usage for Trajectory Prediction
*Coming Soon ...*
@@ -301,9 +304,12 @@ print(f"Prediction: {pred}")

## <a id="Evaluation">Evaluation</a>
*Coming Soon ...*



<!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/result.png" />
</div> -->

## Acknowledgement