Safetensors · English · llava_onevision
yuheng2000 committed
Commit 76dc8c6 · verified · 1 Parent(s): 78dcdd5

Update README.md

Files changed (1):
  1. README.md +16 -10
README.md CHANGED
@@ -28,9 +28,9 @@ language:
## 🔥 Overview
Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise from the current MLLMs lacking three essential robotic brain capabilities: **(1) Planning Capability**, which involves decomposing complex manipulation instructions into manageable sub-tasks; **(2) Affordance Perception**, the ability to recognize and interpret the affordances of interactive objects; and **(3) Trajectory Prediction**, the foresight to anticipate the complete manipulation trajectory necessary for successful execution. To enhance the robotic brain's core capabilities from abstract to concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot's diversity and accuracy have been meticulously refined by three human annotators. Building on this dataset, we developed RoboBrain, an MLLM-based model that combines robotic and general multi-modal data, utilizes a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across various robotic tasks, highlighting its potential to advance robotic brain capabilities.

- <div align="center">
- <img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/overview.png" />
- </div>
+
+ ![](https://github.com/FlagOpen/RoboBrain/blob/main/assets/overview.png)
+

## 🚀 Features
This repository supports:

@@ -65,9 +65,10 @@ This repository supports:
- **[`A-LoRA for Affordance`](https://huggingface.co/BAAI/RoboBrain-LoRA-Affordance/)**: Based on the Base Planning Model, Stage 4 involves LoRA-based training with our Affordance dataset to predict affordance.
- **[`T-LoRA for Trajectory`](https://huggingface.co/BAAI/RoboBrain/)**: Based on the Base Planning Model, Stage 4 involves LoRA-based training with our Trajectory dataset to predict trajectory. *(Coming Soon)*

- <div align="center">
- <img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/training.png" />
- </div>
+ ![](https://github.com/FlagOpen/RoboBrain/blob/main/assets/training.png)
+ <!-- <div align="center">
+ <img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/training.png" width="400"/>
+ </div> -->

| Models | Checkpoint | Description |
|----------------------|----------------------------------------------------------------|------------------------------------------------------------|

@@ -290,9 +291,11 @@ print(f"Prediction: {pred}")

```

- <div align="center">
+ ![](https://github.com/FlagOpen/RoboBrain/blob/main/assets/demo/examples.png)
+
+ <!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/demo/examples.png" />
- </div>
+ </div> -->

### 3. Usage for Trajectory Prediction
*Coming Soon ...*

@@ -301,9 +304,12 @@ print(f"Prediction: {pred}")
## <a id="Evaluation">🤖 Evaluation</a>
*Coming Soon ...*

- <div align="center">
+
+ ![](https://github.com/FlagOpen/RoboBrain/blob/main/assets/result.png)
+
+ <!-- <div align="center">
<img src="https://github.com/FlagOpen/RoboBrain/blob/main/assets/result.png" />
- </div>
+ </div> -->

## 😊 Acknowledgement
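
Note: the `A-LoRA for Affordance` checkpoint referenced above is a LoRA adapter trained on top of the Base Planning Model in Stage 4. Below is a minimal sketch of how such an adapter might be stacked and queried, assuming the standard `transformers` LLaVA-OneVision class (suggested by the repo's `llava_onevision` tag) and the usual `peft` adapter flow; the prompt wording and image file name are illustrative assumptions, not taken from this commit.

```python
# Sketch only: stack the A-LoRA affordance adapter on the base RoboBrain model.
# Assumptions: the base checkpoint loads as a LLaVA-OneVision model (per the
# repo's `llava_onevision` tag) and the adapter follows the standard PEFT layout.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration
from peft import PeftModel

BASE = "BAAI/RoboBrain"                     # Base Planning Model (linked above)
ADAPTER = "BAAI/RoboBrain-LoRA-Affordance"  # Stage-4 A-LoRA checkpoint (linked above)

processor = AutoProcessor.from_pretrained(BASE)
base_model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)  # apply LoRA weights on top

# Illustrative affordance query on a local image (file name is a placeholder).
image = Image.open("scene.png")
conversation = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What is the affordance area of the drawer handle?"},
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
pred = processor.decode(output[0], skip_special_tokens=True)
print(f"Prediction: {pred}")  # same final step as the usage snippet in the diff
```

If only affordance prediction is needed, `model.merge_and_unload()` from `peft` would fold the adapter into a single deployable checkpoint, at the cost of no longer being able to swap in the upcoming T-LoRA trajectory adapter on the same base weights.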