lixuejing committed
Commit cd635eb
1 Parent(s): 38f42b3
Files changed (1)
  1. src/about.py +24 -20
src/about.py CHANGED
@@ -98,29 +98,29 @@ FlagEvalMM is an open-source evaluation framework designed to comprehensively a
 # Embodied verse
 EmbodiedVerse-Open是一个由10个数据集构成的用于全面评测模型在具身智能场景下的meta dataset,包括:
 
- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: 包含100张来自不同杂乱环境的真实世界图像,每张图像都标注了一句描述所需自由空间位置的语句和一个对应的掩码,用于评估基于空间关系的自由空间指代表达。
- <a href="https://zeyofu.github.io/blink/" target="_blank"> Blink </a>: 包含一些可以被人类轻松解决的视觉问题,EmbodiedVerse采样了和空间理解相关的类别(Counting, Relative_Depth, Spatial_Relation, Multi-view_Reasoning, Visual_Correspondence)。
- <a href="https://huggingface.co/datasets/nyu-visionx/CV-Bench" target="_blank"> CVBench </a>: 一个以视觉为中心的数据集,包含2638个人工筛选的问题。
- <a href="https://arxiv.org/abs/2411.16537" target="_blank"> RoboSpatial-Home </a>: 一个旨在评估视觉语言模型(VLMs)在真实室内机器人环境中空间推理能力的新基准。
- <a href="https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench" target="_blank"> EmbspatialBench </a>: 一个用于评估具身视觉语言模型(LVLM)具身空间理解能力的基准。该基准自动从具身场景中提取,并从自我中心视角涵盖 6 种空间关系。
- <a href="https://danielchyeh.github.io/All-Angles-Bench/" target="_blank"> All-Angles Bench </a>: 一个多视图理解基准,包含 90 个真实场景中超过 2100 个人工标注的多视图问答对。
- <a href="https://huggingface.co/datasets/nyu-visionx/VSI-Bench" target="_blank"> VSI-Bench </a>: 一个基于视频的基准数据集,从真实室内场景的自我中心视角视频中构造问题,旨在评估多模态大模型的视觉空间智能。EmbodiedVerse使用了包含400问题的tiny子集。
- <a href="https://arxiv.org/pdf/2412.07755" target="_blank"> SAT </a>: 一个具有挑战性的真实图像动态空间测试集。
- <a href="https://arxiv.org/pdf/2412.04447" target="_blank"> EgoPlan-Bench2 </a>: 一个涵盖 4 大领域和 24 个详细场景的日常任务基准,与人类日常生活紧密契合。
- <a href="https://github.com/embodiedreasoning/ERQA" target="_blank"> ERQA </a>: 这个评估基准涵盖了与空间推理和世界知识相关的各种主题,侧重于现实世界的场景,尤其是在机器人技术背景下。
+ - <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: 包含100张来自不同杂乱环境的真实世界图像,每张图像都标注了一句描述所需自由空间位置的语句和一个对应的掩码,用于评估基于空间关系的自由空间指代表达。
+ - <a href="https://zeyofu.github.io/blink/" target="_blank"> Blink </a>: 包含一些可以被人类轻松解决的视觉问题,EmbodiedVerse采样了和空间理解相关的类别(Counting, Relative_Depth, Spatial_Relation, Multi-view_Reasoning, Visual_Correspondence)。
+ - <a href="https://huggingface.co/datasets/nyu-visionx/CV-Bench" target="_blank"> CVBench </a>: 一个以视觉为中心的数据集,包含2638个人工筛选的问题。
+ - <a href="https://arxiv.org/abs/2411.16537" target="_blank"> RoboSpatial-Home </a>: 一个旨在评估视觉语言模型(VLMs)在真实室内机器人环境中空间推理能力的新基准。
+ - <a href="https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench" target="_blank"> EmbspatialBench </a>: 一个用于评估具身视觉语言模型(LVLM)具身空间理解能力的基准。该基准自动从具身场景中提取,并从自我中心视角涵盖 6 种空间关系。
+ - <a href="https://danielchyeh.github.io/All-Angles-Bench/" target="_blank"> All-Angles Bench </a>: 一个多视图理解基准,包含 90 个真实场景中超过 2100 个人工标注的多视图问答对。
+ - <a href="https://huggingface.co/datasets/nyu-visionx/VSI-Bench" target="_blank"> VSI-Bench </a>: 一个基于视频的基准数据集,从真实室内场景的自我中心视角视频中构造问题,旨在评估多模态大模型的视觉空间智能。EmbodiedVerse使用了包含400问题的tiny子集。
+ - <a href="https://arxiv.org/pdf/2412.07755" target="_blank"> SAT </a>: 一个具有挑战性的真实图像动态空间测试集。
+ - <a href="https://arxiv.org/pdf/2412.04447" target="_blank"> EgoPlan-Bench2 </a>: 一个涵盖 4 大领域和 24 个详细场景的日常任务基准,与人类日常生活紧密契合。
+ - <a href="https://github.com/embodiedreasoning/ERQA" target="_blank"> ERQA </a>: 这个评估基准涵盖了与空间推理和世界知识相关的各种主题,侧重于现实世界的场景,尤其是在机器人技术背景下。
 
 EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied intelligence scenarios, including:
 
- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: A collection of 100 real-world images from diverse cluttered environments, each annotated with a sentence describing a desired free space and a corresponding mask, designed to evaluate free-space referencing using spatial relations.
- <a href="https://zeyofu.github.io/blink/" target="_blank"> Blink </a>: A benchmark of visual problems that humans can solve easily; EmbodiedVerse samples the categories related to spatial understanding (Counting, Relative_Depth, Spatial_Relation, Multi-view_Reasoning, Visual_Correspondence).
- <a href="https://huggingface.co/datasets/nyu-visionx/CV-Bench" target="_blank"> CVBench </a>: A vision-centric benchmark containing 2,638 manually inspected examples.
- <a href="https://arxiv.org/abs/2411.16537" target="_blank"> RoboSpatial-Home </a>: A new spatial reasoning benchmark designed to evaluate vision-language models (VLMs) in real-world indoor environments for robotics.
- <a href="https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench" target="_blank"> EmbspatialBench </a>: A benchmark for evaluating the embodied spatial understanding of LVLMs. It is automatically derived from embodied scenes and covers 6 spatial relationships from an egocentric perspective.
- <a href="https://danielchyeh.github.io/All-Angles-Bench/" target="_blank"> All-Angles Bench </a>: A benchmark for multi-view understanding, including over 2,100 human-annotated multi-view QA pairs across 90 real-world scenes.
- <a href="https://huggingface.co/datasets/nyu-visionx/VSI-Bench" target="_blank"> VSI-Bench </a>: A video-based benchmark that constructs questions from egocentric videos of real indoor scenes to evaluate the visual-spatial intelligence of multimodal large models. EmbodiedVerse uses a tiny subset containing 400 questions.
- <a href="https://arxiv.org/pdf/2412.07755" target="_blank"> SAT </a>: A challenging dynamic spatial benchmark built on real images.
- <a href="https://arxiv.org/pdf/2412.04447" target="_blank"> EgoPlan-Bench2 </a>: A benchmark of everyday tasks spanning 4 major domains and 24 detailed scenarios, closely aligned with human daily life.
- <a href="https://github.com/embodiedreasoning/ERQA" target="_blank"> ERQA </a>: An evaluation benchmark covering a variety of topics related to spatial reasoning and world knowledge, focused on real-world scenarios, particularly in the context of robotics.
+ - <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: A collection of 100 real-world images from diverse cluttered environments, each annotated with a sentence describing a desired free space and a corresponding mask, designed to evaluate free-space referencing using spatial relations.
+ - <a href="https://zeyofu.github.io/blink/" target="_blank"> Blink </a>: A benchmark of visual problems that humans can solve easily; EmbodiedVerse samples the categories related to spatial understanding (Counting, Relative_Depth, Spatial_Relation, Multi-view_Reasoning, Visual_Correspondence).
+ - <a href="https://huggingface.co/datasets/nyu-visionx/CV-Bench" target="_blank"> CVBench </a>: A vision-centric benchmark containing 2,638 manually inspected examples.
+ - <a href="https://arxiv.org/abs/2411.16537" target="_blank"> RoboSpatial-Home </a>: A new spatial reasoning benchmark designed to evaluate vision-language models (VLMs) in real-world indoor environments for robotics.
+ - <a href="https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench" target="_blank"> EmbspatialBench </a>: A benchmark for evaluating the embodied spatial understanding of LVLMs. It is automatically derived from embodied scenes and covers 6 spatial relationships from an egocentric perspective.
+ - <a href="https://danielchyeh.github.io/All-Angles-Bench/" target="_blank"> All-Angles Bench </a>: A benchmark for multi-view understanding, including over 2,100 human-annotated multi-view QA pairs across 90 real-world scenes.
+ - <a href="https://huggingface.co/datasets/nyu-visionx/VSI-Bench" target="_blank"> VSI-Bench </a>: A video-based benchmark that constructs questions from egocentric videos of real indoor scenes to evaluate the visual-spatial intelligence of multimodal large models. EmbodiedVerse uses a tiny subset containing 400 questions.
+ - <a href="https://arxiv.org/pdf/2412.07755" target="_blank"> SAT </a>: A challenging dynamic spatial benchmark built on real images.
+ - <a href="https://arxiv.org/pdf/2412.04447" target="_blank"> EgoPlan-Bench2 </a>: A benchmark of everyday tasks spanning 4 major domains and 24 detailed scenarios, closely aligned with human daily life.
+ - <a href="https://github.com/embodiedreasoning/ERQA" target="_blank"> ERQA </a>: An evaluation benchmark covering a variety of topics related to spatial reasoning and world knowledge, focused on real-world scenarios, particularly in the context of robotics.
 
 数据集子集链接: coming soon
 
@@ -141,10 +141,12 @@ You can find:
 EVALUATION_METRIC_TEXT = """
 ### 评测指标缩写介绍如下:
 ### Evaluation Metrics Abbreviations are introduced below:
+
 Perception
 - Perception_Visual Grounding(P_VG)
 - Perception_Counting(P_C)
 - Perception_State & Activity Understanding
+
 SpatialReasoning
 - SpatialReasoning_Dynamic(SR_D)
 - SpatialReasoning_Relative direction(SR_Rd)
@@ -153,9 +155,11 @@ SpatialReasoning
 - SpatialReasoning_Depth estimation(SR_De)
 - SpatialReasoning_Relative shape(SR_Rs)
 - SpatialReasoning_Size estimation(SR_Se)
+
 Prediction
 - Prediction_Trajectory(P_T)
 - Prediction_Futureprediction(P_Fd)
+
 Planning
 - Planning_Goal Decomposition(P_GD)
 - Planning_Navigation(P_N)
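The abbreviation scheme in EVALUATION_METRIC_TEXT is simply `<Category>_<Sub-task>` shortened to initials. As a minimal, hypothetical sketch (not part of about.py or this commit; names are assumptions), the short column names could be expanded back to the full metric names like this in Python:

```python
# Hypothetical helper, NOT defined in about.py: maps the abbreviated
# leaderboard column names from EVALUATION_METRIC_TEXT to their full
# metric names. Only the metrics visible in this diff are listed.
METRIC_ABBREVIATIONS = {
    "P_VG": "Perception_Visual Grounding",
    "P_C": "Perception_Counting",
    "SR_D": "SpatialReasoning_Dynamic",
    "SR_Rd": "SpatialReasoning_Relative direction",
    "SR_De": "SpatialReasoning_Depth estimation",
    "SR_Rs": "SpatialReasoning_Relative shape",
    "SR_Se": "SpatialReasoning_Size estimation",
    "P_T": "Prediction_Trajectory",
    "P_Fd": "Prediction_Futureprediction",
    "P_GD": "Planning_Goal Decomposition",
    "P_N": "Planning_Navigation",
}


def expand_metric(abbreviation: str) -> str:
    """Return the full metric name for a known abbreviation, else echo the input."""
    return METRIC_ABBREVIATIONS.get(abbreviation, abbreviation)


if __name__ == "__main__":
    print(expand_metric("SR_Rd"))  # -> SpatialReasoning_Relative direction
```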