sunshinepku committed on
Commit
0d84286
·
verified ·
1 Parent(s): c25cb29

Update README.md

Files changed (1)
  1. README.md +17 -6
README.md CHANGED
@@ -8,6 +8,8 @@ pinned: false
8
  ---
9
  # BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
10
 
 
 
11
  [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
12
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/yourusername/bioprotocolbench/pulls)
13
 
@@ -17,10 +19,14 @@ pinned: false
17
 
18
  Biological protocols are the fundamental bedrock of reproducible and safe life science research. While LLMs have shown remarkable capabilities on general tasks, their systematic evaluation on highly specialized, accuracy-critical, and inherently procedural texts like biological protocols remains limited. BioProBench fills this gap by providing a robust framework to evaluate LLMs on diverse aspects of protocol understanding and reasoning.
19
 
20
- BioProBench features:
 
 
21
 
 
22
  * 📚 **Large-scale Data:** Built upon **27K original biological protocols**, yielding nearly **556K high-quality structured instances**.
23
  * 🎯 **Comprehensive Tasks:** A suite of **five core tasks** challenging LLMs on different facets of procedural understanding and generation:
 
24
  * Protocol Question Answering (PQA)
25
  * Step Ordering (ORD)
26
  * Error Correction (ERR)
@@ -31,13 +37,15 @@ BioProBench features:
31
 
32
  ---
33
 
 
34
  ## 🚀 Motivation
35
 
36
  Biological protocols are the operational blueprint for experiments. As biological research increasingly leverages automation and AI, the ability of AI systems to understand and reason about these complex procedures is paramount. Current LLMs, while powerful, face significant challenges:
37
 
38
- * **Limited Procedural Understanding:** LLMs struggle with the temporal dependencies, conditional logic, and specific requirements embedded within protocols.
39
- * **Lack of Systematic Evaluation:** There has been a lack of large-scale, multi-task benchmarks specifically designed to diagnose LLMs' limitations on procedural biological texts.
40
- * **Bridging the Gap:** Developing AI systems capable of safely automating and even optimizing experiments requires models that can reliably interpret and generate protocols.
 
41
 
42
  BioProBench addresses these challenges by providing the necessary data and tasks for comprehensive evaluation and driving the development of more capable models.
43
 
@@ -45,6 +53,10 @@ BioProBench addresses these challenges by providing the necessary data and tasks
45
 
46
  ## 📊 Dataset Structure
47
 
 
 
 
 
48
  BioProBench provides a layered data design to support various model development stages:
49
 
50
  * A raw corpus of **27K protocols** for pretraining or RAG applications.
@@ -54,8 +66,7 @@ BioProBench provides a layered data design to support various model development
54
  The dataset and code are publicly available:
55
 
56
  * **Code Repository:** [https://github.com/YuyangSunshine/bioprotocolbench](https://github.com/YuyangSunshine/bioprotocolbench/)
57
- * **Hugging Face Dataset:** [https://huggingface.co/datasets/GreatCaptainNemo/BioProBench](https://huggingface.co/BioProBench)
58
-
59
 
60
  ---
61
 
 
8
  ---
9
  # BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
10
 
11
+ [![ArXiv](https://img.shields.io/badge/ArXiv-paper-B31B1B.svg?logo=arXiv&logoColor=Red)](https://arxiv.org/pdf/2505.07889)
12
+ [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Dataset-FFD210.svg?logo=HuggingFace&logoColor=black)](https://huggingface.co/BioProBench)
13
  [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
14
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/yourusername/bioprotocolbench/pulls)
15
 
 
19
 
20
  Biological protocols are the fundamental bedrock of reproducible and safe life science research. While LLMs have shown remarkable capabilities on general tasks, their systematic evaluation on highly specialized, accuracy-critical, and inherently procedural texts like biological protocols remains limited. BioProBench fills this gap by providing a robust framework to evaluate LLMs on diverse aspects of protocol understanding and reasoning.
21
 
22
+ <div align="center">
23
+ <img src="https://github.com/YuyangSunshine/bioprotocolbench/blob/main/figures/overview.png?raw=true" alt="BioProBench Logo" width="1000"/>
24
+ </div>
25
 
26
+ BioProBench features:
27
  * 📚 **Large-scale Data:** Built upon **27K original biological protocols**, yielding nearly **556K high-quality structured instances**.
28
  * 🎯 **Comprehensive Tasks:** A suite of **five core tasks** challenging LLMs on different facets of procedural understanding and generation:
29
+
30
  * Protocol Question Answering (PQA)
31
  * Step Ordering (ORD)
32
  * Error Correction (ERR)
 
37
 
38
  ---
39
 
40
+
41
  ## 🚀 Motivation
42
 
43
  Biological protocols are the operational blueprint for experiments. As biological research increasingly leverages automation and AI, the ability of AI systems to understand and reason about these complex procedures is paramount. Current LLMs, while powerful, face significant challenges:
44
 
45
+ * Limited Procedural Understanding:
46
+ * LLMs struggle with the temporal dependencies, conditional logic, and specific requirements embedded within protocols.
47
+ * Lack of Systematic Evaluation: There has been a lack of large-scale, multi-task benchmarks specifically designed to diagnose LLMs' limitations on procedural biological texts.
48
+ * Bridging the Gap: Developing AI systems capable of safely automating and even optimizing experiments requires models that can reliably interpret and generate protocols.
49
 
50
  BioProBench addresses these challenges by providing the necessary data and tasks for comprehensive evaluation and driving the development of more capable models.
51
 
 
53
 
54
  ## 📊 Dataset Structure
55
 
56
+ <div align="center">
57
+ <img src="https://github.com/YuyangSunshine/bioprotocolbench/blob/main/figures/samples.jpg?raw=true" alt="BioProBench Logo" width="1000"/>
58
+ </div>
59
+
60
  BioProBench provides a layered data design to support various model development stages:
61
 
62
  * A raw corpus of **27K protocols** for pretraining or RAG applications.
 
66
  The dataset and code are publicly available:
67
 
68
  * **Code Repository:** [https://github.com/YuyangSunshine/bioprotocolbench](https://github.com/YuyangSunshine/bioprotocolbench/)
69
+ * **Hugging Face Dataset:** [https://huggingface.co/BioProBench](https://huggingface.co/BioProBench)
 
70
 
71
  ---
72