jiagengwu committed on
Commit
5bc6722
·
verified ·
1 Parent(s): 1b13796

Update docs.md

Files changed (1)
  1. docs.md +77 -44
docs.md CHANGED
@@ -1,41 +1,71 @@
1
- <div style="display: flex; align-items: center; justify-content: space-between; width: 100%; height: 50px;">
2
- <img
3
- src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/1bNk6xHD90mlVaUOJ3kT6.png"
4
- alt="HMS"
5
- style="width: 20%; height: 100%; object-fit: contain;"
6
- />
7
- <img
8
- src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/ZVx7ahuV1mVuIeygYwirc.png"
9
- alt="MGB"
10
- style="width: 36%; height: 100%; object-fit: contain;"
11
- />
12
- <img
13
- src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/TkKKjmq98Wv_p5shxJTMY.png"
14
- alt="Broad"
15
- style="width: 19%; height: 100%; object-fit: contain;"
16
- />
17
- <img
18
- src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/UcM8kmTaVkAM1qf3v09K8.png"
19
- alt="YLab"
20
- style="width: 15%; height: 100%; object-fit: contain;"
21
- />
22
-
23
  </div>
24
 
 
25
  <h2>📜 Background</h2>
26
  <p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain about their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, and building on our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark comprising 87 real-world clinical text tasks that span nine languages and more than one million samples</strong>. We also built this leaderboard for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
27
  This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
28
 
29
-
30
- <div style="display: flex; align-items: center; justify-content: center; width: 100%; height: auto;">
31
- <img
32
- src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png"
33
- alt="dataset"
34
- style="max-width: 80%; max-height: 100%; object-fit: contain;"
35
- />
36
  </div>
37
 
38
-
39
  <h2>🏆 BRIDGE Leaderboard</h2>
40
  <p>BRIDGE features three leaderboards, each evaluating LLM performance in clinical text tasks under a distinct inference strategy:</p>
41
  <ul>
@@ -45,15 +75,12 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
45
  </ul>
46
  <p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
47
 
48
- <div style="display: flex; align-items: center; justify-content: center; width: 100%; height: 450px;">
49
- <img
50
- src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/xpyabfXWqacZD-ThQ5guU.jpeg"
51
- alt="HMS"
52
- style="max-width: 90%; max-height: 100%; object-fit: contain;"
53
- />
54
  </div>
55
 
56
-
57
  <h2>🌍 Key Features</h2>
58
  <ul>
59
  <li><strong>Real-world Clinical Text</strong>: All tasks are sourced from real-world medical settings, such as electronic health records (EHRs), clinical case reports, or healthcare consultations</li>
@@ -71,6 +98,7 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
71
  </ul>
72
  More details can be found in our <a href="https://arxiv.org/abs/2504.19467">BRIDGE paper</a> and <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a>.
73
 
 
74
  <h2>🛠️ How to Evaluate Your Model on BRIDGE?</h2>
75
  <h4>📂 Dataset Access</h4>
76
  <p>All fully open-access datasets in BRIDGE are available in <a href="https://huggingface.co/datasets/YLab-Open/BRIDGE-Open">BRIDGE-Open</a>. To ensure the fairness of this leaderboard, we publicly release the following data for each task:
@@ -88,33 +116,36 @@ Importantly, all 87 datasets have been verified to be either fully open-access o
88
  </ul>
89
  We will review and evaluate your submission and update the leaderboard accordingly.
90
 
 
91
  <h2>📢 Updates</h2>
92
  <ul>
93
  <li>🗓️ 2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
94
  <li>🗓️ 2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
95
  </ul>
96
 
 
97
  <h2>🤝 Contributing</h2>
98
  <p>We welcome and greatly value contributions and collaborations from the community!
99
  If you have clinical text datasets that you would like to share for broader exploration, please contact us!</p>
100
  <p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
101
 
 
102
  <h2>🚀 Donation</h2>
103
  <p>BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <a href="mailto:[email protected]">[email protected]</a> to discuss donation opportunities.</p>
104
 
 
105
  <h2>📬 Contact Information</h2>
106
- <p>If you have any questions about BRIDGE or the leaderboard, feel free to reach out!</p>
107
  <ul>
108
  <li><strong>Leaderboard Managers</strong>: Jiageng Wu (<a href="mailto:[email protected]">[email protected]</a>), Kevin Xie (<a href="mailto:[email protected]">[email protected]</a>), Bowen Gu (<a href="mailto:[email protected]">[email protected]</a>)</li>
109
  <li><strong>Benchmark Managers</strong>: Jiageng Wu, Bowen Gu</li>
110
  <li><strong>Project Lead</strong>: Jie Yang (<a href="mailto:[email protected]">[email protected]</a>)</li>
111
  </ul>
112
- </div>
113
 
 
114
  <h2>📚 Citation</h2>
115
  <p>If you find this leaderboard useful for your research and applications, please cite the following papers:</p>
116
- <pre style="white-space: pre-wrap; overflow-wrap: anywhere;">
117
- <code>@article{BRIDGE-benchmark,
118
  title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
119
  author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
120
  year={2025},
@@ -132,6 +163,8 @@ If you have clinical text datasets that you would like to share for broader expl
132
  pages={AIra2400012},
133
  year={2024},
134
  publisher={Massachusetts Medical Society}
135
- }
136
- </code></pre>
137
- <p>If you use the datasets in BRIDGE, please also cite the original papers for those datasets, which are listed in our BRIDGE paper.</p>
 
 
 
1
+ <!-- ---------- Global Styles ---------- -->
2
+ <style>
3
+ /* 1. Center content and limit max width for readability */
4
+ .wrapper{
5
+ max-width:880px; /* change here if you prefer wider/narrower */
6
+ margin:0 auto;
7
+ padding:0 1rem;
8
+ }
9
+
10
+ /* 2. Logo bar (top row) */
11
+ .logo-bar{
12
+ display:flex;
13
+ align-items:center;
14
+ justify-content:space-between;
15
+ height:50px;
16
+ margin-bottom:25px;
17
+ }
18
+ .logo-bar img{
19
+ height:100%;
20
+ max-width:100%;
21
+ object-fit:contain;
22
+ }
23
+
24
+ /* 3. Generic paragraph spacing */
25
+ p{line-height:1.6;}
26
+
27
+ /* 4. Re-usable image section */
28
+ .section-img{
29
+ display:flex;
30
+ justify-content:center;
31
+ align-items:center;
32
+ margin:25px 0; /* vertical breathing room */
33
+ }
34
+ .section-img img{
35
+ max-width:90%;
36
+ height:auto;
37
+ object-fit:contain; /* avoid distortion */
38
+ }
39
+
40
+ /* 5. Make long BibTeX lines wrap instead of widening page */
41
+ pre code{
42
+ white-space:pre-wrap;
43
+ word-break:break-word;
44
+ }
45
+ </style>
46
+
47
+ <!-- ---------- Page Content ---------- -->
48
+ <div class="wrapper">
49
+
50
+ <!-- Top logos ------------------------------------------------------------>
51
+ <div class="logo-bar">
52
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/1bNk6xHD90mlVaUOJ3kT6.png" alt="HMS" />
53
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/ZVx7ahuV1mVuIeygYwirc.png" alt="MGB" />
54
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/TkKKjmq98Wv_p5shxJTMY.png" alt="Broad" />
55
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/UcM8kmTaVkAM1qf3v09K8.png" alt="YLab" />
56
  </div>
57
 
58
+ <!-- Background ----------------------------------------------------------->
59
  <h2>📜 Background</h2>
60
  <p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain about their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, and building on our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark comprising 87 real-world clinical text tasks that span nine languages and more than one million samples</strong>. We also built this leaderboard for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
61
  This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
62
 
63
+ <!-- Dataset illustration ------------------------------------------------->
64
+ <div class="section-img">
65
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png" alt="dataset" />
 
 
 
 
66
  </div>
67
 
68
+ <!-- Leaderboard description --------------------------------------------->
69
  <h2>🏆 BRIDGE Leaderboard</h2>
70
  <p>BRIDGE features three leaderboards, each evaluating LLM performance in clinical text tasks under a distinct inference strategy:</p>
71
  <ul>
 
75
  </ul>
76
  <p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
77
 
78
+ <!-- Leaderboard illustration -------------------------------------------->
79
+ <div class="section-img">
80
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/xpyabfXWqacZD-ThQ5guU.jpeg" alt="model" />
 
 
 
81
  </div>
82
 
83
+ <!-- Key Features --------------------------------------------------------->
84
  <h2>🌍 Key Features</h2>
85
  <ul>
86
  <li><strong>Real-world Clinical Text</strong>: All tasks are sourced from real-world medical settings, such as electronic health records (EHRs), clinical case reports, or healthcare consultations</li>
 
98
  </ul>
99
  More details can be found in our <a href="https://arxiv.org/abs/2504.19467">BRIDGE paper</a> and <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a>.
100
 
101
+ <!-- Dataset access / submission ----------------------------------------->
102
  <h2>🛠️ How to Evaluate Your Model on BRIDGE?</h2>
103
  <h4>📂 Dataset Access</h4>
104
  <p>All fully open-access datasets in BRIDGE are available in <a href="https://huggingface.co/datasets/YLab-Open/BRIDGE-Open">BRIDGE-Open</a>. To ensure the fairness of this leaderboard, we publicly release the following data for each task:
 
116
  </ul>
117
  We will review and evaluate your submission and update the leaderboard accordingly.
118
 
119
+ <!-- Updates -------------------------------------------------------------->
120
  <h2>📢 Updates</h2>
121
  <ul>
122
  <li>🗓️ 2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
123
  <li>🗓️ 2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
124
  </ul>
125
 
126
+ <!-- Contributing --------------------------------------------------------->
127
  <h2>🤝 Contributing</h2>
128
  <p>We welcome and greatly value contributions and collaborations from the community!
129
  If you have clinical text datasets that you would like to share for broader exploration, please contact us!</p>
130
  <p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
131
 
132
+ <!-- Donation ------------------------------------------------------------->
133
  <h2>🚀 Donation</h2>
134
  <p>BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <a href="mailto:[email protected]">[email protected]</a> to discuss donation opportunities.</p>
135
 
136
+ <!-- Contact -------------------------------------------------------------->
137
  <h2>📬 Contact Information</h2>
138
+ <p>If you have any questions about BRIDGE or the leaderboard, feel free to contact us!</p>
139
  <ul>
140
  <li><strong>Leaderboard Managers</strong>: Jiageng Wu (<a href="mailto:[email protected]">[email protected]</a>), Kevin Xie (<a href="mailto:[email protected]">[email protected]</a>), Bowen Gu (<a href="mailto:[email protected]">[email protected]</a>)</li>
141
  <li><strong>Benchmark Managers</strong>: Jiageng Wu, Bowen Gu</li>
142
  <li><strong>Project Lead</strong>: Jie Yang (<a href="mailto:[email protected]">[email protected]</a>)</li>
143
  </ul>
 
144
 
145
+ <!-- Citation ------------------------------------------------------------->
146
  <h2>📚 Citation</h2>
147
  <p>If you find this leaderboard useful for your research and applications, please cite the following papers:</p>
148
+ <pre style="white-space: pre-wrap; overflow-wrap: anywhere;"><code>@article{BRIDGE-benchmark,
 
149
  title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
150
  author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
151
  year={2025},
 
163
  pages={AIra2400012},
164
  year={2024},
165
  publisher={Massachusetts Medical Society}
166
+ }</code></pre>
167
+ <p>If you use the datasets in BRIDGE, please also cite the original papers for those datasets, which are listed in our BRIDGE paper.</p>
168
+
169
+ </div>
170
+ <!-- ---------- End of Page Content ---------- -->