Update docs.md
docs.md
CHANGED
@@ -1,41 +1,71 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
|
4 |
-
|
5 |
-
|
6 |
-
|
7 |
-
|
8 |
-
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
23 |
</div>
|
24 |
|
|
|
25 |
<h2>Background</h2>
|
26 |
<p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain around their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, building upon our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark that comprises 87 real-world clinical text tasks spanning nine languages and more than one million samples</strong>. Furthermore, we construct this leaderboard for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
|
27 |
This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
|
28 |
|
29 |
-
|
30 |
-
<div
|
31 |
-
|
32 |
-
src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png"
|
33 |
-
alt="dataset"
|
34 |
-
style="max-width: 80%; max-height: 100%; object-fit: contain;"
|
35 |
-
/>
|
36 |
</div>
|
37 |
|
38 |
-
|
39 |
<h2>BRIDGE Leaderboard</h2>
|
40 |
<p>BRIDGE features three leaderboards, each evaluating LLM performance in clinical text tasks under a distinct inference strategy:</p>
|
41 |
<ul>
|
@@ -45,15 +75,12 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
|
|
45 |
</ul>
|
46 |
<p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
|
47 |
|
48 |
-
|
49 |
-
|
50 |
-
|
51 |
-
alt="HMS"
|
52 |
-
style="max-width: 90%; max-height: 100%; object-fit: contain;"
|
53 |
-
/>
|
54 |
</div>
|
55 |
|
56 |
-
|
57 |
<h2>Key Features</h2>
|
58 |
<ul>
|
59 |
<li><strong>Real-world Clinical Text</strong>: All tasks are sourced from real-world medical settings, such as electronic health records (EHRs), clinical case reports, or healthcare consultations</li>
|
@@ -71,6 +98,7 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
|
|
71 |
</ul>
|
72 |
More details can be found in our <a href="https://arxiv.org/abs/2504.19467">BRIDGE paper</a> and <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a>.
|
73 |
|
|
|
74 |
<h2>How to Evaluate Your Model on BRIDGE?</h2>
|
75 |
<h4>Dataset Access</h4>
|
76 |
<p>All fully open-access datasets in BRIDGE are available in <a href="https://huggingface.co/datasets/YLab-Open/BRIDGE-Open">BRIDGE-Open</a>. To ensure the fairness of this leaderboard, we publicly release the following data for each task:
|
@@ -88,33 +116,36 @@ Importantly, all 87 datasets have been verified to be either fully open-access o
|
|
88 |
</ul>
|
89 |
We will review and evaluate your submission and update the leaderboard accordingly.
|
90 |
|
|
|
91 |
<h2>Updates</h2>
|
92 |
<ul>
|
93 |
<li>2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
|
94 |
<li>2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
|
95 |
</ul>
|
96 |
|
|
|
97 |
<h2>Contributing</h2>
|
98 |
<p>We welcome and greatly value contributions and collaborations from the community!
|
99 |
If you have clinical text datasets that you would like to share for broader exploration, please contact us!</p>
|
100 |
<p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
|
101 |
|
|
|
102 |
<h2>Donation</h2>
|
103 |
<p>BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <a href="mailto:[email protected]">[email protected]</a> to discuss donation opportunities.</p>
|
104 |
|
|
|
105 |
<h2>Contact Information</h2>
|
106 |
-
<p>If you have any questions about BRIDGE or the leaderboard, feel free to
|
107 |
<ul>
|
108 |
<li><strong>Leaderboard Managers</strong>: Jiageng Wu (<a href="mailto:[email protected]">[email protected]</a>), Kevin Xie (<a href="mailto:[email protected]">[email protected]</a>), Bowen Gu (<a href="mailto:[email protected]">[email protected]</a>)</li>
|
109 |
<li><strong>Benchmark Managers</strong>: Jiageng Wu, Bowen Gu</li>
|
110 |
<li><strong>Project Lead</strong>: Jie Yang (<a href="mailto:[email protected]">[email protected]</a>)</li>
|
111 |
</ul>
|
112 |
-
</div>
|
113 |
|
|
|
114 |
<h2>Citation</h2>
|
115 |
<p>If you find this leaderboard useful for your research and applications, please cite the following papers:</p>
|
116 |
-
<pre style="white-space: pre-wrap; overflow-wrap: anywhere;"
|
117 |
-
<code>@article{BRIDGE-benchmark,
|
118 |
title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
|
119 |
author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
|
120 |
year={2025},
|
@@ -132,6 +163,8 @@ If you have clinical text datasets that you would like to share for broader expl
|
|
132 |
pages={AIra2400012},
|
133 |
year={2024},
|
134 |
publisher={Massachusetts Medical Society}
|
135 |
-
}
|
136 |
-
|
137 |
-
|
|
|
|
|
|
1 |
+
<!-- ---------- Global Styles ---------- -->
|
2 |
+
<style>
|
3 |
+
/* 1. Center content and limit max width for readability */
|
4 |
+
.wrapper{
|
5 |
+
max-width:880px; /* change here if you prefer wider/narrower */
|
6 |
+
margin:0 auto;
|
7 |
+
padding:0 1rem;
|
8 |
+
}
|
9 |
+
|
10 |
+
/* 2. Logo bar (top row) */
|
11 |
+
.logo-bar{
|
12 |
+
display:flex;
|
13 |
+
align-items:center;
|
14 |
+
justify-content:space-between;
|
15 |
+
height:50px;
|
16 |
+
margin-bottom:25px;
|
17 |
+
}
|
18 |
+
.logo-bar img{
|
19 |
+
height:100%;
|
20 |
+
max-width:100%;
|
21 |
+
object-fit:contain;
|
22 |
+
}
|
23 |
+
|
24 |
+
/* 3. Generic paragraph spacing */
|
25 |
+
p{line-height:1.6;}
|
26 |
+
|
27 |
+
/* 4. Re-usable image section */
|
28 |
+
.section-img{
|
29 |
+
display:flex;
|
30 |
+
justify-content:center;
|
31 |
+
align-items:center;
|
32 |
+
margin:25px 0; /* vertical breathing room */
|
33 |
+
}
|
34 |
+
.section-img img{
|
35 |
+
max-width:90%;
|
36 |
+
height:auto;
|
37 |
+
object-fit:contain; /* avoid distortion */
|
38 |
+
}
|
39 |
+
|
40 |
+
/* 5. Make long BibTeX lines wrap instead of widening page */
|
41 |
+
pre code{
|
42 |
+
white-space:pre-wrap;
|
43 |
+
word-break:break-word;
|
44 |
+
}
|
45 |
+
</style>
|
46 |
+
|
47 |
+
<!-- ---------- Page Content ---------- -->
|
48 |
+
<div class="wrapper">
|
49 |
+
|
50 |
+
<!-- Top logos ------------------------------------------------------------>
|
51 |
+
<div class="logo-bar">
|
52 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/1bNk6xHD90mlVaUOJ3kT6.png" alt="HMS" />
|
53 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/ZVx7ahuV1mVuIeygYwirc.png" alt="MGB" />
|
54 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/TkKKjmq98Wv_p5shxJTMY.png" alt="Broad" />
|
55 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/UcM8kmTaVkAM1qf3v09K8.png" alt="YLab" />
|
56 |
</div>
|
57 |
|
58 |
+
<!-- Background ----------------------------------------------------------->
|
59 |
<h2>Background</h2>
|
60 |
<p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain around their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, building upon our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark that comprises 87 real-world clinical text tasks spanning nine languages and more than one million samples</strong>. Furthermore, we construct this leaderboard for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
|
61 |
This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
|
62 |
|
63 |
+
<!-- Dataset illustration ------------------------------------------------->
|
64 |
+
<div class="section-img">
|
65 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png" alt="dataset" />
|
|
|
|
|
|
|
|
|
66 |
</div>
|
67 |
|
68 |
+
<!-- Leaderboard description --------------------------------------------->
|
69 |
<h2>BRIDGE Leaderboard</h2>
|
70 |
<p>BRIDGE features three leaderboards, each evaluating LLM performance in clinical text tasks under a distinct inference strategy:</p>
|
71 |
<ul>
|
|
|
75 |
</ul>
|
76 |
<p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
|
77 |
|
78 |
+
<!-- Leaderboard illustration -------------------------------------------->
|
79 |
+
<div class="section-img">
|
80 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/xpyabfXWqacZD-ThQ5guU.jpeg" alt="model" />
|
|
|
|
|
|
|
81 |
</div>
|
82 |
|
83 |
+
<!-- Key Features --------------------------------------------------------->
|
84 |
<h2>Key Features</h2>
|
85 |
<ul>
|
86 |
<li><strong>Real-world Clinical Text</strong>: All tasks are sourced from real-world medical settings, such as electronic health records (EHRs), clinical case reports, or healthcare consultations</li>
|
|
|
98 |
</ul>
|
99 |
More details can be found in our <a href="https://arxiv.org/abs/2504.19467">BRIDGE paper</a> and <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a>.
|
100 |
|
101 |
+
<!-- Dataset access / submission ----------------------------------------->
|
102 |
<h2>How to Evaluate Your Model on BRIDGE?</h2>
|
103 |
<h4>Dataset Access</h4>
|
104 |
<p>All fully open-access datasets in BRIDGE are available in <a href="https://huggingface.co/datasets/YLab-Open/BRIDGE-Open">BRIDGE-Open</a>. To ensure the fairness of this leaderboard, we publicly release the following data for each task:
|
|
|
116 |
</ul>
|
117 |
We will review and evaluate your submission and update the leaderboard accordingly.
|
118 |
|
119 |
+
<!-- Updates -------------------------------------------------------------->
|
120 |
<h2>Updates</h2>
|
121 |
<ul>
|
122 |
<li>2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
|
123 |
<li>2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
|
124 |
</ul>
|
125 |
|
126 |
+
<!-- Contributing --------------------------------------------------------->
|
127 |
<h2>Contributing</h2>
|
128 |
<p>We welcome and greatly value contributions and collaborations from the community!
|
129 |
If you have clinical text datasets that you would like to share for broader exploration, please contact us!</p>
|
130 |
<p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
|
131 |
|
132 |
+
<!-- Donation ------------------------------------------------------------->
|
133 |
<h2>Donation</h2>
|
134 |
<p>BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <a href="mailto:[email protected]">[email protected]</a> to discuss donation opportunities.</p>
|
135 |
|
136 |
+
<!-- Contact -------------------------------------------------------------->
|
137 |
<h2>Contact Information</h2>
|
138 |
+
<p>If you have any questions about BRIDGE or the leaderboard, feel free to contact us!</p>
|
139 |
<ul>
|
140 |
<li><strong>Leaderboard Managers</strong>: Jiageng Wu (<a href="mailto:[email protected]">[email protected]</a>), Kevin Xie (<a href="mailto:[email protected]">[email protected]</a>), Bowen Gu (<a href="mailto:[email protected]">[email protected]</a>)</li>
|
141 |
<li><strong>Benchmark Managers</strong>: Jiageng Wu, Bowen Gu</li>
|
142 |
<li><strong>Project Lead</strong>: Jie Yang (<a href="mailto:[email protected]">[email protected]</a>)</li>
|
143 |
</ul>
|
|
|
144 |
|
145 |
+
<!-- Citation ------------------------------------------------------------->
|
146 |
<h2>Citation</h2>
|
147 |
<p>If you find this leaderboard useful for your research and applications, please cite the following papers:</p>
|
148 |
+
<pre style="white-space: pre-wrap; overflow-wrap: anywhere;"><code>@article{BRIDGE-benchmark,
|
|
|
149 |
title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
|
150 |
author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
|
151 |
year={2025},
|
|
|
163 |
pages={AIra2400012},
|
164 |
year={2024},
|
165 |
publisher={Massachusetts Medical Society}
|
166 |
+
}</code></pre>
|
167 |
+
<p>If you use the datasets in BRIDGE, please also cite the original papers for those datasets, which are listed in our BRIDGE paper.</p>
|
168 |
+
|
169 |
+
</div>
|
170 |
+
<!-- ---------- End of Page Content ---------- -->
|