---
title: Open GAMMA - 'GAMJA' V2
emoji: 🔥
colorFrom: blue
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: AI generates PPT with diagrams and images from given topics
models:
 - VIDraft/Gemma-3-R1984-27B
 - VIDraft/Gemma-3-R1984-12B
 - VIDraft/Gemma-3-R1984-4B 
---

FACTS Grounding Leaderboard - Medical AI Evaluation

🏥 Overview
FACTS Grounding is a benchmark developed by Google DeepMind that checks whether a model's responses are grounded solely in the documents it is given. This kind of evaluation is particularly important in healthcare, where inaccurate information can be life-threatening.
🎯 Key Features
Evaluation Methodology

Long Medical Document Input (~32,000 tokens ≈ 40 A4 pages)
AI Response Generation Based on Documents
Dual-Criteria Assessment

✅ Quality Check: Does the AI accurately understand the question?
✅ Grounding Check: Are all responses based on the provided documents?
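
The dual-criteria assessment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Verdict` record and its `quality_ok` and `grounded` fields are hypothetical names, and the rule that a response counts only when it passes both checks is an assumption about how the two checks combine, not the leaderboard's actual scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical per-response record holding the two check results."""
    quality_ok: bool  # Quality Check: the response addresses the question
    grounded: bool    # Grounding Check: every claim is backed by the document


def passes(verdict: Verdict) -> bool:
    # Assumption: a response counts only if it clears both checks.
    return verdict.quality_ok and verdict.grounded


def overall_score(verdicts: list[Verdict]) -> float:
    """Fraction of responses that clear both checks (0.0 for an empty list)."""
    if not verdicts:
        return 0.0
    return sum(passes(v) for v in verdicts) / len(verdicts)


# Made-up verdicts for illustration only; these are not leaderboard data.
example = [
    Verdict(quality_ok=True, grounded=True),
    Verdict(quality_ok=True, grounded=False),   # fluent but not grounded
    Verdict(quality_ok=False, grounded=True),   # grounded but off-topic
    Verdict(quality_ok=True, grounded=True),
]
print(overall_score(example))  # 0.5
```

Requiring both checks means a fluent answer that cites nothing from the document scores the same as an irrelevant one, which matches the intent that every response be grounded.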


Medical-Focused Version

236 medical cases selected from 860 total problems
Strict evaluation criteria reflecting healthcare field requirements
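
To make the subset size concrete, the sketch below filters a list of benchmark records down to a medical subset and reports its share of the whole. The `domain` field and the `select_medical` helper are hypothetical, since the actual selection procedure is not described here. Only the counts 236 and 860 come from the text above.

```python
def select_medical(records: list[dict]) -> list[dict]:
    """Keep records whose (hypothetical) `domain` field is 'medical'."""
    return [r for r in records if r.get("domain") == "medical"]


def subset_share(subset_size: int, total_size: int) -> float:
    """Share of the full benchmark covered by the subset, as a percentage."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    return 100.0 * subset_size / total_size


# Counts taken from the text: 236 medical cases out of 860 problems.
print(f"{subset_share(236, 860):.1f}%")  # 27.4%

# Tiny made-up sample showing the filter itself.
sample = [
    {"id": 1, "domain": "medical"},
    {"id": 2, "domain": "legal"},
    {"id": 3, "domain": "medical"},
]
print([r["id"] for r in select_medical(sample)])  # [1, 3]
```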

🏆 Current Leaderboard Rankings (June 5, 2025)
Overall Score TOP 5 (https://huggingface.co/spaces/MaziyarPanahi/FACTS-Leaderboard)

1. deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
2. VIDraft/Gemma-3-R1984-27B
3. meta-llama/Llama-3.3-70B-Instruct
4. Qwen/Qwen3-30B-A3B
5. Qwen/Qwen3-4B


💡 Why FACTS Matters for Medical AI
1. Patient Safety Assurance

General AI Error: "The capital of France is London" → Simple mistake
Medical AI Error: "This medication is safe for pregnant women" → Life-threatening!

2. Building Healthcare Provider Trust

Ensures all responses are grounded in medical literature
Transparent scoring enables AI system reliability verification

3. Regulatory Compliance & Standardization

Emerging as a global medical AI standard
Potential reference for FDA, CE, and other regulatory approvals

🌍 Advancing Global Medical AI
Core Values of FACTS

Accuracy: Provides only evidence-based medical responses
Transparency: All evaluation processes and data are public
Accessibility: Global participation in evaluation
Practicality: Reflects real-world healthcare scenarios

Medical-Specific Features

Understanding complex medical terminology and drug interactions
Recognition of diverse symptom descriptions
Verification of clinical guideline and protocol adherence
Consideration of medical ethics and patient privacy

🚀 The Future of Medical AI
FACTS Grounding aims to serve as a 'quality certification system' for medical AI.
Expected Impact

Global Standardization: Unified criteria for worldwide medical AI evaluation
Quality Improvement: Continuous model enhancement through benchmarking
Accelerated Clinical Adoption: Rapid deployment of validated AI systems
Enhanced Patient Care: More accurate and safer healthcare services

🤝 How to Participate
The leaderboard operates with complete transparency, allowing anyone to submit and evaluate their models.

Download the FACTS Grounding dataset
Train and optimize your model
Submit evaluation results
Check rankings on the leaderboard
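
The steps above could be wired together roughly as follows. This is a sketch under assumptions, not the leaderboard's submission tooling: the record fields (`id`, `document`, `question`), the prompt wording, the `generate` callable standing in for a model, and the JSON output layout are all hypothetical.

```python
import json
from typing import Callable


def build_prompt(document: str, question: str) -> str:
    """Ask the model to answer from the supplied document only."""
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{document}\n\n"
        f"Question: {question}"
    )


def run_evaluation(
    examples: list[dict],
    generate: Callable[[str], str],
) -> list[dict]:
    """Collect one model response per example for later judging."""
    results = []
    for ex in examples:
        prompt = build_prompt(ex["document"], ex["question"])
        results.append({"id": ex["id"], "response": generate(prompt)})
    return results


def echo_model(prompt: str) -> str:
    # Stand-in for a real model call; returns a fixed string.
    return "placeholder response"


# Made-up example record for illustration only.
examples = [{"id": "ex-1", "document": "Drug A is dosed once daily.",
             "question": "How often is Drug A dosed?"}]
results = run_evaluation(examples, echo_model)
print(json.dumps(results, indent=2))
```

Passing the model in as a plain callable keeps the loop independent of any particular inference library, so the same code can be reused when swapping models between runs.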

📊 Utilizing Evaluation Results
Healthcare Institutions

Objective performance metrics for AI adoption decisions
Selection criteria among multiple AI systems

AI Developers

Model performance benchmarking and improvement direction
Marketing and reliability verification materials

Regulatory Bodies

Reference material for AI medical device approval processes
Supplementary indicator for safety assessments

📌 Key Takeaways

FACTS Grounding = Global standard for verifying AI medical accuracy
Medical-Specific Evaluation = Testing based on 236 real healthcare scenarios
Transparent Operation = Open system accessible to all participants
Real-World Impact = Promoting safer and more reliable medical AI development



"What cannot be measured cannot be improved" - FACTS Grounding objectively measures medical AI reliability, contributing to global healthcare AI advancement.

We look forward to more research teams and developers joining this challenge to create better AI for human health! 🌍