openfree committed on
Commit
5e39079
·
1 Parent(s): 2549300

Update README.md

  1. README.md +108 -0
README.md CHANGED
@@ -8,4 +8,112 @@ sdk_version: 5.33.0
8
  app_file: app.py
9
  pinned: false
10
  short_description: Reasoning + Deep Research + API(NVIDIA H100 GPU)
11
  ---
8
  app_file: app.py
9
  pinned: false
10
  short_description: Reasoning + Deep Research + API(NVIDIA H100 GPU)
11
+ models:
12
+ - VIDraft/Gemma-3-R1984-27B
13
  ---
14
+ FACTS Grounding Leaderboard - Medical AI Evaluation
15
+ ๐Ÿฅ Overview
16
+ FACTS Grounding is an AI reliability evaluation system developed by Google DeepMind that verifies whether AI responses are grounded solely in provided documents. This evaluation is particularly crucial in healthcare, where inaccurate information can be life-threatening.
17
+ 🎯 Key Features
18
+ Evaluation Methodology
19
+
20
+ Long Medical Document Input (~32,000 tokens ≈ 40 A4 pages)
21
+ AI Response Generation Based on Documents
22
+ Dual-Criteria Assessment
23
+
24
+ ✅ Quality Check: Does the AI accurately understand the question?
25
+ ✅ Grounding Check: Are all responses based on the provided documents?
26
+
27
+
28
+
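The dual-criteria assessment above can be illustrated with a minimal Python sketch. It is not part of the commit or of the official FACTS Grounding tooling. The names `Judgement`, `passes`, and `accuracy` are hypothetical, and the rule that a response counts only when both checks pass is an assumption, since the README does not say how the two checks are combined.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical per-response verdict; field names are illustrative only."""
    quality_ok: bool  # Quality Check: the response addresses the question
    grounded: bool    # Grounding Check: the response relies only on the document


def passes(judgement: Judgement) -> bool:
    # Assumption: a response counts only if it passes BOTH checks.
    return judgement.quality_ok and judgement.grounded


def accuracy(judgements: list[Judgement]) -> float:
    # Fraction of responses that pass both checks; 0.0 for an empty list.
    if not judgements:
        return 0.0
    return sum(passes(j) for j in judgements) / len(judgements)


if __name__ == "__main__":
    sample = [
        Judgement(quality_ok=True, grounded=True),
        Judgement(quality_ok=True, grounded=False),   # fluent but ungrounded
        Judgement(quality_ok=False, grounded=True),   # grounded but off-topic
        Judgement(quality_ok=True, grounded=True),
    ]
    print(f"accuracy = {accuracy(sample):.2f}")
```

Requiring both checks means a fluent but ungrounded answer scores the same as an off-topic one, which matches the README's emphasis on grounding for medical use.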
29
+ Medical-Focused Version
30
+
31
+ 236 medical cases selected from 860 total problems
32
+ Strict evaluation criteria reflecting healthcare field requirements
33
+
34
+ ๐Ÿ† Current Leaderboard Rankings (June 5, 2025)
35
+ Overall Score TOP 5
36
+
37
+ 1. deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
38
+ 2. VIDraft/Gemma-3-R1984-27B
39
+ 3. meta-llama/Llama-3.3-70B-Instruct
40
+ 4. Qwen/Qwen3-30B-A3B
41
+ 5. Qwen/Qwen3-4B
42
+
43
+
44
+ 💡 Why FACTS Matters for Medical AI
45
+ 1. Patient Safety Assurance
46
+
47
+ General AI Error: "The capital of France is London" → Simple mistake
48
+ Medical AI Error: "This medication is safe for pregnant women" → Life-threatening!
49
+
50
+ 2. Building Healthcare Provider Trust
51
+
52
+ Ensures all responses are grounded in medical literature
53
+ Transparent scoring enables AI system reliability verification
54
+
55
+ 3. Regulatory Compliance & Standardization
56
+
57
+ Emerging as a global medical AI standard
58
+ Potential reference for FDA, CE, and other regulatory approvals
59
+
60
+ ๐ŸŒ Advancing Global Medical AI
61
+ Core Values of FACTS
62
+
63
+ Accuracy: Provides only evidence-based medical responses
64
+ Transparency: All evaluation processes and data are public
65
+ Accessibility: Global participation in evaluation
66
+ Practicality: Reflects real-world healthcare scenarios
67
+
68
+ Medical-Specific Features
69
+
70
+ Understanding complex medical terminology and drug interactions
71
+ Recognition of diverse symptom descriptions
72
+ Verification of clinical guideline and protocol adherence
73
+ Consideration of medical ethics and patient privacy
74
+
75
+ 🚀 The Future of Medical AI
76
+ FACTS Grounding is establishing itself as the 'quality certification system' for medical AI.
77
+ Expected Impact
78
+
79
+ Global Standardization: Unified criteria for worldwide medical AI evaluation
80
+ Quality Improvement: Continuous model enhancement through benchmarking
81
+ Accelerated Clinical Adoption: Rapid deployment of validated AI systems
82
+ Enhanced Patient Care: More accurate and safer healthcare services
83
+
84
+ ๐Ÿค How to Participate
85
+ The leaderboard operates with complete transparency, allowing anyone to submit and evaluate their models.
86
+
87
+ Download the FACTS Grounding dataset
88
+ Train and optimize your model
89
+ Submit evaluation results
90
+ Check rankings on the leaderboard
91
+
92
+ 📊 Utilizing Evaluation Results
93
+ Healthcare Institutions
94
+
95
+ Objective performance metrics for AI adoption decisions
96
+ Selection criteria among multiple AI systems
97
+
98
+ AI Developers
99
+
100
+ Model performance benchmarking and improvement direction
101
+ Marketing and reliability verification materials
102
+
103
+ Regulatory Bodies
104
+
105
+ Reference material for AI medical device approval processes
106
+ Supplementary indicator for safety assessments
107
+
108
+ 📌 Key Takeaways
109
+
110
+ FACTS Grounding = Global standard for verifying AI medical accuracy
111
+ Medical-Specific Evaluation = Testing based on 236 real healthcare scenarios
112
+ Transparent Operation = Open system accessible to all participants
113
+ Real-World Impact = Promoting safer and more reliable medical AI development
114
+
115
+
116
+
117
+ "What cannot be measured cannot be improved" - FACTS Grounding objectively measures medical AI reliability, contributing to global healthcare AI advancement.
118
+
119
+ We look forward to more research teams and developers joining this challenge to create better AI for human health! 🌍