Merge pull request #5 from harshalmore31/main
feat: implementation of MAI Diagnostic Orchestrator (MAI-DxO) with comprehensive documentation
- README.md +502 -41
- mai_dx/main.py +1256 -7
README.md
CHANGED
@@ -1,79 +1,540 @@
# Open-MAI-Dx-Orchestrator

> **An open-source implementation of the "Sequential Diagnosis with Language Models" paper by Microsoft Research, built with the Swarms AI framework.**

[arXiv:2506.22405](https://arxiv.org/abs/2506.22405)
[License: MIT](LICENSE)
[Python 3.8+](https://python.org)

MAI-DxO (MAI Diagnostic Orchestrator) is a sophisticated AI-powered diagnostic system that simulates a virtual panel of physician-agents to perform iterative medical diagnosis with cost-effectiveness optimization. This implementation faithfully reproduces the methodology described in the Microsoft Research paper while providing additional features and flexibility.

## 🚀 Quick Start

```bash
# Install the package
pip install mai-dx

# Or install from source
git clone https://github.com/The-Swarm-Corporation/Open-MAI-Dx-Orchestrator.git
cd Open-MAI-Dx-Orchestrator
pip install -e .
```

```python
from mai_dx import MaiDxOrchestrator

# Create orchestrator
orchestrator = MaiDxOrchestrator(model_name="gemini/gemini-2.5-flash")

# Run diagnosis
result = orchestrator.run(
    initial_case_info="29-year-old woman with sore throat and peritonsillar swelling...",
    full_case_details="Patient: 29-year-old female. History: Onset of sore throat...",
    ground_truth_diagnosis="Embryonal rhabdomyosarcoma of the pharynx"
)

print(f"Diagnosis: {result.final_diagnosis}")
print(f"Accuracy: {result.accuracy_score}/5.0")
print(f"Cost: ${result.total_cost:,}")
```

## 📚 Table of Contents

- [Features](#-features)
- [Installation](#-installation)
- [Architecture](#-architecture)
- [Usage](#-usage)
- [MAI-DxO Variants](#-mai-dxo-variants)
- [Configuration](#-configuration)
- [Examples](#-examples)
- [API Reference](#-api-reference)
- [Contributing](#-contributing)
- [Citation](#-citation)

## ✨ Features

### 🏥 Virtual Physician Panel
- **8 Specialized AI Agents**: Each with distinct medical expertise and decision-making roles
- **Iterative Deliberation**: Sequential consultation and consensus-building process
- **Bayesian Reasoning**: Probability-based differential diagnosis updates
- **Cognitive Bias Detection**: Built-in challenger agent to prevent diagnostic errors

### 💰 Cost-Effectiveness Optimization
- **Comprehensive Cost Tracking**: Real-time budget monitoring with 25+ medical test costs
- **Resource Stewardship**: AI agent dedicated to cost-conscious care decisions
- **Budget Constraints**: Configurable spending limits with intelligent test prioritization
- **Value-Based Testing**: Information theory-driven test selection

### 🎯 Multiple Operational Modes
- **Instant**: Immediate diagnosis from initial presentation
- **Question-Only**: History-taking without diagnostic tests
- **Budgeted**: Cost-constrained diagnostic workup
- **No-Budget**: Full diagnostic capability
- **Ensemble**: Multiple independent panels with consensus aggregation

### 📊 Advanced Evaluation
- **Clinical Accuracy Scoring**: 5-point Likert scale with detailed rubric
- **Management Impact Assessment**: Evaluation based on treatment implications
- **Diagnostic Reasoning Tracking**: Complete conversation history and decision trails
- **Ensemble Methods**: Multi-run consensus for improved accuracy

### 🔧 Technical Excellence
- **Model Agnostic**: Support for GPT, Gemini, Claude, and other LLMs
- **Robust Error Handling**: Comprehensive exception management and fallback mechanisms
- **Beautiful Logging**: Structured logging with Loguru for debugging and monitoring
- **Type Safety**: Full Pydantic models and type hints throughout

## 🛠 Installation

### Prerequisites
- Python 3.8 or higher
- API keys for your chosen language model provider

### Standard Installation
```bash
pip install mai-dx
```

### Development Installation
```bash
git clone https://github.com/The-Swarm-Corporation/Open-MAI-Dx-Orchestrator.git
cd Open-MAI-Dx-Orchestrator
pip install -e .
```

### Dependencies
The package automatically installs:
- `swarms` - AI agent orchestration framework
- `loguru` - Advanced logging
- `pydantic` - Data validation and serialization

## 🏗 Architecture

### Virtual Panel Composition

The MAI-DxO system consists of 8 specialized AI agents that work together to provide comprehensive medical diagnosis:

#### Core Diagnostic Panel

**🧠 Dr. Hypothesis**
- Maintains probability-ranked differential diagnosis (top 3 conditions)
- Updates probabilities using Bayesian reasoning after each finding
- Tracks evidence supporting and contradicting each hypothesis

**🔬 Dr. Test-Chooser**
- Selects up to 3 diagnostic tests per round for maximum information value
- Optimizes for discriminatory power between competing hypotheses
- Balances diagnostic yield with patient burden

**🤔 Dr. Challenger**
- Acts as devil's advocate to prevent cognitive biases
- Identifies contradictory evidence and alternative explanations
- Proposes falsifying tests and guards against premature closure

**💰 Dr. Stewardship**
- Enforces cost-conscious, high-value care decisions
- Advocates for cheaper alternatives when diagnostically equivalent
- Evaluates test necessity and suggests cost-effective strategies

**✅ Dr. Checklist**
- Performs quality control on panel deliberations
- Validates test names and maintains logical consistency
- Flags errors and ensures proper diagnostic methodology

#### Coordination and Evaluation

**🤝 Consensus Coordinator**
- Synthesizes panel input into optimal next action
- Decides between asking questions, ordering tests, or diagnosing
- Balances accuracy, cost, efficiency, and thoroughness

**🔑 Gatekeeper**
- Serves as clinical information oracle with complete case access
- Provides objective findings and realistic synthetic results
- Maintains clinical realism while preventing information leakage

**⚖️ Judge**
- Evaluates final diagnoses against ground truth
- Uses rigorous 5-point clinical rubric
- Considers management implications and diagnostic completeness
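
Dr. Hypothesis is described above as updating a probability-ranked differential with Bayesian reasoning after each finding. The README does not show how that update is computed, so the following is only a minimal sketch of a single Bayes step. The function name `bayes_update` and all priors and likelihoods are hypothetical values chosen to show the mechanics, not clinical estimates.

```python
from typing import Dict


def bayes_update(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Return posterior P(dx | finding) for every diagnosis in `priors`.

    priors:      P(dx) before the new finding
    likelihoods: P(finding | dx) for the same diagnoses
    """
    unnormalized = {dx: priors[dx] * likelihoods[dx] for dx in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("finding is impossible under every hypothesis")
    return {dx: p / total for dx, p in unnormalized.items()}


# Hypothetical three-item differential (illustrative numbers only).
priors = {"peritonsillar abscess": 0.6, "lymphoma": 0.3, "rhabdomyosarcoma": 0.1}
# Hypothetical finding: "no response to antimicrobial therapy".
likelihoods = {"peritonsillar abscess": 0.1, "lymphoma": 0.8, "rhabdomyosarcoma": 0.8}

posterior = bayes_update(priors, likelihoods)
ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
for dx, p in ranked:
    print(f"{dx}: {p:.2f}")
```

The finding is far less likely under the leading hypothesis than under the other two, so the ranking reorders after one update.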

### Decision Process Flow

```mermaid
graph TD
    A[Initial Case Information] --> B[Panel Deliberation]
    B --> C{Consensus Decision}
    C -->|Ask| D[Question to Gatekeeper]
    C -->|Test| E[Diagnostic Tests]
    C -->|Diagnose| F[Final Diagnosis]
    D --> G[Update Case Information]
    E --> G
    G --> H{Max Iterations or Budget?}
    H -->|No| B
    H -->|Yes| F
    F --> I[Judge Evaluation]
    I --> J[Diagnosis Result]
```
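
The flow chart can be read as a bounded loop. The sketch below mirrors the nodes of the diagram with plain callables standing in for the panel, the Gatekeeper, and the cost lookup. It is an illustration under assumptions, not the orchestrator's code: `run_loop`, `decide`, `gatekeeper`, and `cost_of` are hypothetical names, and the forced-diagnosis prompt is invented for the example.

```python
from typing import Callable, List, Tuple

# An action is (action_type, content); action_type is "ask", "test", or "diagnose".
Action = Tuple[str, str]


def run_loop(
    decide: Callable[[List[str]], Action],
    gatekeeper: Callable[[Action], str],
    cost_of: Callable[[Action], int],
    initial_info: str,
    max_iterations: int,
    budget: int,
) -> Tuple[str, int, int]:
    """Return (final_diagnosis, total_cost, iterations_used)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    case_notes = [initial_info]                       # A
    total_cost = 0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        action = decide(case_notes)                   # B -> C
        if action[0] == "diagnose":                   # C -> F
            return action[1], total_cost, iteration
        total_cost += cost_of(action)
        case_notes.append(gatekeeper(action))         # D or E -> G
        if total_cost >= budget:                      # H: budget reached
            break
    # H -> F: a limit was hit, so ask for a diagnosis from what is known.
    forced = decide(case_notes + ["You must give a final diagnosis now."])
    return forced[1], total_cost, iteration


def demo_decide(notes: List[str]) -> Action:
    # Stand-in panel: order a test until three notes exist, then diagnose.
    if len(notes) < 3:
        return ("test", "CT neck")
    return ("diagnose", "example diagnosis")


result = run_loop(demo_decide, lambda a: f"result of {a[1]}", lambda a: 500,
                  "sore throat", max_iterations=10, budget=10_000)
capped = run_loop(demo_decide, lambda a: f"result of {a[1]}", lambda a: 500,
                  "sore throat", max_iterations=10, budget=500)
print(result, capped)
```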

## 🎮 Usage

### Basic Usage

```python
from mai_dx import MaiDxOrchestrator

# Initialize orchestrator
orchestrator = MaiDxOrchestrator(
    model_name="gemini/gemini-2.5-flash",
    max_iterations=10,
    initial_budget=10000
)

# Define case information
initial_info = "A 45-year-old male presents with chest pain..."
full_case = "Patient: 45-year-old male. History: Acute onset chest pain..."
ground_truth = "Myocardial infarction"

# Run diagnosis
result = orchestrator.run(initial_info, full_case, ground_truth)

# Access results
print(f"Diagnosis: {result.final_diagnosis}")
print(f"Accuracy Score: {result.accuracy_score}/5.0")
print(f"Total Cost: ${result.total_cost:,}")
print(f"Iterations: {result.iterations}")
```

### Advanced Configuration

```python
import os

# Enable debug logging. MAIDX_DEBUG is read when mai_dx/main.py is
# imported, so set it before importing the package.
os.environ["MAIDX_DEBUG"] = "1"

from mai_dx import MaiDxOrchestrator

# Custom orchestrator with specific settings
orchestrator = MaiDxOrchestrator(
    model_name="gpt-4",
    max_iterations=15,
    initial_budget=5000,
    mode="budgeted",
    physician_visit_cost=250,
    enable_budget_tracking=True
)
```

## 📋 MAI-DxO Variants

The system supports five distinct operational variants, each optimized for different clinical scenarios:

### 1. Instant Answer
```python
orchestrator = MaiDxOrchestrator.create_variant("instant")
result = orchestrator.run(initial_info, full_case, ground_truth)
```
- **Use Case**: Emergency triage, rapid screening
- **Behavior**: Immediate diagnosis from initial presentation only
- **Cost**: Single physician visit ($300)

### 2. Question-Only
```python
orchestrator = MaiDxOrchestrator.create_variant("question_only")
result = orchestrator.run(initial_info, full_case, ground_truth)
```
- **Use Case**: Telemedicine, history-taking focused consultations
- **Behavior**: Detailed questioning without diagnostic tests
- **Cost**: Physician visit only

### 3. Budgeted
```python
orchestrator = MaiDxOrchestrator.create_variant("budgeted", budget=3000)
result = orchestrator.run(initial_info, full_case, ground_truth)
```
- **Use Case**: Resource-constrained settings, cost-conscious care
- **Behavior**: Full panel with strict budget enforcement
- **Cost**: Limited by specified budget

### 4. No-Budget
```python
orchestrator = MaiDxOrchestrator.create_variant("no_budget")
result = orchestrator.run(initial_info, full_case, ground_truth)
```
- **Use Case**: Academic medical centers, complex cases
- **Behavior**: Full diagnostic capability without cost constraints
- **Cost**: Unlimited (tracks for analysis)

### 5. Ensemble
```python
orchestrator = MaiDxOrchestrator.create_variant("ensemble")
result = orchestrator.run_ensemble(initial_info, full_case, ground_truth, num_runs=3)
```
- **Use Case**: Critical diagnoses, second opinion simulation
- **Behavior**: Multiple independent panels with consensus aggregation
- **Cost**: Sum of all panel costs
+
|
275 |
+
## ⚙️ Configuration
|
276 |
+
|
277 |
+
### Model Configuration
|
278 |
+
|
279 |
+
```python
|
280 |
+
# Supported models
|
281 |
+
models = [
|
282 |
+
"gemini/gemini-2.5-flash",
|
283 |
+
"gpt-4o",
|
284 |
+
"gpt-4o-mini",
|
285 |
+
"claude-3-5-sonnet-20241022",
|
286 |
+
"meta-llama/llama-3.1-8b-instruct"
|
287 |
+
]
|
288 |
|
289 |
+
orchestrator = MaiDxOrchestrator(model_name="gpt-4o")
|
290 |
+
```
|
291 |
+
|
292 |
+
### Cost Database Customization
|
293 |
+
|
294 |
+
```python
|
295 |
+
# Access and modify cost database
|
296 |
+
orchestrator = MaiDxOrchestrator()
|
297 |
+
orchestrator.test_cost_db.update({
|
298 |
+
"custom_test": 450,
|
299 |
+
"specialized_imaging": 2000
|
300 |
+
})
|
301 |
+
```
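
Since `test_cost_db` accepts `update` with name-to-price pairs, it can be treated as a dictionary for back-of-the-envelope budgeting. The sketch below works on a plain dict and is not the orchestrator's own logic: `estimate_cost`, `within_budget`, and the `DEFAULT_COST` fallback for unlisted tests are assumptions made for the example.

```python
from typing import Dict, Iterable

# Stand-in for orchestrator.test_cost_db, using the entries from the snippet above.
cost_db: Dict[str, int] = {"custom_test": 450, "specialized_imaging": 2000}

DEFAULT_COST = 150  # assumed fallback price for tests missing from the table


def estimate_cost(tests: Iterable[str], db: Dict[str, int]) -> int:
    """Sum the listed price of each test, falling back to DEFAULT_COST."""
    return sum(db.get(name.strip().lower(), DEFAULT_COST) for name in tests)


def within_budget(tests: Iterable[str], db: Dict[str, int], spent: int, budget: int) -> bool:
    """True if ordering `tests` keeps cumulative spend at or under `budget`."""
    return spent + estimate_cost(tests, db) <= budget


print(estimate_cost(["custom_test", "Specialized_Imaging"], cost_db))
print(within_budget(["specialized_imaging"], cost_db, spent=1500, budget=3000))
```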

### Logging Configuration

```python
# Enable detailed debug logging.
# MAIDX_DEBUG is read when mai_dx/main.py is imported, so set it
# before importing the package.
import os
os.environ["MAIDX_DEBUG"] = "1"

# Custom log levels and formats available
```
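
The module-level check in `mai_dx/main.py` (shown in this same commit) lowercases the variable and accepts `1`, `true`, or `yes`. The helper below reproduces that test in isolation so the accepted values are easy to see. `debug_enabled` is a hypothetical name used only for this sketch and is not part of the package.

```python
import os
from typing import Mapping


def debug_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Mirror the MAIDX_DEBUG check: case-insensitive '1', 'true', or 'yes'."""
    return environ.get("MAIDX_DEBUG", "").lower() in ("1", "true", "yes")


print(debug_enabled({"MAIDX_DEBUG": "TRUE"}))
print(debug_enabled({"MAIDX_DEBUG": "0"}))
print(debug_enabled({}))
```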

## 📖 Examples

### Example 1: Comprehensive Diagnostic Workup

```python
from mai_dx import MaiDxOrchestrator

# Complex case requiring multiple tests
case_info = """
A 29-year-old woman was admitted to the hospital because of sore throat
and peritonsillar swelling and bleeding. Symptoms did not abate with
antimicrobial therapy.
"""

case_details = """
Patient: 29-year-old female.
History: Onset of sore throat 7 weeks prior to admission. Worsening
right-sided pain and swelling. No fevers, headaches, or GI symptoms.
Physical Exam: Right peritonsillar mass, displacing the uvula.
Initial Labs: FBC, clotting studies normal.
"""

ground_truth = "Embryonal rhabdomyosarcoma of the pharynx"

# Run with different variants
variants = ["question_only", "budgeted", "no_budget"]
results = {}

for variant in variants:
    if variant == "budgeted":
        orch = MaiDxOrchestrator.create_variant(variant, budget=3000)
    else:
        orch = MaiDxOrchestrator.create_variant(variant)

    results[variant] = orch.run(case_info, case_details, ground_truth)

# Compare results
for variant, result in results.items():
    print(f"{variant}: {result.final_diagnosis} (Score: {result.accuracy_score})")
```

### Example 2: Ensemble Diagnosis

```python
# High-stakes diagnosis with ensemble approach
ensemble_orchestrator = MaiDxOrchestrator.create_variant("ensemble")

ensemble_result = ensemble_orchestrator.run_ensemble(
    initial_case_info=case_info,
    full_case_details=case_details,
    ground_truth_diagnosis=ground_truth,
    num_runs=5  # 5 independent diagnostic panels
)

print(f"Ensemble Diagnosis: {ensemble_result.final_diagnosis}")
print(f"Accuracy Score: {ensemble_result.accuracy_score}/5.0")
print(f"Total Cost: ${ensemble_result.total_cost:,}")
```

### Example 3: Custom Cost Analysis

```python
# Analyze cost-effectiveness across variants
import matplotlib.pyplot as plt

variants = ["instant", "question_only", "budgeted", "no_budget"]
costs = []
accuracies = []

for variant in variants:
    orch = MaiDxOrchestrator.create_variant(variant)
    result = orch.run(case_info, case_details, ground_truth)
    costs.append(result.total_cost)
    accuracies.append(result.accuracy_score)

# Plot cost vs accuracy
plt.scatter(costs, accuracies)
plt.xlabel('Total Cost ($)')
plt.ylabel('Accuracy Score')
plt.title('Cost vs Accuracy Trade-off')
for i, variant in enumerate(variants):
    plt.annotate(variant, (costs[i], accuracies[i]))
plt.show()
```

## 🔍 API Reference

### MaiDxOrchestrator Class

#### Constructor
```python
MaiDxOrchestrator(
    model_name: str = "gemini/gemini-2.5-flash",
    max_iterations: int = 10,
    initial_budget: int = 10000,
    mode: str = "no_budget",
    physician_visit_cost: int = 300,
    enable_budget_tracking: bool = False
)
```

#### Methods

**`run(initial_case_info, full_case_details, ground_truth_diagnosis)`**
- Executes the sequential diagnostic process
- Returns: `DiagnosisResult` object

**`run_ensemble(initial_case_info, full_case_details, ground_truth_diagnosis, num_runs=3)`**
- Runs multiple independent sessions with consensus aggregation
- Returns: `DiagnosisResult` object

**`create_variant(variant, **kwargs)` (Class Method)**
- Factory method for creating specialized variants
- Variants: "instant", "question_only", "budgeted", "no_budget", "ensemble"

### DiagnosisResult Class

```python
@dataclass
class DiagnosisResult:
    final_diagnosis: str
    ground_truth: str
    accuracy_score: float
    accuracy_reasoning: str
    total_cost: int
    iterations: int
    conversation_history: str
```
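
Because `DiagnosisResult` is a plain dataclass, it can be turned into a dictionary or JSON with the standard library. The sketch below redeclares the class so it runs on its own, and every field value is a hypothetical placeholder rather than real output from the orchestrator.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DiagnosisResult:
    final_diagnosis: str
    ground_truth: str
    accuracy_score: float
    accuracy_reasoning: str
    total_cost: int
    iterations: int
    conversation_history: str


# Placeholder values for illustration only.
example = DiagnosisResult(
    final_diagnosis="Embryonal rhabdomyosarcoma of the pharynx",
    ground_truth="Embryonal rhabdomyosarcoma of the pharynx",
    accuracy_score=5.0,
    accuracy_reasoning="placeholder reasoning",
    total_cost=1200,
    iterations=4,
    conversation_history="(omitted)",
)

record = asdict(example)
record.pop("conversation_history")  # usually long; drop it from summaries
print(json.dumps(record, indent=2))
```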

### Utility Functions

**`run_mai_dxo_demo(case_info=None, case_details=None, ground_truth=None)`**
- Convenience function for quick demonstrations
- Returns: Dictionary of results from multiple variants

## 🧪 Testing and Validation

### Running Tests
```bash
# Run the built-in demo
python -m mai_dx.main

# Run with custom cases
python -c "
from mai_dx import run_mai_dxo_demo
results = run_mai_dxo_demo()
print(results)
"
```

### Benchmarking
```python
import time
from mai_dx import MaiDxOrchestrator

# Performance benchmarking
# (case_info, case_details, and ground_truth as defined in Example 1)
start_time = time.time()
orchestrator = MaiDxOrchestrator()
result = orchestrator.run(case_info, case_details, ground_truth)
elapsed = time.time() - start_time

print(f"Diagnosis completed in {elapsed:.2f} seconds")
print(f"Accuracy: {result.accuracy_score}/5.0")
print(f"Cost efficiency: ${result.total_cost/result.accuracy_score:.0f} per accuracy point")
```
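
The cost-efficiency line divides by `accuracy_score`, which would raise `ZeroDivisionError` if a run ever reported a score of 0. Whether the Judge can return 0 is not stated in this README, so the guard below is a defensive sketch only, and `cost_per_accuracy_point` is a hypothetical helper name.

```python
import math


def cost_per_accuracy_point(total_cost: float, accuracy_score: float) -> float:
    """Dollars spent per accuracy point; infinite when the score is not positive."""
    if accuracy_score <= 0:
        return math.inf
    return total_cost / accuracy_score


print(cost_per_accuracy_point(2000, 5.0))
print(cost_per_accuracy_point(2000, 0.0))
```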

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup
```bash
git clone https://github.com/your-org/Open-MAI-Dx-Orchestrator.git
cd Open-MAI-Dx-Orchestrator
pip install -e ".[dev]"
pre-commit install
```

### Code Style
- Follow PEP 8 guidelines
- Use type hints throughout
- Maintain comprehensive docstrings
- Add tests for new features

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📚 Citation

If you use this implementation in your research, please cite both the original paper and this implementation:

```bibtex
@misc{nori2025sequentialdiagnosislanguagemodels,
      title={Sequential Diagnosis with Language Models},
      author={Harsha Nori and Mayank Daswani and Christopher Kelly and Scott Lundberg and Marco Tulio Ribeiro and Marc Wilson and Xiaoxuan Liu and Viknesh Sounderajah and Jonathan Carlson and Matthew P Lungren and Bay Gross and Peter Hames and Mustafa Suleyman and Dominic King and Eric Horvitz},
      year={2025},
      eprint={2506.22405},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.22405},
}

@software{mai_dx_orchestrator,
      title={Open-MAI-Dx-Orchestrator: An Open Source Implementation of Sequential Diagnosis with Language Models},
      author={The-Swarm-Corporation},
      year={2025},
      url={https://github.com/The-Swarm-Corporation/Open-MAI-Dx-Orchestrator.git}
}
```

## 🔗 Related Work

- [Original Paper](https://arxiv.org/abs/2506.22405) - Sequential Diagnosis with Language Models
- [Swarms Framework](https://github.com/kyegomez/swarms) - Multi-agent AI orchestration
- [Microsoft Research](https://www.microsoft.com/en-us/research/) - Original research institution

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/The-Swarm-Corporation/Open-MAI-Dx-Orchestrator/issues)
- **Discussions**: [GitHub Discussions](https://github.com/The-Swarm-Corporation/Open-MAI-Dx-Orchestrator/discussions)
- **Documentation**: [Full Documentation](https://docs.swarms.world)

---

<p align="center">
  <strong>Built with Swarms for advancing AI-powered medical diagnosis</strong>
</p>

mai_dx/main.py
CHANGED
@@ -1,12 +1,1261 @@
"""
MAI Diagnostic Orchestrator (MAI-DxO)

This script provides a complete implementation of the "Sequential Diagnosis with Language Models"
paper, using the `swarms` framework. It simulates a virtual panel of physician-agents to perform
iterative medical diagnosis with cost-effectiveness optimization.

Based on the paper: "Sequential Diagnosis with Language Models"
(arXiv:2506.22405v1) by Nori et al.

Key Features:
- Virtual physician panel with specialized roles (Hypothesis, Test-Chooser, Challenger, Stewardship, Checklist)
- Multiple operational modes (instant, question_only, budgeted, no_budget, ensemble)
- Comprehensive cost tracking and budget management
- Clinical accuracy evaluation with 5-point Likert scale
- Gatekeeper system for realistic clinical information disclosure
- Ensemble methods for improved diagnostic accuracy

Example Usage:
    # Standard MAI-DxO usage
    orchestrator = MaiDxOrchestrator(model_name="gemini/gemini-2.5-flash")
    result = orchestrator.run(initial_case_info, full_case_details, ground_truth)

    # Budget-constrained variant
    budgeted_orchestrator = MaiDxOrchestrator.create_variant("budgeted", budget=5000)

    # Ensemble approach
    ensemble_result = orchestrator.run_ensemble(initial_case_info, full_case_details, ground_truth)
"""

import json
import sys
import time
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional, Union, Literal

from loguru import logger
from pydantic import BaseModel, Field
from swarms import Agent, Conversation

# Configure Loguru with beautiful formatting and features
logger.remove()  # Remove default handler

# Console handler with beautiful colors
logger.add(
    sys.stdout,
    level="INFO",
    format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>",
    colorize=True
)

# Enable debug mode if environment variable is set
import os
if os.getenv("MAIDX_DEBUG", "").lower() in ("1", "true", "yes"):
    logger.add(
        "logs/maidx_debug_{time:YYYY-MM-DD}.log",
        level="DEBUG",
        format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
        rotation="1 day",
        retention="3 days"
    )
    logger.info("🐛 Debug logging enabled - logs will be written to logs/ directory")

# File handler for persistent logging (optional - uncomment if needed)
# logger.add(
#     "logs/mai_dxo_{time:YYYY-MM-DD}.log",
#     rotation="1 day",
#     retention="7 days",
#     level="DEBUG",
#     format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | {name}:{function}:{line} - {message}",
#     compression="zip"
# )

# --- Data Structures and Enums ---

class AgentRole(Enum):
    """Enumeration of roles for the virtual physician panel."""
    HYPOTHESIS = "Dr. Hypothesis"
    TEST_CHOOSER = "Dr. Test-Chooser"
    CHALLENGER = "Dr. Challenger"
    STEWARDSHIP = "Dr. Stewardship"
    CHECKLIST = "Dr. Checklist"
    CONSENSUS = "Consensus Coordinator"
    GATEKEEPER = "Gatekeeper"
    JUDGE = "Judge"

@dataclass
class DiagnosisResult:
    """Stores the final result of a diagnostic session."""
    final_diagnosis: str
    ground_truth: str
    accuracy_score: float
    accuracy_reasoning: str
    total_cost: int
    iterations: int
    conversation_history: str

class Action(BaseModel):
    """Pydantic model for a structured action decided by the consensus agent."""
    action_type: Literal["ask", "test", "diagnose"] = Field(..., description="The type of action to perform.")
    content: Union[str, List[str]] = Field(..., description="The content of the action (question, test name, or diagnosis).")
    reasoning: str = Field(..., description="The reasoning behind choosing this action.")

# --- Main Orchestrator Class ---

class MaiDxOrchestrator:
    """
    Implements the MAI Diagnostic Orchestrator (MAI-DxO) framework.
|
110 |
+
This class orchestrates a virtual panel of AI agents to perform sequential medical diagnosis,
|
111 |
+
evaluates the final diagnosis, and tracks costs.
|
112 |
+
"""
|
113 |
+
def __init__(
|
114 |
+
self,
|
115 |
+
model_name: str = "gemini/gemini-2.5-flash",
|
116 |
+
max_iterations: int = 10,
|
117 |
+
initial_budget: int = 10000,
|
118 |
+
mode: str = "no_budget", # "instant", "question_only", "budgeted", "no_budget", "ensemble"
|
119 |
+
physician_visit_cost: int = 300,
|
120 |
+
enable_budget_tracking: bool = False,
|
121 |
+
):
|
122 |
+
"""
|
123 |
+
Initializes the MAI-DxO system.
|
124 |
+
|
125 |
+
Args:
|
126 |
+
model_name (str): The language model to be used by all agents.
|
127 |
+
max_iterations (int): The maximum number of diagnostic loops.
|
128 |
+
initial_budget (int): The starting budget for diagnostic tests.
|
129 |
+
mode (str): The operational mode of MAI-DxO.
|
130 |
+
physician_visit_cost (int): Cost per physician visit.
|
131 |
+
enable_budget_tracking (bool): Whether to enable budget tracking.
|
132 |
+
"""
|
133 |
+
self.model_name = model_name
|
134 |
+
self.max_iterations = max_iterations
|
135 |
+
self.initial_budget = initial_budget
|
136 |
+
self.mode = mode
|
137 |
+
self.physician_visit_cost = physician_visit_cost
|
138 |
+
self.enable_budget_tracking = enable_budget_tracking
|
139 |
+
|
140 |
+
self.cumulative_cost = 0
|
141 |
+
self.differential_diagnosis = "Not yet formulated."
|
142 |
+
self.conversation = Conversation(
|
143 |
+
time_enabled=True,
|
144 |
+
autosave=False,
|
145 |
+
save_enabled=False
|
146 |
+
)
|
147 |
+
|
148 |
+
# Enhanced cost model based on the paper's methodology
|
149 |
+
self.test_cost_db = {
|
150 |
+
"default": 150,
|
151 |
+
"cbc": 50,
|
152 |
+
"complete blood count": 50,
|
153 |
+
"fbc": 50,
|
154 |
+
"chest x-ray": 200,
|
155 |
+
"chest xray": 200,
|
156 |
+
"mri": 1500,
|
157 |
+
"mri brain": 1800,
|
158 |
+
"mri neck": 1600,
|
159 |
+
"ct scan": 1200,
|
160 |
+
"ct chest": 1300,
|
161 |
+
"ct abdomen": 1400,
|
162 |
+
"biopsy": 800,
|
163 |
+
"core biopsy": 900,
|
164 |
+
"immunohistochemistry": 400,
|
165 |
+
"fish test": 500,
|
166 |
+
"fish": 500,
|
167 |
+
"ultrasound": 300,
|
168 |
+
"ecg": 100,
|
169 |
+
"ekg": 100,
|
170 |
+
"blood glucose": 30,
|
171 |
+
"liver function tests": 80,
|
172 |
+
"renal function": 70,
|
173 |
+
"toxic alcohol panel": 200,
|
174 |
+
"urinalysis": 40,
|
175 |
+
"culture": 150,
|
176 |
+
"pathology": 600,
|
177 |
+
}
|
178 |
+
|
179 |
+
self._init_agents()
|
180 |
+
logger.info(f"🏥 MAI Diagnostic Orchestrator initialized successfully in '{mode}' mode with budget ${initial_budget:,}")
|
181 |
+
|
182 |
+
def _init_agents(self):
|
183 |
+
"""Initializes all required agents with their specific roles and prompts."""
|
184 |
+
self.agents = {
|
185 |
+
role: Agent(
|
186 |
+
agent_name=role.value,
|
187 |
+
system_prompt=self._get_prompt_for_role(role),
|
188 |
+
model_name=self.model_name,
|
189 |
+
max_loops=1,
|
190 |
+
output_type="json" if role == AgentRole.CONSENSUS else "str",
|
191 |
+
print_on=True, # Enable printing for all agents to see outputs
|
192 |
+
) for role in AgentRole
|
193 |
+
}
|
194 |
+
logger.info(f"👥 {len(self.agents)} virtual physician agents initialized and ready for consultation")
|
195 |
+
|
196 |
+
def _get_prompt_for_role(self, role: AgentRole) -> str:
|
197 |
+
"""Returns the system prompt for a given agent role."""
|
198 |
+
prompts = {
|
199 |
+
AgentRole.HYPOTHESIS: """
|
200 |
+
You are Dr. Hypothesis, a specialist in maintaining differential diagnoses. Your role is critical to the diagnostic process.
|
201 |
+
|
202 |
+
CORE RESPONSIBILITIES:
|
203 |
+
- Maintain a probability-ranked differential diagnosis with the top 3 most likely conditions
|
204 |
+
- Update probabilities using Bayesian reasoning after each new finding
|
205 |
+
- Consider both common and rare diseases appropriate to the clinical context
|
206 |
+
- Explicitly track how new evidence changes your diagnostic thinking
|
207 |
+
|
208 |
+
APPROACH:
|
209 |
+
1. Start with the most likely diagnoses based on presenting symptoms
|
210 |
+
2. For each new piece of evidence, consider:
|
211 |
+
- How it supports or refutes each hypothesis
|
212 |
+
- Whether it suggests new diagnoses to consider
|
213 |
+
- How it changes the relative probabilities
|
214 |
+
3. Always explain your Bayesian reasoning clearly
|
215 |
+
|
216 |
+
OUTPUT FORMAT:
|
217 |
+
Provide your updated differential diagnosis with:
|
218 |
+
- Top 3 diagnoses with probability estimates (percentages)
|
219 |
+
- Brief rationale for each
|
220 |
+
- Key evidence supporting each hypothesis
|
221 |
+
- Evidence that contradicts or challenges each hypothesis
|
222 |
+
|
223 |
+
Remember: Your differential drives the entire diagnostic process. Be thorough, evidence-based, and adaptive.
|
224 |
+
""",
|
225 |
+
|
226 |
+
AgentRole.TEST_CHOOSER: """
|
227 |
+
You are Dr. Test-Chooser, a specialist in diagnostic test selection and information theory.
|
228 |
+
|
229 |
+
CORE RESPONSIBILITIES:
|
230 |
+
- Select up to 3 diagnostic tests per round that maximally discriminate between leading hypotheses
|
231 |
+
- Optimize for information value, not just clinical reasonableness
|
232 |
+
- Consider test characteristics: sensitivity, specificity, positive/negative predictive values
|
233 |
+
- Balance diagnostic yield with patient burden and resource utilization
|
234 |
+
|
235 |
+
SELECTION CRITERIA:
|
236 |
+
1. Information Value: How much will this test change diagnostic probabilities?
|
237 |
+
2. Discriminatory Power: How well does it distinguish between competing hypotheses?
|
238 |
+
3. Clinical Impact: Will the result meaningfully alter management?
|
239 |
+
4. Sequential Logic: What should we establish first before ordering more complex tests?
|
240 |
+
|
241 |
+
APPROACH:
|
242 |
+
- For each proposed test, explicitly state which hypotheses it will help confirm or exclude
|
243 |
+
- Consider both positive and negative results and their implications
|
244 |
+
- Think about test sequences (e.g., basic labs before advanced imaging)
|
245 |
+
- Avoid redundant tests that won't add new information
|
246 |
+
|
247 |
+
OUTPUT FORMAT:
|
248 |
+
For each recommended test:
|
249 |
+
- Test name (be specific)
|
250 |
+
- Primary hypotheses it will help evaluate
|
251 |
+
- Expected information gain
|
252 |
+
- How results will change management decisions
|
253 |
+
|
254 |
+
Focus on tests that will most efficiently narrow the differential diagnosis.
|
255 |
+
""",
|
256 |
+
|
257 |
+
AgentRole.CHALLENGER: """
|
258 |
+
You are Dr. Challenger, the critical thinking specialist and devil's advocate.
|
259 |
+
|
260 |
+
CORE RESPONSIBILITIES:
|
261 |
+
- Identify and challenge cognitive biases in the diagnostic process
|
262 |
+
- Highlight contradictory evidence that might be overlooked
|
263 |
+
- Propose alternative hypotheses and falsifying tests
|
264 |
+
- Guard against premature diagnostic closure
|
265 |
+
|
266 |
+
COGNITIVE BIASES TO WATCH FOR:
|
267 |
+
1. Anchoring: Over-reliance on initial impressions
|
268 |
+
2. Confirmation bias: Seeking only supporting evidence
|
269 |
+
3. Availability bias: Overestimating probability of recently seen conditions
|
270 |
+
4. Representativeness: Ignoring base rates and prevalence
|
271 |
+
5. Search satisficing: Stopping at "good enough" explanations
|
272 |
+
|
273 |
+
YOUR APPROACH:
|
274 |
+
- Ask "What else could this be?" and "What doesn't fit?"
|
275 |
+
- Challenge assumptions and look for alternative explanations
|
276 |
+
- Propose tests that could disprove the leading hypothesis
|
277 |
+
- Consider rare diseases when common ones don't fully explain the picture
|
278 |
+
- Advocate for considering multiple conditions simultaneously
|
279 |
+
|
280 |
+
OUTPUT FORMAT:
|
281 |
+
- Specific biases you've identified in the current reasoning
|
282 |
+
- Evidence that contradicts the leading hypotheses
|
283 |
+
- Alternative diagnoses to consider
|
284 |
+
- Tests that could falsify current assumptions
|
285 |
+
- Red flags or concerning patterns that need attention
|
286 |
+
|
287 |
+
Be constructively critical - your role is to strengthen diagnostic accuracy through rigorous challenge.
|
288 |
+
""",
|
289 |
+
|
290 |
+
AgentRole.STEWARDSHIP: """
|
291 |
+
You are Dr. Stewardship, the resource optimization and cost-effectiveness specialist.
|
292 |
+
|
293 |
+
CORE RESPONSIBILITIES:
|
294 |
+
- Enforce cost-conscious, high-value care
|
295 |
+
- Advocate for cheaper alternatives when diagnostically equivalent
|
296 |
+
- Challenge low-yield, expensive tests
|
297 |
+
- Balance diagnostic thoroughness with resource stewardship
|
298 |
+
|
299 |
+
COST-VALUE FRAMEWORK:
|
300 |
+
1. High-Value Tests: Low cost, high diagnostic yield, changes management
|
301 |
+
2. Moderate-Value Tests: Moderate cost, specific indication, incremental value
|
302 |
+
3. Low-Value Tests: High cost, low yield, minimal impact on decisions
|
303 |
+
4. No-Value Tests: Any cost, no diagnostic value, ordered out of habit
|
304 |
+
|
305 |
+
ALTERNATIVE STRATEGIES:
|
306 |
+
- Could patient history/physical exam provide this information?
|
307 |
+
- Is there a less expensive test with similar diagnostic value?
|
308 |
+
- Can we use a staged approach (cheap test first, expensive if needed)?
|
309 |
+
- Does the test result actually change management?
|
310 |
+
|
311 |
+
YOUR APPROACH:
|
312 |
+
- Review all proposed tests for necessity and value
|
313 |
+
- Suggest cost-effective alternatives
|
314 |
+
- Question tests that don't clearly advance diagnosis
|
315 |
+
- Advocate for asking questions before ordering expensive tests
|
316 |
+
- Consider the cumulative cost burden
|
317 |
+
|
318 |
+
OUTPUT FORMAT:
|
319 |
+
- Assessment of proposed tests (high/moderate/low/no value)
|
320 |
+
- Specific cost-effective alternatives
|
321 |
+
- Questions that might obviate need for testing
|
322 |
+
- Recommended modifications to testing strategy
|
323 |
+
- Cumulative cost considerations
|
324 |
+
|
325 |
+
Your goal: Maximum diagnostic accuracy at minimum necessary cost.
|
326 |
+
""",
|
327 |
+
|
328 |
+
AgentRole.CHECKLIST: """
|
329 |
+
You are Dr. Checklist, the quality assurance and consistency specialist.
|
330 |
+
|
331 |
+
CORE RESPONSIBILITIES:
|
332 |
+
- Perform silent quality control on all panel deliberations
|
333 |
+
- Ensure test names are valid and properly specified
|
334 |
+
- Check internal consistency of reasoning across panel members
|
335 |
+
- Flag logical errors or contradictions in the diagnostic approach
|
336 |
+
|
337 |
+
QUALITY CHECKS:
|
338 |
+
1. Test Validity: Are proposed tests real and properly named?
|
339 |
+
2. Logical Consistency: Do the recommendations align with the differential?
|
340 |
+
3. Evidence Integration: Are all findings being considered appropriately?
|
341 |
+
4. Process Adherence: Is the panel following proper diagnostic methodology?
|
342 |
+
5. Safety Checks: Are any critical possibilities being overlooked?
|
343 |
+
|
344 |
+
SPECIFIC VALIDATIONS:
|
345 |
+
- Test names match standard medical terminology
|
346 |
+
- Proposed tests are appropriate for the clinical scenario
|
347 |
+
- No contradictions between different panel members' reasoning
|
348 |
+
- All significant findings are being addressed
|
349 |
+
- No gaps in the diagnostic logic
|
350 |
+
|
351 |
+
OUTPUT FORMAT:
|
352 |
+
- Brief validation summary (✓ Clear / ⚠ Issues noted)
|
353 |
+
- Any test name corrections needed
|
354 |
+
- Logical inconsistencies identified
|
355 |
+
- Missing considerations or gaps
|
356 |
+
- Process improvement suggestions
|
357 |
+
|
358 |
+
Keep your feedback concise but comprehensive. Flag any issues that could compromise diagnostic quality.
|
359 |
+
""",
|
360 |
+
|
361 |
+
AgentRole.CONSENSUS: """
|
362 |
+
You are the Consensus Coordinator, responsible for synthesizing the virtual panel's expertise into a single, optimal decision.
|
363 |
+
|
364 |
+
CORE RESPONSIBILITIES:
|
365 |
+
- Integrate input from Dr. Hypothesis, Dr. Test-Chooser, Dr. Challenger, Dr. Stewardship, and Dr. Checklist
|
366 |
+
- Decide on the single best next action: 'ask', 'test', or 'diagnose'
|
367 |
+
- Balance competing priorities: accuracy, cost, efficiency, and thoroughness
|
368 |
+
- Ensure the chosen action advances the diagnostic process optimally
|
369 |
+
|
370 |
+
DECISION FRAMEWORK:
|
371 |
+
1. DIAGNOSE: Choose when diagnostic certainty is sufficiently high (>85%) for the leading hypothesis
|
372 |
+
2. TEST: Choose when tests will meaningfully discriminate between hypotheses
|
373 |
+
3. ASK: Choose when history/exam questions could provide high-value information
|
374 |
+
|
375 |
+
SYNTHESIS PROCESS:
|
376 |
+
- Weight Dr. Hypothesis's confidence level and differential
|
377 |
+
- Consider Dr. Test-Chooser's information value analysis
|
378 |
+
- Incorporate Dr. Challenger's alternative perspectives
|
379 |
+
- Respect Dr. Stewardship's cost-effectiveness concerns
|
380 |
+
- Address any quality issues raised by Dr. Checklist
|
381 |
+
|
382 |
+
OUTPUT REQUIREMENTS:
|
383 |
+
Provide a JSON object with this exact structure:
|
384 |
+
{
|
385 |
+
"action_type": "ask" | "test" | "diagnose",
|
386 |
+
"content": "specific question(s), test name(s), or final diagnosis",
|
387 |
+
"reasoning": "clear justification synthesizing panel input"
|
388 |
+
}
|
389 |
+
|
390 |
+
For action_type "ask": content should be specific patient history or physical exam questions
|
391 |
+
For action_type "test": content should be properly named diagnostic tests (up to 3)
|
392 |
+
For action_type "diagnose": content should be the complete, specific final diagnosis
|
393 |
+
|
394 |
+
Make the decision that best advances accurate, cost-effective diagnosis.
|
395 |
+
""",
|
396 |
+
|
397 |
+
AgentRole.GATEKEEPER: """
|
398 |
+
You are the Gatekeeper, the clinical information oracle with complete access to the patient case file.
|
399 |
+
|
400 |
+
CORE RESPONSIBILITIES:
|
401 |
+
- Provide objective, specific clinical findings when explicitly requested
|
402 |
+
- Serve as the authoritative source for all patient information
|
403 |
+
- Generate realistic synthetic findings for tests not in the original case
|
404 |
+
- Maintain clinical realism while preventing information leakage
|
405 |
+
|
406 |
+
RESPONSE PRINCIPLES:
|
407 |
+
1. OBJECTIVITY: Provide only factual findings, never interpretations or impressions
|
408 |
+
2. SPECIFICITY: Give precise, detailed results when tests are properly ordered
|
409 |
+
3. REALISM: Ensure all responses reflect realistic clinical scenarios
|
410 |
+
4. NO HINTS: Never provide diagnostic clues or suggestions
|
411 |
+
5. CONSISTENCY: Maintain coherence across all provided information
|
412 |
|
413 |
+
HANDLING REQUESTS:
|
414 |
+
- Patient History Questions: Provide relevant history from case file or realistic details
|
415 |
+
- Physical Exam: Give specific examination findings as would be documented
|
416 |
+
- Diagnostic Tests: Provide exact results as specified or realistic synthetic values
|
417 |
+
- Vague Requests: Politely ask for more specific queries
|
418 |
+
- Invalid Requests: Explain why the request cannot be fulfilled
|
419 |
+
|
420 |
+
SYNTHETIC FINDINGS GUIDELINES:
|
421 |
+
When generating findings not in the original case:
|
422 |
+
- Ensure consistency with established diagnosis and case details
|
423 |
+
- Use realistic reference ranges and values
|
424 |
+
- Maintain clinical plausibility
|
425 |
+
- Avoid pathognomonic findings unless specifically diagnostic
|
426 |
+
|
427 |
+
RESPONSE FORMAT:
|
428 |
+
- Direct, clinical language
|
429 |
+
- Specific measurements with reference ranges when applicable
|
430 |
+
- Clear organization of findings
|
431 |
+
- Professional medical terminology
|
432 |
+
|
433 |
+
Your role is crucial: provide complete, accurate clinical information while maintaining the challenge of the diagnostic process.
|
434 |
+
""",
|
435 |
+
|
436 |
+
AgentRole.JUDGE: """
|
437 |
+
You are the Judge, the diagnostic accuracy evaluation specialist.
|
438 |
+
|
439 |
+
CORE RESPONSIBILITIES:
|
440 |
+
- Evaluate candidate diagnoses against ground truth using a rigorous clinical rubric
|
441 |
+
- Provide fair, consistent scoring based on clinical management implications
|
442 |
+
- Consider diagnostic substance over terminology differences
|
443 |
+
- Account for acceptable medical synonyms and equivalent formulations
|
444 |
+
|
445 |
+
EVALUATION RUBRIC (5-point Likert scale):
|
446 |
+
|
447 |
+
SCORE 5 (Perfect/Clinically Superior):
|
448 |
+
- Clinically identical to reference diagnosis
|
449 |
+
- May be more specific than reference (adding relevant detail)
|
450 |
+
- No incorrect or unrelated additions
|
451 |
+
- Treatment approach would be identical
|
452 |
+
|
453 |
+
SCORE 4 (Mostly Correct - Minor Incompleteness):
|
454 |
+
- Core disease correctly identified
|
455 |
+
- Minor qualifier or component missing/mis-specified
|
456 |
+
- Overall management largely unchanged
|
457 |
+
- Clinically appropriate diagnosis
|
458 |
+
|
459 |
+
SCORE 3 (Partially Correct - Major Error):
|
460 |
+
- Correct general disease category
|
461 |
+
- Major error in etiology, anatomic site, or critical specificity
|
462 |
+
- Would significantly alter workup or prognosis
|
463 |
+
- Partially correct but clinically concerning gaps
|
464 |
+
|
465 |
+
SCORE 2 (Largely Incorrect):
|
466 |
+
- Shares only superficial features with correct diagnosis
|
467 |
+
- Wrong fundamental disease process
|
468 |
+
- Would misdirect clinical workup
|
469 |
+
- Partially contradicts case details
|
470 |
+
|
471 |
+
SCORE 1 (Completely Incorrect):
|
472 |
+
- No meaningful overlap with correct diagnosis
|
473 |
+
- Wrong organ system or disease category
|
474 |
+
- Would likely lead to harmful care
|
475 |
+
- Completely inconsistent with clinical presentation
|
476 |
+
|
477 |
+
EVALUATION PROCESS:
|
478 |
+
1. Compare core disease entity
|
479 |
+
2. Assess etiology/causative factors
|
480 |
+
3. Evaluate anatomic specificity
|
481 |
+
4. Consider diagnostic completeness
|
482 |
+
5. Judge clinical management implications
|
483 |
+
|
484 |
+
OUTPUT FORMAT:
|
485 |
+
- Score (1-5) with clear label
|
486 |
+
- Detailed justification referencing specific rubric criteria
|
487 |
+
- Explanation of how diagnosis would affect clinical management
|
488 |
+
- Note any acceptable medical synonyms or equivalent terminology
|
489 |
+
|
490 |
+
Maintain high standards while recognizing legitimate diagnostic variability in medical practice.
|
491 |
+
"""
|
492 |
+
}
|
493 |
+
return prompts[role]
|
494 |
+
|
495 |
+
def _parse_json_response(self, response: str) -> Dict[str, Any]:
|
496 |
+
"""Safely parses a JSON string, returning a dictionary."""
|
497 |
+
try:
|
498 |
+
# Extract the actual response content from the agent response
|
499 |
+
if isinstance(response, str):
|
500 |
+
# Handle markdown-formatted JSON
|
501 |
+
if "```json" in response:
|
502 |
+
# Extract JSON content between ```json and ```
|
503 |
+
start_marker = "```json"
|
504 |
+
end_marker = "```"
|
505 |
+
start_idx = response.find(start_marker)
|
506 |
+
if start_idx != -1:
|
507 |
+
start_idx += len(start_marker)
|
508 |
+
end_idx = response.find(end_marker, start_idx)
|
509 |
+
if end_idx != -1:
|
510 |
+
json_content = response[start_idx:end_idx].strip()
|
511 |
+
return json.loads(json_content)
|
512 |
+
|
513 |
+
# Try to find JSON-like content in the response
|
514 |
+
lines = response.split('\n')
|
515 |
+
json_lines = []
|
516 |
+
in_json = False
|
517 |
+
brace_count = 0
|
518 |
+
|
519 |
+
for line in lines:
|
520 |
+
stripped_line = line.strip()
|
521 |
+
if stripped_line.startswith('{') and not in_json:
|
522 |
+
in_json = True
|
523 |
+
json_lines = [line] # Start fresh
|
524 |
+
brace_count = line.count('{') - line.count('}')
|
525 |
+
elif in_json:
|
526 |
+
json_lines.append(line)
|
527 |
+
brace_count += line.count('{') - line.count('}')
|
528 |
+
if brace_count <= 0: # Balanced braces, end of JSON
|
529 |
+
break
|
530 |
+
|
531 |
+
if json_lines and in_json:
|
532 |
+
json_content = '\n'.join(json_lines)
|
533 |
+
return json.loads(json_content)
|
534 |
+
|
535 |
+
# Try to extract JSON from text that might contain other content
|
536 |
+
import re
|
537 |
+
# Look for JSON pattern in the text
|
538 |
+
json_pattern = r'\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}'
|
539 |
+
matches = re.findall(json_pattern, response, re.DOTALL)
|
540 |
+
|
541 |
+
for match in matches:
|
542 |
+
try:
|
543 |
+
return json.loads(match)
|
544 |
+
except json.JSONDecodeError:
|
545 |
+
continue
|
546 |
+
|
547 |
+
# Direct parsing attempt as fallback
|
548 |
+
return json.loads(response)
|
549 |
+
|
550 |
+
except (json.JSONDecodeError, IndexError, AttributeError) as e:
|
551 |
+
logger.error(f"Failed to parse JSON response. Error: {e}")
|
552 |
+
logger.debug(f"Response content: {str(response)[:500]}...") # Log first 500 chars; str() guards against non-string responses
|
553 |
+
# Fallback to a default action if parsing fails
|
554 |
+
return {
|
555 |
+
"action_type": "ask",
|
556 |
+
"content": "Could you please clarify the next best step? The previous analysis was inconclusive.",
|
557 |
+
"reasoning": "Fallback due to parsing error."
|
558 |
+
}
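The brace-counting and regex strategies in `_parse_json_response` count braces that appear inside JSON strings and only handle limited nesting. A minimal alternative sketch is shown here, assuming the goal is simply "return the first JSON object found anywhere in the reply". The helper name `extract_first_json_object` is hypothetical and not part of this PR; it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Any, Dict, Optional


def extract_first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses exactly one JSON value beginning at `start`
            # and ignores any trailing text (prose, closing code fences).
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next candidate.
        start = text.find("{", start + 1)
    return None
```

A caller could try this helper first and fall back to the default "ask" action only when it returns `None`.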
|
559 |
+
|
560 |
+
def _estimate_cost(self, tests: Union[List[str], str]) -> int:
|
561 |
+
"""Estimates the cost of diagnostic tests."""
|
562 |
+
if isinstance(tests, str):
|
563 |
+
tests = [tests]
|
564 |
+
|
565 |
+
cost = 0
|
566 |
+
for test in tests:
|
567 |
+
test_lower = test.lower().strip()
|
568 |
+
|
569 |
+
# Enhanced cost matching with multiple strategies
|
570 |
+
cost_found = False
|
571 |
+
|
572 |
+
# Strategy 1: Exact match
|
573 |
+
if test_lower in self.test_cost_db:
|
574 |
+
cost += self.test_cost_db[test_lower]
|
575 |
+
cost_found = True
|
576 |
+
continue
|
577 |
+
|
578 |
+
# Strategy 2: Partial match (find best matching key)
|
579 |
+
best_match = None
|
580 |
+
best_match_length = 0
|
581 |
+
for cost_key in self.test_cost_db:
|
582 |
+
if cost_key in test_lower or test_lower in cost_key:
|
583 |
+
if len(cost_key) > best_match_length:
|
584 |
+
best_match = cost_key
|
585 |
+
best_match_length = len(cost_key)
|
586 |
+
|
587 |
+
if best_match:
|
588 |
+
cost += self.test_cost_db[best_match]
|
589 |
+
cost_found = True
|
590 |
+
continue
|
591 |
+
|
592 |
+
# Strategy 3: Keyword-based matching
|
593 |
+
if any(keyword in test_lower for keyword in ['biopsy', 'tissue']):
|
594 |
+
cost += self.test_cost_db.get('biopsy', 800)
|
595 |
+
cost_found = True
|
596 |
+
elif any(keyword in test_lower for keyword in ['mri', 'magnetic']):
|
597 |
+
cost += self.test_cost_db.get('mri', 1500)
|
598 |
+
cost_found = True
|
599 |
+
elif 'computed tomography' in test_lower or 'ct' in test_lower.split(): # whole-word 'ct' so names like "electrolytes" are not billed as a CT scan
|
600 |
+
cost += self.test_cost_db.get('ct scan', 1200)
|
601 |
+
cost_found = True
|
602 |
+
elif any(keyword in test_lower for keyword in ['xray', 'x-ray', 'radiograph']):
|
603 |
+
cost += self.test_cost_db.get('chest x-ray', 200)
|
604 |
+
cost_found = True
|
605 |
+
elif any(keyword in test_lower for keyword in ['blood', 'serum', 'plasma']):
|
606 |
+
cost += 100 # Basic blood test cost
|
607 |
+
cost_found = True
|
608 |
+
elif any(keyword in test_lower for keyword in ['culture', 'sensitivity']):
|
609 |
+
cost += self.test_cost_db.get('culture', 150)
|
610 |
+
cost_found = True
|
611 |
+
elif any(keyword in test_lower for keyword in ['immunohistochemistry', 'ihc']):
|
612 |
+
cost += self.test_cost_db.get('immunohistochemistry', 400)
|
613 |
+
cost_found = True
|
614 |
+
|
615 |
+
# Strategy 4: Default cost for unknown tests
|
616 |
+
if not cost_found:
|
617 |
+
cost += self.test_cost_db['default']
|
618 |
+
logger.debug(f"Using default cost for unknown test: {test}")
|
619 |
+
|
620 |
+
return cost
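Substring matching against short keys such as `ct` or `fish` can hit unrelated test names. The sketch below shows one way to do the longest-key lookup with whole-word matching instead. It is an illustration under assumptions, not the PR's implementation: `COSTS` is an abbreviated, hypothetical table (the bare `"ct"` key is added for the example) and `estimate_cost` is a hypothetical free function.

```python
import re
from typing import Dict, Iterable

# Hypothetical, abbreviated price table for illustration only.
COSTS: Dict[str, int] = {
    "default": 150,
    "cbc": 50,
    "ct": 1200,        # assumed generic CT price
    "ct chest": 1300,
    "mri": 1500,
    "mri brain": 1800,
}


def estimate_cost(tests: Iterable[str], costs: Dict[str, int] = COSTS) -> int:
    """Sum per-test costs, matching table keys only on whole words."""
    total = 0
    for test in tests:
        name = test.lower().strip()
        best_key = None
        for key in costs:
            if key == "default":
                continue
            # \b anchors keep "ct" from matching inside "electrolytes".
            if re.search(rf"\b{re.escape(key)}\b", name):
                # Prefer the longest (most specific) matching key.
                if best_key is None or len(key) > len(best_key):
                    best_key = key
        total += costs[best_key] if best_key else costs["default"]
    return total
```

Preferring the longest matching key mirrors the "best match" idea of Strategy 2, while the word boundaries remove the need for a separate keyword pass for short abbreviations.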
|
621 |
+
|
622 |
+
def _run_panel_deliberation(self) -> Action:
|
623 |
+
"""Orchestrates one round of debate among the virtual panel to decide the next action."""
|
624 |
+
logger.info("🩺 Virtual medical panel deliberation commenced - analyzing patient case")
|
625 |
+
logger.debug("Panel members: Dr. Hypothesis, Dr. Test-Chooser, Dr. Challenger, Dr. Stewardship, Dr. Checklist")
|
626 |
+
panel_conversation = Conversation(
|
627 |
+
time_enabled=True,
|
628 |
+
autosave=False,
|
629 |
+
save_enabled=False
|
630 |
+
)
|
631 |
+
|
632 |
+
# Prepare comprehensive panel context
|
633 |
+
remaining_budget = self.initial_budget - self.cumulative_cost
|
634 |
+
budget_status = "EXCEEDED" if remaining_budget < 0 else f"${remaining_budget:,}"
|
635 |
+
|
636 |
+
panel_context = f"""
|
637 |
+
DIAGNOSTIC CASE STATUS - ROUND {len(self.conversation.return_history_as_string().split('Gatekeeper:')) - 1}
|
638 |
+
|
639 |
+
=== CASE INFORMATION ===
|
640 |
+
{self.conversation.get_str()}
|
641 |
+
|
642 |
+
=== CURRENT STATE ===
|
643 |
+
Differential Diagnosis: {self.differential_diagnosis}
|
644 |
+
Cumulative Cost: ${self.cumulative_cost:,}
|
645 |
+
Remaining Budget: {budget_status}
|
646 |
+
Mode: {self.mode}
|
647 |
+
Max Iterations: {self.max_iterations}
|
648 |
+
|
649 |
+
=== PANEL TASK ===
|
650 |
+
Virtual medical panel, please deliberate systematically on the next best diagnostic action.
|
651 |
+
Each specialist should provide their expert analysis in sequence.
|
652 |
+
"""
|
653 |
+
panel_conversation.add("System", panel_context)
|
654 |
+
|
655 |
+
# Check mode-specific constraints
|
656 |
+
if self.mode == "instant":
|
657 |
+
# For instant mode, skip deliberation and go straight to diagnosis
|
658 |
+
action_dict = {
|
659 |
+
"action_type": "diagnose",
|
660 |
+
"content": self.differential_diagnosis.split('\n')[0] if '\n' in self.differential_diagnosis else self.differential_diagnosis,
|
661 |
+
"reasoning": "Instant diagnosis mode - providing immediate assessment based on initial presentation"
|
662 |
+
}
|
663 |
+
return Action(**action_dict)
|
664 |
+
|
665 |
+
if self.mode == "question_only":
|
666 |
+
# For question-only mode, prevent test ordering
|
667 |
+
panel_context += "\n\nIMPORTANT: This is QUESTION-ONLY mode. You may ONLY ask patient questions, not order diagnostic tests."
|
668 |
+
panel_conversation.add("System", panel_context)
|
669 |
+
|
670 |
+
# Sequential expert deliberation with enhanced methodology
|
671 |
+
try:
|
672 |
+
# Dr. Hypothesis - Differential diagnosis and probability assessment
|
673 |
+
logger.info("🧠 Dr. Hypothesis analyzing differential diagnosis...")
|
674 |
+
hypothesis = self.agents[AgentRole.HYPOTHESIS].run(panel_conversation.get_str())
|
675 |
+
self.differential_diagnosis = hypothesis # Update main state
|
676 |
+
panel_conversation.add(self.agents[AgentRole.HYPOTHESIS].agent_name, hypothesis)
|
677 |
+
|
678 |
+
# Dr. Test-Chooser - Information value optimization
|
679 |
+
logger.info("🔬 Dr. Test-Chooser selecting optimal tests...")
|
680 |
+
test_choices = self.agents[AgentRole.TEST_CHOOSER].run(panel_conversation.get_str())
|
681 |
+
panel_conversation.add(self.agents[AgentRole.TEST_CHOOSER].agent_name, test_choices)
|
682 |
+
|
683 |
+
# Dr. Challenger - Bias identification and alternative hypotheses
|
684 |
+
logger.info("🤔 Dr. Challenger challenging assumptions...")
|
685 |
+
challenges = self.agents[AgentRole.CHALLENGER].run(panel_conversation.get_str())
|
686 |
+
panel_conversation.add(self.agents[AgentRole.CHALLENGER].agent_name, challenges)
|
687 |
+
|
688 |
+
# Dr. Stewardship - Cost-effectiveness analysis
|
689 |
+
logger.info("💰 Dr. Stewardship evaluating cost-effectiveness...")
|
690 |
+
stewardship_context = panel_conversation.get_str()
|
691 |
+
if self.enable_budget_tracking:
|
692 |
+
stewardship_context += f"\n\nBUDGET TRACKING ENABLED - Current cost: ${self.cumulative_cost:,}, Remaining: ${remaining_budget:,}"
|
693 |
+
stewardship_rec = self.agents[AgentRole.STEWARDSHIP].run(stewardship_context)
|
694 |
+
            panel_conversation.add(self.agents[AgentRole.STEWARDSHIP].agent_name, stewardship_rec)

            # Dr. Checklist - Quality assurance
            logger.info("✅ Dr. Checklist performing quality control...")
            checklist_rep = self.agents[AgentRole.CHECKLIST].run(panel_conversation.get_str())
            panel_conversation.add(self.agents[AgentRole.CHECKLIST].agent_name, checklist_rep)

            # Consensus Coordinator - Final decision synthesis
            logger.info("🤝 Consensus Coordinator synthesizing panel decision...")
            consensus_context = panel_conversation.get_str()

            # Add mode-specific constraints to consensus
            if self.mode == "budgeted" and remaining_budget <= 0:
                consensus_context += "\n\nBUDGET CONSTRAINT: Budget exceeded - must either ask questions or provide final diagnosis."

            consensus_response = self.agents[AgentRole.CONSENSUS].run(consensus_context)
            logger.debug(f"Raw consensus response: {consensus_response}")

            # Extract the actual text content from agent response
            if hasattr(consensus_response, 'content'):
                response_text = consensus_response.content
            elif isinstance(consensus_response, str):
                response_text = consensus_response
            else:
                response_text = str(consensus_response)

            action_dict = self._parse_json_response(response_text)

            # Validate action based on mode constraints
            action = Action(**action_dict)
            if self.mode == "question_only" and action.action_type == "test":
                logger.warning("Test ordering attempted in question-only mode, converting to ask action")
                action.action_type = "ask"
                action.content = "Can you provide more details about the patient's symptoms and history?"
                action.reasoning = "Mode constraint: question-only mode active"

            if self.mode == "budgeted" and action.action_type == "test" and remaining_budget <= 0:
                logger.warning("Test ordering attempted with insufficient budget, converting to diagnose action")
                action.action_type = "diagnose"
                # First line of the differential; split() returns the whole string when there is no newline
                action.content = self.differential_diagnosis.split('\n')[0]
                action.reasoning = "Budget constraint: insufficient funds for additional testing"

            return action

        except Exception as e:
            logger.error(f"Error during panel deliberation: {e}")
            # Fallback action
            return Action(
                action_type="ask",
                content="Could you please provide more information about the patient's current condition?",
                reasoning=f"Fallback due to panel deliberation error: {str(e)}"
            )

    def _interact_with_gatekeeper(self, action: Action, full_case_details: str) -> str:
        """Sends the panel's action to the Gatekeeper and returns its response."""
        gatekeeper = self.agents[AgentRole.GATEKEEPER]

        if action.action_type == "ask":
            request = f"Question: {action.content}"
        elif action.action_type == "test":
            # action.content may be a single string or a list of test names;
            # joining a bare string would put ", " between its characters.
            tests = [action.content] if isinstance(action.content, str) else action.content
            request = f"Tests ordered: {', '.join(tests)}"
        else:
            return "No interaction needed for 'diagnose' action."

        # The Gatekeeper needs the full case to act as an oracle
        prompt = f"""
        Full Case Details (for your reference only):
        ---
        {full_case_details}
        ---

        Request from Diagnostic Agent:
        {request}
        """

        response = gatekeeper.run(prompt)
        return response

    def _judge_diagnosis(self, candidate_diagnosis: str, ground_truth: str) -> Dict[str, Any]:
        """Uses the Judge agent to evaluate the final diagnosis."""
        judge = self.agents[AgentRole.JUDGE]
        prompt = f"""
        Please evaluate the following diagnosis.
        Ground Truth: "{ground_truth}"
        Candidate Diagnosis: "{candidate_diagnosis}"
        """
        response = judge.run(prompt)

        # Simple parsing for demonstration; a more robust solution would use structured output.
        try:
            score = float(response.split("Score:")[1].split("/")[0].strip())
            reasoning = response.split("Justification:")[1].strip()
        except (IndexError, ValueError, AttributeError):
            # AttributeError covers a non-string response object that has no .split()
            score = 0.0
            reasoning = "Could not parse judge's response."

        return {"score": score, "reasoning": reasoning}

    def run(self, initial_case_info: str, full_case_details: str, ground_truth_diagnosis: str) -> DiagnosisResult:
        """
        Executes the full sequential diagnostic process.

        Args:
            initial_case_info (str): The initial abstract of the case.
            full_case_details (str): The complete case file for the Gatekeeper.
            ground_truth_diagnosis (str): The correct final diagnosis for evaluation.

        Returns:
            DiagnosisResult: An object containing the final diagnosis, evaluation, cost, and history.
        """
        start_time = time.time()
        self.conversation.add("Gatekeeper", f"Initial Case Information: {initial_case_info}")

        # Add initial physician visit cost
        self.cumulative_cost += self.physician_visit_cost
        logger.info(f"Initial physician visit cost: ${self.physician_visit_cost}")

        final_diagnosis = None
        iteration_count = 0

        for i in range(self.max_iterations):
            iteration_count = i + 1
            logger.info(f"--- Starting Diagnostic Loop {iteration_count}/{self.max_iterations} ---")
            logger.info(f"Current cost: ${self.cumulative_cost:,} | Remaining budget: ${self.initial_budget - self.cumulative_cost:,}")

            try:
                # Panel deliberates to decide on the next action
                action = self._run_panel_deliberation()
                logger.info(f"⚕️ Panel decision: {action.action_type.upper()} -> {action.content}")
                logger.info(f"💭 Medical reasoning: {action.reasoning}")

                if action.action_type == "diagnose":
                    final_diagnosis = action.content
                    logger.info(f"Final diagnosis proposed: {final_diagnosis}")
                    break

                # Handle mode-specific constraints
                if self.mode == "question_only" and action.action_type == "test":
                    logger.warning("Test ordering blocked in question-only mode")
                    continue

                if self.mode == "budgeted" and action.action_type == "test":
                    # Check if we can afford the tests
                    estimated_test_cost = self._estimate_cost(action.content)
                    if self.cumulative_cost + estimated_test_cost > self.initial_budget:
                        logger.warning(f"Test cost ${estimated_test_cost} would exceed budget. Skipping tests.")
                        continue

                # Interact with the Gatekeeper
                response = self._interact_with_gatekeeper(action, full_case_details)
                self.conversation.add("Gatekeeper", response)

                # Update costs based on action type
                if action.action_type == "test":
                    test_cost = self._estimate_cost(action.content)
                    self.cumulative_cost += test_cost
                    logger.info(f"Tests ordered: {action.content}")
                    logger.info(f"Test cost: ${test_cost:,} | Cumulative cost: ${self.cumulative_cost:,}")
                elif action.action_type == "ask":
                    # Questions are part of the same visit until tests are ordered
                    logger.info(f"Questions asked: {action.content}")
                    logger.info("No additional cost for questions in same visit")

                # Check budget constraints for budgeted mode
                if self.mode == "budgeted" and self.cumulative_cost >= self.initial_budget:
                    logger.warning("Budget limit reached. Forcing final diagnosis.")
                    # Use current differential diagnosis or make best guess
                    final_diagnosis = self.differential_diagnosis.split('\n')[0] if '\n' in self.differential_diagnosis else "Diagnosis not reached within budget constraints."
                    break

            except Exception as e:
                logger.error(f"Error in diagnostic loop {iteration_count}: {e}")
                # Continue to next iteration or break if critical error
                continue

        else:
            # Max iterations reached without diagnosis
            final_diagnosis = self.differential_diagnosis.split('\n')[0] if '\n' in self.differential_diagnosis else "Diagnosis not reached within maximum iterations."
            logger.warning(f"Max iterations ({self.max_iterations}) reached. Using best available diagnosis.")

        # Ensure we have a final diagnosis
        if not final_diagnosis or final_diagnosis.strip() == "":
            final_diagnosis = "Unable to determine diagnosis within constraints."

        # Calculate total time
        total_time = time.time() - start_time
        logger.info(f"Diagnostic session completed in {total_time:.2f} seconds")

        # Judge the final diagnosis
        logger.info("Evaluating final diagnosis...")
        try:
            judgement = self._judge_diagnosis(final_diagnosis, ground_truth_diagnosis)
        except Exception as e:
            logger.error(f"Error in diagnosis evaluation: {e}")
            judgement = {"score": 0.0, "reasoning": f"Evaluation error: {str(e)}"}

        # Create comprehensive result
        result = DiagnosisResult(
            final_diagnosis=final_diagnosis,
            ground_truth=ground_truth_diagnosis,
            accuracy_score=judgement["score"],
            accuracy_reasoning=judgement["reasoning"],
            total_cost=self.cumulative_cost,
            iterations=iteration_count,
            conversation_history=self.conversation.get_str()
        )

        logger.info("Diagnostic process completed:")
        logger.info(f" Final diagnosis: {final_diagnosis}")
        logger.info(f" Ground truth: {ground_truth_diagnosis}")
        logger.info(f" Accuracy score: {judgement['score']}/5.0")
        logger.info(f" Total cost: ${self.cumulative_cost:,}")
        logger.info(f" Iterations: {iteration_count}")

        return result

    def run_ensemble(self, initial_case_info: str, full_case_details: str, ground_truth_diagnosis: str, num_runs: int = 3) -> DiagnosisResult:
        """
        Runs multiple independent diagnostic sessions and aggregates the results.

        Args:
            initial_case_info (str): The initial abstract of the case.
            full_case_details (str): The complete case file for the Gatekeeper.
            ground_truth_diagnosis (str): The correct final diagnosis for evaluation.
            num_runs (int): Number of independent runs to perform.

        Returns:
            DiagnosisResult: Aggregated result from ensemble runs.
        """
        logger.info(f"Starting ensemble run with {num_runs} independent sessions")

        ensemble_results = []
        total_cost = 0

        for run_id in range(num_runs):
            logger.info(f"=== Ensemble Run {run_id + 1}/{num_runs} ===")

            # Create a fresh orchestrator instance for each run
            run_orchestrator = MaiDxOrchestrator(
                model_name=self.model_name,
                max_iterations=self.max_iterations,
                initial_budget=self.initial_budget,
                mode="no_budget",  # Use no_budget for ensemble runs
                physician_visit_cost=self.physician_visit_cost,
                enable_budget_tracking=False
            )

            # Run the diagnostic session
            result = run_orchestrator.run(initial_case_info, full_case_details, ground_truth_diagnosis)
            ensemble_results.append(result)
            total_cost += result.total_cost

            logger.info(f"Run {run_id + 1} completed: {result.final_diagnosis} (Score: {result.accuracy_score})")

        # Aggregate results using consensus
        final_diagnosis = self._aggregate_ensemble_diagnoses([r.final_diagnosis for r in ensemble_results])

        # Judge the aggregated diagnosis
        judgement = self._judge_diagnosis(final_diagnosis, ground_truth_diagnosis)

        # Calculate average metrics
        avg_iterations = sum(r.iterations for r in ensemble_results) / len(ensemble_results)

        # Combine conversation histories
        combined_history = "\n\n=== ENSEMBLE RESULTS ===\n"
        for i, result in enumerate(ensemble_results):
            combined_history += f"\n--- Run {i+1} ---\n"
            combined_history += f"Diagnosis: {result.final_diagnosis}\n"
            combined_history += f"Score: {result.accuracy_score}\n"
            combined_history += f"Cost: ${result.total_cost:,}\n"
            combined_history += f"Iterations: {result.iterations}\n"

        combined_history += "\n--- Aggregated Result ---\n"
        combined_history += f"Final Diagnosis: {final_diagnosis}\n"
        combined_history += f"Reasoning: {judgement['reasoning']}\n"

        ensemble_result = DiagnosisResult(
            final_diagnosis=final_diagnosis,
            ground_truth=ground_truth_diagnosis,
            accuracy_score=judgement["score"],
            accuracy_reasoning=judgement["reasoning"],
            total_cost=total_cost,  # Sum of all runs
            iterations=int(avg_iterations),
            conversation_history=combined_history
        )

        logger.info(f"Ensemble completed: {final_diagnosis} (Score: {judgement['score']})")
        return ensemble_result

    def _aggregate_ensemble_diagnoses(self, diagnoses: List[str]) -> str:
        """Aggregates multiple diagnoses from ensemble runs."""
        # Simple majority voting or use the most confident diagnosis
        if not diagnoses:
            return "No diagnosis available"

        # Remove any empty or invalid diagnoses
        valid_diagnoses = [d for d in diagnoses if d and d.strip() and "not reached" not in d.lower()]

        if not valid_diagnoses:
            return diagnoses[0] if diagnoses else "No valid diagnosis"

        # If all diagnoses are the same, return that
        if len(set(valid_diagnoses)) == 1:
            return valid_diagnoses[0]

        # Use an aggregator agent to select the best diagnosis
        try:
            aggregator_prompt = f"""
            You are a medical consensus aggregator. Given multiple diagnostic assessments from independent medical panels,
            select the most accurate and complete diagnosis.

            Diagnoses to consider:
            {chr(10).join(f"{i+1}. {d}" for i, d in enumerate(valid_diagnoses))}

            Provide the single best diagnosis that represents the medical consensus.
            Consider clinical accuracy, specificity, and completeness.
            """

            aggregator = Agent(
                agent_name="Ensemble Aggregator",
                system_prompt=aggregator_prompt,
                model_name=self.model_name,
                max_loops=1,
                print_on=True  # Enable printing for aggregator agent
            )

            return aggregator.run(aggregator_prompt).strip()

        except Exception as e:
            logger.error(f"Error in ensemble aggregation: {e}")
            # Fallback to most common diagnosis
            from collections import Counter
            return Counter(valid_diagnoses).most_common(1)[0][0]

    @classmethod
    def create_variant(cls, variant: str, **kwargs) -> 'MaiDxOrchestrator':
        """
        Factory method to create different MAI-DxO variants as described in the paper.

        Args:
            variant (str): One of 'instant', 'question_only', 'budgeted', 'no_budget', 'ensemble'
            **kwargs: Additional parameters for the orchestrator

        Returns:
            MaiDxOrchestrator: Configured orchestrator instance
        """
        variant_configs = {
            "instant": {
                "mode": "instant",
                "max_iterations": 1,
                "enable_budget_tracking": False
            },
            "question_only": {
                "mode": "question_only",
                "max_iterations": 10,
                "enable_budget_tracking": False
            },
            "budgeted": {
                "mode": "budgeted",
                "max_iterations": 10,
                "enable_budget_tracking": True,
                "initial_budget": kwargs.get("budget", 5000)
            },
            "no_budget": {
                "mode": "no_budget",
                "max_iterations": 10,
                "enable_budget_tracking": False
            },
            "ensemble": {
                "mode": "no_budget",
                "max_iterations": 10,
                "enable_budget_tracking": False
            }
        }

        if variant not in variant_configs:
            raise ValueError(f"Unknown variant: {variant}. Choose from: {list(variant_configs.keys())}")

        config = variant_configs[variant]
        config.update(kwargs)  # Allow overrides
        # 'budget' is only an alias for 'initial_budget' (mapped above); drop it so it
        # is not forwarded to the constructor as an unexpected keyword argument.
        config.pop("budget", None)

        return cls(**config)


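The factory above merges per-variant defaults with caller overrides and treats `budget` as an alias for `initial_budget`. The following is a minimal, self-contained sketch of that pattern. It assumes a hypothetical `Orchestrator` dataclass as a stand-in for `MaiDxOrchestrator` (whose constructor is not shown in this part of the diff), and the default values in it are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Orchestrator:
    """Hypothetical stand-in; not the real MaiDxOrchestrator constructor."""
    mode: str = "no_budget"
    max_iterations: int = 10
    initial_budget: int = 10000
    enable_budget_tracking: bool = False


# Per-variant defaults (illustrative subset of the variants in the diff).
VARIANT_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "budgeted": {"mode": "budgeted", "enable_budget_tracking": True},
    "question_only": {"mode": "question_only"},
    "no_budget": {"mode": "no_budget"},
}


def create_variant(variant: str, **overrides: Any) -> Orchestrator:
    if variant not in VARIANT_DEFAULTS:
        raise ValueError(f"Unknown variant: {variant}. Choose from: {list(VARIANT_DEFAULTS)}")

    # Copy so the shared defaults table is never mutated by a call.
    config = dict(VARIANT_DEFAULTS[variant])

    # Translate the 'budget' alias before merging, so it never reaches the constructor.
    budget = overrides.pop("budget", None)
    if budget is not None:
        config["initial_budget"] = budget

    config.update(overrides)  # remaining keys override the defaults
    return Orchestrator(**config)
```

Popping the alias before the merge keeps the constructor call limited to parameters it actually declares, and copying the defaults keeps repeated calls independent of each other.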
def run_mai_dxo_demo(case_info: str = None, case_details: str = None, ground_truth: str = None) -> Dict[str, DiagnosisResult]:
    """
    Convenience function to run a quick demonstration of MAI-DxO variants.

    Args:
        case_info (str): Initial case information. Uses default if None.
        case_details (str): Full case details. Uses default if None.
        ground_truth (str): Ground truth diagnosis. Uses default if None.

    Returns:
        Dict[str, DiagnosisResult]: Results from different MAI-DxO variants
    """
    # Use default case if not provided
    if not case_info:
        case_info = (
            "A 29-year-old woman was admitted to the hospital because of sore throat and peritonsillar swelling "
            "and bleeding. Symptoms did not abate with antimicrobial therapy."
        )

    if not case_details:
        case_details = """
        Patient: 29-year-old female.
        History: Onset of sore throat 7 weeks prior to admission. Worsening right-sided pain and swelling.
        No fevers, headaches, or gastrointestinal symptoms. Past medical history is unremarkable.
        Physical Exam: Right peritonsillar mass, displacing the uvula. No other significant findings.
        Initial Labs: FBC, clotting studies normal.
        MRI Neck: Showed a large, enhancing mass in the right peritonsillar space.
        Biopsy (H&E): Infiltrative round-cell neoplasm with high nuclear-to-cytoplasmic ratio and frequent mitotic figures.
        Biopsy (Immunohistochemistry): Desmin and MyoD1 diffusely positive. Myogenin multifocally positive.
        Biopsy (FISH): No FOXO1 (13q14) rearrangements detected.
        Final Diagnosis from Pathology: Embryonal rhabdomyosarcoma of the pharynx.
        """

    if not ground_truth:
        ground_truth = "Embryonal rhabdomyosarcoma of the pharynx"

    results = {}

    # Test key variants
    variants = ["no_budget", "budgeted", "question_only"]

    for variant in variants:
        try:
            logger.info(f"Running MAI-DxO variant: {variant}")

            if variant == "budgeted":
                orchestrator = MaiDxOrchestrator.create_variant(variant, budget=3000, model_name="gemini/gemini-2.5-flash")
            else:
                orchestrator = MaiDxOrchestrator.create_variant(variant, model_name="gemini/gemini-2.5-flash")

            result = orchestrator.run(case_info, case_details, ground_truth)
            results[variant] = result

        except Exception as e:
            logger.error(f"Error running variant {variant}: {e}")
            results[variant] = None

    return results


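Because `run_mai_dxo_demo` stores `None` for any variant that raised, callers need to guard against missing results before reading fields such as `accuracy_score`. The sketch below shows one way to tabulate such a dictionary. `Result` is a hypothetical stand-in for `DiagnosisResult` carrying only the fields used here, and the 4.0 match threshold mirrors the summary printed by the script's `__main__` block.

```python
from typing import Dict, List, NamedTuple, Optional


class Result(NamedTuple):
    """Hypothetical stand-in for DiagnosisResult (subset of fields)."""
    final_diagnosis: str
    accuracy_score: float
    total_cost: int
    iterations: int


def summarize(results: Dict[str, Optional[Result]]) -> List[str]:
    """Build one summary row per variant, tolerating failed (None) runs."""
    rows = []
    for name, res in results.items():
        if res is None:
            # The variant raised inside the demo loop; there is nothing to score.
            rows.append(f"{name:<15} failed")
            continue
        status = "match" if res.accuracy_score >= 4.0 else "no match"
        rows.append(
            f"{name:<15} {status:<10} {res.accuracy_score:<6.1f} "
            f"${res.total_cost:,} {res.iterations}"
        )
    return rows


if __name__ == "__main__":
    demo = {
        "no_budget": Result("Embryonal rhabdomyosarcoma", 4.0, 1500, 3),
        "budgeted": None,
    }
    for row in summarize(demo):
        print(row)
```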
if __name__ == "__main__":
    # Example case inspired by the paper's Figure 1
    initial_info = (
        "A 29-year-old woman was admitted to the hospital because of sore throat and peritonsillar swelling "
        "and bleeding. Symptoms did not abate with antimicrobial therapy."
    )

    full_case = """
    Patient: 29-year-old female.
    History: Onset of sore throat 7 weeks prior to admission. Worsening right-sided pain and swelling.
    No fevers, headaches, or gastrointestinal symptoms. Past medical history is unremarkable. No history of smoking or significant alcohol use.
    Physical Exam: Right peritonsillar mass, displacing the uvula. No other significant findings.
    Initial Labs: FBC, clotting studies normal.
    MRI Neck: Showed a large, enhancing mass in the right peritonsillar space.
    Biopsy (H&E): Infiltrative round-cell neoplasm with high nuclear-to-cytoplasmic ratio and frequent mitotic figures.
    Biopsy (Immunohistochemistry for Carcinoma): CD31, D2-40, CD34, ERG, GLUT-1, pan-cytokeratin, CD45, CD20, CD3 all negative. Ki-67: 60% nuclear positivity.
    Biopsy (Immunohistochemistry for Rhabdomyosarcoma): Desmin and MyoD1 diffusely positive. Myogenin multifocally positive.
    Biopsy (FISH): No FOXO1 (13q14) rearrangements detected.
    Final Diagnosis from Pathology: Embryonal rhabdomyosarcoma of the pharynx.
    """

    ground_truth = "Embryonal rhabdomyosarcoma of the pharynx"

    # --- Demonstrate Different MAI-DxO Variants ---
    try:
        print("\n" + "="*80)
        print(" MAI DIAGNOSTIC ORCHESTRATOR (MAI-DxO) - SEQUENTIAL DIAGNOSIS BENCHMARK")
        print(" Implementation based on the 'Sequential Diagnosis with Language Models' paper")
        print("="*80)

        # Test different variants as described in the paper
        variants_to_test = [
            ("no_budget", "Standard MAI-DxO with no budget constraints"),
            ("budgeted", "Budget-constrained MAI-DxO ($3000 limit)"),
            ("question_only", "Question-only variant (no diagnostic tests)"),
        ]

        results = {}

        for variant_name, description in variants_to_test:
            print(f"\n{'='*60}")
            print(f"Testing Variant: {variant_name.upper()}")
            print(f"Description: {description}")
            print('='*60)

            # Create the variant
            if variant_name == "budgeted":
                orchestrator = MaiDxOrchestrator.create_variant(
                    variant_name,
                    budget=3000,
                    model_name="gemini/gemini-2.5-flash",
                    max_iterations=5
                )
            else:
                orchestrator = MaiDxOrchestrator.create_variant(
                    variant_name,
                    model_name="gemini/gemini-2.5-flash",
                    max_iterations=5
                )

            # Run the diagnostic process
            result = orchestrator.run(
                initial_case_info=initial_info,
                full_case_details=full_case,
                ground_truth_diagnosis=ground_truth
            )

            results[variant_name] = result

            # Display results
            print(f"\n🚀 Final Diagnosis: {result.final_diagnosis}")
            print(f"🎯 Ground Truth: {result.ground_truth}")
            print(f"⭐ Accuracy Score: {result.accuracy_score}/5.0")
            print(f" Reasoning: {result.accuracy_reasoning}")
            print(f"💰 Total Cost: ${result.total_cost:,}")
            print(f"🔄 Iterations: {result.iterations}")
            print(f"⏱️ Mode: {orchestrator.mode}")

        # Demonstrate ensemble approach
        print(f"\n{'='*60}")
        print("Testing Variant: ENSEMBLE")
        print("Description: Multiple independent runs with consensus aggregation")
        print('='*60)

        ensemble_orchestrator = MaiDxOrchestrator.create_variant(
            "ensemble",
            model_name="gemini/gemini-2.5-flash",
            max_iterations=3  # Shorter iterations for ensemble
        )

        ensemble_result = ensemble_orchestrator.run_ensemble(
            initial_case_info=initial_info,
            full_case_details=full_case,
            ground_truth_diagnosis=ground_truth,
            num_runs=2  # Reduced for demo
        )

        results["ensemble"] = ensemble_result

        print(f"\n🚀 Ensemble Diagnosis: {ensemble_result.final_diagnosis}")
        print(f"🎯 Ground Truth: {ensemble_result.ground_truth}")
        print(f"⭐ Ensemble Score: {ensemble_result.accuracy_score}/5.0")
        print(f"💰 Total Ensemble Cost: ${ensemble_result.total_cost:,}")

        # --- Summary Comparison ---
        print(f"\n{'='*80}")
        print(" RESULTS SUMMARY")
        print('='*80)
        print(f"{'Variant':<15} {'Diagnosis Match':<15} {'Score':<8} {'Cost':<12} {'Iterations':<12}")
        print('-'*80)

        for variant_name, result in results.items():
            match_status = "✓ Match" if result.accuracy_score >= 4.0 else "✗ No Match"
            print(f"{variant_name:<15} {match_status:<15} {result.accuracy_score:<8.1f} ${result.total_cost:<11,} {result.iterations:<12}")

        print(f"\n{'='*80}")
        print("Implementation successfully demonstrates the MAI-DxO framework")
        print("as described in 'Sequential Diagnosis with Language Models' paper")
        print('='*80)

    except Exception as e:
        logger.exception(f"An error occurred during the diagnostic session: {e}")
        print(f"\n❌ Error occurred: {e}")
        print("Please check your model configuration and API keys.")
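The comment in `_judge_diagnosis` notes that its string splitting is only a demonstration. As one possible hardening step short of full structured output, the sketch below parses the same `Score:` / `Justification:` layout with regular expressions. The function name `parse_judge_response`, the clamping to the 0 to 5 range, and the handling of a `.content` attribute are assumptions made for illustration; they are not part of the code in this diff.

```python
import re
from typing import Any, Dict

# "Score: 4/5", "score: 4.5 / 5" and "Score: 3" are all accepted.
_SCORE_RE = re.compile(r"Score:\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)
_JUSTIFICATION_RE = re.compile(r"Justification:\s*(.+)", re.IGNORECASE | re.DOTALL)


def parse_judge_response(response: Any) -> Dict[str, Any]:
    """Extract a numeric score and free-text reasoning from a judge reply."""
    # Accept a plain string or an object exposing the text as `.content` (assumed shape).
    if isinstance(response, str):
        text = response
    else:
        text = str(getattr(response, "content", response))

    score_match = _SCORE_RE.search(text)
    if score_match is None:
        # Same fallback values as the original method.
        return {"score": 0.0, "reasoning": "Could not parse judge's response."}

    # Clamp to the 0-5 range implied by the "/5.0" shown in the log output.
    score = min(max(float(score_match.group(1)), 0.0), 5.0)

    just_match = _JUSTIFICATION_RE.search(text)
    reasoning = just_match.group(1).strip() if just_match else ""
    return {"score": score, "reasoning": reasoning}
```

Unlike the original parsing, this version keeps a score it can read even when the justification is missing, and returns an empty reasoning string in that case.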