# OnCall.ai - Medical Emergency Assistant

A RAG-based medical assistant system that provides evidence-based clinical guidance for emergency medical situations using real medical guidelines and advanced language models.

## 🎯 Project Overview

OnCall.ai helps healthcare professionals by:
- Processing medical queries through multi-level validation
- Retrieving relevant medical guidelines from curated datasets
- Generating evidence-based clinical advice using specialized medical LLMs
- Providing transparent, traceable medical guidance

## ✅ Current Implementation Status

### **🎉 COMPLETED MODULES (2025-07-31)**

#### **1. Multi-Level Query Processing System**
- ✅ **UserPromptProcessor** (`src/user_prompt.py`)
  - Level 1: Predefined medical condition mapping (instant response)
  - Level 2: LLM-based condition extraction (Llama3-Med42-70B)
  - Level 3: Semantic search fallback
  - Level 4: Medical query validation (100% non-medical rejection)
  - Level 5: Generic medical search for rare conditions

#### **2. Dual-Index Retrieval System**
- ✅ **BasicRetrievalSystem** (`src/retrieval.py`)
  - Emergency medical guidelines index (emergency.ann)
  - Treatment protocols index (treatment.ann)
  - Vector-based similarity search using PubMedBERT embeddings
  - Intelligent deduplication and result ranking
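
The merge step of a dual-index search can be sketched as follows. This is a minimal illustration, not the code in `src/retrieval.py`: the function names and the dict-based "indices" are hypothetical, and a brute-force scan stands in for the ANNOY lookup. The distance is the Euclidean distance between normalized vectors, which is the quantity Annoy documents for its angular metric.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def angular_distance(u: Vector, v: Vector) -> float:
    """Euclidean distance between the normalized vectors: sqrt(2 * (1 - cos))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = dot / norm
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def search(index: Dict[str, Vector], query: Vector, top_k: int) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours; a stand-in for an ANNOY index lookup."""
    scored = [(text, angular_distance(vec, query)) for text, vec in index.items()]
    return sorted(scored, key=lambda item: item[1])[:top_k]


def retrieve(emergency: Dict[str, Vector], treatment: Dict[str, Vector],
             query: Vector, top_k: int = 5) -> List[dict]:
    """Query both indices, drop duplicate chunks, and rank by distance."""
    best: Dict[str, dict] = {}
    for source, index in (("emergency", emergency), ("treatment", treatment)):
        for text, dist in search(index, query, top_k):
            # Keep only the closest copy of a chunk that appears in both indices.
            if text not in best or dist < best[text]["distance"]:
                best[text] = {"text": text, "source": source, "distance": dist}
    return sorted(best.values(), key=lambda r: r["distance"])
```

Deduplicating on chunk text before ranking keeps a guideline that appears in both subsets from taking two slots in the final result list.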

#### **3. Medical Knowledge Base**
- ✅ **MedicalConditions** (`src/medical_conditions.py`)
  - Predefined condition-keyword mappings
  - Medical terminology validation
  - Extensible condition database
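
As an illustration of a predefined condition-keyword mapping, a minimal sketch might look like this. The `CONDITION_KEYWORDS` name, its entries, and the `match_condition` helper are hypothetical placeholders and are not taken from `src/medical_conditions.py`.

```python
from typing import Optional

# Hypothetical mapping: condition name -> search keywords for each index.
CONDITION_KEYWORDS = {
    "acute myocardial infarction": {
        "emergency": "MI|chest pain|cardiac arrest",
        "treatment": "aspirin|reperfusion",
    },
    "acute stroke": {
        "emergency": "stroke|neurological deficit",
        "treatment": "thrombolysis",
    },
}


def match_condition(query: str) -> Optional[dict]:
    """Return the keyword entry for the first known condition named in the query."""
    normalized = query.lower()
    for condition, keywords in CONDITION_KEYWORDS.items():
        if condition in normalized:
            return {"condition": condition, **keywords}
    return None  # No predefined match: the caller falls through to the next level.
```

Because the lookup is a plain substring test over a small dictionary, it returns effectively instantly, and extending coverage means adding one more entry.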

#### **4. LLM Integration**
- ✅ **Med42-70B Client** (`src/llm_clients.py`)
  - Specialized medical language model integration
  - Dual-layer rejection detection for non-medical queries
  - Robust error handling and timeout management
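
One way a two-layer rejection check could work is sketched below. The README does not say what the two layers are, so everything here is an assumption: the sentinel string, the phrase patterns, and the `is_rejection` name are invented for illustration.

```python
import re

# Assumption: the prompt asks the model to emit this sentinel for non-medical input.
REJECTION_SENTINEL = "NON_MEDICAL_QUERY"

# Assumption: phrases suggesting the model declined in free text instead.
REJECTION_PATTERNS = [
    r"\bnot (a )?medical\b",
    r"\bnon-medical\b",
    r"\boutside (the|my) medical (scope|domain)\b",
]


def is_rejection(response: str) -> bool:
    """Layer 1: exact sentinel. Layer 2: case-insensitive phrase patterns."""
    text = response.strip()
    if REJECTION_SENTINEL in text:
        return True
    return any(re.search(p, text, flags=re.IGNORECASE) for p in REJECTION_PATTERNS)
```

The second layer exists because a language model does not always return the exact sentinel it was asked for.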

#### **5. Medical Advice Generation**
- ✅ **MedicalAdviceGenerator** (`src/generation.py`)
  - RAG-based prompt construction
  - Intention-aware chunk selection (treatment/diagnosis)
  - Confidence scoring and response formatting
  - Integration with Med42-70B for clinical advice generation
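
A rough sketch of intention-aware chunk selection feeding a RAG prompt is shown below. It assumes each retrieved chunk is a dict with `text`, `source`, and `distance` keys, and that a `treatment` intention favors treatment-index chunks while any other intention favors emergency-index chunks. The function names, the prompt wording, and that preference rule are illustrative assumptions, not the behavior of `src/generation.py`.

```python
from typing import Dict, List


def select_chunks(chunks: List[Dict], intention: str, limit: int = 6) -> List[Dict]:
    """Put chunks from the preferred index first, then order by distance."""
    preferred = "treatment" if intention == "treatment" else "emergency"
    ranked = sorted(chunks, key=lambda c: (c["source"] != preferred, c["distance"]))
    return ranked[:limit]


def build_prompt(query: str, chunks: List[Dict], intention: str) -> str:
    """Assemble a prompt that cites each selected guideline chunk by number."""
    selected = select_chunks(chunks, intention)
    context = "\n\n".join(
        f"[Guideline {i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(selected, start=1)
    )
    return (
        "You are assisting a clinician. Answer using only the guidelines below.\n\n"
        f"Guidelines:\n{context}\n\n"
        f"Question: {query}\n"
        f"Focus: {intention}\n"
    )
```

Numbering the chunks in the prompt gives the model something concrete to cite, which supports traceable guideline references in the generated advice.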

#### **6. Data Processing Pipeline**
- ✅ **Processed Medical Guidelines** (`src/data_processing.py`)
  - ~4000 medical guidelines from EPFL-LLM dataset
  - Emergency subset: ~2000-2500 records
  - Treatment subset: ~2000-2500 records
  - PubMedBERT embeddings (768 dimensions)
  - ANNOY vector indices for fast retrieval

## 📊 **System Performance (Validated)**

### **Test Results Summary**
```
🎯 Multi-Level Fallback Validation: 69.2% success rate
   - Level 1 (Predefined): 100% success (instant response)
   - Level 4a (Non-medical rejection): 100% success
   - Level 4b→5 (Rare medical): 100% success

📈 End-to-End Pipeline: 100% technical completion
   - Condition extraction: 2.6s average
   - Medical guideline retrieval: 0.3s average
   - Total pipeline: 15.5s average (including generation)
```

### **Quality Metrics**
```
🔍 Retrieval Performance:
   - Guidelines retrieved: 8-9 per query
   - Relevance scores: 0.245-0.326 (good for medical domain)
   - Emergency/Treatment balance: Correctly maintained

🧠 Generation Quality:
   - Confidence scores: 0.90 for successful generations
   - Evidence-based responses with specific guideline references
   - Appropriate medical caution and clinical judgment emphasis
```

## 🛠️ **Technical Architecture**

### **Data Flow**
```
User Query → Level 1: Predefined Mapping
     ↓ (if fails)
Level 2: LLM Extraction
     ↓ (if fails)
Level 3: Semantic Search
     ↓ (if fails)
Level 4: Medical Validation
     ↓ (if medical; non-medical queries are rejected here)
Level 5: Generic Search
     ↓ (if fails)
No Match Found
```
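
The chain above can be sketched as an ordered list of handlers, each returning a result or `None`. This is a simplified illustration, not the logic in `src/user_prompt.py`: `run_fallback`, `NonMedicalQuery`, and the stub handlers are hypothetical names. The validation level stops the chain with a rejection, consistent with the Level 4 behavior described under Fallback Mechanism.

```python
from typing import Callable, List, Optional, Tuple

Handler = Callable[[str], Optional[dict]]


class NonMedicalQuery(Exception):
    """Raised by the validation level to stop the chain with a rejection."""


def run_fallback(query: str, levels: List[Tuple[str, Handler]]) -> dict:
    """Try each level in order and return the first non-empty result."""
    for name, handler in levels:
        try:
            result = handler(query)
        except NonMedicalQuery:
            return {"level": name, "status": "rejected"}
        if result is not None:
            return {"level": name, "status": "matched", **result}
    return {"level": None, "status": "no_match"}


# Hypothetical stand-ins for three of the five levels.
def predefined(query: str) -> Optional[dict]:
    return {"condition": "acute stroke"} if "stroke" in query.lower() else None


def validate(query: str) -> Optional[dict]:
    if "recipe" in query.lower():
        raise NonMedicalQuery()
    return None  # Medical query: let the next level handle it.


def generic(query: str) -> Optional[dict]:
    return {"condition": "generic medical query"}


LEVELS = [("predefined", predefined), ("validation", validate), ("generic", generic)]
```

With these stubs, `run_fallback("stroke signs", LEVELS)` stops at the first level, while a query containing "recipe" is rejected at validation and never reaches the generic search.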

### **Core Technologies**
- **Embeddings**: NeuML/pubmedbert-base-embeddings (768D)
- **Vector Search**: ANNOY indices with angular distance
- **LLM**: m42-health/Llama3-Med42-70B (medical specialist)
- **Dataset**: EPFL-LLM medical guidelines (~4000 documents)

### **Fallback Mechanism**
```
Level 1: Predefined Mapping (0.001s) → Success: Direct return
Level 2: LLM Extraction (8-15s) → Success: Condition mapping
Level 3: Semantic Search (1-2s) → Success: Sliding window chunks
Level 4: Medical Validation (8-10s) → Fail: Return rejection
Level 5: Generic Search (1s) → Final: General medical guidance
```

## 🚀 **NEXT PHASE: Interactive Interface**

### **🎯 Immediate Goals (Next 1-2 Days)**

#### **Phase 1: Gradio Interface Development**
- [ ] **Create `app.py`** - Interactive web interface
  - [ ] Complete pipeline integration
  - [ ] Multi-output display (advice + guidelines + technical details)
  - [ ] Environment-controlled debug mode
  - [ ] User-friendly error handling

#### **Phase 2: Local Validation Testing**
- [ ] **Manual testing** with 20-30 realistic medical queries
  - [ ] Emergency scenarios (cardiac arrest, stroke, MI)
  - [ ] Diagnostic queries (chest pain, respiratory distress)
  - [ ] Treatment protocols (medication management, procedures)
  - [ ] Edge cases (rare conditions, complex symptoms)

#### **Phase 3: HuggingFace Spaces Deployment**
- [ ] **Create requirements.txt** for deployment
- [ ] **Deploy to HF Spaces** for public testing
- [ ] **Production mode configuration** (limited technical details)
- [ ] **Performance monitoring** and user feedback collection

### **🔮 Future Enhancements (Next 1-2 Weeks)**

#### **Audio Input Integration**
- [ ] **Whisper ASR integration** for voice queries
- [ ] **Audio preprocessing** and quality validation
- [ ] **Multi-modal interface** (text + audio input)

#### **Evaluation & Metrics**
- [ ] **Faithfulness scoring** implementation
- [ ] **Automated evaluation pipeline** 
- [ ] **Clinical validation** with medical professionals
- [ ] **Performance benchmarking** against target metrics

#### **Dataset Expansion (Future)**
- [ ] **Dataset B integration** (symptom/diagnosis subsets)
- [ ] **Multi-dataset RAG** architecture
- [ ] **Enhanced medical knowledge** coverage

## 📋 **Target Performance Metrics**

### **Response Quality**
- [ ] Physician satisfaction: ≥ 4/5
- [ ] RAG content coverage: ≥ 80%
- [ ] Retrieval precision (P@5): ≥ 0.7
- [ ] Medical advice faithfulness: ≥ 0.8
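
For reference, P@5 is the fraction of the top five retrieved guidelines that are judged relevant. A minimal sketch of the computation, assuming relevance judgments are available as a set of document IDs (the function name and ID format are illustrative):

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved)[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k
```

Under this definition, meeting the 0.7 target requires at least four relevant guidelines among the top five, since three of five gives only 0.6.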

### **System Performance**
- [ ] Total response latency: ≤ 30 seconds
- [ ] Condition extraction: ≤ 5 seconds
- [ ] Guideline retrieval: ≤ 2 seconds
- [ ] Medical advice generation: ≤ 25 seconds

### **User Experience**
- [ ] Non-medical query rejection: 100%
- [ ] System availability: ≥ 99%
- [ ] Error handling: Graceful degradation
- [ ] Interface responsiveness: Immediate feedback

## 🏗️ **Project Structure**
```
OnCall.ai/
├── src/                          # Core modules (✅ Complete)
│   ├── user_prompt.py           # Multi-level query processing
│   ├── retrieval.py             # Dual-index vector search
│   ├── generation.py            # RAG-based advice generation
│   ├── llm_clients.py           # Med42-70B integration
│   ├── medical_conditions.py    # Medical knowledge configuration
│   └── data_processing.py       # Dataset preprocessing
├── models/                       # Pre-processed data (✅ Complete)
│   ├── embeddings/              # Vector embeddings and chunks
│   └── indices/                 # ANNOY vector indices
├── tests/                        # Validation tests (✅ Complete)
│   ├── test_multilevel_fallback_validation.py
│   ├── test_end_to_end_pipeline.py
│   └── test_userinput_userprompt_medical_*.py
├── docs/                         # Documentation and planning
│   ├── next/                    # Current implementation docs
│   └── next_gradio_evaluation/  # Interface planning
├── app.py                        # 🎯 NEXT: Gradio interface
├── requirements.txt              # 🎯 NEXT: Deployment dependencies
└── README.md                     # This file
```

## 🧪 **Testing Validation**

### **Completed Tests**
- ✅ **Multi-level fallback validation**: 13 test cases, 69.2% success (9 of 13 passed)
- ✅ **End-to-end pipeline testing**: 6 scenarios, 100% technical completion
- ✅ **Component integration**: All modules working together
- ✅ **Error handling**: Graceful degradation and user-friendly messages

### **Key Findings**
- **Predefined mapping**: Instant response for known conditions
- **LLM extraction**: Reliable for complex symptom descriptions  
- **Non-medical rejection**: Perfect accuracy with updated prompt engineering
- **Retrieval quality**: High-relevance medical guidelines (0.2-0.4 relevance scores)
- **Generation capability**: Evidence-based advice with proper medical caution

## 🤝 **Contributing & Development**

### **Environment Setup**
```bash
# Clone repository
git clone [repository-url]
cd OnCall.ai

# Setup virtual environment
python -m venv genAIvenv
source genAIvenv/bin/activate  # On Windows: genAIvenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run tests
python tests/test_end_to_end_pipeline.py

# Start Gradio interface (coming soon)
python app.py
```

### **API Configuration**
```bash
# Set up HuggingFace token for LLM access
export HF_TOKEN=your_huggingface_token

# Enable debug mode for development
export ONCALL_DEBUG=true
```
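
On the Python side, these two variables might be read as follows. This is a sketch under assumptions: the `load_config` name, the returned dict shape, and the set of values treated as "true" are illustrative, and only the variable names `HF_TOKEN` and `ONCALL_DEBUG` come from this README.

```python
import os
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read HF_TOKEN and ONCALL_DEBUG from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; LLM access requires it")
    # Assumption: "1", "true", and "yes" (any case) all enable debug mode.
    debug = env.get("ONCALL_DEBUG", "false").strip().lower() in {"1", "true", "yes"}
    return {"hf_token": token, "debug": debug}
```

Failing fast on a missing token surfaces a configuration problem at startup instead of at the first LLM call.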

## ⚠️ **Important Notes**

### **Medical Disclaimer**
This system is designed for **research and educational purposes only**. It should not replace professional medical consultation, diagnosis, or treatment. Always consult qualified healthcare providers for medical decisions.

### **Current Limitations**
- **API Dependencies**: Requires HuggingFace API access for LLM functionality
- **Dataset Scope**: Currently focused on emergency and treatment guidelines
- **Language Support**: English medical terminology only
- **Validation Stage**: System under active development and testing

## 📞 **Contact & Support**

**Development Team**: OnCall.ai Team  
**Last Updated**: 2025-07-31  
**Version**: 0.9.0 (Pre-release)  
**Status**: 🚧 Ready for Interactive Testing Phase

---

*Built with ❤️ for healthcare professionals*