# OnCall.ai - Medical Emergency Assistant
A RAG-based medical assistant system that provides evidence-based clinical guidance for emergency medical situations using real medical guidelines and advanced language models.
## 🎯 Project Overview
OnCall.ai helps healthcare professionals by:
- Processing medical queries through multi-level validation
- Retrieving relevant medical guidelines from curated datasets
- Generating evidence-based clinical advice using specialized medical LLMs
- Providing transparent, traceable medical guidance
## ✅ Current Implementation Status
### **COMPLETED MODULES (2025-07-31)**
#### **1. Multi-Level Query Processing System**
- ✅ **UserPromptProcessor** (`src/user_prompt.py`)
  - Level 1: Predefined medical condition mapping (instant response)
  - Level 2: LLM-based condition extraction (Llama3-Med42-70B)
  - Level 3: Semantic search fallback
  - Level 4: Medical query validation (100% non-medical rejection)
  - Level 5: Generic medical search for rare conditions
#### **2. Dual-Index Retrieval System**
- ✅ **BasicRetrievalSystem** (`src/retrieval.py`)
  - Emergency medical guidelines index (`emergency.ann`)
  - Treatment protocols index (`treatment.ann`)
  - Vector-based similarity search using PubMedBERT embeddings
  - Intelligent deduplication and result ranking
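To illustrate the merge step of a dual-index search, here is a minimal sketch in plain Python. It assumes each index returns `(chunk_id, text, distance)` tuples and that both indices were built with the same embedding model and metric, so their distances are directly comparable. `merge_dual_index_results`, `RankedChunk`, the text-based dedup key, and the `top_k=8` default are hypothetical and need not match `src/retrieval.py`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One raw hit from an index: (chunk_id, chunk_text, distance).
# Lower distance means closer, which is how ANNOY reports angular distances.
Hit = Tuple[str, str, float]


@dataclass
class RankedChunk:
    chunk_id: str
    text: str
    distance: float
    source: str  # "emergency" or "treatment"


def merge_dual_index_results(
    emergency_hits: List[Hit], treatment_hits: List[Hit], top_k: int = 8
) -> List[RankedChunk]:
    """Pool hits from both indices, keep the closest copy of duplicate texts, rank by distance."""
    best: Dict[str, RankedChunk] = {}
    for source, hits in (("emergency", emergency_hits), ("treatment", treatment_hits)):
        for chunk_id, text, distance in hits:
            key = " ".join(text.lower().split())  # normalize case and whitespace for dedup
            kept = best.get(key)
            if kept is None or distance < kept.distance:
                best[key] = RankedChunk(chunk_id, text, distance, source)
    return sorted(best.values(), key=lambda c: c.distance)[:top_k]
```

Deduplicating on normalized text is the simplest choice; if chunks overlap (for example sliding windows), a chunk-ID or character-offset key may be more appropriate.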
#### **3. Medical Knowledge Base**
- ✅ **MedicalConditions** (`src/medical_conditions.py`)
  - Predefined condition-keyword mappings
  - Medical terminology validation
  - Extensible condition database
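Level 1's predefined mapping amounts to a keyword lookup against a table like this one. A sketch, assuming a dictionary from condition name to trigger phrases; the entries and the `match_predefined_condition` name are illustrative, not the contents of `src/medical_conditions.py`.

```python
import re
from typing import Dict, List, Optional

# Hypothetical excerpt of a condition-to-keyword table.
CONDITION_KEYWORDS: Dict[str, List[str]] = {
    "acute myocardial infarction": ["mi", "heart attack", "stemi"],
    "acute stroke": ["stroke", "facial droop", "hemiparesis"],
    "cardiac arrest": ["cardiac arrest", "pulseless", "no pulse"],
}


def match_predefined_condition(query: str) -> Optional[str]:
    """Return the first condition whose name or keyword appears in the query."""
    text = query.lower()
    for condition, keywords in CONDITION_KEYWORDS.items():
        for term in [condition, *keywords]:
            # Word boundaries stop short keywords such as "mi" matching inside "vomiting".
            if re.search(rf"\b{re.escape(term)}\b", text):
                return condition
    return None  # no match: the query falls through to Level 2 (LLM extraction)
```

Because the lookup is a dictionary scan with no model call, it explains why Level 1 responds almost instantly.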
#### **4. LLM Integration**
- ✅ **Med42-70B Client** (`src/llm_clients.py`)
  - Specialized medical language model integration
  - Dual-layer rejection detection for non-medical queries
  - Robust error handling and timeout management
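Timeout handling for a slow hosted 70B model can be wrapped around any blocking client call. A sketch, assuming the client exposes a function that takes a prompt and returns text; `call_with_timeout`, the 30-second default, and the single retry are hypothetical choices, not the project's actual settings.

```python
import concurrent.futures
import logging
from typing import Callable, Optional

logger = logging.getLogger("oncall.llm")


def call_with_timeout(
    llm_call: Callable[[str], str],
    prompt: str,
    timeout_s: float = 30.0,
    retries: int = 1,
) -> Optional[str]:
    """Run a blocking LLM call with a per-attempt timeout; return None if every attempt fails."""
    for attempt in range(1, retries + 2):
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            future = executor.submit(llm_call, prompt)
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            logger.warning("LLM call timed out after %.1fs (attempt %d)", timeout_s, attempt)
        except Exception as exc:  # network or API errors raised by the client
            logger.warning("LLM call failed on attempt %d: %s", attempt, exc)
        finally:
            # Do not block on a hung call; the worker thread itself cannot be killed.
            executor.shutdown(wait=False)
    return None  # the caller turns this into a user-friendly error message
```

Python cannot cancel a thread that is already running, so a timed-out request keeps running in the background. If the HTTP client accepts its own timeout, setting it there as well is the cleaner option.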
#### **5. Medical Advice Generation**
- ✅ **MedicalAdviceGenerator** (`src/generation.py`)
  - RAG-based prompt construction
  - Intention-aware chunk selection (treatment/diagnosis)
  - Confidence scoring and response formatting
  - Integration with Med42-70B for clinical advice generation
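Intention-aware selection and prompt construction can be sketched as two steps: classify the query, then order the retrieved chunks so the matching index comes first. The cue lists, the choice to favor the emergency index for diagnostic questions, and the prompt wording are all assumptions for illustration, not the logic in `src/generation.py`.

```python
from typing import Dict, List

# Hypothetical cue words for a simple substring-based intention classifier.
TREATMENT_CUES = ("treat", "management", "dose", "medication", "therapy")
DIAGNOSIS_CUES = ("diagnos", "differential", "workup", "symptom")


def detect_intention(query: str) -> str:
    """Classify a query as 'treatment', 'diagnosis', or 'general' by substring cues."""
    q = query.lower()
    if any(cue in q for cue in TREATMENT_CUES):
        return "treatment"
    if any(cue in q for cue in DIAGNOSIS_CUES):
        return "diagnosis"
    return "general"


def build_rag_prompt(query: str, chunks: List[Dict[str, str]], max_chunks: int = 6) -> str:
    """Order chunks by the index that matches the intention, then format a grounded prompt."""
    intention = detect_intention(query)
    preferred = "treatment" if intention == "treatment" else "emergency"
    # sorted() is stable: preferred-source chunks move first, retrieval order is kept otherwise.
    ordered = sorted(chunks, key=lambda c: c["source"] != preferred)
    context = "\n\n".join(
        f"[Guideline {i}] {c['text']}" for i, c in enumerate(ordered[:max_chunks], start=1)
    )
    return (
        "Answer the clinical question using only the numbered guidelines below, "
        "cite them by number, and state when the evidence is insufficient.\n\n"
        f"{context}\n\nQuestion ({intention}): {query}\nAnswer:"
    )
```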
#### **6. Data Processing Pipeline**
- ✅ **Processed Medical Guidelines** (`src/data_processing.py`)
  - ~4000 medical guidelines from the EPFL-LLM dataset
  - Emergency subset: ~2000-2500 records
  - Treatment subset: ~2000-2500 records
  - PubMedBERT embeddings (768 dimensions)
  - ANNOY vector indices for fast retrieval
## **System Performance (Validated)**
### **Test Results Summary**
```
🎯 Multi-Level Fallback Validation: 69.2% success rate
  - Level 1 (Predefined): 100% success (instant response)
  - Level 4a (Non-medical rejection): 100% success
  - Level 4b→5 (Rare medical): 100% success

End-to-End Pipeline: 100% technical completion
  - Condition extraction: 2.6s average
  - Medical guideline retrieval: 0.3s average
  - Total pipeline: 15.5s average (including generation)
```
### **Quality Metrics**
```
Retrieval Performance:
  - Guidelines retrieved: 8-9 per query
  - Relevance scores: 0.245-0.326 (good for medical domain)
  - Emergency/Treatment balance: Correctly maintained

Generation Quality:
  - Confidence scores: 0.90 for successful generations
  - Evidence-based responses with specific guideline references
  - Appropriate medical caution and clinical judgment emphasis
```
## 🛠️ **Technical Architecture**
### **Data Flow**
```
User Query → Level 1: Predefined Mapping
               ↓ (if fails)
             Level 2: LLM Extraction
               ↓ (if fails)
             Level 3: Semantic Search
               ↓ (if fails)
             Level 4: Medical Validation
               ↓ (if fails)
             Level 5: Generic Search
               ↓ (if fails)
             No Match Found
```
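The cascade in the data flow above is essentially a chain of responsibility. A minimal sketch, assuming each level is a function that returns a result dict or `None` to defer to the next level; `run_fallback_chain`, the level names, and the stub handlers are hypothetical and do not mirror `src/user_prompt.py`.

```python
from typing import Callable, List, Optional, Tuple

# A level receives the raw query and returns a result dict, or None to defer
# to the next level (the "if fails" arrows in the diagram).
Handler = Callable[[str], Optional[dict]]


def run_fallback_chain(query: str, levels: List[Tuple[str, Handler]]) -> dict:
    """Try each level in order and return the first non-None result."""
    for name, handler in levels:
        result = handler(query)
        if result is not None:
            return {"level": name, **result}
    return {"level": "none", "message": "No match found"}


# Stub handlers for illustration only.
LEVELS: List[Tuple[str, Handler]] = [
    ("predefined", lambda q: {"condition": "acute stroke"} if "stroke" in q.lower() else None),
    ("llm_extraction", lambda q: None),   # would call Med42-70B
    ("semantic_search", lambda q: None),  # would query the vector indices
    # Level 4 ends the chain with a rejection for non-medical queries.
    ("medical_validation", lambda q: {"rejected": True} if "recipe" in q.lower() else None),
    ("generic_search", lambda q: {"condition": None, "mode": "generic"}),
]

print(run_fallback_chain("Signs of stroke in the elderly?", LEVELS)["level"])  # predefined
```

Keeping the levels in an ordered list makes it straightforward to reorder them, time each one, or disable an expensive level during testing.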
### **Core Technologies**
- **Embeddings**: NeuML/pubmedbert-base-embeddings (768D)
- **Vector Search**: ANNOY indices with angular distance
- **LLM**: m42-health/Llama3-Med42-70B (medical specialist)
- **Dataset**: EPFL-LLM medical guidelines (~4000 documents)
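ANNOY's angular metric is the Euclidean distance between normalized vectors, sqrt(2(1 − cos θ)), so a returned distance can be converted back into cosine similarity. A pure-Python sketch of both directions; the README does not say whether the relevance scores reported above are distances or similarities, so neither should be assumed.

```python
import math
from typing import Sequence


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """ANNOY-style angular distance: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def angular_to_cosine(distance: float) -> float:
    """Invert the formula: cos = 1 - d^2 / 2, giving a value in [-1, 1]."""
    return 1.0 - distance * distance / 2.0
```

Distances therefore range from 0 (same direction) to 2 (opposite directions), and a lower distance always means a more similar chunk.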
### **Fallback Mechanism**
```
Level 1: Predefined Mapping (0.001s)  → Success: Direct return
Level 2: LLM Extraction (8-15s)       → Success: Condition mapping
Level 3: Semantic Search (1-2s)       → Success: Sliding window chunks
Level 4: Medical Validation (8-10s)   → Fail: Return rejection
Level 5: Generic Search (1s)          → Final: General medical guidance
```
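Level 3 returns sliding-window chunks. A generic sliding-window splitter looks like this; the README does not give the window or stride sizes, so the defaults (256 and 128 tokens) and the whitespace tokenization in the usage line are placeholders.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], window: int = 256, stride: int = 128
) -> List[List[str]]:
    """Split tokens into overlapping windows; neighbors share window - stride tokens."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the text
        start += stride
    return chunks


words = "patient presents with acute chest pain radiating to the left arm".split()
for chunk in sliding_window_chunks(words, window=6, stride=3):
    print(" ".join(chunk))
```

The overlap keeps a sentence that straddles a window boundary intact in at least one chunk, at the cost of storing some tokens twice.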
## **NEXT PHASE: Interactive Interface**
### **🎯 Immediate Goals (Next 1-2 Days)**
#### **Phase 1: Gradio Interface Development**
- [ ] **Create `app.py`** - Interactive web interface
  - [ ] Complete pipeline integration
  - [ ] Multi-output display (advice + guidelines + technical details)
  - [ ] Environment-controlled debug mode
  - [ ] User-friendly error handling
#### **Phase 2: Local Validation Testing**
- [ ] **Manual testing** with 20-30 realistic medical queries
  - [ ] Emergency scenarios (cardiac arrest, stroke, MI)
  - [ ] Diagnostic queries (chest pain, respiratory distress)
  - [ ] Treatment protocols (medication management, procedures)
  - [ ] Edge cases (rare conditions, complex symptoms)
#### **Phase 3: HuggingFace Spaces Deployment**
- [ ] **Create requirements.txt** for deployment
- [ ] **Deploy to HF Spaces** for public testing
- [ ] **Production mode configuration** (limited technical details)
- [ ] **Performance monitoring** and user feedback collection
### **🔮 Future Enhancements (Next 1-2 Weeks)**
#### **Audio Input Integration**
- [ ] **Whisper ASR integration** for voice queries
- [ ] **Audio preprocessing** and quality validation
- [ ] **Multi-modal interface** (text + audio input)
#### **Evaluation & Metrics**
- [ ] **Faithfulness scoring** implementation
- [ ] **Automated evaluation pipeline**
- [ ] **Clinical validation** with medical professionals
- [ ] **Performance benchmarking** against target metrics
#### **Dataset Expansion (Future)**
- [ ] **Dataset B integration** (symptom/diagnosis subsets)
- [ ] **Multi-dataset RAG** architecture
- [ ] **Enhanced medical knowledge** coverage
## **Target Performance Metrics**
### **Response Quality**
- [ ] Physician satisfaction: ≥ 4/5
- [ ] RAG content coverage: ≥ 80%
- [ ] Retrieval precision (P@5): ≥ 0.7
- [ ] Medical advice faithfulness: ≥ 0.8
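The P@5 target can be computed per query as the fraction of the top five retrieved guidelines that a reviewer marks relevant, then averaged across queries. A minimal sketch; the relevance judgments would have to come from manual annotation, which the README does not describe, and dividing by k even when fewer than k results come back is a convention choice.

```python
from statistics import mean
from typing import Iterable, List, Sequence, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved IDs judged relevant; always divides by k."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant_set)
    return hits / k


def mean_precision_at_k(
    runs: List[Tuple[Sequence[str], Iterable[str]]], k: int = 5
) -> float:
    """Average P@k over (retrieved_ids, relevant_ids) pairs, one pair per test query."""
    return mean(precision_at_k(retrieved, relevant, k) for retrieved, relevant in runs)
```

For example, a query whose top five results contain three relevant guidelines scores 0.6, which falls short of the 0.7 target on its own.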
### **System Performance**
- [ ] Total response latency: ≤ 30 seconds
- [ ] Condition extraction: ≤ 5 seconds
- [ ] Guideline retrieval: ≤ 2 seconds
- [ ] Medical advice generation: ≤ 25 seconds
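One way to check runs against these targets is to time each stage and compare the measurements with the budgets. A sketch using only the standard library; the stage names are hypothetical labels, not identifiers from `src/`.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Targets from the "System Performance" list, in seconds.
LATENCY_BUDGETS_S: Dict[str, float] = {
    "condition_extraction": 5.0,
    "guideline_retrieval": 2.0,
    "advice_generation": 25.0,
    "total": 30.0,
}


@contextmanager
def timed(stage: str, sink: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[stage] = time.perf_counter() - start


def over_budget(measured: Dict[str, float]) -> Dict[str, float]:
    """Return only the stages whose measured time exceeds their budget."""
    return {
        stage: seconds
        for stage, seconds in measured.items()
        if seconds > LATENCY_BUDGETS_S.get(stage, float("inf"))
    }
```

The per-stage ceilings add up to 32 seconds, so a run can meet every stage target and still miss the 30-second total; checking the total separately catches that case.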
### **User Experience**
- [ ] Non-medical query rejection: 100%
- [ ] System availability: ≥ 99%
- [ ] Error handling: Graceful degradation
- [ ] Interface responsiveness: Immediate feedback
## **Project Structure**
```
OnCall.ai/
├── src/                          # Core modules (✅ Complete)
│   ├── user_prompt.py            # Multi-level query processing
│   ├── retrieval.py              # Dual-index vector search
│   ├── generation.py             # RAG-based advice generation
│   ├── llm_clients.py            # Med42-70B integration
│   ├── medical_conditions.py     # Medical knowledge configuration
│   └── data_processing.py        # Dataset preprocessing
├── models/                       # Pre-processed data (✅ Complete)
│   ├── embeddings/               # Vector embeddings and chunks
│   └── indices/                  # ANNOY vector indices
├── tests/                        # Validation tests (✅ Complete)
│   ├── test_multilevel_fallback_validation.py
│   ├── test_end_to_end_pipeline.py
│   └── test_userinput_userprompt_medical_*.py
├── docs/                         # Documentation and planning
│   ├── next/                     # Current implementation docs
│   └── next_gradio_evaluation/   # Interface planning
├── app.py                        # 🎯 NEXT: Gradio interface
├── requirements.txt              # 🎯 NEXT: Deployment dependencies
└── README.md                     # This file
```
## 🧪 **Testing Validation**
### **Completed Tests**
- ✅ **Multi-level fallback validation**: 13 test cases, 69.2% success (9 of 13)
- ✅ **End-to-end pipeline testing**: 6 scenarios, 100% technical completion
- ✅ **Component integration**: All modules working together
- ✅ **Error handling**: Graceful degradation and user-friendly messages
### **Key Findings**
- **Predefined mapping**: Instant response for known conditions
- **LLM extraction**: Reliable for complex symptom descriptions
- **Non-medical rejection**: 100% of non-medical test queries rejected after prompt-engineering updates
- **Retrieval quality**: High-relevance medical guidelines (0.2-0.4 relevance scores)
- **Generation capability**: Evidence-based advice with proper medical caution
## 🤝 **Contributing & Development**
### **Environment Setup**
```bash
# Clone repository
git clone [repository-url]
cd OnCall.ai
# Setup virtual environment
python -m venv genAIvenv
source genAIvenv/bin/activate # On Windows: genAIvenv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run tests
python tests/test_end_to_end_pipeline.py
# Start Gradio interface (coming soon)
python app.py
```
### **API Configuration**
```bash
# Set up HuggingFace token for LLM access
export HF_TOKEN=your_huggingface_token
# Enable debug mode for development
export ONCALL_DEBUG=true
```
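On the Python side, these two variables can be read once at startup. A sketch, assuming a small frozen config object; `OnCallConfig`, `load_config`, and accepting `1`/`yes` as well as `true` for `ONCALL_DEBUG` are hypothetical choices, and the application may read the variables differently.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OnCallConfig:
    hf_token: str = field(repr=False)  # keep the token out of debug printouts
    debug: bool


def load_config() -> OnCallConfig:
    """Read HF_TOKEN and ONCALL_DEBUG from the environment, failing fast if the token is missing."""
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; LLM calls to Med42-70B will fail.")
    debug_flag = os.environ.get("ONCALL_DEBUG", "false").strip().lower()
    return OnCallConfig(hf_token=token, debug=debug_flag in {"1", "true", "yes"})
```

Failing at startup when the token is missing surfaces the problem immediately instead of on the first user query.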
## ⚠️ **Important Notes**
### **Medical Disclaimer**
This system is designed for **research and educational purposes only**. It should not replace professional medical consultation, diagnosis, or treatment. Always consult qualified healthcare providers for medical decisions.
### **Current Limitations**
- **API Dependencies**: Requires HuggingFace API access for LLM functionality
- **Dataset Scope**: Currently focused on emergency and treatment guidelines
- **Language Support**: English medical terminology only
- **Validation Stage**: System under active development and testing
## **Contact & Support**
- **Development Team**: OnCall.ai Team
- **Last Updated**: 2025-07-31
- **Version**: 0.9.0 (Pre-release)
- **Status**: Ready for Interactive Testing Phase
---
*Built with ❤️ for healthcare professionals*