# OnCall.ai - Medical Emergency Assistant

A RAG-based medical assistant system that provides evidence-based clinical guidance for emergency medical situations using real medical guidelines and advanced language models.

## 🎯 Project Overview

OnCall.ai helps healthcare professionals by:

- Processing medical queries through multi-level validation
- Retrieving relevant medical guidelines from curated datasets
- Generating evidence-based clinical advice using specialized medical LLMs
- Providing transparent, traceable medical guidance

## ✅ Current Implementation Status

### **🎉 COMPLETED MODULES (2025-07-31)**

#### **1. Multi-Level Query Processing System**

- ✅ **UserPromptProcessor** (`src/user_prompt.py`); the fallback chain is sketched after this list
  - Level 1: Predefined medical condition mapping (instant response)
  - Level 2: LLM-based condition extraction (Llama3-Med42-70B)
  - Level 3: Semantic search fallback
  - Level 4: Medical query validation (100% non-medical rejection)
  - Level 5: Generic medical search for rare conditions
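For illustration, a minimal sketch of how such a fallback chain could be wired, assuming each level is a handler that returns a result dict on success or `None` to fall through (the handler names are hypothetical, not the actual `UserPromptProcessor` API):

```python
# Hypothetical sketch of the five-level fallback chain. Handler names are
# illustrative; the real UserPromptProcessor interface may differ.
from typing import Callable, Optional

Handler = Callable[[str], Optional[dict]]

class FallbackChain:
    def __init__(self, levels: list[tuple[str, Handler]]):
        self.levels = levels  # ordered (name, handler) pairs, Level 1 first

    def process(self, query: str) -> dict:
        for name, handler in self.levels:
            result = handler(query)
            if result is not None:  # this level succeeded: stop falling through
                return {"level": name, **result}
        return {"level": "none", "error": "No match found"}

# Usage (handlers are placeholders for the five levels described above):
# chain = FallbackChain([
#     ("predefined_mapping", match_predefined),
#     ("llm_extraction", extract_with_llm),
#     ("semantic_search", semantic_fallback),
#     ("medical_validation", validate_medical),
#     ("generic_search", generic_search),
# ])
# result = chain.process("60-year-old with crushing chest pain")
```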
#### **2. Dual-Index Retrieval System**

- ✅ **BasicRetrievalSystem** (`src/retrieval.py`); the query flow is sketched after this list
  - Emergency medical guidelines index (`emergency.ann`)
  - Treatment protocols index (`treatment.ann`)
  - Vector-based similarity search using PubMedBERT embeddings
  - Intelligent deduplication and result ranking
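A minimal sketch of dual-index querying, assuming 768-dimensional PubMedBERT embeddings via `sentence-transformers` and ANNOY's angular metric; the index paths follow the project structure below, but the actual `BasicRetrievalSystem` interface may differ:

```python
# Sketch only: query both ANNOY indices and merge hits by angular distance.
from annoy import AnnoyIndex
from sentence_transformers import SentenceTransformer

EMBED_DIM = 768  # PubMedBERT embedding size
encoder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

def load_index(path: str) -> AnnoyIndex:
    index = AnnoyIndex(EMBED_DIM, "angular")  # angular distance ~ cosine
    index.load(path)
    return index

emergency = load_index("models/indices/emergency.ann")
treatment = load_index("models/indices/treatment.ann")

def dual_search(query: str, k: int = 5) -> list[tuple[str, int, float]]:
    vec = encoder.encode(query)
    hits = []
    for name, idx in (("emergency", emergency), ("treatment", treatment)):
        ids, dists = idx.get_nns_by_vector(vec, k, include_distances=True)
        hits.extend((name, i, d) for i, d in zip(ids, dists))
    return sorted(hits, key=lambda h: h[2])  # smaller distance = more similar
```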
#### **3. Medical Knowledge Base**

- ✅ **MedicalConditions** (`src/medical_conditions.py`); the mapping shape is sketched after this list
  - Predefined condition-keyword mappings
  - Medical terminology validation
  - Extensible condition database
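The mapping shape might look like the following; the conditions and keywords here are illustrative placeholders, not the actual contents of `medical_conditions.py`:

```python
# Illustrative shape of a predefined condition-keyword mapping; the entries
# are examples, not the real medical_conditions.py data.
from typing import Optional

CONDITION_KEYWORDS: dict[str, list[str]] = {
    "acute myocardial infarction": ["MI", "STEMI", "heart attack", "troponin"],
    "acute ischemic stroke": ["stroke", "hemiparesis", "facial droop"],
    "cardiac arrest": ["pulseless", "CPR", "defibrillation"],
}

def match_predefined(query: str) -> Optional[str]:
    """Level 1: return a known condition if it or any keyword appears."""
    q = query.lower()
    for condition, keywords in CONDITION_KEYWORDS.items():
        if condition in q or any(k.lower() in q for k in keywords):
            return condition
    return None
```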
#### **4. LLM Integration**

- ✅ **Med42-70B Client** (`src/llm_clients.py`); the call pattern is sketched after this list
  - Specialized medical language model integration
  - Dual-layer rejection detection for non-medical queries
  - Robust error handling and timeout management
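A hedged sketch of a client call through the HuggingFace Inference API; the prompt, timeout value, and error handling are simplified placeholders, and the real `llm_clients.py` wrapper may differ:

```python
# Sketch: calling m42-health/Llama3-Med42-70B via the HF Inference API.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="m42-health/Llama3-Med42-70B",
    token=os.environ["HF_TOKEN"],
    timeout=30,  # fail fast instead of stalling the pipeline
)

def extract_condition(query: str) -> str:
    try:
        reply = client.chat_completion(
            messages=[{
                "role": "user",
                "content": f"Extract the primary medical condition from: {query}",
            }],
            max_tokens=64,
        )
        return reply.choices[0].message.content.strip()
    except Exception as exc:  # timeouts, rate limits, network errors
        return f"ERROR: {exc}"
```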
#### **5. Medical Advice Generation**

- ✅ **MedicalAdviceGenerator** (`src/generation.py`); prompt assembly is sketched after this list
  - RAG-based prompt construction
  - Intention-aware chunk selection (treatment/diagnosis)
  - Confidence scoring and response formatting
  - Integration with Med42-70B for clinical advice generation
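A minimal sketch of intention-aware prompt assembly, assuming each retrieved chunk is a dict with `text` and `source` fields (the template and field names are assumptions, not the actual `MedicalAdviceGenerator` format):

```python
# Sketch of RAG prompt construction: prefer chunks whose source matches the
# detected intention, then inline them as numbered evidence.
def build_rag_prompt(query: str, chunks: list[dict], intention: str) -> str:
    # Keep chunks matching the intention ("treatment"/"diagnosis");
    # fall back to all chunks if none match.
    preferred = [c for c in chunks if c.get("source") == intention] or chunks
    evidence = "\n".join(
        f"[Guideline {i + 1}] {c['text']}" for i, c in enumerate(preferred[:6])
    )
    return (
        "You are an emergency medicine assistant. Answer using ONLY the "
        "guidelines below and cite them by number.\n\n"
        f"{evidence}\n\nQuestion: {query}\nAnswer:"
    )
```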
#### **6. Data Processing Pipeline**

- ✅ **Processed Medical Guidelines** (`src/data_processing.py`); the index build is sketched after this list
  - ~4,000 medical guidelines from the EPFL-LLM dataset
  - Emergency subset: ~2,000-2,500 records
  - Treatment subset: ~2,000-2,500 records
  - PubMedBERT embeddings (768 dimensions)
  - ANNOY vector indices for fast retrieval
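A sketch of the offline embedding and index-build step under the same assumptions; the tree count and chunk format are placeholders, not the real pipeline configuration:

```python
# Offline sketch: embed guideline chunks and build one ANNOY index per subset.
from annoy import AnnoyIndex
from sentence_transformers import SentenceTransformer

EMBED_DIM = 768
N_TREES = 50  # assumption: more trees improve recall at index-size cost

def build_index(chunks: list[str], out_path: str) -> None:
    encoder = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
    vectors = encoder.encode(chunks, show_progress_bar=True)
    index = AnnoyIndex(EMBED_DIM, "angular")
    for i, vec in enumerate(vectors):
        index.add_item(i, vec)
    index.build(N_TREES)
    index.save(out_path)

# build_index(emergency_chunks, "models/indices/emergency.ann")
# build_index(treatment_chunks, "models/indices/treatment.ann")
```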
## 📊 **System Performance (Validated)**

### **Test Results Summary**

```
🎯 Multi-Level Fallback Validation: 69.2% success rate
   - Level 1 (Predefined): 100% success (instant response)
   - Level 4a (Non-medical rejection): 100% success
   - Level 4b→5 (Rare medical): 100% success

📈 End-to-End Pipeline: 100% technical completion
   - Condition extraction: 2.6s average
   - Medical guideline retrieval: 0.3s average
   - Total pipeline: 15.5s average (including generation)
```
### **Quality Metrics**

```
🔍 Retrieval Performance:
   - Guidelines retrieved: 8-9 per query
   - Relevance scores: 0.245-0.326 (good for the medical domain)
   - Emergency/treatment balance: correctly maintained

🧠 Generation Quality:
   - Confidence scores: 0.90 for successful generations
   - Evidence-based responses with specific guideline references
   - Appropriate medical caution and emphasis on clinical judgment
```
## 🛠️ **Technical Architecture**

### **Data Flow**

```
User Query → Level 1: Predefined Mapping
                ↓ (if fails)
             Level 2: LLM Extraction
                ↓ (if fails)
             Level 3: Semantic Search
                ↓ (if fails)
             Level 4: Medical Validation
                ↓ (if fails)
             Level 5: Generic Search
                ↓ (if fails)
             No Match Found
```
### **Core Technologies**

- **Embeddings**: NeuML/pubmedbert-base-embeddings (768D)
- **Vector Search**: ANNOY indices with angular distance
- **LLM**: m42-health/Llama3-Med42-70B (medical specialist)
- **Dataset**: EPFL-LLM medical guidelines (~4,000 documents)
### **Fallback Mechanism**

```
Level 1: Predefined Mapping (0.001s)  → Success: Direct return
Level 2: LLM Extraction (8-15s)       → Success: Condition mapping
Level 3: Semantic Search (1-2s)       → Success: Sliding-window chunks
Level 4: Medical Validation (8-10s)   → Fail: Return rejection (non-medical query)
Level 5: Generic Search (1s)          → Final: General medical guidance
```
## 🚀 **NEXT PHASE: Interactive Interface**

### **🎯 Immediate Goals (Next 1-2 Days)**

#### **Phase 1: Gradio Interface Development**

- [ ] **Create `app.py`**: interactive web interface (see the sketch after this list)
  - [ ] Complete pipeline integration
  - [ ] Multi-output display (advice + guidelines + technical details)
  - [ ] Environment-controlled debug mode
  - [ ] User-friendly error handling
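A minimal sketch of what the planned `app.py` could look like; `answer_query` is a placeholder for the real pipeline call, not implemented code:

```python
# Sketch of the planned Gradio app; answer_query stands in for calls to the
# query-processing, retrieval, and generation modules described above.
import os
import gradio as gr

DEBUG = os.getenv("ONCALL_DEBUG", "false").lower() == "true"

def answer_query(query: str) -> tuple[str, str]:
    advice = f"(evidence-based advice for: {query})"  # placeholder pipeline call
    details = "retrieval scores, timings, ..." if DEBUG else "debug mode off"
    return advice, details

demo = gr.Interface(
    fn=answer_query,
    inputs=gr.Textbox(label="Medical query"),
    outputs=[
        gr.Textbox(label="Clinical advice"),
        gr.Textbox(label="Technical details"),
    ],
    title="OnCall.ai - Medical Emergency Assistant",
)

if __name__ == "__main__":
    demo.launch()
```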
#### **Phase 2: Local Validation Testing**

- [ ] **Manual testing** with 20-30 realistic medical queries
  - [ ] Emergency scenarios (cardiac arrest, stroke, MI)
  - [ ] Diagnostic queries (chest pain, respiratory distress)
  - [ ] Treatment protocols (medication management, procedures)
  - [ ] Edge cases (rare conditions, complex symptoms)

#### **Phase 3: HuggingFace Spaces Deployment**

- [ ] **Create `requirements.txt`** for deployment
- [ ] **Deploy to HF Spaces** for public testing
- [ ] **Production mode configuration** (limited technical details)
- [ ] **Performance monitoring** and user feedback collection

### **🔮 Future Enhancements (Next 1-2 Weeks)**

#### **Audio Input Integration**

- [ ] **Whisper ASR integration** for voice queries
- [ ] **Audio preprocessing** and quality validation
- [ ] **Multi-modal interface** (text + audio input)

#### **Evaluation & Metrics**

- [ ] **Faithfulness scoring** implementation
- [ ] **Automated evaluation pipeline**
- [ ] **Clinical validation** with medical professionals
- [ ] **Performance benchmarking** against target metrics

#### **Dataset Expansion (Future)**

- [ ] **Dataset B integration** (symptom/diagnosis subsets)
- [ ] **Multi-dataset RAG** architecture
- [ ] **Enhanced medical knowledge** coverage
## 📋 **Target Performance Metrics**

### **Response Quality**

- [ ] Physician satisfaction: ≥ 4/5
- [ ] RAG content coverage: ≥ 80%
- [ ] Retrieval precision (P@5): ≥ 0.7 (see the sketch after this list)
- [ ] Medical advice faithfulness: ≥ 0.8
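For reference, P@5 is the fraction of the top five retrieved guidelines judged relevant; a minimal sketch of the metric:

```python
# Precision@k: fraction of the top-k retrieved guidelines that are relevant.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)

# Example: 4 of the top 5 hits relevant -> 0.8, above the 0.7 target.
# precision_at_k(["g1", "g2", "g3", "g4", "g5"], {"g1", "g2", "g4", "g5"})
```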
### **System Performance**

- [ ] Total response latency: ≤ 30 seconds
- [ ] Condition extraction: ≤ 5 seconds
- [ ] Guideline retrieval: ≤ 2 seconds
- [ ] Medical advice generation: ≤ 25 seconds

### **User Experience**

- [ ] Non-medical query rejection: 100%
- [ ] System availability: ≥ 99%
- [ ] Error handling: graceful degradation
- [ ] Interface responsiveness: immediate feedback
## 🏗️ **Project Structure**

```
OnCall.ai/
├── src/                          # Core modules (✅ Complete)
│   ├── user_prompt.py            # Multi-level query processing
│   ├── retrieval.py              # Dual-index vector search
│   ├── generation.py             # RAG-based advice generation
│   ├── llm_clients.py            # Med42-70B integration
│   ├── medical_conditions.py     # Medical knowledge configuration
│   └── data_processing.py        # Dataset preprocessing
├── models/                       # Pre-processed data (✅ Complete)
│   ├── embeddings/               # Vector embeddings and chunks
│   └── indices/                  # ANNOY vector indices
├── tests/                        # Validation tests (✅ Complete)
│   ├── test_multilevel_fallback_validation.py
│   ├── test_end_to_end_pipeline.py
│   └── test_userinput_userprompt_medical_*.py
├── docs/                         # Documentation and planning
│   ├── next/                     # Current implementation docs
│   └── next_gradio_evaluation/   # Interface planning
├── app.py                        # 🎯 NEXT: Gradio interface
├── requirements.txt              # 🎯 NEXT: Deployment dependencies
└── README.md                     # This file
```
## 🧪 **Testing Validation**

### **Completed Tests**

- ✅ **Multi-level fallback validation**: 13 test cases, 69.2% success
- ✅ **End-to-end pipeline testing**: 6 scenarios, 100% technical completion
- ✅ **Component integration**: all modules working together
- ✅ **Error handling**: graceful degradation and user-friendly messages

### **Key Findings**

- **Predefined mapping**: instant response for known conditions
- **LLM extraction**: reliable for complex symptom descriptions
- **Non-medical rejection**: perfect accuracy with updated prompt engineering
- **Retrieval quality**: high-relevance medical guidelines (0.2-0.4 relevance scores)
- **Generation capability**: evidence-based advice with proper medical caution
## 🤝 **Contributing & Development**

### **Environment Setup**

```bash
# Clone the repository
git clone [repository-url]
cd OnCall.ai

# Set up a virtual environment
python -m venv genAIvenv
source genAIvenv/bin/activate  # On Windows: genAIvenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run tests
python tests/test_end_to_end_pipeline.py

# Start the Gradio interface (coming soon)
python app.py
```
### **API Configuration**

```bash
# Set up HuggingFace token for LLM access
export HF_TOKEN=your_huggingface_token

# Enable debug mode for development
export ONCALL_DEBUG=true
```
## ⚠️ **Important Notes**

### **Medical Disclaimer**

This system is designed for **research and educational purposes only**. It should not replace professional medical consultation, diagnosis, or treatment. Always consult qualified healthcare providers for medical decisions.

### **Current Limitations**

- **API Dependencies**: Requires HuggingFace API access for LLM functionality
- **Dataset Scope**: Currently focused on emergency and treatment guidelines
- **Language Support**: English medical terminology only
- **Validation Stage**: System under active development and testing

## 📞 **Contact & Support**

**Development Team**: OnCall.ai Team
**Last Updated**: 2025-07-31
**Version**: 0.9.0 (Pre-release)
**Status**: 🚧 Ready for Interactive Testing Phase

---

*Built with ❤️ for healthcare professionals*