Upload 4 files
- README.md +98 -0
- app.py +184 -0
- medical_chatbot.py +447 -0
- requirements.txt +33 -0
README.md
ADDED
@@ -0,0 +1,98 @@
|
1 |
+
---
|
2 |
+
title: Pediatric Medical Assistant
|
3 |
+
emoji: 🩺
|
4 |
+
colorFrom: blue
|
5 |
+
colorTo: green
|
6 |
+
sdk: gradio
|
7 |
+
sdk_version: 3.40.0
|
8 |
+
app_file: app.py
|
9 |
+
pinned: false
|
10 |
+
license: mit
|
11 |
+
---
|
12 |
+
|
13 |
+
# 🩺 Pediatric Medical Assistant
|
14 |
+
|
15 |
+
An AI medical assistant specialized in pediatric healthcare, built on BioGPT with retrieval over a pediatric knowledge base.
|
16 |
+
|
17 |
+
## 🌟 Features
|
18 |
+
|
19 |
+
- **Medical Q&A**: Ask questions about pediatric health conditions, symptoms, and treatments
|
20 |
+
- **BioGPT Integration**: Powered by Microsoft's BioGPT, a medical language model trained on biomedical literature
|
21 |
+
- **Pediatric Focus**: Specialized knowledge base focused on children's health and medical conditions
|
22 |
+
- **Vector Search**: Semantic search using Sentence Transformers embeddings and a FAISS index
|
23 |
+
- **Educational Content**: Provides evidence-based medical information for learning purposes
|
24 |
+
|
25 |
+
## 🚀 How to Use
|
26 |
+
|
27 |
+
1. **Ask Medical Questions**: Type your pediatric health questions in plain English
|
28 |
+
2. **Get AI Responses**: Receive evidence-based answers from the medical AI
|
29 |
+
3. **Explore Topics**: Ask about symptoms, treatments, prevention, and general pediatric health
|
30 |
+
|
31 |
+
### Example Questions:
|
32 |
+
- "What causes fever in children?"
|
33 |
+
- "How to treat a child's persistent cough?"
|
34 |
+
- "When should I be concerned about my baby's breathing?"
|
35 |
+
- "What are the signs of dehydration in infants?"
|
36 |
+
- "How can I prevent common childhood infections?"
|
37 |
+
|
38 |
+
## 🛠️ Technology Stack
|
39 |
+
|
40 |
+
- **AI Model**: BioGPT (Microsoft) - Medical language model
|
41 |
+
- **Embeddings**: Sentence Transformers for semantic search
|
42 |
+
- **Vector Database**: FAISS for efficient similarity search
|
43 |
+
- **Interface**: Gradio for user-friendly web interface
|
44 |
+
- **Deployment**: Hugging Face Spaces
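
As a rough illustration of what a flat L2 index does (the application builds `faiss.IndexFlatL2` over 384-dimensional embeddings), the sketch below runs the same kind of exhaustive nearest-neighbour comparison in plain Python. The function name `l2_search` and the toy 2-D vectors are assumptions made for this example and are not part of the project.

```python
from typing import List, Sequence, Tuple


def l2_search(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[float, int]]:
    """Return the k nearest vectors as (squared L2 distance, index) pairs."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and this vector.
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, idx))
    # Smallest distance first: an exhaustive scan with no approximation.
    scored.sort()
    return scored[:k]


# Toy 2-D vectors standing in for sentence embeddings (illustrative only).
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
nearest = l2_search([0.9, 0.1], vectors, k=2)
nearest_ids = [idx for _, idx in nearest]
```

A flat index trades speed for exactness: every stored vector is compared with the query, which is usually acceptable for a knowledge base of a few thousand chunks.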
|
45 |
+
|
46 |
+
## ⚠️ Important Disclaimer
|
47 |
+
|
48 |
+
**This tool is for educational and informational purposes only.**
|
49 |
+
|
50 |
+
- **Not Medical Advice**: The information provided is not intended as medical advice, diagnosis, or treatment
|
51 |
+
- **Consult Professionals**: Always consult qualified healthcare professionals for:
|
52 |
+
- Medical emergencies
|
53 |
+
- Diagnosis and treatment decisions
|
54 |
+
- Personalized medical advice
|
55 |
+
- Medication guidance
|
56 |
+
- **Educational Use**: This AI assistant is designed to provide general medical education and should supplement, not replace, professional medical consultation
|
57 |
+
|
58 |
+
## 🔧 Technical Details
|
59 |
+
|
60 |
+
- **Model**: BioGPT-Large with fallback to base BioGPT
|
61 |
+
- **Knowledge Base**: Curated pediatric medical content
|
62 |
+
- **Search Method**: Vector search (FAISS), with keyword search as a fallback
|
63 |
+
- **Response Generation**: Context-aware medical responses
|
64 |
+
- **Safety**: Built-in disclaimers and safety reminders
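
The keyword path used when embedding search is unavailable can be sketched as a word-overlap ranking. This is a minimal sketch, assuming whitespace tokenization and a hypothetical `keyword_search` helper; the project's own method additionally boosts chunks by medical category.

```python
from typing import List


def keyword_search(query: str, chunks: List[str], n_results: int = 3) -> List[str]:
    """Rank chunks by the fraction of query words they contain."""
    query_words = set(query.lower().split())
    if not query_words:
        return []

    scored = []
    for text in chunks:
        overlap = len(query_words & set(text.lower().split()))
        if overlap:
            scored.append((overlap / len(query_words), text))

    # Highest score first; sorting on the score alone keeps ties in input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:n_results]]


# Illustrative chunks, not taken from the project's data file.
chunks = [
    "Fever in children is often caused by viral infections.",
    "Dehydration signs in infants include fewer wet diapers.",
    "Vaccines help prevent common childhood infections.",
]
top = keyword_search("fever in children", chunks, n_results=2)
```

Chunks that share no word with the query are dropped entirely, so an unrelated question returns an empty list and the caller can answer that nothing relevant was found.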
|
65 |
+
|
66 |
+
## 📊 Performance
|
67 |
+
|
68 |
+
- **Response Time**: Typically 2-5 seconds
|
69 |
+
- **Knowledge Coverage**: Focused on pediatric medicine
|
70 |
+
- **Accuracy**: Based on medical literature training data
|
71 |
+
- **Availability**: 24/7 through Hugging Face Spaces
|
72 |
+
|
73 |
+
## 🏥 Medical Specialization
|
74 |
+
|
75 |
+
This assistant specializes in:
|
76 |
+
- Pediatric symptoms and conditions
|
77 |
+
- Common childhood illnesses
|
78 |
+
- Preventive care guidance
|
79 |
+
- When to seek medical attention
|
80 |
+
- General health education for parents and caregivers
|
81 |
+
|
82 |
+
## 📝 License
|
83 |
+
|
84 |
+
This project is licensed under the MIT License - see the license file for details.
|
85 |
+
|
86 |
+
## 🤝 Contributing
|
87 |
+
|
88 |
+
This is an educational project. For suggestions or improvements, please reach out through appropriate channels.
|
89 |
+
|
90 |
+
## 🔗 Related Resources
|
91 |
+
|
92 |
+
- [BioGPT Research Paper](https://arxiv.org/abs/2210.10341)
|
93 |
+
- [Hugging Face Transformers](https://huggingface.co/transformers/)
|
94 |
+
- [American Academy of Pediatrics](https://www.aap.org/)
|
95 |
+
|
96 |
+
---
|
97 |
+
|
98 |
+
**Remember**: While this AI can provide helpful medical information, it cannot replace the expertise and judgment of trained healthcare professionals. Always prioritize professional medical care for your child's health needs.
|
app.py
ADDED
@@ -0,0 +1,184 @@
|
1 |
+
import gradio as gr
|
2 |
+
import os
|
3 |
+
import torch
|
4 |
+
from medical_chatbot import ColabBioGPTChatbot
|
5 |
+
|
6 |
+
def initialize_chatbot():
|
7 |
+
"""Initialize the chatbot with proper error handling"""
|
8 |
+
try:
|
9 |
+
print("🚀 Initializing Medical Chatbot...")
|
10 |
+
|
11 |
+
# Use the GPU when one is available; otherwise run on CPU
|
12 |
+
use_gpu = torch.cuda.is_available()
|
13 |
+
use_8bit = use_gpu # Only use 8-bit if GPU is available
|
14 |
+
|
15 |
+
chatbot = ColabBioGPTChatbot(use_gpu=use_gpu, use_8bit=use_8bit)
|
16 |
+
|
17 |
+
# Try to load medical data
|
18 |
+
medical_file = "Pediatric_cleaned.txt"
|
19 |
+
if os.path.exists(medical_file):
|
20 |
+
chatbot.load_medical_data(medical_file)
|
21 |
+
status = f"✅ Medical file '{medical_file}' loaded successfully! Ready to chat!"
|
22 |
+
success = True
|
23 |
+
else:
|
24 |
+
status = f"❌ Medical file '{medical_file}' not found. Please ensure the file is in the same directory."
|
25 |
+
success = False
|
26 |
+
|
27 |
+
return chatbot, status, success
|
28 |
+
|
29 |
+
except Exception as e:
|
30 |
+
error_msg = f"❌ Failed to initialize chatbot: {str(e)}"
|
31 |
+
print(error_msg)
|
32 |
+
return None, error_msg, False
|
33 |
+
|
34 |
+
# Initialize chatbot at startup
|
35 |
+
print("🏥 Starting Pediatric Medical Assistant...")
|
36 |
+
chatbot, startup_status, medical_file_loaded = initialize_chatbot()
|
37 |
+
|
38 |
+
def generate_response(user_input, history):
|
39 |
+
"""Generate response with proper error handling"""
|
40 |
+
if not chatbot:
|
41 |
+
return history + [("System Error", "❌ Chatbot failed to initialize. Please refresh the page and try again.")], ""
|
42 |
+
|
43 |
+
if not medical_file_loaded:
|
44 |
+
return history + [(user_input, "⚠️ Medical data failed to load. The chatbot may not have access to the full medical knowledge base.")], ""
|
45 |
+
|
46 |
+
if not user_input.strip():
|
47 |
+
return history, ""
|
48 |
+
|
49 |
+
try:
|
50 |
+
# Generate response
|
51 |
+
bot_response = chatbot.chat(user_input)
|
52 |
+
|
53 |
+
# Add to history
|
54 |
+
history = history + [(user_input, bot_response)]
|
55 |
+
|
56 |
+
return history, ""
|
57 |
+
|
58 |
+
except Exception as e:
|
59 |
+
error_response = f"⚠️ Sorry, I encountered an error: {str(e)}. Please try rephrasing your question."
|
60 |
+
history = history + [(user_input, error_response)]
|
61 |
+
return history, ""
|
62 |
+
|
63 |
+
# Create custom CSS for better styling
|
64 |
+
custom_css = """
|
65 |
+
.gradio-container {
|
66 |
+
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
|
67 |
+
}
|
68 |
+
|
69 |
+
.chatbot {
|
70 |
+
height: 500px !important;
|
71 |
+
}
|
72 |
+
|
73 |
+
.message {
|
74 |
+
padding: 10px;
|
75 |
+
margin: 5px;
|
76 |
+
border-radius: 10px;
|
77 |
+
}
|
78 |
+
|
79 |
+
.user-message {
|
80 |
+
background-color: #e3f2fd;
|
81 |
+
margin-left: 20%;
|
82 |
+
}
|
83 |
+
|
84 |
+
.bot-message {
|
85 |
+
background-color: #f5f5f5;
|
86 |
+
margin-right: 20%;
|
87 |
+
}
|
88 |
+
"""
|
89 |
+
|
90 |
+
# Create Gradio interface
|
91 |
+
with gr.Blocks(css=custom_css, title="Pediatric Medical Assistant") as demo:
|
92 |
+
gr.Markdown(
|
93 |
+
"""
|
94 |
+
# 🩺 Pediatric Medical Assistant
|
95 |
+
|
96 |
+
Welcome to your AI-powered pediatric medical assistant! This chatbot uses advanced medical AI (BioGPT)
|
97 |
+
to provide evidence-based information about children's health and medical conditions.
|
98 |
+
|
99 |
+
**⚠️ Important Disclaimer:** This tool provides educational information only.
|
100 |
+
Always consult qualified healthcare professionals for medical diagnosis, treatment, and personalized advice.
|
101 |
+
"""
|
102 |
+
)
|
103 |
+
|
104 |
+
# Display startup status
|
105 |
+
gr.Markdown(f"**System Status:** {startup_status}")
|
106 |
+
|
107 |
+
# Chat interface
|
108 |
+
with gr.Row():
|
109 |
+
with gr.Column(scale=4):
|
110 |
+
chatbot_ui = gr.Chatbot(
|
111 |
+
label="💬 Chat with Medical AI",
|
112 |
+
height=500,
|
113 |
+
show_label=True,
|
114 |
+
avatar_images=("👤", "🤖")
|
115 |
+
)
|
116 |
+
|
117 |
+
with gr.Row():
|
118 |
+
user_input = gr.Textbox(
|
119 |
+
placeholder="Ask a pediatric health question... (e.g., 'What causes fever in children?')",
|
120 |
+
lines=2,
|
121 |
+
max_lines=5,
|
122 |
+
show_label=False,
|
123 |
+
scale=4
|
124 |
+
)
|
125 |
+
submit_btn = gr.Button("Send 📤", variant="primary", scale=1)
|
126 |
+
|
127 |
+
with gr.Column(scale=1):
|
128 |
+
gr.Markdown(
|
129 |
+
"""
|
130 |
+
### 💡 Example Questions:
|
131 |
+
|
132 |
+
- "What causes fever in children?"
|
133 |
+
- "How to treat a child's cough?"
|
134 |
+
- "When should I call the doctor?"
|
135 |
+
- "What are signs of dehydration?"
|
136 |
+
- "How to prevent common infections?"
|
137 |
+
|
138 |
+
### 🔧 System Info:
|
139 |
+
- **Model:** BioGPT (Medical AI)
|
140 |
+
- **Specialization:** Pediatric Medicine
|
141 |
+
- **Search:** Vector + Keyword
|
142 |
+
"""
|
143 |
+
)
|
144 |
+
|
145 |
+
# Event handlers
|
146 |
+
def submit_message(user_msg, history):
|
147 |
+
return generate_response(user_msg, history)
|
148 |
+
|
149 |
+
# Connect events
|
150 |
+
user_input.submit(
|
151 |
+
fn=submit_message,
|
152 |
+
inputs=[user_input, chatbot_ui],
|
153 |
+
outputs=[chatbot_ui, user_input],
|
154 |
+
show_progress=True
|
155 |
+
)
|
156 |
+
|
157 |
+
submit_btn.click(
|
158 |
+
fn=submit_message,
|
159 |
+
inputs=[user_input, chatbot_ui],
|
160 |
+
outputs=[chatbot_ui, user_input],
|
161 |
+
show_progress=True
|
162 |
+
)
|
163 |
+
|
164 |
+
# Footer
|
165 |
+
gr.Markdown(
|
166 |
+
"""
|
167 |
+
---
|
168 |
+
**🏥 Medical AI Assistant** | Powered by BioGPT | For Educational Purposes Only
|
169 |
+
|
170 |
+
**Remember:** Always consult healthcare professionals for medical emergencies and personalized medical advice.
|
171 |
+
"""
|
172 |
+
)
|
173 |
+
|
174 |
+
# Launch configuration for Hugging Face Spaces
|
175 |
+
if __name__ == "__main__":
|
176 |
+
# For Hugging Face Spaces deployment
|
177 |
+
demo.launch(
|
178 |
+
server_name="0.0.0.0", # Required for HF Spaces
|
179 |
+
server_port=7860, # Default port for HF Spaces
|
180 |
+
show_error=True, # Show errors for debugging
|
181 |
+
show_tips=False, # Disable tips for cleaner interface
|
182 |
+
enable_queue=True, # Enable queue for better performance
|
183 |
+
max_threads=10 # Limit concurrent users
|
184 |
+
)
|
medical_chatbot.py
ADDED
@@ -0,0 +1,447 @@
|
1 |
+
import os
|
2 |
+
import re
|
3 |
+
import torch
|
4 |
+
import warnings
|
5 |
+
import numpy as np
|
6 |
+
import faiss
|
7 |
+
from transformers import (
|
8 |
+
AutoTokenizer,
|
9 |
+
AutoModelForCausalLM,
|
10 |
+
BitsAndBytesConfig
|
11 |
+
)
|
12 |
+
from sentence_transformers import SentenceTransformer
|
13 |
+
from typing import List, Dict, Optional
|
14 |
+
import time
|
15 |
+
from datetime import datetime
|
16 |
+
|
17 |
+
# Suppress warnings for cleaner output
|
18 |
+
warnings.filterwarnings('ignore')
|
19 |
+
|
20 |
+
class ColabBioGPTChatbot:
|
21 |
+
def __init__(self, use_gpu=True, use_8bit=True):
|
22 |
+
"""Initialize BioGPT chatbot optimized for Hugging Face Spaces"""
|
23 |
+
print("🏥 Initializing Medical Chatbot...")
|
24 |
+
self.use_gpu = use_gpu
|
25 |
+
self.use_8bit = use_8bit
|
26 |
+
self.device = "cuda" if torch.cuda.is_available() and use_gpu else "cpu"
|
27 |
+
print(f"🖥️ Using device: {self.device}")
|
28 |
+
|
29 |
+
self.tokenizer = None
|
30 |
+
self.model = None
|
31 |
+
self.knowledge_chunks = []
|
32 |
+
self.conversation_history = []
|
33 |
+
self.embedding_model = None
|
34 |
+
self.faiss_index = None
|
35 |
+
self.faiss_ready = False
|
36 |
+
self.use_embeddings = True
|
37 |
+
|
38 |
+
# Initialize components
|
39 |
+
self.setup_biogpt()
|
40 |
+
self.load_sentence_transformer()
|
41 |
+
|
42 |
+
def setup_biogpt(self):
|
43 |
+
"""Setup BioGPT model with fallback to base BioGPT if Large fails"""
|
44 |
+
print("🧠 Loading BioGPT model...")
|
45 |
+
|
46 |
+
try:
|
47 |
+
# Try BioGPT-Large first
|
48 |
+
model_name = "microsoft/BioGPT-Large"
|
49 |
+
print(f"Attempting to load {model_name}...")
|
50 |
+
|
51 |
+
if self.use_8bit and self.device == "cuda":
|
52 |
+
quantization_config = BitsAndBytesConfig(
|
53 |
+
load_in_8bit=True,
|
54 |
+
llm_int8_threshold=6.0,
|
55 |
+
llm_int8_has_fp16_weight=False,
|
56 |
+
)
|
57 |
+
else:
|
58 |
+
quantization_config = None
|
59 |
+
|
60 |
+
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
|
61 |
+
if self.tokenizer.pad_token is None:
|
62 |
+
self.tokenizer.pad_token = self.tokenizer.eos_token
|
63 |
+
|
64 |
+
self.model = AutoModelForCausalLM.from_pretrained(
|
65 |
+
model_name,
|
66 |
+
quantization_config=quantization_config,
|
67 |
+
torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
|
68 |
+
device_map="auto" if self.device == "cuda" else None,
|
69 |
+
trust_remote_code=True,
|
70 |
+
low_cpu_mem_usage=True
|
71 |
+
)
|
72 |
+
|
73 |
+
if self.device == "cuda" and quantization_config is None:
|
74 |
+
self.model = self.model.to(self.device)
|
75 |
+
|
76 |
+
print("✅ BioGPT-Large loaded successfully!")
|
77 |
+
|
78 |
+
except Exception as e:
|
79 |
+
print(f"❌ BioGPT-Large loading failed: {e}")
|
80 |
+
print("🔁 Falling back to base BioGPT...")
|
81 |
+
self.setup_fallback_biogpt()
|
82 |
+
|
83 |
+
def setup_fallback_biogpt(self):
|
84 |
+
"""Fallback to microsoft/BioGPT if BioGPT-Large fails"""
|
85 |
+
try:
|
86 |
+
model_name = "microsoft/BioGPT"
|
87 |
+
print(f"Loading fallback model: {model_name}")
|
88 |
+
|
89 |
+
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
|
90 |
+
if self.tokenizer.pad_token is None:
|
91 |
+
self.tokenizer.pad_token = self.tokenizer.eos_token
|
92 |
+
|
93 |
+
self.model = AutoModelForCausalLM.from_pretrained(
|
94 |
+
model_name,
|
95 |
+
torch_dtype=torch.float32,
|
96 |
+
trust_remote_code=True,
|
97 |
+
low_cpu_mem_usage=True
|
98 |
+
)
|
99 |
+
|
100 |
+
if self.device == "cuda":
|
101 |
+
self.model = self.model.to(self.device)
|
102 |
+
|
103 |
+
print("✅ Base BioGPT model loaded successfully!")
|
104 |
+
|
105 |
+
except Exception as e:
|
106 |
+
print(f"❌ Failed to load fallback BioGPT: {e}")
|
107 |
+
self.model = None
|
108 |
+
self.tokenizer = None
|
109 |
+
|
110 |
+
def load_sentence_transformer(self):
|
111 |
+
"""Load sentence transformer for embeddings"""
|
112 |
+
try:
|
113 |
+
print("🔮 Loading sentence transformer...")
|
114 |
+
self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
|
115 |
+
|
116 |
+
# Initialize FAISS index (will be populated when data is loaded)
|
117 |
+
embedding_dim = 384 # Dimension for all-MiniLM-L6-v2
|
118 |
+
self.faiss_index = faiss.IndexFlatL2(embedding_dim)
|
119 |
+
self.faiss_ready = True
|
120 |
+
print("✅ Sentence transformer and FAISS index ready!")
|
121 |
+
|
122 |
+
except Exception as e:
|
123 |
+
print(f"❌ Failed to load sentence transformer: {e}")
|
124 |
+
self.use_embeddings = False
|
125 |
+
self.faiss_ready = False
|
126 |
+
|
127 |
+
def load_medical_data(self, file_path):
|
128 |
+
"""Load and process medical data"""
|
129 |
+
print(f"📖 Loading medical data from {file_path}...")
|
130 |
+
|
131 |
+
try:
|
132 |
+
if not os.path.exists(file_path):
|
133 |
+
raise FileNotFoundError(f"File {file_path} not found")
|
134 |
+
|
135 |
+
with open(file_path, 'r', encoding='utf-8') as f:
|
136 |
+
text = f.read()
|
137 |
+
print(f"📄 File loaded: {len(text):,} characters")
|
138 |
+
|
139 |
+
except Exception as e:
|
140 |
+
print(f"❌ Error loading file: {e}")
|
141 |
+
raise ValueError(f"Failed to load medical data: {e}")
|
142 |
+
|
143 |
+
# Create chunks
|
144 |
+
print("📝 Creating medical chunks...")
|
145 |
+
chunks = self.create_medical_chunks(text)
|
146 |
+
print(f"📋 Created {len(chunks)} medical chunks")
|
147 |
+
|
148 |
+
self.knowledge_chunks = chunks
|
149 |
+
|
150 |
+
# Generate embeddings if available
|
151 |
+
if self.use_embeddings and self.embedding_model and self.faiss_ready:
|
152 |
+
try:
|
153 |
+
self.generate_embeddings_with_progress(chunks)
|
154 |
+
print("✅ Medical data loaded with embeddings!")
|
155 |
+
except Exception as e:
|
156 |
+
print(f"⚠️ Embedding generation failed: {e}")
|
157 |
+
print("✅ Medical data loaded (keyword search mode)")
|
158 |
+
else:
|
159 |
+
print("✅ Medical data loaded (keyword search mode)")
|
160 |
+
|
161 |
+
def create_medical_chunks(self, text: str, chunk_size: int = 400) -> List[Dict]:
|
162 |
+
"""Create medically-optimized text chunks"""
|
163 |
+
chunks = []
|
164 |
+
|
165 |
+
# Split by paragraphs first
|
166 |
+
paragraphs = [p.strip() for p in text.split('\n\n') if len(p.strip()) > 50]
|
167 |
+
|
168 |
+
chunk_id = 0
|
169 |
+
for paragraph in paragraphs:
|
170 |
+
if len(paragraph.split()) <= chunk_size:
|
171 |
+
chunks.append({
|
172 |
+
'id': chunk_id,
|
173 |
+
'text': paragraph,
|
174 |
+
'medical_focus': self.identify_medical_focus(paragraph)
|
175 |
+
})
|
176 |
+
chunk_id += 1
|
177 |
+
else:
|
178 |
+
# Split large paragraphs by sentences
|
179 |
+
sentences = re.split(r'[.!?]+', paragraph)
|
180 |
+
current_chunk = ""
|
181 |
+
|
182 |
+
for sentence in sentences:
|
183 |
+
sentence = sentence.strip()
|
184 |
+
if not sentence:
|
185 |
+
continue
|
186 |
+
|
187 |
+
if len(current_chunk.split()) + len(sentence.split()) <= chunk_size:
|
188 |
+
current_chunk += sentence + ". "
|
189 |
+
else:
|
190 |
+
if current_chunk.strip():
|
191 |
+
chunks.append({
|
192 |
+
'id': chunk_id,
|
193 |
+
'text': current_chunk.strip(),
|
194 |
+
'medical_focus': self.identify_medical_focus(current_chunk)
|
195 |
+
})
|
196 |
+
chunk_id += 1
|
197 |
+
current_chunk = sentence + ". "
|
198 |
+
|
199 |
+
if current_chunk.strip():
|
200 |
+
chunks.append({
|
201 |
+
'id': chunk_id,
|
202 |
+
'text': current_chunk.strip(),
|
203 |
+
'medical_focus': self.identify_medical_focus(current_chunk)
|
204 |
+
})
|
205 |
+
chunk_id += 1
|
206 |
+
|
207 |
+
return chunks
|
208 |
+
|
209 |
+
def identify_medical_focus(self, text: str) -> str:
|
210 |
+
"""Identify the medical focus of a text chunk"""
|
211 |
+
text_lower = text.lower()
|
212 |
+
|
213 |
+
categories = {
|
214 |
+
'pediatric_symptoms': ['fever', 'cough', 'rash', 'vomiting', 'diarrhea'],
|
215 |
+
'treatments': ['treatment', 'therapy', 'medication', 'antibiotics'],
|
216 |
+
'diagnosis': ['diagnosis', 'diagnostic', 'symptoms', 'signs'],
|
217 |
+
'emergency': ['emergency', 'urgent', 'serious', 'hospital'],
|
218 |
+
'prevention': ['prevention', 'vaccine', 'immunization', 'avoid']
|
219 |
+
}
|
220 |
+
|
221 |
+
for category, keywords in categories.items():
|
222 |
+
if any(keyword in text_lower for keyword in keywords):
|
223 |
+
return category
|
224 |
+
|
225 |
+
return 'general_medical'
|
226 |
+
|
227 |
+
def generate_embeddings_with_progress(self, chunks: List[Dict]):
|
228 |
+
"""Generate embeddings and add to FAISS index"""
|
229 |
+
print("🔮 Generating embeddings...")
|
230 |
+
|
231 |
+
try:
|
232 |
+
texts = [chunk['text'] for chunk in chunks]
|
233 |
+
|
234 |
+
# Generate embeddings in batches
|
235 |
+
batch_size = 32
|
236 |
+
all_embeddings = []
|
237 |
+
|
238 |
+
for i in range(0, len(texts), batch_size):
|
239 |
+
batch_texts = texts[i:i+batch_size]
|
240 |
+
batch_embeddings = self.embedding_model.encode(batch_texts, show_progress_bar=False)
|
241 |
+
all_embeddings.extend(batch_embeddings)
|
242 |
+
|
243 |
+
progress = min(i + batch_size, len(texts))
|
244 |
+
print(f" Progress: {progress}/{len(texts)} chunks processed", end='\r')
|
245 |
+
|
246 |
+
print(f"\n ✅ Generated embeddings for {len(texts)} chunks")
|
247 |
+
|
248 |
+
# Add to FAISS index
|
249 |
+
embeddings_array = np.array(all_embeddings).astype('float32')
|
250 |
+
self.faiss_index.add(embeddings_array)
|
251 |
+
print("✅ Embeddings added to FAISS index!")
|
252 |
+
|
253 |
+
except Exception as e:
|
254 |
+
print(f"❌ Embedding generation failed: {e}")
|
255 |
+
raise
|
256 |
+
|
257 |
+
def retrieve_medical_context(self, query: str, n_results: int = 3) -> List[str]:
|
258 |
+
"""Retrieve relevant medical context"""
|
259 |
+
if self.use_embeddings and self.embedding_model and self.faiss_ready and self.faiss_index.ntotal > 0:
|
260 |
+
try:
|
261 |
+
# Generate query embedding
|
262 |
+
query_embedding = self.embedding_model.encode([query])
|
263 |
+
|
264 |
+
# Search FAISS index
|
265 |
+
distances, indices = self.faiss_index.search(
|
266 |
+
np.array(query_embedding).astype('float32'),
|
267 |
+
min(n_results, self.faiss_index.ntotal)
|
268 |
+
)
|
269 |
+
|
270 |
+
# Get relevant chunks
|
271 |
+
context_chunks = []
|
272 |
+
for idx in indices[0]:
|
273 |
+
if idx != -1 and idx < len(self.knowledge_chunks):
|
274 |
+
context_chunks.append(self.knowledge_chunks[idx]['text'])
|
275 |
+
|
276 |
+
if context_chunks:
|
277 |
+
return context_chunks
|
278 |
+
|
279 |
+
except Exception as e:
|
280 |
+
print(f"⚠️ Embedding search failed: {e}")
|
281 |
+
|
282 |
+
# Fallback to keyword search
|
283 |
+
return self.keyword_search_medical(query, n_results)
|
284 |
+
|
285 |
+
def keyword_search_medical(self, query: str, n_results: int) -> List[str]:
|
286 |
+
"""Medical-focused keyword search"""
|
287 |
+
if not self.knowledge_chunks:
|
288 |
+
return []
|
289 |
+
|
290 |
+
query_words = set(query.lower().split())
|
291 |
+
chunk_scores = []
|
292 |
+
|
293 |
+
for chunk_info in self.knowledge_chunks:
|
294 |
+
chunk_text = chunk_info['text']
|
295 |
+
chunk_words = set(chunk_text.lower().split())
|
296 |
+
|
297 |
+
# Calculate relevance score
|
298 |
+
word_overlap = len(query_words.intersection(chunk_words))
|
299 |
+
base_score = word_overlap / len(query_words) if query_words else 0
|
300 |
+
|
301 |
+
# Boost medical content
|
302 |
+
medical_boost = 0
|
303 |
+
if base_score > 0 and chunk_info.get('medical_focus') in ['pediatric_symptoms', 'treatments', 'diagnosis']:  # boost only chunks that already match the query
|
304 |
+
medical_boost = 0.3
|
305 |
+
|
306 |
+
final_score = base_score + medical_boost
|
307 |
+
|
308 |
+
if final_score > 0:
|
309 |
+
chunk_scores.append((final_score, chunk_text))
|
310 |
+
|
311 |
+
# Return top matches
|
312 |
+
chunk_scores.sort(reverse=True)
|
313 |
+
return [chunk for _, chunk in chunk_scores[:n_results]]
|
314 |
+
|
315 |
+
def generate_biogpt_response(self, context: str, query: str) -> str:
|
316 |
+
"""Generate medical response using BioGPT"""
|
317 |
+
if not self.model or not self.tokenizer:
|
318 |
+
return "Medical model not available. Please check the setup."
|
319 |
+
|
320 |
+
try:
|
321 |
+
# Create medical prompt
|
322 |
+
prompt = f"""Medical Context: {context[:800]}
|
323 |
+
|
324 |
+
Question: {query}
|
325 |
+
|
326 |
+
Medical Answer:"""
|
327 |
+
|
328 |
+
# Tokenize
|
329 |
+
inputs = self.tokenizer(
|
330 |
+
prompt,
|
331 |
+
return_tensors="pt",
|
332 |
+
truncation=True,
|
333 |
+
max_length=1024
|
334 |
+
)
|
335 |
+
|
336 |
+
# Move to device
|
337 |
+
if self.device == "cuda":
|
338 |
+
inputs = {k: v.to(self.device) for k, v in inputs.items()}
|
339 |
+
|
340 |
+
# Generate response
|
341 |
+
with torch.no_grad():
|
342 |
+
outputs = self.model.generate(
|
343 |
+
**inputs,
|
344 |
+
max_new_tokens=150,
|
345 |
+
do_sample=True,
|
346 |
+
temperature=0.7,
|
347 |
+
top_p=0.9,
|
348 |
+
pad_token_id=self.tokenizer.eos_token_id,
|
349 |
+
repetition_penalty=1.1
|
350 |
+
)
|
351 |
+
|
352 |
+
# Decode response
|
353 |
+
full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
|
354 |
+
|
355 |
+
# Extract generated part
|
356 |
+
if "Medical Answer:" in full_response:
|
357 |
+
generated_response = full_response.split("Medical Answer:")[-1].strip()
|
358 |
+
else:
|
359 |
+
generated_response = full_response[len(prompt):].strip()
|
360 |
+
|
361 |
+
return self.clean_medical_response(generated_response)
|
362 |
+
|
363 |
+
except Exception as e:
|
364 |
+
print(f"⚠️ BioGPT generation failed: {e}")
|
365 |
+
return self.fallback_response(context, query)
|
366 |
+
|
367 |
+
def clean_medical_response(self, response: str) -> str:
|
368 |
+
"""Clean and format medical response"""
|
369 |
+
# Remove incomplete sentences and limit length
|
370 |
+
sentences = re.split(r'[.!?]+', response)
|
371 |
+
clean_sentences = []
|
372 |
+
|
373 |
+
for sentence in sentences:
|
374 |
+
sentence = sentence.strip()
|
375 |
+
if len(sentence) > 10 and sentence.split()[-1].lower() not in ('and', 'or', 'but', 'however'):  # compare the last word, not the last characters
|
376 |
+
clean_sentences.append(sentence)
|
377 |
+
if len(clean_sentences) >= 3:
|
378 |
+
break
|
379 |
+
|
380 |
+
if clean_sentences:
|
381 |
+
cleaned = '. '.join(clean_sentences) + '.'
|
382 |
+
else:
|
383 |
+
cleaned = response[:200] + '...' if len(response) > 200 else response
|
384 |
+
|
385 |
+
return cleaned
|
386 |
+
|
387 |
+
def fallback_response(self, context: str, query: str) -> str:
|
388 |
+
"""Fallback response when BioGPT fails"""
|
389 |
+
sentences = [s.strip() for s in context.split('.') if len(s.strip()) > 20]
|
390 |
+
|
391 |
+
if sentences:
|
392 |
+
response = sentences[0] + '.'
|
393 |
+
if len(sentences) > 1:
|
394 |
+
response += ' ' + sentences[1] + '.'
|
395 |
+
else:
|
396 |
+
response = context[:300] + '...'
|
397 |
+
|
398 |
+
return response
|
399 |
+
|
400 |
+
def handle_conversational_interactions(self, query: str) -> Optional[str]:
|
401 |
+
"""Handle conversational interactions"""
|
402 |
+
query_lower = query.lower().strip()
|
403 |
+
|
404 |
+
# Greetings
|
405 |
+
if re.search(r'\b(hello|hi|hey|good morning|good afternoon)\b', query_lower):  # whole-word match so 'hi' does not fire inside 'children'
|
406 |
+
return "👋 Hello! I'm your pediatric medical AI assistant. How can I help you with medical questions today?"
|
407 |
+
|
408 |
+
# Thanks
|
409 |
+
if re.search(r'\b(thank you|thanks|thx)\b', query_lower):  # whole-word match
|
410 |
+
return "🙏 You're welcome! I'm glad I could help. Remember to consult healthcare professionals for medical decisions. What else can I help you with?"
|
411 |
+
|
412 |
+
# Goodbyes
|
413 |
+
if re.search(r'\b(bye|goodbye|see you later)\b', query_lower):  # whole-word match
|
414 |
+
return "👋 Goodbye! Take care and remember to consult healthcare professionals for any medical concerns. Stay healthy!"
|
415 |
+
|
416 |
+
return None
|
417 |
+
|
418 |
+
def chat(self, query: str) -> str:
|
419 |
+
"""Main chat function"""
|
420 |
+
if not query.strip():
|
421 |
+
return "Hello! I'm your pediatric medical AI assistant. How can I help you today?"
|
422 |
+
|
423 |
+
# Handle conversational interactions
|
424 |
+
conversational_response = self.handle_conversational_interactions(query)
|
425 |
+
if conversational_response:
|
426 |
+
return conversational_response
|
427 |
+
|
428 |
+
if not self.knowledge_chunks:
|
429 |
+
return "Please load medical data first to access the medical knowledge base."
|
430 |
+
|
431 |
+
if not self.model or not self.tokenizer:
|
432 |
+
return "Medical model not available. Please check the setup and try again."
|
433 |
+
|
434 |
+
# Retrieve context
|
435 |
+
context = self.retrieve_medical_context(query)
|
436 |
+
|
437 |
+
if not context:
|
438 |
+
return "I don't have specific information about this topic in my medical database. Please consult with a healthcare professional for personalized medical advice."
|
439 |
+
|
440 |
+
# Generate response
|
441 |
+
main_context = '\n\n'.join(context)
|
442 |
+
response = self.generate_biogpt_response(main_context, query)
|
443 |
+
|
444 |
+
# Format final response
|
445 |
+
final_response = f"🩺 **Medical Information:** {response}\n\n⚠️ **Important:** This information is for educational purposes only. Always consult with qualified healthcare professionals for medical diagnosis, treatment, and personalized advice."
|
446 |
+
|
447 |
+
return final_response
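
Short cue words such as "hi" are easy to over-match when they are tested as plain substrings, because they also occur inside ordinary words. The following is a minimal sketch of whole-word matching with the standard `re` module; the names `GREETING` and `detect_greeting` are hypothetical and only illustrate the idea.

```python
import re
from typing import Optional

# \b anchors the alternatives at word boundaries.
GREETING = re.compile(r"\b(hello|hi|hey)\b")


def detect_greeting(query: str) -> Optional[str]:
    """Return the greeting word found in the query, or None."""
    match = GREETING.search(query.lower())
    return match.group(1) if match else None


question = "What causes fever in children?"

# Substring test: True, because "children" contains the letters "hi".
substring_hit = "hi" in question.lower()

# Whole-word test: None, because "hi" never appears as its own word.
word_hit = detect_greeting(question)

greeting = detect_greeting("Hi there")
```

With word boundaries in place, a medical question that merely contains the letters of a greeting is passed on to retrieval instead of being answered with a canned reply.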
|
requirements.txt
ADDED
@@ -0,0 +1,33 @@
|
1 |
+
# Core ML and NLP libraries
|
2 |
+
torch>=2.0.0,<2.2.0
|
3 |
+
transformers>=4.30.0,<4.40.0
|
4 |
+
sentence-transformers>=2.2.0,<3.0.0
|
5 |
+
accelerate>=0.20.0,<0.25.0
|
6 |
+
|
7 |
+
# Quantization support (for GPU optimization)
|
8 |
+
bitsandbytes>=0.41.0,<0.43.0
|
9 |
+
|
10 |
+
# Vector search (CPU version for HF Spaces compatibility)
|
11 |
+
faiss-cpu>=1.7.4,<1.8.0
|
12 |
+
|
13 |
+
# Scientific computing
|
14 |
+
numpy>=1.21.0,<1.26.0
|
15 |
+
scipy>=1.9.0,<1.12.0
|
16 |
+
|
17 |
+
# Gradio for web interface
|
18 |
+
gradio>=3.40.0,<4.0.0
|
19 |
+
|
20 |
+
# Essential utilities
|
21 |
+
tqdm>=4.64.0
|
22 |
+
requests>=2.28.0
|
23 |
+
packaging>=21.0
|
24 |
+
|
25 |
+
# Tokenization support
|
26 |
+
tokenizers>=0.13.0,<0.16.0
|
27 |
+
|
28 |
+
# System monitoring
|
29 |
+
psutil>=5.9.0
|
30 |
+
|
31 |
+
# Additional stability packages
|
32 |
+
safetensors>=0.3.0
|
33 |
+
huggingface-hub>=0.15.0
|