Blaiseboy committed on
Commit f1ca076 · verified · 1 Parent(s): 84c3636

Upload 4 files

Files changed (4)
  1. README.md +98 -0
  2. app.py +184 -0
  3. medical_chatbot.py +447 -0
  4. requirements.txt +33 -0
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ title: Pediatric Medical Assistant
+ emoji: 🩺
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 3.40.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🩺 Pediatric Medical Assistant
+
+ An AI-powered medical assistant specialized in pediatric healthcare, powered by BioGPT and advanced medical knowledge retrieval.
+
+ ## 🌟 Features
+
+ - **Medical Q&A**: Ask questions about pediatric health conditions, symptoms, and treatments
+ - **BioGPT Integration**: Powered by Microsoft's BioGPT, a medical language model trained on biomedical literature
+ - **Pediatric Focus**: Specialized knowledge base focused on children's health and medical conditions
+ - **Vector Search**: Advanced semantic search using sentence transformers and FAISS
+ - **Educational Content**: Provides evidence-based medical information for learning purposes
+
+ ## 🚀 How to Use
+
+ 1. **Ask Medical Questions**: Type your pediatric health questions in plain English
+ 2. **Get AI Responses**: Receive evidence-based answers from the medical AI
+ 3. **Explore Topics**: Ask about symptoms, treatments, prevention, and general pediatric health
+
+ ### Example Questions:
+ - "What causes fever in children?"
+ - "How to treat a child's persistent cough?"
+ - "When should I be concerned about my baby's breathing?"
+ - "What are the signs of dehydration in infants?"
+ - "How can I prevent common childhood infections?"
+
+ ## 🛠️ Technology Stack
+
+ - **AI Model**: BioGPT (Microsoft) - Medical language model
+ - **Embeddings**: Sentence Transformers for semantic search
+ - **Vector Database**: FAISS for efficient similarity search
+ - **Interface**: Gradio for user-friendly web interface
+ - **Deployment**: Hugging Face Spaces
+
+ ## ⚠️ Important Disclaimer
+
+ **This tool is for educational and informational purposes only.**
+
+ - **Not Medical Advice**: The information provided is not intended as medical advice, diagnosis, or treatment
+ - **Consult Professionals**: Always consult qualified healthcare professionals for:
+   - Medical emergencies
+   - Diagnosis and treatment decisions
+   - Personalized medical advice
+   - Medication guidance
+ - **Educational Use**: This AI assistant is designed to provide general medical education and should supplement, not replace, professional medical consultation
+
+ ## 🔧 Technical Details
+
+ - **Model**: BioGPT-Large with fallback to base BioGPT
+ - **Knowledge Base**: Curated pediatric medical content
+ - **Search Method**: Vector search (FAISS) with keyword-search fallback
+ - **Response Generation**: Context-aware medical responses
+ - **Safety**: Built-in disclaimers and safety reminders
+
+ ## 📊 Performance
+
+ - **Response Time**: Typically 2-5 seconds
+ - **Knowledge Coverage**: Focused on pediatric medicine
+ - **Accuracy**: Based on medical literature training data
+ - **Availability**: 24/7 through Hugging Face Spaces
+
+ ## 🏥 Medical Specialization
+
+ This assistant specializes in:
+ - Pediatric symptoms and conditions
+ - Common childhood illnesses
+ - Preventive care guidance
+ - When to seek medical attention
+ - General health education for parents and caregivers
+
+ ## 📝 License
+
+ This project is licensed under the MIT License - see the license file for details.
+
+ ## 🤝 Contributing
+
+ This is an educational project. For suggestions or improvements, please reach out through appropriate channels.
+
+ ## 🔗 Related Resources
+
+ - [BioGPT Research Paper](https://arxiv.org/abs/2210.10341)
+ - [Hugging Face Transformers](https://huggingface.co/transformers/)
+ - [American Academy of Pediatrics](https://www.aap.org/)
+
+ ---
+
+ **Remember**: While this AI can provide helpful medical information, it cannot replace the expertise and judgment of trained healthcare professionals. Always prioritize professional medical care for your child's health needs.
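Note on the README's "Vector Search" feature: `medical_chatbot.py` in this commit builds a `faiss.IndexFlatL2`, which performs an exhaustive nearest-neighbour scan by squared Euclidean distance. The sketch below shows that computation without any dependencies. The function name `flat_l2_search` and the toy 3-dimensional vectors are hypothetical and only for illustration; the app itself uses 384-dimensional sentence embeddings and FAISS.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    index_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[float, int]]:
    """Return up to k (squared_distance, position) pairs, nearest first."""
    scored = []
    for pos, vec in enumerate(index_vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, pos))
    scored.sort()  # smallest distance first
    return scored[:k]


# Toy "index" of three vectors (illustrative values only)
toy_index = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
hits = flat_l2_search(toy_index, [1.0, 0.0, 0.0], k=2)
nearest_positions = [pos for _, pos in hits]
```

The returned positions play the role of the chunk indices that the chatbot uses to look up text in its `knowledge_chunks` list.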
app.py ADDED
@@ -0,0 +1,184 @@
+ import gradio as gr
+ import os
+ import torch
+ from medical_chatbot import ColabBioGPTChatbot
+
+ def initialize_chatbot():
+     """Initialize the chatbot with proper error handling"""
+     try:
+         print("🚀 Initializing Medical Chatbot...")
+
+         # Use the GPU when one is available; otherwise run on CPU
+         use_gpu = torch.cuda.is_available()
+         use_8bit = use_gpu  # Only use 8-bit if GPU is available
+
+         chatbot = ColabBioGPTChatbot(use_gpu=use_gpu, use_8bit=use_8bit)
+
+         # Try to load medical data
+         medical_file = "Pediatric_cleaned.txt"
+         if os.path.exists(medical_file):
+             chatbot.load_medical_data(medical_file)
+             status = f"✅ Medical file '{medical_file}' loaded successfully! Ready to chat!"
+             success = True
+         else:
+             status = f"❌ Medical file '{medical_file}' not found. Please ensure the file is in the same directory."
+             success = False
+
+         return chatbot, status, success
+
+     except Exception as e:
+         error_msg = f"❌ Failed to initialize chatbot: {str(e)}"
+         print(error_msg)
+         return None, error_msg, False
+
+ # Initialize chatbot at startup
+ print("🏥 Starting Pediatric Medical Assistant...")
+ chatbot, startup_status, medical_file_loaded = initialize_chatbot()
+
+ def generate_response(user_input, history):
+     """Generate response with proper error handling"""
+     if not chatbot:
+         return history + [("System Error", "❌ Chatbot failed to initialize. Please refresh the page and try again.")], ""
+
+     if not medical_file_loaded:
+         return history + [(user_input, "⚠️ Medical data failed to load. The chatbot may not have access to the full medical knowledge base.")], ""
+
+     if not user_input.strip():
+         return history, ""
+
+     try:
+         # Generate response
+         bot_response = chatbot.chat(user_input)
+
+         # Add to history
+         history = history + [(user_input, bot_response)]
+
+         return history, ""
+
+     except Exception as e:
+         error_response = f"⚠️ Sorry, I encountered an error: {str(e)}. Please try rephrasing your question."
+         history = history + [(user_input, error_response)]
+         return history, ""
+
+ # Create custom CSS for better styling
+ custom_css = """
+ .gradio-container {
+     font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+ }
+
+ .chatbot {
+     height: 500px !important;
+ }
+
+ .message {
+     padding: 10px;
+     margin: 5px;
+     border-radius: 10px;
+ }
+
+ .user-message {
+     background-color: #e3f2fd;
+     margin-left: 20%;
+ }
+
+ .bot-message {
+     background-color: #f5f5f5;
+     margin-right: 20%;
+ }
+ """
+
+ # Create Gradio interface
+ with gr.Blocks(css=custom_css, title="Pediatric Medical Assistant") as demo:
+     gr.Markdown(
+         """
+         # 🩺 Pediatric Medical Assistant
+
+         Welcome to your AI-powered pediatric medical assistant! This chatbot uses advanced medical AI (BioGPT)
+         to provide evidence-based information about children's health and medical conditions.
+
+         **⚠️ Important Disclaimer:** This tool provides educational information only.
+         Always consult qualified healthcare professionals for medical diagnosis, treatment, and personalized advice.
+         """
+     )
+
+     # Display startup status
+     gr.Markdown(f"**System Status:** {startup_status}")
+
+     # Chat interface
+     with gr.Row():
+         with gr.Column(scale=4):
+             chatbot_ui = gr.Chatbot(
+                 label="💬 Chat with Medical AI",
+                 height=500,
+                 show_label=True,
+                 avatar_images=("👤", "🤖")  # Gradio documents this as image paths or URLs; emoji strings may fail
+             )
+
+             with gr.Row():
+                 user_input = gr.Textbox(
+                     placeholder="Ask a pediatric health question... (e.g., 'What causes fever in children?')",
+                     lines=2,
+                     max_lines=5,
+                     show_label=False,
+                     scale=4
+                 )
+                 submit_btn = gr.Button("Send 📤", variant="primary", scale=1)
+
+         with gr.Column(scale=1):
+             gr.Markdown(
+                 """
+                 ### 💡 Example Questions:
+
+                 - "What causes fever in children?"
+                 - "How to treat a child's cough?"
+                 - "When should I call the doctor?"
+                 - "What are signs of dehydration?"
+                 - "How to prevent common infections?"
+
+                 ### 🔧 System Info:
+                 - **Model:** BioGPT (Medical AI)
+                 - **Specialization:** Pediatric Medicine
+                 - **Search:** Vector, with keyword fallback
+                 """
+             )
+
+     # Event handlers
+     def submit_message(user_msg, history):
+         return generate_response(user_msg, history)
+
+     # Connect events
+     user_input.submit(
+         fn=submit_message,
+         inputs=[user_input, chatbot_ui],
+         outputs=[chatbot_ui, user_input],
+         show_progress=True
+     )
+
+     submit_btn.click(
+         fn=submit_message,
+         inputs=[user_input, chatbot_ui],
+         outputs=[chatbot_ui, user_input],
+         show_progress=True
+     )
+
+     # Footer
+     gr.Markdown(
+         """
+         ---
+         **🏥 Medical AI Assistant** | Powered by BioGPT | For Educational Purposes Only
+
+         **Remember:** Always consult healthcare professionals for medical emergencies and personalized medical advice.
+         """
+     )
+
+ # Launch configuration for Hugging Face Spaces
+ if __name__ == "__main__":
+     # For Hugging Face Spaces deployment
+     demo.launch(
+         server_name="0.0.0.0",  # Required for HF Spaces
+         server_port=7860,  # Default port for HF Spaces
+         show_error=True,  # Show errors for debugging
+         show_tips=False,  # Disable tips for cleaner interface
+         enable_queue=True,  # Enable queue for better performance
+         max_threads=10  # Cap the size of the worker thread pool
+     )
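Note on the handler contract in `app.py`: `generate_response` returns a pair, the updated chat history and an empty string, which Gradio maps onto `outputs=[chatbot_ui, user_input]` so that the textbox is cleared after each message. The sketch below exercises the same contract without Gradio or BioGPT. `StubChatbot` and `respond` are hypothetical names introduced for illustration and are not part of this commit.

```python
from typing import List, Tuple

History = List[Tuple[str, str]]


class StubChatbot:
    """Hypothetical stand-in for ColabBioGPTChatbot; only chat() is modeled."""

    def chat(self, query: str) -> str:
        return f"echo: {query}"


def respond(bot, user_input: str, history: History) -> Tuple[History, str]:
    """Return (new_history, textbox_value), mirroring the handler's shape."""
    if not user_input.strip():
        # Blank input: leave the history untouched and clear the textbox
        return history, ""
    try:
        reply = bot.chat(user_input)
    except Exception as exc:  # surface the failure in the chat instead of raising
        reply = f"Sorry, I encountered an error: {exc}"
    # Build a new list rather than mutating the caller's history
    return history + [(user_input, reply)], ""


history, textbox = respond(StubChatbot(), "What causes fever?", [])
```

Returning a fresh list instead of appending in place keeps the handler free of side effects, which makes it straightforward to test in isolation like this.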
medical_chatbot.py ADDED
@@ -0,0 +1,447 @@
+ import os
+ import re
+ import torch
+ import warnings
+ import numpy as np
+ import faiss
+ from transformers import (
+     AutoTokenizer,
+     AutoModelForCausalLM,
+     BitsAndBytesConfig
+ )
+ from sentence_transformers import SentenceTransformer
+ from typing import List, Dict, Optional
+ import time
+ from datetime import datetime
+
+ # Suppress warnings for cleaner output
+ warnings.filterwarnings('ignore')
+
+ class ColabBioGPTChatbot:
+     def __init__(self, use_gpu=True, use_8bit=True):
+         """Initialize BioGPT chatbot optimized for Hugging Face Spaces"""
+         print("🏥 Initializing Medical Chatbot...")
+         self.use_gpu = use_gpu
+         self.use_8bit = use_8bit
+         self.device = "cuda" if torch.cuda.is_available() and use_gpu else "cpu"
+         print(f"🖥️ Using device: {self.device}")
+
+         self.tokenizer = None
+         self.model = None
+         self.knowledge_chunks = []
+         self.conversation_history = []
+         self.embedding_model = None
+         self.faiss_index = None
+         self.faiss_ready = False
+         self.use_embeddings = True
+
+         # Initialize components
+         self.setup_biogpt()
+         self.load_sentence_transformer()
+
+     def setup_biogpt(self):
+         """Setup BioGPT model with fallback to base BioGPT if Large fails"""
+         print("🧠 Loading BioGPT model...")
+
+         try:
+             # Try BioGPT-Large first
+             model_name = "microsoft/BioGPT-Large"
+             print(f"Attempting to load {model_name}...")
+
+             if self.use_8bit and self.device == "cuda":
+                 quantization_config = BitsAndBytesConfig(
+                     load_in_8bit=True,
+                     llm_int8_threshold=6.0,
+                     llm_int8_has_fp16_weight=False,
+                 )
+             else:
+                 quantization_config = None
+
+             self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+             if self.tokenizer.pad_token is None:
+                 self.tokenizer.pad_token = self.tokenizer.eos_token
+
+             self.model = AutoModelForCausalLM.from_pretrained(
+                 model_name,
+                 quantization_config=quantization_config,
+                 torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
+                 device_map="auto" if self.device == "cuda" else None,
+                 trust_remote_code=True,
+                 low_cpu_mem_usage=True
+             )
+
+             if self.device == "cuda" and quantization_config is None:
+                 self.model = self.model.to(self.device)
+
+             print("✅ BioGPT-Large loaded successfully!")
+
+         except Exception as e:
+             print(f"❌ BioGPT-Large loading failed: {e}")
+             print("🔁 Falling back to base BioGPT...")
+             self.setup_fallback_biogpt()
+
+     def setup_fallback_biogpt(self):
+         """Fallback to microsoft/BioGPT if BioGPT-Large fails"""
+         try:
+             model_name = "microsoft/BioGPT"
+             print(f"Loading fallback model: {model_name}")
+
+             self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+             if self.tokenizer.pad_token is None:
+                 self.tokenizer.pad_token = self.tokenizer.eos_token
+
+             self.model = AutoModelForCausalLM.from_pretrained(
+                 model_name,
+                 torch_dtype=torch.float32,
+                 trust_remote_code=True,
+                 low_cpu_mem_usage=True
+             )
+
+             if self.device == "cuda":
+                 self.model = self.model.to(self.device)
+
+             print("✅ Base BioGPT model loaded successfully!")
+
+         except Exception as e:
+             print(f"❌ Failed to load fallback BioGPT: {e}")
+             self.model = None
+             self.tokenizer = None
+
+     def load_sentence_transformer(self):
+         """Load sentence transformer for embeddings"""
+         try:
+             print("🔮 Loading sentence transformer...")
+             self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
+
+             # Initialize FAISS index (will be populated when data is loaded)
+             embedding_dim = 384  # Dimension for all-MiniLM-L6-v2
+             self.faiss_index = faiss.IndexFlatL2(embedding_dim)
+             self.faiss_ready = True
+             print("✅ Sentence transformer and FAISS index ready!")
+
+         except Exception as e:
+             print(f"❌ Failed to load sentence transformer: {e}")
+             self.use_embeddings = False
+             self.faiss_ready = False
+
+     def load_medical_data(self, file_path):
+         """Load and process medical data"""
+         print(f"📖 Loading medical data from {file_path}...")
+
+         try:
+             if not os.path.exists(file_path):
+                 raise FileNotFoundError(f"File {file_path} not found")
+
+             with open(file_path, 'r', encoding='utf-8') as f:
+                 text = f.read()
+             print(f"📄 File loaded: {len(text):,} characters")
+
+         except Exception as e:
+             print(f"❌ Error loading file: {e}")
+             raise ValueError(f"Failed to load medical data: {e}")
+
+         # Create chunks
+         print("📝 Creating medical chunks...")
+         chunks = self.create_medical_chunks(text)
+         print(f"📋 Created {len(chunks)} medical chunks")
+
+         self.knowledge_chunks = chunks
+
+         # Generate embeddings if available
+         if self.use_embeddings and self.embedding_model and self.faiss_ready:
+             try:
+                 self.generate_embeddings_with_progress(chunks)
+                 print("✅ Medical data loaded with embeddings!")
+             except Exception as e:
+                 print(f"⚠️ Embedding generation failed: {e}")
+                 print("✅ Medical data loaded (keyword search mode)")
+         else:
+             print("✅ Medical data loaded (keyword search mode)")
+
+     def create_medical_chunks(self, text: str, chunk_size: int = 400) -> List[Dict]:
+         """Create medically-optimized text chunks"""
+         chunks = []
+
+         # Split by paragraphs first
+         paragraphs = [p.strip() for p in text.split('\n\n') if len(p.strip()) > 50]
+
+         chunk_id = 0
+         for paragraph in paragraphs:
+             if len(paragraph.split()) <= chunk_size:
+                 chunks.append({
+                     'id': chunk_id,
+                     'text': paragraph,
+                     'medical_focus': self.identify_medical_focus(paragraph)
+                 })
+                 chunk_id += 1
+             else:
+                 # Split large paragraphs by sentences
+                 sentences = re.split(r'[.!?]+', paragraph)
+                 current_chunk = ""
+
+                 for sentence in sentences:
+                     sentence = sentence.strip()
+                     if not sentence:
+                         continue
+
+                     if len(current_chunk.split()) + len(sentence.split()) <= chunk_size:
+                         current_chunk += sentence + ". "
+                     else:
+                         if current_chunk.strip():
+                             chunks.append({
+                                 'id': chunk_id,
+                                 'text': current_chunk.strip(),
+                                 'medical_focus': self.identify_medical_focus(current_chunk)
+                             })
+                             chunk_id += 1
+                         current_chunk = sentence + ". "
+
+                 if current_chunk.strip():
+                     chunks.append({
+                         'id': chunk_id,
+                         'text': current_chunk.strip(),
+                         'medical_focus': self.identify_medical_focus(current_chunk)
+                     })
+                     chunk_id += 1
+
+         return chunks
+
+     def identify_medical_focus(self, text: str) -> str:
+         """Identify the medical focus of a text chunk"""
+         text_lower = text.lower()
+
+         categories = {
+             'pediatric_symptoms': ['fever', 'cough', 'rash', 'vomiting', 'diarrhea'],
+             'treatments': ['treatment', 'therapy', 'medication', 'antibiotics'],
+             'diagnosis': ['diagnosis', 'diagnostic', 'symptoms', 'signs'],
+             'emergency': ['emergency', 'urgent', 'serious', 'hospital'],
+             'prevention': ['prevention', 'vaccine', 'immunization', 'avoid']
+         }
+
+         for category, keywords in categories.items():
+             if any(keyword in text_lower for keyword in keywords):
+                 return category
+
+         return 'general_medical'
+
+     def generate_embeddings_with_progress(self, chunks: List[Dict]):
+         """Generate embeddings and add to FAISS index"""
+         print("🔮 Generating embeddings...")
+
+         try:
+             texts = [chunk['text'] for chunk in chunks]
+
+             # Generate embeddings in batches
+             batch_size = 32
+             all_embeddings = []
+
+             for i in range(0, len(texts), batch_size):
+                 batch_texts = texts[i:i+batch_size]
+                 batch_embeddings = self.embedding_model.encode(batch_texts, show_progress_bar=False)
+                 all_embeddings.extend(batch_embeddings)
+
+                 progress = min(i + batch_size, len(texts))
+                 print(f" Progress: {progress}/{len(texts)} chunks processed", end='\r')
+
+             print(f"\n ✅ Generated embeddings for {len(texts)} chunks")
+
+             # Add to FAISS index
+             embeddings_array = np.array(all_embeddings).astype('float32')
+             self.faiss_index.add(embeddings_array)
+             print("✅ Embeddings added to FAISS index!")
+
+         except Exception as e:
+             print(f"❌ Embedding generation failed: {e}")
+             raise
+
+     def retrieve_medical_context(self, query: str, n_results: int = 3) -> List[str]:
+         """Retrieve relevant medical context"""
+         if self.use_embeddings and self.embedding_model and self.faiss_ready and self.faiss_index.ntotal > 0:
+             try:
+                 # Generate query embedding
+                 query_embedding = self.embedding_model.encode([query])
+
+                 # Search FAISS index
+                 distances, indices = self.faiss_index.search(
+                     np.array(query_embedding).astype('float32'),
+                     min(n_results, self.faiss_index.ntotal)
+                 )
+
+                 # Get relevant chunks
+                 context_chunks = []
+                 for idx in indices[0]:
+                     if idx != -1 and idx < len(self.knowledge_chunks):
+                         context_chunks.append(self.knowledge_chunks[idx]['text'])
+
+                 if context_chunks:
+                     return context_chunks
+
+             except Exception as e:
+                 print(f"⚠️ Embedding search failed: {e}")
+
+         # Fallback to keyword search
+         return self.keyword_search_medical(query, n_results)
+
+     def keyword_search_medical(self, query: str, n_results: int) -> List[str]:
+         """Medical-focused keyword search"""
+         if not self.knowledge_chunks:
+             return []
+
+         query_words = set(query.lower().split())
+         chunk_scores = []
+
+         for chunk_info in self.knowledge_chunks:
+             chunk_text = chunk_info['text']
+             chunk_words = set(chunk_text.lower().split())
+
+             # Calculate relevance score
+             word_overlap = len(query_words.intersection(chunk_words))
+             base_score = word_overlap / len(query_words) if query_words else 0
+
+             # Boost medical content, but only for chunks that already match the query
+             medical_boost = 0
+             if word_overlap > 0 and chunk_info.get('medical_focus') in ['pediatric_symptoms', 'treatments', 'diagnosis']:
+                 medical_boost = 0.3
+
+             final_score = base_score + medical_boost
+
+             if final_score > 0:
+                 chunk_scores.append((final_score, chunk_text))
+
+         # Return top matches
+         chunk_scores.sort(reverse=True)
+         return [chunk for _, chunk in chunk_scores[:n_results]]
+
+     def generate_biogpt_response(self, context: str, query: str) -> str:
+         """Generate medical response using BioGPT"""
+         if not self.model or not self.tokenizer:
+             return "Medical model not available. Please check the setup."
+
+         try:
+             # Create medical prompt
+             prompt = f"""Medical Context: {context[:800]}
+
+ Question: {query}
+
+ Medical Answer:"""
+
+             # Tokenize
+             inputs = self.tokenizer(
+                 prompt,
+                 return_tensors="pt",
+                 truncation=True,
+                 max_length=1024
+             )
+
+             # Move to device
+             if self.device == "cuda":
+                 inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+             # Generate response
+             with torch.no_grad():
+                 outputs = self.model.generate(
+                     **inputs,
+                     max_new_tokens=150,
+                     do_sample=True,
+                     temperature=0.7,
+                     top_p=0.9,
+                     pad_token_id=self.tokenizer.eos_token_id,
+                     repetition_penalty=1.1
+                 )
+
+             # Decode response
+             full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+             # Extract generated part
+             if "Medical Answer:" in full_response:
+                 generated_response = full_response.split("Medical Answer:")[-1].strip()
+             else:
+                 generated_response = full_response[len(prompt):].strip()
+
+             return self.clean_medical_response(generated_response)
+
+         except Exception as e:
+             print(f"⚠️ BioGPT generation failed: {e}")
+             return self.fallback_response(context, query)
+
+     def clean_medical_response(self, response: str) -> str:
+         """Clean and format medical response"""
+         # Remove incomplete sentences and limit length
+         sentences = re.split(r'[.!?]+', response)
+         clean_sentences = []
+
+         for sentence in sentences:
+             sentence = sentence.strip()
+             if len(sentence) > 10 and not sentence.endswith(('and', 'or', 'but', 'however')):
+                 clean_sentences.append(sentence)
+             if len(clean_sentences) >= 3:
+                 break
+
+         if clean_sentences:
+             cleaned = '. '.join(clean_sentences) + '.'
+         else:
+             cleaned = response[:200] + '...' if len(response) > 200 else response
+
+         return cleaned
+
+     def fallback_response(self, context: str, query: str) -> str:
+         """Fallback response when BioGPT fails"""
+         sentences = [s.strip() for s in context.split('.') if len(s.strip()) > 20]
+
+         if sentences:
+             response = sentences[0] + '.'
+             if len(sentences) > 1:
+                 response += ' ' + sentences[1] + '.'
+         else:
+             response = context[:300] + '...'
+
+         return response
+
+     def handle_conversational_interactions(self, query: str) -> Optional[str]:
+         """Handle conversational interactions"""
+         query_lower = query.lower().strip()
+
+         # Greetings (whole-word match, so 'hi' does not fire on 'children' or 'this')
+         if any(re.search(rf"\b{re.escape(greeting)}\b", query_lower) for greeting in ['hello', 'hi', 'hey', 'good morning', 'good afternoon']):
+             return "👋 Hello! I'm your pediatric medical AI assistant. How can I help you with medical questions today?"
+
+         # Thanks (whole-word match)
+         if any(re.search(rf"\b{re.escape(thanks)}\b", query_lower) for thanks in ['thank you', 'thanks', 'thx']):
+             return "🙏 You're welcome! I'm glad I could help. Remember to consult healthcare professionals for medical decisions. What else can I help you with?"
+
+         # Goodbyes (whole-word match)
+         if any(re.search(rf"\b{re.escape(bye)}\b", query_lower) for bye in ['bye', 'goodbye', 'see you later']):
+             return "👋 Goodbye! Take care and remember to consult healthcare professionals for any medical concerns. Stay healthy!"
+
+         return None
+
+     def chat(self, query: str) -> str:
+         """Main chat function"""
+         if not query.strip():
+             return "Hello! I'm your pediatric medical AI assistant. How can I help you today?"
+
+         # Handle conversational interactions
+         conversational_response = self.handle_conversational_interactions(query)
+         if conversational_response:
+             return conversational_response
+
+         if not self.knowledge_chunks:
+             return "Please load medical data first to access the medical knowledge base."
+
+         if not self.model or not self.tokenizer:
+             return "Medical model not available. Please check the setup and try again."
+
+         # Retrieve context
+         context = self.retrieve_medical_context(query)
+
+         if not context:
+             return "I don't have specific information about this topic in my medical database. Please consult with a healthcare professional for personalized medical advice."
+
+         # Generate response
+         main_context = '\n\n'.join(context)
+         response = self.generate_biogpt_response(main_context, query)
+
+         # Format final response
+         final_response = f"🩺 **Medical Information:** {response}\n\n⚠️ **Important:** This information is for educational purposes only. Always consult with qualified healthcare professionals for medical diagnosis, treatment, and personalized advice."
+
+         return final_response
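Note on the keyword fallback in `medical_chatbot.py`: when the embedding search is unavailable, retrieval scores each chunk by the fraction of query words it contains and adds a fixed bonus for certain `medical_focus` categories. The standalone sketch below illustrates that scoring idea. The function name `keyword_rank`, the `BOOSTED` constant, and the sample chunks are hypothetical. The sketch also assumes that a chunk sharing no words with the query is skipped before any category bonus is considered, which is a design choice of this sketch.

```python
from typing import Dict, List

# Categories that receive a fixed relevance bonus (assumed for this sketch)
BOOSTED = {"pediatric_symptoms", "treatments", "diagnosis"}


def keyword_rank(query: str, chunks: List[Dict[str, str]], n_results: int = 3) -> List[str]:
    """Rank chunk texts by word overlap with the query, best first."""
    query_words = set(query.lower().split())
    if not query_words:
        return []
    scored = []
    for chunk in chunks:
        overlap = len(query_words & set(chunk["text"].lower().split()))
        if overlap == 0:
            continue  # no shared words: never returned, regardless of category
        score = overlap / len(query_words)
        if chunk.get("medical_focus") in BOOSTED:
            score += 0.3
        scored.append((score, chunk["text"]))
    scored.sort(reverse=True)  # highest score first
    return [text for _, text in scored[:n_results]]


sample_chunks = [
    {"text": "fever in children is often caused by infection", "medical_focus": "pediatric_symptoms"},
    {"text": "vaccines help prevent measles", "medical_focus": "prevention"},
    {"text": "children need regular sleep", "medical_focus": "general_medical"},
]
ranked = keyword_rank("fever in children", sample_chunks)
```

Whitespace tokenization keeps the sketch short, but it treats "fever?" and "fever" as different words, so stripping punctuation before splitting would likely improve recall.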
requirements.txt ADDED
@@ -0,0 +1,33 @@
+ # Core ML and NLP libraries
+ torch>=2.0.0,<2.2.0
+ transformers>=4.30.0,<4.40.0
+ sentence-transformers>=2.2.0,<3.0.0
+ accelerate>=0.20.0,<0.25.0
+
+ # Quantization support (for GPU optimization)
+ bitsandbytes>=0.41.0,<0.43.0
+
+ # Vector search (CPU version for HF Spaces compatibility)
+ faiss-cpu>=1.7.4,<1.8.0
+
+ # Scientific computing
+ numpy>=1.21.0,<1.26.0
+ scipy>=1.9.0,<1.12.0
+
+ # Gradio for web interface
+ gradio>=3.40.0,<4.0.0
+
+ # Essential utilities
+ tqdm>=4.64.0
+ requests>=2.28.0
+ packaging>=21.0
+
+ # Tokenization support
+ tokenizers>=0.13.0,<0.16.0
+
+ # System monitoring
+ psutil>=5.9.0
+
+ # Additional stability packages
+ safetensors>=0.3.0
+ huggingface-hub>=0.15.0