Resolved Dropdown issue And MCP Server
Files changed:

- README.md +152 -1
- app.py +202 -443
- config.py +5 -0
- core/chunker.py +1 -0
- mcp_server.py +1 -1
- requirements.txt +4 -2
- services/llm_service.py +229 -127
README.md
CHANGED
@@ -6,8 +6,159 @@ colorTo: green
 sdk: gradio
 sdk_version: 5.32.0
 app_file: app.py
+tags:
+  - mcp-server-track
+  - Agents-MCP-Hackathon
 pinned: false
 license: mit
 ---

A powerful Model Context Protocol (MCP) server for intelligent content management with semantic search, summarization, and Q&A capabilities, powered by **OpenAI, Mistral AI, and Anthropic Claude**.

## 🎯 Features

### 🔧 MCP Tools Available

- **📄 Document Ingestion**: Upload and process documents (PDF, TXT, DOCX, images with OCR)
- **🔍 Semantic Search**: Find relevant content using natural language queries
- **📝 Summarization**: Generate summaries in different styles (concise, detailed, bullet points, executive)
- **🏷️ Tag Generation**: Automatically generate relevant tags for content
- **❓ Q&A System**: Ask questions about your documents using RAG (Retrieval-Augmented Generation)
- **📊 Categorization**: Classify content into predefined or custom categories
- **🔄 Batch Processing**: Process multiple documents at once
- **📈 Analytics**: Get insights and statistics about your content

### 🚀 Powered By

- **🧠 OpenAI GPT models** for powerful text generation and understanding
- **🔥 Mistral AI** for efficient text processing and analysis
- **🤖 Anthropic Claude** for advanced reasoning (available as a specific choice or fallback)
- **🔗 Sentence Transformers** for semantic embeddings
- **📚 FAISS** for fast similarity search
- **👁️ Tesseract OCR** for image text extraction
- **🎨 Gradio** for the user interface and MCP server functionality

**LLM Strategy**: When model selection is set to "auto", the agent picks the best available LLM for most generative tasks, prioritizing OpenAI, then Mistral, and finally Anthropic. Users can also specify a particular model family (e.g., "gpt-", "mistral-", "claude-").
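A minimal sketch of how this "auto" fallback order might look. The helper name `pick_provider` is illustrative; the actual selection logic lives in `services/llm_service.py` and may differ:

```python
# Minimal sketch of the "auto" provider fallback described above.
# The function name pick_provider is illustrative, not the project's exact API.
from config import Config

def pick_provider(model: str = "auto") -> str:
    """Return which LLM family to use for a generative task."""
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("mistral-"):
        return "mistral"
    if model.startswith("claude-"):
        return "anthropic"
    # "auto": prefer OpenAI, then Mistral, then Anthropic,
    # skipping providers whose API key is not configured.
    if Config.OPENAI_API_KEY:
        return "openai"
    if Config.MISTRAL_API_KEY:
        return "mistral"
    if Config.ANTHROPIC_API_KEY:
        return "anthropic"
    raise RuntimeError("No LLM API key configured")
```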

## 📋 Complete File Structure

intelligent-content-organizer/
├── app.py                        # Main Gradio app and MCP server
├── config.py                     # Configuration management
├── mcp_server.py                 # MCP server tools
├── requirements.txt              # Dependencies
├── README.md                     # Documentation
├── .gitignore                    # Git ignore rules
├── core/                         # Core processing logic
│   ├── __init__.py
│   ├── models.py                 # Data models
│   ├── document_parser.py        # Document processing
│   ├── text_preprocessor.py      # Text cleaning and processing
│   └── chunker.py                # Text chunking strategies
├── services/                     # Backend services
│   ├── __init__.py
│   ├── embedding_service.py      # Sentence Transformers integration
│   ├── llm_service.py            # OpenAI + Mistral + Anthropic integration
│   ├── ocr_service.py            # Mistral OCR integration
│   ├── vector_store_service.py   # FAISS vector storage
│   └── document_store_service.py # Document metadata storage
└── mcp_tools/                    # MCP tool definitions
    ├── __init__.py
    ├── ingestion_tool.py         # Document ingestion tool
    ├── search_tool.py            # Semantic search tool
    ├── generative_tool.py        # AI generation tool
    └── utils.py                  # Utility functions

## 🎯 Key Features Implemented

1. **Full MCP Server**: Complete implementation with all tools exposed
2. **Multi-Modal Processing**: PDF, TXT, DOCX, and image processing with OCR
3. **Advanced Search**: Semantic search with FAISS, filtering, and multi-query support
4. **AI-Powered Features**: Summarization, tagging, categorization, Q&A with RAG
5. **Production Ready**: Error handling, logging, caching, rate limiting
6. **Gradio UI**: Beautiful web interface for testing and direct use
7. **Multi-LLM Support**: OpenAI, Mistral, and Anthropic with fallbacks

## 🎥 Demo Video

[📹 Watch the demo video](https://your-demo-video-url.com)

*The demo shows the MCP server in action, demonstrating document ingestion, semantic search, and Q&A capabilities, utilizing the configured LLM providers.*

## 🛠️ Installation

### Prerequisites

- Python 3.9+
- API keys for OpenAI and Mistral AI; an Anthropic API key for Claude support

## 🔧 MCP Tools Reference

Tool parameters such as `model` accept `"auto"` or a specific model family prefix such as `"gpt-"`, `"mistral-"`, or `"claude-"`. An end-to-end usage sketch follows this reference.

- **ingest_document**
  - Process and index a document for searching.
  - **Parameters:**
    - `file_path` (string): Path to the document file (e.g., an uploaded file path).
    - `file_type` (string, optional): File type/extension (e.g., ".pdf", ".txt"). If not provided, it is inferred from `file_path`.
  - **Returns:**
    - `success` (boolean): Whether the operation succeeded.
    - `document_id` (string): Unique identifier for the processed document.
    - `chunks_created` (integer): Number of text chunks created.
    - `message` (string): Human-readable result message.

- **semantic_search**
  - Search through indexed content using natural language.
  - **Parameters:**
    - `query` (string): Search query.
    - `top_k` (integer, optional): Number of results to return (default: 5).
    - `filters` (object, optional): Search filters (e.g., `{"document_id": "some_id"}`).
  - **Returns:**
    - `success` (boolean): Whether the search succeeded.
    - `results` (array of objects): Search results, each with content and score.
    - `total_results` (integer): Number of results found.

- **summarize_content**
  - Generate a summary of provided content.
  - **Parameters:**
    - `content` (string, optional): Text content to summarize.
    - `document_id` (string, optional): ID of the document to summarize. Either `content` or `document_id` must be provided.
    - `style` (string, optional): Summary style: "concise", "detailed", "bullet_points", or "executive" (default: "concise").
    - `model` (string, optional): Specific LLM to use (e.g., "gpt-4o-mini", "mistral-large-latest", "auto"). Default: "auto".
  - **Returns:**
    - `success` (boolean): Whether summarization succeeded.
    - `summary` (string): Generated summary.
    - `original_length` (integer): Character length of the original content.
    - `summary_length` (integer): Character length of the summary.

- **generate_tags**
  - Generate relevant tags for content.
  - **Parameters:**
    - `content` (string, optional): Text content to tag.
    - `document_id` (string, optional): ID of the document to tag. Either `content` or `document_id` must be provided.
    - `max_tags` (integer, optional): Maximum number of tags (default: 5).
    - `model` (string, optional): Specific LLM to use. Default: "auto".
  - **Returns:**
    - `success` (boolean): Whether tag generation succeeded.
    - `tags` (array of strings): Generated tags.

- **answer_question**
  - Answer questions using RAG over your indexed content.
  - **Parameters:**
    - `question` (string): Question to answer.
    - `context_filter` (object, optional): Filters for context retrieval (e.g., `{"document_id": "some_id"}`).
    - `model` (string, optional): Specific LLM to use. Default: "auto".
  - **Returns:**
    - `success` (boolean): Whether question answering succeeded.
    - `answer` (string): Generated answer.
    - `sources` (array of objects): Source document chunks used for context, each with `document_id`, `chunk_id`, and `content`.
    - `confidence` (string, optional): Confidence level in the answer (LLM-dependent; may not always be present).
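As a hedged illustration, the same operations can be exercised in-process through the server object defined in `app.py` (the method names below come from the diff; the file path and query strings are made up for the example):

```python
# Illustrative only: drives the async methods that back the MCP tools.
# Assumes app.py is importable and at least one LLM API key is configured.
import asyncio
from app import mcp_server

async def demo():
    # Hypothetical document path for the example.
    ingested = await mcp_server.ingest_document_async("./docs/report.pdf", ".pdf")
    print(ingested["message"], ingested.get("document_id"))

    hits = await mcp_server.semantic_search_async("quarterly revenue", 3)
    print(f"{hits['total_results']} results")

    qa = await mcp_server.answer_question_async("What were the key findings?")
    if qa["success"]:
        print(qa["answer"])

asyncio.run(demo())
```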

## 📊 Performance

- Embedding Generation: ~100-500ms per document chunk
- Search: <50ms for most queries
- Summarization: 1-5s depending on content length
- Memory Usage: ~200-500MB base + ~1MB per 1000 document chunks
- Supported File Types: PDF, TXT, DOCX, PNG, JPG, JPEG, BMP, TIFF

app.py
CHANGED
@@ -33,7 +33,6 @@ class ContentOrganizerMCPServer:
     def __init__(self):
         # Initialize services
        logger.info("Initializing Content Organizer MCP Server...")
-
        self.vector_store = VectorStoreService()
        self.document_store = DocumentStoreService()
        self.embedding_service = EmbeddingService()
@@ -56,13 +55,12 @@ class ContentOrganizerMCPServer:
            llm_service=self.llm_service,
            search_tool=self.search_tool
        )
-
        # Track processing status
        self.processing_status = {}

        # Document cache for quick access
        self.document_cache = {}
-
        logger.info("Content Organizer MCP Server initialized successfully!")

    def run_async(self, coro):
@@ -72,7 +70,6 @@ class ContentOrganizerMCPServer:
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
-
        if loop.is_running():
            # If loop is already running, create a task
            import concurrent.futures
@@ -87,31 +84,22 @@ class ContentOrganizerMCPServer:
        try:
            task_id = str(uuid.uuid4())
            self.processing_status[task_id] = {"status": "processing", "progress": 0}
-
            result = await self.ingestion_tool.process_document(file_path, file_type, task_id)
-
            if result.get("success"):
                self.processing_status[task_id] = {"status": "completed", "progress": 100}
-                # Update document cache
                doc_id = result.get("document_id")
                if doc_id:
                    doc = await self.document_store.get_document(doc_id)
                    if doc:
                        self.document_cache[doc_id] = doc
-
                return result
            else:
                self.processing_status[task_id] = {"status": "failed", "error": result.get("error")}
                return result
-
        except Exception as e:
            logger.error(f"Document ingestion failed: {str(e)}")
-            return {
-
-                "error": str(e),
-                "message": "Failed to process document"
-            }
-
@@ -124,7 +112,6 @@ class ContentOrganizerMCPServer:
            if doc:
                self.document_cache[document_id] = doc
                return doc.content
-
            return None
        except Exception as e:
            logger.error(f"Error getting document content: {str(e)}")
@@ -134,149 +121,78 @@ class ContentOrganizerMCPServer:
        """MCP Tool: Perform semantic search"""
        try:
            results = await self.search_tool.search(query, top_k, filters)
-            return {
-                "success": True,
-                "query": query,
-                "results": [result.to_dict() for result in results],
-                "total_results": len(results)
-            }
        except Exception as e:
            logger.error(f"Semantic search failed: {str(e)}")
-            return {
-                "success": False,
-                "error": str(e),
-                "query": query,
-                "results": []
-            }

    async def summarize_content_async(self, content: str = None, document_id: str = None, style: str = "concise") -> Dict[str, Any]:
-        """MCP Tool: Summarize content or document"""
        try:
-            # If document_id provided, get content from document
            if document_id and document_id != "none":
                content = await self.get_document_content_async(document_id)
                if not content:
                    return {"success": False, "error": f"Document {document_id} not found"}
-
            if not content or not content.strip():
                return {"success": False, "error": "No content provided for summarization"}
-
-            # Truncate content if too long (for API limits)
            max_content_length = 4000
            if len(content) > max_content_length:
                content = content[:max_content_length] + "..."
-
            summary = await self.generative_tool.summarize(content, style)
-            return {
-                "success": True,
-                "summary": summary,
-                "original_length": len(content),
-                "summary_length": len(summary),
-                "style": style,
-                "document_id": document_id
-            }
        except Exception as e:
            logger.error(f"Summarization failed: {str(e)}")
-            return {
-                "success": False,
-                "error": str(e)
-            }

    async def generate_tags_async(self, content: str = None, document_id: str = None, max_tags: int = 5) -> Dict[str, Any]:
        """MCP Tool: Generate tags for content"""
        try:
-            # If document_id provided, get content from document
            if document_id and document_id != "none":
                content = await self.get_document_content_async(document_id)
                if not content:
                    return {"success": False, "error": f"Document {document_id} not found"}
-
            if not content or not content.strip():
                return {"success": False, "error": "No content provided for tag generation"}
-
            tags = await self.generative_tool.generate_tags(content, max_tags)
-
-            # Update document tags if document_id provided
            if document_id and document_id != "none" and tags:
                await self.document_store.update_document_metadata(document_id, {"tags": tags})
-
-            return {
-                "success": True,
-                "tags": tags,
-                "content_length": len(content),
-                "document_id": document_id
-            }
        except Exception as e:
            logger.error(f"Tag generation failed: {str(e)}")
-            return {
-                "success": False,
-                "error": str(e)
-            }

    async def answer_question_async(self, question: str, context_filter: Optional[Dict] = None) -> Dict[str, Any]:
-        """MCP Tool: Answer questions using RAG"""
        try:
-            # Search for relevant context
            search_results = await self.search_tool.search(question, top_k=5, filters=context_filter)
-
            if not search_results:
-                return {
-                    "success": False,
-                    "error": "No relevant context found in your documents. Please make sure you have uploaded relevant documents.",
-                    "question": question
-                }
-
-            # Generate answer using context
            answer = await self.generative_tool.answer_question(question, search_results)
-
-            return {
-                "success": True,
-                "question": question,
-                "answer": answer,
-                "sources": [result.to_dict() for result in search_results],
-                "confidence": "high" if len(search_results) >= 3 else "medium"
-            }
        except Exception as e:
            logger.error(f"Question answering failed: {str(e)}")
-            return {
-                "success": False,
-                "error": str(e),
-                "question": question
-            }

    def list_documents_sync(self, limit: int = 100, offset: int = 0) -> Dict[str, Any]:
-        """List stored documents"""
        try:
            documents = self.run_async(self.document_store.list_documents(limit, offset))
-            return {
-                "success": True,
-                "documents": [doc.to_dict() for doc in documents],
-                "total": len(documents)
-            }
        except Exception as e:
-            return {
-                "success": False,
-                "error": str(e)
-            }

-# Initialize the MCP server
mcp_server = ContentOrganizerMCPServer()

-# Helper functions
def get_document_list():
-    """Get list of documents for display"""
    try:
        result = mcp_server.list_documents_sync(limit=100)
        if result["success"]:
            if result["documents"]:
-                for i,
-                if
-                return
            else:
                return "No documents in library yet. Upload some documents to get started!"
        else:
@@ -285,17 +201,10 @@ def get_document_list():
        return f"Error: {str(e)}"

def get_document_choices():
-    """Get document choices for dropdown"""
    try:
        result = mcp_server.list_documents_sync(limit=100)
        if result["success"] and result["documents"]:
-            choices = []
-            for doc in result["documents"]:
-                # Create label with filename and shortened ID
-                choice_label = f"{doc['filename']} ({doc['id'][:8]}...)"
-                # Use full document ID as the value
-                choices.append((choice_label, doc['id']))
-
            logger.info(f"Generated {len(choices)} document choices")
            return choices
        return []
@@ -303,78 +212,82 @@ def get_document_choices():
        logger.error(f"Error getting document choices: {str(e)}")
        return []

-
def upload_and_process_file(file):
-    """Gradio interface for file upload"""
    if file is None:
    try:
-        # Get file path
        file_path = file.name if hasattr(file, 'name') else str(file)
-        file_type = Path(file_path).suffix.lower()
-        logger.info(f"Processing file: {file_path}")
-        # Process document
        result = mcp_server.run_async(mcp_server.ingest_document_async(file_path, file_type))

        if result["success"]:
-            # Get updated document list and choices
-            doc_list = get_document_list()
-            doc_choices = get_document_choices()
-
            return (
-                f"✅ Success: {result['message']}\nDocument ID: {result['document_id']}\nChunks created: {result['chunks_created']}",
                result["document_id"],
-                gr.update(choices=
-                gr.update(choices=
-                gr.update(choices=
-                gr.update(choices=doc_choices)
            )
        else:
            return (
-                f"❌ Error: {result.get('error', 'Unknown error')}",
-                gr.update(choices=
-                gr.update(choices=
-                gr.update(choices=get_document_choices()),
-                gr.update(choices=get_document_choices())
            )
    except Exception as e:
        logger.error(f"Error processing file: {str(e)}")
        return (
-            f"❌ Error: {str(e)}",
-            gr.update(choices=
-            gr.update(choices=
-            gr.update(choices=get_document_choices()),
-            gr.update(choices=get_document_choices())
        )

def perform_search(query, top_k):
-    """Gradio interface for search"""
    if not query.strip():
        return "Please enter a search query"
-
    try:
        result = mcp_server.run_async(mcp_server.semantic_search_async(query, int(top_k)))
-
        if result["success"]:
            if result["results"]:
-                for i,
-                if 'document_filename' in
-                return
            else:
                return f"No results found for: '{query}'\n\nMake sure you have uploaded relevant documents first."
        else:
@@ -384,19 +297,10 @@ def perform_search(query, top_k):
        return f"❌ Error: {str(e)}"

def summarize_document(doc_choice, custom_text, style):
-    """Gradio interface for summarization"""
    try:
-        # Debug logging
        logger.info(f"Summarize called with doc_choice: {doc_choice}, type: {type(doc_choice)}")

-        # Get document ID from dropdown choice
-        document_id = None
-        if doc_choice and doc_choice != "none" and doc_choice != "":
-            # When Gradio dropdown returns a choice, it returns the value part of the (label, value) tuple
-            document_id = doc_choice
-        logger.info(f"Using document ID: {document_id}")
-
-        # Use custom text if provided, otherwise use document
        if custom_text and custom_text.strip():
            logger.info("Using custom text for summarization")
            result = mcp_server.run_async(mcp_server.summarize_content_async(content=custom_text, style=style))
@@ -407,14 +311,14 @@ def summarize_document(doc_choice, custom_text, style):
            return "Please select a document from the dropdown or enter text to summarize"

        if result["success"]:
            if result.get('document_id'):
-                return
        else:
            return f"❌ Summarization failed: {result['error']}"
    except Exception as e:
@@ -422,19 +326,10 @@ def summarize_document(doc_choice, custom_text, style):
        return f"❌ Error: {str(e)}"

def generate_tags_for_document(doc_choice, custom_text, max_tags):
-    """Gradio interface for tag generation"""
    try:
-        # Debug logging
        logger.info(f"Generate tags called with doc_choice: {doc_choice}, type: {type(doc_choice)}")
-
-        document_id = None
-        if doc_choice and doc_choice != "none" and doc_choice != "":
-            # When Gradio dropdown returns a choice, it returns the value part of the (label, value) tuple
-            document_id = doc_choice
-        logger.info(f"Using document ID: {document_id}")
-
-        # Use custom text if provided, otherwise use document
        if custom_text and custom_text.strip():
            logger.info("Using custom text for tag generation")
            result = mcp_server.run_async(mcp_server.generate_tags_async(content=custom_text, max_tags=int(max_tags)))
@@ -446,14 +341,14 @@ def generate_tags_for_document(doc_choice, custom_text, max_tags):

        if result["success"]:
            tags_str = ", ".join(result["tags"])
            if result.get('document_id'):
-                return
        else:
            return f"❌ Tag generation failed: {result['error']}"
    except Exception as e:
@@ -461,310 +356,174 @@ def generate_tags_for_document(doc_choice, custom_text, max_tags):
        return f"❌ Error: {str(e)}"

def ask_question(question):
-    """Gradio interface for Q&A"""
    if not question.strip():
        return "Please enter a question"
-
    try:
        result = mcp_server.run_async(mcp_server.answer_question_async(question))
-
        if result["success"]:
-            for i,
-            filename =
-            return
        else:
            return f"❌ {result.get('error', 'Failed to answer question')}"
    except Exception as e:
        return f"❌ Error: {str(e)}"

def delete_document_from_library(document_id):
    try:
        else:
-            msg
-        return f"❌ Error: {str(e)}", get_document_list(), gr.update(choices=get_document_choices()), gr.update(choices=get_document_choices()), gr.update(choices=get_document_choices()), gr.update(choices=get_document_choices())

-def refresh_library():
-    """Refresh the document library display"""
-    doc_list = get_document_list()
-    doc_choices = get_document_choices()
-    return doc_list, gr.update(choices=doc_choices), gr.update(choices=doc_choices), gr.update(choices=doc_choices), gr.update(choices=doc_choices)

def create_gradio_interface():
    with gr.Blocks(title="🧠 Intelligent Content Organizer MCP Agent", theme=gr.themes.Soft()) as interface:
        gr.Markdown("""
        # 🧠 Intelligent Content Organizer MCP Agent
        A powerful MCP (Model Context Protocol) server for intelligent content management with semantic search,
-        summarization, and Q&A capabilities
        ## 🚀 Quick Start:
-        1. **
-        2. **
-        3. **
-        4. **
        """)

-        # State components for dropdowns
-        with gr.Row(visible=False):
-            doc_dropdown_sum = gr.Dropdown(label="Hidden", choices=get_document_choices())
-            doc_dropdown_tag = gr.Dropdown(label="Hidden", choices=get_document_choices())
-            delete_doc_dropdown = gr.Dropdown(label="Hidden", choices=get_document_choices())

        with gr.Tabs():
-            # 📚 Document Library Tab
            with gr.Tab("📚 Document Library"):
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Your Document Collection")
-                        refresh_btn = gr.Button("🔄 Refresh Library", variant="secondary")
-                        delete_doc_dropdown_visible = gr.Dropdown(
-                            label="Select Document to Delete",
-                            choices=get_document_choices(),
-                            value=None,
-                            interactive=True,
-                            allow_custom_value=False
-                        )
                        delete_btn = gr.Button("🗑️ Delete Selected Document", variant="stop")

-                refresh_btn.click(
-                    fn=refresh_library,
-                    outputs=[document_list, delete_doc_dropdown_visible, doc_dropdown_sum, doc_dropdown_tag, delete_doc_dropdown]
-                )
-                delete_btn.click(
-                    delete_document_from_library,
-                    inputs=[delete_doc_dropdown_visible],
-                    outputs=[delete_output, document_list, delete_doc_dropdown_visible, doc_dropdown_sum, doc_dropdown_tag, delete_doc_dropdown]
-                )

-            # 📄 Upload Documents Tab
            with gr.Tab("📄 Upload Documents"):
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Add Documents to Your Library")
-                            file_types=[".pdf", ".txt", ".docx", ".png", ".jpg", ".jpeg"],
-                            type="filepath"
-                        )
-                        upload_btn = gr.Button("🚀 Process & Add to Library", variant="primary", size="lg")
                    with gr.Column():
-                        doc_id_output = gr.Textbox(
-                            label="Document ID",
-                            placeholder="Document ID will appear here after processing..."
-                        )

-                upload_btn.click(
-                    upload_and_process_file,
-                    inputs=[file_input],
-                    outputs=[upload_output, doc_id_output, document_list, delete_doc_dropdown_visible, doc_dropdown_sum, doc_dropdown_tag, delete_doc_dropdown]
-                )

-            # 🔍 Search Documents Tab
            with gr.Tab("🔍 Search Documents"):
                with gr.Row():
                    with gr.Column(scale=1):
                        gr.Markdown("### Search Your Document Library")
-                        search_top_k = gr.Slider(
-                            label="Number of Results",
-                            minimum=1,
-                            maximum=20,
-                            value=5,
-                            step=1
-                        )
-                        search_btn = gr.Button("🔍 Search Library", variant="primary", size="lg")
                    with gr.Column(scale=2):

-                search_btn.click(
-                    perform_search,
-                    inputs=[search_query, search_top_k],
-                    outputs=[search_output]
-                )

-            # 📝 Summarize Tab
            with gr.Tab("📝 Summarize"):
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Generate Document Summaries")
-                        summary_text = gr.Textbox(
-                            label="Or Paste Text to Summarize",
-                            placeholder="Paste any text here to summarize...",
-                            lines=8
-                        )
-                        summary_style = gr.Dropdown(
-                            label="Summary Style",
-                            choices=["concise", "detailed", "bullet_points", "executive"],
-                            value="concise",
-                            info="Choose how you want the summary formatted"
-                        )
-                        summarize_btn = gr.Button("📝 Generate Summary", variant="primary", size="lg")
                    with gr.Column():

-                summarize_btn.click(
-                    summarize_document,
-                    inputs=[doc_dropdown_sum_visible, summary_text, summary_style],
-                    outputs=[summary_output]
-                )

-            # 🏷️ Generate Tags Tab
            with gr.Tab("🏷️ Generate Tags"):
                with gr.Row():
                    with gr.Column():
-                        gr.Markdown("###
-                        tag_text = gr.Textbox(
-                            label="Or Paste Text to Generate Tags",
-                            placeholder="Paste any text here to generate tags...",
-                            lines=8
-                        )
-                        max_tags = gr.Slider(
-                            label="Number of Tags",
-                            minimum=3,
-                            maximum=15,
-                            value=5,
-                            step=1
-                        )
-                        tag_btn = gr.Button("🏷️ Generate Tags", variant="primary", size="lg")
                    with gr.Column():

-                tag_btn.click(
-                    generate_tags_for_document,
-                    inputs=[doc_dropdown_tag_visible, tag_text, max_tags],
-                    outputs=[tag_output]
-                )

-            # ❓ Ask Questions Tab
            with gr.Tab("❓ Ask Questions"):
                with gr.Row():
                    with gr.Column():
-                        gr.Markdown("""
-                        ### Ask Questions About Your Documents
                        The AI will search through all your uploaded documents to find relevant information
-                        and provide comprehensive answers with sources.
-                        """)
-                            label="Your Question",
-                            placeholder="Ask anything about your documents...",
-                            lines=3
-                        )
-                        qa_btn = gr.Button("❓ Get Answer", variant="primary", size="lg")
                    with gr.Column():

-                qa_btn.click(
-                    ask_question,
-                    inputs=[qa_question],
-                    outputs=[qa_output]
-                )

-        # Update hidden dropdowns when visible ones change
-        doc_dropdown_sum_visible.change(
-            lambda x: x,
-            inputs=[doc_dropdown_sum_visible],
-            outputs=[doc_dropdown_sum]
-        )

-            inputs=[doc_dropdown_tag_visible],
-            outputs=[doc_dropdown_tag]
-        )

-            inputs=[delete_doc_dropdown_visible],
-            outputs=[delete_doc_dropdown]
-        )

    return interface

-# Create and launch the interface
if __name__ == "__main__":
-    # Launch with proper configuration for Hugging Face Spaces
-    interface.launch(mcp_server=True)
    def __init__(self):
        # Initialize services
        logger.info("Initializing Content Organizer MCP Server...")
        self.vector_store = VectorStoreService()
        self.document_store = DocumentStoreService()
        self.embedding_service = EmbeddingService()

            llm_service=self.llm_service,
            search_tool=self.search_tool
        )
+
        # Track processing status
        self.processing_status = {}

        # Document cache for quick access
        self.document_cache = {}
        logger.info("Content Organizer MCP Server initialized successfully!")

    def run_async(self, coro):

        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            if loop.is_running():
                # If loop is already running, create a task
                import concurrent.futures
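The diff only shows part of run_async. As a hedged reconstruction, a helper of this shape typically runs the coroutine on a worker thread when an event loop is already running; this is a sketch, not the project's exact code:

```python
import asyncio
import concurrent.futures

def run_async(self, coro):
    """Run an async coroutine from synchronous Gradio callbacks (illustrative sketch)."""
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    if loop.is_running():
        # A loop is already running (e.g. inside Gradio); execute the coroutine
        # on a separate thread with its own event loop and block for the result.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro).result()
    return loop.run_until_complete(coro)
```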
        try:
            task_id = str(uuid.uuid4())
            self.processing_status[task_id] = {"status": "processing", "progress": 0}
            result = await self.ingestion_tool.process_document(file_path, file_type, task_id)
            if result.get("success"):
                self.processing_status[task_id] = {"status": "completed", "progress": 100}
                doc_id = result.get("document_id")
                if doc_id:
                    doc = await self.document_store.get_document(doc_id)
                    if doc:
                        self.document_cache[doc_id] = doc
                return result
            else:
                self.processing_status[task_id] = {"status": "failed", "error": result.get("error")}
                return result
        except Exception as e:
            logger.error(f"Document ingestion failed: {str(e)}")
+            return {"success": False, "error": str(e), "message": "Failed to process document"}
+
    async def get_document_content_async(self, document_id: str) -> Optional[str]:
        """Get document content by ID"""
        try:

            if doc:
                self.document_cache[document_id] = doc
                return doc.content
            return None
        except Exception as e:
            logger.error(f"Error getting document content: {str(e)}")

        """MCP Tool: Perform semantic search"""
        try:
            results = await self.search_tool.search(query, top_k, filters)
+            return {"success": True, "query": query, "results": [result.to_dict() for result in results], "total_results": len(results)}
        except Exception as e:
            logger.error(f"Semantic search failed: {str(e)}")
+            return {"success": False, "error": str(e), "query": query, "results": []}

    async def summarize_content_async(self, content: str = None, document_id: str = None, style: str = "concise") -> Dict[str, Any]:
        try:
            if document_id and document_id != "none":
                content = await self.get_document_content_async(document_id)
                if not content:
                    return {"success": False, "error": f"Document {document_id} not found"}
            if not content or not content.strip():
                return {"success": False, "error": "No content provided for summarization"}
            max_content_length = 4000
            if len(content) > max_content_length:
                content = content[:max_content_length] + "..."
            summary = await self.generative_tool.summarize(content, style)
+            return {"success": True, "summary": summary, "original_length": len(content), "summary_length": len(summary), "style": style, "document_id": document_id}
        except Exception as e:
            logger.error(f"Summarization failed: {str(e)}")
+            return {"success": False, "error": str(e)}
    async def generate_tags_async(self, content: str = None, document_id: str = None, max_tags: int = 5) -> Dict[str, Any]:
        """MCP Tool: Generate tags for content"""
        try:
            if document_id and document_id != "none":
                content = await self.get_document_content_async(document_id)
                if not content:
                    return {"success": False, "error": f"Document {document_id} not found"}
            if not content or not content.strip():
                return {"success": False, "error": "No content provided for tag generation"}
            tags = await self.generative_tool.generate_tags(content, max_tags)
            if document_id and document_id != "none" and tags:
                await self.document_store.update_document_metadata(document_id, {"tags": tags})
+            return {"success": True, "tags": tags, "content_length": len(content), "document_id": document_id}
        except Exception as e:
            logger.error(f"Tag generation failed: {str(e)}")
+            return {"success": False, "error": str(e)}

    async def answer_question_async(self, question: str, context_filter: Optional[Dict] = None) -> Dict[str, Any]:
        try:
            search_results = await self.search_tool.search(question, top_k=5, filters=context_filter)
            if not search_results:
+                return {"success": False, "error": "No relevant context found in your documents. Please make sure you have uploaded relevant documents.", "question": question}
            answer = await self.generative_tool.answer_question(question, search_results)
+            return {"success": True, "question": question, "answer": answer, "sources": [result.to_dict() for result in search_results], "confidence": "high" if len(search_results) >= 3 else "medium"}
        except Exception as e:
            logger.error(f"Question answering failed: {str(e)}")
+            return {"success": False, "error": str(e), "question": question}

    def list_documents_sync(self, limit: int = 100, offset: int = 0) -> Dict[str, Any]:
        try:
            documents = self.run_async(self.document_store.list_documents(limit, offset))
+            return {"success": True, "documents": [doc.to_dict() for doc in documents], "total": len(documents)}
        except Exception as e:
+            return {"success": False, "error": str(e)}

mcp_server = ContentOrganizerMCPServer()
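For readers unfamiliar with the RAG step inside answer_question_async: the retrieved chunks are handed to the generative tool as grounding context. A minimal, hypothetical sketch of that hand-off (the real prompt construction lives in mcp_tools/generative_tool.py and may differ):

```python
# Hypothetical illustration of the RAG hand-off; not the project's generative_tool code.
# Assumes each search result exposes a `content` attribute, as the tool reference suggests.
def build_rag_prompt(question: str, search_results: list) -> str:
    """Assemble a grounded prompt from retrieved chunks."""
    context_blocks = []
    for i, result in enumerate(search_results, 1):
        context_blocks.append(f"[{i}] {result.content}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite the bracketed source numbers you relied on.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```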
def get_document_list():
    try:
        result = mcp_server.list_documents_sync(limit=100)
        if result["success"]:
            if result["documents"]:
+                doc_list_str = "📚 Documents in Library:\n\n"
+                for i, doc_item in enumerate(result["documents"], 1):
+                    doc_list_str += f"{i}. {doc_item['filename']} (ID: {doc_item['id'][:8]}...)\n"
+                    doc_list_str += f"   Type: {doc_item['doc_type']}, Size: {doc_item['file_size']} bytes\n"
+                    if doc_item.get('tags'):
+                        doc_list_str += f"   Tags: {', '.join(doc_item['tags'])}\n"
+                    doc_list_str += f"   Created: {doc_item['created_at'][:10]}\n\n"
+                return doc_list_str
            else:
                return "No documents in library yet. Upload some documents to get started!"
        else:

        return f"Error: {str(e)}"

def get_document_choices():
    try:
        result = mcp_server.list_documents_sync(limit=100)
        if result["success"] and result["documents"]:
+            choices = [(f"{doc['filename']} ({doc['id'][:8]}...)", doc['id']) for doc in result["documents"]]
            logger.info(f"Generated {len(choices)} document choices")
            return choices
        return []

        logger.error(f"Error getting document choices: {str(e)}")
        return []

+def refresh_library():
+    doc_list_refreshed = get_document_list()
+    doc_choices_refreshed = get_document_choices()
+    logger.info(f"Refreshing library. Found {len(doc_choices_refreshed)} choices.")
+    return (
+        doc_list_refreshed,
+        gr.update(choices=doc_choices_refreshed),
+        gr.update(choices=doc_choices_refreshed),
+        gr.update(choices=doc_choices_refreshed)
+    )
+
|
226 |
def upload_and_process_file(file):
|
|
|
227 |
if file is None:
|
228 |
+
doc_list_initial = get_document_list()
|
229 |
+
doc_choices_initial = get_document_choices()
|
230 |
+
return (
|
231 |
+
"No file uploaded", "", doc_list_initial,
|
232 |
+
gr.update(choices=doc_choices_initial),
|
233 |
+
gr.update(choices=doc_choices_initial),
|
234 |
+
gr.update(choices=doc_choices_initial)
|
235 |
+
)
|
236 |
try:
|
|
|
237 |
file_path = file.name if hasattr(file, 'name') else str(file)
|
238 |
+
file_type = Path(file_path).suffix.lower().strip('.') # Ensure suffix is clean
|
239 |
+
logger.info(f"Processing file: {file_path}, type: {file_type}")
|
|
|
|
|
|
|
240 |
result = mcp_server.run_async(mcp_server.ingest_document_async(file_path, file_type))
|
241 |
|
242 |
+
doc_list_updated = get_document_list()
|
243 |
+
doc_choices_updated = get_document_choices()
|
244 |
+
|
245 |
if result["success"]:
|
|
|
|
|
|
|
|
|
246 |
return (
|
247 |
+
f"✅ Success: {result['message']}\nDocument ID: {result['document_id']}\nChunks created: {result['chunks_created']}",
|
248 |
result["document_id"],
|
249 |
+
doc_list_updated,
|
250 |
+
gr.update(choices=doc_choices_updated),
|
251 |
+
gr.update(choices=doc_choices_updated),
|
252 |
+
gr.update(choices=doc_choices_updated)
|
|
|
253 |
)
|
254 |
else:
|
255 |
return (
|
256 |
+
f"❌ Error: {result.get('error', 'Unknown error')}", "",
|
257 |
+
doc_list_updated,
|
258 |
+
gr.update(choices=doc_choices_updated),
|
259 |
+
gr.update(choices=doc_choices_updated),
|
260 |
+
gr.update(choices=doc_choices_updated)
|
|
|
|
|
261 |
)
|
262 |
except Exception as e:
|
263 |
logger.error(f"Error processing file: {str(e)}")
|
264 |
+
doc_list_error = get_document_list()
|
265 |
+
doc_choices_error = get_document_choices()
|
266 |
return (
|
267 |
+
f"❌ Error: {str(e)}", "",
|
268 |
+
doc_list_error,
|
269 |
+
gr.update(choices=doc_choices_error),
|
270 |
+
gr.update(choices=doc_choices_error),
|
271 |
+
gr.update(choices=doc_choices_error)
|
|
|
|
|
272 |
)
|
273 |
|
def perform_search(query, top_k):
    if not query.strip():
        return "Please enter a search query"
    try:
        result = mcp_server.run_async(mcp_server.semantic_search_async(query, int(top_k)))
        if result["success"]:
            if result["results"]:
+                output_str = f"🔍 Found {result['total_results']} results for: '{query}'\n\n"
+                for i, res_item in enumerate(result["results"], 1):
+                    output_str += f"Result {i}:\n"
+                    output_str += f"📊 Relevance Score: {res_item['score']:.3f}\n"
+                    output_str += f"📄 Content: {res_item['content'][:300]}...\n"
+                    if 'document_filename' in res_item.get('metadata', {}):
+                        output_str += f"📁 Source: {res_item['metadata']['document_filename']}\n"
+                    output_str += f"🔗 Document ID: {res_item.get('document_id', 'Unknown')}\n"
+                    output_str += "-" * 80 + "\n\n"
+                return output_str
            else:
                return f"No results found for: '{query}'\n\nMake sure you have uploaded relevant documents first."
        else:

        return f"❌ Error: {str(e)}"

def summarize_document(doc_choice, custom_text, style):
    try:
        logger.info(f"Summarize called with doc_choice: {doc_choice}, type: {type(doc_choice)}")
+        document_id = doc_choice if doc_choice and doc_choice != "none" and doc_choice != "" else None
        if custom_text and custom_text.strip():
            logger.info("Using custom text for summarization")
            result = mcp_server.run_async(mcp_server.summarize_content_async(content=custom_text, style=style))

            return "Please select a document from the dropdown or enter text to summarize"

        if result["success"]:
+            output_str = f"📝 Summary ({style} style):\n\n{result['summary']}\n\n"
+            output_str += f"📊 Statistics:\n"
+            output_str += f"- Original length: {result['original_length']} characters\n"
+            output_str += f"- Summary length: {result['summary_length']} characters\n"
+            output_str += f"- Compression ratio: {(1 - result['summary_length']/max(1, result['original_length']))*100:.1f}%\n"  # Avoid division by zero
            if result.get('document_id'):
+                output_str += f"- Document ID: {result['document_id']}\n"
+            return output_str
        else:
            return f"❌ Summarization failed: {result['error']}"
    except Exception as e:

        return f"❌ Error: {str(e)}"
def generate_tags_for_document(doc_choice, custom_text, max_tags):
    try:
        logger.info(f"Generate tags called with doc_choice: {doc_choice}, type: {type(doc_choice)}")
+        document_id = doc_choice if doc_choice and doc_choice != "none" and doc_choice != "" else None
        if custom_text and custom_text.strip():
            logger.info("Using custom text for tag generation")
            result = mcp_server.run_async(mcp_server.generate_tags_async(content=custom_text, max_tags=int(max_tags)))

        if result["success"]:
            tags_str = ", ".join(result["tags"])
+            output_str = f"🏷️ Generated Tags:\n\n{tags_str}\n\n"
+            output_str += f"📊 Statistics:\n"
+            output_str += f"- Content length: {result['content_length']} characters\n"
+            output_str += f"- Number of tags: {len(result['tags'])}\n"
            if result.get('document_id'):
+                output_str += f"- Document ID: {result['document_id']}\n"
+                output_str += f"\n✅ Tags have been saved to the document."
+            return output_str
        else:
            return f"❌ Tag generation failed: {result['error']}"
    except Exception as e:

        return f"❌ Error: {str(e)}"

def ask_question(question):
    if not question.strip():
        return "Please enter a question"
    try:
        result = mcp_server.run_async(mcp_server.answer_question_async(question))
        if result["success"]:
+            output_str = f"❓ Question: {result['question']}\n\n"
+            output_str += f"💡 Answer:\n{result['answer']}\n\n"
+            output_str += f"🎯 Confidence: {result['confidence']}\n\n"
+            output_str += f"📚 Sources Used ({len(result['sources'])}):\n"
+            for i, source_item in enumerate(result['sources'], 1):
+                filename = source_item.get('metadata', {}).get('document_filename', 'Unknown')
+                output_str += f"\n{i}. 📄 {filename}\n"
+                output_str += f"   📝 Excerpt: {source_item['content'][:150]}...\n"
+                output_str += f"   📊 Relevance: {source_item['score']:.3f}\n"
+            return output_str
        else:
            return f"❌ {result.get('error', 'Failed to answer question')}"
    except Exception as e:
        return f"❌ Error: {str(e)}"
def delete_document_from_library(document_id):
+    if not document_id:
+        doc_list_current = get_document_list()
+        doc_choices_current = get_document_choices()
+        return (
+            "No document selected to delete.",
+            doc_list_current,
+            gr.update(choices=doc_choices_current),
+            gr.update(choices=doc_choices_current),
+            gr.update(choices=doc_choices_current)
+        )
    try:
+        delete_doc_store_result = mcp_server.run_async(mcp_server.document_store.delete_document(document_id))
+        delete_vec_store_result = mcp_server.run_async(mcp_server.vector_store.delete_document(document_id))
+
+        msg = ""
+        if delete_doc_store_result:
+            msg += f"🗑️ Document {document_id[:8]}... deleted from document store. "
        else:
+            msg += f"❌ Failed to delete document {document_id[:8]}... from document store. "
+
+        if delete_vec_store_result:
+            msg += "Embeddings deleted from vector store."
+        else:
+            msg += "Failed to delete embeddings from vector store (or no embeddings existed)."

+        doc_list_updated = get_document_list()
+        doc_choices_updated = get_document_choices()
+        return (
+            msg,
+            doc_list_updated,
+            gr.update(choices=doc_choices_updated),
+            gr.update(choices=doc_choices_updated),
+            gr.update(choices=doc_choices_updated)
+        )
+    except Exception as e:
+        logger.error(f"Error deleting document: {str(e)}")
+        doc_list_error = get_document_list()
+        doc_choices_error = get_document_choices()
+        return (
+            f"❌ Error deleting document: {str(e)}",
+            doc_list_error,
+            gr.update(choices=doc_choices_error),
+            gr.update(choices=doc_choices_error),
+            gr.update(choices=doc_choices_error)
+        )
+
def create_gradio_interface():
    with gr.Blocks(title="🧠 Intelligent Content Organizer MCP Agent", theme=gr.themes.Soft()) as interface:
        gr.Markdown("""
        # 🧠 Intelligent Content Organizer MCP Agent
        A powerful MCP (Model Context Protocol) server for intelligent content management with semantic search,
+        summarization, and Q&A capabilities.
        ## 🚀 Quick Start:
+        1. **Documents in Library** → View your uploaded documents in the "📚 Document Library" tab
+        2. **Upload Documents** → Go to "📄 Upload Documents" tab
+        3. **Search Your Content** → Use "🔍 Search Documents" to find information
+        4. **Get Summaries** → Select any document in "📝 Summarize" tab
+        5. **Generate Tags** → Auto-generate tags for your documents in "🏷️ Generate Tags" tab
+        6. **Ask Questions** → Get answers from your documents in "❓ Ask Questions" tab
+        7. **Delete Documents** → Remove documents from your library in "📚 Document Library" tab
+        8. **Refresh Library** → Click the 🔄 button to refresh the document list
        """)

        with gr.Tabs():
            with gr.Tab("📚 Document Library"):
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Your Document Collection")
+                        document_list_display = gr.Textbox(label="Documents in Library", value=get_document_list(), lines=20, interactive=False)
+                        refresh_btn_library = gr.Button("🔄 Refresh Library", variant="secondary")
+                        delete_doc_dropdown_visible = gr.Dropdown(label="Select Document to Delete", choices=get_document_choices(), value=None, interactive=True, allow_custom_value=False)
                        delete_btn = gr.Button("🗑️ Delete Selected Document", variant="stop")
+                        delete_output_display = gr.Textbox(label="Delete Status", visible=True)
+
            with gr.Tab("📄 Upload Documents"):
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Add Documents to Your Library")
+                        file_input_upload = gr.File(label="Select Document to Upload", file_types=[".pdf", ".txt", ".docx", ".png", ".jpg", ".jpeg"], type="filepath")
+                        upload_btn_process = gr.Button("🚀 Process & Add to Library", variant="primary", size="lg")
                    with gr.Column():
+                        upload_output_display = gr.Textbox(label="Processing Result", lines=6, placeholder="Upload a document to see processing results...")
+                        doc_id_output_display = gr.Textbox(label="Document ID", placeholder="Document ID will appear here after processing...")
+
            with gr.Tab("🔍 Search Documents"):
                with gr.Row():
                    with gr.Column(scale=1):
                        gr.Markdown("### Search Your Document Library")
+                        search_query_input = gr.Textbox(label="What are you looking for?", placeholder="Enter your search query...", lines=2)
+                        search_top_k_slider = gr.Slider(label="Number of Results", minimum=1, maximum=20, value=5, step=1)
+                        search_btn_action = gr.Button("🔍 Search Library", variant="primary", size="lg")
                    with gr.Column(scale=2):
+                        search_output_display = gr.Textbox(label="Search Results", lines=20, placeholder="Search results will appear here...")
+
            with gr.Tab("📝 Summarize"):
                with gr.Row():
                    with gr.Column():
                        gr.Markdown("### Generate Document Summaries")
+                        doc_dropdown_sum_visible = gr.Dropdown(label="Select Document to Summarize", choices=get_document_choices(), value=None, interactive=True, allow_custom_value=False)
+                        summary_text_input = gr.Textbox(label="Or Paste Text to Summarize", placeholder="Paste any text here to summarize...", lines=8)
+                        summary_style_dropdown = gr.Dropdown(label="Summary Style", choices=["concise", "detailed", "bullet_points", "executive"], value="concise", info="Choose how you want the summary formatted")
+                        summarize_btn_action = gr.Button("📝 Generate Summary", variant="primary", size="lg")
                    with gr.Column():
+                        summary_output_display = gr.Textbox(label="Generated Summary", lines=20, placeholder="Summary will appear here...")
+
            with gr.Tab("🏷️ Generate Tags"):
                with gr.Row():
                    with gr.Column():
+                        gr.Markdown("### Generate Document Tags")
+                        doc_dropdown_tag_visible = gr.Dropdown(label="Select Document to Tag", choices=get_document_choices(), value=None, interactive=True, allow_custom_value=False)
+                        tag_text_input = gr.Textbox(label="Or Paste Text to Generate Tags", placeholder="Paste any text here to generate tags...", lines=8)
+                        max_tags_slider = gr.Slider(label="Number of Tags", minimum=3, maximum=15, value=5, step=1)
+                        tag_btn_action = gr.Button("🏷️ Generate Tags", variant="primary", size="lg")
                    with gr.Column():
+                        tag_output_display = gr.Textbox(label="Generated Tags", lines=10, placeholder="Tags will appear here...")
+
497 |
with gr.Tab("❓ Ask Questions"):
|
498 |
with gr.Row():
|
499 |
with gr.Column():
|
500 |
+
gr.Markdown("""### Ask Questions About Your Documents
|
|
|
|
|
501 |
The AI will search through all your uploaded documents to find relevant information
|
502 |
+
and provide comprehensive answers with sources.""")
|
503 |
+
qa_question_input = gr.Textbox(label="Your Question", placeholder="Ask anything about your documents...", lines=3)
|
504 |
+
qa_btn_action = gr.Button("❓ Get Answer", variant="primary", size="lg")
|
|
|
|
|
|
|
|
|
|
|
|
|
505 |
with gr.Column():
|
506 |
+
qa_output_display = gr.Textbox(label="AI Answer", lines=20, placeholder="Answer will appear here with sources...")
|
507 |
+
|
508 |
+
all_dropdowns_to_update = [delete_doc_dropdown_visible, doc_dropdown_sum_visible, doc_dropdown_tag_visible]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
509 |
|
510 |
+
refresh_outputs = [document_list_display] + [dd for dd in all_dropdowns_to_update]
|
511 |
+
refresh_btn_library.click(fn=refresh_library, outputs=refresh_outputs)
|
|
|
|
|
|
|
512 |
|
513 |
+
upload_outputs = [upload_output_display, doc_id_output_display, document_list_display] + [dd for dd in all_dropdowns_to_update]
|
514 |
+
upload_btn_process.click(upload_and_process_file, inputs=[file_input_upload], outputs=upload_outputs)
|
|
|
|
|
|
|
515 |
|
516 |
+
delete_outputs = [delete_output_display, document_list_display] + [dd for dd in all_dropdowns_to_update]
|
517 |
+
delete_btn.click(delete_document_from_library, inputs=[delete_doc_dropdown_visible], outputs=delete_outputs)
|
518 |
+
|
519 |
+
search_btn_action.click(perform_search, inputs=[search_query_input, search_top_k_slider], outputs=[search_output_display])
|
520 |
+
summarize_btn_action.click(summarize_document, inputs=[doc_dropdown_sum_visible, summary_text_input, summary_style_dropdown], outputs=[summary_output_display])
|
521 |
+
tag_btn_action.click(generate_tags_for_document, inputs=[doc_dropdown_tag_visible, tag_text_input, max_tags_slider], outputs=[tag_output_display])
|
522 |
+
qa_btn_action.click(ask_question, inputs=[qa_question_input], outputs=[qa_output_display])
|
523 |
|
524 |
+
interface.load(fn=refresh_library, outputs=refresh_outputs)
|
525 |
return interface
|
526 |
|
|
|
527 |
if __name__ == "__main__":
|
528 |
+
gradio_interface = create_gradio_interface()
|
529 |
+
gradio_interface.launch(mcp_server=True)
|
|
|
|
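The wiring above is what resolves the dropdown issue: every event that changes the document library (upload, delete, refresh, initial load) targets the same `refresh_outputs` list, so each document dropdown gets fresh choices in one pass. A minimal, self-contained sketch of that pattern follows; the helper names mirror the ones used above, but the bodies are illustrative stand-ins rather than the app's real implementations.

```python
import gradio as gr

def get_document_choices():
    # Illustrative stand-in; the real app reads the ingested-document library.
    return ["doc-1: report.pdf", "doc-2: notes.txt"]

def refresh_library():
    choices = get_document_choices()
    # One return value per wired output: the library text plus a fresh choice
    # list for every dropdown that must stay in sync.
    return "\n".join(choices), gr.update(choices=choices), gr.update(choices=choices)

with gr.Blocks() as demo:
    library = gr.Textbox(label="Document Library")
    dd_summarize = gr.Dropdown(label="Select Document to Summarize", choices=get_document_choices())
    dd_tag = gr.Dropdown(label="Select Document to Tag", choices=get_document_choices())
    refresh_btn = gr.Button("Refresh")

    outputs = [library, dd_summarize, dd_tag]
    refresh_btn.click(fn=refresh_library, outputs=outputs)
    demo.load(fn=refresh_library, outputs=outputs)

# demo.launch(mcp_server=True)  # as in app.py; needs Gradio's MCP extra installed
```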
config.py
CHANGED
@@ -1,5 +1,8 @@
  import os
  from typing import Optional
+ from dotenv import load_dotenv
+
+ load_dotenv()


  class Config:
@@ -7,11 +10,13 @@ class Config:
      ANTHROPIC_API_KEY: Optional[str] = os.getenv("ANTHROPIC_API_KEY")
      MISTRAL_API_KEY: Optional[str] = os.getenv("MISTRAL_API_KEY")
      HUGGINGFACE_API_KEY: Optional[str] = os.getenv("HUGGINGFACE_API_KEY", os.getenv("HF_TOKEN"))
+     OPENAI_API_KEY: Optional[str] = os.getenv("OPENAI_API_KEY")

      # Model Configuration
      EMBEDDING_MODEL: str = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
      ANTHROPIC_MODEL: str = os.getenv("ANTHROPIC_MODEL", "claude-3-haiku-20240307")  # Using faster model
      MISTRAL_MODEL: str = os.getenv("MISTRAL_MODEL", "mistral-small-latest")  # Using smaller model
+     OPENAI_MODEL: str = os.getenv("OPENAI_MODEL", "gpt-4o-mini")

      # Vector Store Configuration
      VECTOR_STORE_PATH: str = os.getenv("VECTOR_STORE_PATH", "./data/vector_store")
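For reference, a minimal sketch of how these dotenv-backed settings are consumed elsewhere in the repo (llm_service.py reads them via `config.config`); the `.env` values in the comments are placeholders, not real keys.

```python
# .env (placeholder values):
#   OPENAI_API_KEY=sk-...
#   OPENAI_MODEL=gpt-4o-mini
import config

settings = config.config  # module-level Config instance, as used by LLMService

if settings.OPENAI_API_KEY:
    print(f"OpenAI enabled (model: {settings.OPENAI_MODEL})")
else:
    print("OPENAI_API_KEY not set; 'auto' will fall back to Mistral or Anthropic")
```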
core/chunker.py
CHANGED
@@ -1,3 +1,4 @@
+ # chunker.py
  import logging
  from typing import List, Dict, Any, Optional
  import re
mcp_server.py
CHANGED
@@ -41,7 +41,7 @@ generative_tool_instance = GenerativeTool(
      search_tool=search_tool_instance
  )

- mcp = FastMCP("
+ mcp = FastMCP("content")
  logger.info("FastMCP server initialized.")

  @mcp.tool()
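For context, a minimal sketch of the FastMCP pattern this file follows: name the server, register tools with the `@mcp.tool()` decorator, then run it. The import path assumes the standalone `fastmcp` package listed in requirements.txt, and the tool body is a placeholder rather than the real implementation.

```python
from fastmcp import FastMCP

mcp = FastMCP("content")

@mcp.tool()
def summarize_document(doc_id: str, style: str = "concise") -> str:
    """Placeholder tool body; the real tools call into the core/services modules."""
    return f"Summary of {doc_id} in {style} style"

if __name__ == "__main__":
    mcp.run()  # serve the registered tools over MCP
```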
requirements.txt
CHANGED
@@ -1,6 +1,6 @@
  gradio
  anthropic>=0.7.0
  mistralai
  sentence-transformers>=2.2.2
  transformers>=4.30.0
  torch>=2.0.0
@@ -20,4 +20,6 @@
  nest-asyncio>=1.5.6
  httpx
  fastmcp
  mcp
+ openai
+ python-dotenv
services/llm_service.py
CHANGED
@@ -1,8 +1,10 @@
+ from mistralai import Mistral
  import logging
  import asyncio
  from typing import List, Dict, Any, Optional
+
  import anthropic
+ import openai
  import config

  logger = logging.getLogger(__name__)
@@ -11,9 +13,9 @@ class LLMService:
      def __init__(self):
          self.config = config.config

-         # Initialize clients
          self.anthropic_client = None
-         self.mistral_client = None
+         self.mistral_client = None  # Synchronous Mistral client
+         self.openai_async_client = None  # Asynchronous OpenAI client

          self._initialize_clients()
@@ -27,51 +29,110 @@
              logger.info("Anthropic client initialized")

              if self.config.MISTRAL_API_KEY:
-                 self.mistral_client = Mistral(
+                 self.mistral_client = Mistral(  # Standard sync client
                      api_key=self.config.MISTRAL_API_KEY
                  )
                  logger.info("Mistral client initialized")
+
+             if self.config.OPENAI_API_KEY:
+                 self.openai_async_client = openai.AsyncOpenAI(
+                     api_key=self.config.OPENAI_API_KEY
+                 )
+                 logger.info("OpenAI client initialized")

+             # Check if at least one client is initialized
+             if not any([self.openai_async_client, self.mistral_client, self.anthropic_client]):
+                 logger.warning("No LLM clients could be initialized based on current config. Check API keys.")
+             else:
+                 logger.info("LLM clients initialized successfully (at least one).")

          except Exception as e:
              logger.error(f"Error initializing LLM clients: {str(e)}")
              raise

      async def generate_text(self, prompt: str, model: str = "auto", max_tokens: int = 1000, temperature: float = 0.7) -> str:
-         """Generate text using the specified model"""
+         """Generate text using the specified model, with new priority for 'auto'."""
          try:
+             selected_model_name_for_call: str = ""  # Actual model name passed to the specific generator
+
              if model == "auto":
+                 # New Priority: 1. OpenAI, 2. Mistral, 3. Anthropic
+                 if self.openai_async_client and self.config.OPENAI_MODEL:
+                     selected_model_name_for_call = self.config.OPENAI_MODEL
+                     logger.debug(f"Auto-selected OpenAI model: {selected_model_name_for_call}")
+                     return await self._generate_with_openai(prompt, selected_model_name_for_call, max_tokens, temperature)
+                 elif self.mistral_client and self.config.MISTRAL_MODEL:
+                     selected_model_name_for_call = self.config.MISTRAL_MODEL
+                     logger.debug(f"Auto-selected Mistral model: {selected_model_name_for_call}")
+                     return await self._generate_with_mistral(prompt, selected_model_name_for_call, max_tokens, temperature)
+                 elif self.anthropic_client and self.config.ANTHROPIC_MODEL:
+                     selected_model_name_for_call = self.config.ANTHROPIC_MODEL
+                     logger.debug(f"Auto-selected Anthropic model: {selected_model_name_for_call}")
+                     return await self._generate_with_claude(prompt, selected_model_name_for_call, max_tokens, temperature)
                  else:
+                     logger.error("No LLM clients available for 'auto' mode or default models not configured.")
+                     raise ValueError("No LLM clients available for 'auto' mode or default models not configured.")
+
+             elif model.startswith("gpt-") or model.lower().startswith("openai/"):
+                 if not self.openai_async_client:
+                     raise ValueError("OpenAI client not available. Check API key or model prefix.")
+                 actual_model = model.split('/')[-1] if '/' in model else model
+                 return await self._generate_with_openai(prompt, actual_model, max_tokens, temperature)
+
              elif model.startswith("mistral"):
                  if not self.mistral_client:
-                     raise ValueError("Mistral client not available")
-                 return await self._generate_with_mistral(prompt, max_tokens, temperature)
+                     raise ValueError("Mistral client not available. Check API key or model prefix.")
+                 return await self._generate_with_mistral(prompt, model, max_tokens, temperature)
+
+             elif model.startswith("claude"):
+                 if not self.anthropic_client:
+                     raise ValueError("Anthropic client not available. Check API key or model prefix.")
+                 return await self._generate_with_claude(prompt, model, max_tokens, temperature)
+
              else:
-                 raise ValueError(f"Unsupported model: {model}")
+                 raise ValueError(f"Unsupported model: {model}. Must start with 'gpt-', 'openai/', 'claude', 'mistral', or be 'auto'.")
+
          except Exception as e:
-             logger.error(f"Error generating text: {str(e)}")
+             logger.error(f"Error generating text with model '{model}': {str(e)}")
              raise
+
+     async def _generate_with_openai(self, prompt: str, model_name: str, max_tokens: int, temperature: float) -> str:
+         """Generate text using OpenAI (Async)"""
+         if not self.openai_async_client:
+             raise RuntimeError("OpenAI async client not initialized.")
+         try:
+             logger.debug(f"Generating with OpenAI model: {model_name}, max_tokens: {max_tokens}, temp: {temperature}, prompt: '{prompt[:50]}...'")
+             response = await self.openai_async_client.chat.completions.create(
+                 model=model_name,
+                 messages=[{"role": "user", "content": prompt}],
+                 max_tokens=max_tokens,
+                 temperature=temperature
+             )
+             if response.choices and response.choices[0].message:
+                 content = response.choices[0].message.content
+                 if content is not None:
+                     return content.strip()
+                 else:
+                     logger.warning(f"OpenAI response message content is None for model {model_name}.")
+                     return ""
+             else:
+                 logger.warning(f"OpenAI response did not contain expected choices or message for model {model_name}.")
+                 return ""
+         except Exception as e:
+             logger.error(f"Error with OpenAI generation (model: {model_name}): {str(e)}")
+             raise
+
+     async def _generate_with_claude(self, prompt: str, model_name: str, max_tokens: int, temperature: float) -> str:
+         """Generate text using Anthropic/Claude (Sync via run_in_executor)"""
+         if not self.anthropic_client:
+             raise RuntimeError("Anthropic client not initialized.")
          try:
+             logger.debug(f"Generating with Anthropic model: {model_name}, max_tokens: {max_tokens}, temp: {temperature}, prompt: '{prompt[:50]}...'")
              loop = asyncio.get_event_loop()
              response = await loop.run_in_executor(
                  None,
                  lambda: self.anthropic_client.messages.create(
+                     model=model_name,  # Use the passed model_name
                      max_tokens=max_tokens,
                      temperature=temperature,
                      messages=[
@@ -79,66 +140,78 @@
                      ]
                  )
              )
+             if response.content and response.content[0].text:
+                 return response.content[0].text.strip()
+             else:
+                 logger.warning(f"Anthropic response did not contain expected content for model {model_name}.")
+                 return ""
          except Exception as e:
-             logger.error(f"Error with Claude generation: {str(e)}")
+             logger.error(f"Error with Anthropic (Claude) generation (model: {model_name}): {str(e)}")
              raise

-     async def _generate_with_mistral(self, prompt: str, max_tokens: int, temperature: float) -> str:
-         """Generate text using Mistral"""
+     async def _generate_with_mistral(self, prompt: str, model_name: str, max_tokens: int, temperature: float) -> str:
+         """Generate text using Mistral (Sync via run_in_executor)"""
+         if not self.mistral_client:
+             raise RuntimeError("Mistral client not initialized.")
          try:
+             logger.debug(f"Generating with Mistral model: {model_name}, temp: {temperature}, prompt: '{prompt[:50]}...' (max_tokens: {max_tokens} - note: not directly used by MistralClient.chat)")
              loop = asyncio.get_event_loop()
              response = await loop.run_in_executor(
                  None,
                  lambda: self.mistral_client.chat(
+                     model=model_name,  # Use the passed model_name
                      messages=[{"role": "user", "content": prompt}],
+                     max_tokens=max_tokens,
                      temperature=temperature
                  )
              )
+             if response.choices and response.choices[0].message:
+                 content = response.choices[0].message.content
+                 if content is not None:
+                     return content.strip()
+                 else:
+                     logger.warning(f"Mistral response message content is None for model {model_name}.")
+                     return ""
+             else:
+                 logger.warning(f"Mistral response did not contain expected choices or message for model {model_name}.")
+                 return ""
          except Exception as e:
-             logger.error(f"Error with Mistral generation: {str(e)}")
+             logger.error(f"Error with Mistral generation (model: {model_name}): {str(e)}")
              raise

      async def summarize(self, text: str, style: str = "concise", max_length: Optional[int] = None) -> str:
-         """Generate a summary of the given text"""
          if not text.strip():
              return ""

-         # Create style-specific prompts
          style_prompts = {
              "concise": "Provide a concise summary of the following text, focusing on the main points:",
              "detailed": "Provide a detailed summary of the following text, including key details and supporting information:",
              "bullet_points": "Summarize the following text as a list of bullet points highlighting the main ideas:",
              "executive": "Provide an executive summary of the following text, focusing on key findings and actionable insights:"
          }
          prompt_template = style_prompts.get(style, style_prompts["concise"])
          if max_length:
-             prompt_template += f" Keep the summary under {max_length} words."
+             prompt_template += f" Keep the summary under approximately {max_length} words."

          prompt = f"{prompt_template}\n\nText to summarize:\n{text}\n\nSummary:"

          try:
+             summary_max_tokens = (max_length * 2) if max_length else 500
+             summary = await self.generate_text(prompt, model="auto", max_tokens=summary_max_tokens, temperature=0.3)
              return summary.strip()
          except Exception as e:
              logger.error(f"Error generating summary: {str(e)}")
              return "Error generating summary"

      async def generate_tags(self, text: str, max_tags: int = 5) -> List[str]:
-         """Generate relevant tags for the given text"""
          if not text.strip():
              return []

-         prompt = f"""Generate {max_tags} relevant tags for the following text.
-         Tags should be concise, descriptive keywords or phrases that capture the main topics
-         Return only the tags, separated by commas.
+         prompt = f"""Generate up to {max_tags} relevant tags for the following text.
+         Tags should be concise, descriptive keywords or phrases (1-3 words typically) that capture the main topics or themes.
+         Return only the tags, separated by commas. Do not include any preamble or explanation.

          Text:
          {text}
@@ -146,28 +219,22 @@
          Tags:"""

          try:
-             response = await self.generate_text(prompt, max_tokens=100, temperature=0.5)
+             response = await self.generate_text(prompt, model="auto", max_tokens=100, temperature=0.5)
+             tags = [tag.strip().lower() for tag in response.split(',') if tag.strip()]
-             tags = [tag for tag in tags if tag and len(tag) > 1]
+             tags = [tag for tag in tags if tag and len(tag) > 1 and len(tag) < 50]
-             return tags[:max_tags]
+             return list(dict.fromkeys(tags))[:max_tags]
          except Exception as e:
              logger.error(f"Error generating tags: {str(e)}")
              return []

      async def categorize(self, text: str, categories: List[str]) -> str:
-         """Categorize text into one of the provided categories"""
          if not text.strip() or not categories:
              return "Uncategorized"

-         categories_str = ", ".join(categories)
+         categories_str = ", ".join([f"'{cat}'" for cat in categories])
+         prompt = f"""Classify the following text into ONE of these categories: {categories_str}.
-         Choose the most appropriate category based on the content and main theme of the text.
-         Return only the category name, nothing else.
+         Choose the single most appropriate category based on the content and main theme of the text.
+         Return only the category name as a string, exactly as it appears in the list provided. Do not add any other text or explanation.

          Text to classify:
          {text}
@@ -175,111 +242,146 @@
          Category:"""

          try:
-             response = await self.generate_text(prompt, max_tokens=50, temperature=0.1)
+             response = await self.generate_text(prompt, model="auto", max_tokens=50, temperature=0.1)
+             category_candidate = response.strip().strip("'\"")

              for cat in categories:
-                 if cat.lower() in category_lower or category_lower in cat.lower():
-                     return cat
+                 if cat.lower() == category_candidate.lower():
+                     return cat
+
+             logger.warning(f"LLM returned category '{category_candidate}' which is not in the provided list: {categories}. Falling back.")
              return categories[0] if categories else "Uncategorized"
          except Exception as e:
              logger.error(f"Error categorizing text: {str(e)}")
              return "Uncategorized"

-         """Answer a question based on the provided context"""
+     async def answer_question(self, question: str, context: str, max_context_length: int = 3000) -> str:
          if not question.strip():
-             return "No question provided"
+             return "No question provided."
          if not context.strip():
+             return "I don't have enough context to answer this question. Please provide relevant information."

-         # Truncate context if too long
          if len(context) > max_context_length:
              context = context[:max_context_length] + "..."
+             logger.warning(f"Context truncated to {max_context_length} characters for question answering.")

+         prompt = f"""You are a helpful assistant. Answer the following question based ONLY on the provided context.
+         If the context does not contain the information to answer the question, state that the context does not provide the answer.
+         Do not make up information or use external knowledge.

+         Context:
+         ---
+         {context}
+         ---

+         Question: {question}

+         Answer:"""

          try:
+             answer = await self.generate_text(prompt, model="auto", max_tokens=300, temperature=0.2)
              return answer.strip()
          except Exception as e:
              logger.error(f"Error answering question: {str(e)}")
              return "I encountered an error while trying to answer your question."

      async def extract_key_information(self, text: str) -> Dict[str, Any]:
-         """Extract key information from text"""
          if not text.strip():
              return {}

          prompt = f"""Analyze the following text and extract key information.
+         Provide the response as a JSON object with the following keys:
+         - "main_topic": (string) The main topic or subject of the text.
+         - "key_points": (array of strings) A list of 3-5 key points or takeaways.
+         - "entities": (array of strings) Important people, places, organizations, or products mentioned.
+         - "sentiment": (string) Overall sentiment of the text (e.g., "positive", "neutral", "negative", "mixed").
+         - "content_type": (string) The perceived type of content (e.g., "article", "email", "report", "conversation", "advertisement", "other").
+
+         If a piece of information is not found or not applicable, use null or an empty array/string as appropriate for the JSON structure.

          Text to analyze:
+         ---
          {text}
+         ---

-         Analysis:"""
+         JSON Analysis:"""

          try:
+             response_str = await self.generate_text(prompt, model="auto", max_tokens=500, temperature=0.4)

+             import json
+             try:
+                 if response_str.startswith("```json"):
+                     response_str = response_str.lstrip("```json").rstrip("```").strip()
+
+                 info = json.loads(response_str)
+                 expected_keys = {"main_topic", "key_points", "entities", "sentiment", "content_type"}
+                 if not expected_keys.issubset(info.keys()):
+                     logger.warning(f"Extracted information missing some expected keys. Got: {info.keys()}")
+                 return info
+             except json.JSONDecodeError as je:
+                 logger.error(f"Failed to parse JSON from LLM response for key_information: {je}")
+                 logger.debug(f"LLM Response string was: {response_str}")
+                 info_fallback = {}
+                 lines = response_str.split('\n')
+                 for line in lines:
+                     if ':' in line:
+                         key, value = line.split(':', 1)
+                         key_clean = key.strip().lower().replace(' ', '_')
+                         value_clean = value.strip()
+                         if value_clean:
+                             if key_clean in ["key_points", "entities"] and '[' in value_clean and ']' in value_clean:
+                                 try:
+                                     info_fallback[key_clean] = [item.strip().strip("'\"") for item in value_clean.strip('[]').split(',') if item.strip()]
+                                 except: info_fallback[key_clean] = value_clean
+                             else: info_fallback[key_clean] = value_clean
+                 if info_fallback:
+                     logger.info("Successfully parsed key information using fallback line-based method.")
+                     return info_fallback
+                 return {"error": "Failed to parse LLM output", "raw_response": response_str}
          except Exception as e:
              logger.error(f"Error extracting key information: {str(e)}")
-             return {}
+             return {"error": f"General error extracting key information: {str(e)}"}

      async def check_availability(self) -> Dict[str, bool]:
-         """Check which LLM services are available"""
+         """Check which LLM services are available by making a tiny test call."""
          availability = {
-             "mistral": False
+             "openai": False,
+             "mistral": False,
+             "anthropic": False
          }
+         test_prompt = "Hello"
+         test_max_tokens = 5
+         test_temp = 0.1
+
+         logger.info("Checking LLM availability...")
+
+         if self.openai_async_client and self.config.OPENAI_MODEL:
+             try:
+                 logger.debug(f"Testing OpenAI availability with model {self.config.OPENAI_MODEL}...")
+                 test_response = await self._generate_with_openai(test_prompt, self.config.OPENAI_MODEL, test_max_tokens, test_temp)
+                 availability["openai"] = bool(test_response.strip())
+             except Exception as e:
+                 logger.warning(f"OpenAI availability check failed for model {self.config.OPENAI_MODEL}: {e}")
+             logger.info(f"OpenAI available: {availability['openai']}")

+         if self.mistral_client and self.config.MISTRAL_MODEL:
+             try:
+                 logger.debug(f"Testing Mistral availability with model {self.config.MISTRAL_MODEL}...")
+                 test_response = await self._generate_with_mistral(test_prompt, self.config.MISTRAL_MODEL, test_max_tokens, test_temp)
+                 availability["mistral"] = bool(test_response.strip())
+             except Exception as e:
+                 logger.warning(f"Mistral availability check failed for model {self.config.MISTRAL_MODEL}: {e}")
+             logger.info(f"Mistral available: {availability['mistral']}")
+
+         if self.anthropic_client and self.config.ANTHROPIC_MODEL:
+             try:
+                 logger.debug(f"Testing Anthropic availability with model {self.config.ANTHROPIC_MODEL}...")
+                 test_response = await self._generate_with_claude(test_prompt, self.config.ANTHROPIC_MODEL, test_max_tokens, test_temp)
+                 availability["anthropic"] = bool(test_response.strip())
+             except Exception as e:
+                 logger.warning(f"Anthropic availability check failed for model {self.config.ANTHROPIC_MODEL}: {e}")
+             logger.info(f"Anthropic available: {availability['anthropic']}")

+         logger.info(f"Final LLM Availability: {availability}")
          return availability