---
license: mit
sdk: gradio
emoji: 🚀
colorFrom: gray
sdk_version: 5.34.0
---

# 🔥 Hybrid Search RAGtim Bot

A hybrid retrieval system that combines semantic vector search with BM25 keyword matching to answer questions over a markdown knowledge base.

## 🚀 Features

- **Hybrid Search**: Combines transformer-based semantic similarity with BM25 keyword ranking
- **Multiple Search Modes**: Vector search, BM25 search, and weighted fusion of the two
- **Real-time API**: RESTful endpoints for integration
- **Interactive UI**: Three interfaces: Chat, Advanced Search, and Statistics
- **Knowledge Base**: Markdown-based knowledge system

## 🔧 Technology Stack

- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
- **Search**: Custom BM25 implementation + vector similarity
- **Framework**: Gradio (`sdk_version: 5.34.0` in the Space metadata)
- **ML**: Transformers, PyTorch, NumPy
- **Deployment**: Hugging Face Spaces

## 📚 Knowledge Base Structure

The system processes markdown files from the `knowledge_base/` directory:

- `about.md` - Personal information and professional summary
- `research_details.md` - Research projects and methodologies
- `publications_detailed.md` - Publications with technical details
- `skills_expertise.md` - Technical skills and expertise
- `experience_detailed.md` - Professional experience
- `statistics.md` - Statistical methods and biostatistics

## 🔍 Search Methods

### Hybrid Search (Recommended)

Combines semantic and keyword search with configurable weights:

- Default: 60% vector + 40% BM25
- A good starting point for most queries
- Balances meaning and exact term matching

### Vector Search

Pure semantic similarity using transformer embeddings:

- Best for conceptual questions
- Finds semantically related content
- Robust to paraphrasing and synonyms

### BM25 Search

Traditional keyword-based ranking:

- Excellent for specific terms
- TF-IDF-style scoring with term frequency saturation and document length normalization
- Fast and interpretable

## 🛠️ API Endpoints

### Search API

- `GET /api/search` - Run a search; accepts `query`, `top_k`, and `search_type` (see the API Integration example below)
- `GET /api/stats`

## 📊 Configuration

Key parameters in `config.py`:

- `BM25_K1 = 1.5` - Term frequency saturation
- `BM25_B = 0.75` - Document length normalization
- `DEFAULT_VECTOR_WEIGHT = 0.6` - Weight of the vector score in hybrid search
- `DEFAULT_BM25_WEIGHT = 0.4` - Weight of the BM25 score in hybrid search

## 🚀 Deployment

1. Clone to Hugging Face Spaces
2. Ensure all markdown files are in `knowledge_base/`
3. The system auto-initializes on startup
4. Access via the provided Space URL

## 💡 Usage Examples

**Chat Interface:**

- "What is Raktim's LLM research?"
- "Tell me about statistical methods"
- "Describe multimodal AI capabilities"

**Advanced Search:**

- Adjust vector/BM25 weights
- Compare search methods
- Fine-tune result count

**API Integration:**

```python
import requests

# Replace the host with your own Space URL.
response = requests.get(
    "https://your-space.hf.space/api/search",
    params={
        "query": "machine learning research",
        "top_k": 5,
        "search_type": "hybrid",
    },
    timeout=30,
)
response.raise_for_status()  # Fail loudly on HTTP errors
print(response.json())  # Assumes the endpoint returns JSON
```
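
**Scoring Sketch:**

To make the Search Methods and Configuration sections more concrete, here is a minimal sketch of how BM25 scoring and the 60/40 weighting could fit together. It is not the Space's actual code: the function names (`tokenize`, `bm25_scores`, `min_max`, `hybrid_scores`), the whitespace tokenizer, the smoothed IDF variant, and the min-max normalization before fusion are all assumptions made for illustration. Only the four constants come from the Configuration list above, and the vector scores are placeholder numbers rather than real embedding similarities.

```python
import math
from collections import Counter

# Values mirror the parameters listed under Configuration.
BM25_K1 = 1.5
BM25_B = 0.75
DEFAULT_VECTOR_WEIGHT = 0.6
DEFAULT_BM25_WEIGHT = 0.4


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, documents, k1=BM25_K1, b=BM25_B):
    """Score every document against the query with Okapi-style BM25."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "1 +" keeps it non-negative (an assumed variant).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # k1 caps the gain from repeated terms; b scales the length penalty.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values):
    """Rescale scores to [0, 1] so the two rankers are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(vector_scores, keyword_scores,
                  vector_weight=DEFAULT_VECTOR_WEIGHT,
                  bm25_weight=DEFAULT_BM25_WEIGHT):
    """Weighted sum of normalized vector and BM25 scores."""
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    return [vector_weight * a + bm25_weight * c for a, c in zip(v, k)]


docs = [
    "BM25 ranks documents by keyword overlap",
    "Transformer embeddings capture semantic similarity",
    "Statistical methods and biostatistics overview",
]
keyword = bm25_scores("bm25 keyword ranking", docs)
# Placeholder cosine similarities; a real system would compute these from embeddings.
vector = [0.82, 0.35, 0.10]
fused = hybrid_scores(vector, keyword)
best = max(range(len(docs)), key=fused.__getitem__)
print(docs[best])
```

With these placeholder inputs the first document ranks highest because it leads on both signals. Normalizing before fusion matters because raw BM25 scores are unbounded while cosine similarities are not, so a plain weighted sum of raw scores would let one ranker dominate regardless of the configured weights.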