Spaces: raktimhugging/ragtim-bot

raktimhugging committed on Jun 14

Commit 6e1da14 (verified) · 1 parent: 39dacf3

Update README.md

Files changed (1)

README.md +83 -81

README.md CHANGED

@@ -1,92 +1,94 @@
----
-title: RAGtim Bot - Raktim's AI Assistant
-emoji: 🤖
-colorFrom: green
-colorTo: blue
-sdk: gradio
-sdk_version: "4.44.0"
-app_file: app.py
-pinned: false
-license: mit
----
-# 🤖 RAGtim Bot - Raktim's AI Assistant
-An intelligent AI assistant powered by Hugging Face Transformers that answers questions about Raktim Mondol's research, expertise, and professional background.
-## 🌟 Features
-- **Complete Markdown Knowledge Base**: Loads all portfolio content from markdown files
-- **GPU-Accelerated Search**: Uses `sentence-transformers/all-MiniLM-L6-v2` for semantic similarity
-- **Comprehensive Coverage**: Research, publications, skills, experience, education, statistics
-- **API Endpoints**: Direct access to search and statistics
-- **Real-time Chat**: Interactive conversational interface
-## 📚 Knowledge Base
-This Space loads comprehensive information from:
-- **about.md** - Personal information, contact details, professional summary
-- **research_details.md** - Detailed research projects, methodologies, current work
-- **publications_detailed.md** - Complete publication details, technical contributions
-- **skills_expertise.md** - Comprehensive technical skills, tools, frameworks
-- **experience_detailed.md** - Professional experience, teaching, research roles
-- **statistics.md** - Statistical methods, biostatistics expertise, methodologies
-## 🔍 What You Can Ask
-- Research projects and methodologies
-- Publications with technical details
-- Technical skills and programming expertise
-- Educational background and achievements
-- Professional experience and teaching roles
-- Statistical methods and biostatistics applications
-- Awards, recognition, and professional development
-- Contact information and collaboration opportunities
-## 🚀 API Usage
-### Search API
-```python
-import requests
-response = requests.post(
-    "https://raktimhugging-ragtim-bot.hf.space/api/search",
-    json={"query": "What is Raktim's research about?", "top_k": 5}
-)
-results = response.json()
-```
-### Stats API
-```python
-response = requests.get("https://raktimhugging-ragtim-bot.hf.space/api/stats")
-stats = response.json()
-```
-## 🔧 Technical Details
-- **Model**: sentence-transformers/all-MiniLM-L6-v2
-- **Embedding Dimension**: 384
-- **Search Type**: Semantic similarity with relevance scoring
-- **Knowledge Sections**: 50+ sections across 6 markdown files
-- **GPU Acceleration**: Automatic CUDA detection and usage
-## 🌐 Integration
-This Space can be integrated with:
-- Portfolio websites for intelligent chat assistance
-- Research collaboration platforms
-- Academic networking tools
-- Professional inquiry systems
-## 📞 Contact
-For questions about Raktim Mondol or collaboration opportunities:
-- **Email**: [email protected]
-- **Portfolio**: [mondol.me](https://mondol.me)
-- **Institution**: UNSW Sydney, School of Computer Science & Engineering
----
-**Built with**: Gradio, Hugging Face Transformers, PyTorch
-**Powered by**: GPU-accelerated semantic search and comprehensive markdown knowledge base

+# 🔥 Hybrid Search RAGtim Bot
+A sophisticated hybrid search system combining semantic vector search with BM25 keyword matching for optimal information retrieval.
+## 🚀 Features
+- **Hybrid Search**: Combines transformer-based semantic similarity with BM25 keyword ranking
+- **Multiple Search Modes**: Vector search, BM25 search, and weighted hybrid fusion
+- **Real-time API**: RESTful endpoints for integration
+- **Interactive UI**: Three interfaces: Chat, Advanced Search, and Statistics
+- **Knowledge Base**: Comprehensive markdown-based knowledge system
+## 🔧 Technology Stack
+- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
+- **Search**: Custom BM25 implementation + Vector similarity
+- **Framework**: Gradio 4.44.0
+- **ML**: Transformers, PyTorch, NumPy
+- **Deployment**: Hugging Face Spaces
+## 📚 Knowledge Base Structure
+The system processes markdown files from the `knowledge_base/` directory:
+- `about.md` - Personal information and professional summary
+- `research_details.md` - Research projects and methodologies
+- `publications_detailed.md` - Publications with technical details
+- `skills_expertise.md` - Technical skills and expertise
+- `experience_detailed.md` - Professional experience
+- `statistics.md` - Statistical methods and biostatistics
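The README does not say how these files are split into searchable units (the previous version mentioned "50+ sections across 6 markdown files"). A minimal sketch, assuming sections are cut at markdown headings; `split_sections` and `load_knowledge_base` are hypothetical names, not the Space's actual functions:

```python
import re
from pathlib import Path

# Matches ATX headings such as "# About" or "### Current Work".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(text: str, source: str) -> list[dict]:
    """Split one markdown document into heading-delimited sections.

    Text before the first heading is filed under the hypothetical title
    "Overview"; sections with no body text are skipped. Lines starting
    with "#" inside fenced code blocks are not special-cased.
    """
    sections: list[dict] = []
    heading, body = "Overview", []

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            sections.append({"source": source, "heading": heading, "content": content})

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections


def load_knowledge_base(directory: str = "knowledge_base") -> list[dict]:
    """Load every *.md file in the directory, in filename order."""
    sections: list[dict] = []
    for path in sorted(Path(directory).glob("*.md")):
        sections.extend(split_sections(path.read_text(encoding="utf-8"), path.name))
    return sections
```

Each section keeps its source filename, so a search hit can be traced back to the file it came from.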
+## 🔍 Search Methods
+### Hybrid Search (Recommended)
+Combines semantic and keyword search with configurable weights:
+- Default: 60% vector + 40% BM25
+- A good default for most queries
+- Balances meaning and exact term matching
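The weighting above can be sketched as a weighted sum of per-document scores. The README gives only the 60/40 weights; rescaling each score list to [0, 1] with min-max normalization before mixing is an assumption made here so that cosine similarities and unbounded BM25 scores are comparable. `hybrid_scores` is a hypothetical name:

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, each becomes 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    vector: dict[str, float],
    bm25: dict[str, float],
    vector_weight: float = 0.6,  # DEFAULT_VECTOR_WEIGHT in the README
    bm25_weight: float = 0.4,    # DEFAULT_BM25_WEIGHT in the README
) -> list[tuple[str, float]]:
    """Fuse two score maps (doc id -> raw score) into one ranked list."""
    v, b = min_max(vector), min_max(bm25)
    fused = {
        doc: vector_weight * v.get(doc, 0.0) + bm25_weight * b.get(doc, 0.0)
        for doc in v.keys() | b.keys()
    }
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

A document found by only one method contributes 0 from the other, so a strong match on either signal can still rank highly.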
+### Vector Search
+Pure semantic similarity using transformer embeddings:
+- Best for conceptual questions
+- Finds semantically related content
+- Matches on meaning even when the wording differs
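Vector search ranks sections by the cosine similarity between the query embedding and each section embedding. In the Space the embeddings are 384-dimensional vectors from `sentence-transformers/all-MiniLM-L6-v2` (obtainable, for example, with `SentenceTransformer(...).encode(...)`); in this sketch, toy 3-dimensional vectors stand in for them, and `vector_search` is a hypothetical helper:

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vector_search(query_vec: list[float], doc_vecs: dict[str, list[float]],
                  top_k: int = 5) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query embedding."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    # Most similar first; ties broken by doc id.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```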
+### BM25 Search
+Traditional keyword-based ranking:
+- Excellent for specific terms
+- TF-IDF-style weighting with term-frequency saturation and document length normalization
+- Fast and interpretable
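A minimal BM25 sketch using the k1 = 1.5 and b = 0.75 values the README lists for `config.py`. The Space uses a custom implementation whose details are not shown; the tokenizer (lowercase alphanumeric runs) and the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1) are assumptions:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(doc) for doc in docs]
        self.tf = [Counter(t) for t in tokens]   # term counts per document
        self.lengths = [len(t) for t in tokens]  # document length in tokens
        self.avgdl = sum(self.lengths) / len(docs) if docs else 0.0
        df = Counter(term for t in tokens for term in set(t))  # document frequency
        n = len(docs)
        self.idf = {
            term: math.log((n - freq + 0.5) / (freq + 0.5) + 1.0)
            for term, freq in df.items()
        }

    def score(self, query: str, index: int) -> float:
        tf, length = self.tf[index], self.lengths[index]
        # Length normalization: documents longer than average are penalized via b.
        norm = self.k1 * (1 - self.b + self.b * length / self.avgdl) if self.avgdl else self.k1
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                # Saturating term-frequency component controlled by k1.
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 5) -> list[tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.tf))]
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top_k]
```

Here k1 limits how much repeated occurrences of a term can add, and b controls how strongly longer sections are penalized; with b = 0, length normalization is switched off.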
+## 🛠️ API Endpoints
+### Search API
+GET /api/search (parameters: `query`, `top_k`, `search_type`)
+### Stats API
+GET /api/stats
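The usage example further down sends `query`, `top_k`, and `search_type` as GET parameters to `/api/search`. Assuming that request shape (the response format is not documented here), request URLs can be built with the standard library alone; `search_url` and `stats_url` are hypothetical helpers, and `https://your-space.hf.space` is the README's own placeholder host:

```python
from urllib.parse import urlencode

# Placeholder host taken from the README's usage example.
BASE_URL = "https://your-space.hf.space"


def search_url(query: str, top_k: int = 5, search_type: str = "hybrid",
               base: str = BASE_URL) -> str:
    """Build a GET URL for /api/search with the parameters the README uses.

    Only search_type="hybrid" appears in the README; other mode names are
    not documented here and are passed through unchanged.
    """
    params = {"query": query, "top_k": top_k, "search_type": search_type}
    return f"{base.rstrip('/')}/api/search?{urlencode(params)}"


def stats_url(base: str = BASE_URL) -> str:
    """Build the URL for the GET /api/stats endpoint."""
    return f"{base.rstrip('/')}/api/stats"
```

The resulting URL can be handed to any HTTP client, for example `requests.get(search_url("machine learning research"))`.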
+## 📊 Configuration
+Key parameters in `config.py`:
+- `BM25_K1 = 1.5` - Term frequency saturation
+- `BM25_B = 0.75` - Document length normalization
+- `DEFAULT_VECTOR_WEIGHT = 0.6` - Hybrid search weighting
+- `DEFAULT_BM25_WEIGHT = 0.4` - Hybrid search weighting
+## 🚀 Deployment
+1. Clone the repository into a Hugging Face Space
+2. Ensure all markdown files are in `knowledge_base/`
+3. The system auto-initializes on startup
+4. Access via the provided Space URL
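Step 2 can be checked before pushing. A small sketch, assuming the six files listed under Knowledge Base Structure are the required set; `missing_knowledge_files` is a hypothetical helper:

```python
from pathlib import Path

# The six files listed under "Knowledge Base Structure" (assumed to be the required set).
EXPECTED_FILES = (
    "about.md",
    "research_details.md",
    "publications_detailed.md",
    "skills_expertise.md",
    "experience_detailed.md",
    "statistics.md",
)


def missing_knowledge_files(directory: str = "knowledge_base") -> list[str]:
    """Return the expected markdown files that are absent from the directory."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Called from the repository root, `missing_knowledge_files()` returns an empty list when nothing is missing.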
+## 💡 Usage Examples
+**Chat Interface:**
+- "What is Raktim's LLM research?"
+- "Tell me about statistical methods"
+- "Describe multimodal AI capabilities"
+**Advanced Search:**
+- Adjust vector/BM25 weights
+- Compare search methods
+- Adjust the number of results returned
+**API Integration:**
+```python
+import requests
+response = requests.get(
+    "https://your-space.hf.space/api/search",
+    params={
+        "query": "machine learning research",
+        "top_k": 5,
+        "search_type": "hybrid"
+    }
+)