Update README.md
README.md
CHANGED
@@ -1,92 +1,94 @@
|
|
1 |
-
|
2 |
-
title: RAGtim Bot - Raktim's AI Assistant
|
3 |
-
emoji: 🤖
|
4 |
-
colorFrom: green
|
5 |
-
colorTo: blue
|
6 |
-
sdk: gradio
|
7 |
-
sdk_version: "4.44.0"
|
8 |
-
app_file: app.py
|
9 |
-
pinned: false
|
10 |
-
license: mit
|
11 |
-
---
|
12 |
-
|
13 |
-
# 🤖 RAGtim Bot - Raktim's AI Assistant
|
14 |
-
|
15 |
-
An intelligent AI assistant powered by Hugging Face Transformers that answers questions about Raktim Mondol's research, expertise, and professional background.
|
16 |
-
|
17 |
-
## Features
|
18 |
-
|
19 |
-
- **Complete Markdown Knowledge Base**: Loads all portfolio content from markdown files
|
20 |
-
- **GPU-Accelerated Search**: Uses `sentence-transformers/all-MiniLM-L6-v2` for semantic similarity
|
21 |
-
- **Comprehensive Coverage**: Research, publications, skills, experience, education, statistics
|
22 |
-
- **API Endpoints**: Direct access to search and statistics
|
23 |
-
- **Real-time Chat**: Interactive conversational interface
|
24 |
-
|
25 |
-
## Knowledge Base
|
26 |
-
|
27 |
-
This Space loads comprehensive information from:
|
28 |
-
|
29 |
-
- **about.md** - Personal information, contact details, professional summary
|
30 |
-
- **research_details.md** - Detailed research projects, methodologies, current work
|
31 |
-
- **publications_detailed.md** - Complete publication details, technical contributions
|
32 |
-
- **skills_expertise.md** - Comprehensive technical skills, tools, frameworks
|
33 |
-
- **experience_detailed.md** - Professional experience, teaching, research roles
|
34 |
-
- **statistics.md** - Statistical methods, biostatistics expertise, methodologies
|
35 |
-
|
36 |
-
## What You Can Ask
|
37 |
-
|
38 |
-
- Research projects and methodologies
|
39 |
-
- Publications with technical details
|
40 |
-
- Technical skills and programming expertise
|
41 |
-
- Educational background and achievements
|
42 |
-
- Professional experience and teaching roles
|
43 |
-
- Statistical methods and biostatistics applications
|
44 |
-
- Awards, recognition, and professional development
|
45 |
-
- Contact information and collaboration opportunities
|
46 |
-
|
47 |
-
## API Usage
|
48 |
|
49 |
-
|
50 |
-
```python
|
51 |
-
import requests
|
52 |
|
53 |
-
response = requests.post(
|
54 |
-
"https://raktimhugging-ragtim-bot.hf.space/api/search",
|
55 |
-
json={"query": "What is Raktim's research about?", "top_k": 5}
|
56 |
-
)
|
57 |
-
results = response.json()
|
58 |
-
```
|
59 |
|
60 |
-
|
61 |
-
|
62 |
-
|
63 |
-
|
64 |
-
|
65 |
|
66 |
-
|
67 |
|
68 |
-
|
69 |
-
|
70 |
-
-
|
71 |
-
-
|
72 |
-
-
|
73 |
|
74 |
-
|
75 |
|
76 |
-
|
77 |
-
- Portfolio websites for intelligent chat assistance
|
78 |
-
- Research collaboration platforms
|
79 |
-
- Academic networking tools
|
80 |
-
- Professional inquiry systems
|
81 |
|
82 |
-
|
83 |
|
84 |
-
|
85 |
-
- **Email**: [email protected]
|
86 |
-
- **Portfolio**: [mondol.me](https://mondol.me)
|
87 |
-
- **Institution**: UNSW Sydney, School of Computer Science & Engineering
|
88 |
|
89 |
-
|
90 |
|
91 |
-
|
92 |
-
|
1 |
+
# 🔥 Hybrid Search RAGtim Bot
|
2 |
|
3 |
+
A hybrid search system that combines semantic vector search with BM25 keyword matching, so results reflect both meaning and exact term overlap.
|
4 |
|
5 |
+
## Features
|
6 |
|
7 |
+
- **Hybrid Search**: Combines transformer-based semantic similarity with BM25 keyword ranking
|
8 |
+
- **Multiple Search Modes**: Vector search, BM25 search, and a weighted fusion of the two
|
9 |
+
- **Real-time API**: RESTful endpoints for integration
|
10 |
+
- **Interactive UI**: Three interfaces - Chat, Advanced Search, and Statistics
|
11 |
+
- **Knowledge Base**: Comprehensive markdown-based knowledge system
|
12 |
+
|
13 |
+
## 🔧 Technology Stack
|
14 |
+
|
15 |
+
- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
|
16 |
+
- **Search**: Custom BM25 implementation + Vector similarity
|
17 |
+
- **Framework**: Gradio 4.44.0
|
18 |
+
- **ML**: Transformers, PyTorch, NumPy
|
19 |
+
- **Deployment**: Hugging Face Spaces
|
20 |
+
|
21 |
+
## Knowledge Base Structure
|
22 |
+
|
23 |
+
The system processes markdown files from the `knowledge_base/` directory:
|
24 |
+
- `about.md` - Personal information and professional summary
|
25 |
+
- `research_details.md` - Research projects and methodologies
|
26 |
+
- `publications_detailed.md` - Publications with technical details
|
27 |
+
- `skills_expertise.md` - Technical skills and expertise
|
28 |
+
- `experience_detailed.md` - Professional experience
|
29 |
+
- `statistics.md` - Statistical methods and biostatistics
|
30 |
+
|
31 |
+
## Search Methods
|
32 |
|
33 |
+
### Hybrid Search (Recommended)
|
34 |
+
Combines semantic and keyword search with configurable weights:
|
35 |
+
- Default: 60% vector + 40% BM25
|
36 |
+
- A reasonable default for most queries
|
37 |
+
- Balances meaning and exact term matching
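
The weighting described above can be sketched as a weighted sum of the two score lists. This is a minimal illustration, not the Space's actual code: the README does not say how vector and BM25 scores are brought onto a common scale, so the min-max normalization here is an assumption, and `min_max_normalize` and `hybrid_scores` are hypothetical names.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(vector_scores, bm25_scores, vector_weight=0.6, bm25_weight=0.4):
    """Weighted sum of normalized vector and BM25 scores, one value per document."""
    norm_vector = min_max_normalize(vector_scores)
    norm_bm25 = min_max_normalize(bm25_scores)
    return [
        vector_weight * v + bm25_weight * k
        for v, k in zip(norm_vector, norm_bm25)
    ]


# Toy scores for three documents (illustrative values only).
vector_scores = [0.82, 0.40, 0.10]
bm25_scores = [1.2, 7.5, 0.0]

fused = hybrid_scores(vector_scores, bm25_scores)
# Document indices ordered from best to worst fused score.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking, fused)
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities stay within [-1, 1]; without it the 60/40 split would not mean what it says.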
|
38 |
|
39 |
+
### Vector Search
|
40 |
+
Pure semantic similarity using transformer embeddings:
|
41 |
+
- Best for conceptual questions
|
42 |
+
- Finds semantically related content
|
43 |
+
- Language-agnostic similarity
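
As a rough sketch of what semantic ranking involves, the snippet below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are toy stand-ins; in the Space they would presumably be 384-dimensional embeddings from `sentence-transformers/all-MiniLM-L6-v2`. The function names are hypothetical and not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # raises ZeroDivisionError for zero vectors


def vector_search(query_vec, doc_vecs, top_k=5):
    """Return up to top_k (doc_index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoded knowledge-base chunks.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]

results = vector_search(query_vec, doc_vecs, top_k=2)
print(results)
```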
|
44 |
|
45 |
+
### BM25 Search
|
46 |
+
Traditional keyword-based ranking:
|
47 |
+
- Excellent for specific terms
|
48 |
+
- TF-IDF-style scoring with term frequency saturation and document length normalization
|
49 |
+
- Fast and interpretable
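
For reference, a compact Okapi BM25 scorer might look like the sketch below. The Space's "custom BM25 implementation" is not shown in this README, so this is only an assumed variant: it uses whitespace-tokenized input and a smoothed IDF with `+ 1` inside the logarithm, and `bm25_scores` is a hypothetical name. The defaults `k1=1.5` and `b=0.75` match the values listed under Configuration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs_tokens for term in set(d))

    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalized in proportion to b.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF; the "+ 1" keeps it positive for common terms.
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # k1 controls how quickly repeated occurrences stop adding score.
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "bm25 ranks documents by keyword overlap".split(),
    "vector search compares embeddings".split(),
    "bm25 bm25 keyword keyword ranking".split(),
]
scores = bm25_scores("bm25 keyword".split(), docs)
print(scores)
```

The second document shares no terms with the query and scores zero, which is the kind of exact-match behavior that makes BM25 easy to interpret.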
|
50 |
|
51 |
+
## 🛠️ API Endpoints
|
52 |
|
53 |
+
### Search API
|
54 |
+
GET /api/stats
|
55 |
+
|
56 |
+
## Configuration
|
57 |
+
|
58 |
+
Key parameters in `config.py`:
|
59 |
+
- `BM25_K1 = 1.5` - Term frequency saturation
|
60 |
+
- `BM25_B = 0.75` - Document length normalization
|
61 |
+
- `DEFAULT_VECTOR_WEIGHT = 0.6` - Hybrid search weighting
|
62 |
+
- `DEFAULT_BM25_WEIGHT = 0.4` - Hybrid search weighting
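
A hypothetical layout for these settings is sketched below. Only the four constant names and values come from the list above; the file's real contents are not shown here, and `normalized_weights` is an illustrative helper (not part of the project) for keeping user-adjusted weights summing to 1.

```python
# Hypothetical config.py layout; only the four values come from the README.
BM25_K1 = 1.5                # term frequency saturation
BM25_B = 0.75                # document length normalization
DEFAULT_VECTOR_WEIGHT = 0.6  # share of the hybrid score from vector search
DEFAULT_BM25_WEIGHT = 0.4    # share of the hybrid score from BM25


def normalized_weights(vector_weight=DEFAULT_VECTOR_WEIGHT,
                       bm25_weight=DEFAULT_BM25_WEIGHT):
    """Return (vector, bm25) weights rescaled to sum to 1 (illustrative helper)."""
    if vector_weight < 0 or bm25_weight < 0:
        raise ValueError("weights must be non-negative")
    total = vector_weight + bm25_weight
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return vector_weight / total, bm25_weight / total


print(normalized_weights())      # the defaults
print(normalized_weights(3, 1))  # arbitrary positive weights are rescaled
```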
|
63 |
+
|
64 |
+
## Deployment
|
65 |
+
|
66 |
+
1. Clone to Hugging Face Spaces
|
67 |
+
2. Ensure all markdown files are in `knowledge_base/`
|
68 |
+
3. The system auto-initializes on startup
|
69 |
+
4. Access via the provided Space URL
|
70 |
|
71 |
+
## 💡 Usage Examples
|
72 |
|
73 |
+
**Chat Interface:**
|
74 |
+
- "What is Raktim's LLM research?"
|
75 |
+
- "Tell me about statistical methods"
|
76 |
+
- "Describe multimodal AI capabilities"
|
77 |
+
|
78 |
+
**Advanced Search:**
|
79 |
+
- Adjust vector/BM25 weights
|
80 |
+
- Compare search methods
|
81 |
+
- Fine-tune result count
|
82 |
+
|
83 |
+
**API Integration:**
|
84 |
+
```python
|
85 |
+
import requests
|
86 |
|
87 |
+
response = requests.get(
|
88 |
+
"https://your-space.hf.space/api/search",
|
89 |
+
params={
|
90 |
+
"query": "machine learning research",
|
91 |
+
"top_k": 5,
|
92 |
+
"search_type": "hybrid"
|
93 |
+
}
|
94 |
+
)
|