Commit 6e1da14 by raktimhugging · verified · 1 Parent(s): 39dacf3

Update README.md

Files changed (1)
  1. README.md +83 -81
README.md CHANGED
@@ -1,92 +1,94 @@
- ---
- title: RAGtim Bot - Raktim's AI Assistant
- emoji: 🤖
- colorFrom: green
- colorTo: blue
- sdk: gradio
- sdk_version: "4.44.0"
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- # 🤖 RAGtim Bot - Raktim's AI Assistant
-
- An intelligent AI assistant powered by Hugging Face Transformers that answers questions about Raktim Mondol's research, expertise, and professional background.
-
- ## 🌟 Features
-
- - **Complete Markdown Knowledge Base**: Loads all portfolio content from markdown files
- - **GPU-Accelerated Search**: Uses `sentence-transformers/all-MiniLM-L6-v2` for semantic similarity
- - **Comprehensive Coverage**: Research, publications, skills, experience, education, statistics
- - **API Endpoints**: Direct access to search and statistics
- - **Real-time Chat**: Interactive conversational interface
-
- ## 📚 Knowledge Base
-
- This Space loads comprehensive information from:
-
- - **about.md** - Personal information, contact details, professional summary
- - **research_details.md** - Detailed research projects, methodologies, current work
- - **publications_detailed.md** - Complete publication details, technical contributions
- - **skills_expertise.md** - Comprehensive technical skills, tools, frameworks
- - **experience_detailed.md** - Professional experience, teaching, research roles
- - **statistics.md** - Statistical methods, biostatistics expertise, methodologies
-
- ## 🔍 What You Can Ask
-
- - Research projects and methodologies
- - Publications with technical details
- - Technical skills and programming expertise
- - Educational background and achievements
- - Professional experience and teaching roles
- - Statistical methods and biostatistics applications
- - Awards, recognition, and professional development
- - Contact information and collaboration opportunities
-
- ## 🚀 API Usage
 
- ### Search API
- ```python
- import requests
 
- response = requests.post(
-     "https://raktimhugging-ragtim-bot.hf.space/api/search",
-     json={"query": "What is Raktim's research about?", "top_k": 5}
- )
- results = response.json()
- ```
 
- ### Stats API
- ```python
- response = requests.get("https://raktimhugging-ragtim-bot.hf.space/api/stats")
- stats = response.json()
- ```
 
- ## 🔧 Technical Details
 
- - **Model**: sentence-transformers/all-MiniLM-L6-v2
- - **Embedding Dimension**: 384
- - **Search Type**: Semantic similarity with relevance scoring
- - **Knowledge Sections**: 50+ sections across 6 markdown files
- - **GPU Acceleration**: Automatic CUDA detection and usage
 
- ## 🌐 Integration
 
- This Space can be integrated with:
- - Portfolio websites for intelligent chat assistance
- - Research collaboration platforms
- - Academic networking tools
- - Professional inquiry systems
 
- ## 📞 Contact
 
- For questions about Raktim Mondol or collaboration opportunities:
- - **Email**: [email protected]
- - **Portfolio**: [mondol.me](https://mondol.me)
- - **Institution**: UNSW Sydney, School of Computer Science & Engineering
 
- ---
 
- **Built with**: Gradio, Hugging Face Transformers, PyTorch
- **Powered by**: GPU-accelerated semantic search and comprehensive markdown knowledge base
+ # 🔥 Hybrid Search RAGtim Bot
 
+ A sophisticated hybrid search system combining semantic vector search with BM25 keyword matching for optimal information retrieval.
 
+ ## 🚀 Features
 
+ - **Hybrid Search**: Combines transformer-based semantic similarity with BM25 keyword ranking
+ - **Multi-Modal Search**: Vector search, BM25 search, and intelligent fusion
+ - **Real-time API**: RESTful endpoints for integration
+ - **Interactive UI**: Three interfaces - Chat, Advanced Search, and Statistics
+ - **Knowledge Base**: Comprehensive markdown-based knowledge system
+
+ ## 🔧 Technology Stack
+
+ - **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 (384-dim)
+ - **Search**: Custom BM25 implementation + Vector similarity
+ - **Framework**: Gradio 4.44.0
+ - **ML**: Transformers, PyTorch, NumPy
+ - **Deployment**: Hugging Face Spaces
+
+ ## 📚 Knowledge Base Structure
+
+ The system processes markdown files from the `knowledge_base/` directory:
+ - `about.md` - Personal information and professional summary
+ - `research_details.md` - Research projects and methodologies
+ - `publications_detailed.md` - Publications with technical details
+ - `skills_expertise.md` - Technical skills and expertise
+ - `experience_detailed.md` - Professional experience
+ - `statistics.md` - Statistical methods and biostatistics
+
+ ## 🔍 Search Methods
 
+ ### Hybrid Search (Recommended)
+ Combines semantic and keyword search with configurable weights:
+ - Default: 60% vector + 40% BM25
+ - Optimal for most queries
+ - Balances meaning and exact term matching
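
The 60/40 weighting above can be sketched as a weighted sum of per-document scores. This is a minimal illustration, not the Space's actual code: the function names `min_max` and `hybrid_scores`, the min-max normalization step, and the toy scores are all assumptions. Normalization is included because cosine similarities and BM25 scores live on different scales.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1]; a constant array maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def hybrid_scores(vector_scores, bm25_scores, vector_weight=0.6, bm25_weight=0.4):
    """Weighted sum of normalized vector and BM25 scores (one entry per document)."""
    return vector_weight * min_max(vector_scores) + bm25_weight * min_max(bm25_scores)

# Toy scores for three documents (made-up numbers, for illustration only).
vector_scores = [0.82, 0.40, 0.10]
bm25_scores = [1.2, 7.5, 0.0]

fused = hybrid_scores(vector_scores, bm25_scores)
ranking = np.argsort(-fused)  # document indices, best first
```

In this toy case the first document wins narrowly on semantic similarity even though the second has the highest BM25 score; shifting weight toward BM25 would reverse that order.
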
 
+ ### Vector Search
+ Pure semantic similarity using transformer embeddings:
+ - Best for conceptual questions
+ - Finds semantically related content
+ - Language-agnostic similarity
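
Semantic ranking of this kind is commonly done with cosine similarity between a query embedding and each document embedding. The sketch below assumes that approach; `cosine_rank` is a hypothetical helper, and the 3-dimensional vectors stand in for the 384-dimensional embeddings the model would produce.

```python
import numpy as np

def cosine_rank(query_vec, doc_matrix, top_k=5):
    """Return (doc_index, cosine_similarity) pairs, most similar first."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_matrix = np.asarray(doc_matrix, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]

# Toy embeddings (illustrative only; real ones would come from the encoder).
query = [1.0, 0.0, 0.0]
docs = [
    [2.0, 0.0, 0.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
result = cosine_rank(query, docs, top_k=2)
```
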
 
+ ### BM25 Search
+ Traditional keyword-based ranking:
+ - Excellent for specific terms
+ - TF-IDF with document length normalization
+ - Fast and interpretable
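
For reference, a compact version of Okapi BM25 scoring over pre-tokenized documents might look like the following. It is a sketch under stated assumptions, not the Space's custom implementation: the function name `bm25_scores`, whitespace tokenization, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` are choices made here for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "bm25 ranks documents by keyword overlap".split(),
    "vector search compares sentence embeddings".split(),
    "hybrid search mixes bm25 and vector search".split(),
]
scores = bm25_scores(["bm25", "keyword"], docs)
```

The first document matches both query terms and scores highest, the third matches only `bm25`, and the second matches nothing and scores zero.
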
 
+ ## 🛠️ API Endpoints
 
+ ### Search API
+ GET /api/stats
+
+ ## 📊 Configuration
+
+ Key parameters in `config.py`:
+ - `BM25_K1 = 1.5` - Term frequency saturation
+ - `BM25_B = 0.75` - Document length normalization
+ - `DEFAULT_VECTOR_WEIGHT = 0.6` - Hybrid search weighting
+ - `DEFAULT_BM25_WEIGHT = 0.4` - Hybrid search weighting
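
To see what the two BM25 parameters do, it helps to isolate the term-frequency part of the standard BM25 formula. The snippet below is an illustrative sketch: `tf_component` is a hypothetical helper, and only the constant names and values are taken from the list above.

```python
BM25_K1 = 1.5   # term frequency saturation
BM25_B = 0.75   # document length normalization

def tf_component(tf, doc_len, avg_len, k1=BM25_K1, b=BM25_B):
    """Term-frequency factor of BM25 for one term in one document."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# Average-length document: each repeat of a term adds less than the one before,
# and the factor never exceeds k1 + 1.
average_doc = [tf_component(tf, doc_len=100, avg_len=100) for tf in (1, 2, 4, 8)]

# Same term count in a document twice the average length scores lower.
long_doc = tf_component(2, doc_len=200, avg_len=100)
```

Raising `BM25_K1` lets repeated terms keep contributing for longer, while setting `BM25_B` to 0 would switch length normalization off entirely.
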
63
+
64
+ ## πŸš€ Deployment
65
+
66
+ 1. Clone to Hugging Face Spaces
67
+ 2. Ensure all markdown files are in `knowledge_base/`
68
+ 3. The system auto-initializes on startup
69
+ 4. Access via the provided Space URL
70
 
+ ## 💡 Usage Examples
 
+ **Chat Interface:**
+ - "What is Raktim's LLM research?"
+ - "Tell me about statistical methods"
+ - "Describe multimodal AI capabilities"
+
+ **Advanced Search:**
+ - Adjust vector/BM25 weights
+ - Compare search methods
+ - Fine-tune result count
+
+ **API Integration:**
+ ```python
+ import requests
 
+ response = requests.get(
+     "https://your-space.hf.space/api/search",
+     params={
+         "query": "machine learning research",
+         "top_k": 5,
+         "search_type": "hybrid"
+     }
+ )