Spaces:
Building
A newer version of the Gradio SDK is available:
5.33.0
RAG_BOT: Telegram RAG Agent to answer spiritual questions
This project implements a Telegram bot powered by a Retrieval-Augmented Generation (RAG) agent built with Langchain and LangGraph. The agent is designed to answer questions based on a collection of spiritual documents stored in a ChromaDB vector store.
Features
- Telegram Interface: Interact with the RAG agent directly through Telegram.
- RAG Pipeline: Utilizes a sophisticated LangGraph agent for:
- Retrieving relevant context from spiritual documents based on user queries.
- Evaluating the relevance of retrieved context.
- Reframing the user's query if the initial context is insufficient.
- Generating answers grounded in the retrieved documents using Google's Gemini models.
- Vector Store: Uses ChromaDB to store and query document embeddings.
- Document Indexing: Upload PDF documents directly to the Telegram bot for automatic indexing into the vector store.
- Date Filtering: Supports filtering queries by date using the format
date:YYYY-MM-DD
within the/query
command or general messages. - Session Management: Basic in-memory session handling for conversation context (via
MessageHandler
). - Webhook Deployment: Designed for deployment using Flask and Telegram webhooks.
- Configuration: Centralized configuration management (
config.py
). - Logging: Structured logging for monitoring and debugging (
logger.py
). - Integration Tests: Includes tests to verify core functionalities like indexing, retrieval, and agent logic.
Technology Stack
- Python: Core programming language.
- Langchain & LangGraph: Framework for building the RAG agent and defining the workflow.
- Google Generative AI (Gemini): LLM used for understanding queries, evaluating context, reframing questions, and generating answers.
- ChromaDB: Vector database for storing and retrieving document embeddings.
- Sentence Transformers: (via
langchain-huggingface
) For generating document embeddings. - pyTelegramBotAPI: Library for interacting with the Telegram Bot API.
- Flask: Web framework for handling Telegram webhooks.
How It Works
Think of the bot as an intelligent assistant that uses a specific process to answer your questions, especially those about the indexed spiritual documents. Here's the flow:
- Query Analysis: First, the agent analyzes your question. Does it need to consult the documents, or is it a general question it already knows?
- Smart Retrieval: If documents are needed, it searches the knowledge base (ChromaDB) for the most relevant snippets based on your query.
- Relevance Check: The agent doesn't just blindly use what it finds. It evaluates if the retrieved information actually helps answer your original question.
- Self-Correction Loop: If the initial context isn't good enough, the agent smartly rephrases the query and tries searching again – a built-in retry mechanism for better results.
- Grounded Generation: Finally, using the validated context, the agent generates a clear answer. If no relevant context is found even after the retry, it informs the user gracefully.
This graph-based approach allows the agent to dynamically decide its path, evaluate its own findings, and even self-correct, leading to more accurate and relevant answers.
Workflow Diagram
The following diagram visualizes the agent's workflow:
Setup
Clone the repository:
git clone <your-repository-url> cd RAG_BOT
Create a virtual environment:
python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install dependencies:
pip install -r requirements.txt
Configure Environment Variables: Create a
.env
file in the project root directory and add the following variables:# Telegram TELEGRAM_BOT_TOKEN="YOUR_TELEGRAM_BOT_TOKEN" WEBHOOK_URL="YOUR_PUBLIC_HTTPS_URL_FOR_WEBHOOK" # e.g., from ngrok or your deployment # Google Gemini GEMINI_API_KEY="YOUR_GOOGLE_API_KEY" # Paths (adjust if needed) VECTOR_STORE_PATH="./chroma_db" # Default path for ChromaDB # Agent/Model Config (adjust defaults in config.py or override here) # LLM_MODEL_NAME="gemini-1.5-flash-latest" # Or another compatible Gemini model # EMBEDDING_MODEL_NAME="all-MiniLM-L6-v2" # TEMPERATURE=0.1 # K=10 # Number of documents to retrieve # SEARCH_TYPE="similarity" # Or "mmr" # SEMANTIC_CHUNKING=True # Or False # MAX_CONVERSATION_HISTORY=10 # PORT=5000 # Port for Flask app
- Replace placeholders with your actual credentials and desired settings.
- Ensure the
WEBHOOK_URL
is a publicly accessible HTTPS URL pointing to where your Flask app will run. Tools likengrok
can be useful for local development.
Usage
Start the Bot: Run the Flask application:
python bot.py
This will start the Flask server and set up the Telegram webhook.
Interact with the Bot on Telegram:
- Find your bot on Telegram.
- Send
/start
to initiate interaction. - Send
/help
to see available commands. - Upload PDFs: Send PDF documents directly to the chat to have them indexed.
- Query Documents:
- Use the
/query
command:/query What is the essence of the Murli?
- Query with a date filter:
/query Summarize the main points date:1969-01-18
- Send a general message:
What were the main points about soul consciousness on 1969-01-23?
(The agent will attempt retrieval).
- Use the
- General Questions: Ask general knowledge questions (e.g., "What is the capital of France?"). The agent should answer directly without using the retrieval tool.
Running Tests
Navigate to the project root directory and run the integration tests using unittest
:
python -m unittest discover -s RAG_BOT/tests/integration -p 'test_*.py'
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests to improve the project.
License
This project is licensed under the MIT License. See the LICENSE
file for details.