{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Implementing Retrieval Ranking in Your RAG System (LangChain + ChromaDB)**  \n",
    "To improve the retrieval quality in your **RAG pipeline**, you can implement **retrieval ranking** before passing the retrieved documents to the LLM. This ensures the **most relevant** documents are prioritized.\n",
    "\n",
    "---\n",
    "\n",
    "## **🔹 Why is Retrieval Ranking Important?**\n",
    "When using a **vector database like ChromaDB**, similarity search retrieves the `k` closest documents. However, these documents:\n",
    "1. **May not always be in the best order** based on query relevance.\n",
    "2. **Might contain redundant or less useful information**.\n",
    "3. **Need re-ranking to prioritize more relevant content**.\n",
    "\n",
    "---\n",
    "\n",
    "## **🔹 Methods for Retrieval Ranking**\n",
    "There are multiple ways to rank retrieved documents:\n",
    "1. **Re-rank using an LLM** (best for complex queries).\n",
    "2. **Use BM25 (Lexical Ranking)** alongside embeddings.\n",
    "3. **Apply a trained re-ranking model** (e.g., `ColBERT`, `BGE-reranker`).\n",
    "4. **Score documents using a similarity function**.\n",
    "\n",
    "---\n",
    "\n",
    "## **✅ Approach 1: LLM-Based Re-Ranking (Recommended)**\n",
    "We can use an LLM to **re-rank** retrieved documents based on how well they match the query.\n",
    "\n",
    "### **🔹 Steps:**\n",
    "1. Retrieve `k` documents from ChromaDB.\n",
    "2. Ask the LLM to rank them by relevance.\n",
    "3. Use the **top `n` ranked documents** in the final response generation.\n",
    "\n",
    "### **🔹 Implementation**\n",
    "Modify your `query_index` function to **re-rank documents using an LLM**:\n",
    "\n",
    "```python\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain.schema import AIMessage\n",
    "from langchain.schema.runnable import RunnablePassthrough\n",
    "import datetime\n",
    "import logging\n",
    "\n",
    "logger = logging.getLogger(\"logger\")\n",
    "\n",
    "def query_index(vectordb, query, chain_type=\"stuff\", k=25, model_name=\"gemini-2.0-flash\", \n",
    "                date_filter=None, rerank_top_n=10):\n",
    "    \"\"\"\n",
    "    Queries the vectorstore with retrieval ranking.\n",
    "    \n",
    "    Args:\n",
    "        vectordb: The vector database.\n",
    "        query (str): The query string.\n",
    "        chain_type (str): The chain type.\n",
    "        k (int): The number of documents to retrieve.\n",
    "        model_name (str): The name of the language model.\n",
    "        date_filter (str, optional): A date string (YYYY-MM-DD) to filter documents by. Defaults to None.\n",
    "        rerank_top_n (int): Number of top documents to keep after re-ranking.\n",
    "    \n",
    "    Returns:\n",
    "        str: The answer from the language model.\n",
    "    \"\"\"\n",
    "    llm = ChatGoogleGenerativeAI(model=model_name, temperature=0.3)  \n",
    "\n",
    "    # Create the retriever with filter if date is provided\n",
    "    if date_filter:\n",
    "        try:\n",
    "            filter_date = datetime.datetime.strptime(date_filter, '%Y-%m-%d')\n",
    "            formatted_date = filter_date.strftime('%Y-%m-%d')\n",
    "            logger.info(f\"Date filter with query: {formatted_date}\")\n",
    "        except ValueError:\n",
    "            raise ValueError(\"Invalid date format. Please use YYYY-MM-DD.\")\n",
    "\n",
    "        filter_criteria = {\"date\": { \"$eq\": formatted_date}}\n",
    "        retriever = vectordb.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": k, \"filter\": filter_criteria})\n",
    "    else:\n",
    "        retriever = vectordb.as_retriever(search_kwargs={\"k\": k})\n",
    "\n",
    "    # Retrieve initial `k` documents\n",
    "    retrieved_docs = retriever.invoke(query)\n",
    "    context = \"\\n\\n\".join([doc.page_content for doc in retrieved_docs])  \n",
    "    logger.info(f\"Retrieved {len(retrieved_docs)} documents for query: {query}\")\n",
    "\n",
    "    # Define re-ranking prompt\n",
    "    rerank_prompt = PromptTemplate(\n",
    "        input_variables=[\"query\", \"documents\"],\n",
    "        template=(\n",
    "            \"You are an AI assistant ranking documents based on their relevance to the query.\\n\\n\"\n",
    "            \"Query: {query}\\n\\n\"\n",
    "            \"Documents:\\n{documents}\\n\\n\"\n",
    "            \"Rank the documents from most relevant to least relevant and return the top {rerank_top_n}.\"\n",
    "        ),\n",
    "    )\n",
    "\n",
    "    # Rank retrieved documents using the LLM\n",
    "    rank_chain = rerank_prompt | llm | RunnablePassthrough()\n",
    "    ranked_response = rank_chain.invoke({\"query\": query, \"documents\": context, \"rerank_top_n\": rerank_top_n})\n",
    "\n",
    "    # Extract ranked top `n` documents\n",
    "    ranked_docs = ranked_response.content.split(\"\\n\")[:rerank_top_n]\n",
    "    final_context = \"\\n\\n\".join(ranked_docs)\n",
    "\n",
    "    # Define final QA prompt\n",
    "    qa_prompt = PromptTemplate(\n",
    "        input_variables=[\"context\", \"question\"],\n",
    "        template=(\n",
    "            \"You are an AI assistant retrieving factual and structured information.\\n\"\n",
    "            \"Use the following retrieved documents to answer the question accurately.\\n\\n\"\n",
    "            \"Context: {context}\\n\\n\"\n",
    "            \"Question: {question}\"\n",
    "        ),\n",
    "    )\n",
    "\n",
    "    # Use RunnableSequence instead of deprecated LLMChain\n",
    "    final_chain = qa_prompt | llm | RunnablePassthrough()\n",
    "    response = final_chain.invoke({\"context\": final_context, \"question\": query})\n",
    "\n",
    "    # Ensure response is extracted correctly\n",
    "    if isinstance(response, AIMessage):\n",
    "        return response.content  \n",
    "    else:\n",
    "        return str(response)  \n",
    "```\n",
    "\n",
    "---\n",
    "\n",
    "### **🔹 What’s New in This Implementation?**\n",
    "1. **Retrieve more documents (`k=25`) from ChromaDB.**\n",
    "2. **Re-rank the documents using an LLM** to select the `rerank_top_n=10` most relevant.\n",
    "3. **Use only top-ranked documents for the final response** to the user.\n",
    "\n",
    "---\n",
    "\n",
    "## **✅ Approach 2: Use BM25 for Hybrid Ranking**\n",
    "If you want **lexical matching + embeddings**, use **BM25 + ChromaDB embeddings**:\n",
    "\n",
    "### **Steps**\n",
    "1. Retrieve top `k` documents using **ChromaDB embeddings**.\n",
    "2. Use **BM25 (bag-of-words ranking)** on these documents.\n",
    "3. Take **top-ranked** documents and pass them to the LLM.\n",
    "\n",
    "### **Implementation**\n",
    "- Install `rank_bm25`:\n",
    "  ```bash\n",
    "  pip install rank-bm25\n",
    "  ```\n",
    "- Add BM25 re-ranking to your pipeline:\n",
    "  ```python\n",
    "  from rank_bm25 import BM25Okapi\n",
    "  from langchain.schema import Document\n",
    "\n",
    "  def bm25_rerank(query, retrieved_docs, top_n=10):\n",
    "      \"\"\"Re-rank retrieved documents using BM25.\"\"\"\n",
    "      tokenized_docs = [doc.page_content.split() for doc in retrieved_docs]\n",
    "      bm25 = BM25Okapi(tokenized_docs)\n",
    "      scores = bm25.get_scores(query.split())\n",
    "\n",
    "      # Sort documents based on BM25 score\n",
    "      ranked_docs = [retrieved_docs[i] for i in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)]\n",
    "      return ranked_docs[:top_n]\n",
    "  ```\n",
    "\n",
    "Then, **modify `query_index` to use BM25 ranking** before passing docs to the LLM.\n",
    "\n",
    "---\n",
    "\n",
    "## **🔹 Which Approach Should You Use?**\n",
    "| **Method**       | **Pros** | **Cons** |\n",
    "|------------------|---------|----------|\n",
    "| **LLM-based Re-Ranking** (Approach 1) | Best for complex queries, uses language understanding | Slower, needs API calls |\n",
    "| **BM25 + Embeddings (Approach 2)** | Improves lexical matching, fast & efficient | Doesn't handle semantics well |\n",
    "| **ColBERT / BGE-reranker** | State-of-the-art accuracy, deep semantic ranking | Requires a fine-tuned model |\n",
    "\n",
    "For **best performance**, **combine both methods**:  \n",
    "1. Use **BM25 ranking first** to boost keyword relevance.  \n",
    "2. Then **use LLM-based ranking** on the BM25-ranked results.  \n",
    "\n",
    "---\n",
    "\n",
    "## **✅ Final Takeaways**\n",
    "- **Use LLM re-ranking** if you want **best semantic ranking**.\n",
    "- **Use BM25 + embeddings** for **fast hybrid search**.\n",
    "- **Try ColBERT/BGE-reranker** if you need **state-of-the-art retrieval ranking**.\n",
    "\n",
    "Would you like help setting up **ColBERT/BGE-reranker** in LangChain? 🚀"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ml_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}