---
title: Fine-Tuned RAG Framework For Code Analysis
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: mit
short_description: RAG + LoRA Fine-Tuning for Code Analysis
---
# Fine-Tuned RAG Framework for Code Analysis
This project is a production-ready Retrieval-Augmented Generation (RAG) system designed for code analysis and software development queries. It integrates a fine-tuned Large Language Model (LLM) with a vector database to provide accurate, context-aware answers grounded in a knowledge base of programming concepts.

A key feature of this system is its automatic fine-tuning process. On initialization, it fine-tunes the base model (`Salesforce/codegen-350M-mono`) on a curated dataset of code-related questions and answers using Parameter-Efficient Fine-Tuning (PEFT) with LoRA. This specializes the model for the domain of software engineering, resulting in higher-quality, more relevant responses.
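To illustrate the idea behind LoRA (this is a numerical sketch of the technique, not the project's actual training code): rather than updating a full weight matrix `W`, LoRA learns two small matrices `B` and `A` of rank `r` and applies the scaled low-rank update `W' = W + (alpha / r) * B @ A`. The dimensions below are illustrative.

```python
import numpy as np

# LoRA sketch: a frozen weight matrix plus a trainable low-rank update.
# Shapes and the scaling rule follow the LoRA formulation; the concrete
# numbers are illustrative, not taken from this project.
d_out, d_in, r, alpha = 64, 64, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init to 0

# Effective weight after adaptation: W' = W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * (B @ A)

# With B initialized to zero, the adapted model starts identical to the base.
assert np.allclose(W_eff, W)

# Only A and B are trained: r * (d_in + d_out) parameters vs d_in * d_out.
trainable = r * (d_in + d_out)
full = d_in * d_out
print(f"trainable params: {trainable} vs full fine-tuning: {full}")
```

This is why LoRA makes fine-tuning a 350M-parameter model feasible even on modest hardware: only the small adapter matrices receive gradients, while the base weights stay frozen.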
## Core Features
- **Automatic Model Fine-Tuning**: On startup, the system fine-tunes a code-specific language model (`Salesforce/codegen-350M-mono`) using LoRA to specialize it for code analysis tasks.
- **Retrieval-Augmented Generation (RAG)**: Leverages a `ChromaDB` vector store and a `sentence-transformers` model to retrieve the most relevant documents from a knowledge base, grounding the LLM's responses in factual information.
- **Code-Specific Knowledge Base**: Pre-loaded with detailed documentation covering software architecture patterns, clean code principles, testing strategies, performance optimization, and API design best practices.
- **Comprehensive Evaluation Metrics**: Every generated response is evaluated in real time for relevance, context grounding, hallucination, and technical accuracy. The system also calculates a performance improvement score based on whether the fine-tuned model is active.
- **Performance & Cost Tracking**: A built-in `PerformanceTracker` monitors system usage, including query latency, tokens processed, and estimated operational costs, providing a full overview of system efficiency.
- **Source Attribution**: To ensure transparency and trustworthiness, every answer is accompanied by a list of the knowledge-base documents used to generate it.
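A context-grounding check of the kind listed above can be approximated in a few lines. The function below is a hypothetical simplification (token overlap between the answer and the retrieved context), not the project's actual metric:

```python
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.

    Hypothetical stand-in for a context-grounding metric: a low score
    suggests the model may be hallucinating content that was not present
    in the retrieved documents.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)


context = "the repository pattern isolates data access behind an interface"
grounded = grounding_score("repository pattern isolates data access", context)
ungrounded = grounding_score("quantum entanglement powers the cache", context)
assert grounded > ungrounded  # fabricated content scores lower
```

Production metrics would typically use embedding similarity or an LLM judge instead of raw token overlap, but the principle, comparing the answer against the retrieved evidence, is the same.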
## How It Works
The system follows an automated pipeline from startup to query processing:
- **Automatic Fine-Tuning (On Startup)**: The `ModelFineTuner` class loads the base `Salesforce/codegen-350M-mono` model, applies a LoRA configuration, and trains it on a specialized dataset of code analysis Q&A pairs. The resulting fine-tuned model is then used for generation.
- **Knowledge Base Indexing**: The `RAGSystem` class initializes a `ChromaDB` vector store. It chunks the provided code documentation, computes embeddings with a `SentenceTransformer` model, and indexes them for efficient retrieval.
- **Query Processing**: A user submits a query through the Gradio interface.
- **Retrieval**: The system encodes the query into an embedding and searches the `ChromaDB` vector store for the top-k most relevant text chunks.
- **Generation**: The retrieved chunks are formatted as context and prepended to the user's query in a prompt. This combined prompt is passed to the fine-tuned LLM, which generates a context-aware answer.
- **Evaluation & Display**: The final response is evaluated for quality, and the answer, sources, metrics, and performance data are presented to the user in the interactive dashboard.
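The retrieval and prompt-assembly steps above can be sketched without the full stack. Here, NumPy cosine similarity stands in for the `ChromaDB` query, and the toy vectors stand in for `SentenceTransformer` embeddings:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k documents most similar to the query.

    Cosine-similarity search in plain NumPy, standing in for the ChromaDB
    query; in the real system the vectors come from a SentenceTransformer.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(sims)[::-1][:k].tolist()


docs = np.array([[1.0, 0.0, 0.0],   # doc 0
                 [0.9, 0.1, 0.0],   # doc 1: similar to doc 0
                 [0.0, 0.0, 1.0]])  # doc 2: unrelated
query = np.array([1.0, 0.05, 0.0])

retrieved = top_k(query, docs, k=2)
assert set(retrieved) == {0, 1}  # the two aligned documents rank first

# The retrieved chunks are then prepended to the user's question:
prompt = "Context:\n" + "\n".join(f"[doc {i}]" for i in retrieved) + "\n\nQuestion: ..."
```

Grounding the prompt this way is what lets a small model answer accurately: the evidence is supplied at query time rather than memorized during training.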
## Technical Stack
- **AI & Machine Learning**: `transformers`, `peft`, `bitsandbytes`, `torch`, `accelerate`
- **Retrieval & Vector Search**: `chromadb`, `sentence-transformers`, `langchain`
- **Data Processing**: `pandas`, `numpy`, `datasets`
- **Web Interface & Dashboard**: `gradio`
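Assuming standard PyPI package names for the stack above (the project's own requirements file may pin specific versions), the dependencies could be installed locally with:

```shell
pip install transformers peft bitsandbytes torch accelerate \
            chromadb sentence-transformers langchain \
            pandas numpy datasets gradio
```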
## How to Use the Demo
The interface is designed for simplicity and provides a wealth of information with each query.
- Enter a question related to software development, architecture, or best practices into the text box. You can use the provided sample queries as inspiration.
- Click the **Analyze Query** button.
- Review the output in the panels:
  - **Analysis Result**: The generated answer from the fine-tuned RAG system.
  - **Referenced Sources**: The knowledge-base documents used to formulate the answer.
  - **Response Metrics**: A detailed breakdown of response quality, including relevance, grounding, and technical accuracy scores.
  - **Performance Data**: Query processing time, tokens used, and estimated cost.
  - **System Statistics**: An overview of cumulative usage and performance.
## Disclaimer
This project is a demonstration of a production-ready RAG system for a specialized domain. The automatic fine-tuning process is computationally intensive and may be slow on CPU; for best performance, run this demo in a GPU-accelerated environment. All generated responses are for informational purposes and should be validated by a qualified professional.