SpencerCPurdy committed on
Commit 4988b57 · verified · 1 Parent(s): e7c1dab

Update README.md

Files changed (1)
  1. README.md +30 -39
README.md CHANGED
@@ -10,61 +10,52 @@ license: mit
  short_description: RAG + LoRA Fine-Tuning for Code Analysis
  ---

- # Fine-tuned LLM with RAG for Codebase Analysis
-
- This project demonstrates a production-ready, sophisticated Retrieval-Augmented Generation (RAG) system specifically engineered for codebase analysis. Its core innovation is the **automatic fine-tuning** of a code-specialized language model (`Salesforce/codegen-350M-mono`) on startup. This process creates a highly specialized expert model that provides accurate, context-aware, and reliable answers to complex software engineering questions.
-
- The system is designed to be a transparent and robust framework, featuring detailed performance evaluation, cost tracking, and clear source attribution for every generated response.

  ## Core Features

- This system integrates a complete, automated pipeline for building and querying a specialized code analysis engine.
-
- * **Automatic Model Fine-Tuning**: On initialization, the system automatically fine-tunes the `Salesforce/codegen-350M-mono` model using Parameter-Efficient Fine-Tuning (PEFT/LoRA). This process adapts the model to the specific nuances of software development Q&A, significantly improving its accuracy and relevance.
-
- * **Retrieval-Augmented Generation (RAG)**: The framework leverages a `ChromaDB` vector store and a `sentence-transformers` model to create and query a knowledge base. When a question is asked, the system retrieves the most relevant document chunks to ground the language model's response in factual, verifiable information.
-
- * **Code-Specific Language Model**: The entire system is built upon `Salesforce/codegen-350M-mono`, a powerful model pre-trained specifically on code. This provides a strong foundation for understanding programming concepts, syntax, and architecture.
-
- * **Comprehensive Evaluation Metrics**: Every response is critically evaluated in real-time. The system calculates and displays scores for:
-     * **Relevance**: How closely the answer matches the user's query.
-     * **Context Grounding**: How well the answer is supported by the retrieved documents.
-     * **Hallucination Score**: An estimation of how much the model deviates from the provided context (lower is better).
-     * **Technical Accuracy**: A measure of the response's use of correct technical terminology.
-
- * **Performance & Cost Tracking**: A built-in `PerformanceTracker` monitors key operational metrics, including query latency, the number of tokens processed, and the estimated cost of each interaction, providing insights needed for production deployment.
-
- * **Source Attribution**: To ensure transparency and trust, the system clearly cites the source documents that were used to formulate each answer.

  ## How It Works

- The system follows an automated, multi-stage process to deliver high-quality codebase analysis.
-
- 1. **Initialization & Fine-Tuning**: On the very first run, the system fine-tunes the base CodeGen model using a curated dataset of software development Q&A. This one-time process creates a specialized LoRA adapter, which is then loaded for all subsequent operations.
- 2. **Knowledge Ingestion**: A knowledge base of software engineering documents (covering architecture, testing, best practices, etc.) is processed. Each document is split into manageable chunks, converted into vector embeddings, and indexed into a `ChromaDB` vector store.
- 3. **Query & Retrieval**: When a user submits a query, the system embeds the question and searches the vector store to find the most semantically similar document chunks.
- 4. **Augmented Generation**: The user's query and the retrieved context chunks are combined into a detailed prompt. This prompt is then passed to the fine-tuned language model, which generates a comprehensive, context-aware answer.
- 5. **Evaluation & Presentation**: The final answer, its sources, and the full suite of performance and quality metrics are presented to the user in a clean, interactive Gradio dashboard.

  ## Technical Stack

- * **LLM & Fine-Tuning**: Transformers, PEFT (LoRA), PyTorch, BitsAndBytes
- * **Retrieval & Embeddings**: ChromaDB, Sentence-Transformers, LangChain
- * **Core Data Science**: Pandas, NumPy, Scikit-learn
- * **Web Interface**: Gradio
- * **Core Language**: Python

  ## How to Use the Demo

- The interface is designed for simplicity and clarity.
-
- 1. **Wait for Initialization**: The first time the application starts, it will perform the automatic fine-tuning process. A status banner will indicate when the fine-tuned model is active.
- 2. **Ask a Question**: Use the text box to ask a question related to software development, such as "What is microservices architecture?" or "Explain test-driven development."
- 3. **Analyze Query**: Click the "Analyze Query" button to submit your question.
- 4. **Review the Results**:
-     * The generated **Analysis Result** will appear on the left.
-     * The **Referenced Sources**, **Response Metrics**, and **Performance Data** will be displayed on the right, giving you a complete overview of the system's operation for your query.

  ## Disclaimer

- This application is a demonstration of a sophisticated RAG and fine-tuning pipeline. The knowledge base is pre-loaded with general software engineering documents and does not reflect any specific proprietary codebase.

  short_description: RAG + LoRA Fine-Tuning for Code Analysis
  ---

+ # Fine-Tuned RAG Framework for Code Analysis
+
+ This project is a production-ready Retrieval-Augmented Generation (RAG) system designed specifically for code analysis and software development queries. It integrates a fine-tuned Large Language Model (LLM) with a vector database to provide accurate, context-aware answers drawn from a comprehensive knowledge base of programming concepts.
+
+ A key feature of this system is its **automatic fine-tuning process**. On initialization, it fine-tunes the base model (`Salesforce/codegen-350M-mono`) on a curated dataset of code-related questions and answers using Parameter-Efficient Fine-Tuning (PEFT) with LoRA. This specializes the model for the software engineering domain, resulting in higher-quality, more relevant responses.

  ## Core Features

+ * **Automatic Model Fine-Tuning**: The system automatically fine-tunes a code-specific language model (`Salesforce/codegen-350M-mono`) on startup using LoRA to specialize it for code analysis tasks.
+ * **Retrieval-Augmented Generation (RAG)**: Leverages a `ChromaDB` vector store and a `sentence-transformers` model to retrieve the most relevant documents from the knowledge base and ground the LLM's responses in factual information.
+ * **Code-Specific Knowledge Base**: The system is pre-loaded with a detailed knowledge base covering software architecture patterns, clean code principles, testing strategies, performance optimization, and API design best practices.
+ * **Comprehensive Evaluation Metrics**: Every generated response is evaluated in real time for relevance, context grounding, hallucination, and technical accuracy. The system also calculates a performance improvement score based on whether the fine-tuned model is active.
+ * **Performance & Cost Tracking**: A built-in `PerformanceTracker` monitors system usage, including query latency, tokens processed, and estimated operational cost, providing a full overview of system efficiency.
+ * **Source Attribution**: To ensure transparency and trustworthiness, every answer is accompanied by a list of the knowledge-base source documents used to generate it.

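A rough intuition for the evaluation metrics: relevance, grounding, and hallucination can be approximated with token-overlap heuristics. The sketch below is illustrative only; the function names and formulas are assumptions, not the project's actual implementation (which may use embedding similarity instead):

```python
import re

def _tokens(text: str) -> set:
    # Lowercased word tokens; a crude stand-in for embedding-based comparison
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def context_grounding(answer: str, context: str) -> float:
    # Fraction of answer tokens that also appear in the retrieved context
    ans = _tokens(answer)
    if not ans:
        return 0.0
    return len(ans & _tokens(context)) / len(ans)

def hallucination_score(answer: str, context: str) -> float:
    # Share of answer tokens unsupported by the context (lower is better)
    return 1.0 - context_grounding(answer, context)

def relevance(answer: str, query: str) -> float:
    # Jaccard overlap between query tokens and answer tokens
    q, a = _tokens(query), _tokens(answer)
    return len(q & a) / len(q | a) if q | a else 0.0
```

In practice a scorer like this is only a baseline; swapping `_tokens` overlap for cosine similarity between sentence embeddings gives much better estimates.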
  ## How It Works

+ The system follows an automated pipeline from startup to query processing:
+
+ 1. **Automatic Fine-Tuning (On Startup)**: The `ModelFineTuner` class initiates an automatic fine-tuning process: it loads the base `Salesforce/codegen-350M-mono` model, applies a LoRA configuration, and trains it on a specialized dataset of code analysis Q&A pairs. The resulting fine-tuned model is then used for generation.
+ 2. **Knowledge Base Indexing**: The `RAGSystem` class initializes a `ChromaDB` vector store. It chunks the provided code documentation, computes embeddings with a `SentenceTransformer` model, and indexes them for efficient retrieval.
+ 3. **Query Processing**: A user submits a query through the Gradio interface.
+ 4. **Retrieval**: The system encodes the user's query into an embedding and searches the `ChromaDB` vector store for the top-k most relevant text chunks from the knowledge base.
+ 5. **Generation**: The retrieved chunks are formatted as context and prepended to the user's query in a prompt, which is then passed to the fine-tuned LLM to generate a context-aware, accurate answer.
+ 6. **Evaluation & Display**: The final response is evaluated for quality, and the answer, sources, metrics, and performance data are presented to the user in the interactive dashboard.

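The chunk-embed-retrieve-prompt loop in steps 2, 4, and 5 can be sketched in miniature. The snippet below substitutes a bag-of-words vector for a real `SentenceTransformer` embedding, and all names (`chunk`, `retrieve`, `build_prompt`) are illustrative, not the project's API:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 8) -> list:
    # Split a document into fixed-size word windows (real splitters add overlap)
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Bag-of-words "embedding"; the real system uses a sentence-transformers model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    # Rank chunks by similarity to the query, mirroring a ChromaDB top-k search
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list) -> str:
    # Augmented generation: prepend the retrieved context to the user's question
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

With a real stack, `embed` would call `SentenceTransformer.encode` and `retrieve` would delegate to a ChromaDB collection query; the shape of the flow stays the same.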
  ## Technical Stack

+ * **AI & Machine Learning**: `transformers`, `peft`, `bitsandbytes`, `torch`, `accelerate`
+ * **Retrieval & Vector Search**: `chromadb`, `sentence-transformers`, `langchain`
+ * **Data Processing**: `pandas`, `numpy`, `datasets`
+ * **Web Interface & Dashboard**: `gradio`

  ## How to Use the Demo

+ The interface is designed for simplicity and provides detailed information with each query.
+
+ 1. Enter a question about software development, architecture, or best practices into the text box. You can use the provided sample queries as inspiration.
+ 2. Click the **Analyze Query** button.
+ 3. Review the output panels:
+    * **Analysis Result**: The generated answer from the fine-tuned RAG system.
+    * **Referenced Sources**: The knowledge-base documents used to formulate the answer.
+    * **Response Metrics**: A detailed breakdown of response quality, including relevance, grounding, and technical accuracy scores.
+    * **Performance Data**: The query processing time, tokens used, and estimated cost.
+    * **System Statistics**: An overview of cumulative usage and performance.

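A minimal sketch of the bookkeeping behind the Performance Data and System Statistics panels. The class shape and the flat per-token price are assumptions for illustration, not the project's actual `PerformanceTracker`:

```python
class PerformanceTracker:
    """Accumulates per-query latency, token counts, and estimated cost."""

    def __init__(self, cost_per_1k_tokens: float = 0.002):  # assumed flat rate
        self.cost_per_1k_tokens = cost_per_1k_tokens
        self.queries = []

    def record(self, latency_s: float, tokens: int) -> dict:
        # One entry per query: what the Performance Data panel would show
        entry = {
            "latency_s": latency_s,
            "tokens": tokens,
            "cost_usd": tokens / 1000 * self.cost_per_1k_tokens,
        }
        self.queries.append(entry)
        return entry

    def stats(self) -> dict:
        # Cumulative figures: what the System Statistics panel would show
        n = len(self.queries)
        return {
            "total_queries": n,
            "avg_latency_s": sum(q["latency_s"] for q in self.queries) / n if n else 0.0,
            "total_tokens": sum(q["tokens"] for q in self.queries),
            "total_cost_usd": sum(q["cost_usd"] for q in self.queries),
        }
```

Recording one entry per query keeps both per-query and cumulative views derivable from the same list, which is all a single-process Gradio demo needs.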
  ## Disclaimer

+ This project is a demonstration of a production-ready RAG system for a specialized domain. The automatic fine-tuning process is computationally intensive and may be slow on CPU; for optimal performance, running this demo in a GPU-accelerated environment is recommended. All generated responses are for informational purposes and should be validated by a qualified professional.