Update README.md
README.md
CHANGED
@@ -10,61 +10,52 @@ license: mit
short_description: RAG + LoRA Fine-Tuning for Code Analysis
---

# Fine-Tuned RAG Framework for Code Analysis

This project is a production-ready Retrieval-Augmented Generation (RAG) system specifically designed for code analysis and software development queries. It integrates a fine-tuned Large Language Model (LLM) with a vector database to provide accurate, context-aware answers based on a comprehensive knowledge base of programming concepts.

A key feature of this system is its **automatic fine-tuning process**. On initialization, it automatically fine-tunes the base model (`Salesforce/codegen-350M-mono`) on a curated dataset of code-related questions and answers using Parameter-Efficient Fine-Tuning (PEFT) with LoRA. This ensures the model is specialized for the domain of software engineering, resulting in higher quality and more relevant responses.
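
The project's own `ModelFineTuner` class is not reproduced in this README, but the step it performs can be sketched roughly as follows. This is a minimal, illustrative sketch: the `target_modules` value, the hyperparameters, and the one-line toy dataset are assumptions for illustration, not the project's actual configuration or training data.

```python
# Minimal LoRA fine-tuning sketch with PEFT (illustrative; not the project's exact code).
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # CodeGen ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters; "qkv_proj" is the attention projection in CodeGen-style models.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["qkv_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# A single toy Q&A pair standing in for the curated code-analysis dataset.
examples = [{"text": "Q: What does the repository pattern do?\n"
                     "A: It hides data-access details behind an interface."}]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=256),
    remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

On a Space without a GPU this startup step is the slow part, which is why the Disclaimer below recommends a GPU-accelerated environment.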

## Core Features

* **Automatic Model Fine-Tuning**: The system automatically fine-tunes a code-specific language model (`Salesforce/codegen-350M-mono`) on startup using LoRA to specialize it for code analysis tasks.
* **Retrieval-Augmented Generation (RAG)**: Leverages a `ChromaDB` vector store and a `sentence-transformers` model to retrieve the most relevant documents from a knowledge base to ground the LLM's responses in factual information.
* **Code-Specific Knowledge Base**: The system is pre-loaded with a detailed knowledge base covering software architecture patterns, clean code principles, testing strategies, performance optimization, and API design best practices.
* **Comprehensive Evaluation Metrics**: Every generated response is evaluated in real-time for relevance, context grounding, hallucination, and technical accuracy (see the scoring sketch after this list). The system also calculates a performance improvement score based on whether the fine-tuned model is active.
* **Performance & Cost Tracking**: A built-in `PerformanceTracker` monitors system usage, including query latency, tokens processed, and estimated operational costs, providing a full overview of system efficiency.
* **Source Attribution**: To ensure transparency and trustworthiness, every answer is accompanied by a list of the source documents from the knowledge base that were used to generate the response.
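
The scoring sketch referenced in the Comprehensive Evaluation Metrics bullet above: the README does not specify how the scores are computed, so the embedding-based approach below is an assumption for illustration (cosine similarity via `sentence-transformers`). The project's actual metric definitions, and its technical-accuracy check, may differ.

```python
# Illustrative embedding-based scoring (assumed approach, not the project's actual code).
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def evaluate_response(query: str, answer: str, context_chunks: list) -> dict:
    q_emb, a_emb = embedder.encode([query, answer], convert_to_tensor=True)
    ctx_embs = embedder.encode(context_chunks, convert_to_tensor=True)

    relevance = float(util.cos_sim(q_emb, a_emb))           # answer vs. question
    grounding = float(util.cos_sim(a_emb, ctx_embs).max())  # best-supported by a source chunk
    return {
        "relevance": round(relevance, 3),
        "context_grounding": round(grounding, 3),
        "hallucination": round(1.0 - grounding, 3),         # lower is better
    }

print(evaluate_response(
    "What is dependency injection?",
    "Dependency injection passes a class's collaborators in from outside rather than constructing them internally.",
    ["Dependency injection supplies an object's dependencies from the outside, which improves testability."]))
```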

## How It Works

The system follows an automated pipeline from startup to query processing (a condensed code sketch of the retrieval and generation steps follows the numbered list):

1. **Automatic Fine-Tuning (On Startup)**: The `ModelFineTuner` class initiates an automatic fine-tuning process. It loads the base `Salesforce/codegen-350M-mono` model, applies a LoRA configuration, and trains it on a specialized dataset of code analysis Q&A pairs. The resulting fine-tuned model is then used for generation.
2. **Knowledge Base Indexing**: The `RAGSystem` class initializes a `ChromaDB` vector store. It processes and chunks the provided code documentation, computes embeddings using a `SentenceTransformer` model, and indexes these embeddings for efficient retrieval.
3. **Query Processing**: A user submits a query through the Gradio interface.
4. **Retrieval**: The system encodes the user's query into an embedding and uses it to search the `ChromaDB` vector store, retrieving the top-k most relevant text chunks from the knowledge base.
5. **Generation**: The retrieved chunks are formatted as context and prepended to the user's query in a prompt. This combined prompt is then passed to the fine-tuned LLM, which generates a context-aware and accurate answer.
6. **Evaluation & Display**: The final response is evaluated for quality, and all relevant information—the answer, sources, metrics, and performance data—is presented to the user in the interactive dashboard.
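
Steps 2, 4, and 5 can be condensed into a short sketch. The collection name, the toy knowledge-base chunks, and the prompt template below are illustrative assumptions; the project's `RAGSystem` will differ in its chunking, prompt format, and generation settings.

```python
# Condensed retrieval-and-generation loop (steps 2, 4, and 5), for illustration only.
import chromadb
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("code_knowledge")

# 2. Index the knowledge base (toy chunks standing in for the real documentation).
chunks = ["The repository pattern isolates data access behind an interface.",
          "Unit tests should be fast, isolated, and deterministic."]
collection.add(ids=[f"doc-{i}" for i in range(len(chunks))],
               documents=chunks,
               embeddings=embedder.encode(chunks).tolist())

# 4. Retrieve the top-k chunks for a query.
query = "How should I structure data access in my service?"
hits = collection.query(query_embeddings=embedder.encode([query]).tolist(), n_results=2)
context = "\n".join(hits["documents"][0])

# 5. Generate an answer grounded in the retrieved context.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=80, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In the Space itself, the model used for generation is the LoRA-adapted one produced at startup, not the base checkpoint loaded here.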

## Technical Stack

* **AI & Machine Learning**: `transformers`, `peft`, `bitsandbytes`, `torch`, `accelerate`
* **Retrieval & Vector Search**: `chromadb`, `sentence-transformers`, `langchain`
* **Data Processing**: `pandas`, `numpy`, `datasets`
* **Web Interface & Dashboard**: `gradio`
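
A quick, optional way to confirm that the stack above is available before launching the demo locally (package names are taken from the list above; note that `sentence-transformers` is imported as `sentence_transformers`):

```python
# Check that the dependencies listed above are importable in the current environment.
import importlib.util

packages = ["transformers", "peft", "bitsandbytes", "torch", "accelerate",
            "chromadb", "sentence_transformers", "langchain",
            "pandas", "numpy", "datasets", "gradio"]

missing = [p for p in packages if importlib.util.find_spec(p) is None]
print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```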

## How to Use the Demo

The interface is designed for simplicity and provides a wealth of information with each query.

1. Enter a question related to software development, architecture, or best practices into the text box. You can use the provided sample queries as inspiration.
2. Click the **Analyze Query** button.
3. Review the output in the panels:
* **Analysis Result**: The generated answer from the fine-tuned RAG system.
* **Referenced Sources**: The documents from the knowledge base used to formulate the answer.
* **Response Metrics**: A detailed breakdown of the response quality, including relevance, grounding, and technical accuracy scores.
* **Performance Data**: Information on the query processing time, tokens used, and estimated cost.
* **System Statistics**: An overview of cumulative usage and performance.
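
The panel layout described above maps naturally onto a Gradio `Blocks` app. The sketch below is a rough wiring diagram only: `answer_query` is a placeholder for the real RAG pipeline, the component labels mirror the panels listed above, and the System Statistics panel is omitted for brevity.

```python
# Rough wiring sketch of the demo's interface in Gradio (names and values are placeholders).
import gradio as gr

def answer_query(query: str):
    # Placeholder for the real pipeline: retrieve, generate, evaluate, track performance.
    answer = f"(generated answer for: {query})"
    sources = "architecture_patterns, api_design_best_practices"
    metrics = {"relevance": 0.87, "context_grounding": 0.91, "hallucination": 0.09}
    performance = {"latency_s": 2.4, "tokens": 312, "estimated_cost_usd": 0.0004}
    return answer, sources, metrics, performance

with gr.Blocks() as demo:
    query = gr.Textbox(label="Your question")
    analyze = gr.Button("Analyze Query")
    result = gr.Textbox(label="Analysis Result", lines=6)
    sources = gr.Textbox(label="Referenced Sources")
    metrics = gr.JSON(label="Response Metrics")
    performance = gr.JSON(label="Performance Data")
    analyze.click(answer_query, inputs=query, outputs=[result, sources, metrics, performance])

demo.launch()
```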

## Disclaimer

This project is a demonstration of a production-ready RAG system for a specialized domain. The automatic fine-tuning process is computationally intensive and may be slow on CPU. For optimal performance, running this demo on a GPU-accelerated environment is recommended. All generated responses are for informational purposes and should be validated by a qualified professional.