Maaac committed · verified
Commit 988ef73 · 1 Parent(s): 8e8eaf1

Update README.md

Files changed (1): README.md (+249 −251)
---
license: mit
tags:
- codellama
- linux
- bugfix
- lora
- qlora
- git-diff
base_model: codellama/CodeLLaMA-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation
---

# CodeLLaMA-Linux-BugFix

A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches based on buggy C code and commit messages.

---

## 🎯 Overview

This project targets automated Linux kernel bug fixing by:

- **Mining real commit data** from the kernel Git history
- **Training a specialized QLoRA model** on diff-style fixes
- **Generating Git patches** in response to bug-prone code
- **Evaluating results** using BLEU, ROUGE, and human inspection

The model achieves strong performance in generating accurate Linux kernel bug fixes, making it a valuable tool for automated code review and bug detection.

---

## 📊 Performance Results

### Evaluation Metrics

✅ **BLEU Score**: 33.87

✅ **ROUGE Scores**:
- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612

These results demonstrate the model's ability to:
- Generate syntactically correct Git diff patches
- Maintain semantic similarity to reference fixes
- Produce meaningful code changes that address the underlying bugs

---

## 🧠 Model Configuration

- **Base model**: `CodeLLaMA-7B-Instruct`
- **Fine-tuning method**: QLoRA with 4-bit quantization
- **Training setup**:
  - LoRA r=64, alpha=16, dropout=0.1
  - Batch size: 64, LR: 2e-4, Epochs: 3
  - Mixed precision (bfloat16), gradient checkpointing
- **Hardware**: Optimized for NVIDIA H200 GPUs
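
A minimal sketch of how the setup above maps onto `transformers` and `peft` configuration objects; the LoRA target modules are an assumption, and the authoritative values live in `train/train_codellama_qlora_linux_bugfix.py`:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# QLoRA: 4-bit NF4 quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters as listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # assumed targets, not confirmed by the repo
    task_type="CAUSAL_LM",
)
```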

---

## 📊 Dataset

Custom dataset extracted from Linux kernel Git history.

### Filtering Criteria
Bug-fix commits containing:
`fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
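
A minimal sketch of this message filter (illustrative only; the actual extraction logic lives in `dataset_builder/extract_linux_bugfixes_parallel.py`):

```python
import re

# Keywords that mark a commit as a likely bug fix
BUGFIX_KEYWORDS = [
    "fix", "bug", "crash", "memory", "null",
    "panic", "overflow", "race", "corruption",
]
_KEYWORD_RE = re.compile("|".join(BUGFIX_KEYWORDS), re.IGNORECASE)

def is_bugfix_commit(message: str) -> bool:
    """Return True if the commit message matches any bug-fix keyword."""
    return bool(_KEYWORD_RE.search(message))
```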

### Structure
- Language: C (`.c`, `.h`)
- Context: 10 lines before/after the change
- Format:

```json
{
  "input": {
    "original code": "C code snippet with bug",
    "instruction": "Commit message or fix description"
  },
  "output": {
    "diff codes": "Git diff showing the fix"
  }
}
```

* **File**: `training_data_100k.jsonl` (100,000 samples)
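
Each line of the JSONL file is one such record; a short sketch of reading it, using the field names documented above:

```python
import json

# Iterate (code, instruction, diff) triples from the training file
with open("dataset/training_data_100k.jsonl") as f:
    for line in f:
        sample = json.loads(line)
        code = sample["input"]["original code"]       # buggy C snippet
        instruction = sample["input"]["instruction"]  # commit message
        diff = sample["output"]["diff codes"]         # reference fix
```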

---

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- 50GB+ disk space

### Install dependencies

```bash
pip install -r requirements.txt
```

### 1. Build the Dataset

```bash
cd dataset_builder
python extract_linux_bugfixes_parallel.py
python format_for_training.py
```

### 2. Fine-tune the Model

```bash
cd train
python train_codellama_qlora_linux_bugfix.py
```

### 3. Run Evaluation

```bash
cd evaluate
python evaluate_linux_bugfix_model.py
```

### 4. Use the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then apply the fine-tuned LoRA adapter
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")

# Generate a bug fix
prompt = """
Given the following original C code:
if (!file->filter)
    return;

Instruction: Fix the null pointer dereference

Return the diff that fixes it:
"""

inputs = tokenizer(prompt, return_tensors="pt")
# Sample with a low temperature; max_new_tokens bounds the generated diff
# rather than the whole sequence (prompt included).
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fix)
```
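
If GPU memory is tight, the base model can also be loaded in 4-bit for inference, mirroring the QLoRA training setup. This variant is a sketch under assumed quantization settings, not taken from the repository scripts:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 loading, as in QLoRA training (settings assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLLaMA-7b-Instruct-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
```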

---

## 📁 Project Structure

```
CodeLLaMA-Linux-BugFix/
├── dataset_builder/
│   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes
│   ├── format_for_training.py                # Format data for training
│   └── build_dataset.py                      # Main dataset builder
├── dataset/
│   ├── training_data_100k.jsonl              # 100K training samples
│   └── training_data_prompt_completion.jsonl # Formatted training data
├── train/
│   ├── train_codellama_qlora_linux_bugfix.py # Main training script
│   ├── train_codellama_qlora_simple.py       # Simplified training
│   ├── download_codellama_model.py           # Model download utility
│   └── output/
│       └── qlora-codellama-bugfix/           # Trained model checkpoints
├── evaluate/
│   ├── evaluate_linux_bugfix_model.py        # Evaluation script
│   ├── test_samples.jsonl                    # Test dataset
│   └── output/                               # Evaluation results
│       ├── eval_results.csv                  # Detailed results
│       └── eval_results.json                 # JSON format results
├── requirements.txt                          # Python dependencies
├── README.md                                 # This file
└── PROJECT_STRUCTURE.md                      # Detailed project overview
```
 
189
+ ---
190
 
191
+ ## 🧩 Features
 
 
 
 
192
 

* 🔧 **Efficient Fine-tuning**: QLoRA + 4-bit quantization = massive memory savings
* 🧠 **Real-world commits**: From actual Linux kernel development
* 💡 **Context-aware**: Code context extraction around bug lines
* 💻 **Output-ready**: Generates valid Git-style diffs
* 📈 **Strong Performance**: BLEU score of 33.87 with good ROUGE metrics
* 🚀 **Production-ready**: Optimized for real-world deployment

---

## 📈 Evaluation Metrics

* **BLEU**: Translation-style match to reference diffs
* **ROUGE**: Overlap in fix content and semantic similarity
* **Human Evaluation**: Subjective patch quality assessment

### Current Performance
- **BLEU Score**: 33.87 (strong for code generation tasks)
- **ROUGE-1 F1**: 0.4355 (good semantic overlap)
- **ROUGE-2 F1**: 0.3457 (reasonable bigram matching)
- **ROUGE-L F1**: 0.3612 (good longest-common-subsequence overlap)
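
For reference, the sketch below shows one way such scores can be computed for a single prediction/reference pair. The repository's own evaluation logic lives in `evaluate/evaluate_linux_bugfix_model.py`; the choice of the `sacrebleu` and `rouge-score` libraries here is an assumption:

```python
import sacrebleu
from rouge_score import rouge_scorer

prediction = "generated Git diff"    # model output
reference = "ground-truth Git diff"  # reference fix from the test set

# Corpus-level BLEU over one sentence pair (0-100 scale)
bleu = sacrebleu.corpus_bleu([prediction], [[reference]])

# ROUGE-1/2/L precision, recall, and F1
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)

print(bleu.score, rouge["rougeL"].fmeasure)
```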

---

## 🧪 Use Cases

* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
* **Code review assistance**: Help reviewers identify potential issues
* **Teaching/debugging kernel code**: Educational tool for kernel development
* **Research in automated program repair (APR)**: Academic research applications
* **CI/CD integration**: Automated testing and fixing in development pipelines

---

## 🔬 Technical Highlights

### Memory & Speed Optimizations

* 4-bit quantization (NF4)
* Gradient checkpointing
* Mixed precision (bfloat16)
* Gradient accumulation
* LoRA parameter efficiency

### Training Efficiency

* **QLoRA**: Reduces memory usage by ~75%
* **4-bit quantization**: Further memory optimization
* **Gradient checkpointing**: Trades compute for memory
* **Mixed precision**: Faster training with maintained accuracy
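
As an illustration, these choices map onto `transformers.TrainingArguments` roughly as follows. The per-device batch size and accumulation split are assumptions; only their product, the effective batch size of 64, is stated above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train/output/qlora-codellama-bugfix",
    per_device_train_batch_size=8,  # assumed split of the effective batch
    gradient_accumulation_steps=8,  # 8 x 8 = effective batch size 64
    learning_rate=2e-4,
    num_train_epochs=3,
    bf16=True,                      # mixed precision (bfloat16)
    gradient_checkpointing=True,    # trade compute for memory
)
```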

---

## 🛠️ Advanced Usage

### Custom Training

```bash
# Train with custom parameters
python train_codellama_qlora_linux_bugfix.py \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --batch_size 32 \
  --lora_r 32 \
  --lora_alpha 16
```

### Evaluation on Custom Data

```bash
# Evaluate on your own test set
python evaluate_linux_bugfix_model.py \
  --test_file your_test_data.jsonl \
  --output_dir custom_eval_results
```

---

## 🤝 Contributing

1. Fork this repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request 🙌

### Development Guidelines

- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation for API changes
- Ensure all tests pass before submitting a PR

---

## 📄 License

MIT License – see the `LICENSE` file for details.

---

## 🙏 Acknowledgments

* **Meta** for the CodeLLaMA base model
* **Hugging Face** for the Transformers and PEFT libraries
* **The Linux kernel community** for open access to commit data
* **Microsoft** for introducing the LoRA technique
* **University of Washington** for the QLoRA research

---

## 📚 References

* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)

---

## 📞 Support

For questions, issues, or contributions:
- Open an issue on GitHub
- Check the project documentation
- Review the evaluation results in `evaluate/output/`

---

## 🔄 Version History

- **v1.0.0**: Initial release with QLoRA training
- **v1.1.0**: Added parallel dataset extraction
- **v1.2.0**: Improved evaluation metrics and documentation