Maaac committed 84237cb (verified) · 1 parent: 5de6ff4

Update README.md

Files changed (1)
  1. README.md +144 -327
README.md CHANGED
@@ -1,327 +1,144 @@
- ---
- license: mit
- tags:
- - codellama
- - linux
- - bugfix
- - lora
- - qlora
- - git-diff
- base_model: codellama/CodeLLaMA-7b-Instruct-hf
- model_type: LlamaForCausalLM
- library_name: peft
- pipeline_tag: text-generation
- ---
-
- # CodeLLaMA-Linux-BugFix
-
- A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches based on buggy C code and commit messages.
-
- ---
-
- ## 🎯 Overview
-
- This project targets automated Linux kernel bug fixing by:
-
- - **Mining real commit data** from the kernel Git history
- - **Training a specialized QLoRA model** on diff-style fixes
- - **Generating Git patches** in response to bug-prone code
- - **Evaluating results** using BLEU, ROUGE, and human inspection
-
- The model achieves strong performance in generating accurate Linux kernel bug fixes, making it a valuable tool for automated code review and bug detection.
-
- ---
-
- ## 📊 Performance Results
-
- ### Evaluation Metrics
-
- **BLEU Score**: 33.87
-
- **ROUGE Scores**:
- - **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
- - **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
- - **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612
-
- These results demonstrate the model's ability to:
- - Generate syntactically correct Git diff patches
- - Maintain semantic similarity to reference fixes
- - Produce meaningful code changes that address the underlying bugs
-
- ---
-
- ## 🧠 Model Configuration
-
- - **Base model**: `CodeLLaMA-7B-Instruct`
- - **Fine-tuning method**: QLoRA with 4-bit quantization
- - **Training setup**:
- - LoRA r=64, alpha=16, dropout=0.1
- - Batch size: 64, LR: 2e-4, Epochs: 3
- - Mixed precision (bfloat16), gradient checkpointing
- - **Hardware**: Optimized for NVIDIA H200 GPUs
-
- ---
-
- ## 📊 Dataset
-
- Custom dataset extracted from Linux kernel Git history.
-
- ### Filtering Criteria
- Bug-fix commits containing:
- `fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
-
- ### Structure
- - Language: C (`.c`, `.h`)
- - Context: 10 lines before/after the change
- - Format:
-
- ```json
- {
- "input": {
- "original code": "C code snippet with bug",
- "instruction": "Commit message or fix description"
- },
- "output": {
- "diff codes": "Git diff showing the fix"
- }
- }
- ```
-
- * **File**: `training_data_100k.jsonl` (100,000 samples)
-
- ---
-
- ## 🚀 Quick Start
-
- ### Prerequisites
-
- - Python 3.8+
- - CUDA-compatible GPU (recommended)
- - 16GB+ RAM
- - 50GB+ disk space
-
- ### Install dependencies
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ### 1. Build the Dataset
-
- ```bash
- cd dataset_builder
- python extract_linux_bugfixes_parallel.py
- python format_for_training.py
- ```
-
- ### 2. Fine-tune the Model
-
- ```bash
- cd train
- python train_codellama_qlora_linux_bugfix.py
- ```
-
- ### 3. Run Evaluation
-
- ```bash
- cd evaluate
- python evaluate_linux_bugfix_model.py
- ```
-
- ### 4. Use the Model
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from peft import PeftModel
-
- # Load the fine-tuned model
- model = AutoModelForCausalLM.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")
- model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
- tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")
-
- # Generate a bug fix
- prompt = """
- Given the following original C code:
- ```c
- if (!file->filter)
- return;
- ```
-
- Instruction: Fix the null pointer dereference
-
- Return the diff that fixes it:
- """
-
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_length=512, temperature=0.1)
- fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(fix)
- ```
-
- ---
-
- ## 📁 Project Structure
-
- ```
- CodeLLaMA-Linux-BugFix/
- ├── dataset_builder/
- │ ├── extract_linux_bugfixes_parallel.py # Parallel extraction of bug fixes
- │ ├── format_for_training.py # Format data for training
- │ └── build_dataset.py # Main dataset builder
- ├── dataset/
- │ ├── training_data_100k.jsonl # 100K training samples
- │ └── training_data_prompt_completion.jsonl # Formatted training data
- ├── train/
- │ ├── train_codellama_qlora_linux_bugfix.py # Main training script
- │ ├── train_codellama_qlora_simple.py # Simplified training
- │ ├── download_codellama_model.py # Model download utility
- │ └── output/
- │ └── qlora-codellama-bugfix/ # Trained model checkpoints
- ├── evaluate/
- │ ├── evaluate_linux_bugfix_model.py # Evaluation script
- │ ├── test_samples.jsonl # Test dataset
- │ └── output/ # Evaluation results
- │ ├── eval_results.csv # Detailed results
- │ └── eval_results.json # JSON format results
- ├── requirements.txt # Python dependencies
- ├── README.md # This file
- └── PROJECT_STRUCTURE.md # Detailed project overview
- ```
-
- ---
-
- ## 🧩 Features
-
- * 🔧 **Efficient Fine-tuning**: QLoRA + 4-bit quant = massive memory savings
- * 🧠 **Real-world commits**: From actual Linux kernel development
- * 💡 **Context-aware**: Code context extraction around bug lines
- * 💻 **Output-ready**: Generates valid Git-style diffs
- * 📈 **Strong Performance**: BLEU score of 33.87 with good ROUGE metrics
- * 🚀 **Production-ready**: Optimized for real-world deployment
-
- ---
-
- ## 📈 Evaluation Metrics
-
- * **BLEU**: Translation-style match to reference diffs
- * **ROUGE**: Overlap in fix content and semantic similarity
- * **Human Evaluation**: Subjective patch quality assessment
-
- ### Current Performance
- - **BLEU Score**: 33.87 (excellent for code generation tasks)
- - **ROUGE-1 F1**: 0.4355 (good semantic overlap)
- - **ROUGE-2 F1**: 0.3457 (reasonable bigram matching)
- - **ROUGE-L F1**: 0.3612 (good longest common subsequence)
-
- ---
-
- ## 🧪 Use Cases
-
- * **Automated kernel bug fixing**: Generate fixes for common kernel bugs
- * **Code review assistance**: Help reviewers identify potential issues
- * **Teaching/debugging kernel code**: Educational tool for kernel development
- * **Research in automated program repair (APR)**: Academic research applications
- * **CI/CD integration**: Automated testing and fixing in development pipelines
-
- ---
-
- ## 🔬 Technical Highlights
-
- ### Memory & Speed Optimizations
-
- * 4-bit quantization (NF4)
- * Gradient checkpointing
- * Mixed precision (bfloat16)
- * Gradient accumulation
- * LoRA parameter efficiency
-
- ### Training Efficiency
-
- * **QLoRA**: Reduces memory usage by ~75%
- * **4-bit quantization**: Further memory optimization
- * **Gradient checkpointing**: Trades compute for memory
- * **Mixed precision**: Faster training with maintained accuracy
-
- ---
-
- ## 🛠️ Advanced Usage
-
- ### Custom Training
-
- ```bash
- # Train with custom parameters
- python train_codellama_qlora_linux_bugfix.py \
- --learning_rate 1e-4 \
- --num_epochs 5 \
- --batch_size 32 \
- --lora_r 32 \
- --lora_alpha 16
- ```
-
- ### Evaluation on Custom Data
-
- ```bash
- # Evaluate on your own test set
- python evaluate_linux_bugfix_model.py \
- --test_file your_test_data.jsonl \
- --output_dir custom_eval_results
- ```
-
- ---
-
- ## 🤝 Contributing
-
- 1. Fork this repo
- 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
- 3. Commit your changes (`git commit -m 'Add amazing feature'`)
- 4. Push to the branch (`git push origin feature/amazing-feature`)
- 5. Open a Pull Request 🙌
-
- ### Development Guidelines
-
- - Follow PEP 8 style guidelines
- - Add tests for new features
- - Update documentation for API changes
- - Ensure all tests pass before submitting PR
-
- ---
-
- ## 📄 License
-
- MIT License – see `LICENSE` file for details.
-
- ---
-
- ## 🙏 Acknowledgments
-
- * **Meta** for CodeLLaMA base model
- * **Hugging Face** for Transformers + PEFT libraries
- * **The Linux kernel community** for open access to commit data
- * **Microsoft** for introducing LoRA technique
- * **University of Washington** for QLoRA research
-
- ---
-
- ## 📚 References
-
- * [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
- * [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
- * [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
- * [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)
-
- ---
-
- ## 📞 Support
-
- For questions, issues, or contributions:
- - Open an issue on GitHub
- - Check the project documentation
- - Review the evaluation results in `evaluate/output/`
-
- ---
-
- ## 🔄 Version History
-
- - **v1.0.0**: Initial release with QLoRA training
- - **v1.1.0**: Added parallel dataset extraction
- - **v1.2.0**: Improved evaluation metrics and documentation
 
+ ---
+ license: mit
+ tags:
+ - codellama
+ - linux
+ - bugfix
+ - lora
+ - qlora
+ - git-diff
+ base_model: codellama/CodeLLaMA-7b-Instruct-hf
+ model_type: LlamaForCausalLM
+ library_name: peft
+ pipeline_tag: text-generation
+ ---
+
+ # CodeLLaMA-Linux-BugFix
+
+ A fine-tuned version of `CodeLLaMA-7B-Instruct` for Linux kernel bug fixing, trained with QLoRA (Quantized Low-Rank Adaptation). Given buggy C code and a commit message, the model generates a Git diff patch intended to fix the bug.
+
+ ---
+
+ ## 🎯 Overview
+
+ This project targets automated Linux kernel bug fixing by:
+
+ - Mining real commit data from the kernel's Git history
+ - Training a QLoRA model to generate Git-style fixes
+ - Evaluating performance with BLEU and ROUGE
+ - Supporting integration into code review pipelines
+
+ ---
+
+ ## 📊 Performance Results
+
+ **BLEU Score**: 33.87
+
+ **ROUGE Scores**:
+ - ROUGE-1: P=0.3775, R=0.7306, F1=0.4355
+ - ROUGE-2: P=0.2898, R=0.6096, F1=0.3457
+ - ROUGE-L: P=0.3023, R=0.6333, F1=0.3612
+
+ Recall is roughly double precision for every ROUGE variant: generated diffs cover most of the reference tokens but also contain content that the ground-truth patches lack. These are token-overlap scores against reference fixes; they do not show that a generated patch applies, compiles, or is correct.
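ROUGE-N compares the n-grams of a generated diff with those of the reference patch: precision is the share of generated n-grams that appear in the reference, recall is the share of reference n-grams that were generated. The sketch below is a minimal from-scratch version, assuming plain whitespace tokenization; the README does not say which ROUGE implementation or tokenizer the evaluation used, so it will not reproduce the reported numbers. The toy candidate contains the whole reference line plus an extra line, which produces the high-recall, lower-precision pattern seen above.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) for ROUGE-N.

    Tokenization is a plain whitespace split, which is an assumption; the
    project's evaluation may tokenize differently.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped matches: min count per n-gram
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Toy reference patch line, and a candidate that adds an extra, unneeded line
reference = "+ if (!file) return -EINVAL;"
candidate = "+ if (!file) return -EINVAL; + if (!file->filter) return;"

p, r, f1 = rouge_n(candidate, reference, n=1)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

Here recall is 1.0 while precision is 5/9, because four of the candidate's nine tokens have no match in the reference.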
+
+ ---
+
+ ## 🧠 Model Configuration
+
+ - **Base model**: `CodeLLaMA-7B-Instruct`
+ - **Fine-tuning**: QLoRA (LoRA r=64, α=16, dropout=0.1)
+ - **Quantization**: 4-bit NF4
+ - **Training**: 3 epochs, batch size 64, LR 2e-4
+ - **Precision**: bfloat16 mixed precision with gradient checkpointing
+ - **Hardware**: 1× NVIDIA H200 (141 GB HBM3e)
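In LoRA, each adapted weight W stays frozen and receives a low-rank update B·A scaled by α/r, so r=64 with α=16 scales updates by 0.25. The sketch below estimates how many trainable parameters r=64 adds. It assumes the standard CodeLlama-7B shapes (32 decoder layers, hidden size 4096, MLP size 11008) and Hugging Face Llama module names; both target-module lists are hypothetical, since the README does not say which projections were adapted.

```python
# Assumed CodeLlama-7B shapes (Llama-2 7B architecture): 32 decoder layers,
# hidden size 4096, MLP intermediate size 11008.
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 11008

R = 64      # LoRA rank from the README
ALPHA = 16  # LoRA alpha from the README

# Hypothetical target-module choices: name -> (in_features, out_features)
ATTENTION_ONLY = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
}
ALL_LINEAR = {
    **ATTENTION_ONLY,
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(targets, r, num_layers):
    """Each adapted d_in x d_out weight gets A (r x d_in) and B (d_out x r),
    i.e. r * (d_in + d_out) extra trainable parameters per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in targets.values())
    return per_layer * num_layers


# LoRA adds (alpha / r) * B @ A to the frozen weight, so this is the update scale
scaling = ALPHA / R

for name, targets in [("attention only", ATTENTION_ONLY), ("all linear", ALL_LINEAR)]:
    print(f"{name}: {lora_trainable_params(targets, R, NUM_LAYERS):,} trainable params")
print(f"scaling alpha/r = {scaling}")
```

Under these assumptions, adapting only the attention projections trains about 67M parameters, roughly 1% of the ~6.7B-parameter base model; only these adapter weights receive gradients while the 4-bit base weights stay frozen.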
+
+ ---
+
+ ## 🗃️ Dataset
+
+ - 100,000 samples from Linux kernel Git commits
+ - Format: JSONL with `"prompt"` and `"completion"` fields
+ - Content: C code segments + commit messages → Git diffs
+ - Source: Bug-fix commits filtered by keywords like `fix`, `null`, `race`, `panic`
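The dataset pipeline filters commits by keyword and turns each one into a prompt/completion pair. The sketch below illustrates both steps under stated assumptions: the keyword list is the one named in this README (the real list may be longer), and the prompt template in the hypothetical `make_record` helper is modelled on the Usage prompt rather than taken from the project's formatting script.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the literal does not clash with Markdown fences

# Keywords named in the README; the project's actual list may be longer.
BUGFIX_KEYWORDS = ("fix", "bug", "crash", "memory", "null", "panic", "overflow", "race", "corruption")
# Match a keyword at the start of a word ("fixes" matches, "trace" does not).
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(BUGFIX_KEYWORDS) + r")", re.IGNORECASE)


def is_bugfix_commit(message):
    """Keyword filter applied to a commit message."""
    return KEYWORD_RE.search(message) is not None


def make_record(code, message, diff):
    """Build one prompt/completion pair. The prompt template is hypothetical,
    modelled on the prompt in the Usage section."""
    prompt = (
        "Given the following original C code:\n"
        f"{FENCE}c\n{code}\n{FENCE}\n\n"
        f"Instruction: {message}\n\n"
        "Return the diff that fixes it:\n"
    )
    return {"prompt": prompt, "completion": diff}


# Toy commits: (buggy code, commit message, fixing diff)
commits = [
    (
        "if (!file->filter)\n    return;",
        "tracing: fix NULL pointer dereference",
        "@@ -1,2 +1,2 @@\n-if (!file->filter)\n+if (!file || !file->filter)\n     return;\n",
    ),
    ("int x = 0;", "ftrace: add a new trace point", ""),
]

# One JSON object per line, keeping only commits that pass the keyword filter
jsonl_lines = [
    json.dumps(make_record(code, msg, diff))
    for code, msg, diff in commits
    if is_bugfix_commit(msg)
]
print("\n".join(jsonl_lines))
```

The `\b` before each keyword matches `fixes` and, case-insensitively, `NULL`, while rejecting the `race` inside `ftrace` or the `bug` inside `debugfs`.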
+
+ ---
+
+ ## 🚀 Usage
+
+ ````python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+
+ BASE_MODEL = "codellama/CodeLLaMA-7b-Instruct-hf"
+ ADAPTER = "train/output/qlora-codellama-bugfix"
+
+ # Load the base model in bfloat16, then attach the trained LoRA adapter
+ model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto")
+ model = PeftModel.from_pretrained(model, ADAPTER)
+ model.eval()
+ tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
+
+ prompt = '''
+ Given the following original C code:
+ ```c
+ if (!file->filter)
+     return;
+ ```
+
+ Instruction: Fix the null pointer dereference
+
+ Return the diff that fixes it:
+ '''
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ # max_new_tokens limits only the generated part; temperature takes effect only when sampling
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
+ # Decode just the new tokens so the prompt is not echoed back
+ new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+ fix = tokenizer.decode(new_tokens, skip_special_tokens=True)
+ print(fix)
+ ````
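The decoded output may still include explanatory text before the patch. A minimal post-processing sketch, assuming the model emits standard unified-diff markers (`---`/`+++` file headers or `@@` hunk headers); `extract_diff` and `looks_like_unified_diff` are hypothetical helpers, not part of this repository, and the check is structural only.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,2 +10,2 @@"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def extract_diff(generated):
    """Return generated text from the first diff marker onward, dropping any
    echoed prompt or prose before it. Returns "" if no marker is found."""
    lines = generated.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(("diff --git", "--- ", "@@ ")):
            return "\n".join(lines[i:])
    return ""


def looks_like_unified_diff(diff):
    """Structural check only: at least one hunk header, and every non-empty
    line inside a hunk starts with ' ', '+', '-' or a backslash."""
    in_hunk = saw_hunk = False
    for line in diff.splitlines():
        if HUNK_RE.match(line):
            in_hunk = saw_hunk = True
        elif line.startswith(("diff --git", "index ")):
            in_hunk = False  # start of the next file's header
        elif in_hunk and line and line[0] not in " +-\\":
            return False
    return saw_hunk


generated = (
    "Here is the fix:\n"
    "--- a/example.c\n"
    "+++ b/example.c\n"
    "@@ -10,2 +10,2 @@\n"
    "-if (!file->filter)\n"
    "+if (!file || !file->filter)\n"
    "     return;\n"
)
patch = extract_diff(generated)
print(patch)
print(looks_like_unified_diff(patch))
```

Running `git apply --check` against a kernel tree is a stronger test, since it verifies that the hunks actually apply to the target file.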
+
+ ---
+
+ ## 📁 Structure
+
+ ```
+ CodeLLaMA-Linux-BugFix/
+ ├── dataset/           # Raw and processed JSONL files
+ ├── dataset_builder/   # Scripts for mining & formatting commits
+ ├── train/             # Training scripts & checkpoints
+ ├── evaluate/          # Evaluation scripts & results
+ └── requirements.txt   # Dependencies
+ ```
+
+ ---
+
+ ## 📈 Metrics
+
+ | Metric       | Score  |
+ |--------------|--------|
+ | BLEU         | 33.87  |
+ | ROUGE-1 (F1) | 0.4355 |
+ | ROUGE-2 (F1) | 0.3457 |
+ | ROUGE-L (F1) | 0.3612 |
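The ROUGE figures in this table are F1 scores. Recomputing F1 as the harmonic mean of the averaged precision and recall listed under Performance Results gives higher values (about 0.498, 0.393 and 0.409 versus the reported 0.4355, 0.3457 and 0.3612). That gap is consistent with F1 being averaged per sample rather than computed from the averaged P and R; this is an inference, as the README does not state how scores were aggregated. The sketch checks the arithmetic and uses hypothetical per-sample scores to show how the two aggregations diverge.

```python
from statistics import mean


def f1(p, r):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Averaged (precision, recall, F1) as reported in the README
REPORTED = {
    "ROUGE-1": (0.3775, 0.7306, 0.4355),
    "ROUGE-2": (0.2898, 0.6096, 0.3457),
    "ROUGE-L": (0.3023, 0.6333, 0.3612),
}

for name, (p, r, reported_f1) in REPORTED.items():
    print(f"{name}: F1 of mean P/R = {f1(p, r):.4f}, reported F1 = {reported_f1:.4f}")

# Hypothetical per-sample (precision, recall) pairs for two generated patches
samples = [(0.9, 0.9), (0.1, 0.9)]
mean_of_f1 = mean(f1(p, r) for p, r in samples)
f1_of_means = f1(mean(p for p, _ in samples), mean(r for _, r in samples))
print(f"mean of per-sample F1 = {mean_of_f1:.3f}, F1 of mean P/R = {f1_of_means:.3f}")
```

In the toy case the per-sample average is 0.54 while the F1 of the averaged P and R is about 0.64: one low-precision sample drags the per-sample mean down more than it drags the pooled precision.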
+
+ ---
+
+ ## 🔬 Use Cases
+
+ - Kernel patch suggestion tools
+ - Code review assistants
+ - Research on bug localization and repair
+ - Automated program repair (APR) benchmarks for kernel code
+
+ ---
+
+ ## 📄 License
+
+ MIT License
+
+ ---
+
+ ## 📚 References
+
+ - [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
+ - [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
+ - [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)