---
license: mit
tags:
- codellama
- linux
- bugfix
- lora
- qlora
- git-diff
base_model: codellama/CodeLlama-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation

model-index:
- name: CodeLLaMA-Linux-BugFix
  results:
  - task:
      type: text-generation
      name: Bug-fix Patch Generation
    dataset:
      type: custom
      name: Linux Kernel Bugfix Commits
      config: linux-bugfix-prompt-completion
      split: test
    metrics:
    - type: bleu
      value: 33.87
      name: BLEU
    - type: rouge1
      value: 0.4355
      name: ROUGE-1 F1
    - type: rouge2
      value: 0.3457
      name: ROUGE-2 F1
    - type: rougeL
      value: 0.3612
      name: ROUGE-L F1
---

# CodeLLaMA-Linux-BugFix

A fine-tuned version of `CodeLLaMA-7B-Instruct` for Linux kernel bug fixing, trained with QLoRA (Quantized Low-Rank Adaptation). Given buggy C code and a commit message, the model generates a Git diff patch.

---

## Overview

This project targets automated Linux kernel bug fixing by:

- **Mining real commit data** from the kernel Git history
- **Training a specialized QLoRA model** on diff-style fixes
- **Generating Git patches** in response to bug-prone code
- **Evaluating results** using BLEU, ROUGE, and human inspection

On the test split the model reaches a BLEU score of 33.87 and a ROUGE-L F1 of 0.3612 against the reference patches, which makes it a candidate aid for automated code review and bug fixing.

---

## Performance Results

### Evaluation Metrics

**BLEU Score**: 33.87

**ROUGE Scores**:
- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612
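
The reported F1 values are lower than the harmonic mean of the reported precision and recall (for ROUGE-1, about 0.498 versus 0.4355). One plausible explanation, which is an assumption here and not confirmed by the evaluation script, is that precision, recall, and F1 were each averaged over the test samples. The sketch below shows the arithmetic; `harmonic_f1` and the two toy samples are made up for illustration.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# F1 computed from the averaged ROUGE-1 precision and recall reported above.
f1_of_means = harmonic_f1(0.3775, 0.7306)

# Toy per-sample (precision, recall) pairs, invented for illustration only.
samples = [(0.1, 0.9), (0.9, 0.1)]
mean_p = sum(p for p, _ in samples) / len(samples)
mean_r = sum(r for _, r in samples) / len(samples)
mean_of_f1 = sum(harmonic_f1(p, r) for p, r in samples) / len(samples)

print(f"F1 of reported means: {f1_of_means:.4f}")
print(f"toy data: mean of per-sample F1 = {mean_of_f1:.2f}, "
      f"F1 of mean P/R = {harmonic_f1(mean_p, mean_r):.2f}")
```

On the toy data the mean of per-sample F1 scores is well below the F1 of the mean precision and recall, which is the same direction of gap seen in the reported numbers.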
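
These overlap scores suggest that the model can:
- Generate patches in Git diff format
- Stay lexically close to the reference fixes (recall is consistently higher than precision, so outputs tend to contain most reference tokens along with extra ones)
- Produce code changes that target the underlying bugs

BLEU and ROUGE measure n-gram overlap only; whether a patch applies and fixes the bug still needs human inspection.

---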

## Model Configuration

- **Base model**: `CodeLLaMA-7B-Instruct`
- **Fine-tuning method**: QLoRA with 4-bit quantization
- **Training setup**:
  - LoRA r=64, alpha=16, dropout=0.1
  - Batch size: 64, LR: 2e-4, Epochs: 3
  - Mixed precision (bfloat16), gradient checkpointing
- **Hardware**: Optimized for NVIDIA H200 GPUs
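
To make the LoRA settings concrete, the sketch below computes the update scaling factor (alpha / r) and the number of trainable parameters LoRA adds to one weight matrix. The 4096 x 4096 shape is the attention projection size of a LLaMA-style 7B model; which modules the training script actually adapts is not stated here, so the single-matrix example is an assumption.

```python
def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the two LoRA factors: B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)

R, ALPHA = 64, 16      # values from the training setup above
D_OUT = D_IN = 4096    # assumed: one 4096 x 4096 attention projection

scaling = lora_scaling(ALPHA, R)
added = lora_trainable_params(D_OUT, D_IN, R)
full = D_OUT * D_IN

print(f"scaling = {scaling}")
print(f"LoRA params per matrix = {added:,}")
print(f"fraction of full matrix = {added / full:.4f}")
```

With r=64 and alpha=16 the low-rank update is scaled by 0.25, and each adapted 4096 x 4096 matrix gains 524,288 trainable parameters, about 3% of the frozen matrix.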

---

## Training Progress

The model was trained for 1000 steps with the following key metrics:

### Training Results

- **Final Loss**: ~0.3335 (converged)
- **Final Learning Rate**: ~2.08e-6
- **Training Steps**: 1000
- **Convergence**: Stable loss plateau achieved
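
As a rough cross-check of these numbers, the sketch below relates optimizer steps, batch size, and epochs. It assumes that 64 is the effective batch size (per-device batch times gradient accumulation times device count) and that all 100,000 samples of the dataset are used for training; neither is confirmed here, so the output is illustrative.

```python
import math

def steps_per_epoch(num_samples: int, effective_batch: int) -> int:
    """Optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / effective_batch)

def epochs_covered(steps: int, num_samples: int, effective_batch: int) -> float:
    """Number of passes over the data consumed after `steps` steps."""
    return steps * effective_batch / num_samples

NUM_SAMPLES = 100_000   # assumed: the full training_data_100k.jsonl file
EFFECTIVE_BATCH = 64    # assumed: batch size 64 is the effective batch size

per_epoch = steps_per_epoch(NUM_SAMPLES, EFFECTIVE_BATCH)
covered = epochs_covered(1000, NUM_SAMPLES, EFFECTIVE_BATCH)
print(f"steps per epoch: {per_epoch}")
print(f"steps for 3 epochs: {3 * per_epoch}")
print(f"epochs covered by 1000 steps: {covered:.2f}")
```

Under these assumptions 1000 steps cover less than one pass over the data, so the actual training subset or batch configuration probably differs from the assumed values.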

### Training Curves



*Training loss over 1000 steps showing convergence around 0.3335*



*Learning rate decay schedule with a final rate of about 2.08e-6*

---

## Dataset

Custom dataset extracted from Linux kernel Git history.

### Filtering Criteria

Bug-fix commits are selected by keywords such as `fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, and `corruption`.
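
A keyword filter of this kind could look like the sketch below. It assumes the keywords are matched against the commit message, case-insensitively and at word starts; the function name and the matching rule are illustrative, not taken from `extract_linux_bugfixes_parallel.py`.

```python
import re

# Keywords listed in the filtering criteria (the source list is not exhaustive).
BUGFIX_KEYWORDS = ("fix", "bug", "crash", "memory", "null", "panic",
                   "overflow", "race", "corruption")

# Match keywords at a word start so "fixes" counts but "prefix" and "trace" do not.
_KEYWORD_RE = re.compile(r"\b(" + "|".join(BUGFIX_KEYWORDS) + r")", re.IGNORECASE)

def is_bugfix_commit(message: str) -> bool:
    """Return True if the commit message mentions any bug-fix keyword."""
    return _KEYWORD_RE.search(message) is not None

for msg in ("mm: fix NULL pointer dereference in foo",
            "Add tracepoint for prefix handling"):
    print(f"{is_bugfix_commit(msg)!s:5}  {msg}")
```

Anchoring at word starts avoids false positives such as "prefix" or "debug", at the cost of missing keywords embedded in longer identifiers.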

### Structure

- Language: C (`.c`, `.h`)
- Context: 10 lines before/after the change
- Format:

```json
{
  "input": {
    "original code": "C code snippet with bug",
    "instruction": "Commit message or fix description"
  },
  "output": {
    "diff codes": "Git diff showing the fix"
  }
}
```

* **File**: `training_data_100k.jsonl` (100,000 samples)
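
A record in this layout can be turned into a prompt/completion pair along the following lines. The prompt wording follows the inference example in the Quick Start section; the function name `to_prompt_completion` and the output keys `prompt` and `completion` are assumptions, and the actual `format_for_training.py` may differ.

```python
import json

PROMPT_TEMPLATE = (
    "Given the following original C code:\n{code}\n\n"
    "Instruction: {instruction}\n\n"
    "Return the diff that fixes it:\n"
)

def to_prompt_completion(record: dict) -> dict:
    """Map one raw record to a {'prompt', 'completion'} pair."""
    prompt = PROMPT_TEMPLATE.format(
        code=record["input"]["original code"],
        instruction=record["input"]["instruction"],
    )
    return {"prompt": prompt, "completion": record["output"]["diff codes"]}

# One JSONL line in the documented format (contents invented for illustration).
line = json.dumps({
    "input": {
        "original code": "if (!file->filter)\n\treturn;",
        "instruction": "Fix the null pointer dereference",
    },
    "output": {"diff codes": "-\tif (!file->filter)\n+\tif (!file || !file->filter)"},
})

pair = to_prompt_completion(json.loads(line))
print(pair["prompt"])
print(pair["completion"])
```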

---

## Quick Start

### Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- 50GB+ disk space

### Install dependencies

```bash
pip install -r requirements.txt
```

### 1. Build the Dataset

```bash
# Run from the repository root
cd dataset_builder
python extract_linux_bugfixes_parallel.py
python format_for_training.py
```

### 2. Fine-tune the Model

```bash
# Run from the repository root
cd train
python train_codellama_qlora_linux_bugfix.py
```

### 3. Run Evaluation

```bash
# Run from the repository root
cd evaluate
python evaluate_linux_bugfix_model.py
```

### 4. Use the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf"

# Load the base model, then attach the fine-tuned LoRA adapter
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
model.eval()
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Generate a bug fix
prompt = """
Given the following original C code:
if (!file->filter)
    return;

Instruction: Fix the null pointer dereference

Return the diff that fixes it:
"""

# Keep the inputs on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.1)
fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fix)
```
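
Because decoding `outputs[0]` returns the prompt followed by the completion, it can help to strip the prompt and check that what remains resembles a unified diff before using it. The helper names below are hypothetical, and the check assumes the model emits `@@` hunk headers, which is not guaranteed.

```python
import re

MARKER = "Return the diff that fixes it:"
# Unified-diff hunk header, e.g. "@@ -12,7 +12,8 @@"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)

def extract_completion(decoded: str, marker: str = MARKER) -> str:
    """Return the text after the last occurrence of the prompt marker."""
    return decoded.rsplit(marker, 1)[-1].strip()

def looks_like_unified_diff(text: str) -> bool:
    """Cheap shape check: a hunk header plus at least one added or removed line."""
    if not HUNK_RE.search(text):
        return False
    changed = [
        ln for ln in text.splitlines()
        if ln.startswith(("+", "-")) and not ln.startswith(("+++", "---"))
    ]
    return bool(changed)

# Decoded text invented for illustration.
decoded = (
    "Instruction: Fix the null pointer dereference\n\n"
    "Return the diff that fixes it:\n"
    "@@ -1,2 +1,2 @@\n"
    "-if (!file->filter)\n"
    "+if (!file || !file->filter)\n"
    "     return;\n"
)
patch = extract_completion(decoded)
print(looks_like_unified_diff(patch))
```

This only checks the shape of the text; running `git apply --check` against the target tree is the reliable way to learn whether a patch applies.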

---

## Project Structure

```
CodeLLaMA-Linux-BugFix/
├── dataset_builder/
│   ├── extract_linux_bugfixes_parallel.py      # Parallel extraction of bug fixes
│   ├── format_for_training.py                  # Format data for training
│   └── build_dataset.py                        # Main dataset builder
├── dataset/
│   ├── training_data_100k.jsonl                # 100K training samples
│   └── training_data_prompt_completion.jsonl   # Formatted training data
├── train/
│   ├── train_codellama_qlora_linux_bugfix.py   # Main training script
│   ├── train_codellama_qlora_simple.py         # Simplified training
│   ├── download_codellama_model.py             # Model download utility
│   └── output/
│       └── qlora-codellama-bugfix/             # Trained model checkpoints
├── evaluate/
│   ├── evaluate_linux_bugfix_model.py          # Evaluation script
│   ├── test_samples.jsonl                      # Test dataset
│   └── output/                                 # Evaluation results
│       ├── eval_results.csv                    # Detailed results
│       └── eval_results.json                   # JSON format results
├── requirements.txt                            # Python dependencies
├── README.md                                   # This file
└── PROJECT_STRUCTURE.md                        # Detailed project overview
```

---

## Features

* **Efficient Fine-tuning**: QLoRA + 4-bit quant = massive memory savings
* **Real-world commits**: From actual Linux kernel development
* **Context-aware**: Code context extraction around bug lines
* **Output-ready**: Generates valid Git-style diffs
* **Strong Performance**: BLEU score of 33.87 with good ROUGE metrics
* **Production-ready**: Optimized for real-world deployment

---

## Evaluation Metrics

* **BLEU**: Translation-style n-gram precision against the reference diffs
* **ROUGE**: N-gram and longest-common-subsequence overlap with the reference fix
* **Human Evaluation**: Subjective patch quality assessment
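
For intuition, ROUGE-N precision, recall, and F1 for a single generated patch can be computed from n-gram counts as sketched below. Whitespace tokenization and the function names are assumptions; the evaluation script may tokenize differently or rely on a library.

```python
from collections import Counter

def ngrams(tokens: list, n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, reference: str, n: int = 1) -> tuple:
    """Return (precision, recall, f1) of clipped n-gram overlap."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # counts clipped to the smaller side
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Strings invented for illustration.
reference = "+ if (!file || !file->filter)"
candidate = "+ if (!file || !file->filter) return;"
p, r, f1 = rouge_n(candidate, reference, n=1)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Here the candidate contains every reference token plus one extra, so recall is 1.0 while precision drops; a recall-above-precision pattern like this arises whenever outputs are longer than their references.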

### Current Performance

- **BLEU Score**: 33.87
- **ROUGE-1 F1**: 0.4355 (unigram overlap)
- **ROUGE-2 F1**: 0.3457 (bigram overlap)
- **ROUGE-L F1**: 0.3612 (longest common subsequence)

---

## Use Cases

* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
* **Code review assistance**: Help reviewers identify potential issues
* **Teaching/debugging kernel code**: Educational tool for kernel development
* **Research in automated program repair (APR)**: Academic research applications
* **CI/CD integration**: Automated testing and fixing in development pipelines

---

## Technical Highlights

### Memory & Speed Optimizations

* 4-bit quantization (NF4)
* Gradient checkpointing
* Mixed precision (bfloat16)
* Gradient accumulation
* LoRA parameter efficiency

### Training Efficiency

* **QLoRA**: Stores the frozen base weights in 4 bits, about 75% less weight memory than 16-bit storage
* **4-bit quantization**: NF4 data type for the base model weights
* **Gradient checkpointing**: Trades compute for memory
* **Mixed precision**: Faster training with maintained accuracy
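
The ~75% figure follows from simple arithmetic on weight storage, sketched below. The nominal 7e9 parameter count is an assumption, and the estimate ignores quantization constants, LoRA adapters, activations, and optimizer state.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 7e9  # assumed: nominal 7B parameters

fp16 = weight_memory_gb(NUM_PARAMS, 16)
nf4 = weight_memory_gb(NUM_PARAMS, 4)
print(f"16-bit weights: {fp16:.1f} GB")
print(f"4-bit weights:  {nf4:.1f} GB")
print(f"reduction:      {1 - nf4 / fp16:.0%}")
```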

---

## Advanced Usage

### Custom Training

```bash
# Train with custom parameters
python train_codellama_qlora_linux_bugfix.py \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --batch_size 32 \
  --lora_r 32 \
  --lora_alpha 16
```
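
For readers adapting the script, flags like these are typically parsed with `argparse`. The parser below is a hypothetical sketch, not the training script's actual interface; the defaults are taken from the Model Configuration section.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the command."""
    parser = argparse.ArgumentParser(description="QLoRA fine-tuning options")
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    parser.add_argument("--num_epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--lora_r", type=int, default=64)
    parser.add_argument("--lora_alpha", type=int, default=16)
    return parser

# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--learning_rate", "1e-4", "--num_epochs", "5", "--batch_size", "32", "--lora_r", "32"]
)
print(vars(args))
```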

### Evaluation on Custom Data

```bash
# Evaluate on your own test set
python evaluate_linux_bugfix_model.py \
  --test_file your_test_data.jsonl \
  --output_dir custom_eval_results
```

---

## Contributing

1. Fork this repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Development Guidelines

- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation for API changes
- Ensure all tests pass before submitting PR

---

## License

MIT License; see the `LICENSE` file for details.

---

## Acknowledgments

* **Meta** for the CodeLLaMA base model
* **Hugging Face** for the Transformers and PEFT libraries
* **The Linux kernel community** for open access to commit data
* **Microsoft** for introducing the LoRA technique
* **University of Washington** for QLoRA research

---

## References

* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)

---

## Support

For questions, issues, or contributions:
- Open an issue on GitHub
- Check the project documentation
- Review the evaluation results in `evaluate/output/`

---

## Version History

- **v1.0.0**: Initial release with QLoRA training
- **v1.1.0**: Added parallel dataset extraction
- **v1.2.0**: Improved evaluation metrics and documentation