---
license: mit
tags:
- codellama
- linux
- bugfix
- lora
- qlora
- git-diff
base_model: codellama/CodeLLaMA-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation
model-index:
- name: CodeLLaMA-Linux-BugFix
results:
- task:
type: text-generation
name: Bug-fix Patch Generation
dataset:
type: custom
name: Linux Kernel Bugfix Commits
config: linux-bugfix-prompt-completion
split: test
metrics:
- type: bleu
value: 33.87
name: BLEU
- type: rouge1
value: 0.4355
name: ROUGE-1 F1
- type: rouge2
value: 0.3457
name: ROUGE-2 F1
- type: rougeL
value: 0.3612
name: ROUGE-L F1
---
# CodeLLaMA-Linux-BugFix
A fine-tuned version of `CodeLLaMA-7B-Instruct`, specialized for Linux kernel bug fixing with QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches from buggy C code and commit messages.
---
## 🎯 Overview
This project targets automated Linux kernel bug fixing by:
- **Mining real commit data** from the kernel Git history
- **Training a specialized QLoRA model** on diff-style fixes
- **Generating Git patches** in response to bug-prone code
- **Evaluating results** using BLEU, ROUGE, and human inspection
The model generates accurate Linux kernel bug fixes with strong benchmark scores, making it a useful aid for automated code review and bug detection.
---
## πŸ“Š Performance Results
### Evaluation Metrics
βœ… **BLEU Score**: 33.87
βœ… **ROUGE Scores**:
- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612
These results demonstrate the model's ability to:
- Generate syntactically correct Git diff patches
- Maintain semantic similarity to reference fixes
- Produce meaningful code changes that address the underlying bugs
---
## 🧠 Model Configuration
- **Base model**: `CodeLLaMA-7B-Instruct`
- **Fine-tuning method**: QLoRA with 4-bit quantization
- **Training setup**:
- LoRA r=64, alpha=16, dropout=0.1
- Batch size: 64, LR: 2e-4, Epochs: 3
- Mixed precision (bfloat16), gradient checkpointing
- **Hardware**: Optimized for NVIDIA H200 GPUs
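A minimal sketch of the configuration above using `peft` and `bitsandbytes`. The hyperparameter values match the list; `target_modules` is an assumption (attention projections are the usual choice for LLaMA-family models):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, as used by QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from the training setup above;
# target_modules is an assumption, not confirmed by the project
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```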
---
## πŸ“ˆ Training Progress
The model was trained for 1000 steps with the following key metrics:
### Training Results
- **Final Loss**: ~0.3335 (converged)
- **Final Learning Rate**: ~2.08e-6
- **Training Steps**: 1000
- **Convergence**: Stable loss plateau achieved
### Training Curves
![Training Loss](train/output/loss.png)
*Training loss over 1000 steps showing convergence around 0.3335*
![Learning Rate Schedule](train/output/learning_rate.png)
*Learning rate decay schedule with a final rate of ~2.08e-6*
---
## πŸ“Š Dataset
Custom dataset extracted from Linux kernel Git history.
### Filtering Criteria
Bug-fix commits containing:
`fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
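A sketch of this kind of keyword filter (the exact keyword list and matching logic live in `extract_linux_bugfixes_parallel.py` and may differ):

```python
BUGFIX_KEYWORDS = (
    "fix", "bug", "crash", "memory", "null",
    "panic", "overflow", "race", "corruption",
)

def is_bugfix_commit(message: str) -> bool:
    """Heuristic: treat a commit as a bug fix if its message
    contains any keyword (case-insensitive)."""
    msg = message.lower()
    return any(kw in msg for kw in BUGFIX_KEYWORDS)
```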
### Structure
- Language: C (`.c`, `.h`)
- Context: 10 lines before/after the change
- Format:
```json
{
"input": {
"original code": "C code snippet with bug",
"instruction": "Commit message or fix description"
},
"output": {
"diff codes": "Git diff showing the fix"
}
}
```
- **File**: `training_data_100k.jsonl` (100,000 samples)
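Each line of the file is one JSON record in the format above; a minimal loader (the path follows the project layout shown later in this README):

```python
import json

def load_samples(path="dataset/training_data_100k.jsonl"):
    """Yield one training sample per line of the JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

sample = next(load_samples())
print(sample["input"]["instruction"])
```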
---
## πŸš€ Quick Start
### Prerequisites
- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- 50GB+ disk space
### Install dependencies
```bash
pip install -r requirements.txt
```
### 1. Build the Dataset
```bash
cd dataset_builder
python extract_linux_bugfixes_parallel.py
python format_for_training.py
```
### 2. Fine-tune the Model
```bash
cd train
python train_codellama_qlora_linux_bugfix.py
```
### 3. Run Evaluation
```bash
cd evaluate
python evaluate_linux_bugfix_model.py
```
### 4. Use the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model and apply the fine-tuned LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLLaMA-7b-Instruct-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")

# Generate a bug fix
prompt = """
Given the following original C code:
if (!file->filter)
    return;
Instruction: Fix the null pointer dereference
Return the diff that fixes it:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,   # cap the generated diff length
    do_sample=True,       # temperature only takes effect when sampling
    temperature=0.1,
)
fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fix)
```
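For deployment, the LoRA adapter can optionally be folded into the base weights. This uses PEFT's standard `merge_and_unload()`; the output directory name here is illustrative:

```python
# Optional: merge the adapter into the base model for faster inference
merged = model.merge_and_unload()
merged.save_pretrained("codellama-linux-bugfix-merged")
tokenizer.save_pretrained("codellama-linux-bugfix-merged")
```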
---
## πŸ“ Project Structure
```
CodeLLaMA-Linux-BugFix/
β”œβ”€β”€ dataset_builder/
β”‚ β”œβ”€β”€ extract_linux_bugfixes_parallel.py # Parallel extraction of bug fixes
β”‚ β”œβ”€β”€ format_for_training.py # Format data for training
β”‚ └── build_dataset.py # Main dataset builder
β”œβ”€β”€ dataset/
β”‚ β”œβ”€β”€ training_data_100k.jsonl # 100K training samples
β”‚ └── training_data_prompt_completion.jsonl # Formatted training data
β”œβ”€β”€ train/
β”‚ β”œβ”€β”€ train_codellama_qlora_linux_bugfix.py # Main training script
β”‚ β”œβ”€β”€ train_codellama_qlora_simple.py # Simplified training
β”‚ β”œβ”€β”€ download_codellama_model.py # Model download utility
β”‚ └── output/
β”‚ └── qlora-codellama-bugfix/ # Trained model checkpoints
β”œβ”€β”€ evaluate/
β”‚ β”œβ”€β”€ evaluate_linux_bugfix_model.py # Evaluation script
β”‚ β”œβ”€β”€ test_samples.jsonl # Test dataset
β”‚ └── output/ # Evaluation results
β”‚ β”œβ”€β”€ eval_results.csv # Detailed results
β”‚ └── eval_results.json # JSON format results
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ README.md # This file
└── PROJECT_STRUCTURE.md # Detailed project overview
```
---
## 🧩 Features
* πŸ”§ **Efficient Fine-tuning**: QLoRA + 4-bit quantization = massive memory savings
* 🧠 **Real-world commits**: From actual Linux kernel development
* πŸ’‘ **Context-aware**: Code context extraction around bug lines
* πŸ’» **Output-ready**: Generates valid Git-style diffs
* πŸ“ˆ **Strong Performance**: BLEU score of 33.87 with good ROUGE metrics
* πŸš€ **Production-ready**: Optimized for real-world deployment
---
## πŸ“ˆ Evaluation Metrics
* **BLEU**: Translation-style match to reference diffs
* **ROUGE**: Overlap in fix content and semantic similarity
* **Human Evaluation**: Subjective patch quality assessment
### Current Performance
- **BLEU Score**: 33.87 (strong for code-generation tasks)
- **ROUGE-1 F1**: 0.4355 (good semantic overlap)
- **ROUGE-2 F1**: 0.3457 (reasonable bigram matching)
- **ROUGE-L F1**: 0.3612 (good longest common subsequence)
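A sketch of how such scores can be computed. The `sacrebleu` and `rouge_score` packages are assumptions here; `evaluate_linux_bugfix_model.py` may use different libraries:

```python
import sacrebleu
from rouge_score import rouge_scorer

def score_patch(prediction: str, reference: str):
    """Compute BLEU and ROUGE F1 for one generated diff vs. its reference."""
    bleu = sacrebleu.corpus_bleu([prediction], [[reference]]).score
    scorer = rouge_scorer.RougeScorer(
        ["rouge1", "rouge2", "rougeL"], use_stemmer=True
    )
    rouge = {k: v.fmeasure for k, v in scorer.score(reference, prediction).items()}
    return bleu, rouge
```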
---
## πŸ§ͺ Use Cases
* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
* **Code review assistance**: Help reviewers identify potential issues
* **Teaching/debugging kernel code**: Educational tool for kernel development
* **Research in automated program repair (APR)**: Academic research applications
* **CI/CD integration**: Automated testing and fixing in development pipelines
---
## πŸ”¬ Technical Highlights
### Memory & Speed Optimizations
* 4-bit quantization (NF4)
* Gradient checkpointing
* Mixed precision (bfloat16)
* Gradient accumulation
* LoRA parameter efficiency
### Training Efficiency
* **QLoRA**: Reduces memory usage by ~75%
* **4-bit quantization**: Further memory optimization
* **Gradient checkpointing**: Trades compute for memory
* **Mixed precision**: Faster training with maintained accuracy
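As an illustration, these efficiency options map onto standard `transformers.TrainingArguments` flags. The per-device batch size and accumulation split below are assumptions; only the effective batch size of 64, bf16, and gradient checkpointing are stated in the configuration section:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train/output/qlora-codellama-bugfix",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,   # 8 x 8 = effective batch of 64 (assumed split)
    learning_rate=2e-4,
    num_train_epochs=3,
    bf16=True,                       # mixed precision
    gradient_checkpointing=True,     # trade compute for memory
)
```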
---
## πŸ› οΈ Advanced Usage
### Custom Training
```bash
# Train with custom parameters
python train_codellama_qlora_linux_bugfix.py \
--learning_rate 1e-4 \
--num_epochs 5 \
--batch_size 32 \
--lora_r 32 \
--lora_alpha 16
```
### Evaluation on Custom Data
```bash
# Evaluate on your own test set
python evaluate_linux_bugfix_model.py \
--test_file your_test_data.jsonl \
--output_dir custom_eval_results
```
---
## 🀝 Contributing
1. Fork this repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request πŸ™Œ
### Development Guidelines
- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation for API changes
- Ensure all tests pass before submitting PR
---
## πŸ“„ License
MIT License – see `LICENSE` file for details.
---
## πŸ™ Acknowledgments
* **Meta** for CodeLLaMA base model
* **Hugging Face** for Transformers + PEFT libraries
* **The Linux kernel community** for open access to commit data
* **Microsoft** for introducing the LoRA technique
* **University of Washington** for QLoRA research
---
## πŸ“š References
* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)
---
## πŸ“ž Support
For questions, issues, or contributions:
- Open an issue on GitHub
- Check the project documentation
- Review the evaluation results in `evaluate/output/`
---
## πŸ”„ Version History
- **v1.0.0**: Initial release with QLoRA training
- **v1.1.0**: Added parallel dataset extraction
- **v1.2.0**: Improved evaluation metrics and documentation