Maaac/CodeLLaMA-Linux-BugFix

@@ -1,327 +1,144 @@
----
-license: mit
-tags:
-  - codellama
-  - linux
-  - bugfix
-  - lora
-  - qlora
-  - git-diff
-base_model: codellama/CodeLlama-7b-Instruct-hf
-model_type: LlamaForCausalLM
-library_name: peft
-pipeline_tag: text-generation
----
-# CodeLLaMA-Linux-BugFix
-A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches based on buggy C code and commit messages.
----
-## 🎯 Overview
-This project targets automated Linux kernel bug fixing by:
-- **Mining real commit data** from the kernel Git history
-- **Training a specialized QLoRA model** on diff-style fixes
-- **Generating Git patches** in response to bug-prone code
-- **Evaluating results** using BLEU, ROUGE, and human inspection
-The model reaches a BLEU score of 33.87 against reference fixes (see Performance Results), which makes it a candidate tool for automated code review and bug detection.
----
-## 📊 Performance Results
-### Evaluation Metrics
-✅ **BLEU Score**: 33.87
-✅ **ROUGE Scores**:
-- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
-- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
-- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612
-These results demonstrate the model's ability to:
-- Generate syntactically correct Git diff patches
-- Maintain semantic similarity to reference fixes
-- Produce meaningful code changes that address the underlying bugs
----
-## 🧠 Model Configuration
-- **Base model**: `CodeLLaMA-7B-Instruct`
-- **Fine-tuning method**: QLoRA with 4-bit quantization
-- **Training setup**:
-  - LoRA r=64, alpha=16, dropout=0.1
-  - Batch size: 64, LR: 2e-4, Epochs: 3
-  - Mixed precision (bfloat16), gradient checkpointing
-- **Hardware**: Optimized for NVIDIA H200 GPUs
----
-## 📊 Dataset
-Custom dataset extracted from Linux kernel Git history.
-### Filtering Criteria
-Bug-fix commits containing:
-`fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
-### Structure
-- Language: C (`.c`, `.h`)
-- Context: 10 lines before/after the change
-- Format:
-```json
-{
-  "input": {
-    "original code": "C code snippet with bug",
-    "instruction": "Commit message or fix description"
-  },
-  "output": {
-    "diff codes": "Git diff showing the fix"
-  }
-}
-```
-* **File**: `training_data_100k.jsonl` (100,000 samples)
----
-## 🚀 Quick Start
-### Prerequisites
-- Python 3.8+
-- CUDA-compatible GPU (recommended)
-- 16GB+ RAM
-- 50GB+ disk space
-### Install dependencies
-```bash
-pip install -r requirements.txt
-```
-### 1. Build the Dataset
-```bash
-cd dataset_builder
-python extract_linux_bugfixes_parallel.py
-python format_for_training.py
-```
-### 2. Fine-tune the Model
-```bash
-cd train
-python train_codellama_qlora_linux_bugfix.py
-```
-### 3. Run Evaluation
-```bash
-cd evaluate
-python evaluate_linux_bugfix_model.py
-```
-### 4. Use the Model
-````python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-from peft import PeftModel
-# Load the base model, then attach the fine-tuned LoRA adapter
-model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
-model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
-tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
-# Generate a bug fix
-prompt = """
-Given the following original C code:
-```c
-if (!file->filter)
-    return;
-```
-Instruction: Fix the null pointer dereference
-Return the diff that fixes it:
-"""
-inputs = tokenizer(prompt, return_tensors="pt")
-# temperature only takes effect when sampling is enabled
-outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.1)
-fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(fix)
-````
----
-## 📁 Project Structure
-```
-CodeLLaMA-Linux-BugFix/
-├── dataset_builder/
-│   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes
-│   ├── format_for_training.py                # Format data for training
-│   └── build_dataset.py                      # Main dataset builder
-├── dataset/
-│   ├── training_data_100k.jsonl              # 100K training samples
-│   └── training_data_prompt_completion.jsonl # Formatted training data
-├── train/
-│   ├── train_codellama_qlora_linux_bugfix.py # Main training script
-│   ├── train_codellama_qlora_simple.py       # Simplified training
-│   ├── download_codellama_model.py           # Model download utility
-│   └── output/
-│       └── qlora-codellama-bugfix/           # Trained model checkpoints
-├── evaluate/
-│   ├── evaluate_linux_bugfix_model.py        # Evaluation script
-│   ├── test_samples.jsonl                    # Test dataset
-│   └── output/                               # Evaluation results
-│       ├── eval_results.csv                  # Detailed results
-│       └── eval_results.json                 # JSON format results
-├── requirements.txt                          # Python dependencies
-├── README.md                                 # This file
-└── PROJECT_STRUCTURE.md                      # Detailed project overview
-```
----
-## 🧩 Features
-* 🔧 **Efficient Fine-tuning**: QLoRA + 4-bit quant = massive memory savings
-* 🧠 **Real-world commits**: From actual Linux kernel development
-* 💡 **Context-aware**: Code context extraction around bug lines
-* 💻 **Output-ready**: Generates valid Git-style diffs
-* 📈 **Strong Performance**: BLEU score of 33.87 with good ROUGE metrics
-* 🚀 **Production-ready**: Optimized for real-world deployment
----
-## 📈 Evaluation Metrics
-* **BLEU**: Translation-style n-gram match to reference diffs
-* **ROUGE**: Token overlap between generated and reference fixes
-* **Human Evaluation**: Subjective patch quality assessment
-### Current Performance
-- **BLEU Score**: 33.87
-- **ROUGE-1 F1**: 0.4355 (unigram overlap)
-- **ROUGE-2 F1**: 0.3457 (bigram overlap)
-- **ROUGE-L F1**: 0.3612 (longest common subsequence)
----
-## 🧪 Use Cases
-* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
-* **Code review assistance**: Help reviewers identify potential issues
-* **Teaching/debugging kernel code**: Educational tool for kernel development
-* **Research in automated program repair (APR)**: Academic research applications
-* **CI/CD integration**: Automated testing and fixing in development pipelines
----
-## 🔬 Technical Highlights
-### Memory & Speed Optimizations
-* 4-bit quantization (NF4)
-* Gradient checkpointing
-* Mixed precision (bfloat16)
-* Gradient accumulation
-* LoRA parameter efficiency
-### Training Efficiency
-* **QLoRA**: Trains only low-rank adapters on top of a frozen, quantized base model
-* **4-bit quantization**: Stores base weights in roughly a quarter of the memory of 16-bit weights (~75% less)
-* **Gradient checkpointing**: Trades compute for memory
-* **Mixed precision**: Faster training with maintained accuracy
----
-## 🛠️ Advanced Usage
-### Custom Training
-```bash
-# Train with custom parameters
-python train_codellama_qlora_linux_bugfix.py \
-    --learning_rate 1e-4 \
-    --num_epochs 5 \
-    --batch_size 32 \
-    --lora_r 32 \
-    --lora_alpha 16
-```
-### Evaluation on Custom Data
-```bash
-# Evaluate on your own test set
-python evaluate_linux_bugfix_model.py \
-    --test_file your_test_data.jsonl \
-    --output_dir custom_eval_results
-```
----
-## 🤝 Contributing
-1. Fork this repo
-2. Create a feature branch (`git checkout -b feature/amazing-feature`)
-3. Commit your changes (`git commit -m 'Add amazing feature'`)
-4. Push to the branch (`git push origin feature/amazing-feature`)
-5. Open a Pull Request 🙌
-### Development Guidelines
-- Follow PEP 8 style guidelines
-- Add tests for new features
-- Update documentation for API changes
-- Ensure all tests pass before submitting PR
----
-## 📄 License
-MIT License – see `LICENSE` file for details.
----
-## 🙏 Acknowledgments
-* **Meta** for CodeLLaMA base model
-* **Hugging Face** for Transformers + PEFT libraries
-* **The Linux kernel community** for open access to commit data
-* **Microsoft** for introducing LoRA technique
-* **University of Washington** for QLoRA research
----
-## 📚 References
-* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
-* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
-* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
-* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)
----
-## 📞 Support
-For questions, issues, or contributions:
-- Open an issue on GitHub
-- Check the project documentation
-- Review the evaluation results in `evaluate/output/`
----
-## 🔄 Version History
-- **v1.0.0**: Initial release with QLoRA training
-- **v1.1.0**: Added parallel dataset extraction
-- **v1.2.0**: Improved evaluation metrics and documentation

+---
+license: mit
+tags:
+  - codellama
+  - linux
+  - bugfix
+  - lora
+  - qlora
+  - git-diff
+base_model: codellama/CodeLlama-7b-Instruct-hf
+model_type: LlamaForCausalLM
+library_name: peft
+pipeline_tag: text-generation
+---
+# CodeLLaMA-Linux-BugFix
+A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches based on buggy C code and commit messages.
+---
+## 🎯 Overview
+This project targets automated Linux kernel bug fixing by:
+- Mining real commit data from kernel Git history
+- Training a QLoRA model to generate Git-style fixes
+- Evaluating performance using BLEU and ROUGE
+- Supporting integration into code review pipelines
+---
+## 📊 Performance Results
+**BLEU Score**: 33.87
+**ROUGE Scores**:
+- ROUGE-1: P=0.3775, R=0.7306, F1=0.4355
+- ROUGE-2: P=0.2898, R=0.6096, F1=0.3457
+- ROUGE-L: P=0.3023, R=0.6333, F1=0.3612
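+These scores measure n-gram overlap with the ground-truth patches, not functional correctness. Recall exceeds precision on every ROUGE variant, which suggests the generated diffs cover most of the reference content but also contain extra tokens.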
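To make the gap between ROUGE precision and recall concrete, here is a minimal sketch of unigram ROUGE (ROUGE-1) on whitespace tokens. The `rouge1` helper and the toy diff strings are illustrative assumptions, not the project's evaluation script, which may tokenize differently.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, F1) for unigram overlap on whitespace tokens."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference patch, and a longer generated patch that contains it.
reference = "- if (!file->filter) + if (!file || !file->filter)"
candidate = reference + ' + pr_debug("no filter"); + return;'

p, r, f1 = rouge1(candidate, reference)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

A generated diff that reproduces the reference and then keeps going scores perfect recall but lower precision, which is the same direction as the reported scores.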
+---
+## 🧠 Model Configuration
+- **Base model**: `CodeLLaMA-7B-Instruct`
+- **Fine-tuning**: QLoRA (LoRA r=64, α=16, dropout=0.1)
+- **Quantization**: 4-bit NF4
+- **Training**: 3 epochs, batch size 64, LR 2e-4
+- **Precision**: bfloat16 with gradient checkpointing
+- **Hardware**: 1× NVIDIA H200 (141 GB VRAM)
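A back-of-the-envelope sketch of what these settings imply. The LoRA rank and alpha come from the list above; the hidden size (4096), layer count (32), parameter count (about 6.74B), and the choice of two adapted attention projections per layer are assumptions about a 7B LLaMA-style model, since the README does not say which modules the adapters target.

```python
# Values stated in the README.
LORA_R = 64
LORA_ALPHA = 16

# Assumptions for a 7B LLaMA-style model (not stated in the README).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
BASE_PARAMS = 6.74e9
ADAPTED_PER_LAYER = 2  # e.g. q_proj and v_proj, each HIDDEN_SIZE x HIDDEN_SIZE

# LoRA adds two low-rank factors per adapted matrix: A (r x d_in) and B (d_out x r).
params_per_matrix = LORA_R * (HIDDEN_SIZE + HIDDEN_SIZE)
trainable = params_per_matrix * ADAPTED_PER_LAYER * NUM_LAYERS

# The adapter update is scaled by alpha / r before it is added to the frozen weight.
scaling = LORA_ALPHA / LORA_R

# Weight-only memory: 4 bits (0.5 byte) per parameter versus 2 bytes in bfloat16.
# This ignores quantization constants, activations, and optimizer state.
gib = 1024 ** 3
weights_4bit_gib = BASE_PARAMS * 0.5 / gib
weights_bf16_gib = BASE_PARAMS * 2 / gib

print(f"trainable LoRA params: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"base weights: {weights_4bit_gib:.1f} GiB (4-bit) vs {weights_bf16_gib:.1f} GiB (bf16)")
```

Under these assumptions the adapters hold well under 1% of the base model's parameters, which is why the fine-tune fits comfortably on a single GPU.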
+---
+## 🗃️ Dataset
+- 100,000 samples from Linux kernel Git commits
+- Format: JSONL with `"prompt"` and `"completion"` fields
+- Content: C code segments + commit messages → Git diffs
+- Source: Bug-fix commits filtered by keywords like `fix`, `null`, `race`, `panic`
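The filtering and record format described above could look roughly like this. The keyword list is taken from the README; the `is_bugfix_commit` and `make_record` helpers, the example diff, and the exact prompt wording (which mirrors the usage example) are hypothetical, and the real extraction scripts may differ.

```python
import json
import re

# Keywords named in the README; the project's actual list may be longer.
KEYWORDS = ("fix", "null", "race", "panic")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")", re.IGNORECASE)
FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def is_bugfix_commit(message):
    """Heuristic filter: does the commit message mention a bug-fix keyword?"""
    return KEYWORD_RE.search(message) is not None


def make_record(code, message, diff):
    """Serialize one training sample as a JSONL line with prompt/completion fields."""
    prompt = (
        "Given the following original C code:\n"
        f"{FENCE}c\n{code}\n{FENCE}\n"
        f"Instruction: {message}\n"
        "Return the diff that fixes it:\n"
    )
    return json.dumps({"prompt": prompt, "completion": diff})


message = "Fix the null pointer dereference"
if is_bugfix_commit(message):
    line = make_record(
        code="if (!file->filter)\n    return;",
        message=message,
        diff="-if (!file->filter)\n+if (!file || !file->filter)",  # made-up fix
    )
    print(line)
```

Anchoring the keywords at a word boundary keeps words such as "prefix" or "tracepoint" from matching `fix` or `race`.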
+---
+## 🚀 Usage
+````python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from peft import PeftModel
+# Load the base model, then attach the fine-tuned LoRA adapter
+model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
+model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
+tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
+# Build the prompt: buggy code, instruction, and a request for a diff
+prompt = '''
+Given the following original C code:
+```c
+if (!file->filter)
+    return;
+```
+Instruction: Fix the null pointer dereference
+Return the diff that fixes it:
+'''
+inputs = tokenizer(prompt, return_tensors="pt")
+# temperature only takes effect when sampling is enabled
+outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.1)
+fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(fix)
+````
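Because `tokenizer.decode(outputs[0])` returns the prompt followed by the generated text, a caller usually wants to strip the prompt and check that what remains looks like a diff. The sketch below does this on plain strings. `extract_completion` and `looks_like_diff` are hypothetical helpers, the check is a loose heuristic and not a patch validator, and it assumes the decoded text begins with the prompt verbatim (otherwise the full text is returned).

```python
def extract_completion(decoded, prompt):
    """Drop the echoed prompt if the decoded text starts with it."""
    text = decoded.strip()
    head = prompt.strip()
    if text.startswith(head):
        text = text[len(head):]
    return text.strip()


def looks_like_diff(text):
    """Loose heuristic: at least one added and one removed line."""
    lines = text.splitlines()
    has_added = any(l.startswith("+") and not l.startswith("+++") for l in lines)
    has_removed = any(l.startswith("-") and not l.startswith("---") for l in lines)
    return has_added and has_removed


# Stand-in strings; in practice `decoded` comes from tokenizer.decode(...).
prompt = "Instruction: Fix the null pointer dereference\nReturn the diff that fixes it:\n"
decoded = prompt + "-if (!file->filter)\n+if (!file || !file->filter)\n"

completion = extract_completion(decoded, prompt)
print(completion)
print("diff-like:", looks_like_diff(completion))
```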
+---
+## 📁 Structure
+```
+CodeLLaMA-Linux-BugFix/
+├── dataset/                     # Raw and processed JSONL files
+├── dataset_builder/             # Scripts for mining & formatting commits
+├── train/                      # Training scripts & checkpoints
+├── evaluate/                   # Evaluation scripts & results
+└── requirements.txt            # Dependencies
+```
+---
+## 📈 Metrics
+| Metric     | Score  |
+|------------|--------|
+| BLEU       | 33.87  |
+| ROUGE-1 F1 | 0.4355 |
+| ROUGE-2 F1 | 0.3457 |
+| ROUGE-L F1 | 0.3612 |
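One detail worth noting about the reported numbers: F1 computed from the averaged ROUGE-1 precision and recall does not equal the reported ROUGE-1 F1. A plausible explanation (an assumption, since the evaluation script is not shown here) is that F1 was computed per sample and then averaged. The check below uses only figures reported in the README.

```python
# ROUGE-1 figures as reported in the README.
precision, recall, reported_f1 = 0.3775, 0.7306, 0.4355

# Harmonic mean of the averaged precision and recall.
f1_of_means = 2 * precision * recall / (precision + recall)

print(f"F1 from averaged P and R: {f1_of_means:.4f}")
print(f"reported F1:              {reported_f1:.4f}")
```

The two values differ by several points, so the table should be read as an average of per-sample scores and not as a function of the averaged precision and recall, if that assumption holds.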
+---
+## 🔬 Use Cases
+- Kernel patch suggestion tools
+- Code review assistants
+- Bug localization + repair research
+- APR benchmarks for kernel code
+---
+## 📄 License
+MIT License
+---
+## 📚 References
+- [CodeLLaMA](https://arxiv.org/abs/2308.12950)
+- [QLoRA](https://arxiv.org/abs/2305.14314)
+- [LoRA](https://arxiv.org/abs/2106.09685)