Update README.md

README.md
---
license: mit
tags:
- codellama
- linux
- bugfix
- lora
- qlora
- git-diff
base_model: codellama/CodeLLaMA-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation
---

# CodeLLaMA-Linux-BugFix

A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches from buggy C code and commit messages.

---

## Overview

This project targets automated Linux kernel bug fixing by:

- **Mining real commit data** from the kernel Git history
- **Training a specialized QLoRA model** on diff-style fixes
- **Generating Git patches** in response to bug-prone code
- **Evaluating results** using BLEU, ROUGE, and human inspection

The model performs well at generating plausible Linux kernel bug fixes, making it a useful aid for automated code review and bug detection.

---

## Performance Results

### Evaluation Metrics

**BLEU Score**: 33.87

**ROUGE Scores**:
- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612

These results indicate the model's ability to:
- Generate syntactically correct Git diff patches
- Maintain semantic similarity to reference fixes
- Produce meaningful code changes that address the underlying bugs

---

## Model Configuration

- **Base model**: `CodeLLaMA-7B-Instruct`
- **Fine-tuning method**: QLoRA with 4-bit quantization
- **Training setup**:
  - LoRA r=64, alpha=16, dropout=0.1
  - Batch size: 64, learning rate: 2e-4, epochs: 3
  - Mixed precision (bfloat16), gradient checkpointing
- **Hardware**: Optimized for NVIDIA H200 GPUs
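
For reference, the configuration above corresponds roughly to the following `transformers`/`peft` setup. This is a minimal sketch, not the actual training script (see `train/train_codellama_qlora_linux_bugfix.py`); in particular, the `target_modules` list is an assumption.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with bfloat16 compute, per the setup above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from the training setup above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projection set
    task_type="CAUSAL_LM",
)
```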

---

## Dataset

A custom dataset extracted from the Linux kernel Git history.

### Filtering Criteria

Bug-fix commits whose messages contain keywords such as:
`fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
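
To illustrate this filtering step, here is a minimal sketch that lists matching commit hashes with `git log --grep`. The helper name and exact keyword set are illustrative; the project's real extraction logic lives in `dataset_builder/extract_linux_bugfixes_parallel.py`.

```python
import subprocess

# Abridged keyword list from the filtering criteria above
BUGFIX_KEYWORDS = ["fix", "bug", "crash", "memory", "null",
                   "panic", "overflow", "race", "corruption"]

def bugfix_commit_hashes(repo_path):
    """Return hashes of commits whose message matches any bug-fix keyword."""
    cmd = ["git", "-C", repo_path, "log", "--format=%H", "-i"]
    for kw in BUGFIX_KEYWORDS:
        cmd += ["--grep", kw]  # multiple --grep patterns are OR'ed together
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```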

### Structure

- Language: C (`.c`, `.h`)
- Context: 10 lines before/after the change
- Format:

```json
{
  "input": {
    "original code": "C code snippet with bug",
    "instruction": "Commit message or fix description"
  },
  "output": {
    "diff codes": "Git diff showing the fix"
  }
}
```

- **File**: `training_data_100k.jsonl` (100,000 samples)
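
For a concrete feel of this format, the following sketch streams records from the JSONL file; the field names mirror the layout above, while the helper itself is illustrative rather than part of the repository.

```python
import json

def iter_samples(path="dataset/training_data_100k.jsonl"):
    """Yield (buggy code, instruction, reference diff) triples from the dataset."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            yield (rec["input"]["original code"],
                   rec["input"]["instruction"],
                   rec["output"]["diff codes"])
```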

---

## Quick Start

### Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- 50GB+ disk space

### Install dependencies

```bash
pip install -r requirements.txt
```

### 1. Build the Dataset

```bash
cd dataset_builder
python extract_linux_bugfixes_parallel.py
python format_for_training.py
```

### 2. Fine-tune the Model

```bash
cd train
python train_codellama_qlora_linux_bugfix.py
```

### 3. Run Evaluation

```bash
cd evaluate
python evaluate_linux_bugfix_model.py
```

### 4. Use the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model and apply the fine-tuned LoRA adapter
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")

# Generate a bug fix
prompt = """
Given the following original C code:
if (!file->filter)
    return;

Instruction: Fix the null pointer dereference

Return the diff that fixes it:
"""

inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True so the temperature setting actually takes effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.1)
fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fix)
```

---

## Project Structure

```
CodeLLaMA-Linux-BugFix/
├── dataset_builder/
│   ├── extract_linux_bugfixes_parallel.py     # Parallel extraction of bug fixes
│   ├── format_for_training.py                 # Format data for training
│   └── build_dataset.py                       # Main dataset builder
├── dataset/
│   ├── training_data_100k.jsonl               # 100K training samples
│   └── training_data_prompt_completion.jsonl  # Formatted training data
├── train/
│   ├── train_codellama_qlora_linux_bugfix.py  # Main training script
│   ├── train_codellama_qlora_simple.py        # Simplified training
│   ├── download_codellama_model.py            # Model download utility
│   └── output/
│       └── qlora-codellama-bugfix/            # Trained model checkpoints
├── evaluate/
│   ├── evaluate_linux_bugfix_model.py         # Evaluation script
│   ├── test_samples.jsonl                     # Test dataset
│   └── output/                                # Evaluation results
│       ├── eval_results.csv                   # Detailed results
│       └── eval_results.json                  # JSON format results
├── requirements.txt                           # Python dependencies
├── README.md                                  # This file
└── PROJECT_STRUCTURE.md                       # Detailed project overview
```

---

## Features

* **Efficient fine-tuning**: QLoRA with 4-bit quantization for large memory savings
* **Real-world commits**: From actual Linux kernel development
* **Context-aware**: Code context extraction around bug lines
* **Output-ready**: Generates valid Git-style diffs
* **Strong performance**: BLEU score of 33.87 with good ROUGE metrics
* **Production-ready**: Optimized for real-world deployment

---

## Evaluation Metrics

* **BLEU**: Translation-style match to reference diffs
* **ROUGE**: Overlap in fix content and semantic similarity
* **Human Evaluation**: Subjective patch quality assessment
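
As a sketch of how such scores can be reproduced, the snippet below uses the Hugging Face `evaluate` library (an assumption; the project's own scoring lives in `evaluate/evaluate_linux_bugfix_model.py`) on a placeholder prediction/reference pair.

```python
import evaluate  # pip install evaluate sacrebleu rouge_score

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

# Placeholder generated diff vs. reference diff (illustrative only)
predictions = ["+ if (!file->filter)\n+     return;"]
references = ["+ if (!file->filter)\n+     return;"]

# sacrebleu expects a list of reference lists per prediction
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references])["score"])
print(rouge.compute(predictions=predictions, references=references))
```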

### Current Performance

- **BLEU Score**: 33.87 (strong for a code generation task)
- **ROUGE-1 F1**: 0.4355 (good unigram overlap)
- **ROUGE-2 F1**: 0.3457 (reasonable bigram matching)
- **ROUGE-L F1**: 0.3612 (good longest-common-subsequence overlap)

---

## Use Cases

* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
* **Code review assistance**: Help reviewers identify potential issues
* **Teaching/debugging kernel code**: Educational tool for kernel development
* **Research in automated program repair (APR)**: Academic research applications
* **CI/CD integration**: Automated testing and fixing in development pipelines

---

## Technical Highlights

### Memory & Speed Optimizations

* 4-bit quantization (NF4)
* Gradient checkpointing
* Mixed precision (bfloat16)
* Gradient accumulation
* LoRA parameter efficiency

### Training Efficiency

* **QLoRA**: Reduces memory usage by ~75%
* **4-bit quantization**: Further memory optimization
* **Gradient checkpointing**: Trades compute for memory
* **Mixed precision**: Faster training with maintained accuracy
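
These optimizations map onto standard `transformers.TrainingArguments` flags, as in the sketch below. The values are illustrative but consistent with the configuration listed earlier; the per-device/accumulation split is an assumption.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train/output/qlora-codellama-bugfix",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,  # effective batch size of 64
    learning_rate=2e-4,
    num_train_epochs=3,
    bf16=True,                      # mixed precision (bfloat16)
    gradient_checkpointing=True,    # trades compute for memory
    logging_steps=10,
)
```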

---

## Advanced Usage

### Custom Training

```bash
# Train with custom parameters
python train_codellama_qlora_linux_bugfix.py \
    --learning_rate 1e-4 \
    --num_epochs 5 \
    --batch_size 32 \
    --lora_r 32 \
    --lora_alpha 16
```

### Evaluation on Custom Data

```bash
# Evaluate on your own test set
python evaluate_linux_bugfix_model.py \
    --test_file your_test_data.jsonl \
    --output_dir custom_eval_results
```

---

## Contributing

1. Fork this repo
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Development Guidelines

- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation for API changes
- Ensure all tests pass before submitting a PR

---

## License

MIT License; see the `LICENSE` file for details.

---

## Acknowledgments

* **Meta** for the CodeLLaMA base model
* **Hugging Face** for the Transformers and PEFT libraries
* **The Linux kernel community** for open access to commit data
* **Microsoft** for introducing the LoRA technique
* **University of Washington** for the QLoRA research

---

## References

* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)

---

## Support

For questions, issues, or contributions:
- Open an issue on GitHub
- Check the project documentation
- Review the evaluation results in `evaluate/output/`

---

## Version History

- **v1.0.0**: Initial release with QLoRA training
- **v1.1.0**: Added parallel dataset extraction
- **v1.2.0**: Improved evaluation metrics and documentation