🔥 BLAZE: Cross-Language and Cross-Project Bug Localization

BLAZE is a transformer-based bug localization model that works across programming languages and software projects. It improves the alignment between bug reports and source code using dynamic chunking and hard example learning, enabling precise bug localization in codebases and programming languages it has not seen during training.



✨ Highlights

  • 📌 Cross-project and cross-language bug localization without retraining
  • 📏 Dynamic Chunking splits long source files into segments that fit within the model's context window
  • 🧠 Hard Example Learning improves generalization and ranking accuracy
  • 🌍 Supports Java, Python, C++, JavaScript, and Go
  • 📊 Outperforms both cross-project and embedding-based baselines
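The Dynamic Chunking highlight is about keeping long files within a fixed context window. As a rough illustration only, the sketch below greedily packs whole source lines into chunks under a token budget. It is not BLAZE's actual chunking procedure (the paper describes that); `chunk_file`, `count_tokens`, and the default `max_tokens=512` are hypothetical, and whitespace splitting stands in for a real subword tokenizer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def chunk_file(source: str, max_tokens: int = 512) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_tokens tokens.

    A single line longer than max_tokens is split into word windows
    (original spacing inside that line is not preserved), so no chunk
    exceeds the budget.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for line in source.splitlines():
        n = count_tokens(line)
        if n > max_tokens:
            # Flush the pending chunk, then split the oversized line by words.
            if current:
                chunks.append("\n".join(current))
                current, current_len = [], 0
            words = line.split()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and current_len + n > max_tokens:
            # Adding this line would exceed the budget: close the current chunk.
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(line)
        current_len += n
    if current:
        chunks.append("\n".join(current))
    return chunks


src = "def f(x):\n    return x + 1\n\ndef g(y):\n    return y * 2\n"
for chunk in chunk_file(src, max_tokens=4):
    print(repr(chunk))
```

Packing whole lines keeps statements intact inside a chunk; a real implementation would set the budget from the model's context length and count tokens with the model's own tokenizer, which usually yields more tokens per line than word splitting.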
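Hard example learning is commonly implemented by mining hard negatives: non-buggy files that the model currently scores as highly relevant to a bug report. The sketch below shows that generic idea using cosine similarity over precomputed embedding vectors. It is an assumption for illustration, not BLAZE's exact training procedure, and `mine_hard_negatives`, `cosine`, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mine_hard_negatives(
    bug_vec: Sequence[float],
    file_vecs: Dict[str, Sequence[float]],
    buggy_files: Set[str],
    k: int = 2,
) -> List[str]:
    """Return the k non-buggy files most similar to the bug report."""
    candidates = [
        (cosine(bug_vec, vec), path)
        for path, vec in file_vecs.items()
        if path not in buggy_files  # ground-truth buggy files are positives, not negatives
    ]
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [path for _, path in candidates[:k]]


bug = [1.0, 0.0]  # hypothetical bug-report embedding
files = {
    "Parser.java": [1.0, 0.1],  # the actual buggy file
    "Lexer.java": [0.9, 0.2],
    "Util.java": [0.0, 1.0],
    "Ast.java": [0.7, 0.7],
}
print(mine_hard_negatives(bug, files, {"Parser.java"}, k=2))
```

In a typical contrastive or ranking setup, each mined file would then be paired with the true buggy file, so training concentrates on the distinctions the model currently gets wrong rather than on easy, unrelated files such as `Util.java` here.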

📂 Dataset: BeetleBox

BeetleBox is the largest curated dataset for bug localization:

  • 23,782 real-world bugs
  • 29 repositories
  • 5 programming languages
  • Cleaned and de-duplicated to remove overlaps with training data
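One simple way to remove overlaps like those listed above is exact-match deduplication on normalized text. The sketch below assumes bug reports are plain strings and drops any evaluation report whose normalized SHA-256 fingerprint already appears in the training set or earlier in the evaluation set. The function names are hypothetical, and the actual BeetleBox cleaning process may use different criteria.

```python
import hashlib
import re
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences do not hide duplicates.
    return re.sub(r"\s+", " ", text.strip().lower())


def fingerprint(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def remove_overlaps(eval_reports: Iterable[str], train_reports: Iterable[str]) -> List[str]:
    """Drop eval reports that repeat within the eval set or appear in the training set."""
    seen: Set[str] = {fingerprint(r) for r in train_reports}
    kept: List[str] = []
    for report in eval_reports:
        fp = fingerprint(report)
        if fp in seen:
            continue  # overlaps with training data or an earlier eval report
        seen.add(fp)
        kept.append(report)
    return kept


train = ["NPE in Parser.parse()"]
eval_set = ["npe  in parser.parse()", "Crash on empty input", "Crash on empty input ", "Timeout in Lexer"]
print(remove_overlaps(eval_set, train))
```

Exact hashing only catches reports that are identical after normalization; paraphrased duplicates would need a fuzzier comparison.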

📥 Available on Zenodo
📚 Also listed on Hugging Face Datasets: bug-localization/BeetleBox


🚀 Demo & Usage

All code, usage instructions, model files, and scripts are available via:

👉 BLAZE Repository & Demo (Zenodo)
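The repository linked above contains the actual inference scripts. As a hedged sketch of how chunk-level scores can be turned into a file ranking, the code below assumes a model has already produced one relevance score per (bug report, chunk) pair, ranks each file by its best chunk (max aggregation), and checks for a top-k hit. `rank_files`, `top_k_hit`, and the example scores are hypothetical, and max pooling is one common choice rather than necessarily the one BLAZE uses.

```python
from typing import Dict, List, Set, Tuple


def rank_files(chunk_scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank files by their highest chunk score; files with no chunks are skipped."""
    file_scores = {
        path: max(scores) for path, scores in chunk_scores.items() if scores
    }
    return sorted(file_scores.items(), key=lambda kv: kv[1], reverse=True)


def top_k_hit(ranking: List[Tuple[str, float]], buggy_files: Set[str], k: int) -> bool:
    """True if any ground-truth buggy file appears among the top-k ranked files."""
    return any(path in buggy_files for path, _ in ranking[:k])


# Hypothetical per-chunk relevance scores for one bug report.
scores = {
    "src/parser.py": [0.12, 0.91, 0.30],
    "src/lexer.py": [0.45, 0.40],
    "src/utils.py": [0.05],
}
ranking = rank_files(scores)
print(ranking)
print(top_k_hit(ranking, {"src/lexer.py"}, k=1), top_k_hit(ranking, {"src/lexer.py"}, k=2))
```

Max aggregation lets a single highly relevant chunk surface a long file, which is the situation chunking creates; averaging would instead dilute that signal across the file's unrelated chunks.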


📝 Citation

Please cite the following paper if you use BLAZE or BeetleBox in your work:

@article{Chakraborty2025,
  title = {BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning},
  ISSN = {2326-3881},
  url = {http://dx.doi.org/10.1109/TSE.2025.3579574},
  DOI = {10.1109/TSE.2025.3579574},
  journal = {IEEE Transactions on Software Engineering},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  author = {Chakraborty, Partha and Alfadel, Mahmoud and Nagappan, Meiyappan},
  year = {2025},
  pages = {1--14}
}
Model details: 128M parameters, F32 tensors, stored in Safetensors format