license: mit
datasets:
  - bug-localization/BeetleBox
  - princeton-nlp/SWE-bench
language:
  - en
base_model:
  - codesage/codesage-base
tags:
  - bug
  - localization
  - embedding
  - multi-language

🔥 BLAZE: Cross-Language and Cross-Project Bug Localization

BLAZE is a transformer-based bug localization model that works across languages and software projects. It enhances source-bug alignment using dynamic chunking and hard example learning, enabling precise bug localization in unseen codebases and programming languages.
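As a rough illustration of how an embedding model can be used for bug localization, the sketch below ranks files by the cosine similarity between a bug-report vector and each file's chunk vectors, then picks the top-ranked non-buggy files as hard negatives. The function names, the max-over-chunks aggregation, and the toy 2-d vectors are all assumptions made for illustration; they are not taken from the BLAZE code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_files(report_vec, file_chunks):
    """Rank files by their best-matching chunk (assumed max aggregation)."""
    scores = {
        path: max((cosine(report_vec, c) for c in chunks), default=0.0)
        for path, chunks in file_chunks.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def hard_negatives(ranking, buggy_files, k=1):
    """Return the k highest-ranked files that are not actually buggy."""
    return [path for path in ranking if path not in buggy_files][:k]

# Toy 2-d embeddings standing in for real model outputs (hypothetical values).
report = [1.0, 0.0]
chunks = {
    "parser.py": [[0.9, 0.1], [0.0, 1.0]],
    "lexer.py": [[0.8, 0.6]],
    "utils.py": [[0.0, 1.0]],
}
ranking = rank_files(report, chunks)
negatives = hard_negatives(ranking, buggy_files={"parser.py"})
```

Here `negatives` holds the non-buggy file that looks most similar to the report, which is the kind of confusable example a hard-example training scheme would typically emphasize.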

✨ Highlights

  • 📌 Cross-project & cross-language bug localization with no re-training
  • πŸ“ Dynamic Chunking handles long files within LLM context windows
  • 🧠 Hard Example Learning improves generalization and ranking accuracy
  • 🌍 Supports Java, Python, C++, JavaScript, and Go
  • 📊 Outperforms both cross-project and embedding-based baselines
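The list above does not spell out the chunking algorithm, so the following is only a sketch of the general idea of splitting a long file into variable-sized pieces that fit a token budget. It uses greedy line packing with whitespace token counts, which is a stand-in technique and not the authors' Dynamic Chunking; `chunk_lines` and `max_tokens` are hypothetical names.

```python
def chunk_lines(source, max_tokens=8):
    """Greedily pack whole lines into chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace splitting; a real pipeline would
    count tokens with the model's own tokenizer. A single line that exceeds
    the budget is kept whole as its own oversized chunk.
    """
    chunks, current, used = [], [], 0
    for line in source.splitlines():
        cost = len(line.split())
        # Start a new chunk when this line would overflow the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks

code = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b"
chunks = chunk_lines(code, max_tokens=8)
```

Because chunks break only at line boundaries, no statement is cut in half, and joining the chunks with newlines reproduces the original text.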

📂 Dataset: BeetleBox

BeetleBox is the largest curated dataset for bug localization:

  • 23,782 real-world bugs
  • 29 repositories
  • 5 programming languages
  • Cleaned and de-duplicated to remove overlaps with training data

📥 Available on Zenodo

📚 Also listed on Hugging Face Datasets: bug-localization/BeetleBox


🚀 Demo & Usage

All code, usage instructions, model files, and scripts are available via:

👉 BLAZE Repository & Demo (Zenodo)


πŸ“ Citation

Please cite the following paper if you use BLAZE or BeetleBox in your work:

@article{Chakraborty2025,
  title = {BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning},
  ISSN = {2326-3881},
  url = {http://dx.doi.org/10.1109/TSE.2025.3579574},
  DOI = {10.1109/TSE.2025.3579574},
  journal = {IEEE Transactions on Software Engineering},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  author = {Chakraborty, Partha and Alfadel, Mahmoud and Nagappan, Meiyappan},
  year = {2025},
  pages = {1--14}
}