BLAZE: Cross-Language and Cross-Project Bug Localization
BLAZE is a transformer-based bug localization model that works across languages and software projects. It enhances source-bug alignment using dynamic chunking and hard example learning, enabling precise bug localization in unseen codebases and programming languages.
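As a rough illustration of the hard example idea, the sketch below picks the non-buggy files that a model scored highest for a given bug report, which is one common way to choose hard negatives for further training. This is a generic hard-negative selection routine, not BLAZE's actual procedure; the function name `select_hard_negatives`, the score dictionary, and the file names are all hypothetical.

```python
from typing import Dict, List, Set


def select_hard_negatives(
    scores: Dict[str, float], buggy_files: Set[str], k: int = 2
) -> List[str]:
    """Return the k non-buggy files with the highest similarity scores.

    `scores` maps a file path to the model's similarity score against one
    bug report (hypothetical values; any scoring model could produce them).
    """
    negatives = [(path, s) for path, s in scores.items() if path not in buggy_files]
    # Highest score first; ties are broken by path so the result is deterministic.
    negatives.sort(key=lambda item: (-item[1], item[0]))
    return [path for path, _ in negatives[:k]]


# Hypothetical scores for one bug report whose true location is Parser.java.
example_scores = {
    "Parser.java": 0.91,
    "Lexer.java": 0.88,
    "Utils.java": 0.35,
    "Main.java": 0.80,
}
hard = select_hard_negatives(example_scores, buggy_files={"Parser.java"}, k=2)
print(hard)  # ['Lexer.java', 'Main.java']
```

Files that score high but are not actually buggy are the ones a ranking model most easily confuses with the true location, so they tend to be the most informative negatives.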
Highlights
- Cross-project & cross-language bug localization with no re-training
- Dynamic Chunking handles long files within LLM context windows
- Hard Example Learning improves generalization and ranking accuracy
- Supports Java, Python, C++, JavaScript, and Go
- Outperforms both cross-project and embedding-based baselines
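To make the chunking idea concrete, here is a minimal sketch that splits a long source file into line-aligned chunks under a token budget. It is an assumption-laden illustration rather than BLAZE's own algorithm: the function name `chunk_source` is hypothetical, tokens are approximated by whitespace splitting, and a real pipeline would count tokens with the model's tokenizer and might choose boundaries differently.

```python
from typing import List


def chunk_source(text: str, max_tokens: int = 256) -> List[str]:
    """Split source text into line-aligned chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace splitting (an assumption made for
    this sketch, not the tokenizer a real model would use).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for line in text.splitlines():
        words = line.split()
        # A single line longer than the budget is cut into fixed-size pieces.
        if len(words) > max_tokens:
            if current:
                chunks.append("\n".join(current))
                current, current_len = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        # Close the current chunk when this line would exceed the budget.
        if current and current_len + len(words) > max_tokens:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(line)
        current_len += len(words)

    if current:
        chunks.append("\n".join(current))
    return chunks


# Ten short lines of four whitespace tokens each, with a budget of eight tokens.
source = "\n".join(f"int x{i} = {i};" for i in range(10))
pieces = chunk_source(source, max_tokens=8)
print(len(pieces))  # 5
```

Each chunk can then be scored against the bug report separately, with the per-chunk scores combined into a file-level score (for example by taking the maximum); that aggregation step is likewise an assumption here.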
Dataset: BeetleBox
BeetleBox is the largest curated dataset for bug localization:
- 23,782 real-world bugs
- 29 repositories
- 5 programming languages
- Cleaned and de-duplicated to remove overlaps with training data
Available on Zenodo
Also listed on Hugging Face Datasets: bug-localization/BeetleBox
Demo & Usage
All code, usage instructions, model files, and scripts are available via:
BLAZE Repository & Demo (Zenodo)
Citation
Please cite the following paper if you use BLAZE or BeetleBox in your work:
@article{Chakraborty2025,
title = {BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning},
ISSN = {2326-3881},
url = {http://dx.doi.org/10.1109/TSE.2025.3579574},
DOI = {10.1109/TSE.2025.3579574},
journal = {IEEE Transactions on Software Engineering},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
author = {Chakraborty, Partha and Alfadel, Mahmoud and Nagappan, Meiyappan},
year = {2025},
pages = {1--14}
}
Model tree for bug-localization/BLAZE
Base model: codesage/codesage-base