---
library_name: transformers
license: mit
language:
- en
metrics:
- seqeval
base_model:
- microsoft/codebert-base
pipeline_tag: token-classification
---
# CodeBERT base for detecting software engineering terminology
This model detects software engineering terminology in developer forum posts (e.g., Stack Overflow) and labels each detected term as one of the following entity types: 'Data_Structure', 'Application', 'Code_Block', 'Function', 'Data_Type', 'Language', 'Library', 'Variable', 'Device', 'User_Name', 'User_Interface_Element', 'Class', 'Website', 'Version', 'File_Name', 'File_Type', 'Operating_System', 'Output_Block', 'Algorithm' or 'HTML_XML_Tag'.
- **Developed by:** Fabian C. Peña, Steffen Herbold
- **Finetuned from:** [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
- **Replication kit:** [https://github.com/aieng-lab/senlp-benchmark](https://github.com/aieng-lab/senlp-benchmark)
- **Language:** English
- **License:** MIT
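
## Usage
The following is a minimal sketch of how the model could be queried with the `transformers` token-classification pipeline and how the predictions could be grouped by entity type. It assumes the checkpoint is loaded by its Hub repository name, which is passed in as `model_id` (a placeholder, not stated in this card). The helper names `load_tagger` and `group_by_label` are hypothetical and not part of the replication kit.

```python
from collections import defaultdict


def load_tagger(model_id):
    """Build a token-classification pipeline for a fine-tuned checkpoint.

    `model_id` is a placeholder: pass the Hub repository name of this model.
    """
    # Imported lazily so the grouping helper below works without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into whole entities
    # and returns dicts with the keys: entity_group, score, word, start, end.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_label(entities, min_score=0.0):
    """Group aggregated pipeline output by entity type.

    Entities scoring below `min_score` are dropped. The threshold is an
    illustrative assumption, not a value recommended by the authors.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            # RoBERTa-style tokenizers may leave a leading space on the word.
            grouped[entity["entity_group"]].append(entity["word"].strip())
    return dict(grouped)
```

A call would then look like `group_by_label(load_tagger(model_id)("How do I sort a list in Python?"), min_score=0.5)`, which returns a dictionary mapping entity types such as 'Language' to the matched terms. Keeping the grouping separate from model loading makes it easy to reuse on predictions that were computed elsewhere.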
## Citation
```
@misc{pena2025benchmark,
author = {Fabian Peña and Steffen Herbold},
title = {Evaluating Large Language Models on Non-Code Software Engineering Tasks},
year = {2025}
}
```