|
--- |
|
library_name: transformers |
|
license: mit |
|
language: |
|
- en |
|
metrics: |
|
- seqeval |
|
base_model: |
|
- microsoft/codebert-base |
|
pipeline_tag: token-classification |
|
--- |
|
|
|
# CodeBERT base for detecting software engineering terminology |
|
|
|
This model detects software engineering terminology in developer forum posts (e.g., Stack Overflow), tagging terms with one of 20 entity types: 'Data_Structure', 'Application', 'Code_Block', 'Function', 'Data_Type', 'Language', 'Library', 'Variable', 'Device', 'User_Name', 'User_Interface_Element', 'Class', 'Website', 'Version', 'File_Name', 'File_Type', 'Operating_System', 'Output_Block', 'Algorithm', or 'HTML_XML_Tag'.
|
|
|
- **Developed by:** Fabian C. Peña, Steffen Herbold |
|
- **Finetuned from:** [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) |
|
- **Replication kit:** [https://github.com/aieng-lab/senlp-benchmark](https://github.com/aieng-lab/senlp-benchmark) |
|
- **Language:** English |
|
- **License:** MIT |
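
## Usage

A minimal sketch of running the model through the standard `transformers` token-classification pipeline. The `model_id` argument is a placeholder (substitute this model's repository id); the entity-type list is taken from the description above.

```python
from typing import List

# The 20 entity types this model predicts (from the description above).
SE_ENTITY_TYPES: List[str] = [
    "Data_Structure", "Application", "Code_Block", "Function", "Data_Type",
    "Language", "Library", "Variable", "Device", "User_Name",
    "User_Interface_Element", "Class", "Website", "Version", "File_Name",
    "File_Type", "Operating_System", "Output_Block", "Algorithm",
    "HTML_XML_Tag",
]


def detect_se_terms(text: str, model_id: str):
    """Tag software engineering terminology in `text`.

    `model_id` is a placeholder: pass this model's Hugging Face repo id.
    """
    # Imported lazily so the label list is usable without transformers installed.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word tokens into whole spans
    )
    return ner(text)


# Example call (downloads the model weights on first use):
# detect_se_terms("How do I sort a HashMap in Java?", "<this-model-id>")
```

Each returned span carries an `entity_group` field with one of the entity types listed above, plus a confidence score and character offsets.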
|
|
|
## Citation |
|
|
|
```bibtex
|
@misc{pena2025benchmark, |
|
author = {Fabian Peña and Steffen Herbold}, |
|
title = {Evaluating Large Language Models on Non-Code Software Engineering Tasks}, |
|
year = {2025} |
|
} |
|
``` |
|
|