---
library_name: transformers
license: mit
language:
- en
metrics:
- seqeval
base_model:
- microsoft/codebert-base
pipeline_tag: token-classification
---

# CodeBERT base for detecting software engineering terminology

This model detects software engineering terminology in developer forum posts (e.g., Stack Overflow) and labels each detected term with one of 20 entity types: 'Data_Structure', 'Application', 'Code_Block', 'Function', 'Data_Type', 'Language', 'Library', 'Variable', 'Device', 'User_Name', 'User_Interface_Element', 'Class', 'Website', 'Version', 'File_Name', 'File_Type', 'Operating_System', 'Output_Block', 'Algorithm', and 'HTML_XML_Tag'.
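The card lists the entity types but not the tag scheme. Because the model is evaluated with seqeval, which scores whole entity spans, its raw labels are presumably IOB2-prefixed (for example `B-Library`, `I-Library`, `O`). This is an assumption; the actual label set is in the repository's `config.json` (`id2label`). Under that assumption, the minimal sketch below shows how word-level labels combine into typed spans. `bio_to_spans` is a hypothetical helper, not part of this repository or of the transformers API, and the example tokens and labels are illustrative, not model output.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group parallel word/label lists into (entity_type, text) spans.

    Hypothetical helper. Assumes IOB2 labels such as 'B-Library',
    'I-Library' and 'O'; the model card does not confirm this tag scheme.
    Words in a span are joined with single spaces.
    """
    spans: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    for token, label in zip(tokens, labels):
        if label == "O":
            # Outside any entity: close the open span, if there is one.
            current_type = None
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or entity_type != current_type:
            # 'B-' always opens a span; a stray 'I-' of a new type opens one too.
            spans.append((entity_type, [token]))
            current_type = entity_type
        else:
            # 'I-' continuing the current entity type extends the open span.
            spans[-1][1].append(token)
    return [(entity_type, " ".join(words)) for entity_type, words in spans]


if __name__ == "__main__":
    # Illustrative input, not real model predictions.
    tokens = ["Install", "numpy", "1.26", "on", "Ubuntu", "22.04"]
    labels = ["O", "B-Library", "B-Version", "O",
              "B-Operating_System", "I-Operating_System"]
    print(bio_to_spans(tokens, labels))
    # [('Library', 'numpy'), ('Version', '1.26'), ('Operating_System', 'Ubuntu 22.04')]
```

In practice, transformers' `pipeline("token-classification", model="aieng-lab/codebert-base_se-entities", aggregation_strategy="simple")` performs a similar grouping over sub-word tokens and returns dicts with `entity_group`, `word`, `score`, `start`, and `end` keys. The repository id here is the one shown on the model page.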

- Developed by: Fabian C. Peña, Steffen Herbold
- Fine-tuned from: [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
- Replication kit: [https://github.com/aieng-lab/senlp-benchmark](https://github.com/aieng-lab/senlp-benchmark)
- Language: English
- License: MIT

## Citation

```bibtex
@misc{pena2025benchmark,
  author = {Fabian Peña and Steffen Herbold},
  title  = {Evaluating Large Language Models on Non-Code Software Engineering Tasks},
  year   = {2025}
}
```