---
language: code
tags:
- code
- translation
- codet5
- vbnet
- csharp
- programming
- source-code
datasets:
- custom
license: mit
library_name: transformers
pipeline_tag: translation
model_type: codet5
---

# CodeT5 VB.NET → C# Translator

This is a fine-tuned version of [Salesforce/CodeT5-base](https://huggingface.co/Salesforce/codet5-base) for translating VB.NET to C#.

---

# Evaluation Metrics

**BLEU Score:** 0.4506

- 1-gram: 0.6698
- 2-gram: 0.5402
- 3-gram: 0.4656
- 4-gram: 0.4132
- Brevity penalty: 0.8773
- Length ratio: 0.8843

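As a consistency check, the BLEU score can be recomputed from the components listed above. The sketch below assumes the standard BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions, with uniform weights). It is not the evaluation script used for this model, and the function name `bleu_from_components` is illustrative.

```python
import math

def bleu_from_components(precisions, length_ratio):
    """Combine n-gram precisions and a length ratio into a BLEU score.

    Assumes the standard definition: uniform weights over the n-gram
    precisions, and a brevity penalty of exp(1 - 1/ratio) when the
    candidate is shorter than the reference (ratio < 1).
    """
    if length_ratio >= 1.0:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - 1.0 / length_ratio)
    # Geometric mean of the precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean), brevity_penalty

# Values reported in this model card.
precisions = [0.6698, 0.5402, 0.4656, 0.4132]
bleu, bp = bleu_from_components(precisions, length_ratio=0.8843)
print(f"brevity penalty: {bp:.4f}")
print(f"BLEU: {bleu:.4f}")
```

Both results should agree with the reported 0.4506 and 0.8773 to within the rounding of the inputs. A length ratio below 1 means the generated C# is, on average, shorter than the reference, which is why the brevity penalty lowers the score.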

**ROUGE Scores:**

- ROUGE-1: 0.5836
- ROUGE-2: 0.4586
- ROUGE-L: 0.5378
- ROUGE-Lsum: 0.5781

---

# Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace "{repo_id}" with this model's Hugging Face repository ID.
model = AutoModelForSeq2SeqLM.from_pretrained("{repo_id}")
tokenizer = AutoTokenizer.from_pretrained("{repo_id}")

vb_code = "Dim x As Integer = 5"

# Prefix the VB.NET source with the translation instruction, then tokenize.
inputs = tokenizer(f"translate VB.NET to C#: {vb_code}", return_tensors="pt")

# Generate the C# translation and decode it back to text.
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

# Dataset Format

Training data was in JSONL with fields:

- `"vb_code"`: VB.NET input
- `"csharp_code"`: corresponding C# output

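To make the format concrete, here is a minimal sketch of reading such a file: one JSON object per line, each carrying the two fields above. The sample records and the helper name `parse_jsonl` are hypothetical and are not taken from the actual training set.

```python
import json

def parse_jsonl(text):
    """Parse JSONL text into a list of (vb_code, csharp_code) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        pairs.append((record["vb_code"], record["csharp_code"]))
    return pairs

# Hypothetical sample records, written one JSON object per line.
sample = "\n".join([
    json.dumps({"vb_code": "Dim x As Integer = 5", "csharp_code": "int x = 5;"}),
    json.dumps({"vb_code": "Dim flag As Boolean = True", "csharp_code": "bool flag = true;"}),
])

pairs = parse_jsonl(sample)
for vb, cs in pairs:
    print(f"{vb}  ->  {cs}")
```

For a file on disk, the same function could be applied to the result of `open(path, encoding="utf-8").read()`, where `path` is wherever the JSONL file lives.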

# License

MIT