---
license: cc-by-4.0
language:
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- code
metrics:
- accuracy
- f1
---
# CodeBERT-SO
Repository for CodeBERT fine-tuned on Stack Overflow snippets. The underlying CodeBERT-base model was pretrained on NL-PL pairs from six programming languages (Python, Java, JavaScript, PHP, Ruby, Go).
## Training Objective
This model is initialized with CodeBERT-base and trained to classify whether a user will drop out given their posts and code snippets.
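The fine-tuned classifier can be queried like any `transformers` sequence-classification model. The snippet below is a minimal sketch: the model id is a placeholder for this repository's path, and the example post is invented.

```python
# Minimal inference sketch using the standard sequence-classification API.
# "username/CodeBERT-SO" is a placeholder; substitute the actual repository path.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "username/CodeBERT-SO"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "How do I reverse a list in Python? my_list[::-1] works but seems slow."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# The higher-scoring class is the predicted drop-out label.
pred = logits.argmax(dim=-1).item()
print(pred)
```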
## Training Regime
Training ran for 8 epochs with a batch size of 8, a learning rate of 1e-5, and an Adam epsilon (the denominator term in the weight update) of 1e-8. A random 20% sample of the full dataset was used as the validation set.
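For reference, the hyperparameters above map onto the `transformers` Trainer roughly as follows. This is a sketch, not the original training script: the toy dataset, column names, and use of the Trainer API are assumptions; only the epoch count, batch size, learning rate, and epsilon come from this card.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2
)

# Hypothetical toy rows standing in for the Stack Overflow posts and snippets;
# the actual training data is not distributed with this card.
raw = Dataset.from_dict({
    "text": [
        "post with a code snippet ...",
        "another post ...",
        "a third post ...",
        "a fourth post ...",
        "a fifth post ...",
    ],
    "label": [0, 1, 0, 1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Hold out a random 20% of the data for validation, as described above.
splits = raw.map(tokenize, batched=True).train_test_split(test_size=0.2, seed=42)

args = TrainingArguments(
    output_dir="codebert-so",
    num_train_epochs=8,             # 8 epochs
    per_device_train_batch_size=8,  # batch size 8
    learning_rate=1e-5,             # learning rate 1e-5
    adam_epsilon=1e-8,              # Adam epsilon 1e-8
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,
)
trainer.train()
```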
## Performance
- Final validation accuracy: 0.822
- Final validation F1: 0.809
- Final validation loss: 0.5
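These metrics could be computed during evaluation with a `compute_metrics` hook along the following lines. This is a sketch; the binary F1 averaging mode is an assumption, since the card does not state it.

```python
# Sketch of a compute_metrics hook producing the accuracy and F1 reported above.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary averaging assumed
    }
```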