modelling101 committed on
Commit 4bc8d62 · verified · 1 Parent(s): 66c229a

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -17,7 +17,8 @@ This model is initialized with [CodeBERT-base](https://huggingface.co/microsoft/
  ## Training Regime
  Preprocessing methods for input texts include unicode normalisation (NFC form), removal of extraneous whitespaces, removal of punctuations (except within links), lowercasing and removal of stopwords.
  Code snippets were also removed of their in-line comments or docstrings (cf. the main manuscript).
- Training was done across 8 epochs with a batch size of 8, learning rate of 1e-5, epsilon (weight update denominator) of 1e-8.
+
+ RoBERTa tokenizer was used, as the built-in tokenizer for the original [CodeBERT implementation](https://arxiv.org/abs/2002.08155). Training was done across 8 epochs with a batch size of 8, learning rate of 1e-5, epsilon (weight update denominator) of 1e-8.
  A random 20% sample of the entire dataset was used as the validation set.
  ## Performance
  * Final validation accuracy: 0.822
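
For readers of this change, the text preprocessing and the 20% validation split described under Training Regime could be sketched roughly as below. This is a minimal illustration using only the standard library, not the authors' actual pipeline. The stopword list, the link pattern, the choice to lowercase links, the seed, and the function names `preprocess` and `train_val_split` are all assumptions, since the README does not specify them.

```python
import random
import re
import unicodedata

# Hypothetical stopword list; the README does not say which list was used.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "for"}

# Assumed link pattern: "http://" or "https://" followed by non-whitespace.
# The capturing group makes re.split() return the links as separate pieces.
URL_RE = re.compile(r"(https?://\S+)")

# Any character that is neither a word character nor whitespace.
# Note that "_" counts as a word character here and is therefore kept.
PUNCT_RE = re.compile(r"[^\w\s]")


def preprocess(text: str) -> str:
    """NFC-normalise, strip punctuation outside links, lowercase, drop stopwords."""
    text = unicodedata.normalize("NFC", text)
    pieces = []
    # With one capturing group, odd-indexed pieces are the matched links.
    for i, part in enumerate(URL_RE.split(text)):
        pieces.append(part if i % 2 else PUNCT_RE.sub("", part))
    # split() with no arguments also collapses extraneous whitespace.
    tokens = " ".join(pieces).lower().split()
    return " ".join(t for t in tokens if t not in STOPWORDS)


def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items and hold out val_fraction of them for validation."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    print(preprocess("The  Model, is FAST!  See https://Example.com/a?b=1 for details."))
    train, val = train_val_split(range(10))
    print(len(train), len(val))
```

In a setup like the one the README describes, the cleaned strings would then be passed to the RoBERTa tokenizer before training; that step is left out here because it depends on the model files.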