modelling101 committed on
Commit 66c229a · verified · 1 Parent(s): 015fdae

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -15,6 +15,8 @@ Repository for CodeBERT, fine-tuned on Stack Overflow snippets with respect to N
  ## Training Objective
  This model is initialized with [CodeBERT-base](https://huggingface.co/microsoft/codebert-base) and trained to classify whether a user will drop out given their posts and code snippets.
  ## Training Regime
+ Input texts were preprocessed with Unicode normalisation (NFC form), removal of extraneous whitespace, removal of punctuation (except within links), lowercasing, and removal of stopwords.
+ In-line comments and docstrings were also removed from code snippets (cf. the main manuscript).
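
The text preprocessing steps listed above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the authors' actual pipeline: the stopword list, the link detection rule (tokens starting with `http://`, `https://` or `www.`), and the choice to lowercase links are all assumptions, and the function name `preprocess_text` is hypothetical. Comment and docstring removal for code snippets is not covered here.

```python
import string
import unicodedata

# Hypothetical stopword list; the README does not say which list was used.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}

# Assumed rule for recognising links; the README does not define one.
LINK_PREFIXES = ("http://", "https://", "www.")

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_text(text: str) -> str:
    """Apply the preprocessing steps named in the README to one input text."""
    # Unicode normalisation to NFC form.
    text = unicodedata.normalize("NFC", text)
    tokens = []
    # split() with no arguments collapses runs of extraneous whitespace.
    for token in text.split():
        # Lowercasing (applied to links as well, which is an assumption).
        token = token.lower()
        # Punctuation removal, except within links.
        if not token.startswith(LINK_PREFIXES):
            token = token.translate(_PUNCT_TABLE)
        # Drop stopwords and tokens emptied by punctuation removal.
        if token and token not in STOPWORDS:
            tokens.append(token)
    return " ".join(tokens)
```

For example, `preprocess_text("The  model, is here: https://huggingface.co/microsoft/codebert-base !")` keeps the link intact while stripping punctuation and stopwords from the surrounding words.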
  Training was done across 8 epochs with a batch size of 8, learning rate of 1e-5, epsilon (weight update denominator) of 1e-8.
  A random 20% sample of the entire dataset was used as the validation set.
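
As an illustration of the regime described above, the hyperparameters and the random 20% validation split might be captured as follows. This is a sketch under stated assumptions rather than the authors' training script: the names `TrainingConfig` and `train_validation_split` are hypothetical, the seed is invented for reproducibility (the README gives none), and the optimizer is not named in the README, so the epsilon value is only recorded, not wired into any optimizer.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as stated in the README, plus an assumed seed."""
    epochs: int = 8
    batch_size: int = 8
    learning_rate: float = 1e-5
    epsilon: float = 1e-8             # weight update denominator
    validation_fraction: float = 0.2  # random 20% of the entire dataset
    seed: int = 0                     # hypothetical; not given in the README


def train_validation_split(examples, config=TrainingConfig()):
    """Randomly hold out a fraction of the examples as the validation set."""
    indices = list(range(len(examples)))
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(config.seed).shuffle(indices)
    n_val = round(len(examples) * config.validation_fraction)
    val_idx = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val
```

With 100 examples this yields 80 training and 20 validation items, with no overlap between the two sets.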
  ## Performance