Update README.md
README.md
@@ -15,6 +15,8 @@ Repository for CodeBERT, fine-tuned on Stack Overflow snippets with respect to N
## Training Objective
This model is initialized with [CodeBERT-base](https://huggingface.co/microsoft/codebert-base) and trained to classify whether a user will drop out given their posts and code snippets.
## Training Regime
Preprocessing of input texts includes Unicode normalisation (NFC form), removal of extraneous whitespace, removal of punctuation (except within links), lowercasing, and stopword removal.
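
The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used for this model. The stopword list, the regular expression used to detect links, and the helper names `strip_punctuation` and `preprocess` are hypothetical. The steps run in the order the README lists them.

```python
import re
import string
import unicodedata

# Hypothetical stopword list; the README does not say which list was used.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "it", "of", "the", "to"}

# Assumed link pattern: http(s):// followed by any non-whitespace characters.
URL_RE = re.compile(r"https?://\S+")

# Translation table that deletes ASCII punctuation characters.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation everywhere except inside matched links."""
    pieces = []
    last = 0
    for match in URL_RE.finditer(text):
        pieces.append(text[last:match.start()].translate(PUNCT_TABLE))
        pieces.append(match.group(0))  # keep the link verbatim
        last = match.end()
    pieces.append(text[last:].translate(PUNCT_TABLE))
    return "".join(pieces)


def preprocess(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # 1. Unicode normalisation (NFC)
    text = " ".join(text.split())              # 2. collapse extraneous whitespace
    text = strip_punctuation(text)             # 3. drop punctuation outside links
    text = text.lower()                        # 4. lowercase (links included)
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 5. drop stopwords
    return " ".join(tokens)


print(preprocess("I can't  run this: see https://stackoverflow.com/q/1 !"))
# → cant run this see https://stackoverflow.com/q/1
```

Only ASCII punctuation (`string.punctuation`) is removed in this sketch. A production pipeline might also strip Unicode punctuation.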
In-line comments and docstrings were also removed from code snippets (cf. the main manuscript).
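
Below is a minimal sketch of comment and docstring removal. It assumes the snippet is valid Python and requires Python 3.9+ for `ast.unparse`. The README does not say how this step was implemented or which snippet languages were handled, and `strip_comments_and_docstrings` is a hypothetical helper. Parsing into an AST discards `#` comments. A leading string literal in a module, class or function is then dropped as its docstring.

```python
import ast


def strip_comments_and_docstrings(source: str) -> str:
    """Return Python source without '#' comments or docstrings (Python 3.9+)."""
    tree = ast.parse(source)  # comments are not part of the AST, so they vanish here
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            first = body[0] if body else None
            # A docstring is a bare string-literal expression as the first statement.
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                # Keep the block syntactically valid if the docstring was its only statement.
                node.body = body[1:] or [ast.Pass()]
    return ast.unparse(tree)


snippet = '''def add(a, b):
    """Add two numbers."""
    return a + b  # sum
'''
print(strip_comments_and_docstrings(snippet))
# def add(a, b):
#     return a + b
```

Because the code is regenerated from the AST, formatting such as quote style and spacing is normalised. Snippet fragments that do not parse raise `SyntaxError`, so a token-based approach would be needed for those.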
Training ran for 8 epochs with a batch size of 8, a learning rate of 1e-5, and an epsilon (the constant added to the denominator of the weight update) of 1e-8.
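
The README gives an epsilon but does not name the optimizer. The sketch below assumes Adam/AdamW, the usual choice when fine-tuning BERT-style models, and shows where each hyperparameter enters. Epsilon is added to the denominator of every weight update so the step stays finite when the second-moment estimate is near zero. `TrainingConfig`, `total_steps` and `adam_step` are hypothetical names, and the beta values are assumed Adam defaults.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingConfig:
    epochs: int = 8
    batch_size: int = 8
    learning_rate: float = 1e-5
    epsilon: float = 1e-8
    beta1: float = 0.9    # assumed Adam default, not stated in the README
    beta2: float = 0.999  # assumed Adam default, not stated in the README


def total_steps(num_train_examples: int, cfg: TrainingConfig) -> int:
    """Optimizer steps over the whole run: epochs * ceil(N / batch_size)."""
    return cfg.epochs * math.ceil(num_train_examples / cfg.batch_size)


def adam_step(
    param: float, grad: float, m: float, v: float, t: int, cfg: TrainingConfig
) -> Tuple[float, float, float]:
    """One Adam update of a single scalar weight; t is the 1-based step count."""
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad          # first-moment estimate
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - cfg.beta1 ** t)                    # bias corrections
    v_hat = v / (1 - cfg.beta2 ** t)
    # epsilon sits in the denominator so the step stays finite when v_hat ~ 0
    param -= cfg.learning_rate * m_hat / (math.sqrt(v_hat) + cfg.epsilon)
    return param, m, v


cfg = TrainingConfig()
print(total_steps(1000, cfg))  # hypothetical 1000 training examples → 1000 steps
```

On the first step the bias-corrected update is roughly `learning_rate` times the sign of the gradient, so 1e-5 bounds the initial change to each weight.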
A random 20% sample of the entire dataset was used as the validation set.
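
A random 20% hold-out split can be sketched with the standard library. The fixed seed and the helper name `train_val_split` are assumptions, since the README does not state how the sample was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    items: Sequence[T], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly hold out `val_fraction` of `items` as a validation set."""
    rng = random.Random(seed)  # fixed seed only for reproducibility (assumed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_val = round(len(items) * val_fraction)
    val_idx = set(indices[:n_val])
    # Preserve the original order within each split.
    train = [x for i, x in enumerate(items) if i not in val_idx]
    val = [x for i, x in enumerate(items) if i in val_idx]
    return train, val


train, val = train_val_split(list(range(100)))
print(len(train), len(val))  # 80 20
```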
## Performance