Commit 0b28f00 · 1 Parent(s): 9853dfd
update with new training params and perf
README.md CHANGED
@@ -51,15 +51,14 @@ The training was run on a NVIDIA DGX Station with 4XTesla V100 GPUs.
 
 Training code is available at https://github.com/source-data/soda-roberta
 
-- Command: `python -m tokcl.train NER --num_train_epochs=3.5`
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-nlp NER
-- Training with
-- Evaluating on
+- Training with 48771 examples.
+- Evaluating on 13801 examples.
 - Training on 15 features: O, I-SMALL_MOLECULE, B-SMALL_MOLECULE, I-GENEPROD, B-GENEPROD, I-SUBCELLULAR, B-SUBCELLULAR, I-CELL, B-CELL, I-TISSUE, B-TISSUE, I-ORGANISM, B-ORGANISM, I-EXP_ASSAY, B-EXP_ASSAY
-- Epochs:
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
+- Epochs: 0.6
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 - `learning_rate`: 0.0001
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
@@ -69,20 +68,22 @@ Training code is available at https://github.com/source-data/soda-roberta
 
 ## Eval results
 
-Testing on
+Testing on 7178 examples of test set with `sklearn.metrics`:
 
 ```
                 precision    recall  f1-score   support
 
-          CELL 0.
-     EXP_ASSAY 0.
-      GENEPROD 0.
-      ORGANISM 0.
-SMALL_MOLECULE 0.
-   SUBCELLULAR 0.
-        TISSUE 0.
-
-     micro avg 0.
-     macro avg 0.
-  weighted avg 0.
+          CELL       0.69      0.81      0.74      5245
+     EXP_ASSAY       0.56      0.57      0.56     10067
+      GENEPROD       0.77      0.89      0.82     23587
+      ORGANISM       0.72      0.82      0.77      3623
+SMALL_MOLECULE       0.70      0.80      0.75      6187
+   SUBCELLULAR       0.65      0.72      0.69      3700
+        TISSUE       0.62      0.73      0.67      3207
+
+     micro avg       0.70      0.79      0.74     55616
+     macro avg       0.67      0.77      0.72     55616
+  weighted avg       0.70      0.79      0.74     55616
+
+{'test_loss': 0.1830928772687912, 'test_accuracy_score': 0.9334821000160841, 'test_precision': 0.6987463009514112, 'test_recall': 0.789682825086306, 'test_f1': 0.7414366506288511, 'test_runtime': 61.0547, 'test_samples_per_second': 117.567, 'test_steps_per_second': 1.851}
 ```
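The averages in the added report can be sanity-checked from the per-class rows. The following is a minimal sketch, not the repository's evaluation code: it copies the numbers from the report above, and the names `report`, `f1_score`, `macro_f1`, `weighted_f1`, and `micro_f1` are made up for illustration. Because the table is rounded to two decimals, recomputed values should agree with the printed ones only to within about 0.01.

```python
# Rows copied from the added classification report:
# label -> (precision, recall, f1, support)
report = {
    "CELL": (0.69, 0.81, 0.74, 5245),
    "EXP_ASSAY": (0.56, 0.57, 0.56, 10067),
    "GENEPROD": (0.77, 0.89, 0.82, 23587),
    "ORGANISM": (0.72, 0.82, 0.77, 3623),
    "SMALL_MOLECULE": (0.70, 0.80, 0.75, 6187),
    "SUBCELLULAR": (0.65, 0.72, 0.69, 3700),
    "TISSUE": (0.62, 0.73, 0.67, 3207),
}


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Total number of entities across classes (the "support" of the avg rows).
total_support = sum(support for *_, support in report.values())

# Macro average: unweighted mean of the per-class F1 values.
macro_f1 = sum(f1 for _, _, f1, _ in report.values()) / len(report)

# Weighted average: per-class F1 weighted by class support.
weighted_f1 = sum(f1 * s for _, _, f1, s in report.values()) / total_support

# Micro F1 recomputed from the unrounded test_precision / test_recall
# values in the metrics dict at the end of the report.
micro_f1 = f1_score(0.6987463009514112, 0.789682825086306)

for label, (p, r, f1, _) in report.items():
    print(f"{label:>14}  reported={f1:.2f}  recomputed={f1_score(p, r):.4f}")
print(f"support={total_support}  macro={macro_f1:.4f}  "
      f"weighted={weighted_f1:.4f}  micro={micro_f1:.7f}")
```

The recomputed micro F1 can be compared directly with `test_f1` in the metrics dict, since both inputs are unrounded there. The per-class and averaged values can be compared only against the two-decimal table.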