mmokoatle committed on
Commit 4f36d2b · verified · 1 Parent(s): c1d7e42

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -45,47 +45,47 @@ The retrieved embeddings can be utilized as input for a machine learning classif
 
  Find out more about the datasets and access in the paper **(TBA)**
 
- **Table:** Accuracy scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method. Best results per column are in **bold**, while scores for the Proposed model are _underlined_.
+ **Table:** Accuracy scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method.
 
  | Model | Embed. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
  |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | LR | Proposed | _0.65 ± 0.01_ | _0.67 ± 0.0_ | _0.85 ± 0.01_ | _0.64 ± 0.01_ | _0.80 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.70 ± 0.01_ |
  | | DNABERT | 0.62 ± 0.01 | 0.65 ± 0.0 | 0.84 ± 0.04 | 0.69 ± 0.01 | 0.85 ± 0.01 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.60 ± 0.01 |
  | | NT | **0.66 ± 0.0** | **0.67 ± 0.0** | 0.84 ± 0.01 | **0.73 ± 0.0** | **0.85 ± 0.01**| **0.81 ± 0.0** | **0.62 ± 0.01**| **0.99 ± 0.0** |
-
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | LGBM | Proposed | _0.64 ± 0.01_ | _0.66 ± 0.0_ | _0.90 ± 0.02_ | _0.61 ± 0.01_ | _0.78 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.81 ± 0.01_ |
  | | DNABERT | 0.62 ± 0.01 | 0.65 ± 0.01 | 0.90 ± 0.02 | 0.65 ± 0.01 | 0.83 ± 0.0 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.75 ± 0.01 |
  | | NT | 0.63 ± 0.01 | 0.66 ± 0.0 | **0.91 ± 0.02**| 0.72 ± 0.0 | **0.85 ± 0.0** | **0.80 ± 0.0** | **0.59 ± 0.01**| 0.97 ± 0.0 |
-
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | XGB | Proposed | _0.60 ± 0.01_ | _0.62 ± 0.0_ | _0.90 ± 0.02_ | _0.60 ± 0.0_ | _0.77 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.85 ± 0.01_ |
  | | DNABERT | 0.59 ± 0.01 | 0.62 ± 0.01 | 0.90 ± 0.01 | 0.64 ± 0.01 | 0.82 ± 0.01 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.79 ± 0.01 |
  | | NT | 0.61 ± 0.01 | 0.64 ± 0.0 | 0.90 ± 0.02 | **0.89 ± 0.03**| **0.85 ± 0.01**| **0.81 ± 0.01**| **0.60 ± 0.01**| 0.98 ± 0.0 |
-
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | RF | Proposed | _0.61 ± 0.0_ | _0.66 ± 0.01_ | _0.90 ± 0.02_ | _0.61 ± 0.01_ | _0.77 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.86 ± 0.0_ |
  | | DNABERT | 0.60 ± 0.0 | 0.66 ± 0.01 | 0.90 ± 0.02 | 0.63 ± 0.01 | 0.82 ± 0.0 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.81 ± 0.01 |
  | | NT | 0.62 ± 0.01 | **0.67 ± 0.01**| 0.90 ± 0.01 | 0.71 ± 0.01 | **0.85 ± 0.0** | **0.79 ± 0.0** | **0.55 ± 0.01**| 0.97 ± 0.0 |
 
- **Table:** F1-scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method. Best results per column are in **bold**, while scores for the Proposed model are _underlined_.
+
+ **Table:** F1-scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method.
 
  | Model | Embed. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
  |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | LR | Proposed | **_0.78 ± 0.0_** | **_0.80 ± 0.01_** | _0.20 ± 0.05_ | _0.64 ± 0.01_ | _0.79 ± 0.0_ | _0.13 ± 0.37_ | _0.16 ± 0.0_ | _0.70 ± 0.01_ |
  | | DNABERT | 0.75 ± 0.01 | 0.78 ± 0.0 | 0.47 ± 0.09 | 0.69 ± 0.01 | 0.84 ± 0.01 | 0.13 ± 0.37 | 0.16 ± 0.0 | 0.59 ± 0.01 |
  | | NT | 0.56 ± 0.01 | 0.54 ± 0.0 | **0.78 ± 0.01**| **0.73 ± 0.0** | **0.85 ± 0.01**| **0.81 ± 0.0** | **0.62 ± 0.01**| **0.99 ± 0.0** |
-
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | LGBM | Proposed | _0.76 ± 0.01_ | _0.79 ± 0.0_ | _0.60 ± 0.11_ | _0.63 ± 0.01_ | _0.77 ± 0.0_ | _0.47 ± 0.20_ | _0.26 ± 0.04_ | _0.82 ± 0.0_ |
  | | DNABERT | 0.74 ± 0.0 | 0.78 ± 0.0 | 0.60 ± 0.08 | 0.66 ± 0.01 | 0.82 ± 0.01 | 0.47 ± 0.20 | 0.26 ± 0.04 | 0.75 ± 0.01 |
  | | NT | 0.59 ± 0.01 | 0.56 ± 0.0 | **0.89 ± 0.02**| **0.72 ± 0.01**| **0.85 ± 0.0** | **0.80 ± 0.0** | **0.59 ± 0.01**| **0.97 ± 0.0** |
-
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | XGB | Proposed | _0.72 ± 0.01_ | _0.75 ± 0.0_ | _0.59 ± 0.08_ | _0.60 ± 0.0_ | _0.76 ± 0.0_ | _0.47 ± 0.20_ | _0.26 ± 0.04_ | _0.85 ± 0.01_ |
  | | DNABERT | 0.71 ± 0.01 | 0.75 ± 0.01 | 0.58 ± 0.05 | 0.64 ± 0.01 | 0.82 ± 0.01 | 0.47 ± 0.20 | 0.26 ± 0.04 | 0.79 ± 0.01 |
  | | NT | 0.59 ± 0.01 | 0.57 ± 0.01 | 0.72 ± 0.01 | **0.85 ± 0.01**| **0.85 ± 0.01**| **0.81 ± 0.01**| **0.60 ± 0.01**| **0.9893 ± 0.0** |
-
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
  | RF | Proposed | _0.73 ± 0.0_ | _0.79 ± 0.0_ | _0.58 ± 0.08_ | _0.61 ± 0.01_ | _0.75 ± 0.0_ | _0.53 ± 0.17_ | _0.24 ± 0.05_ | _0.86 ± 0.0_ |
  | | DNABERT | 0.72 ± 0.0 | 0.79 ± 0.0 | 0.59 ± 0.09 | 0.63 ± 0.01 | 0.80 ± 0.01 | 0.53 ± 0.17 | 0.24 ± 0.05 | 0.82 ± 0.01 |
  | | NT | 0.59 ± 0.01 | 0.56 ± 0.01 | **0.89 ± 0.02**| **0.71 ± 0.01**| **0.84 ± 0.0** | **0.79 ± 0.0** | **0.55 ± 0.01**| **0.97 ± 0.0** |
 
-
  ## Authors
  -----------
 