Find out more about the datasets and how to access them in the paper **(TBA)**.

**Table:** Accuracy scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method.

| Model | Embed. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| LR | Proposed | _0.65 ± 0.01_ | _0.67 ± 0.0_ | _0.85 ± 0.01_ | _0.64 ± 0.01_ | _0.80 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.70 ± 0.01_ |
| | DNABERT | 0.62 ± 0.01 | 0.65 ± 0.0 | 0.84 ± 0.04 | 0.69 ± 0.01 | 0.85 ± 0.01 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.60 ± 0.01 |
| | NT | **0.66 ± 0.0** | **0.67 ± 0.0** | 0.84 ± 0.01 | **0.73 ± 0.0** | **0.85 ± 0.01**| **0.81 ± 0.0** | **0.62 ± 0.01**| **0.99 ± 0.0** |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| LGBM | Proposed | _0.64 ± 0.01_ | _0.66 ± 0.0_ | _0.90 ± 0.02_ | _0.61 ± 0.01_ | _0.78 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.81 ± 0.01_ |
| | DNABERT | 0.62 ± 0.01 | 0.65 ± 0.01 | 0.90 ± 0.02 | 0.65 ± 0.01 | 0.83 ± 0.0 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.75 ± 0.01 |
| | NT | 0.63 ± 0.01 | 0.66 ± 0.0 | **0.91 ± 0.02**| 0.72 ± 0.0 | **0.85 ± 0.0** | **0.80 ± 0.0** | **0.59 ± 0.01**| 0.97 ± 0.0 |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| XGB | Proposed | _0.60 ± 0.01_ | _0.62 ± 0.0_ | _0.90 ± 0.02_ | _0.60 ± 0.0_ | _0.77 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.85 ± 0.01_ |
| | DNABERT | 0.59 ± 0.01 | 0.62 ± 0.01 | 0.90 ± 0.01 | 0.64 ± 0.01 | 0.82 ± 0.01 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.79 ± 0.01 |
| | NT | 0.61 ± 0.01 | 0.64 ± 0.0 | 0.90 ± 0.02 | **0.89 ± 0.03**| **0.85 ± 0.01**| **0.81 ± 0.01**| **0.60 ± 0.01**| 0.98 ± 0.0 |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| RF | Proposed | _0.61 ± 0.0_ | _0.66 ± 0.01_ | _0.90 ± 0.02_ | _0.61 ± 0.01_ | _0.77 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.86 ± 0.0_ |
| | DNABERT | 0.60 ± 0.0 | 0.66 ± 0.01 | 0.90 ± 0.02 | 0.63 ± 0.01 | 0.82 ± 0.0 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.81 ± 0.01 |
| | NT | 0.62 ± 0.01 | **0.67 ± 0.01**| 0.90 ± 0.01 | 0.71 ± 0.01 | **0.85 ± 0.0** | **0.79 ± 0.0** | **0.55 ± 0.01**| 0.97 ± 0.0 |
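
The README does not say how the ± intervals were obtained. The following is a minimal sketch that assumes each cell summarizes several independent runs (for example, different random seeds or cross-validation folds) and uses a normal-approximation 95% interval with z = 1.96. The function names are hypothetical and are not part of this repository.

```python
import math
import statistics
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_ci95(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and half-width of a normal-approximation 95% confidence interval.

    Assumption: `scores` come from independent repeated runs (e.g. seeds or
    CV folds) and z = 1.96 is used; the README does not state its method.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.fmean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


def format_cell(scores: Sequence[float]) -> str:
    """Render one table cell as 'mean ± half-width', rounded to two decimals."""
    mean, half_width = mean_ci95(scores)
    return f"{mean:.2f} ± {half_width:.2f}"


# Hypothetical accuracies from five runs of one model/embedding/dataset cell.
print(format_cell([0.64, 0.66, 0.65, 0.65, 0.66]))  # 0.65 ± 0.01
```

With only a handful of runs, a Student's t critical value would give somewhat wider intervals than z = 1.96. The normal approximation is used here only to keep the sketch dependency-free.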

**Table:** F1-scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method.

| Model | Embed. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| LR | Proposed | **_0.78 ± 0.0_** | **_0.80 ± 0.01_** | _0.20 ± 0.05_ | _0.64 ± 0.01_ | _0.79 ± 0.0_ | _0.13 ± 0.37_ | _0.16 ± 0.0_ | _0.70 ± 0.01_ |
| | DNABERT | 0.75 ± 0.01 | 0.78 ± 0.0 | 0.47 ± 0.09 | 0.69 ± 0.01 | 0.84 ± 0.01 | 0.13 ± 0.37 | 0.16 ± 0.0 | 0.59 ± 0.01 |
| | NT | 0.56 ± 0.01 | 0.54 ± 0.0 | **0.78 ± 0.01**| **0.73 ± 0.0** | **0.85 ± 0.01**| **0.81 ± 0.0** | **0.62 ± 0.01**| **0.99 ± 0.0** |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| LGBM | Proposed | _0.76 ± 0.01_ | _0.79 ± 0.0_ | _0.60 ± 0.11_ | _0.63 ± 0.01_ | _0.77 ± 0.0_ | _0.47 ± 0.20_ | _0.26 ± 0.04_ | _0.82 ± 0.0_ |
| | DNABERT | 0.74 ± 0.0 | 0.78 ± 0.0 | 0.60 ± 0.08 | 0.66 ± 0.01 | 0.82 ± 0.01 | 0.47 ± 0.20 | 0.26 ± 0.04 | 0.75 ± 0.01 |
| | NT | 0.59 ± 0.01 | 0.56 ± 0.0 | **0.89 ± 0.02**| **0.72 ± 0.01**| **0.85 ± 0.0** | **0.80 ± 0.0** | **0.59 ± 0.01**| **0.97 ± 0.0** |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| XGB | Proposed | _0.72 ± 0.01_ | _0.75 ± 0.0_ | _0.59 ± 0.08_ | _0.60 ± 0.0_ | _0.76 ± 0.0_ | _0.47 ± 0.20_ | _0.26 ± 0.04_ | _0.85 ± 0.01_ |
| | DNABERT | 0.71 ± 0.01 | 0.75 ± 0.01 | 0.58 ± 0.05 | 0.64 ± 0.01 | 0.82 ± 0.01 | 0.47 ± 0.20 | 0.26 ± 0.04 | 0.79 ± 0.01 |
| | NT | 0.59 ± 0.01 | 0.57 ± 0.01 | 0.72 ± 0.01 | **0.85 ± 0.01**| **0.85 ± 0.01**| **0.81 ± 0.01**| **0.60 ± 0.01**| **0.99 ± 0.0** |
|-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| RF | Proposed | _0.73 ± 0.0_ | _0.79 ± 0.0_ | _0.58 ± 0.08_ | _0.61 ± 0.01_ | _0.75 ± 0.0_ | _0.53 ± 0.17_ | _0.24 ± 0.05_ | _0.86 ± 0.0_ |
| | DNABERT | 0.72 ± 0.0 | 0.79 ± 0.0 | 0.59 ± 0.09 | 0.63 ± 0.01 | 0.80 ± 0.01 | 0.53 ± 0.17 | 0.24 ± 0.05 | 0.82 ± 0.01 |
| | NT | 0.59 ± 0.01 | 0.56 ± 0.01 | **0.89 ± 0.02**| **0.71 ± 0.01**| **0.84 ± 0.0** | **0.79 ± 0.0** | **0.55 ± 0.01**| **0.97 ± 0.0** |
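
F1 can differ sharply from accuracy, for example when a classifier collapses onto a single class. The README does not specify the averaging scheme. The sketch below assumes macro-averaged F1, which is the unweighted mean of one-vs-rest F1 over all labels. The function names are hypothetical. For reference, scikit-learn's `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same value on these inputs.

```python
from typing import Hashable, Sequence


def f1_for_label(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """One-vs-rest F1 for one label: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Guard against 0/0; this cannot happen for labels seen in y_true or y_pred.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-label F1 over every label in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# A classifier that always predicts class 0 on a balanced three-class set:
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0]
print(round(macro_f1(y_true, y_pred), 3))  # 0.167, while accuracy is 2/6 ≈ 0.333
```

Macro averaging weights every class equally, so each class the model never predicts lowers the score. Accuracy does not penalize this.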
## Authors
-----------