mmokoatle committed c1d7e42 · verified · 1 parent: 918b935

Updated readme

Files changed (1): README.md +39 -47
README.md CHANGED
@@ -45,53 +45,45 @@ The retrieved embeddings can be utilized as input for a machine learning classif
 
  Find out more about the datasets and access in the paper **(TBA)**
 
- ### Task 1: Detection of colorectal cancer cases (after oversampling)
-
- | | 5-fold Cross Validation accuracy | Test accuracy |
- | --- | --- | ---|
- | LightGBM | 91 | 63 |
- | Random Forest | **94** | **71** |
- | XGBoost | 93 | 66 |
- | CNN | 42 | 52 |
-
- | | 5-fold Cross Validation F1 | Test F1 |
- | --- | --- | ---|
- | LightGBM | 91 | 66 |
- | Random Forest | **94** | **72** |
- | XGBoost | 93 | 66 |
- | CNN | 41 | 60 |
-
- ### Task 2: Prediction of the Gleason grade group (after oversampling)
-
- | | 5-fold Cross Validation accuracy | Test accuracy |
- | --- | --- | ---|
- | LightGBM | 97 | 68 |
- | Random Forest | **98** | **78** |
- | XGBoost |97 | 70 |
- | CNN | 35 | 50 |
-
- | | 5-fold Cross Validation F1 | Test F1 |
- | --- | --- | ---|
- | LightGBM | 97 | 70 |
- | Random Forest | **98** | **80** |
- | XGBoost |97 | 70 |
- | CNN | 33 | 59 |
-
- ### Task 3: Detection of human TATA sequences (after oversampling)
-
- | | 5-fold Cross Validation accuracy | Test accuracy |
- | --- | --- | ---|
- | LightGBM | 98 | 93 |
- | Random Forest | **99** | **96** |
- | XGBoost |**99** | 95 |
- | CNN | 38 | 59 |
-
- | | 5-fold Cross Validation F1 | Test F1 |
- | --- | --- | ---|
- | LightGBM | 98 | 92 |
- | Random Forest | **99** | **95** |
- | XGBoost | **99** | 92 |
- | CNN | 58 | 10 |
+ **Table:** Accuracy scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method. Best results per column are in **bold**, while scores for the Proposed model are _underlined_.
+
+ | Model | Embed. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
+ | LR | Proposed | _0.65 ± 0.01_ | _0.67 ± 0.0_ | _0.85 ± 0.01_ | _0.64 ± 0.01_ | _0.80 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.70 ± 0.01_ |
+ | | DNABERT | 0.62 ± 0.01 | 0.65 ± 0.0 | 0.84 ± 0.04 | 0.69 ± 0.01 | 0.85 ± 0.01 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.60 ± 0.01 |
+ | | NT | **0.66 ± 0.0** | **0.67 ± 0.0** | 0.84 ± 0.01 | **0.73 ± 0.0** | **0.85 ± 0.01**| **0.81 ± 0.0** | **0.62 ± 0.01**| **0.99 ± 0.0** |
+
+ | LGBM | Proposed | _0.64 ± 0.01_ | _0.66 ± 0.0_ | _0.90 ± 0.02_ | _0.61 ± 0.01_ | _0.78 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.81 ± 0.01_ |
+ | | DNABERT | 0.62 ± 0.01 | 0.65 ± 0.01 | 0.90 ± 0.02 | 0.65 ± 0.01 | 0.83 ± 0.0 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.75 ± 0.01 |
+ | | NT | 0.63 ± 0.01 | 0.66 ± 0.0 | **0.91 ± 0.02**| 0.72 ± 0.0 | **0.85 ± 0.0** | **0.80 ± 0.0** | **0.59 ± 0.01**| 0.97 ± 0.0 |
+
+ | XGB | Proposed | _0.60 ± 0.01_ | _0.62 ± 0.0_ | _0.90 ± 0.02_ | _0.60 ± 0.0_ | _0.77 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.85 ± 0.01_ |
+ | | DNABERT | 0.59 ± 0.01 | 0.62 ± 0.01 | 0.90 ± 0.01 | 0.64 ± 0.01 | 0.82 ± 0.01 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.79 ± 0.01 |
+ | | NT | 0.61 ± 0.01 | 0.64 ± 0.0 | 0.90 ± 0.02 | **0.89 ± 0.03**| **0.85 ± 0.01**| **0.81 ± 0.01**| **0.60 ± 0.01**| 0.98 ± 0.0 |
+
+ | RF | Proposed | _0.61 ± 0.0_ | _0.66 ± 0.01_ | _0.90 ± 0.02_ | _0.61 ± 0.01_ | _0.77 ± 0.0_ | _0.49 ± 0.0_ | _0.33 ± 0.0_ | _0.86 ± 0.0_ |
+ | | DNABERT | 0.60 ± 0.0 | 0.66 ± 0.01 | 0.90 ± 0.02 | 0.63 ± 0.01 | 0.82 ± 0.0 | 0.49 ± 0.0 | 0.33 ± 0.0 | 0.81 ± 0.01 |
+ | | NT | 0.62 ± 0.01 | **0.67 ± 0.01**| 0.90 ± 0.01 | 0.71 ± 0.01 | **0.85 ± 0.0** | **0.79 ± 0.0** | **0.55 ± 0.01**| 0.97 ± 0.0 |
+
+ **Table:** F1-scores (with 95% confidence intervals) across datasets T1–T8 for each model and embedding method. Best results per column are in **bold**, while scores for the Proposed model are _underlined_.
+
+ | Model | Embed. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
+ |-------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
+ | LR | Proposed | **_0.78 ± 0.0_** | **_0.80 ± 0.01_** | _0.20 ± 0.05_ | _0.64 ± 0.01_ | _0.79 ± 0.0_ | _0.13 ± 0.37_ | _0.16 ± 0.0_ | _0.70 ± 0.01_ |
+ | | DNABERT | 0.75 ± 0.01 | 0.78 ± 0.0 | 0.47 ± 0.09 | 0.69 ± 0.01 | 0.84 ± 0.01 | 0.13 ± 0.37 | 0.16 ± 0.0 | 0.59 ± 0.01 |
+ | | NT | 0.56 ± 0.01 | 0.54 ± 0.0 | **0.78 ± 0.01**| **0.73 ± 0.0** | **0.85 ± 0.01**| **0.81 ± 0.0** | **0.62 ± 0.01**| **0.99 ± 0.0** |
+
+ | LGBM | Proposed | _0.76 ± 0.01_ | _0.79 ± 0.0_ | _0.60 ± 0.11_ | _0.63 ± 0.01_ | _0.77 ± 0.0_ | _0.47 ± 0.20_ | _0.26 ± 0.04_ | _0.82 ± 0.0_ |
+ | | DNABERT | 0.74 ± 0.0 | 0.78 ± 0.0 | 0.60 ± 0.08 | 0.66 ± 0.01 | 0.82 ± 0.01 | 0.47 ± 0.20 | 0.26 ± 0.04 | 0.75 ± 0.01 |
+ | | NT | 0.59 ± 0.01 | 0.56 ± 0.0 | **0.89 ± 0.02**| **0.72 ± 0.01**| **0.85 ± 0.0** | **0.80 ± 0.0** | **0.59 ± 0.01**| **0.97 ± 0.0** |
+
+ | XGB | Proposed | _0.72 ± 0.01_ | _0.75 ± 0.0_ | _0.59 ± 0.08_ | _0.60 ± 0.0_ | _0.76 ± 0.0_ | _0.47 ± 0.20_ | _0.26 ± 0.04_ | _0.85 ± 0.01_ |
+ | | DNABERT | 0.71 ± 0.01 | 0.75 ± 0.01 | 0.58 ± 0.05 | 0.64 ± 0.01 | 0.82 ± 0.01 | 0.47 ± 0.20 | 0.26 ± 0.04 | 0.79 ± 0.01 |
+ | | NT | 0.59 ± 0.01 | 0.57 ± 0.01 | 0.72 ± 0.01 | **0.85 ± 0.01**| **0.85 ± 0.01**| **0.81 ± 0.01**| **0.60 ± 0.01**| **0.9893 ± 0.0** |
+
+ | RF | Proposed | _0.73 ± 0.0_ | _0.79 ± 0.0_ | _0.58 ± 0.08_ | _0.61 ± 0.01_ | _0.75 ± 0.0_ | _0.53 ± 0.17_ | _0.24 ± 0.05_ | _0.86 ± 0.0_ |
+ | | DNABERT | 0.72 ± 0.0 | 0.79 ± 0.0 | 0.59 ± 0.09 | 0.63 ± 0.01 | 0.80 ± 0.01 | 0.53 ± 0.17 | 0.24 ± 0.05 | 0.82 ± 0.01 |
+ | | NT | 0.59 ± 0.01 | 0.56 ± 0.01 | **0.89 ± 0.02**| **0.71 ± 0.01**| **0.84 ± 0.0** | **0.79 ± 0.0** | **0.55 ± 0.01**| **0.97 ± 0.0** |
 
 
  ## Authors
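The new tables report every score as a value ± a 95% confidence interval, but the README does not say how those intervals were computed. As a minimal sketch, assuming the interval is a normal-approximation half-width over repeated per-fold (or per-run) scores, a `mean ± half_width` string could be produced as below. `mean_with_ci` and `format_score` are hypothetical helpers, and the fold scores in the usage line are made-up illustration values, not results from this repository.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, half_width) so a score can be reported as mean ± half_width.

    Assumption: normal approximation to the sampling distribution of the mean.
    This is an illustration, not necessarily how the README's intervals were made.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    m = mean(scores)
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the mean, scaled by the critical value
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m, half_width


def format_score(scores, confidence=0.95, digits=2):
    """Format scores in the same 'value ± interval' style as the tables."""
    m, h = mean_with_ci(scores, confidence)
    return f"{m:.{digits}f} ± {h:.{digits}f}"


# Hypothetical per-fold accuracies for one model/embedding pair
print(format_score([0.64, 0.65, 0.66, 0.65, 0.65]))  # prints "0.65 ± 0.01"
```

With only a handful of folds, a Student's t critical value (for example from `scipy.stats.t.ppf`) gives a wider, more conservative interval than the normal approximation used in this sketch.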