Update README.md
## Models
We release all the models trained and evaluated in the paper. Model identifiers follow a consistent format that encodes key training details (a short parsing sketch follows the list):

* **Single-stage models**: `[model size]-[objective]-[number of steps]`.
  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
* **Two-stage models**: `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
* **Continued pretraining from decayed checkpoints**: these use the `dec` prefix on the first training stage.
  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (with weight decay), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
* **Intermediate checkpoints**: to refer to a specific training step before the final checkpoint, append the step number at the end.
  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
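
To make the naming convention concrete, here is a minimal parsing sketch in Python. It is not part of the released code: `parse_model_id`, `Stage`, and `ModelId` are hypothetical names, and the logic simply mirrors the rules listed above.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    objective: str  # "clm", or "mlm40" = MLM with a 40% masking ratio
    steps: str      # step token as written in the name; the final stage's
                    # count is the running total, and a "dec" prefix marks
                    # continued pretraining from a decayed checkpoint

@dataclass
class ModelId:
    size: str               # parameter count, e.g. "610m"
    stages: list[Stage]     # one entry per training stage
    checkpoint: str | None  # intermediate step, e.g. "1000"; None = final

def parse_model_id(name: str) -> ModelId:
    """Split an identifier such as '610m-clm-10k-mlm40-42k' into parts."""
    size, *rest = name.split("-")
    # A trailing all-digit token denotes an intermediate checkpoint step.
    checkpoint = rest.pop() if rest and rest[-1].isdigit() else None
    # The remaining tokens alternate objective / step count, one pair per stage.
    stages = [Stage(o, s) for o, s in zip(rest[::2], rest[1::2])]
    return ModelId(size, stages, checkpoint)

print(parse_model_id("610m-clm-dec42k-mlm40-64k"))
# ModelId(size='610m', stages=[Stage(objective='clm', steps='dec42k'),
#   Stage(objective='mlm40', steps='64k')], checkpoint=None)
```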
## First authors' contact information