hgissbkh committed
Commit 92586e0 (verified) · 1 Parent(s): 3a687cb

Update README.md

Files changed (1):
  1. README.md +14 -6
README.md CHANGED
@@ -18,12 +18,20 @@ Learning high-quality text representations is fundamental to a wide range of NLP
 
 ## Models
 
-We release all the models trained and evaluated in the paper.
-
-* Model names follow the format `[model size]-[objective]-[number of steps]`: e.g., `610m-clm-42k` refers to a 610M-parameter model trained with CLM for 42k steps.
-* For models trained in two stages, names follow the extended format `[model size]-[objective #1]-[number of steps #1]-[objective #2]-[number of steps #2]`, where `[number of steps #2]` indicates the total number of training steps: e.g., `610m-clm-10k-mlm40-42k` is a 610M model first trained with CLM for 10k steps, then further trained with MLM (using a 40% masking ratio) for an additional 32k steps, totaling 42k.
-* Models that were continued from a decayed checkpoint use the "dec" prefix for the first step count: e.g., `610m-clm-dec42k-mlm40-64k` represents a 610M model first trained and decayed with CLM for 42k steps, then continued with MLM (40% masking ratio) for 22k more steps, totaling 64k.
-* By default, model names refer to the final checkpoint. Intermediate checkpoints are indicated by appending the step number at the end: e.g., `610m-mlm40-42k-1000` corresponds to checkpoint 1,000 of a 610M model trained with MLM (40% masking) for 42k steps.
+We release all the models trained and evaluated in the paper. Model identifiers follow a consistent format that encodes key training details:
+
+* **Single-stage models**:
+  `[model size]-[objective]-[number of steps]`.
+  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
+* **Two-stage models**:
+  `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
+  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
+* **Continued pretraining from decayed checkpoints**:
+  These use the `dec` prefix on the first stage's step count.
+  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model first trained with CLM for 42k steps (through the decay phase), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
+* **Intermediate checkpoints**:
+  To refer to a specific training step before the final checkpoint, append the step number at the end.
+  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 of a 610M model trained with MLM (40% masking) for 42k steps.
 
 ## First authors' contact information
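For working with these identifiers programmatically, here is a minimal Python sketch of a parser for the naming convention added in this commit. The function name, regexes, and output fields are illustrative assumptions, not part of the released repository.

```python
import re

def parse_model_name(name: str) -> dict:
    """Parse identifiers such as '610m-clm-dec42k-mlm40-64k' or
    '610m-mlm40-42k-1000' into their components (hypothetical helper)."""
    parts = name.split("-")
    size, parts = parts[0], parts[1:]

    # A trailing plain integer (no 'k' suffix) denotes an intermediate checkpoint.
    checkpoint = int(parts.pop()) if parts and parts[-1].isdigit() else None

    # Remaining parts alternate: objective (optionally with a masking ratio,
    # e.g. 'mlm40'), then a step count ('k' suffix, optional 'dec' prefix).
    stages = []
    for obj, steps in zip(parts[0::2], parts[1::2]):
        m_obj = re.fullmatch(r"([a-z]+)(\d+)?", obj)
        m_steps = re.fullmatch(r"(dec)?(\d+)k", steps)
        stages.append({
            "objective": m_obj.group(1),                                    # 'clm' or 'mlm'
            "mask_ratio": int(m_obj.group(2)) if m_obj.group(2) else None,  # e.g. 40 (%)
            "decayed": m_steps.group(1) is not None,                        # 'dec' prefix
            # NOTE: for a second stage this is the *total* step count, per the README.
            "steps": int(m_steps.group(2)) * 1000,
        })
    return {"size": size, "stages": stages, "checkpoint": checkpoint}

# Example: a two-stage model continued from a decayed CLM checkpoint.
print(parse_model_name("610m-clm-dec42k-mlm40-64k"))
```

Under these assumptions, `610m-mlm40-42k-1000` parses to a single MLM stage (40% masking, 42k total steps) with `checkpoint=1000`.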