Update README.md
## Models
We release all the models trained and evaluated in the paper. Model identifiers follow a consistent format that encodes key training details (a short parsing sketch follows the list):

* **Single-stage models**: `[model size]-[objective]-[number of steps]`.
  Example: `610m-clm-42k` denotes a 610M-parameter model trained with CLM for 42,000 steps.
* **Two-stage models**: `[model size]-[objective #1]-[steps #1]-[objective #2]-[total steps]`.
  Example: `610m-clm-10k-mlm40-42k` indicates a 610M model trained first with CLM for 10k steps, then continued with MLM (40% masking ratio) for 32k more steps, totaling 42k steps.
* **Continued pretraining from decayed checkpoints**: these use the `dec` prefix on the first training stage.
  Example: `610m-clm-dec42k-mlm40-64k` refers to a 610M model pretrained with CLM for 42k steps (with weight decay), then further trained with MLM (40% masking) for 22k additional steps, totaling 64k.
* **Intermediate checkpoints**: to refer to a specific training step before the final checkpoint, append the step number at the end.
  Example: `610m-mlm40-42k-1000` corresponds to step 1,000 during the MLM training phase of a 610M model trained for 42k steps.
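
To make the naming convention concrete, here is a minimal parsing sketch in Python. It is not part of the released code: `parse_model_id`, `Stage`, and `ModelId` are hypothetical names, and the logic simply mirrors the rules listed above.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    objective: str  # "clm", or "mlm40" = MLM with a 40% masking ratio
    steps: str      # step token as written in the name; the final stage's
                    # count is the running total, and a "dec" prefix marks
                    # continued pretraining from a decayed checkpoint

@dataclass
class ModelId:
    size: str               # parameter count, e.g. "610m"
    stages: list[Stage]     # one entry per training stage
    checkpoint: str | None  # intermediate step, e.g. "1000"; None = final

def parse_model_id(name: str) -> ModelId:
    """Split an identifier such as '610m-clm-10k-mlm40-42k' into parts."""
    size, *rest = name.split("-")
    # A trailing all-digit token denotes an intermediate checkpoint step.
    checkpoint = rest.pop() if rest and rest[-1].isdigit() else None
    # The remaining tokens alternate objective / step count, one pair per stage.
    stages = [Stage(o, s) for o, s in zip(rest[::2], rest[1::2])]
    return ModelId(size, stages, checkpoint)

print(parse_model_id("610m-clm-dec42k-mlm40-64k"))
# ModelId(size='610m', stages=[Stage(objective='clm', steps='dec42k'),
#   Stage(objective='mlm40', steps='64k')], checkpoint=None)
```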
## First authors' contact information