---
license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---
# tangled-alpha-0.4-core

Prepare the core datasets:
```bash
time python -B prepare_core_datasets.py
```
```
i=0, min_len=0, max_len=1048576, block_size=4097, chunk_size=16388000, len(dataset)=1567386, len(dataset) * block_size=6421580442
Total number of tokens in the optimized dataset '../core-data-0-0-1048576-4097-4000' is 6421580442
```
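The total in the output above is the number of blocks multiplied by the block size. A minimal Python check, using only the values printed in the log (the variable names are illustrative and do not come from `prepare_core_datasets.py`):

```python
# Values copied from the prepare_core_datasets.py output above.
num_blocks = 1_567_386  # len(dataset)
block_size = 4097       # tokens per block, as reported in the log

# Total tokens in the optimized dataset = blocks * tokens per block.
total_tokens = num_blocks * block_size
print(f"{total_tokens:,}")  # 6,421,580,442
```

A block size of 4097 is presumably a 4096-token context plus one extra token for the shifted next-token target; that reading is an assumption, since the log itself does not say so.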
Pretrain the core model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
```
```
Seed set to 23
Time to instantiate model: 0.23 seconds.
Total parameters: 185,631,232
Verifying settings ...
Measured TFLOPs: 7047.32
Epoch 1 | iter 256 step 1 | loss train: 11.714, val: n/a | iter time: 370.39 ms (step) remaining time: 4 days, 1:24:16
Epoch 1 | iter 512 step 2 | loss train: 11.711, val: n/a | iter time: 311.97 ms (step) remaining time: 3 days, 8:48:48
Epoch 1 | iter 768 step 3 | loss train: 11.708, val: n/a | iter time: 313.48 ms (step) remaining time: 3 days, 3:22:46
Epoch 1 | iter 1024 step 4 | loss train: 11.704, val: n/a | iter time: 313.71 ms (step) remaining time: 3 days, 0:41:32
Epoch 1 | iter 1280 step 5 | loss train: 11.694, val: n/a | iter time: 314.42 ms (step) remaining time: 2 days, 23:05:08
Epoch 1 | iter 1536 step 6 | loss train: 11.687, val: n/a | iter time: 314.62 ms (step) remaining time: 2 days, 22:00:35
Epoch 1 | iter 1792 step 7 | loss train: 11.668, val: n/a | iter time: 314.94 ms (step) remaining time: 2 days, 21:14:06
Epoch 1 | iter 2048 step 8 | loss train: 11.645, val: n/a | iter time: 316.28 ms (step) remaining time: 2 days, 20:39:12
Epoch 1 | iter 2304 step 9 | loss train: 11.630, val: n/a | iter time: 315.29 ms (step) remaining time: 2 days, 20:11:52
Epoch 1 | iter 2560 step 10 | loss train: 11.609, val: n/a | iter time: 315.53 ms (step) remaining time: 2 days, 19:49:36
Epoch 1 | iter 2816 step 11 | loss train: 11.564, val: n/a | iter time: 314.95 ms (step) remaining time: 2 days, 19:31:09
Epoch 1 | iter 3072 step 12 | loss train: 11.510, val: n/a | iter time: 314.23 ms (step) remaining time: 2 days, 19:15:24
Epoch 1 | iter 3328 step 13 | loss train: 11.453, val: n/a | iter time: 315.71 ms (step) remaining time: 2 days, 19:02:02
Epoch 1 | iter 3584 step 14 | loss train: 11.411, val: n/a | iter time: 316.43 ms (step) remaining time: 2 days, 18:50:24
Epoch 1 | iter 3840 step 15 | loss train: 11.346, val: n/a | iter time: 314.83 ms (step) remaining time: 2 days, 18:40:08
Epoch 1 | iter 4096 step 16 | loss train: 11.300, val: n/a | iter time: 314.94 ms (step) remaining time: 2 days, 18:30:57
Epoch 1 | iter 4352 step 17 | loss train: 11.237, val: n/a | iter time: 314.13 ms (step) remaining time: 2 days, 18:22:39
Epoch 1 | iter 4608 step 18 | loss train: 11.193, val: n/a | iter time: 314.85 ms (step) remaining time: 2 days, 18:15:08
Epoch 1 | iter 4864 step 19 | loss train: 11.131, val: n/a | iter time: 315.23 ms (step) remaining time: 2 days, 18:08:16
Epoch 1 | iter 5120 step 20 | loss train: 11.084, val: n/a | iter time: 314.08 ms (step) remaining time: 2 days, 18:03:14
# ...
Epoch 1 | iter 780800 step 3050 | loss train: 3.176, val: 3.554 | iter time: 314.97 ms (step) remaining time: 0:15:21
Epoch 1 | iter 781056 step 3051 | loss train: 3.207, val: 3.554 | iter time: 315.53 ms (step) remaining time: 0:14:05
Epoch 1 | iter 781312 step 3052 | loss train: 3.186, val: 3.554 | iter time: 315.74 ms (step) remaining time: 0:12:48
Epoch 1 | iter 781568 step 3053 | loss train: 3.189, val: 3.554 | iter time: 315.17 ms (step) remaining time: 0:11:32
Epoch 1 | iter 781824 step 3054 | loss train: 3.305, val: 3.554 | iter time: 315.29 ms (step) remaining time: 0:10:15
Epoch 1 | iter 782080 step 3055 | loss train: 3.173, val: 3.554 | iter time: 315.11 ms (step) remaining time: 0:08:59
Epoch 1 | iter 782336 step 3056 | loss train: 3.223, val: 3.554 | iter time: 315.35 ms (step) remaining time: 0:07:42
Epoch 1 | iter 782592 step 3057 | loss train: 3.182, val: 3.554 | iter time: 315.18 ms (step) remaining time: 0:06:26
Epoch 1 | iter 782848 step 3058 | loss train: 3.196, val: 3.554 | iter time: 316.37 ms (step) remaining time: 0:05:09
Epoch 1 | iter 783104 step 3059 | loss train: 3.187, val: 3.554 | iter time: 315.86 ms (step) remaining time: 0:03:53
Epoch 1 | iter 783360 step 3060 | loss train: 3.163, val: 3.554 | iter time: 314.81 ms (step) remaining time: 0:02:36
Epoch 1 | iter 783616 step 3061 | loss train: 3.190, val: 3.554 | iter time: 315.23 ms (step) remaining time: 0:01:20
Epoch 2 | iter 783872 step 3062 | loss train: 3.239, val: 3.554 | iter time: 317.71 ms (step) remaining time: 0:00:03
Validating ...
Final evaluation | val loss: 3.552 | val ppl: 34.896
Saving checkpoint to '../out/pretrain-core/final/lit_model.pth'
----------------------------------------
| Performance
| - Total tokens : 6,421,577,728
| - Training Time : 234340.96 s
| - Tok/sec : 17286.07 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used : 17.30 GB
----------------------------------------
```
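Two figures in the log above can be cross-checked with a short Python sketch. It assumes the reported loss is the mean cross-entropy in nats, so that perplexity is `exp(loss)`; the variable names are illustrative.

```python
import math

# "Final evaluation | val loss: 3.552" from the log above.
val_loss = 3.552
ppl = math.exp(val_loss)  # perplexity = exp(mean cross-entropy in nats)
print(f"val ppl ~ {ppl:.2f}")

# Last logged line: iter 783872, step 3062.
last_iter, last_step = 783_872, 3_062
iters_per_step = last_iter // last_step  # micro-batch iterations per optimizer step
print(f"iterations per step: {iters_per_step}")
```

The recomputed perplexity comes out at about 34.88, slightly below the logged 34.896, because the log rounds the loss to three decimals before printing. The ratio of iterations to steps is 256, which matches the first logged line (`iter 256 step 1`).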
Back up the `wandb` directory:
```bash
mv wandb wandb-pretrain-core
```
Chat with the model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
```
Evaluate the model on the `leaderboard` tasks:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
```
```
# ...
```