	---
	license: mit
	pipeline_tag: text-generation
	library_name: transformers
	language: [
	'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
	'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
	'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
	'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
	'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
	'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
	'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
	]
	datasets:
	# core - base
	- ontocord/fineweb-permissive-multilingual-2m
	- distily/c4_multilingual_1M
	- data-silence/sumnews
	- xu-song/cc100-samples
	- badrex/llm-emoji-dataset
	- fblgit/simple-math
	- Gusarich/math-expressions-1m
	- neuralwork/arxiver
	- christopher/rosetta-code
	- nampdn-ai/tiny-codes
	- JeanKaddour/minipile
	# core - instruct
	- NousResearch/hermes-function-calling-v1
	- simplescaling/s1K-1.1
	# base - instruct
	- mlabonne/open-perfectblend
	- allenai/tulu-3-sft-mixture
	- rombodawg/Everything_Instruct_Multilingual
	# base - reason
	- open-r1/OpenR1-Math-220k
	- open-thoughts/OpenThoughts-114k
	- cognitivecomputations/dolphin-r1
	- simplescaling/s1K-1.1
	tags:
	- chat
	- core
	- base
	- instruct
	- reason
	---

	# tangled-alpha-0.1-core

	![logo](./misc/logo.jpg)

	Prepare the core datasets:

	```bash
	time python -B prepare_core_datasets.py
	```

	```
	Progress: 100%|████████| 220/220 [23:15<00:00, 6.34s/it]
	Workers are finished.
	Finished data processing!
	i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
	Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
	```
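The reported total is the number of packed samples (`len(dataset)`) times `block_size`. Below is a quick arithmetic check in Python that uses only the numbers printed above. Reading `chunk_size` as `block_size × 2000` is an inference from the `-8192-2000` suffix of the output directory name, not something the script reports.

```python
# Numbers copied from the prepare_core_datasets.py output above.
block_size = 8192        # tokens per packed training sample
num_blocks = 893_355     # len(dataset)
chunk_size = 16_384_000  # chunk_size as printed by the script

# Total tokens = number of packed samples * tokens per sample.
total_tokens = num_blocks * block_size

# Assumption: chunk_size = block_size * 2000, matching the
# "-8192-2000" suffix of '../core-data-0-8192-2000'.
blocks_per_chunk = chunk_size // block_size

print(f"total tokens:     {total_tokens:,}")  # 7,318,364,160
print(f"blocks per chunk: {blocks_per_chunk}")  # 2000
```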

	Pretrain the core model:

	```bash
	CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
	```

	```
	Seed set to 23
	Time to instantiate model: 0.24 seconds.
	Total parameters: 182,125,056
	Verifying settings ...
	Measured TFLOPs: 7041.81

	Epoch 1 | iter 256 step 1 | loss train: 10.529, val: n/a | iter time: 1696.67 ms (step) remaining time: 4 days, 7:44:36
	Epoch 1 | iter 512 step 2 | loss train: 10.200, val: n/a | iter time: 1260.46 ms (step) remaining time: 4 days, 2:29:51
	Epoch 1 | iter 768 step 3 | loss train: 9.875, val: n/a | iter time: 1246.06 ms (step) remaining time: 4 days, 0:59:11
	Epoch 1 | iter 1024 step 4 | loss train: 9.634, val: n/a | iter time: 1245.91 ms (step) remaining time: 4 days, 0:38:01
	Epoch 1 | iter 1280 step 5 | loss train: 9.504, val: n/a | iter time: 1248.04 ms (step) remaining time: 4 days, 0:28:49
	Epoch 1 | iter 1536 step 6 | loss train: 9.371, val: n/a | iter time: 1220.81 ms (step) remaining time: 4 days, 0:32:52
	Epoch 1 | iter 1792 step 7 | loss train: 9.269, val: n/a | iter time: 1238.00 ms (step) remaining time: 4 days, 0:30:03
	Epoch 1 | iter 2048 step 8 | loss train: 9.214, val: n/a | iter time: 1244.22 ms (step) remaining time: 4 days, 0:30:30
	Epoch 1 | iter 2304 step 9 | loss train: 9.109, val: n/a | iter time: 1220.57 ms (step) remaining time: 4 days, 0:25:37
	Epoch 1 | iter 2560 step 10 | loss train: 9.061, val: n/a | iter time: 1251.13 ms (step) remaining time: 4 days, 0:12:57
	Epoch 1 | iter 2816 step 11 | loss train: 9.031, val: n/a | iter time: 1241.17 ms (step) remaining time: 4 days, 0:05:06
	Epoch 1 | iter 3072 step 12 | loss train: 8.944, val: n/a | iter time: 1280.45 ms (step) remaining time: 4 days, 0:00:31
	Epoch 1 | iter 3328 step 13 | loss train: 8.931, val: n/a | iter time: 1241.07 ms (step) remaining time: 4 days, 0:00:08
	Epoch 1 | iter 3584 step 14 | loss train: 8.910, val: n/a | iter time: 1229.04 ms (step) remaining time: 3 days, 23:59:03
	Epoch 1 | iter 3840 step 15 | loss train: 8.823, val: n/a | iter time: 1239.92 ms (step) remaining time: 3 days, 23:55:02
	Epoch 1 | iter 4096 step 16 | loss train: 8.745, val: n/a | iter time: 1239.53 ms (step) remaining time: 3 days, 23:50:02
	Epoch 1 | iter 4352 step 17 | loss train: 8.679, val: n/a | iter time: 1271.10 ms (step) remaining time: 3 days, 23:46:19
	Epoch 1 | iter 4608 step 18 | loss train: 8.654, val: n/a | iter time: 1246.47 ms (step) remaining time: 3 days, 23:43:27
	Epoch 1 | iter 4864 step 19 | loss train: 8.651, val: n/a | iter time: 1246.56 ms (step) remaining time: 3 days, 23:41:11
	Epoch 1 | iter 5120 step 20 | loss train: 8.639, val: n/a | iter time: 1219.66 ms (step) remaining time: 3 days, 23:35:38
	# ...
	Epoch 1 | iter 442880 step 1730 | loss train: 2.740, val: 2.863 | iter time: 1340.98 ms (step) remaining time: 0:51:28
	Epoch 1 | iter 443136 step 1731 | loss train: 2.734, val: 2.863 | iter time: 1387.92 ms (step) remaining time: 0:48:00
	Epoch 1 | iter 443392 step 1732 | loss train: 2.730, val: 2.863 | iter time: 1309.36 ms (step) remaining time: 0:44:31
	Epoch 1 | iter 443648 step 1733 | loss train: 2.715, val: 2.863 | iter time: 1292.23 ms (step) remaining time: 0:41:03
	Epoch 1 | iter 443904 step 1734 | loss train: 2.718, val: 2.863 | iter time: 1311.24 ms (step) remaining time: 0:37:35
	Epoch 1 | iter 444160 step 1735 | loss train: 2.709, val: 2.863 | iter time: 1291.09 ms (step) remaining time: 0:34:07
	Epoch 1 | iter 444416 step 1736 | loss train: 2.723, val: 2.863 | iter time: 1304.14 ms (step) remaining time: 0:30:39
	Epoch 1 | iter 444672 step 1737 | loss train: 2.721, val: 2.863 | iter time: 1278.33 ms (step) remaining time: 0:27:10
	Epoch 1 | iter 444928 step 1738 | loss train: 2.697, val: 2.863 | iter time: 1292.86 ms (step) remaining time: 0:23:42
	Epoch 1 | iter 445184 step 1739 | loss train: 2.763, val: 2.863 | iter time: 1284.40 ms (step) remaining time: 0:20:14
	Epoch 1 | iter 445440 step 1740 | loss train: 2.775, val: 2.863 | iter time: 1302.58 ms (step) remaining time: 0:16:46
	Epoch 1 | iter 445696 step 1741 | loss train: 2.756, val: 2.863 | iter time: 1298.86 ms (step) remaining time: 0:13:18
	Epoch 1 | iter 445952 step 1742 | loss train: 2.728, val: 2.863 | iter time: 1279.11 ms (step) remaining time: 0:09:49
	Epoch 1 | iter 446208 step 1743 | loss train: 2.637, val: 2.863 | iter time: 1308.11 ms (step) remaining time: 0:06:21
	Epoch 1 | iter 446464 step 1744 | loss train: 2.638, val: 2.863 | iter time: 1294.08 ms (step) remaining time: 0:02:53
	Validating ...
	Final evaluation | val loss: 2.862 | val ppl: 17.494
	Saving checkpoint to '../out/pretrain-core/final/lit_model.pth'
	----------------------------------------
	| Performance
	| - Total tokens : 7,318,355,968
	| - Training Time : 363457.29 s
	| - Tok/sec : 2103064.60 tok/s
	| ----------------------------------------
	| Memory Usage
	| - Memory Used : 20.93 GB
	----------------------------------------
	```
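The progress lines follow a fixed pattern, so the loss curve can be pulled out of the log with a regular expression. Below is a minimal Python sketch. The pattern is inferred from the lines shown above, and `parse_line` is a hypothetical helper, not part of litgpt. It also checks two relationships visible in the log. First, every optimizer step spans 256 iterations, which likely reflects gradient accumulation (litgpt counts micro-batches as iterations), though the config file is not shown here. Second, validation perplexity is `exp(val loss)`.

```python
import math
import re
from typing import Optional

# Pattern inferred from the progress lines in the log above, e.g.
# "Epoch 1 | iter 256 step 1 | loss train: 10.529, val: n/a | ..."
LOG_RE = re.compile(
    r"Epoch (?P<epoch>\d+) \| iter (?P<iter>\d+) step (?P<step>\d+) \| "
    r"loss train: (?P<train>[\d.]+), val: (?P<val>[\d.]+|n/a)"
)


def parse_line(line: str) -> Optional[dict]:
    """Extract epoch, iter, step and losses from one progress line (hypothetical helper)."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    val = m["val"]
    return {
        "epoch": int(m["epoch"]),
        "iter": int(m["iter"]),
        "step": int(m["step"]),
        "train_loss": float(m["train"]),
        "val_loss": None if val == "n/a" else float(val),
    }


# First and last progress lines, copied from the log above.
lines = [
    "Epoch 1 | iter 256 step 1 | loss train: 10.529, val: n/a | iter time: 1696.67 ms (step)",
    "Epoch 1 | iter 446464 step 1744 | loss train: 2.638, val: 2.863 | iter time: 1294.08 ms (step)",
]
records = [r for r in map(parse_line, lines) if r is not None]

# Iterations (micro-batches) per optimizer step.
iters_per_step = {r["iter"] // r["step"] for r in records}

# Perplexity is exp(mean cross-entropy loss).
final_val_loss = 2.862  # "val loss" from the final evaluation line
final_val_ppl = math.exp(final_val_loss)

print(iters_per_step)          # {256}
print(f"{final_val_ppl:.3f}")  # 17.496
```

The result, 17.496, differs from the logged 17.494 only because the logged loss is rounded to three decimals (ln 17.494 ≈ 2.8619).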

	Back up the local `wandb` directory:

	```bash
	mv wandb wandb-pretrain-core
	```

	Chat with the model:

	```bash
	CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
	```

	Evaluate the model on the `leaderboard` tasks:

	```bash
	CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
	```

	```
	# ...
	```
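The pretrain, chat and evaluate commands in this README share the same environment-variable prefix. If you drive them from Python instead of the shell, one option is to keep that prefix in one place, as in the minimal sketch below. `LITGPT_ENV` and `litgpt_command` are hypothetical names, not part of litgpt. The comments describe what each variable does in CUDA and PyTorch.

```python
import os
import shlex
from typing import Dict, List, Tuple

# Environment prefix repeated in the litgpt commands above.
LITGPT_ENV = {
    "CUDA_VISIBLE_DEVICES": "0",  # expose only the first GPU to the process
    "CUDA_LAUNCH_BLOCKING": "0",  # keep asynchronous kernel launches (the CUDA default)
    # Let PyTorch's caching allocator expand existing segments instead of
    # allocating new ones, which can reduce memory fragmentation.
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
}


def litgpt_command(*args: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for `litgpt <args>` with the shared prefix (hypothetical helper)."""
    env = {**os.environ, **LITGPT_ENV}
    argv = ["litgpt", *args]
    return env, argv


env, argv = litgpt_command("chat", "../out/pretrain-core/final")
print(shlex.join(argv))  # litgpt chat ../out/pretrain-core/final
```

To launch the command, pass both values to `subprocess.run(argv, env=env, check=True)`. The sketch stops short of that so it can run on a machine without litgpt or a GPU.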