

Kiều Sơn Tùng

codemaivanngu


AI & ML interests

None yet

Recent Activity

reacted to Jaward's post with 🔥 21 days ago

Towards batch sizes too small to meter🎉 Beautiful work! And my personal favorite so far: I adore peak performance at small/nano scale. Everyone deserves to run/train AGI locally :) Our data, our god model! They showed that:
- you can train LLMs (up to 1B params) with a batch size as low as 1 (batch_size=1). This is unconventional, given that small batch sizes can lead to unstable, spiky training runs.
- you can get a stable training run with just vanilla SGD (stochastic gradient descent); no momentum required 🤯
- small batch sizes are more robust to hyperparameters (i.e., no worries about initialization).
- smaller batch sizes outperform larger batch sizes (“better per-FLOPs performance”).

“We recommend that practitioners training large models in memory-constrained settings exploit the benefits of small batch sizes rather than trying to emulate the large batch size setting (e.g., through gradient accumulation) typically used in industry.”

I’ve been doing this for ages; my mantra: all my experiments must scale on my 8 GB RAM M2 before moving to a GPU. In other words, I love being GPU-poor. Check out my nanoAI algo repo: https://github.com/Jaykef/ai-algorithms. All notebooks run on as little as 8 GB of RAM.

upvoted a collection 21 days ago

liked a model 3 months ago

Qwen/Qwen3-Embedding-0.6B-GGUF


Organizations

None yet

Spaces (1)

No application file

ML

Models (1)

codemaivanngu/hybrid-filtering

Datasets (0)

None public yet