Jim Lai's picture

Jim Lai

grimjim

AI & ML interests

Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct use, but aim for research and/or educational purposes.

Recent Activity

Organizations

Social Post Explorers's profile picture Hugging Face Discord Community's profile picture Debased AI's profile picture Anthracite's profile picture Anthracite Core's profile picture

Posts 22

view post
Post
1713
I recently have been looking at a paper titled "Why Warmup the Learning Rate? Underlying Mechanisms and Improvements", by Dayal Singh Kalra and Maissam Barkeshli, and was struck by "warmup" being analogous to simulated annealing.
https://arxiv.org/abs/2406.09405
Taking the physical analogy further, the "warmup" is a stochastic process to knock the system out of current local minima, allowing easier transition toward newer minima. It works because it reduces "fit" and therefore "friction".

Articles 2

Article

NOPE: Normative Ontological Prompt Engineering