- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 145
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 17
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 4
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 147
Jeffrey Magder
jmagder
AI & ML interests
None yet
Recent Activity
updated a collection 2 days ago: To read
upvoted a paper 2 days ago: Why do LLMs attend to the first token?
updated a collection 30 days ago: Finished Reading
Organizations
None yet