---
title: SSM Blog Posts
emoji: 📝
colorFrom: purple
colorTo: yellow
sdk: static
pinned: false
---

A French version is available on my blog.


On October 7, 2021, while wondering whether AK was a bot or a human, I saw one of his tweets: a link to a publication on open-review.net, accompanied by the following image:

(image from the tweet)

Intrigued by the announced results, I decided to read up on this S3 model, which would be renamed S4 less than a month later (link to the version from when it was still called S3, for those interested).
This brilliant article impressed me. At the time, I was convinced that State Space Models (SSMs) were going to be a revolution and would replace transformers within months. Two years later, I'm forced to admit that I was completely wrong, given the tsunami of LLMs making the news in NLP.
Nevertheless, on Monday, December 4, 2023, the announcement of Mamba by Albert Gu and Tri Dao revived interest in them. This was amplified four days later with the announcement of StripedHyena by Together AI.
A good opportunity for me to write a few words about developments in SSMs over the last two years.

I first plan to write three articles whose aim is to illustrate the basics of SSMs with S4 (the "Attention is all you need" of the field) before reviewing how SSMs have evolved since that first paper:

In a second stage, I also hope to go into the details of the architectures of some specific SSMs, with animations ✨
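
In the meantime, as a glimpse of what those basics look like, here is the classical continuous-time state space formulation that S4 builds on (a standard reminder only, not a summary of the upcoming articles):

$$
x'(t) = \mathbf{A}\,x(t) + \mathbf{B}\,u(t), \qquad y(t) = \mathbf{C}\,x(t) + \mathbf{D}\,u(t)
$$

where $u(t)$ is the input signal, $x(t)$ the latent state, and $y(t)$ the output; $\mathbf{D}$ is often treated as a simple skip connection and omitted from the analysis.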