The MHA2MLA models published in the paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-Based LLMs":
- fnlp/SmolLM-135M-MLA-d_kv_8-refactor (text generation, 0.1B)
- fnlp/SmolLM-135M-MLA-d_kv_32-refactor (text generation, 0.1B)
- fnlp/SmolLM-135M-MLA-d_kv_16-refactor (text generation, 0.1B)
- fnlp/SmolLM-360M-MLA-d_kv_8-refactor (text generation, 0.3B)