Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • about 1 month ago • 614
Encoders vs Decoders: the Ettin Suite Collection A collection of SOTA, open-data, paired encoder-only and decoder-only models ranging from 17M params to 1B. See the paper at https://arxiv.org/abs/250 • 32 items • Updated 22 days ago • 16
FLEXITOKENS: Flexible Tokenization for Evolving Language Models Paper • 2507.12720 • Published 22 days ago • 8 • 2
Article Transformers Are Getting Old: Variants and Alternatives Exist! By ProCreations • Jul 5 • 42
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Paper • 2506.16054 • Published Jun 19 • 60
FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies Paper • 2506.17673 • Published Jun 21 • 6
Steering Conceptual Bias via Transformer Latent-Subspace Activation Paper • 2506.18887 • Published Jun 23 • 6
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling Paper • 2506.08672 • Published Jun 10 • 31
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 176
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 145