LLM Models
updated
gradientai/Llama-3-8B-Instruct-Gradient-1048k
Text Generation
•
8B
•
Updated
•
8.87k
•
679
Are Your LLMs Capable of Stable Reasoning?
Paper
•
2412.13147
•
Published
•
93
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained
Evidence within Generation
Paper
•
2412.11919
•
Published
•
36
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
•
2412.18925
•
Published
•
106
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
•
2501.01904
•
Published
•
33
VideoRAG: Retrieval-Augmented Generation over Video Corpus
Paper
•
2501.05874
•
Published
•
75
Baichuan-Omni-1.5 Technical Report
Paper
•
2501.15368
•
Published
•
60
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on
a Single GPU
Paper
•
2502.08910
•
Published
•
148
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
Paper
•
2502.18137
•
Published
•
59
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
•
2502.15814
•
Published
•
69
Transformers without Normalization
Paper
•
2503.10622
•
Published
•
170
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Paper
•
2504.00072
•
Published
•
6
ReZero: Enhancing LLM search ability by trying one-more-time
Paper
•
2504.11001
•
Published
•
16
Paper
•
2506.03569
•
Published
•
80
Sentinel: SOTA model to protect against prompt injections
Paper
•
2506.05446
•
Published
•
23
qualifire/prompt-injection-sentinel
Text Classification
•
0.4B
•
Updated
•
808
•
15
MiniCPM4: Ultra-Efficient LLMs on End Devices
Paper
•
2506.07900
•
Published
•
93
Multiverse: Your Language Models Secretly Decide How to Parallelize and
Merge Generation
Paper
•
2506.09991
•
Published
•
55
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning
Attention
Paper
•
2506.13585
•
Published
•
273
Demystifying the Visual Quality Paradox in Multimodal Large Language
Models
Paper
•
2506.15645
•
Published
•
4
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
•
2506.16406
•
Published
•
130
GPTailor: Large Language Model Pruning Through Layer Cutting and
Stitching
Paper
•
2506.20480
•
Published
•
7
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic
Empirical Study
Paper
•
2506.19794
•
Published
•
8
Where to find Grokking in LLM Pretraining? Monitor
Memorization-to-Generalization without Test
Paper
•
2506.21551
•
Published
•
28
MMSearch-R1: Incentivizing LMMs to Search
Paper
•
2506.20670
•
Published
•
64
Multi-Granular Spatio-Temporal Token Merging for Training-Free
Acceleration of Video LLMs
Paper
•
2507.07990
•
Published
•
45
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
•
2508.06471
•
Published
•
195
Qwen3Guard Technical Report
Paper
•
2510.14276
•
Published
•
14
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented
Generation
Paper
•
2510.17354
•
Published
•
33
RL makes MLLMs see better than SFT
Paper
•
2510.16333
•
Published
•
48
Image-Text-to-Text
•
4B
•
Updated
•
65
•
3