RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published May 2025 • 23
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention Paper • 2504.16083 • Published Apr 2025 • 8
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28, 2025 • 39
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23, 2025 • 48
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 276
SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published Dec 13, 2024 • 10