Robust and Calibrated Detection of Authentic Multimedia Content Paper • 2512.15182 • Published 9 days ago • 15
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26 • 134
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees Paper • 2506.14606 • Published Jun 17 • 11
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published Jun 5 • 24
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Jul 21 • 350
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark Paper • 2503.20786 • Published Mar 26 • 2
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark Paper • 2505.16968 • Published May 22 • 40
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30 • 80
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem Paper • 2505.21887 • Published May 28 • 14
CASS Collection Large-scale dataset and model suite for cross-architecture GPU code transpilation between CUDA and HIP at both source and assembly levels • 2 items • Updated May 15 • 5
SALT: Singular Value Adaptation with Low-Rank Transformation Paper • 2503.16055 • Published Mar 20 • 8
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding Paper • 2502.14949 • Published Feb 20 • 9
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 148