Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published 2 days ago • 14
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics Paper • 2512.21010 • Published 1 day ago
SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published 2 days ago • 39
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models Paper • 2512.19526 • Published 3 days ago • 6
StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published 3 days ago • 15
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published 3 days ago • 60
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments Paper • 2512.19432 • Published 3 days ago • 10
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published 3 days ago • 14
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 7 days ago • 103
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories Paper • 2512.17419 • Published 6 days ago • 9
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 6 days ago • 47