Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security Paper • 2507.19399 • Published 20 days ago
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators Paper • 2507.15339 • Published 24 days ago
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published 26 days ago
Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation Paper • 2507.11966 • Published 29 days ago
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published 29 days ago
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18
Mind the Gap! Static and Interactive Evaluations of Large Audio Models Paper • 2502.15919 • Published Feb 21
FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning Paper • 2506.16123 • Published Jun 19
Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Paper • 2507.09820 • Published Jul 13
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages Paper • 2507.05980 • Published Jul 8
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention Paper • 2507.01004 • Published Jul 1
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14, 2024
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks Paper • 2410.18210 • Published Oct 23, 2024
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Paper • 2504.13055 • Published Apr 17