SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios Paper • 2512.18470 • Published 6 days ago • 8
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 9 days ago • 105
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments Paper • 2512.19432 • Published 4 days ago • 10
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models Paper • 2512.19526 • Published 4 days ago • 10
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published 8 days ago • 20
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision Paper • 2512.15489 • Published 9 days ago • 6
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published 12 days ago • 49
The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality Paper • 2512.10791 • Published 15 days ago • 7
Evaluating Gemini Robotics Policies in a Veo World Simulator Paper • 2512.10675 • Published 15 days ago • 16
Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale Paper • 2512.10398 • Published 16 days ago • 6
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published 15 days ago • 45
RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published 30 days ago • 13