Yilun Zhao PRO

yilunzhao

AI & ML interests

None yet

Recent Activity

upvoted a paper 10 days ago

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Paper • 2602.14337 • Published 21 days ago • 13

upvoted 2 papers 11 days ago

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Paper • 2602.21198 • Published 12 days ago • 4

On Data Engineering for Scaling LLM Terminal Capabilities

Paper • 2602.21193 • Published 12 days ago • 91

upvoted 2 papers 13 days ago

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

Paper • 2602.12705 • Published 23 days ago • 65

Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

Paper • 2602.10388 • Published 26 days ago • 240

upvoted a paper 18 days ago

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Paper • 2602.12670 • Published 23 days ago • 54

upvoted a paper 19 days ago

Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training

Paper • 2602.07824 • Published 28 days ago • 16

upvoted 2 papers 26 days ago

How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs

Paper • 2602.08808 • Published 27 days ago • 8

ANCHOR: Branch-Point Data Generation for GUI Agents

Paper • 2602.07153 • Published 30 days ago • 5

upvoted 11 papers about 1 month ago

SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

Paper • 2602.05975 • Published about 1 month ago • 12

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Paper • 2602.03419 • Published Feb 3 • 40

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

Paper • 2601.18207 • Published Jan 26 • 19

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Paper • 2602.02196 • Published Feb 2 • 35

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Paper • 2602.02437 • Published Feb 2 • 77

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Paper • 2601.17027 • Published Jan 17 • 41

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Paper • 2601.19325 • Published Jan 27 • 79

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published Jan 29 • 60

Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

Paper • 2601.20833 • Published Jan 28 • 182

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Paper • 2601.21468 • Published Jan 29 • 25

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 213