JiayuCHEN
KN33SOXXX
AI & ML interests
None yet
Recent Activity
updated a collection about 15 hours ago: benchmark
updated a collection about 15 hours ago: benchmark
updated a collection 3 days ago: benchmark
benchmark
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
Paper • 2605.25874 • Published • 102
Agents' Last Exam
Paper • 2606.05405 • Published • 335
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
Paper • 2606.09426 • Published • 97
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
Paper • 2606.07591 • Published • 87
harness
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
Paper • 2605.30611 • Published • 192
HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems
Paper • 2606.01779 • Published • 6
Text-to-Image Models Need Less from Text Encoders Than You Think
Paper • 2606.03715 • Published • 10
SIA: Self Improving AI with Harness & Weight Updates
Paper • 2605.27276 • Published • 14
models 0
None public yet