Generative Evaluation of Complex Reasoning in Large Language Models Paper • 2504.02810 • Published Apr 3, 2025 • 14 • 5
Class Incremental Learning via Likelihood Ratio Based Task Prediction Paper • 2309.15048 • Published Sep 26, 2023
MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft Paper • 2310.08367 • Published Oct 12, 2023 • 1