FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain • Paper • 2510.15232 • Published Oct 17, 2025 • 5
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering • Paper • 2510.06426 • Published Oct 7, 2025 • 2
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks • Paper • 2507.01001 • Published Jul 1, 2025 • 46