Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR Paper • 2302.03201 • Published Feb 7, 2023
Switching the Loss Reduces the Cost in Batch Reinforcement Learning Paper • 2403.05385 • Published Mar 8, 2024
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22, 2024