Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published Apr 28 • 38
Columbia-NLP/llama3-8b-instruct-rewriting-r-Decor Text Generation • 8B • Updated Jul 5, 2024 • 2
Columbia-NLP/llama3-8b-instruct-rewriting-nr-Decor Text Generation • 8B • Updated Jul 5, 2024 • 2
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI Paper • 2307.10172 • Published Jul 19, 2023 • 12