RLAIF

Team

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

violetxi updated a collection 4 days ago

HybridReasoning

violetxi

updated a collection 4 days ago

HybridReasoning

Collection

13 items • Updated 4 days ago

nlile

authored a paper about 2 months ago

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

Paper • 2506.05256 • Published Jun 5 • 2

sea-snell

authored a paper 4 months ago

Learning Adaptive Parallel Reasoning with Language Models

Paper • 2504.15466 • Published Apr 21 • 43

nlile

authored a paper 5 months ago

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Paper • 2502.17387 • Published Feb 24 • 6

Asap7772

authored a paper 5 months ago

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 39

nlile

authored a paper 5 months ago

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 39

violetxi

authored 2 papers 7 months ago

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Paper • 2407.07086 • Published Jul 9, 2024

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 99

nlile

authored a paper 7 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 99

Asap7772

authored a paper 7 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 99

nlile

authored a paper 10 months ago

Generative Reward Models

Paper • 2410.12832 • Published Oct 2, 2024 • 7

Asap7772

authored a paper 10 months ago

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation

Paper • 2410.02725 • Published Oct 3, 2024 • 1

Asap7772

authored a paper 12 months ago

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Paper • 2310.08864 • Published Oct 13, 2023 • 2

Team members 9