Le Yu

vanillaOVO

https://yule-buaa.github.io/

yule-BUAA

AI & ML interests

None yet

Recent Activity

authored a paper 21 days ago

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Organizations

None yet

upvoted a paper 16 days ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published 19 days ago • 138

upvoted a paper 20 days ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published 21 days ago • 285

upvoted a paper 22 days ago

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Paper • 2507.15024 • Published 24 days ago • 13

upvoted a collection about 1 month ago

Qwen3

Collection • 84 items • Updated 7 days ago • 1.08k

upvoted 3 papers 2 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 416

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 254

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 177

upvoted 2 papers 3 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 274

WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 34

upvoted an article 6 months ago

Putting RL back in RLHF

Article • and 1 other • Jun 12, 2024 • 100

upvoted a paper 7 months ago

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2 • 53

upvoted a paper 8 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 373

upvoted a paper 10 months ago

A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

Paper • 2410.13841 • Published Oct 17, 2024 • 17

upvoted a collection about 1 year ago

Llama 3.1

Collection • This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 683

upvoted an article about 1 year ago

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Article • and 7 others • Jul 23, 2024 • 237

upvoted a collection about 1 year ago

Qwen2

Collection • Qwen2 language models, with pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B • 39 items • Updated 23 days ago • 368

upvoted 2 articles over 1 year ago

Fine-tune Llama 3 with ORPO

Article • Apr 22, 2024 • 239

Merge Large Language Models with mergekit

Article • Jan 9, 2024 • 133

upvoted 2 papers over 1 year ago

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14, 2024 • 27

Resolving Interference When Merging Models

Paper • 2306.01708 • Published Jun 2, 2023 • 15
