

Zichen

lkevinzc


https://lkevinzc.github.io/

AI & ML interests

None yet

Recent Activity

updated a dataset 16 days ago

lkevinzc/llama3-ultrafeedback

published a dataset 16 days ago

lkevinzc/llama3-ultrafeedback

updated a dataset 26 days ago

axon-rl/math-eval



upvoted a paper about 1 month ago

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30 • 48

upvoted 2 papers 2 months ago

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Paper • 2506.02096 • Published Jun 2 • 51

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 29

upvoted 3 papers 3 months ago

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 24

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 36

upvoted 2 papers 4 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14 • 13

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 57

upvoted a collection 5 months ago

🌾Oat-Zero: Understanding R1-Zero-Like Training

5 items • Updated Apr 10 • 7

upvoted a paper 6 months ago

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Paper • 2502.12982 • Published Feb 18 • 18

upvoted a paper 9 months ago

Sample-Efficient Alignment for LLMs

Paper • 2411.01493 • Published Nov 3, 2024 • 12

upvoted a collection about 1 year ago

💡 DICE

Self-alignment with DPO Implicit Rewards • 5 items • Updated Jul 28, 2024 • 9

upvoted 2 papers about 1 year ago

RegMix: Data Mixture as Regression for Language Model Pre-training

Paper • 2407.01492 • Published Jul 1, 2024 • 41

Bootstrapping Language Models with DPO Implicit Rewards

Paper • 2406.09760 • Published Jun 14, 2024 • 41