-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 61 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 49
Collections
Discover the best community collections!
Collections including paper arxiv:2502.01237
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 38 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 52 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 59 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 43 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 116 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 51 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 34 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 100 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 73 -
The Differences Between Direct Alignment Algorithms are a Blur
Paper • 2502.01237 • Published • 115 -
Process Reinforcement through Implicit Rewards
Paper • 2502.01456 • Published • 62
-
Differential Transformer
Paper • 2410.05258 • Published • 180 -
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 134 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 118 -
o1-Coder: an o1 Replication for Coding
Paper • 2412.00154 • Published • 45
-
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Paper • 2402.12366 • Published • 3 -
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Paper • 2401.08417 • Published • 37 -
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Paper • 2404.14723 • Published • 10 -
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 28
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 61 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 49
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 116 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 51 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 34 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 100 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 73 -
The Differences Between Direct Alignment Algorithms are a Blur
Paper • 2502.01237 • Published • 115 -
Process Reinforcement through Implicit Rewards
Paper • 2502.01456 • Published • 62
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 38 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 52 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49
-
Differential Transformer
Paper • 2410.05258 • Published • 180 -
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 134 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 118 -
o1-Coder: an o1 Replication for Coding
Paper • 2412.00154 • Published • 45
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 59 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 43 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Paper • 2402.12366 • Published • 3 -
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Paper • 2401.08417 • Published • 37 -
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Paper • 2404.14723 • Published • 10 -
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 28