Victor Gallego

vicgalle

https://github.com/vicgalle

AI & ML interests

Preference fine-tuning, alignment & synthetic data. Building LLMs in general!

Recent Activity

liked a dataset 8 days ago

Solenopsisbot/real-slop

upvoted a paper 9 days ago

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Paper • 2602.17363 • Published 11 days ago • 7

upvoted a paper 13 days ago

Experiential Reinforcement Learning

Paper • 2602.13949 • Published 16 days ago • 68

upvoted a paper about 2 months ago

Distilling Feedback into Memory-as-a-Tool

Paper • 2601.05960 • Published Jan 9 • 3

upvoted an article 2 months ago

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

Article • Dec 17, 2025

upvoted a paper 3 months ago

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published Nov 17, 2025 • 26

upvoted a paper 5 months ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273

upvoted an article 5 months ago

mem-agent: Equipping LLM Agents with Memory Using RL

Article • Oct 9, 2025

upvoted 2 papers 6 months ago

DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 298

Hermes 4 Technical Report

Paper • 2508.18255 • Published Aug 25, 2025 • 45

upvoted 6 papers 7 months ago

Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement

Paper • 2507.18742 • Published Jul 24, 2025 • 6

upvoted an article 7 months ago

Automated Discovery of High-Performance GPU Kernels with OpenEvolve

Article • Jun 27, 2025

upvoted 2 papers 8 months ago

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8, 2025 • 45

Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19, 2025 • 9

upvoted a collection 8 months ago

Configurable Preference Tuning ⚙️📝

Collection

CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17, 2025 • 1

upvoted a paper 9 months ago

Configurable Preference Tuning with Rubric-Guided Synthetic Data

Paper • 2506.11702 • Published Jun 13, 2025 • 1
