Article Vision Language Model Alignment in TRL ⚡️ By sergiopaniego and 4 others • 8 days ago • 45
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 64
Running • 593 • Kolors Portrait With Flux 🤗 • Kolors Portrait to keep face identity, developed with Flux
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents Paper • 2507.22827 • Published 15 days ago • 89
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published 21 days ago • 39