nieshen

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published 3 days ago • 83

upvoted 2 papers 14 days ago

DEER: Draft with Diffusion, Verify with Autoregressive Models

Paper • 2512.15176 • Published 15 days ago • 41

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published 16 days ago • 39

upvoted a paper 16 days ago

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published 17 days ago • 87

upvoted 2 papers 18 days ago

From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs

Paper • 2512.06776 • Published 25 days ago • 24

Learning Unmasking Policies for Diffusion Language Models

Paper • 2512.09106 • Published 23 days ago • 9

upvoted a paper 27 days ago

UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers

Paper • 2512.04504 • Published 28 days ago • 16

upvoted a paper about 1 month ago

UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers

Paper • 2511.20123 • Published Nov 25, 2025 • 17

upvoted 3 papers about 2 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 128

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

Paper • 2510.22115 • Published Oct 25, 2025 • 83

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Paper • 2510.25602 • Published Oct 29, 2025 • 77

upvoted 3 papers 2 months ago

Parallel Loop Transformer for Efficient Test-Time Computation Scaling

Paper • 2510.24824 • Published Oct 28, 2025 • 16

Uniform Discrete Diffusion with Metric Path for Video Generation

Paper • 2510.24717 • Published Oct 28, 2025 • 40

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published Oct 27, 2025 • 58

updated 2 models 2 months ago

GSAI-ML/LLaDA-8B-Instruct

Text Generation • 8B • Updated Oct 21, 2025 • 214k • 338

GSAI-ML/LLaDA-8B-Base

Text Generation • 8B • Updated Oct 21, 2025 • 170k • 88

upvoted 4 papers 3 months ago

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Paper • 2510.14943 • Published Oct 16, 2025 • 39

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published Oct 7, 2025 • 54

dParallel: Learnable Parallel Decoding for dLLMs

Paper • 2509.26488 • Published Sep 30, 2025 • 19

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70