idea - a Hankto Collection

Hankto's Collections

idea

updated about 18 hours ago

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published 20 days ago • 148
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published 25 days ago • 152
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published about 1 month ago • 118
Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 102
Demystifying Video Reasoning

Paper • 2603.16870 • Published 19 days ago • 367
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 24 days ago • 64
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

Paper • 2603.03269 • Published Mar 3 • 62
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

Paper • 2603.06577 • Published about 1 month ago • 47
Online Experiential Learning for Language Models

Paper • 2603.16856 • Published 19 days ago • 57
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising

Paper • 2603.08703 • Published 27 days ago • 32
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

Paper • 2603.09095 • Published 27 days ago • 28
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

Paper • 2603.25702 • Published 10 days ago • 6
MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

Paper • 2603.18718 • Published 18 days ago • 10
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

Paper • 2603.13904 • Published 23 days ago • 4
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published 11 days ago • 94
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Paper • 2603.13875 • Published 23 days ago • 34
Dynamic Chunking Diffusion Transformer

Paper • 2603.06351 • Published about 1 month ago • 15
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Paper • 2602.02474 • Published Feb 2 • 62
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32
Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 102
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Paper • 2512.14531 • Published Dec 16, 2025 • 15
MIDUS: Memory-Infused Depth Up-Scaling

Paper • 2512.13751 • Published Dec 15, 2025 • 9
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Paper • 2603.24533 • Published 11 days ago • 46
Understanding the Challenges in Iterative Generative Optimization with LLMs

Paper • 2603.23994 • Published 12 days ago • 26
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published 13 days ago • 131
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 12 days ago • 60
CanViT: Toward Active-Vision Foundation Models

Paper • 2603.22570 • Published 13 days ago • 11
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

Paper • 2603.16932 • Published 23 days ago • 85
Repurposing Geometric Foundation Models for Multi-view Diffusion

Paper • 2603.22275 • Published 13 days ago • 46
Generalized Discrete Diffusion from Snapshots

Paper • 2603.21342 • Published 14 days ago • 11
MemDLM: Memory-Enhanced DLM Training

Paper • 2603.22241 • Published 13 days ago • 3
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published 19 days ago • 107
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Paper • 2603.08462 • Published 27 days ago • 21
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection

Paper • 2603.19635 • Published 17 days ago • 11
Teaching an Agent to Sketch One Part at a Time

Paper • 2603.19500 • Published 17 days ago • 5
From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

Paper • 2603.20193 • Published 16 days ago • 1
FASTER: Rethinking Real-Time Flow VLAs

Paper • 2603.19199 • Published 17 days ago • 57
Memento-Skills: Let Agents Design Agents

Paper • 2603.18743 • Published 18 days ago • 56
Efficient Reasoning with Balanced Thinking

Paper • 2603.12372 • Published 24 days ago • 144
Efficient Exploration at Scale

Paper • 2603.17378 • Published 19 days ago • 13
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published 10 days ago • 151
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Paper • 2603.20155 • Published 16 days ago • 8
Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis

Paper • 2602.03279 • Published Feb 3
GEMS: Agent-Native Multimodal Generation with Memory and Skills

Paper • 2603.28088 • Published 7 days ago • 82
AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

Paper • 2603.28696 • Published 6 days ago • 6
Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

Paper • 2603.25926 • Published 10 days ago • 8
Universal YOCO for Efficient Depth Scaling

Paper • 2604.01220 • Published 4 days ago • 13
Adaptive Loops and Memory in Transformers: Think Harder or Know More?

Paper • 2603.08391 • Published 26 days ago
Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design

Paper • 2603.00152 • Published Feb 25 • 2
Remedying Target-Domain Astigmatism for Cross-Domain Few-Shot Object Detection

Paper • 2603.18541 • Published 18 days ago
Scan Clusters, Not Pixels: A Cluster-Centric Paradigm for Efficient Ultra-high-definition Image Restoration

Paper • 2602.21917 • Published 12 days ago
From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning

Paper • 2603.26597 • Published 10 days ago
Predicting Camera Pose from Perspective Descriptions for Spatial Reasoning

Paper • 2602.06041 • Published Feb 6
UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

Paper • 2604.00590 • Published 5 days ago • 7
Compress, Cross and Scale: Multi-Level Compression Cross Networks for Efficient Scaling in Recommender Systems

Paper • 2602.12041 • Published Feb 12