Collections
Collections including paper arxiv:2502.04128
-
GaussianSpeech: Audio-Driven Gaussian Avatars
Paper • 2411.18675 • Published
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
Paper • 2502.04128 • Published • 26
MOSPA: Human Motion Generation Driven by Spatial Audio
Paper • 2507.11949 • Published • 23
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
Paper • 2507.12956 • Published • 23
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Paper • 2405.18503 • Published • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2405.20289 • Published • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper • 2406.03344 • Published • 21
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 116
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 51
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 34
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 22
-
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Paper • 2305.06908 • Published • 6
CoMoSVC: Consistency Model-based Singing Voice Conversion
Paper • 2401.01792 • Published • 11
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Paper • 2402.16153 • Published • 61
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 33