Collections including paper arxiv:2502.04128

ZhenYe234/hubert_base_general_audio

0.1B params • Updated Feb 21 • 81.6k downloads • 2 likes
ZhenYe234/xcodec

Updated Sep 25, 2024 • 1
HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B params • Updated Feb 23 • 18.7k downloads • 81 likes
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Paper • 2408.17175 • Published Aug 30, 2024 • 4 upvotes

GaussianSpeech: Audio-Driven Gaussian Avatars

Paper • 2411.18675 • Published Nov 27, 2024
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 26 upvotes
MOSPA: Human Motion Generation Driven by Spatial Audio

Paper • 2507.11949 • Published 30 days ago • 23 upvotes
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

Paper • 2507.12956 • Published 28 days ago • 23 upvotes

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9 upvotes
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11 upvotes
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16 upvotes
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 21 upvotes

Audio/Music/Speech/etc.

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 42 upvotes
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 26 upvotes

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 116 upvotes
PaSa: An LLM Agent for Comprehensive Academic Paper Search

Paper • 2501.10120 • Published Jan 17 • 51 upvotes
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Paper • 2501.09775 • Published Jan 16 • 34 upvotes
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Paper • 2501.10132 • Published Jan 17 • 22 upvotes

TTS foundation model compatible with Llama framework (160k hours tokenized speech data released)

HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B params • Updated Feb 23 • 18.7k downloads • 81 likes
HKUSTAudio/Llasa-1B

Text-to-Speech • 1B params • Updated May 10 • 7.51k downloads • 98 likes
HKUSTAudio/Llasa-3B

Text-to-Speech • 4B params • Updated May 10 • 1.45k downloads • 511 likes
HKUSTAudio/Llasa-8B

Text-to-Speech • 9B params • Updated Mar 9 • 1.68k downloads • 94 likes

Our AK Daily Papers

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

Paper • 2305.06908 • Published May 11, 2023 • 6 upvotes
CoMoSVC: Consistency Model-based Singing Voice Conversion

Paper • 2401.01792 • Published Jan 3, 2024 • 11 upvotes
ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 61 upvotes
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 33 upvotes
