MegaTTS 3 but with voice cloning!
Higgs Audio Demo
Unified MLLM with Text-Aligned Representations
Demo for multimodal understanding and generation
OmniGen2: Unified Image Understanding and Generation
Demo of Normalized Attention Guidance for FLUX.1-dev
Next-Gen High-Resolution 3D Model Generation
Expressive Zero-Shot TTS
Demo for MMaDA: Multimodal Large Diffusion Language Models
Transcribe audio files to text with timestamps
Generate synchronized video from video and audio
A Unified Framework for Image Customization
A Step Towards Music Generation Foundation Model
Generate realistic talking video from an image and audio
F Lite Texture image generator
Generate images from prompts using selected LoRA models
Edit an image based on the given instruction