Audio-Conditioned LipSync with Latent Diffusion Models
Part-level image-to-3D generation
Generate expressive speech from text with emotion control