Demo for multimodal embedding models
Swap faces between two images
Edit images by providing prompts and noise settings
Detect objects in images or videos
Generate personalized images with face preservation
Modify images using text guidance
Create music captions from audio files
Chat with an AI assistant using text and images