CosyVoice Version Information
Current Version: v1.0-cosyvoice-300m
Models Installed:
- CosyVoice-300M (Main model)
- CosyVoice-300M-SFT (Supervised Fine-Tuning)
- CosyVoice-300M-direct (Zero-shot inference)
- CosyVoice-ttsfrd (Required resources)
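A quick way to confirm that the models listed above are actually on disk is to check for their directories. The sketch below is illustrative only: the model names are copied from the list above, while the `pretrained_models` base directory and the `find_missing_models` helper are assumptions, not something this installation is known to use.

```python
from pathlib import Path

# Model names copied from the "Models Installed" list above.
EXPECTED_MODELS = [
    "CosyVoice-300M",
    "CosyVoice-300M-SFT",
    "CosyVoice-300M-direct",
    "CosyVoice-ttsfrd",
]


def find_missing_models(base_dir, expected=EXPECTED_MODELS):
    """Return the expected model names that have no non-empty directory under base_dir."""
    base = Path(base_dir)
    missing = []
    for name in expected:
        model_dir = base / name
        # A model counts as present only if its directory exists and holds at least one entry.
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "pretrained_models" is an assumed location; adjust to the real install path.
    missing = find_missing_models("pretrained_models")
    print("missing:", missing if missing else "none")
```

An empty directory is reported as missing on purpose, since an interrupted download can leave one behind.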
Features:
- Multi-language TTS (Chinese, English, Japanese, Korean)
- Zero-shot voice cloning
- Cross-lingual synthesis
- GPU acceleration with RTX A5000
Performance:
- Generation speed: ~1x real-time (about 1 second of compute per second of audio)
- Model loading: 5-10 seconds
- GPU: RTX A5000 (24GB VRAM)
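The generation speed figure can be expressed as a real-time factor (RTF): compute time divided by the duration of the audio produced, so 1.0 corresponds to 1x real-time and lower is faster. The following is a minimal sketch for measuring it; `synthesize` is a hypothetical callable standing in for whatever inference call is used, and it is assumed to return a sequence of audio samples.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = compute time / audio duration; 1.0 means exactly real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to `synthesize` and return its real-time factor.

    `synthesize` is a hypothetical callable: it takes the text and returns
    a sequence of audio samples at `sample_rate` samples per second.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

Model loading time (the 5-10 seconds above) should be excluded from the measurement, so load the model before calling `measure_rtf`.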
Known Issues:
- Chinese accent in English and Portuguese synthesis
- Model was trained primarily on Chinese data, which likely explains the accent
Next Version:
- CosyVoice2-0.5B (downloading)
- Improved English pronunciation
- Lower latency (as low as 150 ms in streaming mode)
- 30-50% fewer pronunciation errors than v1.0