---
title: Voice Clone
emoji: 🎥
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
short_description: Voice Clone Multilingual TTS
---
## 🎙️ Voice Clone Multilingual TTS: Advanced AI Voice Synthesis and Cloning
### Transform Text to Natural Speech with Custom Voice Cloning
Welcome to **Voice Clone Multilingual TTS**, a text-to-speech system powered by OuteTTS-0.3-1B that offers both speech synthesis and voice cloning. Create natural-sounding speech in multiple languages using preset voices, or clone a voice from a short audio sample.
### What is Voice Clone Multilingual TTS?
Voice Clone Multilingual TTS is an **AI-powered speech synthesis tool** that converts text into natural-sounding speech. It runs the OuteTTS-0.3-1B model in bfloat16 precision and offers both preset speaker voices and the ability to clone custom voices from reference audio, which makes it a good fit for content creation, accessibility, and creative projects.
### Key Features for Professional Voice Synthesis
- **🎭 Voice Cloning**: Clone any voice from 7-10 seconds of reference audio
- **🌍 Multilingual Support**: Generate speech in multiple languages
- **👥 Preset Speakers**: Choose from various pre-configured voice profiles
- **🎛️ Fine Control**: Adjust temperature and repetition penalty
- **⚡ GPU Acceleration**: Fast generation with CUDA optimization
- **🎵 Natural Prosody**: Realistic intonation and rhythm
- **📊 Whisper Integration**: Automatic transcription for voice cloning
- **💾 WAV Export**: High-quality audio output format
### How It Works
#### **Simple Generation Process**
1. **Enter Text**: Type or paste your text content
2. **Choose Voice**: Select preset speaker or upload reference audio
3. **Adjust Settings**: Tune temperature and repetition penalty
4. **Generate**: Create natural-sounding speech from your text
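The four steps above can be pictured as building one validated request before any audio is generated. The following is a minimal sketch, not the code in `app.py`: the `SynthesisRequest` class, its field names, its default values, the `en_male_1` speaker label, and the rule that exactly one of a preset speaker or a reference clip is supplied are all assumptions made for illustration. The numeric ranges are the ones listed under Advanced Controls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical container for one generation request (not the app's real type)."""

    text: str
    speaker: Optional[str] = None          # name of a preset voice (assumed field)
    reference_audio: Optional[str] = None  # path to an uploaded sample (assumed field)
    temperature: float = 0.4               # assumed default
    repetition_penalty: float = 1.1        # assumed default

    def __post_init__(self) -> None:
        # Step 1: some text is required.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        # Step 2: assume exactly one voice source is chosen.
        if (self.speaker is None) == (self.reference_audio is None):
            raise ValueError("choose exactly one of speaker or reference_audio")
        # Step 3: keep the settings inside the documented slider ranges.
        if not 0.1 <= self.temperature <= 1.0:
            raise ValueError("temperature must be within 0.1-1.0")
        if not 0.5 <= self.repetition_penalty <= 2.0:
            raise ValueError("repetition_penalty must be within 0.5-2.0")

    @property
    def mode(self) -> str:
        """Return 'clone' when a reference clip is given, otherwise 'preset'."""
        return "clone" if self.reference_audio is not None else "preset"
```

For example, `SynthesisRequest(text="Hello", speaker="en_male_1").mode` evaluates to `"preset"`, while passing both a speaker and a reference clip raises `ValueError`.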
#### **Voice Cloning Technology**
- Upload 7-10 seconds of clear reference audio
- AI analyzes voice characteristics and patterns
- Applies learned voice profile to new text
- Maintains speaker identity across languages
### Perfect Use Cases
- **Content Creation**: Narration for videos and podcasts
- **Audiobook Production**: Convert books to audio format
- **Language Learning**: Practice pronunciation with native accents
- **Accessibility**: Make written content accessible to all
- **Voice Preservation**: Clone and preserve unique voices
- **Creative Projects**: Character voices for games or animations
- **Business Applications**: Automated customer service voices
- **Personal Use**: Create custom voice assistants
### Advanced Controls
- **Temperature (0.1-1.0)**:
  - Lower values: More stable, consistent tone
  - Higher values: More expressive, varied intonation
- **Repetition Penalty (0.5-2.0)**: Discourages repetitive patterns; values above 1.0 penalize repeated tokens more strongly
- **Speaker Selection**: Multiple preset voice profiles
- **Reference Audio**: Custom voice cloning input
- **Max Length**: Up to 4096 tokens per generation
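To make the temperature and repetition penalty controls more concrete, here is a small sketch of how these two settings are commonly applied to a language model's next-token scores. It uses the widely used formulation (divide positive scores of already-seen tokens by the penalty, multiply negative ones, then scale by temperature before the softmax). Whether this Space applies them in exactly this way is an assumption, and the function names are illustrative.

```python
import math
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], seen: Iterable[int], penalty: float
) -> List[float]:
    """Lower the scores of token ids that were already generated."""
    out = list(logits)
    for token_id in set(seen):
        value = out[token_id]
        # Positive scores shrink, negative scores grow more negative.
        out[token_id] = value / penalty if value > 0 else value * penalty
    return out


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With this formulation a penalty of 1.0 leaves the scores unchanged, and a value below 1.0 has the opposite effect and makes repeats more likely. A temperature near 0.1 concentrates probability on the top-scoring token, which matches the "more stable" behavior described above.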
### Technical Specifications
- **Model**: OuteAI/OuteTTS-0.3-1B
- **Precision**: bfloat16 for optimal performance
- **Framework**: PyTorch with CUDA support
- **Transcription**: Whisper Turbo for voice analysis
- **Output Format**: WAV audio files
- **GPU Optimization**: Automatic CUDA memory management
- **Interface**: Gradio with responsive design
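As an illustration of the WAV output format, the sketch below writes mono floating-point samples to a 16-bit PCM WAV file using only the Python standard library. The helper names and the 24000 Hz sample rate are assumptions for the example and are not taken from the app.

```python
import math
import struct
import wave
from typing import List, Sequence


def sine_tone(freq_hz: float, seconds: float, sample_rate: int = 24000) -> List[float]:
    """Generate a test tone as floats in [-1.0, 1.0]."""
    count = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def write_wav(path: str, samples: Sequence[float], sample_rate: int = 24000) -> None:
    """Write mono float samples as 16-bit little-endian PCM."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # avoid integer overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

Calling `write_wav("tone.wav", sine_tone(440.0, 0.5))` produces a half-second test file that any common audio player can open.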
### Voice Cloning Best Practices
1. **Audio Quality**: Use clear, noise-free recordings
2. **Duration**: Optimal results with 7-10 second samples
3. **Consistency**: Single speaker without background noise
4. **Format**: Upload the sample in a common audio format
5. **Content**: Natural speech patterns work best
6. **Language**: A cloned voice can be used across different languages
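Before uploading, it can help to confirm that a reference clip falls inside the recommended 7-10 second window. The sketch below does this for uncompressed PCM WAV files with the standard `wave` module. The function names and messages are illustrative, and other formats such as MP3 would need a different reader.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_clip(
    path: str, min_seconds: float = 7.0, max_seconds: float = 10.0
) -> str:
    """Describe whether a clip fits the recommended duration window."""
    duration = wav_duration_seconds(path)
    if duration < min_seconds:
        return f"too short ({duration:.1f}s): record at least {min_seconds:.0f}s"
    if duration > max_seconds:
        return f"too long ({duration:.1f}s): consider trimming to {max_seconds:.0f}s"
    return f"ok ({duration:.1f}s)"
```

This checks duration only. Noise level and the number of speakers still have to be judged by listening.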
### Why Choose Voice Clone Multilingual TTS?
1. **Professional Quality**: Studio-grade voice synthesis
2. **Versatile Options**: Preset voices or custom cloning
3. **Fast Processing**: GPU-accelerated generation
4. **User-Friendly**: Simple interface for all users
5. **Flexible Output**: Adjustable voice characteristics
6. **Free Access**: No subscription or usage limits
### Technical Innovation
- **Advanced Architecture**: State-of-the-art TTS model
- **Memory Efficient**: Automatic CUDA cache management
- **Error Handling**: Robust generation with fallbacks
- **Dynamic Loading**: On-demand model initialization
- **Quality Assurance**: Built-in audio validation
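On-demand model initialization usually means deferring the expensive load until the first request and reusing the loaded instance afterwards. A minimal sketch of that pattern with `functools.lru_cache` follows. The `_load_model` function is a stand-in that returns a dummy callable, since the real loading code in `app.py` is not shown here, and `load_calls` exists only to make the caching visible.

```python
import functools
from typing import Callable, List

load_calls: List[str] = []  # records each real load, for demonstration only


def _load_model(name: str) -> Callable[[str], str]:
    """Stand-in for an expensive load; a real app would build the TTS model here."""
    load_calls.append(name)
    return lambda text: f"[{name}] audio for: {text}"


@functools.lru_cache(maxsize=1)
def get_model(name: str = "OuteAI/OuteTTS-0.3-1B") -> Callable[[str], str]:
    """Load the model on first use and return the cached instance afterwards."""
    return _load_model(name)
```

Importing this module costs nothing. The first `get_model()` call performs the load, and later calls return the same object without loading again.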
### Start Creating Natural Speech
Transform your text into lifelike speech with professional quality. Whether using preset voices or cloning custom voices, Voice Clone Multilingual TTS provides the tools for exceptional audio content creation.
**Community**: [Discord - Openfree AI](https://discord.gg/openfreeai) | **More AI Tools**: [OpenFree Best AI Services](https://huggingface.co/spaces/openfree/Best-AI)