---
title: Voice Clone TTS
emoji: πŸ†
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: true
short_description: mcp_server
---

### Overview

This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is a TTS model from Zyphra that generates natural-sounding speech with customizable voice characteristics.

### Key Features

1. **Model Selection**
   - Two model variants: Transformer and Hybrid
   - Each model supports a different set of conditioning inputs

2. **Text Input & Language Support**
   - Supports multiple languages through eSpeak phoneme conversion
   - Text length is limited to 500 characters
   - Language is selected from the list of supported language codes

3. **Voice Customization**
   - **Speaker Cloning**: Upload audio to clone a specific voice
   - **Voice Quality Settings**:
     - DNS-MOS (Voice Quality): 1.0-5.0 scale
     - Frequency Max: Controls the highest frequency, in Hz
     - Voice Clarity: Adjusts voice intelligibility
     - Pitch Variation: Controls how much the pitch varies
     - Speaking Rate: Adjusts speech speed

4. **Emotion Control**
   - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
   - Fine-tune the emotional expression of the generated speech

5. **Advanced Generation Parameters**
   - **Guidance Scale**: Controls how closely the model follows the conditioning
   - **Min P**: Controls randomness in generation
   - **Seed**: Makes results reproducible
   - **Prefix Audio**: Continues generation from existing audio

6. **Unconditional Generation**
   - Toggle individual conditions off to let the model choose them automatically
   - Useful for more varied outputs

### Technical Details

- Uses GPU acceleration via CUDA
- Implements classifier-free guidance for better control
- Supports audio continuation from a prefix
- Tracks progress in real time during generation

### How to Use

1. Select a model variant
2. Enter your text and choose a language
3. (Optional) Upload speaker audio for voice cloning
4. Adjust voice characteristics and emotions
5. Click "Generate Audio" to create speech
6. Download or play the generated audio

### Special Features

- **Emotion settings**: Fine-grained control over the emotional tone of the generated speech
- **Voice quality**: Adjust voice quality through the DNS-MOS score
- **Speaker denoising**: Option to remove noise from the uploaded speaker audio
- **Unconditional keys**: Set specific features to be generated automatically

This application is a powerful and flexible tool for high-quality TTS generation and can be used to produce voice content for a variety of purposes.
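The Guidance Scale, Min P, and Seed parameters listed under Advanced Generation Parameters can be illustrated with a minimal, pure-Python sketch of the general techniques they are named after: classifier-free guidance, min-p filtering, and seeded sampling. This is not the code used by the app or by Zonos. The function names (`apply_cfg`, `min_p_filter`, `sample_token`), the toy logits, and the exact guidance formula are assumptions chosen for illustration.

```python
import math
import random


def apply_cfg(cond_logits, uncond_logits, guidance_scale):
    """Classifier-free guidance in its common form (an assumption here):
    push the unconditional logits toward the conditional ones by guidance_scale."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability,
    then renormalize the survivors so they sum to 1."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_token(probs, seed):
    """Draw one token index; the same seed always yields the same draw."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits over a 4-token vocabulary (hypothetical values).
cond = [2.0, 1.0, 0.1, -1.0]    # model output with conditioning
uncond = [1.0, 1.0, 0.5, 0.0]   # model output without conditioning

guided = apply_cfg(cond, uncond, guidance_scale=2.0)
probs = min_p_filter(softmax(guided), min_p=0.1)
token = sample_token(probs, seed=42)
```

In this sketch, a larger `guidance_scale` widens the gap between tokens the conditioning favors and those it does not, a larger `min_p` removes more low-probability tokens before sampling, and reusing the same `seed` reproduces the same draw.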