VoiceClone-TTS

Running on Zero


ginipick committed on Jul 4

Commit

46617a5

verified ·

1 Parent(s): f01fec2

Update README.md

Files changed (1)

README.md +117 -2

README.md CHANGED

@@ -4,10 +4,125 @@ emoji: 🏆
 colorFrom: green
 colorTo: purple
 sdk: gradio
-sdk_version: 5.30.0
+sdk_version: 5.35.0
 app_file: app.py
 pinned: true
 short_description: mcp_server
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+This Space is a Text-to-Speech (TTS) application built on the Zonos model.
+## English Explanation
+### Overview
+This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is an advanced TTS model from Zyphra that can generate natural-sounding speech with customizable voice characteristics.
+### Key Features
+1. **Model Selection**
+   - Two model variants: Transformer and Hybrid
+   - Different models have different conditioning capabilities
+2. **Text Input & Language Support**
+   - Supports multiple languages through eSpeak phoneme conversion
+   - Text length limit of 500 characters
+   - Language selection from supported language codes
+3. **Voice Customization**
+   - **Speaker Cloning**: Upload audio to clone a specific voice
+   - **Voice Quality Settings**:
+     - DNS-MOS (Voice Quality): 1.0-5.0 scale
+     - Frequency Max: Control the highest frequency in Hz
+     - Voice Clarity: Adjust voice intelligibility
+     - Pitch Variation: Control how much the pitch varies
+     - Speaking Rate: Adjust speech speed
+4. **Emotion Control**
+   - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
+   - Fine-tune emotional expression in the generated speech
+5. **Advanced Generation Parameters**
+   - **Guidance Scale**: Controls how closely the model follows the conditioning
+   - **Min P**: Sets the min-p sampling threshold; higher values restrict sampling to likelier tokens and reduce randomness
+   - **Seed**: For reproducible results
+   - **Prefix Audio**: Continue generation from existing audio
+6. **Unconditional Generation**
+   - Toggle specific conditions to let the model generate them automatically
+   - Useful for more creative/varied outputs
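The Min P and Seed parameters listed above can be illustrated with a small sketch of min-p sampling. This is a generic, pure-Python illustration and not the Space's sampler, which this diff does not show; the function name `min_p_sample` and its arguments are hypothetical.

```python
import math
import random


def min_p_sample(logits, min_p=0.1, seed=None):
    """Sample one token index after min-p filtering (illustrative only).

    Tokens whose probability falls below min_p * (highest probability)
    are dropped; the survivors are sampled in proportion to their weight.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible

    # Softmax, subtracting the peak logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at least min_p times as likely as the best token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # random.choices normalizes the weights itself.
    return rng.choices(range(len(logits)), weights=kept, k=1)[0]
```

With logits `[5.0, 0.0, -5.0]` and `min_p=0.5`, only the first token survives the filter, so the function returns index 0 for any seed. Lowering `min_p` toward 0 keeps more candidates and makes the output more varied.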
+### Technical Details
+- Uses GPU acceleration via CUDA
+- Implements classifier-free guidance for better control
+- Supports audio continuation from prefix
+- Real-time progress tracking during generation
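Classifier-free guidance, mentioned in the technical details above, is commonly written as an extrapolation from the unconditional prediction toward the conditional one. The sketch below shows that textbook form on plain lists of logits; it is an assumption that the Space combines its predictions this way, and `apply_cfg` is a hypothetical name.

```python
def apply_cfg(cond_logits, uncond_logits, guidance_scale):
    """Blend conditional and unconditional logits (textbook CFG form).

    guidance_scale = 0 returns the unconditional logits, 1 returns the
    conditional logits, and values above 1 push further toward the
    conditioning.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit lists must have the same length")
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(cond_logits, uncond_logits)
    ]
```

For example, `apply_cfg([2.0, 0.0], [1.0, 1.0], 2.0)` gives `[3.0, -1.0]`: each logit moves twice as far from the unconditional value as the conditional prediction alone would move it.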
+### How to Use
+1. Select a model variant
+2. Enter your text and choose language
+3. (Optional) Upload speaker audio for voice cloning
+4. Adjust voice characteristics and emotions
+5. Click "Generate Audio" to create speech
+6. Download or play the generated audio
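Steps 2 to 4 above amount to collecting a handful of inputs before generation. The sketch below shows one way such inputs could be validated, using the 500-character limit and the eight emotion sliders named in the README. The helper `build_request`, the dictionary layout, and the choice to truncate rather than reject long text are all assumptions made for illustration; the Space's own code is not part of this diff.

```python
MAX_TEXT_CHARS = 500  # limit stated in the README

# Slider order as listed in the README.
EMOTIONS = (
    "happiness", "sadness", "disgust", "fear",
    "surprise", "anger", "other", "neutral",
)


def build_request(text, language, emotions=None, seed=None):
    """Assemble a hypothetical generation request from user inputs."""
    if not text.strip():
        raise ValueError("text must not be empty")

    emotions = emotions or {}
    unknown = sorted(set(emotions) - set(EMOTIONS))
    if unknown:
        raise ValueError(f"unknown emotion sliders: {unknown}")

    return {
        # Assumption: over-long text is truncated, not rejected.
        "text": text[:MAX_TEXT_CHARS],
        "language": language,
        # Sliders that were not set default to 0.0.
        "emotion": [float(emotions.get(name, 0.0)) for name in EMOTIONS],
        "seed": seed,
    }
```

A call such as `build_request("Hello there", "en-us", {"happiness": 0.8}, seed=42)` produces an emotion list with 0.8 in the first position and 0.0 elsewhere; the language code `"en-us"` is only an example value.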
+---
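+## Korean Explanation
+### Overview
+This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is an advanced TTS model developed by Zyphra that lets users customize voice characteristics to generate natural-sounding speech.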
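+### Key Features
+1. **Model Selection**
+   - Two model variants: Transformer and Hybrid
+   - Each model offers different conditioning capabilities
+2. **Text Input & Language Support**
+   - Multilingual support through eSpeak phoneme conversion
+   - Text length limit: 500 characters
+   - Choose from the supported language codes
+3. **Voice Customization**
+   - **Speaker Cloning**: Upload audio to clone a specific voice
+   - **Voice Quality Settings**:
+     - DNS-MOS (Voice Quality): 1.0-5.0 scale
+     - Frequency Max: Control the highest frequency in Hz
+     - Voice Clarity: Adjust voice intelligibility
+     - Pitch Variation: Control how much the pitch varies
+     - Speaking Rate: Adjust speech speed
+4. **Emotion Control**
+   - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
+   - Fine-tune the emotional expression of the generated speech
+5. **Advanced Generation Parameters**
+   - **Guidance Scale**: Controls how faithfully the model follows the conditioning
+   - **Min P**: Controls randomness/creativity in generation
+   - **Seed**: Setting for reproducible results
+   - **Prefix Audio**: Continue generation from existing audio
+6. **Unconditional Generation**
+   - Toggle specific conditions so the model generates them automatically
+   - Useful for more creative and varied outputs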
+### Technical Details
+- Uses GPU acceleration via CUDA
+- Implements classifier-free guidance for better control
+- Supports audio continuation from a prefix
+- Real-time progress tracking during generation
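+### How to Use
+1. Select a model variant
+2. Enter your text and choose a language
+3. (Optional) Upload speaker audio for voice cloning
+4. Adjust voice characteristics and emotions
+5. Click the "Generate Audio" button to create speech
+6. Download or play the generated audio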
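+### Special Features
+- **Emotion Settings**: Fine-grained control over the emotional tone of the generated speech
+- **Voice Quality**: Adjust voice quality via the DNS-MOS score
+- **Speaker Denoising**: Option to remove noise from the uploaded speaker audio
+- **Unconditional Keys**: Set specific features to be generated automatically
+This application is a powerful and flexible tool for high-quality TTS generation and can be used to produce voice content for a variety of purposes.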