ginipick committed · Commit 46617a5 · verified · 1 Parent(s): f01fec2

Update README.md

  1. README.md +117 -2
README.md CHANGED
@@ -4,10 +4,125 @@ emoji: 🏆
  colorFrom: green
  colorTo: purple
  sdk: gradio
- sdk_version: 5.30.0
+ sdk_version: 5.35.0
  app_file: app.py
  pinned: true
  short_description: mcp_server
  ---
+ This Space is a Text-to-Speech (TTS) application using the Zonos model. Explanations are provided in both English and Korean.
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ## English Explanation
+
+ ### Overview
+ This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is an advanced TTS model from Zyphra that can generate natural-sounding speech with customizable voice characteristics.
+
+ ### Key Features
+
+ 1. **Model Selection**
+    - Two model variants: Transformer and Hybrid
+    - Different models have different conditioning capabilities
+
+ 2. **Text Input & Language Support**
+    - Supports multiple languages through eSpeak phoneme conversion
+    - Text length limit of 500 characters
+    - Language selection from supported language codes
+
+ 3. **Voice Customization**
+    - **Speaker Cloning**: Upload audio to clone a specific voice
+    - **Voice Quality Settings**:
+      - DNS-MOS (Voice Quality): 1.0-5.0 scale
+      - Frequency Max: Control the highest frequency in Hz
+      - Voice Clarity: Adjust voice intelligibility
+      - Pitch Variation: Control how much the pitch varies
+      - Speaking Rate: Adjust speech speed
+
+ 4. **Emotion Control**
+    - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
+    - Fine-tune emotional expression in the generated speech
+
+ 5. **Advanced Generation Parameters**
+    - **Guidance Scale**: Controls how closely the model follows the conditioning
+    - **Min P**: Controls randomness/creativity in generation
+    - **Seed**: For reproducible results
+    - **Prefix Audio**: Continue generation from existing audio
+
+ 6. **Unconditional Generation**
+    - Toggle specific conditions to let the model generate them automatically
+    - Useful for more creative/varied outputs
+
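The **Min P** and **Seed** parameters listed above can be illustrated with a small, standalone sketch. This is not the Space's code: it assumes the usual min-p rule (keep only tokens whose probability is at least `min_p` times the top token's probability, then renormalize), and the names `min_p_filter` and `sample_token` are hypothetical helpers written for this example.

```python
import random


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize.

    Assumes the common min-p rule; the app's exact sampler is not shown
    in this README.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0 for 0 <= min_p <= 1: the top token survives
    return [p / total for p in kept]


def sample_token(probs, min_p, seed):
    """Draw one token index from the filtered distribution with a seeded RNG."""
    rng = random.Random(seed)  # same seed -> same draw
    filtered = min_p_filter(probs, min_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
filtered = min_p_filter(probs, min_p=0.2)  # threshold 0.1 removes the last token
first = sample_token(probs, min_p=0.2, seed=42)
second = sample_token(probs, min_p=0.2, seed=42)
print(filtered, first == second)
```

A higher `min_p` prunes more low-probability tokens and makes the output less varied. Fixing the seed makes the random draw repeatable, which is what the Seed setting is described as offering.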
+ ### Technical Details
+ - Uses GPU acceleration via CUDA
+ - Implements classifier-free guidance for better control
+ - Supports audio continuation from prefix
+ - Real-time progress tracking during generation
+
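As a rough illustration of the classifier-free guidance item above, the sketch below blends conditional and unconditional logits using one common formulation, `uncond + scale * (cond - uncond)`. The README does not show which formula the app uses, so treat this as an assumption. `apply_cfg` is a hypothetical helper, not a Zonos or Gradio function.

```python
def apply_cfg(cond_logits, uncond_logits, guidance_scale):
    """Blend logits as u + s * (c - u), a common classifier-free guidance form.

    guidance_scale = 0 gives the unconditional logits, 1 gives the
    conditional logits, and larger values push further toward the
    conditioning.
    """
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(cond_logits, uncond_logits)
    ]


cond = [2.0, 0.5, -1.0]    # logits computed with conditioning (made-up values)
uncond = [1.0, 0.5, 0.0]   # logits computed without conditioning (made-up values)

same_as_cond = apply_cfg(cond, uncond, guidance_scale=1.0)
amplified = apply_cfg(cond, uncond, guidance_scale=2.0)
print(same_as_cond, amplified)
```

Under this formulation, raising the Guidance Scale slider widens the gap between what the conditioning favors and what the model would produce on its own.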
+ ### How to Use
+ 1. Select a model variant
+ 2. Enter your text and choose a language
+ 3. (Optional) Upload speaker audio for voice cloning
+ 4. Adjust voice characteristics and emotions
+ 5. Click "Generate Audio" to create speech
+ 6. Download or play the generated audio
+
+ ---
+
+ ## Korean Explanation
+
+ ### Overview
+ This is a Gradio-based web application for the **Zonos Text-to-Speech (TTS) Generator**. Zonos is an advanced TTS model developed by Zyphra that lets users customize voice characteristics to generate natural-sounding speech.
+
+ ### Key Features
+
+ 1. **Model Selection**
+    - Two model variants: Transformer and Hybrid
+    - Each model offers different conditioning features
+
+ 2. **Text Input and Language Support**
+    - Multilingual support through eSpeak phoneme conversion
+    - Text length limit: 500 characters
+    - Selectable from the supported language codes
+
+ 3. **Voice Customization**
+    - **Speaker Cloning**: Upload audio to clone a specific voice
+    - **Voice Quality Settings**:
+      - DNS-MOS (Voice Quality): 1.0-5.0 scale
+      - Maximum Frequency: Control the highest frequency in Hz
+      - Voice Clarity: Adjust how intelligible the voice is
+      - Pitch Variation: Control the amount of pitch variation
+      - Speaking Rate: Adjust speech speed
+
+ 4. **Emotion Control**
+    - 8 emotion sliders: Happiness, Sadness, Disgust, Fear, Surprise, Anger, Other, Neutral
+    - Finely adjust the emotional expression of the generated speech
+
+ 5. **Advanced Generation Parameters**
+    - **Guidance Scale**: Controls how faithfully the model follows the conditions
+    - **Min P**: Controls randomness/creativity in generation
+    - **Seed**: Setting for reproducible results
+    - **Prefix Audio**: Continue generation from existing audio
+
+ 6. **Unconditional Generation**
+    - Toggle specific conditions so the model generates them automatically
+    - Useful for more creative and varied outputs
+
+ ### Technical Details
+ - Uses GPU acceleration via CUDA
+ - Implements classifier-free guidance for better control
+ - Supports continued audio generation from a prefix
+ - Real-time progress tracking during generation
+
+ ### How to Use
+ 1. Select a model variant
+ 2. Enter text and select a language
+ 3. (Optional) Upload speaker audio for voice cloning
+ 4. Adjust voice characteristics and emotions
+ 5. Click the "Generate Audio" button to generate speech
+ 6. Download or play the generated audio
+
+ ### Special Features
+ - **Emotion Settings**: Finely control the emotional tone of the generated speech
+ - **Voice Quality**: Adjust voice quality through the DNS-MOS score
+ - **Speaker Noise Removal**: Option to remove noise from the uploaded speaker audio
+ - **Unconditional Keys**: Set specific features to be generated automatically
+
+ This application is a powerful and flexible tool for high-quality TTS generation and can be used to produce voice content for a variety of purposes.