feat: update report link
- index.html +73 -38
- style.css +1 -0
index.html
CHANGED
@@ -28,7 +28,7 @@
|
|
28 |
Encoder</h4>
|
29 |
<p class="author">
|
30 |
MiniMax Team <span class="date">May 2025</span><br />
|
31 |
-
<a style="font-size: 1.1rem;"
|
32 |
href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
|
33 |
Report]</a>
|
34 |
</p>
|
@@ -58,7 +58,9 @@
|
|
58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
59 |
voice
|
60 |
cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
|
61 |
-
<a
|
62 |
</p>
|
63 |
</div>
|
64 |
|
@@ -73,7 +75,6 @@
|
|
73 |
<ol>
|
74 |
<li><a href="#showcase-with-high-versatility">Showcase with High Versatility</a></li>
|
75 |
<li><a href="#showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts</a></li>
|
76 |
-
<li><a href="#examples-with-more-possibilities">Examples with More Possibilities</a></li>
|
77 |
</ol>
|
78 |
</li>
|
79 |
<li><a href="#zero-shot-vs-one-shot-demonstrations">Zero-Shot vs. One-Shot Demonstrations</a></li>
|
@@ -158,41 +159,45 @@
|
|
158 |
<audio class="audio-md" src="assets/audios/Warm%20and%20Magnetic.mp3" controls></audio>
|
159 |
</td>
|
160 |
</tr>
|
161 |
-
</tbody>
|
162 |
-
</table>
|
163 |
-
</div>
|
164 |
-
|
165 |
-
<h3 id="showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts, Post-Processing
|
166 |
-
Audio Effects and Added Sound Effects</h3>
|
167 |
-
<div class="scroll-wrapper">
|
168 |
-
<table style="width: 100%;">
|
169 |
-
<tbody>
|
170 |
<tr class="border-bottom-thin">
|
171 |
-
<
|
172 |
-
|
173 |
</tr>
|
174 |
<tr class="border-bottom-thin">
|
175 |
<td>
|
176 |
-
A
|
177 |
</td>
|
178 |
<td>
|
179 |
-
<audio class="audio-
|
180 |
</td>
|
181 |
</tr>
|
182 |
<tr class="border-bottom-thin">
|
183 |
<td>
|
184 |
-
|
185 |
</td>
|
186 |
<td>
|
187 |
-
<audio class="audio-
|
188 |
</td>
|
189 |
</tr>
|
190 |
</tbody>
|
191 |
</table>
|
192 |
</div>
|
193 |
|
194 |
-
<h3 id="
|
195 |
-
|
196 |
<div class="scroll-wrapper">
|
197 |
<table style="width: 100%;">
|
198 |
<tbody>
|
@@ -202,26 +207,18 @@
|
|
202 |
</tr>
|
203 |
<tr class="border-bottom-thin">
|
204 |
<td>
|
205 |
-
|
206 |
-
</td>
|
207 |
-
<td>
|
208 |
-
<audio class="audio-lg" src="assets/audios/Breathy%20ASMR.MP3" controls></audio>
|
209 |
-
</td>
|
210 |
-
</tr>
|
211 |
-
<tr class="border-bottom-thin">
|
212 |
-
<td>
|
213 |
-
A Robotic Voice with Rich Bass Resonance and Spatial Presence
|
214 |
</td>
|
215 |
<td>
|
216 |
-
<audio class="audio-lg" src="assets/audios/
|
217 |
</td>
|
218 |
</tr>
|
219 |
<tr class="border-bottom-thin">
|
220 |
<td>
|
221 |
-
|
222 |
</td>
|
223 |
<td>
|
224 |
-
<audio class="audio-lg" src="assets/audios/
|
225 |
</td>
|
226 |
</tr>
|
227 |
</tbody>
|
@@ -885,8 +882,8 @@
|
|
885 |
</tr>
|
886 |
<tr class="border-bottom-thin">
|
887 |
<td>
|
888 |
-
|
889 |
-
|
890 |
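conveying professionalism and approachability in an in-depth interview setting, with clear audio quality and crisp, forceful articulation.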
|
891 |
</td>
|
892 |
<td>
|
@@ -901,9 +898,9 @@
|
|
901 |
</tr>
|
902 |
<tr class="border-bottom-thin">
|
903 |
<td>
|
904 |
-
|
905 |
-
|
906 |
-
|
907 |
</td>
|
908 |
<td>
|
909 |
Dear darlings, the miracle face cream you have been waiting so long for is finally in stock!<br>
|
@@ -929,7 +926,7 @@
|
|
929 |
<audio class="audio-md" src="assets/audios/体育解说男青年.wav" controls></audio>
|
930 |
</td>
|
931 |
</tr>
|
932 |
-
<tr>
|
933 |
<td>
|
934 |
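The voice of a young Chinese woman, with a crisp timbre, a fairly fast speaking rate, and a lively intonation,<br>
|
935 |
as if livestreaming a game, with a cheerful feel in the voice and a generally high pitch,<br>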
|
@@ -944,6 +941,44 @@
|
|
944 |
<audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
|
945 |
</td>
|
946 |
</tr>
|
947 |
</tbody>
|
948 |
</table>
|
949 |
</div>
|
|
|
28 |
Encoder</h4>
|
29 |
<p class="author">
|
30 |
MiniMax Team <span class="date">May 2025</span><br />
|
31 |
+
<a style="font-size: 1.1rem;" target="_blank"
|
32 |
href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
|
33 |
Report]</a>
|
34 |
</p>
|
|
|
58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
59 |
voice
|
60 |
cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
|
61 |
+
<a
|
62 |
+
href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report">https://minimax-ai.github.io/tts_tech_report</a>
|
63 |
+
for more examples.
|
64 |
</p>
|
65 |
</div>
|
66 |
|
|
|
75 |
<ol>
|
76 |
<li><a href="#showcase-with-high-versatility">Showcase with High Versatility</a></li>
|
77 |
<li><a href="#showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts</a></li>
|
|
|
78 |
</ol>
|
79 |
</li>
|
80 |
<li><a href="#zero-shot-vs-one-shot-demonstrations">Zero-Shot vs. One-Shot Demonstrations</a></li>
|
|
|
159 |
<audio class="audio-md" src="assets/audios/Warm%20and%20Magnetic.mp3" controls></audio>
|
160 |
</td>
|
161 |
</tr>
|
162 |
<tr class="border-bottom-thin">
|
163 |
+
<td>
|
164 |
+
An ASMR Whispering Voice with Generated Breathing and Sound Effects
|
165 |
+
</td>
|
166 |
+
<td>
|
167 |
+
<audio class="audio-md" src="assets/audios/Breathy%20ASMR_Sourse.wav" controls></audio>
|
168 |
+
</td>
|
169 |
+
<td>
|
170 |
+
<audio class="audio-md" src="assets/audios/Breathy%20ASMR.MP3" controls></audio>
|
171 |
+
</td>
|
172 |
</tr>
|
173 |
<tr class="border-bottom-thin">
|
174 |
<td>
|
175 |
+
A Robotic Voice with Rich Bass Resonance and Spatial Presence
|
176 |
</td>
|
177 |
<td>
|
178 |
+
<audio class="audio-md" src="assets/audios/Lucky%20Robot_Sourse.wav" controls></audio>
|
179 |
+
</td>
|
180 |
+
<td>
|
181 |
+
<audio class="audio-md" src="assets/audios/Lucky%20Robot.mp3" controls></audio>
|
182 |
</td>
|
183 |
</tr>
|
184 |
<tr class="border-bottom-thin">
|
185 |
<td>
|
186 |
+
A Sardonic Mature Female Voice
|
187 |
</td>
|
188 |
<td>
|
189 |
+
<audio class="audio-md" src="assets/audios/Onee-san_Sourse.wav" controls></audio>
|
190 |
+
</td>
|
191 |
+
<td>
|
192 |
+
<audio class="audio-md" src="assets/audios/Onee-san.wav" controls></audio>
|
193 |
</td>
|
194 |
</tr>
|
195 |
</tbody>
|
196 |
</table>
|
197 |
</div>
|
198 |
|
199 |
+
<h3 id="showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts, Post-Processing
|
200 |
+
Audio Effects and Added Sound Effects</h3>
|
201 |
<div class="scroll-wrapper">
|
202 |
<table style="width: 100%;">
|
203 |
<tbody>
|
|
|
207 |
</tr>
|
208 |
<tr class="border-bottom-thin">
|
209 |
<td>
|
210 |
+
A Husky Male Voice: From Soft Murmur to Excitement to Anger, then to Whispers
|
211 |
</td>
|
212 |
<td>
|
213 |
+
<audio class="audio-lg" src="assets/audios/Murmur-Excitement-Anger-%20Whispers.MP3" controls></audio>
|
214 |
</td>
|
215 |
</tr>
|
216 |
<tr class="border-bottom-thin">
|
217 |
<td>
|
218 |
+
An Angry Female Voice: From Soft Murmur to Rage to Reminiscence, then to Weeping
|
219 |
</td>
|
220 |
<td>
|
221 |
+
<audio class="audio-lg" src="assets/audios/Neutral-Rage-Reminiscence-Weeping.MP3" controls></audio>
|
222 |
</td>
|
223 |
</tr>
|
224 |
</tbody>
|
|
|
882 |
</tr>
|
883 |
<tr class="border-bottom-thin">
|
884 |
<td>
|
885 |
+
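A middle-aged male voice speaking Chinese, with a full, mellow timbre and natural magnetism,<br>
|
886 |
+
a somewhat slow pace, moderate volume, and a rather low pitch. The voice as a whole feels steady and dependable,<br>
|
887 |
conveying professionalism and approachability in an in-depth interview setting, with clear audio quality and crisp, forceful articulation.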
|
888 |
</td>
|
889 |
<td>
|
|
|
898 |
</tr>
|
899 |
<tr class="border-bottom-thin">
|
900 |
<td>
|
901 |
+
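A young woman speaking Chinese, with a rather sweet timbre and a fairly fast pace,<br>
|
902 |
+
speaking with a light, brisk feel and a generally high pitch, as if selling products on a livestream,<br>
|
903 |
+
The overall mood is lively, and the voice is clear and sounds very approachable.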
|
904 |
</td>
|
905 |
<td>
|
906 |
Dear darlings, the miracle face cream you have been waiting so long for is finally in stock!<br>
|
|
|
926 |
<audio class="audio-md" src="assets/audios/体育解说男青年.wav" controls></audio>
|
927 |
</td>
|
928 |
</tr>
|
929 |
+
<tr class="border-bottom-thin">
|
930 |
<td>
|
931 |
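The voice of a young Chinese woman, with a crisp timbre, a fairly fast speaking rate, and a lively intonation,<br>
|
932 |
as if livestreaming a game, with a cheerful feel in the voice and a generally high pitch,<br>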
|
|
|
941 |
<audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
|
942 |
</td>
|
943 |
</tr>
|
944 |
+
<tr class="border-bottom-thin">
|
945 |
+
<td>
|
946 |
+
English-speaking female voice, sounding relatively young,<br>
|
947 |
+
with a sweet and pleasant tone. Speaking at a moderate pace<br>
|
948 |
+
with a touch of energy, similar to someone narrating a<br>
|
949 |
+
beauty/makeup tutorial video. The overall atmosphere is<br>
|
950 |
+
relaxed and cheerful.
|
951 |
+
</td>
|
952 |
+
<td>
|
953 |
+
Hi everyone! Today I'll be sharing a soft, romantic<br>
|
954 |
+
makeup look that's perfect for dates. Many of you have <br>
|
955 |
+
been asking how to apply this eyeshadow naturally - the<br>
|
956 |
+
key is using gentle techniques. Let's go through the<br>
|
957 |
+
steps together...
|
958 |
+
</td>
|
959 |
+
<td>
|
960 |
+
<audio class="audio-md" src="assets/audios/美妆女博主.wav" controls></audio>
|
961 |
+
</td>
|
962 |
+
</tr>
|
963 |
+
<tr>
|
964 |
+
<td>
|
965 |
+
English-speaking middle-aged male voice, slightly husky, <br>
|
966 |
+
speaking at a moderate-to-slow pace with a deep tone. Like<br>
|
967 |
+
someone telling an old story, conveying a nostalgic feeling,<br>
|
968 |
+
with a relaxed and composed manner of speaking.
|
969 |
+
</td>
|
970 |
+
<td>
|
971 |
+
That was back in the late 1970s. I remember when our <br>
|
972 |
+
village first got electricity - everyone was so excited. <br>
|
973 |
+
In the evenings, people would bring their stools and <br>
|
974 |
+
gather under the big banyan tree by the village committee <br>
|
975 |
+
office to watch movies projected on the wall. Even now, <br>
|
976 |
+
thinking back to those moments still fills me with warmth.
|
977 |
+
</td>
|
978 |
+
<td>
|
979 |
+
<audio class="audio-md" src="assets/audios/回忆男中年.wav" controls></audio>
|
980 |
+
</td>
|
981 |
+
</tr>
|
982 |
</tbody>
|
983 |
</table>
|
984 |
</div>
|
style.css
CHANGED
@@ -837,5 +837,6 @@ h3,
|
|
837 |
h4,
|
838 |
h5,
|
839 |
h6 {
|
|
|
840 |
margin-bottom: 1rem;
|
841 |
}
|
|
|
837 |
h4,
|
838 |
h5,
|
839 |
h6 {
|
840 |
+
text-align: left;
|
841 |
margin-bottom: 1rem;
|
842 |
}
|
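For reference, the updated report link from the first index.html hunk can be read as a single element once the diff gutter is stripped. The snippet below is only a reassembly of the added and context lines shown above. The `rel="noopener"` attribute is an assumption included here as a common companion to `target="_blank"`; it is not part of this commit.

```html
<p class="author">
  MiniMax Team <span class="date">May 2025</span><br />
  <!-- target="_blank" is the attribute this commit adds.
       rel="noopener" is an assumed extra for illustration, not in the diff. -->
  <a style="font-size: 1.1rem;" target="_blank" rel="noopener"
    href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
    Report]</a>
</p>
```

With `target="_blank"` the PDF opens in a new browsing context, so the demo page and any playing audio stay open in the original tab.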