feat: update tech report
index.html +15 -27
index.html
CHANGED
@@ -57,10 +57,9 @@
|
|
57 |
control
|
58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
59 |
voice
|
60 |
-
cloning (PVC) by fine-tuning timbre features with additional data.
|
61 |
-
<a
|
62 |
-
|
63 |
-
for more examples.
|
64 |
</p>
|
65 |
</div>
|
66 |
|
@@ -233,23 +232,21 @@
|
|
233 |
features based
|
234 |
on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
|
235 |
rate,
|
236 |
-
emotions, etc.) demonstrated in the audio prompt
|
|
|
|
|
237 |
</p>
|
238 |
<div class="scroll-wrapper" style="margin-top: 2rem;">
|
239 |
<table style="width: 100%;">
|
240 |
<tbody>
|
241 |
<tr class="border-bottom-thin">
|
242 |
<th scope="col">Source Audio</th>
|
243 |
-
<th scope="col">Prompt</th>
|
244 |
<th scope="col">Text</th>
|
245 |
<th scope="col">Zero-Shot Version</th>
|
246 |
<th scope="col">One-Shot Version</th>
|
247 |
<th scope="col">Elevenlabs Multilingual_v2</th>
|
248 |
</tr>
|
249 |
<tr class="border-bottom-thin">
|
250 |
-
<th>
|
251 |
-
<audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Source.WAV" controls></audio>
|
252 |
-
</th>
|
253 |
<td>
|
254 |
<audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Prompt.WAV" controls></audio>
|
255 |
</td>
|
@@ -280,9 +277,6 @@
|
|
280 |
</td>
|
281 |
</tr>
|
282 |
<tr class="border-bottom-thin">
|
283 |
-
<th>
|
284 |
-
<audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Source.WAV" controls></audio>
|
285 |
-
</th>
|
286 |
<td>
|
287 |
<audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Prompt.WAV" controls></audio>
|
288 |
</td>
|
@@ -317,9 +311,6 @@
|
|
317 |
</td>
|
318 |
</tr>
|
319 |
<tr class="border-bottom-thin">
|
320 |
-
<th>
|
321 |
-
<audio class="audio-sm" src="assets/audios/Quirky%20Female%20English.MP3" controls></audio>
|
322 |
-
</th>
|
323 |
<td>
|
324 |
<audio class="audio-sm" src="assets/audios/Quirky%20Female%20English_Prompt.MP3" controls></audio>
|
325 |
</td>
|
@@ -346,9 +337,6 @@
|
|
346 |
</td>
|
347 |
</tr>
|
348 |
<tr>
|
349 |
-
<th>
|
350 |
-
<audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English.MP3" controls></audio>
|
351 |
-
</th>
|
352 |
<td>
|
353 |
<audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English_Prompt.MP3" controls></audio>
|
354 |
</td>
|
@@ -398,7 +386,7 @@
|
|
398 |
<th scope="col">Languages</th>
|
399 |
<th scope="col">Source Audio</th>
|
400 |
<th scope="col">Text</th>
|
401 |
-
<th scope="col">
|
402 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
403 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
404 |
</tr>
|
@@ -519,19 +507,19 @@
|
|
519 |
<tbody>
|
520 |
<tr class="border-bottom-thin">
|
521 |
<th scope="col">Original Language</th>
|
522 |
-
<th scope="col">Mixed Language</th>
|
523 |
<th scope="col">Source Audio</th>
|
|
|
524 |
<th scope="col">Text</th>
|
525 |
-
<th scope="col">
|
526 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
527 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
528 |
</tr>
|
529 |
<tr class="border-bottom-thin">
|
530 |
<td>English</td>
|
531 |
-
<td>English + Mandarin</td>
|
532 |
<td>
|
533 |
<audio class="audio-sm" src="assets/audios/Wong_Sourse.mp3" controls></audio>
|
534 |
</td>
|
|
|
535 |
<td>
|
536 |
Kiddo! Come come come, 学如逆水行舟,不进则退。<br>
|
537 |
I see you're using AI tools already - so smart!<br>
|
@@ -551,10 +539,10 @@
|
|
551 |
</tr>
|
552 |
<tr class="border-bottom-thin">
|
553 |
<td>Mandarin</td>
|
554 |
-
<td>Mandarin + Cantonese</td>
|
555 |
<td>
|
556 |
<audio class="audio-sm" src="assets/audios/ShiBanYu_Sourse.mp3" controls></audio>
|
557 |
</td>
|
|
|
558 |
<td>
|
559 |
老铁啊,多谢晒你送我呢本,广州话正音字典,咁好嘢喎!<br>
|
560 |
我呢个大老爷们儿学广州话真系好难㗎!成日都分唔清声调啊。<br>
|
@@ -572,10 +560,10 @@
|
|
572 |
</tr>
|
573 |
<tr class="border-bottom-thin">
|
574 |
<td>Mandarin</td>
|
575 |
-
<td>Mandarin + English</td>
|
576 |
<td>
|
577 |
<audio class="audio-sm" src="assets/audios/ShuanQ_Sourse.mp3" controls></audio>
|
578 |
</td>
|
|
|
579 |
<td>
|
580 |
The people said, 桂林's scenery is the first under heaven.<br>
|
581 |
Yet in my opinion, 阳朔 scenery is better than 桂林。<br>
|
@@ -593,10 +581,10 @@
|
|
593 |
</tr>
|
594 |
<tr class="border-bottom-thin">
|
595 |
<td>English</td>
|
596 |
-
<td>English + Spanish</td>
|
597 |
<td>
|
598 |
<audio class="audio-sm" src="assets/audios/CoCo_Sourse.mp3" controls></audio>
|
599 |
</td>
|
|
|
600 |
<td>
|
601 |
Mi abuelita always told me "el que persevera, alcanza".<br>
|
602 |
If you persevere, you'll achieve your dreams!<br>
|
@@ -614,10 +602,10 @@
|
|
614 |
</tr>
|
615 |
<tr class="border-bottom-thin">
|
616 |
<td>Japanese</td>
|
617 |
-
<td>Japanese + Korean</td>
|
618 |
<td>
|
619 |
<audio class="audio-sm" src="assets/audios/Powerful_Girl_Sourse.mp3" controls></audio>
|
620 |
</td>
|
|
|
621 |
<td>
|
622 |
最近の天気予報によりますと、今週末は桜の開花に最適<br>
|
623 |
な気温になる予定です。<br>
|
@@ -996,7 +984,7 @@
|
|
996 |
<tbody>
|
997 |
<tr class="border-bottom-thin">
|
998 |
<th scope="col">Text</th>
|
999 |
-
<th scope="col" style="text-align: center;">
|
1000 |
<th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
|
1001 |
<th scope="col" style="text-align: center;">AWS<br>Polly</th>
|
1002 |
</tr>
|
|
|
57 |
control
|
58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
59 |
voice
|
60 |
+
cloning (PVC) by fine-tuning timbre features with additional data. Visit
|
61 |
+
<a href="https://www.minimax.io/audio">MiniMax Audio</a> and
|
62 |
+
explore our powerful TTS features.
|
|
|
63 |
</p>
|
64 |
</div>
|
65 |
|
|
|
232 |
features based
|
233 |
on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
|
234 |
rate,
|
235 |
+
emotions, etc.) demonstrated in the audio prompt (the additional input that OneShot takes compared with ZeroShot;
|
236 |
+
see
|
237 |
+
technical report for details).
|
238 |
</p>
|
239 |
<div class="scroll-wrapper" style="margin-top: 2rem;">
|
240 |
<table style="width: 100%;">
|
241 |
<tbody>
|
242 |
<tr class="border-bottom-thin">
|
243 |
<th scope="col">Source Audio</th>
|
|
|
244 |
<th scope="col">Text</th>
|
245 |
<th scope="col">Zero-Shot Version</th>
|
246 |
<th scope="col">One-Shot Version</th>
|
247 |
<th scope="col">Elevenlabs Multilingual_v2</th>
|
248 |
</tr>
|
249 |
<tr class="border-bottom-thin">
|
|
|
|
|
|
|
250 |
<td>
|
251 |
<audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Prompt.WAV" controls></audio>
|
252 |
</td>
|
|
|
277 |
</td>
|
278 |
</tr>
|
279 |
<tr class="border-bottom-thin">
|
|
|
|
|
|
|
280 |
<td>
|
281 |
<audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Prompt.WAV" controls></audio>
|
282 |
</td>
|
|
|
311 |
</td>
|
312 |
</tr>
|
313 |
<tr class="border-bottom-thin">
|
|
|
|
|
|
|
314 |
<td>
|
315 |
<audio class="audio-sm" src="assets/audios/Quirky%20Female%20English_Prompt.MP3" controls></audio>
|
316 |
</td>
|
|
|
337 |
</td>
|
338 |
</tr>
|
339 |
<tr>
|
|
|
|
|
|
|
340 |
<td>
|
341 |
<audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English_Prompt.MP3" controls></audio>
|
342 |
</td>
|
|
|
386 |
<th scope="col">Languages</th>
|
387 |
<th scope="col">Source Audio</th>
|
388 |
<th scope="col">Text</th>
|
389 |
+
<th scope="col">MiniMax<br>Speech_02_HD</th>
|
390 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
391 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
392 |
</tr>
|
|
|
507 |
<tbody>
|
508 |
<tr class="border-bottom-thin">
|
509 |
<th scope="col">Original Language</th>
|
|
|
510 |
<th scope="col">Source Audio</th>
|
511 |
+
<th scope="col">Mixed Language</th>
|
512 |
<th scope="col">Text</th>
|
513 |
+
<th scope="col">MiniMax<br>Speech_02_HD</th>
|
514 |
<th scope="col">ElevenLabs<br>Multilingual_v2</th>
|
515 |
<th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
|
516 |
</tr>
|
517 |
<tr class="border-bottom-thin">
|
518 |
<td>English</td>
|
|
|
519 |
<td>
|
520 |
<audio class="audio-sm" src="assets/audios/Wong_Sourse.mp3" controls></audio>
|
521 |
</td>
|
522 |
+
<td>English + Mandarin</td>
|
523 |
<td>
|
524 |
Kiddo! Come come come, 学如逆水行舟,不进则退。<br>
|
525 |
I see you're using AI tools already - so smart!<br>
|
|
|
539 |
</tr>
|
540 |
<tr class="border-bottom-thin">
|
541 |
<td>Mandarin</td>
|
|
|
542 |
<td>
|
543 |
<audio class="audio-sm" src="assets/audios/ShiBanYu_Sourse.mp3" controls></audio>
|
544 |
</td>
|
545 |
+
<td>Mandarin + Cantonese</td>
|
546 |
<td>
|
547 |
老铁啊,多谢晒你送我呢本,广州话正音字典,咁好嘢喎!<br>
|
548 |
我呢个大老爷们儿学广州话真系好难㗎!成日都分唔清声调啊。<br>
|
|
|
560 |
</tr>
|
561 |
<tr class="border-bottom-thin">
|
562 |
<td>Mandarin</td>
|
|
|
563 |
<td>
|
564 |
<audio class="audio-sm" src="assets/audios/ShuanQ_Sourse.mp3" controls></audio>
|
565 |
</td>
|
566 |
+
<td>Mandarin + English</td>
|
567 |
<td>
|
568 |
The people said, 桂林's scenery is the first under heaven.<br>
|
569 |
Yet in my opinion, 阳朔 scenery is better than 桂林。<br>
|
|
|
581 |
</tr>
|
582 |
<tr class="border-bottom-thin">
|
583 |
<td>English</td>
|
|
|
584 |
<td>
|
585 |
<audio class="audio-sm" src="assets/audios/CoCo_Sourse.mp3" controls></audio>
|
586 |
</td>
|
587 |
+
<td>English + Spanish</td>
|
588 |
<td>
|
589 |
Mi abuelita always told me "el que persevera, alcanza".<br>
|
590 |
If you persevere, you'll achieve your dreams!<br>
|
|
|
602 |
</tr>
|
603 |
<tr class="border-bottom-thin">
|
604 |
<td>Japanese</td>
|
|
|
605 |
<td>
|
606 |
<audio class="audio-sm" src="assets/audios/Powerful_Girl_Sourse.mp3" controls></audio>
|
607 |
</td>
|
608 |
+
<td>Japanese + Korean</td>
|
609 |
<td>
|
610 |
最近の天気予報によりますと、今週末は桜の開花に最適<br>
|
611 |
な気温になる予定です。<br>
|
|
|
984 |
<tbody>
|
985 |
<tr class="border-bottom-thin">
|
986 |
<th scope="col">Text</th>
|
987 |
+
<th scope="col" style="text-align: center;">MiniMax<br>Speech_02_HD</th>
|
988 |
<th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
|
989 |
<th scope="col" style="text-align: center;">AWS<br>Polly</th>
|
990 |
</tr>
|