sriting committed on
Commit ee03a71 · 1 parent: 31600f6

feat: update tech report

Files changed (1)
  1. index.html +15 -27
index.html CHANGED
@@ -57,10 +57,9 @@
57
  control
58
  via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
59
  voice
60
- cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
61
- <a
62
- href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report">https://minimax-ai.github.io/tts_tech_report</a>
63
- for more examples.
64
  </p>
65
  </div>
66
 
@@ -233,23 +232,21 @@
233
  features based
234
  on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
235
  rate,
236
- emotions, etc.) demonstrated in the audio prompt.
 
 
237
  </p>
238
  <div class="scroll-wrapper" style="margin-top: 2rem;">
239
  <table style="width: 100%;">
240
  <tbody>
241
  <tr class="border-bottom-thin">
242
  <th scope="col">Source Audio</th>
243
- <th scope="col">Prompt</th>
244
  <th scope="col">Text</th>
245
  <th scope="col">Zero-Shot Version</th>
246
  <th scope="col">One-Shot Version</th>
247
  <th scope="col">Elevenlabs Multilingual_v2</th>
248
  </tr>
249
  <tr class="border-bottom-thin">
250
- <th>
251
- <audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Source.WAV" controls></audio>
252
- </th>
253
  <td>
254
  <audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Prompt.WAV" controls></audio>
255
  </td>
@@ -280,9 +277,6 @@
280
  </td>
281
  </tr>
282
  <tr class="border-bottom-thin">
283
- <th>
284
- <audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Source.WAV" controls></audio>
285
- </th>
286
  <td>
287
  <audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Prompt.WAV" controls></audio>
288
  </td>
@@ -317,9 +311,6 @@
317
  </td>
318
  </tr>
319
  <tr class="border-bottom-thin">
320
- <th>
321
- <audio class="audio-sm" src="assets/audios/Quirky%20Female%20English.MP3" controls></audio>
322
- </th>
323
  <td>
324
  <audio class="audio-sm" src="assets/audios/Quirky%20Female%20English_Prompt.MP3" controls></audio>
325
  </td>
@@ -346,9 +337,6 @@
346
  </td>
347
  </tr>
348
  <tr>
349
- <th>
350
- <audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English.MP3" controls></audio>
351
- </th>
352
  <td>
353
  <audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English_Prompt.MP3" controls></audio>
354
  </td>
@@ -398,7 +386,7 @@
398
  <th scope="col">Languages</th>
399
  <th scope="col">Source Audio</th>
400
  <th scope="col">Text</th>
401
- <th scope="col">Minimax<br>Speech_02_HD</th>
402
  <th scope="col">ElevenLabs<br>Multilingual_v2</th>
403
  <th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
404
  </tr>
@@ -519,19 +507,19 @@
519
  <tbody>
520
  <tr class="border-bottom-thin">
521
  <th scope="col">Original Language</th>
522
- <th scope="col">Mixed Language</th>
523
  <th scope="col">Source Audio</th>
 
524
  <th scope="col">Text</th>
525
- <th scope="col">Minimax<br>Speech_02_HD</th>
526
  <th scope="col">ElevenLabs<br>Multilingual_v2</th>
527
  <th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
528
  </tr>
529
  <tr class="border-bottom-thin">
530
  <td>English</td>
531
- <td>English + Mandarin</td>
532
  <td>
533
  <audio class="audio-sm" src="assets/audios/Wong_Sourse.mp3" controls></audio>
534
  </td>
 
535
  <td>
536
  Kiddo! Come come come, 学如逆水行舟,不进则退。<br>
537
  I see you're using AI tools already - so smart!<br>
@@ -551,10 +539,10 @@
551
  </tr>
552
  <tr class="border-bottom-thin">
553
  <td>Mandarin</td>
554
- <td>Mandarin + Cantonese</td>
555
  <td>
556
  <audio class="audio-sm" src="assets/audios/ShiBanYu_Sourse.mp3" controls></audio>
557
  </td>
 
558
  <td>
559
  老铁啊,多谢晒你送我呢本,广州话正音字典,咁好嘢喎!<br>
560
  我呢个大老爷们儿学广州话真系好难㗎!成日都分唔清声调啊。<br>
@@ -572,10 +560,10 @@
572
  </tr>
573
  <tr class="border-bottom-thin">
574
  <td>Mandarin</td>
575
- <td>Mandarin + English</td>
576
  <td>
577
  <audio class="audio-sm" src="assets/audios/ShuanQ_Sourse.mp3" controls></audio>
578
  </td>
 
579
  <td>
580
  The people said, 桂林's scenery is the first under heaven.<br>
581
  Yet in my opinion, 阳朔 scenery is better than 桂林。<br>
@@ -593,10 +581,10 @@
593
  </tr>
594
  <tr class="border-bottom-thin">
595
  <td>English</td>
596
- <td>English + Spanish</td>
597
  <td>
598
  <audio class="audio-sm" src="assets/audios/CoCo_Sourse.mp3" controls></audio>
599
  </td>
 
600
  <td>
601
  Mi abuelita always told me "el que persevera, alcanza".<br>
602
  If you persevere, you'll achieve your dreams!<br>
@@ -614,10 +602,10 @@
614
  </tr>
615
  <tr class="border-bottom-thin">
616
  <td>Japanese</td>
617
- <td>Japanese + Korean</td>
618
  <td>
619
  <audio class="audio-sm" src="assets/audios/Powerful_Girl_Sourse.mp3" controls></audio>
620
  </td>
 
621
  <td>
622
  最近の天気予報によりますと、今週末は桜の開花に最適<br>
623
  な気温になる予定です。<br>
@@ -996,7 +984,7 @@
996
  <tbody>
997
  <tr class="border-bottom-thin">
998
  <th scope="col">Text</th>
999
- <th scope="col" style="text-align: center;">Mnimax<br>Speech_02_HD</th>
1000
  <th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
1001
  <th scope="col" style="text-align: center;">AWS<br>Polly</th>
1002
  </tr>
 
57
  control
58
  via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
59
  voice
60
+ cloning (PVC) by fine-tuning timbre features with additional data. Welcome to visit
61
+ <a href="https://www.minimax.io/audio">MiniMax Audio</a> and
62
+ explore our powerful TTS features.
 
63
  </p>
64
  </div>
65
 
 
232
  features based
233
  on the text content, whereas OneShot adheres more strictly to the speaker characteristics (prosody, speech
234
  rate,
235
+ emotions, etc.) demonstrated in the audio prompt (The additional input that OneShot has compared to ZeroShot,
236
+ see
237
+ technical report for details).
238
  </p>
239
  <div class="scroll-wrapper" style="margin-top: 2rem;">
240
  <table style="width: 100%;">
241
  <tbody>
242
  <tr class="border-bottom-thin">
243
  <th scope="col">Source Audio</th>
 
244
  <th scope="col">Text</th>
245
  <th scope="col">Zero-Shot Version</th>
246
  <th scope="col">One-Shot Version</th>
247
  <th scope="col">Elevenlabs Multilingual_v2</th>
248
  </tr>
249
  <tr class="border-bottom-thin">
 
 
 
250
  <td>
251
  <audio class="audio-sm" src="assets/audios/Lyrical%20Cantonese_Prompt.WAV" controls></audio>
252
  </td>
 
277
  </td>
278
  </tr>
279
  <tr class="border-bottom-thin">
 
 
 
280
  <td>
281
  <audio class="audio-sm" src="assets/audios/Breaking%20Down%20Mandarin_Prompt.WAV" controls></audio>
282
  </td>
 
311
  </td>
312
  </tr>
313
  <tr class="border-bottom-thin">
 
 
 
314
  <td>
315
  <audio class="audio-sm" src="assets/audios/Quirky%20Female%20English_Prompt.MP3" controls></audio>
316
  </td>
 
337
  </td>
338
  </tr>
339
  <tr>
 
 
 
340
  <td>
341
  <audio class="audio-sm" src="assets/audios/Neurotic%20Teenage%20English_Prompt.MP3" controls></audio>
342
  </td>
 
386
  <th scope="col">Languages</th>
387
  <th scope="col">Source Audio</th>
388
  <th scope="col">Text</th>
389
+ <th scope="col">MiniMax<br>Speech_02_HD</th>
390
  <th scope="col">ElevenLabs<br>Multilingual_v2</th>
391
  <th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
392
  </tr>
 
507
  <tbody>
508
  <tr class="border-bottom-thin">
509
  <th scope="col">Original Language</th>
 
510
  <th scope="col">Source Audio</th>
511
+ <th scope="col">Mixed Language</th>
512
  <th scope="col">Text</th>
513
+ <th scope="col">MiniMax<br>Speech_02_HD</th>
514
  <th scope="col">ElevenLabs<br>Multilingual_v2</th>
515
  <th scope="col">OpenAI<br>TTS_1_HD<br>(*not cloned voice)</th>
516
  </tr>
517
  <tr class="border-bottom-thin">
518
  <td>English</td>
 
519
  <td>
520
  <audio class="audio-sm" src="assets/audios/Wong_Sourse.mp3" controls></audio>
521
  </td>
522
+ <td>English + Mandarin</td>
523
  <td>
524
  Kiddo! Come come come, 学如逆水行舟,不进则退。<br>
525
  I see you're using AI tools already - so smart!<br>
 
539
  </tr>
540
  <tr class="border-bottom-thin">
541
  <td>Mandarin</td>
 
542
  <td>
543
  <audio class="audio-sm" src="assets/audios/ShiBanYu_Sourse.mp3" controls></audio>
544
  </td>
545
+ <td>Mandarin + Cantonese</td>
546
  <td>
547
  老铁啊,多谢晒你送我呢本,广州话正音字典,咁好嘢喎!<br>
548
  我呢个大老爷们儿学广州话真系好难㗎!成日都分唔清声调啊。<br>
 
560
  </tr>
561
  <tr class="border-bottom-thin">
562
  <td>Mandarin</td>
 
563
  <td>
564
  <audio class="audio-sm" src="assets/audios/ShuanQ_Sourse.mp3" controls></audio>
565
  </td>
566
+ <td>Mandarin + English</td>
567
  <td>
568
  The people said, 桂林's scenery is the first under heaven.<br>
569
  Yet in my opinion, 阳朔 scenery is better than 桂林。<br>
 
581
  </tr>
582
  <tr class="border-bottom-thin">
583
  <td>English</td>
 
584
  <td>
585
  <audio class="audio-sm" src="assets/audios/CoCo_Sourse.mp3" controls></audio>
586
  </td>
587
+ <td>English + Spanish</td>
588
  <td>
589
  Mi abuelita always told me "el que persevera, alcanza".<br>
590
  If you persevere, you'll achieve your dreams!<br>
 
602
  </tr>
603
  <tr class="border-bottom-thin">
604
  <td>Japanese</td>
 
605
  <td>
606
  <audio class="audio-sm" src="assets/audios/Powerful_Girl_Sourse.mp3" controls></audio>
607
  </td>
608
+ <td>Japanese + Korean</td>
609
  <td>
610
  最近の天気予報によりますと、今週末は桜の開花に最適<br>
611
  な気温になる予定です。<br>
 
984
  <tbody>
985
  <tr class="border-bottom-thin">
986
  <th scope="col">Text</th>
987
+ <th scope="col" style="text-align: center;">MiniMax<br>Speech_02_HD</th>
988
  <th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
989
  <th scope="col" style="text-align: center;">AWS<br>Polly</th>
990
  </tr>
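As a reading aid, the hunk headers of this diff can be cross-checked against the "+15 -27" summary. In a unified diff header `@@ -old_start,old_len +new_start,new_len @@`, each hunk's new start should equal its old start shifted by the net line change (new_len minus old_len) of every earlier hunk, and the sum of those net changes should equal added minus removed lines. The sketch below is a hypothetical helper, not part of the repository; it assumes every header carries explicit lengths, which holds for the headers copied here.

```python
import re

# Hunk headers copied verbatim from the diff above.
HUNKS = [
    "@@ -57,10 +57,9 @@",
    "@@ -233,23 +232,21 @@",
    "@@ -280,9 +277,6 @@",
    "@@ -317,9 +311,6 @@",
    "@@ -346,9 +337,6 @@",
    "@@ -398,7 +386,7 @@",
    "@@ -519,19 +507,19 @@",
    "@@ -551,10 +539,10 @@",
    "@@ -572,10 +560,10 @@",
    "@@ -593,10 +581,10 @@",
    "@@ -614,10 +602,10 @@",
    "@@ -996,7 +984,7 @@",
]

# Assumes explicit ",len" on both sides (diff omits it when a length is 1).
HEADER_RE = re.compile(r"^@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def check_hunks(headers):
    """Return the total net line delta of the hunks.

    Raises AssertionError if a hunk's new start is not its old start
    shifted by the cumulative net delta of the preceding hunks.
    """
    offset = 0
    for header in headers:
        m = HEADER_RE.match(header)
        if m is None:
            raise ValueError(f"unrecognized hunk header: {header!r}")
        old_start, old_len, new_start, new_len = map(int, m.groups())
        assert new_start == old_start + offset, header
        offset += new_len - old_len
    return offset


net = check_hunks(HUNKS)
print(net)  # -12, matching +15 added and -27 removed lines
```

Tracking only the running offset keeps the check linear in the number of hunks; it cannot recover the separate added and removed counts, only their difference.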