# VOICEVOX ENGINE
[](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-package.yml)
[](https://github.com/VOICEVOX/voicevox_engine/releases)
[](https://discord.gg/WMwWetrzuh)
[](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/test.yml)
[](https://coveralls.io/github/VOICEVOX/voicevox_engine)
[](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-container.yml)
[](https://hub.docker.com/r/voicevox/voicevox_engine)
This is the engine for [VOICEVOX](https://voicevox.hiroshiba.jp/).
It is actually an HTTP server, so you can perform text-to-speech synthesis by sending it requests.
(The editor is [VOICEVOX](https://github.com/VOICEVOX/voicevox/),
the core is [VOICEVOX CORE](https://github.com/VOICEVOX/voicevox_core/),
and the overall architecture is described in detail [here](https://github.com/VOICEVOX/voicevox/blob/main/docs/%E5%85%A8%E4%BD%93%E6%A7%8B%E6%88%90.md).)
## Table of contents
Guides for each purpose:
- [User guide](#user-guide): for those who want to synthesize speech
- [Contributor guide](#contributor-guide): for those who want to contribute
- [Developer guide](#developer-guide): for those who want to use the code
## User guide
### Download
Download the engine for your environment from [here](https://github.com/VOICEVOX/voicevox_engine/releases/latest).
### API documentation
See the [API documentation](https://voicevox.github.io/voicevox_engine/api/).
With the VOICEVOX engine or editor running, you can also view the documentation of the running engine at http://127.0.0.1:50021/docs.
For future plans and related topics, [Integrating with the VOICEVOX speech synthesis engine](./docs/VOICEVOX音声合成エンジンとの連携.md) may also be helpful.
### Docker image
#### CPU
```bash
docker pull voicevox/voicevox_engine:cpu-ubuntu20.04-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-ubuntu20.04-latest
```
#### GPU
```bash
docker pull voicevox/voicevox_engine:nvidia-ubuntu20.04-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-ubuntu20.04-latest
```
##### Troubleshooting
When using the GPU version, errors may occur depending on your environment. In that case, adding `--runtime=nvidia` to `docker run` may resolve them.
### Sample code for speech synthesis via HTTP requests
```bash
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=1" \
    --get --data-urlencode text@text.txt \
    > query.json
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=1" \
    > audio.wav
```
The generated audio uses a somewhat unusual sampling rate of 24000 Hz, so some audio players may be unable to play it.
The value passed as `speaker` is the `style_id` returned by the `/speakers` endpoint. The parameter is named `speaker` for compatibility.
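
The same two-step flow can be sketched in Python. The block below only builds the two requests with the standard library and does not send them. The helper names `build_audio_query_request` and `build_synthesis_request` and the `BASE_URL` constant are illustrative assumptions, not part of the engine.

```python
import json
import urllib.parse
import urllib.request

# Assumed default address, taken from the curl examples in this README.
BASE_URL = "http://127.0.0.1:50021"


def build_audio_query_request(text: str, speaker: int) -> urllib.request.Request:
    """Build the POST request for /audio_query (the text goes in the query string)."""
    params = urllib.parse.urlencode({"speaker": speaker, "text": text})
    return urllib.request.Request(f"{BASE_URL}/audio_query?{params}", method="POST")


def build_synthesis_request(query: dict, speaker: int) -> urllib.request.Request:
    """Build the POST request for /synthesis (the query JSON goes in the body)."""
    params = urllib.parse.urlencode({"speaker": speaker})
    return urllib.request.Request(
        f"{BASE_URL}/synthesis?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With an engine running, passing each request to `urllib.request.urlopen` and writing the body of the second response to `audio.wav` should mirror the curl commands.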
### Sample code for adjusting the voice
You can adjust the voice by editing the parameters of the synthesis query returned by `/audio_query`.
For example, let's make the speech rate 1.5 times faster.
```bash
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=1" \
    --get --data-urlencode text@text.txt \
    > query.json
# Use sed to change the value of speedScale to 1.5
sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=1" \
    > audio_fast.wav
```
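
Editing the query with `sed` depends on the exact formatting of the JSON text. As an alternative sketch, the query can be parsed and modified as JSON. The function name `set_speed_scale` is a hypothetical helper; only the `speedScale` field comes from the example above.

```python
import json


def set_speed_scale(query_json: str, scale: float) -> str:
    """Return the query JSON text with speedScale replaced by `scale`."""
    query = json.loads(query_json)
    if "speedScale" not in query:
        raise KeyError("speedScale not found in query")
    query["speedScale"] = scale
    # ensure_ascii=False keeps kana readable instead of \u escapes.
    return json.dumps(query, ensure_ascii=False)
```

Reading `query.json`, passing its text through this function, and writing the result back would play the role of the `sed` line.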
### Getting and correcting readings in AquesTalk-style notation
#### AquesTalk-style notation
<!-- NOTE: This section is used as a static link target, so it is best left unchanged (voicevox_engine#816) -->
"**AquesTalk-style notation**" specifies readings using only katakana and symbols. It differs in part from the [original AquesTalk notation](https://www.a-quest.com/archive/manual/siyo_onseikigou.pdf).
AquesTalk-style notation follows these rules:
- All kana are written in katakana.
- Accent phrases are separated by `/` or `、`. A silent interval is inserted only when `、` is used.
- Placing `_` before a kana devoices that kana.
- The accent position is specified with `'`. Every accent phrase must have exactly one accent position.
- Placing `？` (full-width) at the end of an accent phrase produces interrogative pronunciation.
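
As a rough illustration of the separator and accent rules, the sketch below splits a string into accent phrases and reports phrases that do not contain exactly one `'`. It is not the engine's parser: the function names are hypothetical, and it checks only these two rules.

```python
def split_accent_phrases(kana: str) -> list[str]:
    """Split AquesTalk-style text into accent phrases on '/' and '、'."""
    phrases = []
    current = ""
    for ch in kana:
        if ch in "/、":
            phrases.append(current)
            current = ""
        else:
            current += ch
    phrases.append(current)
    return phrases


def check_accent_marks(kana: str) -> list[str]:
    """Return the phrases that do not contain exactly one accent mark (')."""
    return [p for p in split_accent_phrases(kana) if p.count("'") != 1]
```

An empty list from `check_accent_marks` means every phrase carries exactly one accent mark. It says nothing about whether the kana themselves are valid.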
#### Sample code for AquesTalk-style notation
The `/audio_query` response contains the reading determined by the engine, written in [AquesTalk-style notation](#aquestalk-style-notation).
By correcting it, you can control the kana reading and accent of the speech.
```bash
# Write the text to be read to text.txt in UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=1" \
    --get --data-urlencode text@text.txt \
    > query.json
cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン", so
# get the intonation with is_kana=true and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
    -X POST \
    "127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
    --get --data-urlencode text@kana.txt \
    > newphrases.json
# Replace the contents of "accent_phrases" in query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @newquery.json \
    "127.0.0.1:50021/synthesis?speaker=1" \
    > audio.wav
```
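
The `sed` substitution above pastes raw JSON into a replacement string, where characters such as `/` or `&` have special meaning. A more defensive sketch does the replacement on parsed JSON instead. The helper name `replace_accent_phrases` is hypothetical; the `accent_phrases` key is the one named in the comment above.

```python
import json


def replace_accent_phrases(query_json: str, phrases_json: str) -> str:
    """Return query JSON with accent_phrases replaced by the given list."""
    query = json.loads(query_json)
    phrases = json.loads(phrases_json)
    if not isinstance(phrases, list):
        raise ValueError("expected a JSON array of accent phrases")
    query["accent_phrases"] = phrases
    return json.dumps(query, ensure_ascii=False)
```

Feeding it the contents of `query.json` and `newphrases.json` and saving the result as `newquery.json` would stand in for the `sed` line.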
### User dictionary
The API lets you view the user dictionary and add, edit, and delete words.
#### Viewing
Send a GET request to `/user_dict` to retrieve the list of user dictionary entries.
```bash
curl -s -X GET "127.0.0.1:50021/user_dict"
```
#### Adding a word
Send a POST request to `/user_dict_word` to add a word to the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
For the accent nucleus position, the following page may be helpful.
The number in a label of the form "N 型" ("type N") is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
On success, the return value is the UUID string assigned to the word.
```bash
surface="test"
pronunciation="テスト"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
    --get \
    --data-urlencode "surface=$surface" \
    --data-urlencode "pronunciation=$pronunciation" \
    --data-urlencode "accent_type=$accent_type"
```
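
Because the parameters travel in the URL, katakana readings must be percent-encoded. The sketch below builds such a URL with the standard library. `build_add_word_url` and `BASE_URL` are names assumed for illustration.

```python
import urllib.parse

# Assumed default address, taken from the curl examples in this README.
BASE_URL = "http://127.0.0.1:50021"


def build_add_word_url(surface: str, pronunciation: str, accent_type: int) -> str:
    """Build the URL for a POST to /user_dict_word with encoded parameters."""
    params = urllib.parse.urlencode(
        {
            "surface": surface,
            "pronunciation": pronunciation,
            "accent_type": accent_type,
        }
    )
    return f"{BASE_URL}/user_dict_word?{params}"
```

This mirrors what `--get --data-urlencode` does in the curl command: every value is encoded and appended to the query string.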
#### Editing a word
Send a PUT request to `/user_dict_word/{word_uuid}` to edit a word in the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading in katakana)
- accent_type (the accent nucleus position, an integer)
The word_uuid is shown when the word is added, and can also be found by viewing the user dictionary.
On success, the response is `204 No Content`.
```bash
surface="test2"
pronunciation="テストツー"
accent_type="2"
# Replace word_uuid as appropriate for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
    --get \
    --data-urlencode "surface=$surface" \
    --data-urlencode "pronunciation=$pronunciation" \
    --data-urlencode "accent_type=$accent_type"
```
#### Deleting a word
Send a DELETE request to `/user_dict_word/{word_uuid}` to delete a word from the user dictionary.
The word_uuid is shown when the word is added, and can also be found by viewing the user dictionary.
On success, the response is `204 No Content`.
```bash
# Replace word_uuid as appropriate for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
```
#### Importing and exporting the dictionary
You can import and export the user dictionary in the "Export & import user dictionary" section of the engine's [settings page](http://127.0.0.1:50021/setting).
The user dictionary can also be imported and exported through the API.
Use `POST /import_user_dict` to import and `GET /user_dict` to export.
See the API documentation for details such as arguments.
### Presets
By editing `presets.yaml`, you can use presets for settings such as character and speech rate.
```bash
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt
# Get the preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')
# Get the synthesis query
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
    --get --data-urlencode text@text.txt \
    > query.json
# Synthesize speech
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=$style_id" \
    > audio.wav
```
- `speaker_uuid` can be looked up via `/speakers`.
- `id` values must not be duplicated.
- If the file is edited after the engine has started, the change is reflected in the engine.
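
Extracting `id` and `style_id` with regular expressions is sensitive to how the JSON happens to be laid out. The sketch below reads them from parsed JSON instead. It assumes the `/presets` response is a JSON array of objects that each carry `id` and `style_id`, and the helper name `first_preset_ids` is hypothetical.

```python
import json


def first_preset_ids(presets_json: str) -> tuple[int, int]:
    """Return (id, style_id) of the first preset in a /presets response."""
    presets = json.loads(presets_json)
    if not presets:
        raise ValueError("no presets defined")
    first = presets[0]
    return first["id"], first["style_id"]
```

The two returned values correspond to `preset_id` and `style_id` in the shell example.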
### Sample code for morphing between two styles
`/synthesis_morphing` generates morphed audio based on audio synthesized in each of two styles.
```bash
echo -n "モーフィングを利用することで、２種類の声を混ぜることができます。" > text.txt
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=8" \
    --get --data-urlencode text@text.txt \
    > query.json
# Synthesis result with the original style
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=8" \
    > audio.wav
export MORPH_RATE=0.5
# Note: this takes time because it synthesizes speech for two styles and runs WORLD audio analysis
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
    > audio.wav
export MORPH_RATE=0.9
# When query, base_speaker, and target_speaker are the same, a cache is used, so generation is comparatively fast
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
    > audio.wav
```
### Sample code for getting additional character information
This code retrieves portrait.png from the additional information.
(It uses [jq](https://stedolan.github.io/jq/) to parse the JSON.)
```bash
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
    | jq -r ".portrait" \
    | base64 -d \
    > portrait.png
```
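
Where `jq` is not available, the same decoding step can be sketched with the Python standard library. The function name `extract_portrait` is hypothetical. The base64-encoded `portrait` field is the one read by the `jq` filter above.

```python
import base64
import json


def extract_portrait(speaker_info_json: str) -> bytes:
    """Decode the base64 `portrait` field of a /speaker_info response."""
    info = json.loads(speaker_info_json)
    return base64.b64decode(info["portrait"])
```

Writing the returned bytes to `portrait.png` in binary mode gives the same file as the shell pipeline.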
### Cancellable speech synthesis
With `/cancellable_synthesis`, compute resources are released immediately when the connection is closed.
(With `/synthesis`, the synthesis computation runs to completion even if the connection is closed.)
This API is an experimental feature and is enabled only when the engine is started with the `--enable_cancellable_synthesis` argument.
The parameters required for synthesis are the same as for `/synthesis`.
### Sample code for singing voice synthesis via HTTP requests
```bash
echo -n '{
  "notes": [
    { "key": null, "frame_length": 15, "lyric": "" },
    { "key": 60, "frame_length": 45, "lyric": "ド" },
    { "key": 62, "frame_length": 45, "lyric": "レ" },
    { "key": 64, "frame_length": 45, "lyric": "ミ" },
    { "key": null, "frame_length": 15, "lyric": "" }
  ]
}' > score.json
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @score.json \
    "127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \
    > query.json
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/frame_synthesis?speaker=3001" \
    > audio.wav
```
The score's `key` is a MIDI note number.
`lyric` is the lyric. Any string can be specified, but depending on the engine, anything other than a single mora in hiragana or katakana may cause an error.
The frame rate defaults to 93.75 Hz and can be obtained from `frame_rate` in the engine manifest.
The first note must be silent.
The `speaker` accepted by `/sing_frame_audio_query` is the `style_id` of a style returned by `/singers` whose type is `sing` or `singing_teacher`.
The `speaker` accepted by `/frame_synthesis` is the `style_id` of a style returned by `/singers` whose type is `frame_decode`.
The argument is named `speaker` for consistency with the other APIs.
You can also specify different styles for `/sing_frame_audio_query` and `/frame_synthesis`.
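
Since note lengths are given in frames, a duration in seconds has to be multiplied by the frame rate. The sketch below does that conversion at the default 93.75 Hz and builds a score with a silent note at each end, as in the sample. `seconds_to_frames`, `build_score`, and the default rest length are assumptions made for illustration.

```python
import json

FRAME_RATE = 93.75  # default frame rate in Hz, as stated above


def seconds_to_frames(seconds: float, frame_rate: float = FRAME_RATE) -> int:
    """Convert a duration in seconds to a whole number of frames."""
    return round(seconds * frame_rate)


def build_score(notes: list[tuple[int, str, float]], rest_seconds: float = 0.16) -> str:
    """Build a score JSON string from (midi_key, lyric, seconds) tuples.

    A silent note (key=None, empty lyric) is placed at both ends, mirroring
    the sample score; only the leading one is stated as required.
    """
    rest = {"key": None, "frame_length": seconds_to_frames(rest_seconds), "lyric": ""}
    body = [
        {"key": key, "frame_length": seconds_to_frames(sec), "lyric": lyric}
        for key, lyric, sec in notes
    ]
    return json.dumps({"notes": [rest, *body, dict(rest)]}, ensure_ascii=False)
```

At 93.75 Hz, the 45-frame notes in the sample last 0.48 seconds and the 15-frame rests last 0.16 seconds.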
### CORS settings
For security, VOICEVOX does not accept requests from any Origin other than `localhost`, `127.0.0.1`, `app://`, or no Origin.
As a result, some third-party apps may be unable to receive responses.
As a workaround, the engine provides a UI for configuring this.
#### How to configure
1. Go to <http://127.0.0.1:50021/setting>.
2. Change or add settings to suit the app you use.
3. Press the save button to confirm the changes.
4. The engine must be restarted for the settings to take effect. Restart it as needed.
### Disabling APIs that modify data
Specifying the runtime argument `--disable_mutable_api` or the environment variable `VV_DISABLE_MUTABLE_API=1` disables the APIs that modify engine settings, dictionaries, and so on.
### Character encoding
All requests and responses are encoded in UTF-8.
### Other arguments
You can pass arguments when starting the engine. For details, check the help with the `-h` argument.
```bash
$ python run.py -h
usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock] [--enable_cancellable_synthesis]
[--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}]
[--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api]
The VOICEVOX engine.
options:
  -h, --help            show this help message and exit
  --host HOST           Host address to accept connections on.
  --port PORT           Port number to accept connections on.
  --use_gpu             Use the GPU for speech synthesis.
  --voicevox_dir VOICEVOX_DIR
                        Path to the VOICEVOX directory.
  --voicelib_dir VOICELIB_DIR
                        Path to the VOICEVOX CORE directory.
  --runtime_dir RUNTIME_DIR
                        Path to the directory of libraries used by VOICEVOX CORE.
  --enable_mock         Synthesize speech with a mock instead of VOICEVOX CORE.
  --enable_cancellable_synthesis
                        Allow speech synthesis to be cancelled partway through.
  --init_processes INIT_PROCESSES
                        Number of processes created when the cancellable_synthesis feature is initialized.
  --load_all_models     Load all speech synthesis models at startup.
  --cpu_num_threads CPU_NUM_THREADS
                        Number of threads used for speech synthesis. If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is neither an empty string nor numeric, the engine exits with an error.
  --output_log_utf8     Output logs in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If VV_OUTPUT_LOG_UTF8 is 1, UTF-8 is used; if it is 0, empty, or unset, the encoding is determined automatically by the environment.
  --cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}
                        CORS permission mode. Either all or localapps can be specified. all allows everything. localapps restricts the cross-origin resource sharing policy to app://. and localhost-related origins. Other origins can be added with the allow_origin option. The default is localapps. This option takes precedence over the settings file specified by --setting_file.
  --allow_origin [ALLOW_ORIGIN ...]
                        Origins to allow. Multiple origins can be given, separated by spaces. This option takes precedence over the settings file specified by --setting_file.
  --setting_file SETTING_FILE
                        Settings file to use.
  --preset_file PRESET_FILE
                        Preset file to use. If not specified, the environment variable VV_PRESET_FILE and then presets.yaml in the executable's directory are searched, in that order.
  --disable_mutable_api
                        Disable APIs that modify the engine's static data, such as dictionary registration and settings changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. If VV_DISABLE_MUTABLE_API is 1, the APIs are disabled; if it is 0, empty, or unset, it is ignored.
```
### Updating
Delete all files in the engine directory and replace them with the new ones.
## Contributor guide
VOICEVOX ENGINE welcomes your contributions!
See [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
Development discussion and casual chat also take place on the [unofficial VOICEVOX Discord server](https://discord.gg/WMwWetrzuh). Feel free to join.
When creating a pull request that resolves an issue, we recommend either saying on the issue that you have started working on it or opening a draft pull request first, to avoid working on the same issue as someone else.
## Developer guide
### Environment setup
The engine is developed with `Python 3.11.3`.
Installation requires a C/C++ compiler for your OS and CMake.
```bash
# Install the runtime environment
python -m pip install -r requirements.txt
# Install the development, test, and build environments
python -m pip install -r requirements-dev.txt -r requirements-build.txt
```
### Running
Check the details of the command-line arguments with the following command.
```bash
python run.py --help
```
```bash
# Start the server with the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/voicevox" # Path to the product-version VOICEVOX directory
python run.py --voicevox_dir=$VOICEVOX_DIR
```
<!-- Uncomment once swappable voice libraries or their specification are published
```bash
# Swap the voice library
VOICELIB_DIR="C:/path/to/your/tts-model"
python run.py --voicevox_dir=$VOICEVOX_DIR --voicelib_dir=$VOICELIB_DIR
```
-->
```bash
# Start the server with the mock
python run.py --enable_mock
```
```bash
# Switch log output to UTF-8
python run.py --output_log_utf8
# Or: VV_OUTPUT_LOG_UTF8=1 python run.py
```
#### Specifying the number of CPU threads
If the number of CPU threads is not specified, half the number of logical cores is used. (On most CPUs, this is half of the total processing capacity.)
If you want to tune how much processing capacity the engine uses, for example when running on IaaS or on a dedicated server, you can do so by specifying the number of CPU threads.
- Specify it with a runtime argument
```bash
python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
```
- Specify it with an environment variable
```bash
export VV_CPU_NUM_THREADS=4
python run.py --voicevox_dir=$VOICEVOX_DIR
```
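
The order in which these sources are consulted can be sketched as follows. This is not the engine's code. It models the documented behavior (argument first, then `VV_CPU_NUM_THREADS`, then half the logical cores), and both the function name `resolve_cpu_num_threads` and the lower bound of 1 are assumptions.

```python
import os
from typing import Optional


def resolve_cpu_num_threads(cli_value: Optional[int], env: dict[str, str]) -> int:
    """Pick the thread count: CLI value, then VV_CPU_NUM_THREADS, then half the logical cores."""
    if cli_value is not None:
        return cli_value
    raw = env.get("VV_CPU_NUM_THREADS", "")
    if raw != "":
        # A non-empty, non-numeric value is an error; int() raises ValueError.
        return int(raw)
    logical_cores = os.cpu_count() or 1
    # Assumption: never return fewer than one thread.
    return max(1, logical_cores // 2)
```

Calling `resolve_cpu_num_threads(None, dict(os.environ))` shows the value this model would pick on the current machine.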
#### Using a past version of the core
Cores from VOICEVOX Core 0.5.4 onward can be used.
The libtorch version of the core is not supported on Mac.
##### Specifying a past binary
If you pass the directory of the product version of VOICEVOX or of a compiled engine with the `--voicevox_dir` argument, that version's core is used.
```bash
python run.py --voicevox_dir="/path/to/voicevox"
```
On Mac, `DYLD_LIBRARY_PATH` must be specified.
```bash
DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/voicevox"
```
##### Specifying the voice library directly
Pass the directory where you extracted the [VOICEVOX Core zip file](https://github.com/VOICEVOX/voicevox_core/releases) with the `--voicelib_dir` argument.
Also pass the directory of [libtorch](https://pytorch.org/) or [onnxruntime](https://github.com/microsoft/onnxruntime) (shared library), matching the core version, with the `--runtime_dir` argument.
However, if libtorch or onnxruntime is on the system search path, the `--runtime_dir` argument is not needed.
The `--voicelib_dir` and `--runtime_dir` arguments can be used multiple times.
To specify the core version at an API endpoint, pass the `core_version` argument. (If unspecified, the latest core is used.)
```bash
python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
```
On Mac, `DYLD_LIBRARY_PATH` must be specified instead of the `--runtime_dir` argument.
```bash
DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core"
```
##### Placing it in the user directory
Voice libraries in the following directories are loaded automatically.
- Built version: `<user_data_dir>/voicevox-engine/core_libraries/`
- Python version: `<user_data_dir>/voicevox-engine-dev/core_libraries/`
`<user_data_dir>` depends on the OS.
- Windows: `C:\Users\<username>\AppData\Local\`
- macOS: `/Users/<username>/Library/Application\ Support/`
- Linux: `/home/<username>/.local/share/`
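
The combinations above can be spelled out with a small sketch that only composes the documented paths. The function name `core_libraries_dir` and its `system` labels are assumptions, and real home directories may be located elsewhere.

```python
from pathlib import PurePosixPath, PureWindowsPath


def core_libraries_dir(system: str, username: str, built: bool = True):
    """Return the auto-loaded core_libraries directory for the given OS."""
    app_dir = "voicevox-engine" if built else "voicevox-engine-dev"
    if system == "Windows":
        base = PureWindowsPath("C:/Users") / username / "AppData" / "Local"
    elif system == "macOS":
        base = PurePosixPath("/Users") / username / "Library" / "Application Support"
    elif system == "Linux":
        base = PurePosixPath("/home") / username / ".local" / "share"
    else:
        raise ValueError(f"unsupported system: {system}")
    return base / app_dir / "core_libraries"
```

For example, `core_libraries_dir("Linux", "alice", built=False)` gives the directory used when running the Python version from source.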
### Building
Local builds are possible via packaging with `pyinstaller` and containerization with a Dockerfile.
See [Contributor guide#Build](./CONTRIBUTING.md#ビルド) for the detailed procedure.
If you use GitHub, you can build with GitHub Actions in a forked repository.
Turn Actions ON and trigger `build-engine-package.yml` via workflow_dispatch to build.
The artifacts are uploaded to Release.
See [Contributor guide#GitHub Actions](./CONTRIBUTING.md#github-actions) for the GitHub Actions settings required for building.
### Testing and static analysis
Testing with `pytest` and static analysis with various linters are available.
See [Contributor guide#Testing](./CONTRIBUTING.md#テスト) and [Contributor guide#Static analysis](./CONTRIBUTING.md#静的解析) for the detailed procedure.
### Dependencies
Dependencies are managed with `poetry`. There are also license restrictions on which dependency libraries can be introduced.
See [Contributor guide#Packages](./CONTRIBUTING.md#パッケージ) for details.
### About the multi-engine feature
The VOICEVOX editor can run multiple engines at the same time.
With this feature, you can run your own speech synthesis engine or an existing one in the VOICEVOX editor.
<img src="./docs/res/マルチエンジン概念図.svg" width="320">
<details>
#### How the multi-engine feature works
The multi-engine feature works by starting the Web APIs of multiple VOICEVOX API-compliant engines on separate ports and handling them uniformly.
The editor launches each engine through its executable binary and manages settings and state individually, keyed by EngineID.
#### How to support the multi-engine feature
You can support it by creating an executable binary that launches a VOICEVOX API-compliant engine.
The easiest way is to fork the VOICEVOX ENGINE repository and modify some of its functionality.
Three things need modifying: engine information, character information, and speech synthesis.
Engine information is managed in the manifest file (`engine_manifest.json`) directly under the root.
A manifest file in this format is required for VOICEVOX API-compliant engines.
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, it may not be possible to provide the same features as VOICEVOX, such as morphing.
In that case, change the information under `supported_features` in the manifest file as appropriate.
Character information is managed in the files in the `resources/character_info` directory.
Dummy icons and the like are provided, so change them as appropriate.
Speech synthesis is performed in `voicevox_engine/tts_pipeline/tts_engine.py`.
In the VOICEVOX API, speech synthesis works as follows: the engine creates the initial values of the synthesis query `AudioQuery` and returns them to the user, the user edits the query as needed, and the engine then synthesizes speech according to the query.
Query creation is handled by the `/audio_query` endpoint and synthesis by the `/synthesis` endpoint. Supporting at least these two makes an engine VOICEVOX API-compliant.
#### How to distribute a multi-engine-compatible engine
Distributing it as a VVPP file is recommended.
VVPP stands for "VOICEVOX plugin package"; it is a Zip file of a directory containing the built engine and related files.
If the extension is `.vvpp`, it can be installed into the VOICEVOX editor with a double-click.
After unzipping the received VVPP file onto the local disk, the editor looks up files according to the `engine_manifest.json` directly under the root.
If the VOICEVOX editor fails to load it, check the editor's error log.
A `xxx.vvpp` file can also be split and distributed as sequentially numbered `xxx.0.vvppp` files.
This is useful when the file is too large to distribute easily.
</details>
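
Because a VVPP file is described as a Zip file of the engine directory, packaging can be sketched with the standard `zipfile` module. The helper `make_vvpp` is hypothetical. It assumes that "directly under the root" means the root of the archive, and the output file should be written outside the directory being packaged.

```python
import zipfile
from pathlib import Path


def make_vvpp(engine_dir: Path, output: Path) -> Path:
    """Zip the contents of engine_dir into a .vvpp file."""
    if not (engine_dir / "engine_manifest.json").is_file():
        raise FileNotFoundError("engine_manifest.json must be directly under the root")
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(engine_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to engine_dir so the manifest sits at the archive root.
                zf.write(path, path.relative_to(engine_dir).as_posix())
    return output
```

Splitting a large package into numbered `.vvppp` parts is not covered by this sketch.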
## Examples
**[voicevox-client](https://github.com/voicevox-client) [@voicevox-client](https://github.com/voicevox-client)**: API wrappers for VOICEVOX ENGINE in various languages
## License
Dual-licensed under LGPL v3 and a separate license that does not require publishing source code.
If you would like to obtain the separate license, please ask Hiho.
X account: [@hiho_karuta](https://x.com/hiho_karuta)