# VOICEVOX ENGINE
[Build](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-package.yml)
[Releases](https://github.com/VOICEVOX/voicevox_engine/releases)
[Discord](https://discord.gg/WMwWetrzuh)
[Test](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/test.yml)
[Coverage](https://coveralls.io/github/VOICEVOX/voicevox_engine)
[Build container](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-engine-container.yml)
[Docker Hub](https://hub.docker.com/r/voicevox/voicevox_engine)
This is the engine of [VOICEVOX](https://voicevox.hiroshiba.jp/).
Under the hood it is an HTTP server, so you can run text-to-speech synthesis by sending it requests.

(The editor is [VOICEVOX](https://github.com/VOICEVOX/voicevox/),
the core is [VOICEVOX CORE](https://github.com/VOICEVOX/voicevox_core/),
and the overall architecture is described in detail [here](https://github.com/VOICEVOX/voicevox/blob/main/docs/%E5%85%A8%E4%BD%93%E6%A7%8B%E6%88%90.md).)
## Table of contents
Pick the guide that matches your goal.
- [User guide](#user-guide): for those who want to synthesize speech
- [Contributor guide](#contributor-guide): for those who want to contribute
- [Developer guide](#developer-guide): for those who want to use the code

## User guide
### Download
Download the engine for your platform from [here](https://github.com/VOICEVOX/voicevox_engine/releases/latest).
### API documentation
See the [API documentation](https://voicevox.github.io/voicevox_engine/api/).
With the VOICEVOX engine or editor running, you can also open http://127.0.0.1:50021/docs to view the documentation of the running engine.
For future plans and related topics, [Integrating with the VOICEVOX speech synthesis engine](./docs/VOICEVOX音声合成エンジンとの連携.md) may also be helpful.
### Docker images
#### CPU
```bash
docker pull voicevox/voicevox_engine:cpu-ubuntu20.04-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-ubuntu20.04-latest
```
#### GPU
```bash
docker pull voicevox/voicevox_engine:nvidia-ubuntu20.04-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-ubuntu20.04-latest
```
##### Troubleshooting
When using the GPU version, errors may occur depending on the environment. In that case, adding `--runtime=nvidia` to `docker run` may resolve the problem.
### Sample code: speech synthesis via HTTP requests
```bash
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
```
The generated audio uses a somewhat unusual sampling rate of 24000 Hz, so some audio players may be unable to play it.
The value passed as `speaker` is a `style_id` obtained from the `/speakers` endpoint. The parameter is named `speaker` for backward compatibility.
### Sample code: adjusting the voice
You can adjust the voice by editing the parameters of the synthesis query returned by `/audio_query`.
For example, let's make the speech rate 1.5 times faster.
```bash
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
# Use sed to change the value of speedScale to 1.5
sed -i -r 's/"speedScale":[0-9.]+/"speedScale":1.5/' query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio_fast.wav
```
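The same `sed` approach extends to other numeric fields of the query. The sketch below wraps it in a small helper. `set_query_param` is a hypothetical name, and it assumes that `query.json` is compact JSON (no space after the colon), as the `sed` command above does, and that fields such as `pitchScale` and `intonationScale` sit next to `speedScale` in the query.

```bash
# Hypothetical helper: print the query JSON from FILE with the numeric field KEY set to VALUE.
# Assumes compact JSON such as {"speedScale":1.0,...} with no space after the colon.
set_query_param() {
  local key="$1" value="$2" file="$3"
  sed -E "s/\"${key}\":-?[0-9.]+/\"${key}\":${value}/" "$file"
}

# Offline demonstration on a trimmed-down stand-in for query.json.
printf '%s\n' '{"speedScale":1.0,"pitchScale":0.0,"intonationScale":1.0}' > sample_query.json
set_query_param speedScale 1.5 sample_query.json > sample_fast.json
set_query_param pitchScale 0.05 sample_fast.json
# prints {"speedScale":1.5,"pitchScale":0.05,"intonationScale":1.0}
```

With a running engine, the output of `set_query_param` can be saved to a file and posted to `/synthesis` exactly like `query.json` in the example above.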
### Getting and correcting readings in AquesTalk-style notation
#### AquesTalk-style notation
<!-- NOTE: This section is used as a static link target, so it is better not to change it (voicevox_engine#816) -->
"**AquesTalk-style notation**" specifies readings using only katakana and symbols. It differs in some respects from [the original AquesTalk notation](https://www.a-quest.com/archive/manual/siyo_onseikigou.pdf).
AquesTalk-style notation follows these rules:
- All kana are written in katakana.
- Accent phrases are separated by `/` or `、`. A silent interval is inserted only when `、` is used.
- Placing `_` before a kana devoices that kana.
- The accent position is marked with `'`. Every accent phrase must have one accent position specified.
- Placing `?` (full-width) at the end of an accent phrase produces interrogative intonation.

#### Sample code for AquesTalk-style notation
The `/audio_query` response contains the reading chosen by the engine, written in [AquesTalk-style notation](#aquestalk-style-notation).
By editing it, you can control the reading and accent of the speech.
```bash
# Write the text to be read to text.txt as UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=1" \
--get --data-urlencode text@text.txt \
> query.json
cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン", so
# fetch the intonation with is_kana=true and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
-X POST \
"127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
--get --data-urlencode text@kana.txt \
> newphrases.json
# Replace the contents of "accent_phrases" in query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @newquery.json \
"127.0.0.1:50021/synthesis?speaker=1" \
> audio.wav
```
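Before sending a hand-edited reading to `/accent_phrases`, it can help to check the rule that each accent phrase carries an accent mark. The sketch below is a local pre-check only, not the engine's own validation. `check_accent_marks` is a hypothetical name, and it assumes the rule means exactly one `'` per phrase, with phrases separated by `/` or `、`.

```bash
# Hypothetical helper: succeed only if every accent phrase in KANA has exactly one "'" mark.
# Phrases are separated by "/" or "、".
check_accent_marks() {
  local kana="$1" quote="'" phrase marks
  local -a phrases
  # Treat "、" like "/" so that one split handles both separators.
  kana="${kana//、//}"
  IFS='/' read -r -a phrases <<< "$kana"
  for phrase in "${phrases[@]}"; do
    # Delete everything except the accent mark and count what is left.
    marks="${phrase//[^$quote]/}"
    if [ "${#marks}" -ne 1 ]; then
      printf 'accent phrase "%s" has %d accent marks\n' "$phrase" "${#marks}" >&2
      return 1
    fi
  done
  return 0
}

check_accent_marks "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" && echo "ok"
```

A phrase with no mark, or with two, makes the function return a non-zero status and print the offending phrase to standard error.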
### User dictionary
Through the API you can view the user dictionary and add, edit, and delete words.
#### Viewing
Send a GET request to `/user_dict` to retrieve the list of user dictionary entries.
```bash
curl -s -X GET "127.0.0.1:50021/user_dict"
```
#### Adding a word
Send a POST request to `/user_dict_word` to add a word to the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading, in katakana)
- accent_type (the accent nucleus position, an integer)

For the accent nucleus position, the following article may be helpful.
The number in each "〇型" (type) label is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
On success, the return value is the UUID string assigned to the word.
```bash
surface="test"
pronunciation="テスト"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
```
#### Editing a word
Send a PUT request to `/user_dict_word/{word_uuid}` to edit a word in the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading, in katakana)
- accent_type (the accent nucleus position, an integer)

The word_uuid is returned when the word is added, and can also be found by viewing the user dictionary.
On success, the response is `204 No Content`.
```bash
surface="test2"
pronunciation="テストツー"
accent_type="2"
# Replace word_uuid with the value for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
```
#### Deleting a word
Send a DELETE request to `/user_dict_word/{word_uuid}` to delete a word from the user dictionary.
The word_uuid is returned when the word is added, and can also be found by viewing the user dictionary.
On success, the response is `204 No Content`.
```bash
# Replace word_uuid with the value for your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
```
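The add and delete requests can be chained by capturing the UUID returned by the POST. The sketch below assumes that the response body is the UUID as a JSON string, that is, wrapped in double quotes. Both function names are hypothetical, and `add_then_delete_word` is only defined here because it needs a running engine.

```bash
# Hypothetical helper: strip one pair of surrounding double quotes, e.g. "abc" -> abc.
unquote_json_string() {
  local s="$1"
  s="${s#\"}"
  s="${s%\"}"
  printf '%s\n' "$s"
}

# Hypothetical round trip: add a word, then delete it again using the returned UUID.
# Needs a running engine at 127.0.0.1:50021; it is defined here but not called.
add_then_delete_word() {
  local surface="$1" pronunciation="$2" accent_type="$3" response word_uuid
  response=$(curl -s -X POST "127.0.0.1:50021/user_dict_word" \
    --get \
    --data-urlencode "surface=$surface" \
    --data-urlencode "pronunciation=$pronunciation" \
    --data-urlencode "accent_type=$accent_type")
  word_uuid=$(unquote_json_string "$response")
  echo "added $word_uuid"
  curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
}

unquote_json_string '"cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"'
# prints cce59b5f-86ab-42b9-bb75-9fd3407f1e2d
```

With the engine running, `add_then_delete_word "test" "テスト" 1` would exercise both endpoints in one go.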
#### Importing and exporting the dictionary
You can import and export the user dictionary in the "Export & import user dictionary" section of the engine's [settings page](http://127.0.0.1:50021/setting).
The API can also import and export the user dictionary.
Use `POST /import_user_dict` to import and `GET /user_dict` to export.
See the API documentation for details on the arguments.
### Presets
You can use presets for character, speech rate, and so on by editing `presets.yaml`.
```bash
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt
# Fetch the preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')
# Fetch the query for speech synthesis
curl -s \
-X POST \
"127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesize speech
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=$style_id" \
> audio.wav
```
- `speaker_uuid` can be looked up via `/speakers`.
- `id` values must be unique.
- If the file is rewritten after the engine has started, the change is picked up by the engine.
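
The uniqueness rule for `id` can be checked before the engine reads the file. The sketch below assumes that `presets.yaml` is a YAML list in which each preset starts with a line such as `- id: 1`. That layout, the sample file, and the helper name `duplicate_preset_ids` are assumptions for illustration.

```bash
# Hypothetical helper: print every preset id that occurs more than once in FILE.
# Assumes each preset is a YAML list entry with a line like "- id: 1" or "  id: 1".
duplicate_preset_ids() {
  local file="$1"
  sed -n -E 's/^-?[[:space:]]*id:[[:space:]]*([0-9]+).*$/\1/p' "$file" | sort -n | uniq -d
}

# Offline demonstration with a made-up file in the assumed layout.
cat > sample_presets.yaml <<'EOF'
- id: 1
  name: sample A
  style_id: 0
- id: 2
  name: sample B
  style_id: 1
- id: 1
  name: sample C
  style_id: 1
EOF
duplicate_preset_ids sample_presets.yaml   # prints 1
```

Lines such as `style_id: 0` are not counted, because the pattern only accepts `id:` at the start of the entry. Empty output means no duplicates were found.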
### Sample code: morphing between two styles
`/synthesis_morphing` generates morphed audio based on the audio synthesized in each of two styles.
```bash
echo -n "モーフィングを利用することで、2種類の声を混ぜることができます。" > text.txt
curl -s \
-X POST \
"127.0.0.1:50021/audio_query?speaker=8" \
--get --data-urlencode text@text.txt \
> query.json
# Synthesis result in the original style
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis?speaker=8" \
> audio.wav
export MORPH_RATE=0.5
# Note: this takes time because it synthesizes speech for both styles and then runs WORLD audio analysis
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
export MORPH_RATE=0.9
# When query, base_speaker, and target_speaker are unchanged, the cache is used, so generation is relatively fast
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/synthesis_morphing?base_speaker=8&target_speaker=10&morph_rate=$MORPH_RATE" \
> audio.wav
```
### Sample code: fetching additional character information
This code fetches portrait.png from the additional information.
(It uses [jq](https://stedolan.github.io/jq/) to parse the JSON.)
```bash
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
| jq -r ".portrait" \
| base64 -d \
> portrait.png
```
### Cancellable speech synthesis
With `/cancellable_synthesis`, compute resources are released immediately when the connection is dropped.
(With `/synthesis`, the synthesis computation runs to completion even if the connection is dropped.)
This API is an experimental feature and is enabled only when the engine is started with the `--enable_cancellable_synthesis` argument.
The parameters required for synthesis are the same as for `/synthesis`.
### Sample code: singing voice synthesis via HTTP requests
```bash
echo -n '{
"notes": [
{ "key": null, "frame_length": 15, "lyric": "" },
{ "key": 60, "frame_length": 45, "lyric": "ド" },
{ "key": 62, "frame_length": 45, "lyric": "レ" },
{ "key": 64, "frame_length": 45, "lyric": "ミ" },
{ "key": null, "frame_length": 15, "lyric": "" }
]
}' > score.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @score.json \
"127.0.0.1:50021/sing_frame_audio_query?speaker=6000" \
> query.json
curl -s \
-H "Content-Type: application/json" \
-X POST \
-d @query.json \
"127.0.0.1:50021/frame_synthesis?speaker=3001" \
> audio.wav
```
The score's `key` is a MIDI note number.
`lyric` is the lyric. Any string can be specified, but some engines return an error for anything other than a single mora in hiragana or katakana.
The frame rate defaults to 93.75 Hz and can be obtained from `frame_rate` in the engine manifest.
The first note must be silent.

The `speaker` accepted by `/sing_frame_audio_query` is the `style_id` of a style returned by `/singers` whose type is `sing` or `singing_teacher`.
The `speaker` accepted by `/frame_synthesis` is the `style_id` of a style returned by `/singers` whose type is `frame_decode`.
The argument is named `speaker` for consistency with the other APIs.

It is also possible to specify different styles for `/sing_frame_audio_query` and `/frame_synthesis`.
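
Because note lengths are given in frames, it is handy to convert between frames and seconds when writing a score. The sketch below assumes the default frame rate of 93.75 Hz. An engine may report a different `frame_rate`, which can be passed as the second argument. Both helper names are hypothetical.

```bash
# Hypothetical helpers converting between frames and seconds.
# The rate defaults to 93.75 Hz; pass another rate as the second argument.
frames_to_seconds() {
  awk -v frames="$1" -v rate="${2:-93.75}" 'BEGIN { printf "%.3f\n", frames / rate }'
}
seconds_to_frames() {
  # Round to the nearest whole frame.
  awk -v seconds="$1" -v rate="${2:-93.75}" 'BEGIN { printf "%d\n", seconds * rate + 0.5 }'
}

frames_to_seconds 45    # one sung note in the score above: 0.480
frames_to_seconds 165   # whole score (15 + 45 * 3 + 15 frames): 1.760
seconds_to_frames 0.5   # 47
```

At the default rate, the sample score therefore lasts 1.76 seconds, with each sung note taking 0.48 seconds.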
### CORS settings
For security, VOICEVOX does not accept requests from any Origin other than `localhost`, `127.0.0.1`, `app://`, or no Origin.
As a result, some third-party apps may be unable to receive responses.
As a workaround, the engine provides a UI for configuring this.
#### How to configure
1. Open <http://127.0.0.1:50021/setting>.
2. Change or add settings to suit the app you use.
3. Press the save button to confirm the changes.
4. The engine must be restarted for the settings to take effect. Restart it as needed.

### Disabling APIs that modify data
Specifying the runtime argument `--disable_mutable_api` or the environment variable `VV_DISABLE_MUTABLE_API=1` disables the APIs that modify engine settings, dictionaries, and so on.
### Character encoding
All requests and responses are encoded in UTF-8.
### Other arguments
You can pass arguments when starting the engine. For details, check the help with the `-h` argument.
The option descriptions are shown here in English translation; the engine itself prints them in Japanese.
```bash
$ python run.py -h
usage: run.py [-h] [--host HOST] [--port PORT] [--use_gpu] [--voicevox_dir VOICEVOX_DIR] [--voicelib_dir VOICELIB_DIR] [--runtime_dir RUNTIME_DIR] [--enable_mock] [--enable_cancellable_synthesis]
              [--init_processes INIT_PROCESSES] [--load_all_models] [--cpu_num_threads CPU_NUM_THREADS] [--output_log_utf8] [--cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}]
              [--allow_origin [ALLOW_ORIGIN ...]] [--setting_file SETTING_FILE] [--preset_file PRESET_FILE] [--disable_mutable_api]

The VOICEVOX engine.

options:
  -h, --help            show this help message and exit
  --host HOST           Host address on which to accept connections.
  --port PORT           Port number on which to accept connections.
  --use_gpu             Use the GPU for speech synthesis.
  --voicevox_dir VOICEVOX_DIR
                        Path to the VOICEVOX directory.
  --voicelib_dir VOICELIB_DIR
                        Path to the VOICEVOX CORE directory.
  --runtime_dir RUNTIME_DIR
                        Path to the directory of libraries used by VOICEVOX CORE.
  --enable_mock         Synthesize speech with a mock instead of VOICEVOX CORE.
  --enable_cancellable_synthesis
                        Allow speech synthesis to be cancelled midway.
  --init_processes INIT_PROCESSES
                        Number of processes created when the cancellable_synthesis feature is initialized.
  --load_all_models     Load all speech synthesis models at startup.
  --cpu_num_threads CPU_NUM_THREADS
                        Number of threads used for speech synthesis. If not specified, the value of the environment variable VV_CPU_NUM_THREADS is used instead. If VV_CPU_NUM_THREADS is neither an empty string nor a number, the engine exits with an error.
  --output_log_utf8     Output logs in UTF-8. If not specified, the value of the environment variable VV_OUTPUT_LOG_UTF8 is used instead. If VV_OUTPUT_LOG_UTF8 is 1, UTF-8 is used; if it is 0, an empty string, or unset, the encoding is chosen automatically based on the environment.
  --cors_policy_mode {CorsPolicyMode.all,CorsPolicyMode.localapps}
                        CORS permission mode. Either all or localapps can be specified. all allows everything. localapps restricts the cross-origin resource sharing policy to app://. and localhost-related origins. Other origins can be added with the allow_origin option. The default is localapps. This option takes precedence over the settings file specified by --setting_file.
  --allow_origin [ALLOW_ORIGIN ...]
                        Origins to allow. Multiple origins can be given, separated by spaces. This option takes precedence over the settings file specified by --setting_file.
  --setting_file SETTING_FILE
                        Settings file to use.
  --preset_file PRESET_FILE
                        Preset file to use. If not specified, the environment variable VV_PRESET_FILE and then presets.yaml in the executable's directory are searched, in that order.
  --disable_mutable_api
                        Disable APIs that modify the engine's static data, such as dictionary registration and settings changes. If not specified, the value of the environment variable VV_DISABLE_MUTABLE_API is used instead. If VV_DISABLE_MUTABLE_API is 1, the APIs are disabled; if it is 0, an empty string, or unset, it is ignored.
```
### Updating
Delete all files in the engine directory and replace them with the new ones.
## Contributor guide
VOICEVOX ENGINE welcomes your contributions!
See [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
Development discussion and casual chat also take place on the [unofficial VOICEVOX Discord server](https://discord.gg/WMwWetrzuh). Feel free to join.

When creating a pull request that resolves an Issue, we recommend either announcing on the Issue that you have started working on it or opening a Draft pull request first, to avoid working on the same Issue as someone else.
## Developer guide
### Setting up the environment
The engine is developed with `Python 3.11.3`.
Installation requires a C/C++ compiler for your OS and CMake.
```bash
# Install the runtime environment
python -m pip install -r requirements.txt
# Install the development, test, and build environments
python -m pip install -r requirements-dev.txt -r requirements-build.txt
```
### Running
Check the details of the command-line arguments with the following command.
```bash
python run.py --help
```
```bash
# Start the server with the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/voicevox" # Path to the product-version VOICEVOX directory
python run.py --voicevox_dir=$VOICEVOX_DIR
```
<!-- Uncomment once a swappable voice library or its specification is published
```bash
# Swap the voice library
VOICELIB_DIR="C:/path/to/your/tts-model"
python run.py --voicevox_dir=$VOICEVOX_DIR --voicelib_dir=$VOICELIB_DIR
```
-->
```bash
# Start the server with the mock
python run.py --enable_mock
```
```bash
# Switch logs to UTF-8
python run.py --output_log_utf8
# Or: VV_OUTPUT_LOG_UTF8=1 python run.py
```
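#### Specifying the number of CPU threads
If the number of CPU threads is not specified, half of the logical cores are used. (On most CPUs this is half of the total processing capacity.)
If you are running on IaaS or on a dedicated server and want to adjust how much processing capacity the engine uses, you can do so by specifying the number of CPU threads.
- Specify via a runtime argument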
```bash
python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
```
- Specify via an environment variable
```bash
export VV_CPU_NUM_THREADS=4
python run.py --voicevox_dir=$VOICEVOX_DIR
```
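On a dedicated server you may prefer to derive the thread count from the machine instead of hard-coding it. The sketch below leaves one logical core free. That policy and the helper name `pick_cpu_threads` are illustrative assumptions, not a recommendation from the engine, and the core count is read with `getconf _NPROCESSORS_ONLN`.

```bash
# Hypothetical helper: leave one logical core free for other work, but never go below 1.
# With no argument the logical core count is read via getconf.
pick_cpu_threads() {
  local cores="${1:-$(getconf _NPROCESSORS_ONLN)}"
  local threads=$(( cores - 1 ))
  if [ "$threads" -lt 1 ]; then
    threads=1
  fi
  printf '%s\n' "$threads"
}

export VV_CPU_NUM_THREADS="$(pick_cpu_threads)"
echo "VV_CPU_NUM_THREADS=$VV_CPU_NUM_THREADS"
```

Starting the engine from the same shell afterwards, as in the environment variable example above, picks up the exported value.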
#### Using a past version of the core
Cores from VOICEVOX Core 0.5.4 onward can be used.
The libtorch version of the core is not supported on Mac.
##### Specifying past binaries
If you pass the directory of the product version of VOICEVOX or of a compiled engine via the `--voicevox_dir` argument, the core of that version is used.
```bash
python run.py --voicevox_dir="/path/to/voicevox"
```
On Mac, `DYLD_LIBRARY_PATH` must be specified.
```bash
DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/voicevox"
```
##### Specifying a voice library directly
Pass the directory extracted from the [VOICEVOX Core zip file](https://github.com/VOICEVOX/voicevox_core/releases) via the `--voicelib_dir` argument.
Also, depending on the core version, pass the directory of [libtorch](https://pytorch.org/) or [onnxruntime](https://github.com/microsoft/onnxruntime) (shared libraries) via the `--runtime_dir` argument.
However, if libtorch or onnxruntime is on the system search path, `--runtime_dir` is not needed.
The `--voicelib_dir` and `--runtime_dir` arguments can each be used multiple times.
To select a core version at an API endpoint, specify the `core_version` argument. (If it is not specified, the latest core is used.)
```bash
python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
```
On Mac, `DYLD_LIBRARY_PATH` must be specified instead of the `--runtime_dir` argument.
```bash
DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core"
```
##### Placing libraries in the user directory
Voice libraries in the following directories are loaded automatically.
- Built version: `<user_data_dir>/voicevox-engine/core_libraries/`
- Python version: `<user_data_dir>/voicevox-engine-dev/core_libraries/`

`<user_data_dir>` depends on the OS.
- Windows: `C:\Users\<username>\AppData\Local\`
- macOS: `/Users/<username>/Library/Application\ Support/`
- Linux: `/home/<username>/.local/share/`
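
The directory for the current machine can be worked out from the two lists above. In the sketch below, `core_libraries_dir` is a hypothetical helper, and matching Windows shells by the `MINGW`, `MSYS`, and `CYGWIN` prefixes of `uname -s`, with `LOCALAPPDATA` as the base, is an assumption.

```bash
# Hypothetical helper: print the core_libraries directory for an app name and OS.
# APP is "voicevox-engine" (built version) or "voicevox-engine-dev" (Python version).
# OS defaults to the output of `uname -s`; the Windows patterns are an assumption.
core_libraries_dir() {
  local app="${1:-voicevox-engine}" os="${2:-$(uname -s)}" base
  case "$os" in
    Linux) base="$HOME/.local/share" ;;
    Darwin) base="$HOME/Library/Application Support" ;;
    MINGW*|MSYS*|CYGWIN*) base="${LOCALAPPDATA:-$HOME/AppData/Local}" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  printf '%s\n' "$base/$app/core_libraries"
}

core_libraries_dir voicevox-engine-dev
```

The printed path can be passed to `mkdir -p` before copying an extracted core into it.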
### Building
You can build locally, either by packaging with `pyinstaller` or by containerizing with the Dockerfile.
See [Contributor guide#Build](./CONTRIBUTING.md#ビルド) for the detailed steps.

If you use GitHub, you can build with GitHub Actions in a forked repository.
Turn Actions ON and trigger `build-engine-package.yml` via workflow_dispatch to build.
The artifacts are uploaded to Release.
See [Contributor guide#GitHub Actions](./CONTRIBUTING.md#github-actions) for the GitHub Actions settings required for the build.
### Tests and static analysis
Tests with `pytest` and static analysis with various linters are available.
See [Contributor guide#Tests](./CONTRIBUTING.md#テスト) and [Contributor guide#Static analysis](./CONTRIBUTING.md#静的解析) for the detailed steps.
### Dependencies
Dependencies are managed with `poetry`. There are also license restrictions on which dependency libraries can be introduced.
See [Contributor guide#Packages](./CONTRIBUTING.md#パッケージ) for details.
### Multi-engine feature
The VOICEVOX editor can run multiple engines at the same time.
Using this feature, you can run your own speech synthesis engine or existing speech synthesis engines in the VOICEVOX editor.
<img src="./docs/res/マルチエンジン概念図.svg" width="320">
<details>
#### How the multi-engine feature works
The multi-engine feature works by starting the Web APIs of multiple VOICEVOX API-compliant engines on separate ports and handling them uniformly.
The editor launches each engine through its executable binary and manages settings and state individually, keyed by EngineID.
#### How to support the multi-engine feature
You can support it by creating an executable binary that launches a VOICEVOX API-compliant engine.
The easy way is to fork the VOICEVOX ENGINE repository and modify some of its functionality.

There are three things to modify: engine information, character information, and speech synthesis.

Engine information is managed in the manifest file (`engine_manifest.json`) directly under the root.
A manifest file in this format is required for VOICEVOX API-compliant engines.
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, some features that VOICEVOX has, such as morphing, may not be available.
In that case, change the information under `supported_features` in the manifest file as appropriate.

Character information is managed in files under the `resources/character_info` directory.
Dummy icons and other assets are provided, so change them as appropriate.

Speech synthesis is performed in `voicevox_engine/tts_pipeline/tts_engine.py`.
In the VOICEVOX API, speech synthesis works as follows: the engine creates the initial values of the synthesis query `AudioQuery` and returns them to the user, the user edits the query as needed, and the engine then synthesizes speech according to the query.
Query creation is handled by the `/audio_query` endpoint and speech synthesis by the `/synthesis` endpoint; supporting at least these two makes an engine VOICEVOX API-compliant.
#### How to distribute a multi-engine-compatible engine
Distributing it as a VVPP file is recommended.
VVPP stands for "VOICEVOX plugin package"; it is a Zip file of a directory containing the built engine and related files.
If the extension is `.vvpp`, it can be installed into the VOICEVOX editor by double-clicking.

The editor extracts the received VVPP file as a Zip onto the local disk, then looks up files according to the `engine_manifest.json` directly under the root.
If the VOICEVOX editor fails to load it, check the editor's error log.

A `xxx.vvpp` file can also be split and distributed as sequentially numbered `xxx.0.vvppp` files.
This is useful when the file is too large to distribute easily.
</details>
## Examples
**[voicevox-client](https://github.com/voicevox-client) [@voicevox-client](https://github.com/voicevox-client)**: API wrappers for VOICEVOX ENGINE in various languages
## License
VOICEVOX ENGINE is dual-licensed under LGPL v3 and a separate license that does not require publishing the source code.
To obtain the separate license, please ask Hiho.
X account: [@hiho_karuta](https://x.com/hiho_karuta)