# -*- coding: utf-8 -*-
"""
@author:XuMing([email protected])
@description: Re-train by TWMAN
"""

import hashlib
import os
import ssl
import subprocess

import gradio as gr
import torch
from loguru import logger
import nltk

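# Use an unverified HTTPS context to avoid certificate errors during downloads.
# Note: this disables TLS certificate verification for the whole process.
ssl._create_default_https_context = ssl._create_unverified_context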

# Download the required NLTK resources if they are not already present
nltk_data_path = os.path.expanduser('~/nltk_data')
if not os.path.exists(os.path.join(nltk_data_path, 'corpora/cmudict.zip')):
    nltk.download('cmudict', download_dir=nltk_data_path)
if not os.path.exists(os.path.join(nltk_data_path, 'taggers/averaged_perceptron_tagger.zip')):
    nltk.download('averaged_perceptron_tagger', download_dir=nltk_data_path)

import sys
# Put the local ForkLangSegment directory first on the import path
sys.path.insert(0, os.path.abspath("ForkLangSegment"))

# Import parrots
from parrots import TextToSpeech

# Select the device and precision (half precision only on GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
logger.info(f"device: {device}")
half = device == "cuda"

# Initialize the TTS model
m = TextToSpeech(
    speaker_model_path="DeepLearning101/GPT-SoVITS_TWMAN",
    speaker_name="TWMAN",
    device=device,
    half=half
)

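# Audio generation logic
def get_text_hash(text: str):
    """Return the MD5 hex digest of the text, used as a cache key."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()

def do_tts_wav_predict(text: str, output_path: str = None):
    """Synthesize `text` to a WAV file and return its path.

    Without an explicit path, the file name is derived from the text hash,
    so identical text reuses the previously generated file.
    """
    if output_path is None:
        output_path = f"output_audio_{get_text_hash(text)}.wav"
    # Skip synthesis when the output file already exists
    if not os.path.exists(output_path):
        m.predict(text, text_language="auto", output_path=output_path)
    return output_path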
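

# Illustrative sketch, not part of the original app: the functions above cache
# audio by naming each file after the MD5 of its input text. The helper below
# shows the same idea with a dedicated cache directory. The name
# `cached_wav_path` and the `cache_dir` parameter are hypothetical, and the
# default location under the system temp directory is an assumption.
import hashlib
import os
import tempfile


def cached_wav_path(text: str, cache_dir: str = None) -> str:
    """Return a deterministic .wav path for `text` inside `cache_dir`."""
    if cache_dir is None:
        cache_dir = os.path.join(tempfile.gettempdir(), "tts_cache")
    # Create the cache directory on first use
    os.makedirs(cache_dir, exist_ok=True)
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"output_audio_{digest}.wav")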

# Gradio WebUI setup
with gr.Blocks(title="TTS WebUI") as app:
    gr.Markdown("""
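    # Online Speech Synthesis (TWMAN)
    # [TonTon Huang Ph.D.](https://www.twman.org) | [手把手帶你一起踩AI坑](https://blog.twman.org/p/deeplearning101.html)

    #### Please comply strictly with applicable laws. When publishing derivative works, credit this project's author with a link, and state that the audio was generated with GPT-SoVITS AI!
    ⚠️ Note: online generation can be slow; running inference locally is recommended.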
    
    - [Parrots project](https://github.com/shibing624/parrots)
    - [Model usage guide](https://github.com/RVC-Boss/GPT-SoVITS)
    
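    - [Deep Learning 101 Github](https://github.com/Deep-Learning-101) | [Deep Learning 101](http://deeplearning101.twman.org)
    - [台灣人工智慧社團 FB](https://www.facebook.com/groups/525579498272187/) | [YouTube](https://www.youtube.com/c/DeepLearning101)
    - [那些 AI Agent 要踩的坑](https://blog.twman.org/2025/03/AIAgent.html): experience and challenges in applying a range of AI agent tools, with practical lessons and tool recommendations.
    - [白話文手把手帶你科普 GenAI](https://blog.twman.org/2024/08/LLM.html): a plain-language introduction to the core concepts of generative AI, stressing the importance of hardware resources and data.
    - [大型語言模型直接就打完收工？](https://blog.twman.org/2024/09/LLM.html): a look back at exploring the LLM field, discussing why hardware upgrades matter for AI development.
    - [那些檢索增強生成要踩的坑](https://blog.twman.org/2024/07/RAG.html): applications and challenges of RAG, with practical experience and tool suggestions.
    - [那些大型語言模型要踩的坑](https://blog.twman.org/2024/02/LLM.html): applications and challenges of a range of LLM tools, stressing the importance of hardware resources.
    - [Large Language Model，LLM](https://blog.twman.org/2023/04/GPT.html): the development and applications of LLMs, stressing the key role of hardware resources in development.
    - [ComfyUI + Stable Diffusion](https://blog.twman.org/2024/11/diffusion.html): an in-depth look at applying image generation and segmentation techniques, stressing the importance of hardware resources.
    - [那些ASR和TTS可能會踩的坑](https://blog.twman.org/2024/02/asr-tts.html): problems encountered when applying ASR and TTS, stressing the importance of data quality.
    - [那些自然語言處理踩的坑](https://blog.twman.org/2021/04/NLP.html): hands-on experience in NLP, stressing how data quality affects model performance.
    - [那些語音處理 (Speech Processing) 踩的坑](https://blog.twman.org/2021/04/ASR.html): hands-on experience in speech processing, stressing how data quality affects model performance.
    - [用PPOCRLabel來幫PaddleOCR做OCR的微調和標註](https://blog.twman.org/2023/07/wsl.html)
    - [基於機器閱讀理解和指令微調的統一信息抽取框架之診斷書醫囑資訊擷取分析](https://blog.twman.org/2023/07/HugIE.html)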
    """)
    
    with gr.Group():
        gr.Markdown("🔤 Enter the text to synthesize:")
        with gr.Row():
            text = gr.Textbox(
                label="Input text (100 characters or fewer recommended)",
                value="床前明月光，疑是地上霜。舉頭望明月，低頭思故鄉。",
                placeholder="Enter text...",
                lines=3
            )
            inference_button = gr.Button("🎤 Synthesize speech", variant="primary")
            output = gr.Audio(label="🔊 Synthesized speech")

        inference_button.click(
            do_tts_wav_predict,
            [text],
            [output],
        )

# Launch the Gradio app
app.queue(max_size=10)
app.launch(share=True, inbrowser=True)