Spaces: fdaudens/kokoro-mcp

fdaudens (HF Staff) committed on Apr 30

Commit a132885 (verified) · 1 Parent(s): f50b82e

Upload 3 files

Files changed (3)

README.md +33 -12
kokoro_text_to_audio.py +81 -0
requirements.txt +5 -0

README.md CHANGED

@@ -1,12 +1,33 @@
----
-title: Kokoro Mcp
-emoji: 🐠
-colorFrom: yellow
-colorTo: indigo
-sdk: gradio
-sdk_version: 5.28.0
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Kokoro Text-to-Audio App
+A simple Gradio application that uses the hexgrad/Kokoro-82M model to convert text to audio.
+## Setup Instructions
+1. Install the required dependencies:
+   ```
+   pip install -r requirements.txt
+   ```
+2. Run the application:
+   ```
+   python kokoro_text_to_audio.py
+   ```
+3. Open your web browser and navigate to the URL displayed in the terminal (typically http://127.0.0.1:7860)
+## Features
+- Simple text input box for entering the text you want to convert to audio
+- Adjustable speech speed slider
+- Audio playback directly in the browser
+## Requirements
+- Python 3.8 or higher
+- GPU is recommended for faster generation, but not required
+- Internet connection (to download the model on first run)
+## Model Information
+This app uses the [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) model from Hugging Face.
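Note on the "Adjustable speech speed slider" feature: the script in this commit passes the slider value to `librosa.effects.time_stretch` as `rate`, where a rate above 1 shortens the clip and a rate below 1 lengthens it. A minimal sketch of the resulting clip length follows; the helper name `stretched_duration` is hypothetical and not part of the commit, and the exact sample count librosa returns may differ by rounding.

```python
def stretched_duration(num_samples, sample_rate, rate):
    """Approximate clip length in seconds after time-stretching by `rate`.

    Assumes the usual convention that rate > 1 speeds audio up,
    so the stretched clip has roughly num_samples / rate samples.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_samples / (sample_rate * rate)


# A 2-second clip at 24 kHz, evaluated at the slider's two extremes.
slow = stretched_duration(48000, 24000, 0.5)
fast = stretched_duration(48000, 24000, 1.5)
print(f"rate 0.5: {slow:.2f} s, rate 1.5: {fast:.2f} s")
```

With the slider limited to 0.5 through 1.5, the output is at most twice as long and at least two thirds as long as the unstretched clip.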

kokoro_text_to_audio.py ADDED

@@ -0,0 +1,81 @@

+import gradio as gr
+import torch
+from transformers import AutoModelForTextToWaveform, AutoProcessor
+# Load model and processor
+model_name = "hexgrad/Kokoro-82M"
+processor = AutoProcessor.from_pretrained(model_name)
+model = AutoModelForTextToWaveform.from_pretrained(model_name, torch_dtype=torch.float16)
+# Move to GPU if available
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = model.to(device)
+def text_to_audio(text, speed=1.0):
+    """Convert text to audio using Kokoro model"""
+    # Process the input text
+    inputs = processor(text=text, return_tensors="pt")
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    # Set generation parameters
+    gen_kwargs = {
+        "do_sample": True,
+        "temperature": 0.7,
+        "length_penalty": 1.0,
+        "repetition_penalty": 2.0,
+        "top_p": 0.9,
+    }
+    # Generate waveform
+    with torch.no_grad():
+        waveform = model.generate(**inputs, **gen_kwargs).cpu().numpy()[0]
+    # Kokoro generates audio at a 24 kHz sample rate
+    sample_rate = 24000
+    # Apply speed factor if needed
+    if speed != 1.0:
+        import numpy as np
+        import librosa
+        waveform = librosa.effects.time_stretch(waveform.astype(np.float32), rate=speed)
+    return sample_rate, waveform
+# Create Gradio interface
+with gr.Blocks(title="Kokoro Text-to-Audio") as app:
+    gr.Markdown("# 🎵 Kokoro Text-to-Audio Converter")
+    gr.Markdown("Convert text to speech using hexgrad/Kokoro-82M model")
+    with gr.Row():
+        with gr.Column():
+            text_input = gr.Textbox(
+                label="Enter your text",
+                placeholder="Type something to convert to audio...",
+                lines=5
+            )
+            speed_slider = gr.Slider(
+                minimum=0.5,
+                maximum=1.5,
+                value=1.0,
+                step=0.1,
+                label="Speech Speed"
+            )
+            submit_btn = gr.Button("Generate Audio")
+        with gr.Column():
+            audio_output = gr.Audio(label="Generated Audio", type="numpy")
+    submit_btn.click(
+        fn=text_to_audio,
+        inputs=[text_input, speed_slider],
+        outputs=[audio_output]
+    )
+    gr.Markdown("### Usage Tips")
+    gr.Markdown("- For best results, keep your text reasonably short")
+    gr.Markdown("- Adjust the speed slider to modify the pace of speech")
+    gr.Markdown("- The model may take a moment to load on first use")
+# Launch the app
+if __name__ == "__main__":
+    app.launch()
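Note on the return value of `text_to_audio`: the model is loaded in `torch.float16`, so when the speed is left at 1.0 the function hands a half-precision array straight to `gr.Audio`. One way to make the output format explicit is to convert to 16-bit PCM before returning. The sketch below is an illustration, not part of the commit: the helper name `to_int16_pcm` is hypothetical, and it assumes the waveform is a float array with samples nominally in [-1, 1].

```python
import numpy as np


def to_int16_pcm(waveform):
    """Convert a float waveform to 16-bit PCM samples.

    Assumes samples are nominally in [-1, 1]; anything outside
    that range is clipped rather than rescaled.
    """
    # Upcast first so half-precision input does not lose range or precision.
    samples = np.asarray(waveform, dtype=np.float32).squeeze()
    samples = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range; astype truncates toward zero.
    return (samples * 32767.0).astype(np.int16)


# Example: a single-row float16 array, similar in shape to a batch of one.
pcm = to_int16_pcm(np.array([[0.0, 0.5, 1.0, -2.0]], dtype=np.float16))
print(pcm.dtype, pcm.shape)
```

In the script, this would be applied to `waveform` just before `return sample_rate, waveform`. Clipping is a design choice here: it preserves relative loudness between clips, whereas peak normalization would make every clip equally loud.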

requirements.txt ADDED

@@ -0,0 +1,5 @@

+gradio>=3.50.2
+torch>=2.0.0
+transformers>=4.34.0
+librosa>=0.10.0
+numpy>=1.22.0
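Note on the pins above: every line uses a `>=` lower bound, so any newer release satisfies it, including the Gradio 5.28.0 named in the removed README front matter. A rough sketch of how such a bound is evaluated follows. The helper names are hypothetical, and the parsing handles only plain `name>=X.Y.Z` lines with numeric components; a real check would use the `packaging` library instead.

```python
def version_tuple(version, width=3):
    """Turn '4.34' into (4, 34, 0) so versions compare component-wise."""
    parts = [int(p) for p in version.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def parse_minimum(requirement):
    """Split a 'name>=version' line into (name, minimum version tuple)."""
    name, sep, version = requirement.strip().partition(">=")
    if not sep or not version:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    return name.strip(), version_tuple(version)


def satisfies(installed, requirement):
    """Return True if the installed version meets the lower bound."""
    _, minimum = parse_minimum(requirement)
    return version_tuple(installed) >= minimum


print(parse_minimum("gradio>=3.50.2"))
print(satisfies("5.28.0", "gradio>=3.50.2"))
```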