
Expressive TTS Arena

A web application for comparing and evaluating the expressiveness of different text-to-speech models

Overview

Expressive TTS Arena is an open-source web application that lets users compare text-to-speech outputs with a focus on expressiveness rather than audio quality alone. Built with Gradio, it provides a single interface for generating speech from different providers, including Hume AI and ElevenLabs, and comparing the results side by side.

Prerequisites

    • The uv package manager
    • API keys for Hume AI, Anthropic, and ElevenLabs

Project Structure

Expressive TTS Arena/
├── src/
│   ├── assets/
│   │   └── styles.css          # Defines custom CSS
│   ├── database/
│   │   ├── __init__.py         # Makes database a package; exposes ORM methods
│   │   ├── crud.py             # Defines operations for interacting with the database
│   │   ├── database.py         # Sets up the SQLAlchemy database connection
│   │   └── models.py           # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py         # Makes scripts a package
│   │   ├── init_db.py          # Script for initializing the database
│   │   └── test_db.py          # Script for testing the database connection
│   ├── __init__.py             # Makes src a package
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── custom_types.py         # Global custom types
│   ├── theme.py                # Custom Gradio theme
│   └── utils.py                # Utility functions
├── static/
│   └── audio/                  # Directory for storing generated audio files
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
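The database/ package is described only at the file level above. As a rough illustration of the kind of operation crud.py might hold, here is a minimal sketch that records votes and counts wins per provider. It uses Python's built-in sqlite3 module in place of SQLAlchemy so it runs without extra dependencies; the votes table, its columns, and the Vote, create_tables, create_vote, and count_wins names are all hypothetical and do not reflect the project's actual models.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Vote:
    """Hypothetical record of one comparison; fields are illustrative only."""
    text: str
    option_a_provider: str
    option_b_provider: str
    winner: str  # provider name of the selected option


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the hypothetical votes table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS votes (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            text TEXT NOT NULL,
            option_a_provider TEXT NOT NULL,
            option_b_provider TEXT NOT NULL,
            winner TEXT NOT NULL
        )
        """
    )
    conn.commit()


def create_vote(conn: sqlite3.Connection, vote: Vote) -> int:
    """Insert one vote and return its row id (a CRUD-style 'create')."""
    cur = conn.execute(
        "INSERT INTO votes (text, option_a_provider, option_b_provider, winner) "
        "VALUES (?, ?, ?, ?)",
        (vote.text, vote.option_a_provider, vote.option_b_provider, vote.winner),
    )
    conn.commit()
    return cur.lastrowid


def count_wins(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the number of wins recorded for each provider."""
    rows = conn.execute("SELECT winner, COUNT(*) FROM votes GROUP BY winner").fetchall()
    return {winner: count for winner, count in rows}


if __name__ == "__main__":
    # An in-memory database keeps the example free of files on disk.
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    create_vote(conn, Vote("Hello there.", "Hume AI", "ElevenLabs", "Hume AI"))
    print(count_wins(conn))
```

Parameterized queries (the ? placeholders) keep user-supplied text out of the SQL string itself, which an ORM such as SQLAlchemy also handles for you.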

Installation

  1. This project uses the uv package manager. Follow the installation instructions for your platform in the uv documentation.

  2. Configure environment variables:

    • Create a .env file based on .env.example
    • Add your API keys:
    HUME_API_KEY=YOUR_HUME_API_KEY
    ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
    ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
    
  3. Run the application:

    Standard

    uv run python -m src.main
    

    With hot-reloading

    uv run watchfiles "python -m src.main" src
    
  4. Test the application by opening the localhost URL in your browser (e.g. localhost:7860 or http://127.0.0.1:7860).

  5. (Optional) If contributing, install the pre-commit hook for automatic file formatting:

    uv run pre-commit install
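The .env file from step 2 has to contain all three API keys before the app can call the providers. The sketch below checks for them, assuming a plain KEY=VALUE file. It is not how the project itself loads configuration (the contents of config.py are not shown in this README), and parse_env and missing_keys are hypothetical helpers. Treating values that start with YOUR_ as unset is also an assumption, based on the placeholders in the example above.

```python
import os
from pathlib import Path

# Key names taken from the .env example in step 2.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no quoting, escaping, or variable expansion.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent, empty, or still a YOUR_... placeholder."""
    return [
        key
        for key in required
        if not values.get(key) or values[key].startswith("YOUR_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env(env_path.read_text()) if env_path.exists() else {}
    # In this sketch, variables already exported in the shell override the file.
    shell_values = {k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ}
    missing = missing_keys({**file_values, **shell_values})
    print("Missing keys:", ", ".join(missing) if missing else "none")
```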
    

User Flow

  1. Choose or enter a character description: Select a sample from the list or enter your own to guide text and voice generation.
  2. Generate text: Click "Generate Text" to create dialogue based on the character. The generated text appears in the input field automatically; edit it if needed.
  3. Synthesize speech: Click "Synthesize Speech" to send your text and character description to two TTS APIs. Each API generates a voice and synthesizes speech in that voice.
  4. Listen & compare: Play both audio options and assess their expressiveness.
  5. Vote for the best: Click "Select Option A" or "Select Option B" to choose the more expressive output.
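The flow above can be sketched in a few lines of Python, with placeholder functions standing in for the real Hume AI and ElevenLabs integrations. Two details go beyond what this README states and are assumptions: that the two providers are shuffled so the listener does not know which output came from which provider, and that votes are tallied per provider. Synthesizer, Comparison, synthesize_pair, and record_vote are hypothetical names, not the project's API.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (text, character_description) -> audio bytes.
Synthesizer = Callable[[str, str], bytes]


@dataclass
class Comparison:
    """Two synthesized outputs, shown to the user only as Option A and Option B."""
    text: str
    options: dict[str, tuple[str, bytes]]  # label -> (provider name, audio)


def synthesize_pair(
    text: str,
    description: str,
    providers: dict[str, Synthesizer],
    rng: random.Random,
) -> Comparison:
    """Step 3: send the same text and description to two providers, in random order."""
    if len(providers) != 2:
        raise ValueError("exactly two providers are required")
    names = list(providers)
    rng.shuffle(names)  # hide which provider ends up as A or B
    options = {
        label: (name, providers[name](text, description))
        for label, name in zip(("A", "B"), names)
    }
    return Comparison(text=text, options=options)


def record_vote(comparison: Comparison, selected: str, tally: dict[str, int]) -> str:
    """Step 5: map "A" or "B" back to its provider and count the win."""
    provider, _audio = comparison.options[selected]
    tally[provider] = tally.get(provider, 0) + 1
    return provider


if __name__ == "__main__":
    # Placeholder synthesizers that return fixed bytes instead of calling real APIs.
    fake = {
        "Hume AI": lambda text, desc: b"hume-audio",
        "ElevenLabs": lambda text, desc: b"elevenlabs-audio",
    }
    comparison = synthesize_pair("Ahoy!", "A cheerful pirate", fake, random.Random(0))
    tally: dict[str, int] = {}
    winner = record_vote(comparison, "A", tally)
    print(winner, tally)
```

Passing the random generator in as a parameter makes the A/B assignment reproducible in tests (with a seeded random.Random) while an unseeded instance varies it in normal use.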

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.