Expressive TTS Arena
A web application for comparing and evaluating the expressiveness of different text-to-speech models.
Overview
Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with Gradio, it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.
Prerequisites
- Python >=3.11.11
- pip >=25.0
- uv >=0.5.29
- Postgres
- API keys for Hume AI, Anthropic, and ElevenLabs
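The Python and tool requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the project: the helper names `python_ok` and `missing_tools` are hypothetical, and only the presence of `pip` and `uv` on PATH is checked, not their versions.

```python
import shutil
import sys

# Minimum Python version copied from the Prerequisites list.
MIN_PYTHON = (3, 11, 11)
# Hypothetical check list: only presence on PATH is verified, not versions.
REQUIRED_TOOLS = ("pip", "uv")


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, micro) tuple meets the minimum version."""
    if version is None:
        version = tuple(sys.version_info[:3])
    return tuple(version[:3]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

Postgres and the API keys are not covered by this sketch, since the README does not say how the application connects to them.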
Project Structure
Expressive TTS Arena/
├── src/
│   ├── assets/
│   │   └── styles.css           # Defines custom CSS
│   ├── database/
│   │   ├── __init__.py          # Makes database a package; exposes ORM methods
│   │   ├── crud.py              # Defines operations for interacting with database
│   │   ├── database.py          # Sets up SQLAlchemy database connection
│   │   └── models.py            # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py          # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py     # Anthropic API integration
│   │   ├── elevenlabs_api.py    # ElevenLabs API integration
│   │   └── hume_api.py          # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py          # Makes scripts a package
│   │   ├── init_db.py           # Script for initializing database
│   │   └── test_db.py           # Script for testing database connection
│   ├── __init__.py              # Makes src a package
│   ├── app.py                   # Entry file
│   ├── config.py                # Global config and logger setup
│   ├── constants.py             # Global constants
│   ├── custom_types.py          # Global custom types
│   ├── theme.py                 # Custom Gradio Theme
│   └── utils.py                 # Utility functions
├── static/
│   └── audio/                   # Directory for storing generated audio files
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
Installation
This project uses the uv package manager. Follow the installation instructions for your platform in the uv documentation.
Configure environment variables:
- Create a .env file based on .env.example
- Add your API keys:
HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
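A quick way to catch a missing or unedited key before starting the app is to check the three variable names from the .env example. This is a rough sketch under assumptions: `missing_keys` is a hypothetical helper, and it reads the process environment, so it assumes the .env values have already been exported or loaded by whatever mechanism the project uses (the README does not specify one).

```python
import os

# Variable names taken from the .env example in this README.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return required names that are unset, blank, or still the YOUR_... placeholder."""
    if env is None:
        env = os.environ
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value == f"YOUR_{name}":
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_keys()
    if problems:
        raise SystemExit("Missing or placeholder values: " + ", ".join(problems))
    print("All required API keys are set.")
```

Passing a plain dict as `env` makes the check easy to exercise without touching real credentials.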
Run the application:
Standard
uv run python -m src.main
With hot-reloading
uv run watchfiles "python -m src.main" src
Test the application by navigating to the localhost URL in your browser (e.g. localhost:7860 or http://127.0.0.1:7860).
(Optional) If contributing, install the pre-commit hook for automatic file formatting:
uv run pre-commit install
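If the browser shows nothing at the localhost URL mentioned above, it can help to confirm that something is actually listening on the port. The sketch below is illustrative only: `is_listening` is a hypothetical helper, and the default port 7860 is taken from the example URL, so adjust it if the app reports a different one at startup.

```python
import socket


def is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("App reachable:", is_listening())
```

A successful connection only shows that some process accepts connections on that port; it does not confirm that the process is this application.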
User Flow
- Choose or enter a character description: Select a sample from the list or enter your own to guide text and voice generation.
- Generate text: Click "Generate Text" to create dialogue based on the character. The generated text will appear in the input field automatically; edit it if needed.
- Synthesize speech: Click "Synthesize Speech" to send your text and character description to two TTS APIs. Each API generates a voice and synthesizes speech in that voice.
- Listen & compare: Play both audio options and assess their expressiveness.
- Vote for the best: Click "Select Option A" or "Select Option B" to choose the most expressive output.
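To make the voting step concrete, here is one way the A/B votes from the flow above could be summarized per provider. This is a sketch under assumptions, not the project's database schema or ranking method: the `Comparison` class and `win_rates` function are hypothetical, and each comparison is assumed to pit two different providers against each other.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One A/B comparison: the provider behind each option and the user's vote."""

    option_a: str  # provider behind Option A, e.g. "Hume AI"
    option_b: str  # provider behind Option B, e.g. "ElevenLabs"
    selected: str  # "A" or "B"

    def winner(self):
        """Return the provider name of the selected option."""
        if self.selected not in ("A", "B"):
            raise ValueError("selected must be 'A' or 'B'")
        return self.option_a if self.selected == "A" else self.option_b


def win_rates(comparisons):
    """Return {provider: wins / appearances} across all comparisons."""
    wins, appearances = Counter(), Counter()
    for comparison in comparisons:
        wins[comparison.winner()] += 1
        appearances[comparison.option_a] += 1
        appearances[comparison.option_b] += 1
    return {provider: wins[provider] / count for provider, count in appearances.items()}


if __name__ == "__main__":
    votes = [
        Comparison("Hume AI", "ElevenLabs", "A"),
        Comparison("ElevenLabs", "Hume AI", "A"),
        Comparison("Hume AI", "ElevenLabs", "A"),
    ]
    print(win_rates(votes))
```

Storing which provider sat behind Option A and Option B, rather than only the letter chosen, is what allows votes to be attributed after the option order is shuffled.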
License
This project is licensed under the MIT License - see the LICENSE.txt file for details.