Expressive TTS Arena

A web application for comparing and evaluating the expressiveness of different text-to-speech models.

Overview

Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with Gradio, it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.

Prerequisites

Python >=3.11.11
pip >=25.0
uv >=0.5.29
Postgres
API keys for Hume AI, Anthropic, and ElevenLabs
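
The version requirements above can be checked with a short script before installing. This is a minimal sketch, not part of the project: the helper names `parse_version`, `meets_minimum`, and `report` are hypothetical, and the script only checks the Python version and whether a `uv` executable is on `PATH`. It does not verify the pip or uv version numbers or the Postgres installation.

```python
import shutil
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "3.11.11" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(current: tuple[int, ...], minimum: str) -> bool:
    """Return True when `current` is at least the `minimum` dotted version."""
    return current >= parse_version(minimum)


def report() -> dict[str, bool]:
    """Check the running interpreter and look for a uv executable on PATH."""
    return {
        "python>=3.11.11": meets_minimum(tuple(sys.version_info[:3]), "3.11.11"),
        "uv on PATH": shutil.which("uv") is not None,
    }


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'ok' if ok else 'not satisfied'}")
```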

Project Structure

Expressive TTS Arena/
├── src/
│   ├── assets/
│   │   └── styles.css          # Defines custom CSS
│   ├── database/
│   │   ├── __init__.py         # Makes database a package; exposes ORM methods
│   │   ├── crud.py             # Defines operations for interacting with database
│   │   ├── database.py         # Sets up SQLAlchemy database connection
│   │   └── models.py           # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py         # Makes scripts a package
│   │   ├── init_db.py          # Script for initializing database
│   │   └── test_db.py          # Script for testing database connection
│   ├── __init__.py             # Makes src a package
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── custom_types.py         # Global custom types
│   ├── theme.py                # Custom Gradio Theme
│   └── utils.py                # Utility functions
├── static/
│   └── audio/                  # Directory for storing generated audio files
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock

Installation

This project uses the uv package manager. Follow the installation instructions for your platform in the uv documentation.

Configure environment variables:

Create a .env file based on .env.example
Add your API keys:

HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY

Run the application:

Standard

uv run python -m src.app

With hot-reloading

uv run watchfiles "python -m src.app" src

Test the application by navigating to the localhost URL in your browser (e.g. localhost:7860 or http://127.0.0.1:7860).
(Optional) If contributing, install the pre-commit hook for automatic file formatting:
```
uv run pre-commit install
```
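
The three API keys from the configuration step can be sanity-checked before launching the app. The sketch below is an illustration only: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of the project, and the function reads the process environment, so it assumes the values from `.env` have already been loaded or exported. It also treats the placeholder values copied from the example above (such as `YOUR_HUME_API_KEY`) as missing.

```python
import os
from collections.abc import Mapping

# Variable names taken from the configuration step above.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset, blank, or still placeholders."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # "YOUR_<NAME>" is the placeholder pattern shown in the example values.
        if not value or value == f"YOUR_{key}":
            missing.append(key)
    return missing


if __name__ == "__main__":
    problems = missing_keys()
    if problems:
        print("Missing or placeholder values: " + ", ".join(problems))
    else:
        print("All required API keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching real credentials.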

User Flow

Choose or enter a character description: Select a sample from the list or enter your own to guide text and voice generation.
Generate text: Click "Generate Text" to create dialogue based on the character. The generated text will appear in the input field automatically; edit it if needed.
Synthesize speech: Click "Synthesize Speech" to send your text and character description to two TTS APIs. Each API generates a voice and synthesizes speech in that voice.
Listen & compare: Play both audio options and assess their expressiveness.
Vote for the best: Click "Select Option A" or "Select Option B" to choose the more expressive output.
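
The compare-and-vote steps above can be pictured with a small sketch. It is not the application's actual implementation: the `Comparison` class and `assign_options` function are hypothetical, and the random assignment of providers to Option A and Option B is an assumption made for illustration, since this README does not describe how the options are ordered.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Comparison:
    """Which provider sits behind each option in one comparison round."""

    option_a: str
    option_b: str

    def winner(self, choice: str) -> str:
        """Map a vote ("A" or "B") back to the provider that produced that audio."""
        if choice == "A":
            return self.option_a
        if choice == "B":
            return self.option_b
        raise ValueError(f"choice must be 'A' or 'B', got {choice!r}")


def assign_options(
    providers: tuple[str, str], rng: Optional[random.Random] = None
) -> Comparison:
    """Place the two providers behind Option A and Option B in random order (assumed)."""
    rng = rng or random.Random()
    first, second = rng.sample(providers, k=2)
    return Comparison(option_a=first, option_b=second)


if __name__ == "__main__":
    round_ = assign_options(("Hume AI", "ElevenLabs"))
    print("Vote for Option A goes to:", round_.winner("A"))
```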

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.