<div align="center">
<img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
<h1>Expressive TTS Arena</h1>
<p>
<strong>A web application for comparing and evaluating the expressiveness of different text-to-speech models</strong>
</p>
</div>
## Overview
Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with [Gradio](https://www.gradio.app/), it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.
## Prerequisites
- [Python >=3.11.11](https://www.python.org/downloads/)
- [pip >=25.0](https://pypi.org/project/pip/)
- [uv >=0.5.29](https://github.com/astral-sh/uv)
- [Postgres](https://www.postgresql.org/download/)
- API keys for Hume AI, Anthropic, and ElevenLabs
## Project Structure
```
Expressive TTS Arena/
├── src/
│   ├── assets/
│   │   └── styles.css            # Defines custom CSS
│   ├── database/
│   │   ├── __init__.py           # Makes database a package; exposes ORM methods
│   │   ├── crud.py               # Defines operations for interacting with database
│   │   ├── database.py           # Sets up SQLAlchemy database connection
│   │   └── models.py             # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py           # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py      # Anthropic API integration
│   │   ├── elevenlabs_api.py     # ElevenLabs API integration
│   │   └── hume_api.py           # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py           # Makes scripts a package
│   │   ├── init_db.py            # Script for initializing database
│   │   └── test_db.py            # Script for testing database connection
│   ├── __init__.py               # Makes src a package
│   ├── app.py                    # Entry file
│   ├── config.py                 # Global config and logger setup
│   ├── constants.py              # Global constants
│   ├── custom_types.py           # Global custom types
│   ├── theme.py                  # Custom Gradio Theme
│   └── utils.py                  # Utility functions
├── static/
│   └── audio/                    # Directory for storing generated audio files
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
```
## Installation
1. Install [uv](https://docs.astral.sh/uv/), the package manager this project uses, by following the [installation instructions](https://docs.astral.sh/uv/getting-started/installation/) for your platform.
2. Configure environment variables:
- Create a `.env` file based on `.env.example`
- Add your API keys:
```txt
HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
```
3. Run the application:
Standard
```sh
uv run python -m src.main
```
With hot-reloading
```sh
uv run watchfiles "python -m src.main" src
```
4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).
5. (Optional) If contributing, install the pre-commit hook for automatic file formatting:
```sh
uv run pre-commit install
```
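
Step 2 above lists three API keys the application expects. As a rough sketch (not part of this repository), a small helper like the one below could report which of them are unset before the app starts. The function name `missing_keys` is hypothetical, and the sketch assumes the values from `.env` have already been loaded into a mapping such as the process environment; how the app itself loads `.env` is not covered here.

```python
from typing import Mapping

# Variable names copied from the `.env` example in step 2.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in `env`.

    Hypothetical helper for illustration; it does not validate the keys
    against the providers, it only checks that a non-empty value exists.
    """
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Passing `os.environ` to `missing_keys` and stopping with a clear message when the returned list is non-empty is one way to surface a configuration problem early, rather than at the first API call.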
## User Flow
1. **Choose or enter a character description**: Select a sample from the list or enter your own to guide text and voice generation.
2. **Generate text**: Click **"Generate Text"** to create dialogue based on the character. The generated text will appear in the input field automatically; edit it if needed.
3. **Synthesize speech**: Click **"Synthesize Speech"** to send your text and character description to two TTS APIs. Each API generates a voice and synthesizes speech in that voice.
4. **Listen & compare**: Play both audio options and assess their expressiveness.
5. **Vote for the best**: Click **"Select Option A"** or **"Select Option B"** to choose the most expressive output.
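
The compare-and-vote steps above can be sketched in a few lines of Python. This is an illustration only, not the code in `src/`: the names `Comparison`, `assign_options`, and `record_vote` are hypothetical, and the random assignment of providers to Option A and Option B is an assumption, since this README does not say how the app orders the two outputs.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Maps the anonymous option labels to the provider behind each one."""

    option_a: str
    option_b: str

    def provider_for(self, choice: str) -> str:
        """Resolve a vote for 'A' or 'B' to the provider name."""
        if choice == "A":
            return self.option_a
        if choice == "B":
            return self.option_b
        raise ValueError(f"choice must be 'A' or 'B', got {choice!r}")


def assign_options(providers: tuple[str, str], rng: random.Random) -> Comparison:
    """Randomly decide which provider is presented as Option A (assumed behavior)."""
    first, second = rng.sample(providers, k=2)
    return Comparison(option_a=first, option_b=second)


def record_vote(tally: Counter, comparison: Comparison, choice: str) -> str:
    """Add one vote for the provider behind the chosen option and return its name."""
    winner = comparison.provider_for(choice)
    tally[winner] += 1
    return winner
```

A session would call `assign_options(("Hume AI", "ElevenLabs"), random.Random())` once per comparison and `record_vote(tally, comparison, "A")` when the user clicks **"Select Option A"**. Hiding the provider names behind neutral labels is a common way to keep the vote focused on what the listener hears.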
## License
This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.