metadata

title: Audio Agent
colorFrom: yellow
colorTo: blue
sdk: gradio
app_file: src/ui.py
pinned: true
license: apache-2.0
emoji: 🚀
short_description: An intelligent audio processing assistant powered by AI
sdk_version: 5.33.0
tags:
  - agent-demo-track

Audio Agent - Your AI Audio Assistant

An AI-powered audio processing assistant that helps you manipulate, analyze, and transcribe audio files through a simple web interface.

You can see the demo here: Demo

Features

🎚️ Audio Manipulation

Merge multiple audio files into one continuous track
Cut or trim specific sections from any file
Adjust volume levels (increase or decrease)
Normalize audio levels for consistency
Apply fade-in or fade-out effects for smooth transitions (mono audio only)
Change playback speed (faster or slower, with pitch change)
Reverse audio for creative effects
Remove silence from the beginning or end of files
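
The speed option in the list above notes that pitch changes along with speed. As a rough illustration, assuming the change is made by plain resampling rather than pitch-preserving time stretching (this README does not say which), the resulting duration and pitch shift can be estimated like this. `speed_change_effect` is a hypothetical helper, not part of the project:

```python
import math


def speed_change_effect(duration_s: float, speed: float) -> tuple[float, float]:
    """Estimate (new_duration_seconds, pitch_shift_semitones) for a speed factor.

    Assumption: speed is changed by resampling, so pitch scales with speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    new_duration = duration_s / speed
    # One octave (12 semitones) corresponds to a doubling of frequency.
    pitch_shift_semitones = 12 * math.log2(speed)
    return new_duration, pitch_shift_semitones
```

Under that assumption, a 60-second clip played at 1.5x would last 40 seconds and sound roughly 7 semitones higher.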

📝 Analysis & Transcription (English only)

Transcribe speech in audio to text
Analyze audio properties (duration, sample rate, etc.)

Supported Audio Formats: MP3, WAV, M4A, FLAC, AAC, OGG
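
If you want to filter files before uploading them, a small extension check along these lines may help. This is only a sketch: the names `SUPPORTED_EXTENSIONS`, `is_supported_audio`, and `split_by_support` are hypothetical, and the check looks at the file name only, not at the actual audio content.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (supported, unsupported), based on extension only."""
    supported = [p for p in paths if is_supported_audio(p)]
    unsupported = [p for p in paths if not is_supported_audio(p)]
    return supported, unsupported
```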

Requirements

Python 3.13 or higher
OpenAI API key
MCP (Model Context Protocol) Server for audio tools

Installation

Clone the repository

git clone <repository-url>
cd audio-agent

Install dependencies

The project uses Poetry for dependency management. All dependencies are defined in pyproject.toml.

Using Poetry (recommended):
```
poetry install
```
Or using pip:
```
pip install -e .
```

Configuration

Environment Variables

Create a .env file in the project root or set the following environment variables:

# Required: MCP Server endpoint for audio tools
MCP_SERVER=your_mcp_server_endpoint

# Optional: OpenAI API key (can also be provided in the UI)
OPENAI_API_KEY=sk-your-openai-api-key-here

Environment Variable Details

MCP_SERVER (Required): The endpoint URL for the MCP server that provides audio processing tools
OPENAI_API_KEY (Optional): Your OpenAI API key. If not set here, you can provide it through the web interface
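
The required/optional split described above could be checked at startup roughly as follows. This is a sketch, not the project's actual code: `Settings` and `read_settings` are hypothetical names, and the function reads from any mapping (by default `os.environ`), so it assumes the `.env` file has already been loaded into the environment.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    mcp_server: str
    openai_api_key: Optional[str]  # None means "ask for it in the UI"


def read_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read MCP_SERVER (required) and OPENAI_API_KEY (optional) from env."""
    mcp_server = env.get("MCP_SERVER", "").strip()
    if not mcp_server:
        raise RuntimeError("MCP_SERVER is required but not set")
    # Treat an empty or missing key as "not provided".
    api_key = env.get("OPENAI_API_KEY", "").strip() or None
    return Settings(mcp_server=mcp_server, openai_api_key=api_key)
```

Failing early on a missing `MCP_SERVER` gives a clearer message than an audio processing error later in the session.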

Usage

Running the Application

Start the web interface with:

python -m src.ui

The application will launch a Gradio web interface accessible at:

Local: http://localhost:7861
Public share URL (if enabled)

Using the Interface

Configure the Model: Select your preferred AI model and adjust settings in the right panel
Provide API Key: Enter your OpenAI API key if not set in environment variables
Upload Audio Files: Drag and drop or select audio files to process
Describe Your Task: Type what you want to do with the audio files
Get Results: The AI will process your request and provide the results

Example Requests

"Merge these two audio files and add a fade-in effect"
"Remove the silence at the beginning of this recording"
"Transcribe the speech in this audio file"
"Increase the volume of the first track and normalize both files"
"Cut out the middle section from 1:30 to 2:45"
"Make this audio play 1.5x faster"
"Apply a fade-out effect to the end of this track"

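Requests such as the 1:30 to 2:45 cut above refer to positions in the file by timestamp. How the agent interprets these is not described in this README; as an illustration only, a hypothetical helper `timestamp_to_seconds` could convert `m:ss` or `h:mm:ss` strings to second offsets like this:

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'ss', 'm:ss', or 'h:mm:ss' into a number of seconds."""
    parts = [int(p) for p in stamp.strip().split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"unsupported timestamp: {stamp!r}")
    seconds = 0
    for part in parts:
        # Each step to the left is worth 60 times more (seconds, minutes, hours).
        seconds = seconds * 60 + part
    return seconds
```

With this helper, 1:30 maps to 90 seconds and 2:45 to 165 seconds, so the section in that example is 75 seconds long.
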
Dependencies

The project relies on several key libraries:

LangGraph (0.4.8+): For building the AI agent workflow
Gradio (5.33.0+): For the web interface
LangChain OpenAI (0.3.21+): For OpenAI model integration
LangChain MCP Adapters (0.1.7+): For Model Context Protocol integration
dotenv (0.9.9+): For environment variable management

See pyproject.toml for the complete list of dependencies.

Troubleshooting

Common Issues

"Please configure the agent first"
- Ensure you've provided a valid OpenAI API key
- Check that the selected model is available
Audio processing errors
- Verify the MCP_SERVER environment variable is set correctly
- Ensure your audio files are in supported formats
- Check that the MCP server is running and accessible
Import errors
- Make sure all dependencies are installed: poetry install or pip install -e .
- Verify you're using Python 3.13 or higher
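
To check whether the MCP server is accessible, a quick TCP probe can rule out basic connectivity problems. The sketch below assumes `MCP_SERVER` is an `http://` or `https://` URL; `endpoint_reachable` is a hypothetical helper, and a successful connection only shows that something is listening, not that it is a working MCP server.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False
    try:
        # Fall back to the scheme's default port when none is given.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
    except ValueError:
        return False  # malformed port in the URL
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```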

Getting Help

If you encounter issues:

Check the console output for error messages
Verify your environment variables are set correctly
Ensure your audio files are in supported formats
Try a different AI model if one isn't working