---
title: Audio Agent
colorFrom: yellow
colorTo: blue
sdk: gradio
app_file: src/ui.py
pinned: true
license: apache-2.0
emoji: 🎧
short_description: An intelligent audio processing assistant powered by AI
sdk_version: 5.33.0
tags:
  - agent-demo-track
---
# Audio Agent - Your AI Audio Assistant
An intelligent audio processing assistant powered by AI that can help you manipulate, analyze, and transcribe audio files through a simple web interface.
You can see the live demo here: Demo
## Features
### Audio Manipulation
- Merge multiple audio files into one continuous track
- Cut or trim specific sections from any file
- Adjust volume levels (increase or decrease)
- Normalize audio levels for consistency
- Apply fade-in or fade-out effects for smooth transitions (Mono channel only)
- Change playback speed (faster or slower, with pitch change)
- Reverse audio for creative effects
- Remove silence from beginning or end of files
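As a rough illustration of the mono-only fade caveat above, here is a minimal, hypothetical linear fade-in over raw samples. The actual processing is done by the MCP server's audio tools, not by code in this repository:

```python
def fade_in(samples: list[float], fade_len: int) -> list[float]:
    """Apply a linear fade-in over the first fade_len samples.

    Hypothetical sketch of the mono-only fade effect listed above;
    the real implementation lives in the MCP audio tools.
    """
    out = list(samples)
    for i in range(min(fade_len, len(out))):
        out[i] *= i / fade_len  # ramp from silence toward full volume
    return out
```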
### Analysis & Transcription (English only)
- Transcribe speech in audio to text
- Analyze audio properties (duration, sample rate, etc.)
**Supported Audio Formats:** MP3, WAV, M4A, FLAC, AAC, OGG
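A quick way to pre-check uploads against this format list (hypothetical helper; the app's own validation may differ):

```python
from pathlib import Path

# Extensions from the supported-formats list above.
SUPPORTED_FORMATS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}

def is_supported(filename: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_FORMATS
```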
## Requirements
- Python 3.13 or higher
- OpenAI API key
- MCP (Model Context Protocol) Server for audio tools
## Installation
### Clone the repository

```bash
git clone <repository-url>
cd audio-agent
```
### Install dependencies

The project uses Poetry for dependency management. All dependencies are defined in `pyproject.toml`.

Using Poetry (recommended):

```bash
poetry install
```

Or using pip:

```bash
pip install -e .
```
## Configuration
### Environment Variables
Create a `.env` file in the project root or set the following environment variables:

```bash
# Required: MCP Server endpoint for audio tools
MCP_SERVER=your_mcp_server_endpoint

# Optional: OpenAI API key (can also be provided in the UI)
OPENAI_API_KEY=sk-your-openai-api-key-here
```
### Environment Variable Details

- `MCP_SERVER` (Required): The endpoint URL of the MCP server that provides the audio processing tools
- `OPENAI_API_KEY` (Optional): Your OpenAI API key. If not set here, you can provide it through the web interface
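A sketch of how the app might read these variables at startup (hypothetical helper; the actual code in `src/ui.py` may differ):

```python
import os

def load_config() -> dict:
    """Read the agent's settings from the environment.

    Hypothetical helper: MCP_SERVER is required, while OPENAI_API_KEY
    may be omitted because the web UI also accepts a key.
    """
    server = os.getenv("MCP_SERVER")
    if not server:
        raise RuntimeError("MCP_SERVER must be set (see Configuration)")
    return {"mcp_server": server, "openai_api_key": os.getenv("OPENAI_API_KEY")}
```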
## Usage
### Running the Application
Start the web interface with:

```bash
python -m src.ui
```

The application will launch a Gradio web interface accessible at:

- Local: http://localhost:7861
- Public share URL (if enabled)
### Using the Interface
1. **Configure the Model**: Select your preferred AI model and adjust settings in the right panel
2. **Provide API Key**: Enter your OpenAI API key if it is not set in environment variables
3. **Upload Audio Files**: Drag and drop or select audio files to process
4. **Describe Your Task**: Type what you want to do with the audio files
5. **Get Results**: The AI will process your request and return the results
### Example Requests
- "Merge these two audio files and add a fade-in effect"
- "Remove the silence at the beginning of this recording"
- "Transcribe the speech in this audio file"
- "Increase the volume of the first track and normalize both files"
- "Cut out the middle section from 1:30 to 2:45"
- "Make this audio play 1.5x faster"
- "Apply a fade-out effect to the end of this track"
## Dependencies
The project relies on several key libraries:
- LangGraph (0.4.8+): For building the AI agent workflow
- Gradio (5.33.0+): For the web interface
- LangChain OpenAI (0.3.21+): For OpenAI model integration
- LangChain MCP Adapters (0.1.7+): For Model Context Protocol integration
- dotenv (0.9.9+): For environment variable management
See `pyproject.toml` for the complete list of dependencies.
## Troubleshooting
### Common Issues
**"Please configure the agent first"**
- Ensure you've provided a valid OpenAI API key
- Check that the selected model is available
**Audio processing errors**
- Verify the MCP_SERVER environment variable is set correctly
- Ensure your audio files are in supported formats
- Check that the MCP server is running and accessible
**Import errors**

- Make sure all dependencies are installed: `poetry install` or `pip install -e .`
- Verify you're using Python 3.13 or higher
### Getting Help
If you encounter issues:
- Check the console output for error messages
- Verify your environment variables are set correctly
- Ensure your audio files are in supported formats
- Try with different AI models if one isn't working