---
title: Audio Agent
colorFrom: yellow
colorTo: blue
sdk: gradio
app_file: src/ui.py
pinned: true
license: apache-2.0
emoji: 🚀
short_description: An intelligent audio processing assistant powered by AI
sdk_version: 5.33.0
tags:
- agent-demo-track
---
# Audio Agent - Your AI Audio Assistant

An intelligent audio processing assistant powered by AI that can help you manipulate, analyze, and transcribe audio files through a simple web interface.


Watch the [demo video](https://youtu.be/BYYnWm-yJMo).

## Features

🎚️ **Audio Manipulation**
- Merge multiple audio files into one continuous track
- Cut or trim specific sections from any file
- Adjust volume levels (increase or decrease)
- Normalize audio levels for consistency
- Apply fade-in or fade-out effects for smooth transitions (mono audio only)
- Change playback speed (faster or slower, with pitch change)
- Reverse audio for creative effects
- Remove silence from the beginning or end of a file

📝 **Analysis & Transcription** (English only)
- Transcribe speech in audio to text
- Analyze audio properties (duration, sample rate, etc.)

**Supported Audio Formats**: MP3, WAV, M4A, FLAC, AAC, OGG
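
To check a file against this list before uploading, a small helper can compare its extension. This is a minimal sketch: `SUPPORTED_EXTENSIONS` and `is_supported` are illustrative names that are not part of the project, and the check looks only at the file name, not at how the audio is actually encoded.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["interview.MP3", "notes.txt"]:
        print(name, is_supported(name))
```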

## Requirements

- Python 3.13 or higher
- OpenAI API key
- MCP (Model Context Protocol) Server for audio tools

## Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd audio-agent
   ```

2. **Install dependencies**
   
   The project uses Poetry for dependency management. All dependencies are defined in `pyproject.toml`.
   
   Using Poetry (recommended):
   ```bash
   poetry install
   ```
   
   Or using pip:
   ```bash
   pip install -e .
   ```

## Configuration

### Environment Variables

Create a `.env` file in the project root or set the following environment variables:

```bash
# Required: MCP Server endpoint for audio tools
MCP_SERVER=your_mcp_server_endpoint

# Optional: OpenAI API key (can also be provided in the UI)
OPENAI_API_KEY=sk-your-openai-api-key-here
```

### Environment Variable Details

- **`MCP_SERVER`** (Required): The endpoint URL for the MCP server that provides audio processing tools
- **`OPENAI_API_KEY`** (Optional): Your OpenAI API key. If not set here, you can provide it through the web interface
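
The required/optional split above could be enforced at startup roughly as follows. This is a sketch, not the project's actual code: `load_settings` is a hypothetical helper, and it reads only the process environment, so a `.env` file would need to be loaded first by whatever mechanism the application uses.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the two variables described above; raise if MCP_SERVER is missing."""
    env = os.environ if env is None else env

    mcp_server = env.get("MCP_SERVER", "").strip()
    if not mcp_server:
        raise RuntimeError("MCP_SERVER is required but not set")

    # OPENAI_API_KEY is optional: None means the key must be entered in the UI.
    api_key = env.get("OPENAI_API_KEY", "").strip() or None

    return {"mcp_server": mcp_server, "openai_api_key": api_key}
```

Passing a plain dictionary instead of relying on `os.environ` makes the helper easy to exercise in tests.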

## Usage

### Running the Application

Start the web interface with:

```bash
python -m src.ui
```

The application will launch a Gradio web interface accessible at:
- Local: `http://localhost:7861`
- Public share URL (if enabled)

### Using the Interface

1. **Configure the Model**: Select your preferred AI model and adjust settings in the right panel
2. **Provide API Key**: Enter your OpenAI API key if not set in environment variables
3. **Upload Audio Files**: Drag and drop or select audio files to process
4. **Describe Your Task**: Type what you want to do with the audio files
5. **Get Results**: The AI will process your request and provide the results

### Example Requests

- *"Merge these two audio files and add a fade-in effect"*
- *"Remove the silence at the beginning of this recording"*
- *"Transcribe the speech in this audio file"*
- *"Increase the volume of the first track and normalize both files"*
- *"Cut out the middle section from 1:30 to 2:45"*
- *"Make this audio play 1.5x faster"*
- *"Apply a fade-out effect to the end of this track"*

## Dependencies

The project relies on several key libraries:

- **LangGraph** (0.4.8+): For building the AI agent workflow
- **Gradio** (5.33.0+): For the web interface
- **LangChain OpenAI** (0.3.21+): For OpenAI model integration
- **LangChain MCP Adapters** (0.1.7+): For Model Context Protocol integration
- **dotenv** (0.9.9+): For environment variable management

See `pyproject.toml` for the complete list of dependencies.

## Troubleshooting

### Common Issues

1. **"Please configure the agent first"**
   - Ensure you've provided a valid OpenAI API key
   - Check that the selected model is available

2. **Audio processing errors**
   - Verify the MCP_SERVER environment variable is set correctly
   - Ensure your audio files are in supported formats
   - Check that the MCP server is running and accessible

3. **Import errors**
   - Make sure all dependencies are installed: `poetry install` or `pip install -e .`
   - Verify you're using Python 3.13 or higher
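
For the "running and accessible" check in item 2, a quick TCP probe can rule out basic connectivity problems. The sketch below assumes `MCP_SERVER` holds an `http://` or `https://` URL; `can_connect` is a hypothetical helper, and a successful connection shows only that something is listening on that host and port, not that it is a working MCP server.

```python
import socket
from urllib.parse import urlparse


def can_connect(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname
    if host is None:
        return False  # Not a URL with a recognizable host.
    try:
        # Fall back to the scheme's default port when none is given.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, unreachable, or timed out. ValueError: invalid port.
        return False


if __name__ == "__main__":
    import os

    print(can_connect(os.environ.get("MCP_SERVER", "")))
```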

### Getting Help

If you encounter issues:
1. Check the console output for error messages
2. Verify your environment variables are set correctly
3. Ensure your audio files are in supported formats
4. Try with different AI models if one isn't working