# Audio Agent - Your AI Audio Assistant

An AI-powered audio processing assistant that helps you manipulate, analyze, and transcribe audio files through a simple web interface.

## Features

🎛️ **Audio Manipulation**
- Merge multiple audio files into one continuous track
- Cut or trim specific sections from any file
- Adjust volume levels (increase or decrease)
- Normalize audio levels for consistency
- Apply fade-in or fade-out effects for smooth transitions (mono audio only)
- Change playback speed (faster or slower; the pitch changes along with the speed)
- Reverse audio for creative effects
- Remove silence from the beginning or end of a file
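
The operations above are carried out by the MCP server's tools, whose code this README does not show. As a rough, hypothetical illustration of what several of them do to the audio itself, the stdlib-only sketch below works on mono audio held as a plain list of float samples in [-1.0, 1.0]. Every function name and the default silence threshold are assumptions made for this example.

```python
"""Hypothetical sample-level sketches of the audio operations listed above.

Not the MCP server's implementation: audio is assumed to be mono, stored as a
list of float samples in [-1.0, 1.0], with all tracks at the same sample rate.
"""
from typing import List

Samples = List[float]


def merge(*tracks: Samples) -> Samples:
    """Concatenate tracks end to end into one continuous track."""
    merged: Samples = []
    for track in tracks:
        merged.extend(track)
    return merged


def change_gain(samples: Samples, factor: float) -> Samples:
    """Multiply every sample by `factor` (>1 louder, <1 quieter), clipping to [-1, 1]."""
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def normalize_peak(samples: Samples, target_peak: float = 1.0) -> Samples:
    """Peak normalization: rescale so the loudest sample reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence cannot be rescaled
    return [s * target_peak / peak for s in samples]


def fade_in(samples: Samples, length: int) -> Samples:
    """Linear fade-in: ramp the gain from 0 toward 1 over the first `length` samples."""
    out = list(samples)
    n = min(length, len(out))
    for i in range(n):
        out[i] *= i / n
    return out


def reverse(samples: Samples) -> Samples:
    """Play the audio backwards."""
    return samples[::-1]


def fade_out(samples: Samples, length: int) -> Samples:
    """Linear fade-out over the last `length` samples (a mirrored fade-in)."""
    return reverse(fade_in(reverse(samples), length))


def trim_silence(samples: Samples, threshold: float = 0.01) -> Samples:
    """Drop leading and trailing samples whose magnitude is below `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def change_speed(samples: Samples, factor: float) -> Samples:
    """Naive nearest-sample resampling; factor > 1 plays faster (and higher)."""
    new_len = int(len(samples) / factor)
    return [samples[int(i * factor)] for i in range(new_len)]


if __name__ == "__main__":
    clip = [0.0, 0.0, 0.25, -0.5, 0.0]
    trimmed = trim_silence(clip)
    print(trimmed)                  # [0.25, -0.5]
    print(normalize_peak(trimmed))  # [0.5, -1.0]
    print(reverse(trimmed))         # [-0.5, 0.25]
```

Because `change_speed` simply skips or repeats samples and the result is still played back at the original sample rate, speeding up also raises the pitch, which is the same trade-off the feature list mentions.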

**Analysis & Transcription** (English only)
- Transcribe speech in audio to text
- Analyze audio properties (duration, sample rate, etc.)
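
Duration follows directly from a file's header: it is the number of frames divided by the sample rate. A minimal sketch using Python's standard-library `wave` module is shown here. Note that `wave` only reads uncompressed PCM WAV, so compressed formats such as MP3 or FLAC would need a separate decoder; the helper names are hypothetical, and the project's own analysis tool is not necessarily implemented this way.

```python
"""Hypothetical sketch: read basic properties of a PCM WAV file with the stdlib."""
import io
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, float]:
    """Return channel count, sample rate, sample width, frame count and duration."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "frames": frames,
            "duration_seconds": frames / rate,  # duration = frames / sample rate
        }


def make_silent_wav(seconds: float, rate: int = 16_000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    info = describe_wav(make_silent_wav(2.5))
    print(info["duration_seconds"])  # 2.5
```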

**Supported Audio Formats**: MP3, WAV, M4A, FLAC, AAC, OGG

## Requirements

- Python 3.13 or higher
- OpenAI API key
- An MCP (Model Context Protocol) server that provides the audio tools

## Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd audio-agent
   ```

2. **Install dependencies**

   The project uses Poetry for dependency management. All dependencies are defined in `pyproject.toml`.

   Using Poetry (recommended):
   ```bash
   poetry install
   ```

   Or using pip:
   ```bash
   pip install -e .
   ```

## Configuration

### Environment Variables

Create a `.env` file in the project root or set the following environment variables:

```bash
# Required: MCP Server endpoint for audio tools
MCP_SERVER=your_mcp_server_endpoint

# Optional: OpenAI API key (can also be provided in the UI)
OPENAI_API_KEY=sk-your-openai-api-key-here
```

### Environment Variable Details

- **`MCP_SERVER`** (Required): The endpoint URL for the MCP server that provides audio processing tools
- **`OPENAI_API_KEY`** (Optional): Your OpenAI API key. If not set here, you can provide it through the web interface
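
A hypothetical startup check for these two variables might look like the following. In practice a `.env` file would typically be loaded into the environment first (for example with `load_dotenv()` from python-dotenv); here the environment is passed in as a plain mapping so the sketch runs with the standard library alone. The `Settings` class and `load_settings` function are illustrative names, not part of this project.

```python
"""Hypothetical startup check for the environment variables documented above."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mcp_server: str                # required endpoint of the MCP audio-tools server
    openai_api_key: Optional[str]  # None means the key must be entered in the UI


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read MCP_SERVER (required) and OPENAI_API_KEY (optional) from `environ`."""
    mcp_server = environ.get("MCP_SERVER", "").strip()
    if not mcp_server:
        raise RuntimeError("MCP_SERVER is not set; point it at your MCP server endpoint")
    # Treat an empty or whitespace-only key the same as a missing one.
    api_key = environ.get("OPENAI_API_KEY", "").strip() or None
    return Settings(mcp_server=mcp_server, openai_api_key=api_key)
```

Calling `load_settings()` with no argument reads the real process environment, so a missing `MCP_SERVER` fails fast at startup instead of surfacing later as an audio processing error.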

## Usage

### Running the Application

Start the web interface with:

```bash
python -m src.ui
```

The application will launch a Gradio web interface accessible at:
- Local: `http://localhost:7861`
- Public share URL (if enabled)

### Using the Interface

1. **Configure the Model**: Select your preferred AI model and adjust its settings in the right-hand panel
2. **Provide API Key**: Enter your OpenAI API key if it is not set as an environment variable
3. **Upload Audio Files**: Drag and drop or select the audio files to process
4. **Describe Your Task**: Type what you want to do with the audio files
5. **Get Results**: The agent processes your request and returns the results

### Example Requests

- *"Merge these two audio files and add a fade-in effect"*
- *"Remove the silence at the beginning of this recording"*
- *"Transcribe the speech in this audio file"*
- *"Increase the volume of the first track and normalize both files"*
- *"Cut out the middle section from 1:30 to 2:45"*
- *"Make this audio play 1.5x faster"*
- *"Apply a fade-out effect to the end of this track"*
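
Requests such as "from 1:30 to 2:45" or "1.5x faster" reduce to simple arithmetic on timestamps and sample counts. The sketch below uses hypothetical helper names and an assumed 44.1 kHz sample rate: 1:30 is 90 s, which is sample 90 × 44,100 = 3,969,000, and speeding a 180 s track up by 1.5x leaves 180 / 1.5 = 120 s.

```python
"""Hypothetical sketch of the arithmetic behind time-based audio requests."""
from typing import List


def parse_timestamp(stamp: str) -> float:
    """Convert "ss", "m:ss" or "h:mm:ss" into seconds (e.g. "1:30" -> 90.0)."""
    seconds = 0.0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def to_sample_index(stamp: str, sample_rate: int) -> int:
    """Map a timestamp onto the nearest sample index."""
    return round(parse_timestamp(stamp) * sample_rate)


def cut_section(samples: List[float], sample_rate: int, start: str, end: str) -> List[float]:
    """Remove the span [start, end) and join what remains on either side."""
    first = to_sample_index(start, sample_rate)
    last = to_sample_index(end, sample_rate)
    if not 0 <= first <= last:
        raise ValueError(f"invalid range {start!r} to {end!r}")
    return samples[:first] + samples[last:]


def duration_after_speed_change(duration_s: float, factor: float) -> float:
    """Playing `factor` times faster divides the duration by `factor`."""
    return duration_s / factor


if __name__ == "__main__":
    rate = 44_100  # assumed CD-quality sample rate
    print(to_sample_index("1:30", rate), to_sample_index("2:45", rate))  # 3969000 7276500
    print(duration_after_speed_change(180.0, 1.5))  # 120.0
```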

## Dependencies

The project relies on several key libraries:

- **LangGraph** (0.4.8+): For building the AI agent workflow
- **Gradio** (5.33.0+): For the web interface
- **LangChain OpenAI** (0.3.21+): For OpenAI model integration
- **LangChain MCP Adapters** (0.1.7+): For Model Context Protocol integration
- **dotenv** (0.9.9+): For environment variable management

See `pyproject.toml` for the complete list of dependencies.

## Troubleshooting

### Common Issues

1. **"Please configure the agent first"**
   - Ensure you've provided a valid OpenAI API key
   - Check that the selected model is available

2. **Audio processing errors**
   - Verify that the `MCP_SERVER` environment variable is set correctly
   - Ensure your audio files are in a supported format
   - Check that the MCP server is running and accessible

3. **Import errors**
   - Make sure all dependencies are installed: `poetry install` or `pip install -e .`
   - Verify you're using Python 3.13 or higher
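
For audio processing and import errors, a quick diagnostic script can rule out the two most mechanical causes. This is a hypothetical sketch, not a tool shipped with the project: it assumes `MCP_SERVER` holds a full URL such as `http://host:port/path`, and it only confirms that a TCP connection can be opened, not that the server actually speaks MCP.

```python
"""Hypothetical troubleshooting helper (not part of the project)."""
import os
import socket
import sys
from typing import Tuple
from urllib.parse import urlsplit

MINIMUM_PYTHON: Tuple[int, int] = (3, 13)  # minimum version named in this README


def python_version_ok(version: Tuple[int, ...] = sys.version_info[:2]) -> bool:
    """True if the interpreter (or the given version tuple) is 3.13 or newer."""
    return tuple(version[:2]) >= MINIMUM_PYTHON


def mcp_server_reachable(url: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the URL's host and port can be opened."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return False  # not a full URL such as http://host:port/path
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:
        return False  # the port in the URL is not a valid number
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, DNS failure, or timed out


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    endpoint = os.environ.get("MCP_SERVER", "")
    if endpoint:
        print("MCP server reachable:", mcp_server_reachable(endpoint))
    else:
        print("MCP_SERVER is not set")
```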

### Getting Help

If you encounter issues:
1. Check the console output for error messages
2. Verify that your environment variables are set correctly
3. Ensure your audio files are in a supported format
4. Try a different AI model if one isn't working