YigitSekerci committed e9367c1 (parent: 9afb0c3): Update README.md

# Audio Agent - Your AI Audio Assistant

An intelligent audio processing assistant, powered by AI, that helps you manipulate, analyze, and transcribe audio files through a simple web interface.

## Features

🎚️ **Audio Manipulation**
- Merge multiple audio files into one continuous track
- Cut or trim specific sections from any file
- Adjust volume levels (increase or decrease)
- Normalize audio levels for consistency
- Apply fade-in or fade-out effects for smooth transitions (mono channel only)
- Change playback speed (faster or slower, with pitch change)
- Reverse audio for creative effects
- Remove silence from the beginning or end of a file
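
These operations are carried out by tools on the MCP server, and this README does not describe how they are implemented. As a rough illustration of what each one means at the sample level, here is a minimal pure-Python sketch that works on a mono signal stored as a list of floats in [-1.0, 1.0]. All function names are hypothetical, and the techniques (linear fades, peak normalization, nearest-neighbour resampling) are simple stand-ins, not the server's actual algorithms.

```python
from typing import List

Samples = List[float]  # mono PCM samples in [-1.0, 1.0] (assumed representation)


def merge(*tracks: Samples) -> Samples:
    """Concatenate tracks end to end into one continuous track."""
    out: Samples = []
    for track in tracks:
        out.extend(track)
    return out


def cut(samples: Samples, start: int, end: int) -> Samples:
    """Remove the section [start, end), given as sample indices."""
    return samples[:start] + samples[end:]


def change_volume(samples: Samples, gain: float) -> Samples:
    """Scale amplitude by a linear gain factor, clipping to [-1, 1]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def normalize(samples: Samples, peak: float = 1.0) -> Samples:
    """Peak normalization: rescale so the loudest sample reaches `peak`."""
    current = max((abs(s) for s in samples), default=0.0)
    if current == 0.0:
        return list(samples)  # pure silence: nothing to scale
    return [s * peak / current for s in samples]


def fade_in(samples: Samples, length: int) -> Samples:
    """Linear fade-in over the first `length` samples."""
    length = min(length, len(samples))
    return [s * (i / length) if i < length else s for i, s in enumerate(samples)]


def fade_out(samples: Samples, length: int) -> Samples:
    """Linear fade-out over the last `length` samples (a reversed fade-in)."""
    return fade_in(samples[::-1], length)[::-1]


def change_speed(samples: Samples, factor: float) -> Samples:
    """Nearest-neighbour resampling: factor > 1 plays faster and raises pitch."""
    n_out = int(len(samples) / factor)
    last = len(samples) - 1
    return [samples[min(int(i * factor), last)] for i in range(n_out)]


def reverse(samples: Samples) -> Samples:
    """Play the signal backwards."""
    return samples[::-1]


def trim_silence(samples: Samples, threshold: float = 0.01) -> Samples:
    """Strip leading and trailing samples whose magnitude is at or below `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]
```

Because speed is changed by resampling rather than time-stretching, duration and pitch change together, which matches the "with pitch change" note above. A real tool would use interpolation or a proper resampler; nearest-neighbour indexing just keeps the sketch short.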

📝 **Analysis & Transcription** (English only)
- Transcribe speech in audio to text
- Analyze audio properties (duration, sample rate, etc.)
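
The exact properties the agent reports depend on the MCP server's analysis tool. For uncompressed WAV files, the standard library's `wave` module is enough to read the basic ones. The sketch below uses a hypothetical helper, `describe_wav`, and does not handle the compressed formats (MP3, M4A, FLAC, and so on), which need a decoder.

```python
import io
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, float]:
    """Return basic properties of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "frames": frames,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    # Build a 2-second silent mono 16-bit WAV in memory and inspect it.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 32000)
    buf.seek(0)
    print(describe_wav(buf))
```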

**Supported Audio Formats**: MP3, WAV, M4A, FLAC, AAC, OGG
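
If you prepare files with a script before uploading them, a quick client-side check against this list can catch unsupported files early. This is a hypothetical helper that looks only at the file extension; it does not inspect file contents, so a mislabeled file would still pass.

```python
from pathlib import Path

# Extensions taken from the "Supported Audio Formats" list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["interview.MP3", "notes.txt", "take2.flac"]:
        print(name, is_supported(name))
```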

## Requirements

- Python 3.13 or higher
- OpenAI API key
- An MCP (Model Context Protocol) server that provides the audio tools
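
Because the project targets Python 3.13, a guard at the top of a launcher script can report an old interpreter clearly instead of failing later with an obscure import error. A minimal sketch; the function name and message are illustrative and not part of the project.

```python
import sys
from typing import Tuple

REQUIRED = (3, 13)  # minimum version stated in this README


def meets_requirement(version: Tuple[int, ...], required: Tuple[int, int] = REQUIRED) -> bool:
    """Compare only major.minor against the required minimum."""
    return tuple(version[:2]) >= required


if __name__ == "__main__":
    # A real launcher might raise SystemExit here instead of printing.
    status = "OK" if meets_requirement(sys.version_info) else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
```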

## Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd audio-agent
   ```

2. **Install dependencies**

   The project uses Poetry for dependency management. All dependencies are defined in `pyproject.toml`.

   Using Poetry (recommended):
   ```bash
   poetry install
   ```

   Or using pip:
   ```bash
   pip install -e .
   ```

## Configuration

### Environment Variables

Create a `.env` file in the project root or set the following environment variables:

```bash
# Required: MCP Server endpoint for audio tools
MCP_SERVER=your_mcp_server_endpoint

# Optional: OpenAI API key (can also be provided in the UI)
OPENAI_API_KEY=sk-your-openai-api-key-here
```

### Environment Variable Details

- **`MCP_SERVER`** (Required): The endpoint URL for the MCP server that provides audio processing tools.
- **`OPENAI_API_KEY`** (Optional): Your OpenAI API key. If not set here, you can provide it through the web interface.
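
The README does not show how the application reads these variables. One plausible pattern, sketched here with hypothetical names (`Settings`, `load_settings`), treats `MCP_SERVER` as mandatory and `OPENAI_API_KEY` as optional, so the key can still be supplied later in the UI. In practice a `.env` file would usually be loaded first (for example with python-dotenv's `load_dotenv()`); this sketch takes a plain mapping so it runs with the standard library alone.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mcp_server: str
    openai_api_key: Optional[str]  # None means the key must come from the UI


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration, failing fast if the required MCP endpoint is missing."""
    mcp_server = env.get("MCP_SERVER", "").strip()
    if not mcp_server:
        raise RuntimeError("MCP_SERVER is not set; add it to .env or the environment")
    # Treat an empty or whitespace-only key the same as an unset one.
    api_key = env.get("OPENAI_API_KEY", "").strip() or None
    return Settings(mcp_server=mcp_server, openai_api_key=api_key)
```

Failing at startup when `MCP_SERVER` is missing surfaces the misconfiguration immediately, rather than as a generic audio processing error on the first request.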

## Usage

### Running the Application

Start the web interface with:

```bash
python -m src.ui
```

The application launches a Gradio web interface, accessible at:
- Local: `http://localhost:7861`
- Public share URL (if enabled)

### Using the Interface

1. **Configure the Model**: Select your preferred AI model and adjust its settings in the right panel.
2. **Provide API Key**: Enter your OpenAI API key if it is not set as an environment variable.
3. **Upload Audio Files**: Drag and drop or select the audio files to process.
4. **Describe Your Task**: Type what you want to do with the audio files.
5. **Get Results**: The AI processes your request and returns the results.

### Example Requests

- *"Merge these two audio files and add a fade-in effect"*
- *"Remove the silence at the beginning of this recording"*
- *"Transcribe the speech in this audio file"*
- *"Increase the volume of the first track and normalize both files"*
- *"Cut out the middle section from 1:30 to 2:45"*
- *"Make this audio play 1.5x faster"*
- *"Apply a fade-out effect to the end of this track"*
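
Requests such as "cut from 1:30 to 2:45" or "1.5x faster" come down to simple arithmetic on timestamps and durations. The sketch below shows that arithmetic with hypothetical helpers; the 44.1 kHz sample rate is an assumed example value, and the agent and MCP tools may represent time differently (for example, in seconds or milliseconds).

```python
def parse_timestamp(ts: str) -> float:
    """Convert 'm:ss' or 'h:mm:ss' (fractional seconds allowed) to seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def to_sample_index(ts: str, sample_rate: int = 44_100) -> int:
    """Sample offset of a timestamp at the given (assumed) sample rate."""
    return round(parse_timestamp(ts) * sample_rate)


def duration_after_speed_change(duration_s: float, factor: float) -> float:
    """Playing `factor` times faster divides the duration by `factor`."""
    return duration_s / factor


if __name__ == "__main__":
    start, end = parse_timestamp("1:30"), parse_timestamp("2:45")
    print(f"cut {start:.0f}s-{end:.0f}s, {end - start:.0f}s removed")
    print(to_sample_index("1:30"), to_sample_index("2:45"))
    print(duration_after_speed_change(180.0, 1.5))
```

For example, cutting 1:30 to 2:45 removes 75 seconds (samples 3,969,000 to 7,276,500 at 44.1 kHz), and playing a 3-minute clip 1.5x faster shortens it to 2 minutes.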

## Dependencies

The project relies on several key libraries:

- **LangGraph** (0.4.8+): For building the AI agent workflow
- **Gradio** (5.33.0+): For the web interface
- **LangChain OpenAI** (0.3.21+): For OpenAI model integration
- **LangChain MCP Adapters** (0.1.7+): For Model Context Protocol integration
- **dotenv** (0.9.9+): For environment variable management

See `pyproject.toml` for the complete list of dependencies.

## Troubleshooting

### Common Issues

1. **"Please configure the agent first"**
   - Ensure you've provided a valid OpenAI API key
   - Check that the selected model is available

2. **Audio processing errors**
   - Verify that the `MCP_SERVER` environment variable is set correctly
   - Ensure your audio files are in supported formats
   - Check that the MCP server is running and accessible

3. **Import errors**
   - Make sure all dependencies are installed: `poetry install` or `pip install -e .`
   - Verify you're using Python 3.13 or higher
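
To check that the MCP server is running and accessible, a plain TCP connection attempt to the host and port in `MCP_SERVER` is a quick first test. This sketch assumes `MCP_SERVER` is an HTTP(S) URL and uses a hypothetical helper name. It only confirms that something is listening on that port, not that the service speaks MCP.

```python
import os
import socket
from urllib.parse import urlsplit


def server_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
        # Fall back to the scheme's default port when the URL has none.
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # malformed URL or port, e.g. "http://host:abc"
        return False
    if not host:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, or DNS failure
        return False


if __name__ == "__main__":
    endpoint = os.environ.get("MCP_SERVER", "")
    print(f"{endpoint or '(MCP_SERVER not set)'}: reachable={server_reachable(endpoint)}")
```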

### Getting Help

If you encounter issues:
1. Check the console output for error messages
2. Verify your environment variables are set correctly
3. Ensure your audio files are in supported formats
4. Try a different AI model if one isn't working