Spaces: Agents-MCP-Hackathon/Audio-Agent

YigitSekerci committed on Jun 9

Commit

e9367c1

1 Parent(s): 9afb0c3

Update README.md

README.md +137 -0

	@@ -0,0 +1,137 @@

+# Audio Agent - Your AI Audio Assistant
+Audio Agent is an AI-powered audio processing assistant that helps you manipulate, analyze, and transcribe audio files through a simple web interface.
+## Features
+🎚️ **Audio Manipulation**
+- Merge multiple audio files into one continuous track
+- Cut or trim specific sections from any file
+- Adjust volume levels (increase or decrease)
+- Normalize audio levels for consistency
+- Apply fade-in or fade-out effects for smooth transitions (mono audio only)
+- Change playback speed (faster or slower, with pitch change)
+- Reverse audio for creative effects
+- Remove silence from the beginning or end of files
+📝 **Analysis & Transcription** (English only)
+- Transcribe speech in audio to text
+- Analyze audio properties (duration, sample rate, etc.)
+**Supported Audio Formats**: MP3, WAV, M4A, FLAC, AAC, OGG
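Before uploading, it can help to check files against the format list above. The following is a minimal sketch, not part of the project: the helper names and the mapping from format names to file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Formats listed in this README. Mapping each format name to a single
# lowercase file extension is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Partition filenames into (supported, unsupported), preserving order."""
    supported = [name for name in filenames if is_supported_audio(name)]
    unsupported = [name for name in filenames if not is_supported_audio(name)]
    return supported, unsupported
```

The check looks only at the extension, so a mislabeled file would still pass; the audio tools themselves remain the final judge of whether a file can be read.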
+## Requirements
+- Python 3.13 or higher
+- OpenAI API key
+- MCP (Model Context Protocol) Server for audio tools
+## Installation
+1. **Clone the repository**
+   ```bash
+   git clone <repository-url>
+   cd audio-agent
+   ```
+2. **Install dependencies**
+   The project uses Poetry for dependency management. All dependencies are defined in `pyproject.toml`.
+   Using Poetry (recommended):
+   ```bash
+   poetry install
+   ```
+   Or using pip:
+   ```bash
+   pip install -e .
+   ```
+## Configuration
+### Environment Variables
+Create a `.env` file in the project root or set the following environment variables:
+```bash
+# Required: MCP Server endpoint for audio tools
+MCP_SERVER=your_mcp_server_endpoint
+# Optional: OpenAI API key (can also be provided in the UI)
+OPENAI_API_KEY=sk-your-openai-api-key-here
+```
+### Environment Variable Details
+- **`MCP_SERVER`** (Required): The endpoint URL for the MCP server that provides audio processing tools
+- **`OPENAI_API_KEY`** (Optional): Your OpenAI API key. If not set here, you can provide it through the web interface
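The required/optional split described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual configuration code: the `Settings` class and `load_settings` function are hypothetical names, and the sketch reads from a plain mapping rather than parsing the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented variables."""
    mcp_server: str
    openai_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read MCP_SERVER and OPENAI_API_KEY from an environment mapping."""
    mcp_server = env.get("MCP_SERVER", "").strip()
    if not mcp_server:
        # MCP_SERVER is documented as required, so fail early.
        raise ValueError("MCP_SERVER is not set")
    # OPENAI_API_KEY is optional; None means it must be entered in the UI.
    api_key = env.get("OPENAI_API_KEY", "").strip() or None
    return Settings(mcp_server=mcp_server, openai_api_key=api_key)
```

Passing the environment in as a mapping keeps the function easy to test with a plain dictionary instead of mutating the real process environment.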
+## Usage
+### Running the Application
+Start the web interface with:
+```bash
+python -m src.ui
+```
+The application will launch a Gradio web interface accessible at:
+- Local: `http://localhost:7861`
+- Public share URL (if enabled)
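If the page does not load, a quick way to tell whether anything is listening on the local port is a plain TCP connection attempt. The sketch below uses only the standard library; the function name is hypothetical, and the default port simply mirrors the local URL given above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 7861, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Port 7861 on localhost is {state}")
```

A successful connection only shows that some process accepts connections on that port; it does not confirm that the process is this application.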
+### Using the Interface
+1. **Configure the Model**: Select your preferred AI model and adjust settings in the right panel
+2. **Provide API Key**: Enter your OpenAI API key if not set in environment variables
+3. **Upload Audio Files**: Drag and drop or select audio files to process
+4. **Describe Your Task**: Type what you want to do with the audio files
+5. **Get Results**: The AI will process your request and provide the results
+### Example Requests
+- *"Merge these two audio files and add a fade-in effect"*
+- *"Remove the silence at the beginning of this recording"*
+- *"Transcribe the speech in this audio file"*
+- *"Increase the volume of the first track and normalize both files"*
+- *"Cut out the middle section from 1:30 to 2:45"*
+- *"Make this audio play 1.5x faster"*
+- *"Apply a fade-out effect to the end of this track"*
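The cut and speed requests above involve simple arithmetic that is worth spelling out. This sketch is only a worked example with hypothetical helper names; it does not describe how the agent or its tools parse requests.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert "m:ss" or "h:mm:ss" into whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def duration_after_speed_change(duration_s: float, speed: float) -> float:
    """Return the playback duration after applying a speed factor."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return duration_s / speed


# "Cut out the middle section from 1:30 to 2:45"
start = timestamp_to_seconds("1:30")   # 90 seconds
end = timestamp_to_seconds("2:45")     # 165 seconds
cut_length = end - start               # 75 seconds removed

# "Make this audio play 1.5x faster": a 60-second clip plays in 40 seconds
faster = duration_after_speed_change(60, 1.5)
```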
+## Dependencies
+The project relies on several key libraries:
+- **LangGraph** (0.4.8+): For building the AI agent workflow
+- **Gradio** (5.33.0+): For the web interface
+- **LangChain OpenAI** (0.3.21+): For OpenAI model integration
+- **LangChain MCP Adapters** (0.1.7+): For Model Context Protocol integration
+- **dotenv** (0.9.9+): For environment variable management
+See `pyproject.toml` for the complete list of dependencies.
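To see which of these libraries are present in the active environment, the standard library's `importlib.metadata` can report installed versions. This is a rough sketch: the distribution names below are assumptions inferred from the list above, and `pyproject.toml` remains the authoritative source for names and version constraints.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names are assumptions based on the list above;
# check pyproject.toml for the names the project actually declares.
KEY_PACKAGES = (
    "langgraph",
    "gradio",
    "langchain-openai",
    "langchain-mcp-adapters",
    "dotenv",
)


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def report(packages: tuple[str, ...] = KEY_PACKAGES) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, found in report().items():
        print(f"{name}: {found or 'not installed'}")
```

The sketch only reports versions; it does not compare them against the minimums listed above.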
+## Troubleshooting
+### Common Issues
+1. **"Please configure the agent first"**
+   - Ensure you've provided a valid OpenAI API key
+   - Check that the selected model is available
+2. **Audio processing errors**
+   - Verify the `MCP_SERVER` environment variable is set correctly
+   - Ensure your audio files are in supported formats
+   - Check that the MCP server is running and accessible
+3. **Import errors**
+   - Make sure all dependencies are installed: `poetry install` or `pip install -e .`
+   - Verify you're using Python 3.13 or higher
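The Python version check mentioned above can be done from the interpreter itself. A minimal sketch, with a hypothetical helper name, assuming the 3.13 minimum stated in this README:

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 13)) -> bool:
    """Return True if (major, minor) is at least the required minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old, 3.13 or higher is required"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter that launches the application, since a system-wide `python` and the one inside a Poetry environment can differ.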
+### Getting Help
+If you encounter issues:
+1. Check the console output for error messages
+2. Verify your environment variables are set correctly
+3. Ensure your audio files are in supported formats
+4. Try a different AI model if the selected one isn't working