VoiceCraftr / README.md
license: mit
title: VoiceCraftr
sdk: gradio
emoji: 🔥
colorFrom: indigo
colorTo: gray
pinned: true
short_description: Transform any song into any voice – including yours.

🎵 AI Cover Song Platform

Transform any song with AI voice synthesis! Upload a song, choose a voice model, and generate high-quality AI covers.

✨ Features

  • 🎵 Audio Separation: Automatically separate vocals and instrumentals using Demucs/Spleeter
  • 🎤 Voice Cloning: Convert vocals to different artist styles (Drake, Ariana Grande, The Weeknd, etc.)
  • 🎧 High-Quality Output: Generate professional-quality AI covers
  • 🎙️ Custom Voice Training: Train your own voice models with personal recordings
  • ⚙️ Advanced Controls: Pitch shifting, voice strength, auto-tune, and format options

🚀 How It Works

  1. Upload Your Song - Supports MP3, WAV, and FLAC files
  2. Choose Voice Model - Select from pre-trained artist voices or train your own
  3. Adjust Settings - Fine-tune pitch, voice strength, and audio effects
  4. Generate Cover - AI processes and creates your cover song

🛠️ Technology Stack

Audio Processing

  • Demucs: State-of-the-art audio source separation
  • Spleeter: Alternative audio separation engine
  • Librosa: Advanced audio analysis and processing
  • SoundFile: High-quality audio I/O

Voice Synthesis

  • So-VITS-SVC: High-quality singing voice conversion
  • Fairseq: Sequence modeling toolkit with pre-trained speech models
  • ESPnet: End-to-end speech processing toolkit

Machine Learning

  • PyTorch: Deep learning framework
  • Transformers: Library of pre-trained models
  • Accelerate: Distributed training utilities

Web Interface

  • Gradio: Interactive ML web applications
  • Hugging Face Spaces: Cloud deployment platform

📋 Installation

For Hugging Face Spaces

This app is designed to run on Hugging Face Spaces. Simply:

  1. Create a new Space on Hugging Face
  2. Upload all files from this repository
  3. The app will automatically install dependencies and launch

For Local Development

# Clone the repository
git clone <your-repo-url>
cd ai-cover-platform  # or whichever directory git clone created

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

🎯 Usage

Basic Usage

  1. Upload an audio file (MP3, WAV, or FLAC)
  2. Select a voice model from the dropdown
  3. Adjust settings if needed
  4. Click "Generate AI Cover"
  5. Download your AI-generated cover!

Custom Voice Training

  1. Open the "Train Custom Voice" accordion
  2. Upload 2-5 voice samples (30 seconds each)
  3. Click "Train Custom Voice"
  4. Use the custom model for your covers
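
The sample requirements in step 2 can be checked before training starts. The following is a minimal sketch, not the app's actual code: `check_training_samples` is a hypothetical helper, it assumes the samples are WAV files (read with the standard-library `wave` module), and it only reports each file's duration rather than enforcing the 30-second guideline.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_training_samples(paths, min_count=2, max_count=5):
    """Validate the number of samples and return each file's duration.

    Hypothetical helper: the 2-5 sample range comes from the steps above;
    everything else is an illustrative assumption.
    """
    paths = [Path(p) for p in paths]
    if not min_count <= len(paths) <= max_count:
        raise ValueError(
            f"expected {min_count}-{max_count} samples, got {len(paths)}"
        )
    return {p.name: wav_duration_seconds(p) for p in paths}
```

The returned mapping of file name to duration could be shown to the user so that samples far from 30 seconds are easy to spot.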

Advanced Settings

  • Pitch Shift: Adjust vocal pitch (-12 to +12 semitones)
  • Voice Strength: Control how strong the AI voice effect is (0-100%)
  • Auto-tune: Apply automatic pitch correction
  • Output Format: Choose between WAV, MP3, or FLAC
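
To make the first two settings concrete: a shift of n semitones corresponds to a frequency ratio of 2^(n/12) in equal temperament, so +12 doubles the pitch and -12 halves it. The sketch below also treats voice strength as a linear mix between the original and converted vocal. That interpretation is an assumption, and both function names are hypothetical rather than taken from the app.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift in equal temperament: 2 ** (n / 12)."""
    if not -12 <= semitones <= 12:
        raise ValueError("pitch shift must be between -12 and +12 semitones")
    return 2.0 ** (semitones / 12.0)


def blend_voice(original, converted, strength_percent):
    """Linearly mix original and converted vocal samples.

    Assumption: "voice strength" is treated as a wet/dry mix, where 0% keeps
    the original vocal and 100% keeps only the converted vocal.
    """
    if not 0 <= strength_percent <= 100:
        raise ValueError("voice strength must be between 0 and 100")
    if len(original) != len(converted):
        raise ValueError("inputs must have the same number of samples")
    wet = strength_percent / 100.0
    return [(1.0 - wet) * o + wet * c for o, c in zip(original, converted)]


print(semitones_to_ratio(12))                   # 2.0 (one octave up)
print(blend_voice([1.0, 0.0], [0.0, 1.0], 25))  # [0.75, 0.25]
```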

🎨 Voice Models

Pre-trained Models

  • Drake Style: Hip-hop/R&B vocals with deep, smooth tone
  • Ariana Style: Pop vocals with high range and vibrato
  • The Weeknd Style: Alternative R&B with atmospheric vocals
  • Taylor Swift Style: Pop-country vocals with clear articulation

Custom Models

Train your own voice model by uploading voice samples. The system will:

  • Extract vocal characteristics
  • Train a personalized voice model
  • Make it available for future covers

⚙️ Configuration

Environment Variables

Create a .env file for configuration:

# Optional: Set custom model paths
MODELS_DIR=/path/to/models
TEMP_DIR=/path/to/temp

# Optional: API keys for enhanced features
HUGGINGFACE_TOKEN=your_token_here
WANDB_API_KEY=your_wandb_key
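
One way the application could read these values is sketched below. The variable names come from the example above; the `AppConfig` class, the `load_config` function, and the fallback values are assumptions for illustration. Python does not read a .env file on its own, so the file has to be loaded into the environment first (for example with the python-dotenv package's `load_dotenv`).

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class AppConfig:
    models_dir: Path
    temp_dir: Path
    huggingface_token: Optional[str]
    wandb_api_key: Optional[str]


def load_config(env=None):
    """Build an AppConfig from environment variables.

    The variable names come from the .env example above; the fallback
    values ("models" and the system temp directory) are assumptions.
    """
    env = os.environ if env is None else env
    return AppConfig(
        models_dir=Path(env.get("MODELS_DIR", "models")),
        temp_dir=Path(env.get("TEMP_DIR", tempfile.gettempdir())),
        # Treat empty strings the same as unset optional keys.
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        wandb_api_key=env.get("WANDB_API_KEY") or None,
    )
```

Passing a plain dictionary as `env` keeps the function easy to test without touching the real environment.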

Hardware Requirements

  • Minimum: 4GB RAM, CPU-only processing
  • Recommended: 8GB+ RAM, NVIDIA GPU with CUDA
  • Optimal: 16GB+ RAM, RTX 3080+ or equivalent
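
Since the app is expected to work both with and without a GPU, device selection can be done at startup. This is a small sketch using PyTorch's `torch.cuda.is_available()`; the `pick_device` name is hypothetical, and the fallback when PyTorch is missing is an assumption made so the snippet runs anywhere.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; fall back to CPU-only processing.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed to `torch.device(...)` or to a model's `.to(...)` call.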

🔧 Technical Details

Audio Processing Pipeline

  1. Input Validation: Check file format and size
  2. Audio Loading: Convert to standard format (44.1kHz, 16-bit)
  3. Source Separation: Extract vocals and instrumentals
  4. Voice Conversion: Apply target voice characteristics
  5. Audio Mixing: Combine converted vocals with instrumentals
  6. Post-processing: Apply effects and format conversion
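
Steps 1 and 5 are simple enough to sketch without any audio library. The code below is illustrative only: `validate_input` and `mix_tracks` are hypothetical names, the 50 MB size limit is an assumed value (this README does not state one), and samples are assumed to be floats in the range -1.0 to 1.0.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac"}  # formats listed in this README
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumed limit; the README gives no number


def validate_input(path, max_bytes=MAX_FILE_BYTES):
    """Step 1: check the file extension and size before any heavy processing."""
    path = Path(path)
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size == 0 or size > max_bytes:
        raise ValueError(f"file size {size} bytes is outside the accepted range")
    return path


def mix_tracks(vocals, instrumental, vocal_gain=1.0):
    """Step 5: sum two equal-length sample lists and avoid clipping.

    Samples are floats in [-1.0, 1.0]. If the sum exceeds that range, the
    whole mix is scaled down so the loudest sample sits at full scale.
    """
    if len(vocals) != len(instrumental):
        raise ValueError("tracks must have the same number of samples")
    mixed = [vocal_gain * v + i for v, i in zip(vocals, instrumental)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Checking only the extension is a cheap first filter; a production check would likely also try to decode the file header.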

Voice Conversion Process

  1. Feature Extraction: Analyze vocal characteristics
  2. Model Loading: Load target voice model
  3. Style Transfer: Apply voice characteristics
  4. Quality Enhancement: Improve audio quality
  5. Temporal Alignment: Sync with original timing

📊 Performance

Processing Times (approximate)

  • 3-minute song: 2-5 minutes on CPU, 30-60 seconds on GPU
  • Custom voice training: 5-15 minutes depending on sample length
  • Audio separation: 1-3 minutes per song
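
These figures are easier to compare across song lengths as a real-time factor (processing time divided by audio duration). The snippet below only restates the 3-minute-song numbers above in that form; the function name is hypothetical, and the resulting ratios are no more precise than the approximate times they are derived from.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


SONG_SECONDS = 3 * 60  # the 3-minute song from the list above

cpu_range = (real_time_factor(2 * 60, SONG_SECONDS), real_time_factor(5 * 60, SONG_SECONDS))
gpu_range = (real_time_factor(30, SONG_SECONDS), real_time_factor(60, SONG_SECONDS))

print(f"CPU: {cpu_range[0]:.2f}x to {cpu_range[1]:.2f}x")  # CPU: 0.67x to 1.67x
print(f"GPU: {gpu_range[0]:.2f}x to {gpu_range[1]:.2f}x")  # GPU: 0.17x to 0.33x
```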

Quality Metrics

  • Audio Quality: Up to 44.1kHz/24-bit output
  • Voice Similarity: 80-95% depending on model and source material
  • Processing Accuracy: 90%+ vocal separation quality

⚠️ Legal & Ethical Considerations

Important Disclaimers

  • Educational Use Only: This platform is for demonstration and educational purposes
  • Consent Required: Always obtain consent before cloning someone's voice
  • Copyright Respect: Respect copyright laws and artist rights
  • No Harmful Content: Do not create misleading or harmful content
  • Attribution: Credit original artists when sharing covers

Responsible AI Use

  • Use voice cloning technology ethically
  • Respect privacy and consent
  • Follow platform terms of service
  • Report misuse when encountered

🤝 Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Format code
black app.py
isort app.py

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Common Issues

  • Out of Memory: Reduce audio length or use CPU processing
  • Poor Quality: Check input audio quality and voice model compatibility
  • Slow Processing: Consider using GPU acceleration

Getting Help

  • Open an issue on GitHub
  • Check the FAQ
  • Join our community discussions

🎉 Acknowledgments

  • Demucs Team: For excellent audio separation models
  • So-VITS-SVC: For voice conversion technology
  • Hugging Face: For the amazing Spaces platform
  • Gradio Team: For the intuitive ML web interface
  • Open Source Community: For