wakeword / README.md
JahnaviBhansali's picture
Upload 3 files
8d17137 verified

A newer version of the Gradio SDK is available: 5.42.0

Upgrade
metadata
title: Wav2Vec2 Wake Word Detection
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false

🎀 Wav2Vec2 Wake Word Detection Demo

A powerful, interactive wake word detection demo built with Hugging Face Transformers and Gradio. This demo uses the proven Wav2Vec2 model with verified Hugging Face Spaces compatibility (73 active Spaces, 4,758 monthly downloads).

✨ Features

  • State-of-the-art Wake Word Detection: Uses Wav2Vec2 Base model fine-tuned for keyword spotting
  • Interactive Web Interface: Clean, modern Gradio interface with audio recording and upload
  • Real-time Processing: Instant wake word detection with confidence scores
  • 12 Keyword Classes: Detects "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go" plus silence and unknown
  • Microphone Support: Record audio directly in the browser or upload audio files
  • Example Audio: Synthetic audio generation for quick testing
  • Responsive Design: Works on desktop and mobile devices
  • Spaces Verified: Proven to work reliably on Hugging Face Spaces (73 active implementations)

πŸš€ Quick Start

Online Demo

Visit the Hugging Face Space to try the demo immediately in your browser.

Local Installation

  1. Clone the repository:
git clone <your-repo-url>
cd wake-word-demo
  1. Install dependencies:
pip install -r requirements.txt
  1. Run the demo:
python app.py
  1. Open your browser and navigate to the local URL (typically http://localhost:7860)

πŸ”§ Technical Details

Model Information

  • Model: superb/wav2vec2-base-superb-ks
  • Architecture: Wav2Vec2 Base fine-tuned for keyword spotting
  • Dataset: Speech Commands dataset v1.0
  • Accuracy: 96.4% on test set
  • Parameters: ~95M parameters
  • Input: 16kHz audio samples
  • Spaces Usage: 73 active Spaces (verified compatibility)

Performance Metrics

  • Accuracy: 96.4% on Speech Commands dataset
  • Model Size: 95M parameters
  • Inference Time: ~200ms (CPU), ~50ms (GPU)
  • Sample Rate: 16kHz
  • Supported Keywords: yes, no, up, down, left, right, on, off, stop, go, silence, unknown
  • Monthly Downloads: 4,758 (highly trusted)

Supported Audio Formats

  • WAV, MP3, FLAC, M4A
  • Automatic resampling to 16kHz
  • Mono and stereo support (automatically converted to mono)

🎯 Use Cases

  • Voice Assistants: Wake word detection for smart devices
  • IoT Applications: Voice control for embedded systems
  • Accessibility: Voice-controlled interfaces
  • Smart Home: Voice commands for home automation
  • Mobile Apps: Offline keyword detection

πŸ› οΈ Customization

Adding New Keywords

To add support for additional keywords, you would need to:

  1. Fine-tune the model on your custom keyword dataset
  2. Update the model configuration
  3. Modify the interface labels

Changing Audio Settings

Edit the audio processing parameters in app.py:

# Audio configuration
SAMPLE_RATE = 16000  # Required by the model
MAX_AUDIO_LENGTH = 1.0  # seconds

Interface Customization

Modify the Gradio interface theme and styling in the app.py file to match your branding.

πŸ“Š Model Comparison

Model Accuracy Size Speed Keywords Spaces Usage
Wav2Vec2-Base-KS 96.4% 95M Fast 12 classes 73 Spaces βœ“
HuBERT-Large-KS 95.3% 300M Slower 12 classes 0 Spaces ❌
DistilHuBERT-KS 97.1% 24M Fastest 12 classes Unknown

🀝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Hugging Face: For the Transformers library and model hosting
  • SUPERB Benchmark: For the fine-tuned keyword spotting models
  • Speech Commands Dataset: For the training data
  • Gradio: For the excellent web interface framework

πŸ“š References


Built with ❀️ using Hugging Face Transformers and Gradio

βœ… Verified to work on Hugging Face Spaces