Spaces:
Running
A newer version of the Gradio SDK is available:
5.42.0
title: Wav2Vec2 Wake Word Detection
emoji: π€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
π€ Wav2Vec2 Wake Word Detection Demo
A powerful, interactive wake word detection demo built with Hugging Face Transformers and Gradio. This demo uses the proven Wav2Vec2 model with verified Hugging Face Spaces compatibility (73 active Spaces, 4,758 monthly downloads).
β¨ Features
- State-of-the-art Wake Word Detection: Uses Wav2Vec2 Base model fine-tuned for keyword spotting
- Interactive Web Interface: Clean, modern Gradio interface with audio recording and upload
- Real-time Processing: Instant wake word detection with confidence scores
- 12 Keyword Classes: Detects "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go" plus silence and unknown
- Microphone Support: Record audio directly in the browser or upload audio files
- Example Audio: Synthetic audio generation for quick testing
- Responsive Design: Works on desktop and mobile devices
- Spaces Verified: Proven to work reliably on Hugging Face Spaces (73 active implementations)
π Quick Start
Online Demo
Visit the Hugging Face Space to try the demo immediately in your browser.
Local Installation
- Clone the repository:
git clone <your-repo-url>
cd wake-word-demo
- Install dependencies:
pip install -r requirements.txt
- Run the demo:
python app.py
- Open your browser and navigate to the local URL (typically
http://localhost:7860
)
π§ Technical Details
Model Information
- Model:
superb/wav2vec2-base-superb-ks
- Architecture: Wav2Vec2 Base fine-tuned for keyword spotting
- Dataset: Speech Commands dataset v1.0
- Accuracy: 96.4% on test set
- Parameters: ~95M parameters
- Input: 16kHz audio samples
- Spaces Usage: 73 active Spaces (verified compatibility)
Performance Metrics
- Accuracy: 96.4% on Speech Commands dataset
- Model Size: 95M parameters
- Inference Time: ~200ms (CPU), ~50ms (GPU)
- Sample Rate: 16kHz
- Supported Keywords: yes, no, up, down, left, right, on, off, stop, go, silence, unknown
- Monthly Downloads: 4,758 (highly trusted)
Supported Audio Formats
- WAV, MP3, FLAC, M4A
- Automatic resampling to 16kHz
- Mono and stereo support (automatically converted to mono)
π― Use Cases
- Voice Assistants: Wake word detection for smart devices
- IoT Applications: Voice control for embedded systems
- Accessibility: Voice-controlled interfaces
- Smart Home: Voice commands for home automation
- Mobile Apps: Offline keyword detection
π οΈ Customization
Adding New Keywords
To add support for additional keywords, you would need to:
- Fine-tune the model on your custom keyword dataset
- Update the model configuration
- Modify the interface labels
Changing Audio Settings
Edit the audio processing parameters in app.py
:
# Audio configuration
SAMPLE_RATE = 16000 # Required by the model
MAX_AUDIO_LENGTH = 1.0 # seconds
Interface Customization
Modify the Gradio interface theme and styling in the app.py
file to match your branding.
π Model Comparison
Model | Accuracy | Size | Speed | Keywords | Spaces Usage |
---|---|---|---|---|---|
Wav2Vec2-Base-KS | 96.4% | 95M | Fast | 12 classes | 73 Spaces β |
HuBERT-Large-KS | 95.3% | 300M | Slower | 12 classes | 0 Spaces β |
DistilHuBERT-KS | 97.1% | 24M | Fastest | 12 classes | Unknown |
π€ Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
Development Setup
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
π License
This project is licensed under the MIT License - see the LICENSE file for details.
π Acknowledgments
- Hugging Face: For the Transformers library and model hosting
- SUPERB Benchmark: For the fine-tuned keyword spotting models
- Speech Commands Dataset: For the training data
- Gradio: For the excellent web interface framework
π References
- SUPERB: Speech processing Universal PERformance Benchmark
- Wav2Vec2: A Framework for Self-Supervised Learning of Speech Representations
- Speech Commands Dataset
Built with β€οΈ using Hugging Face Transformers and Gradio
β Verified to work on Hugging Face Spaces