Spaces:
Running
Running
File size: 4,695 Bytes
8d17137 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 |
---
title: Wav2Vec2 Wake Word Detection
emoji: π€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
---
# π€ Wav2Vec2 Wake Word Detection Demo
A powerful, interactive wake word detection demo built with Hugging Face Transformers and Gradio. This demo uses the **proven** Wav2Vec2 model with verified Hugging Face Spaces compatibility (73 active Spaces, 4,758 monthly downloads).
## β¨ Features
- **State-of-the-art Wake Word Detection**: Uses Wav2Vec2 Base model fine-tuned for keyword spotting
- **Interactive Web Interface**: Clean, modern Gradio interface with audio recording and upload
- **Real-time Processing**: Instant wake word detection with confidence scores
- **12 Keyword Classes**: Detects "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go" plus silence and unknown
- **Microphone Support**: Record audio directly in the browser or upload audio files
- **Example Audio**: Synthetic audio generation for quick testing
- **Responsive Design**: Works on desktop and mobile devices
- **Spaces Verified**: Proven to work reliably on Hugging Face Spaces (73 active implementations)
## π Quick Start
### Online Demo
Visit the Hugging Face Space to try the demo immediately in your browser.
### Local Installation
1. **Clone the repository:**
```bash
git clone <your-repo-url>
cd wake-word-demo
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the demo:**
```bash
python app.py
```
4. **Open your browser** and navigate to the local URL (typically `http://localhost:7860`)
## π§ Technical Details
### Model Information
- **Model**: `superb/wav2vec2-base-superb-ks`
- **Architecture**: Wav2Vec2 Base fine-tuned for keyword spotting
- **Dataset**: Speech Commands dataset v1.0
- **Accuracy**: 96.4% on test set
- **Parameters**: ~95M parameters
- **Input**: 16kHz audio samples
- **Spaces Usage**: 73 active Spaces (verified compatibility)
### Performance Metrics
- **Accuracy**: 96.4% on Speech Commands dataset
- **Model Size**: 95M parameters
- **Inference Time**: ~200ms (CPU), ~50ms (GPU)
- **Sample Rate**: 16kHz
- **Supported Keywords**: yes, no, up, down, left, right, on, off, stop, go, silence, unknown
- **Monthly Downloads**: 4,758 (highly trusted)
### Supported Audio Formats
- WAV, MP3, FLAC, M4A
- Automatic resampling to 16kHz
- Mono and stereo support (automatically converted to mono)
## π― Use Cases
- **Voice Assistants**: Wake word detection for smart devices
- **IoT Applications**: Voice control for embedded systems
- **Accessibility**: Voice-controlled interfaces
- **Smart Home**: Voice commands for home automation
- **Mobile Apps**: Offline keyword detection
## π οΈ Customization
### Adding New Keywords
To add support for additional keywords, you would need to:
1. Fine-tune the model on your custom keyword dataset
2. Update the model configuration
3. Modify the interface labels
### Changing Audio Settings
Edit the audio processing parameters in `app.py`:
```python
# Audio configuration
SAMPLE_RATE = 16000 # Required by the model
MAX_AUDIO_LENGTH = 1.0 # seconds
```
### Interface Customization
Modify the Gradio interface theme and styling in the `app.py` file to match your branding.
## π Model Comparison
| Model | Accuracy | Size | Speed | Keywords | Spaces Usage |
|-------|----------|------|-------|----------|--------------|
| **Wav2Vec2-Base-KS** | **96.4%** | **95M** | **Fast** | **12 classes** | **73 Spaces β** |
| HuBERT-Large-KS | 95.3% | 300M | Slower | 12 classes | 0 Spaces β |
| DistilHuBERT-KS | 97.1% | 24M | Fastest | 12 classes | Unknown |
## π€ Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
### Development Setup
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## π License
This project is licensed under the MIT License - see the LICENSE file for details.
## π Acknowledgments
- **Hugging Face**: For the Transformers library and model hosting
- **SUPERB Benchmark**: For the fine-tuned keyword spotting models
- **Speech Commands Dataset**: For the training data
- **Gradio**: For the excellent web interface framework
## π References
- [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051)
- [Wav2Vec2: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477)
- [Speech Commands Dataset](https://arxiv.org/abs/1804.03209)
---
**Built with β€οΈ using Hugging Face Transformers and Gradio**
**β
Verified to work on Hugging Face Spaces** |