---
title: Parakeet.js Demo
emoji: 🦜
colorFrom: indigo
colorTo: blue
sdk: static
pinned: false
app_build_command: npm run build
app_file: build/index.html
license: mit
short_description: NVIDIA Parakeet speech recognition for the browser
models:
- istupakov/parakeet-tdt-0.6b-v2-onnx
tags:
- parakeet-js
- parakeet
- onnx
- webgpu
- asr
- istupakov/parakeet-tdt-0.6b-v2-onnx
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---

# 🦜 Parakeet.js - HF Spaces Demo

> **NVIDIA Parakeet speech recognition for the browser using WebGPU/WASM**

This demo showcases the **[parakeet.js](https://www.npmjs.com/package/parakeet.js)** library, which brings NVIDIA's Parakeet speech recognition models to the browser using ONNX Runtime Web with WebGPU and WASM backends.

## 🚀 Features

- **🖥️ Browser-based**: Runs entirely in your browser - no server required
- **⚡ WebGPU acceleration**: Fast inference using WebGPU when available
- **🔧 WASM fallback**: CPU-based inference using WebAssembly
- **📱 Multiple formats**: Supports various audio formats (WAV, MP3, etc.)
- **🎯 Real-time performance**: Optimized for fast transcription
- **📊 Performance metrics**: Shows detailed timing information
- **🎛️ Configurable**: Adjustable quantization, preprocessing, and backend settings

## 🔧 How to Use

1. **Click "Load Model"** to download and initialize the speech recognition model
2. **Select your preferences**:
   - **Backend**: Choose WebGPU (faster) or WASM (more compatible)
   - **Quantization**: fp32 (higher quality) or int8 (faster)
   - **Preprocessor**: Different audio processing options
3. **Upload an audio file** using the file input
4. **View the transcription** in real time, along with performance metrics
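
The backend choice in step 2 can also be made automatically. The sketch below is a hypothetical helper (`pickBackend` is not part of parakeet.js) that prefers WebGPU when the browser can provide an adapter and otherwise falls back to WASM. The returned strings `'webgpu'` and `'wasm'` are illustrative labels; whether the library accepts them verbatim is an assumption.

```javascript
// Hypothetical helper, not part of parakeet.js: pick a backend label
// depending on whether a WebGPU adapter can be obtained.
async function pickBackend(nav = globalThis.navigator) {
  // navigator.gpu is only defined where WebGPU is implemented.
  if (!nav || !nav.gpu) return 'wasm';
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Treat any failure while probing the GPU as "not available".
    return 'wasm';
  }
}

// Usage: log the backend this environment would get.
pickBackend().then((backend) => console.log(`Selected backend: ${backend}`));
```

Requesting an adapter, rather than only checking that `navigator.gpu` exists, covers browsers that expose the API but cannot find a usable GPU.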

## 📦 Integration

You can use parakeet.js in your own projects:

```bash
npm install parakeet.js onnxruntime-web
```

```javascript
import { ParakeetModel, getParakeetModel } from 'parakeet.js';

// Resolve the model file URLs from the Hugging Face Hub
const modelUrls = await getParakeetModel('istupakov/parakeet-tdt-0.6b-v2-onnx');
const model = await ParakeetModel.fromUrls(modelUrls);

// Transcribe audio. audioData (the decoded audio samples) and sampleRate
// (in Hz) must be supplied by your application.
const result = await model.transcribe(audioData, sampleRate);
console.log(result.utterance_text);
```
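
The snippet above leaves `audioData` and `sampleRate` undefined. One way to obtain them from an uploaded file is the browser's Web Audio API, sketched below. The helper names `downmixToMono` and `decodeToMono` are hypothetical, and the assumption that `transcribe` wants a mono `Float32Array` is not confirmed here; check the parakeet.js documentation for the expected input format and sample rate.

```javascript
// Hypothetical helper: average equal-length channels into one mono track.
function downmixToMono(channels) {
  if (channels.length === 0) {
    throw new RangeError('at least one channel is required');
  }
  const mono = new Float32Array(channels[0].length);
  for (const channel of channels) {
    for (let i = 0; i < mono.length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Browser-only sketch: decode a File/Blob (WAV, MP3, ...) into PCM samples.
async function decodeToMono(file) {
  const ctx = new AudioContext();
  try {
    const buffer = await ctx.decodeAudioData(await file.arrayBuffer());
    const channels = [];
    for (let c = 0; c < buffer.numberOfChannels; c++) {
      channels.push(buffer.getChannelData(c));
    }
    // buffer.sampleRate is the rate of the decoded audio, in Hz.
    return { audioData: downmixToMono(channels), sampleRate: buffer.sampleRate };
  } finally {
    await ctx.close();
  }
}
```

With a file input, `const { audioData, sampleRate } = await decodeToMono(input.files[0]);` would supply the two values passed to `model.transcribe`.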

## 🔗 Links

- **📚 [GitHub Repository](https://github.com/ysdede/parakeet.js)** - Source code and documentation
- **📦 [npm Package](https://www.npmjs.com/package/parakeet.js)** - Install via npm

## 🧠 Model Information

This demo uses the **istupakov/parakeet-tdt-0.6b-v2-onnx** model, which is an ONNX-converted version of NVIDIA's Parakeet speech recognition model optimized for browser deployment.

## 💡 Technical Details

- **Model Format**: ONNX for cross-platform compatibility
- **Backends**: WebGPU (GPU acceleration) and WASM (CPU fallback)
- **Quantization**: Support for both fp32 and int8 precision
- **Audio Processing**: Built-in preprocessing for various audio formats
- **Performance**: Real-time factor (RTF) typically below 1.0, meaning transcription takes less time than the audio lasts
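
As a sketch of how an RTF value is obtained, assuming the usual definition of processing time divided by audio duration: the helper name `realTimeFactor` is hypothetical and the numbers are illustrative, not measurements from this demo.

```javascript
// Hypothetical helper: RTF = processing time / audio duration.
function realTimeFactor(processingSeconds, audioSeconds) {
  if (!(audioSeconds > 0)) {
    throw new RangeError('audioSeconds must be a positive number');
  }
  return processingSeconds / audioSeconds;
}

// Illustrative numbers only: 6 s of compute for 12 s of audio.
const rtf = realTimeFactor(6, 12);
console.log(rtf < 1 ? 'faster than real time' : 'slower than real time');
```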

---

*Built with ❤️ using React and deployed on Hugging Face Spaces*