Init
- README.md +60 -51
- package-lock.json +0 -0
- package.json +3 -1
- src/App.css +157 -25
- src/App.js +311 -16
README.md
CHANGED
@@ -8,76 +8,85 @@ pinned: false
|
|
8 |
app_build_command: npm run build
|
9 |
app_file: build/index.html
|
10 |
license: mit
|
11 |
-
short_description: NVIDIA Parakeet speech recognition for the browser (WebGPU)
|
12 |
---
|
13 |
|
14 |
-
#
|
15 |
|
16 |
-
|
17 |
|
18 |
-
|
19 |
|
20 |
-
|
21 |
|
22 |
-
|
23 |
|
24 |
-
|
25 |
-
Open [http://localhost:3000](http://localhost:3000) to view it in your browser.
|
26 |
|
27 |
-
|
28 |
-
|
29 |
|
30 |
-
|
31 |
|
32 |
-
|
33 |
-
See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.
|
34 |
|
35 |
-
|
|
|
|
|
36 |
|
37 |
-
|
38 |
-
|
39 |
|
40 |
-
|
41 |
-
|
|
|
42 |
|
43 |
-
|
|
|
|
|
|
|
44 |
|
45 |
-
|
46 |
|
47 |
-
|
|
|
|
|
48 |
|
49 |
-
|
50 |
|
51 |
-
|
52 |
|
53 |
-
|
54 |
|
55 |
-
|
56 |
|
57 |
-
|
58 |
-
|
59 |
-
To learn React, check out the [React documentation](https://reactjs.org/).
|
60 |
-
|
61 |
-
### Code Splitting
|
62 |
-
|
63 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/code-splitting](https://facebook.github.io/create-react-app/docs/code-splitting)
|
64 |
-
|
65 |
-
### Analyzing the Bundle Size
|
66 |
-
|
67 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/analyzing-the-bundle-size](https://facebook.github.io/create-react-app/docs/analyzing-the-bundle-size)
|
68 |
-
|
69 |
-
### Making a Progressive Web App
|
70 |
-
|
71 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/making-a-progressive-web-app](https://facebook.github.io/create-react-app/docs/making-a-progressive-web-app)
|
72 |
-
|
73 |
-
### Advanced Configuration
|
74 |
-
|
75 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/advanced-configuration](https://facebook.github.io/create-react-app/docs/advanced-configuration)
|
76 |
-
|
77 |
-
### Deployment
|
78 |
-
|
79 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/deployment](https://facebook.github.io/create-react-app/docs/deployment)
|
80 |
-
|
81 |
-
### `npm run build` fails to minify
|
82 |
|
83 |
-
|
|
|
8 |
app_build_command: npm run build
|
9 |
app_file: build/index.html
|
10 |
license: mit
|
11 |
+
short_description: NVIDIA Parakeet speech recognition for the browser (WebGPU/WASM)
|
12 |
+
models:
|
13 |
+
- ysdede/parakeet-tdt-0.6b-v2-onnx
|
14 |
+
tags:
|
15 |
+
- parakeet
|
16 |
+
- speech
|
17 |
+
- onnx
|
18 |
+
- webgpu
|
19 |
+
- wasm
|
20 |
+
- transcription
|
21 |
+
- nvidia
|
22 |
+
- speech-recognition
|
23 |
+
- browser
|
24 |
---
|
25 |
|
26 |
+
# 🐠 Parakeet.js - HF Spaces Demo
|
27 |
|
28 |
+
> **NVIDIA Parakeet speech recognition for the browser using WebGPU/WASM**
|
29 |
|
30 |
+
This demo showcases the **[parakeet.js](https://www.npmjs.com/package/parakeet.js)** library, which brings NVIDIA's Parakeet speech recognition models to the browser using ONNX Runtime Web with WebGPU and WASM backends.
|
31 |
|
32 |
+
## 🚀 Features
|
33 |
|
34 |
+
- **🖥️ Browser-based**: Runs entirely in your browser - no server required
|
35 |
+
- **⚡ WebGPU acceleration**: Fast inference using WebGPU when available
|
36 |
+
- **🔧 WASM fallback**: CPU-based inference using WebAssembly
|
37 |
+
- **📱 Multiple formats**: Supports various audio formats (WAV, MP3, etc.)
|
38 |
+
- **🎯 Real-time performance**: Optimized for fast transcription
|
39 |
+
- **📊 Performance metrics**: Shows detailed timing information
|
40 |
+
- **🎛️ Configurable**: Adjustable quantization, preprocessing, and backend settings
|
41 |
|
42 |
+
## 🔧 How to Use
|
|
|
43 |
|
44 |
+
1. **Click "Load Model"** to download and initialize the speech recognition model
|
45 |
+
2. **Select your preferences**:
|
46 |
+
- **Backend**: Choose WebGPU (faster) or WASM (more compatible)
|
47 |
+
- **Quantization**: fp32 (higher quality) or int8 (faster)
|
48 |
+
- **Preprocessor**: Different audio processing options
|
49 |
+
3. **Upload an audio file** using the file input
|
50 |
+
4. **View the transcription** in real-time with performance metrics
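The backend and quantization choice in step 2 can be sketched as a small helper. This is a minimal sketch rather than the demo's actual logic: `pickDefaults` is a hypothetical name, the backend strings (`webgpu-hybrid`, `wasm`) and the fp32/int8 pairing are taken from this commit's `src/App.js`, and the check relies on the standard `navigator.gpu` property. The presence of `navigator.gpu` does not guarantee that a usable adapter exists, so a real app may still need to fall back at load time.

```javascript
// Hypothetical helper: choose a default backend and quantization preset.
// `nav` is a navigator-like object, passed in so the function can be
// exercised outside a browser.
function pickDefaults(nav) {
  const hasWebGPU = Boolean(nav && nav.gpu);
  return hasWebGPU
    ? { backend: 'webgpu-hybrid', quantization: 'fp32' }
    : { backend: 'wasm', quantization: 'int8' };
}

// In a browser: const { backend, quantization } = pickDefaults(navigator);
```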
|
51 |
|
52 |
+
## 📦 Integration
|
53 |
|
54 |
+
You can use parakeet.js in your own projects:
|
|
|
55 |
|
56 |
+
```bash
|
57 |
+
npm install parakeet.js onnxruntime-web
|
58 |
+
```
|
59 |
|
60 |
+
```javascript
|
61 |
+
import { ParakeetModel, getParakeetModel } from 'parakeet.js';
|
62 |
|
63 |
+
// Load model from HuggingFace Hub
|
64 |
+
const modelUrls = await getParakeetModel('ysdede/parakeet-tdt-0.6b-v2-onnx');
|
65 |
+
const model = await ParakeetModel.fromUrls(modelUrls);
|
66 |
|
67 |
+
// Transcribe audio
|
68 |
+
const result = await model.transcribe(audioData, sampleRate);
|
69 |
+
console.log(result.utterance_text);
|
70 |
+
```
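The snippet above assumes `audioData` is mono Float32 PCM and that `sampleRate` matches it. The demo's `src/App.js` decodes at 16 kHz and uses channel 0 only. For multi-channel files, one alternative is to average the channels first. The `mixToMono` helper below is hypothetical and not part of parakeet.js:

```javascript
// Hypothetical helper (not part of parakeet.js): average N channels of
// equal length into a single mono Float32Array.
function mixToMono(channels) {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// With a decoded AudioBuffer `decoded` (Web Audio API):
// const channels = Array.from({ length: decoded.numberOfChannels },
//   (_, c) => decoded.getChannelData(c));
// const audioData = mixToMono(channels);
```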
|
71 |
|
72 |
+
## 🔗 Links
|
73 |
|
74 |
+
- **📚 [GitHub Repository](https://github.com/ysdede/parakeet.js)** - Source code and documentation
|
75 |
+
- **📦 [npm Package](https://www.npmjs.com/package/parakeet.js)** - Install via npm
|
76 |
+
- **🤖 [NVIDIA Parakeet Model](https://huggingface.co/nvidia/parakeet-tdt-1.1b)** - Original model on HuggingFace
|
77 |
|
78 |
+
## 🧠 Model Information
|
79 |
|
80 |
+
This demo uses the **ysdede/parakeet-tdt-0.6b-v2-onnx** model, which is an ONNX-converted version of NVIDIA's Parakeet speech recognition model optimized for browser deployment.
|
81 |
|
82 |
+
## 💡 Technical Details
|
83 |
|
84 |
+
- **Model Format**: ONNX for cross-platform compatibility
|
85 |
+
- **Backends**: WebGPU (GPU acceleration) and WASM (CPU fallback)
|
86 |
+
- **Quantization**: Support for both fp32 and int8 precision
|
87 |
+
- **Audio Processing**: Built-in preprocessing for various audio formats
|
88 |
+
- **Performance**: Real-time factor (RTF) typically < 1.0x for fast transcription
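To make the RTF figure concrete, here is a minimal sketch that assumes the conventional definition (processing time divided by audio duration, so values below 1 mean faster than real time). `realTimeFactor` is a hypothetical helper; the `rtf` value reported in the library's metrics may follow a different convention.

```javascript
// Conventional real-time factor: processing time / audio duration.
// Assumption: this may not match how the library computes `metrics.rtf`.
function realTimeFactor(processingMs, audioSeconds) {
  if (!(audioSeconds > 0)) return NaN; // undefined for empty audio
  return processingMs / 1000 / audioSeconds;
}

// Example: 2 s of processing for an 8 s clip
// realTimeFactor(2000, 8)
```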
|
89 |
|
90 |
+
---
|
91 |
|
92 |
+
*Built with ❤️ using React and deployed on Hugging Face Spaces*
|
package-lock.json
ADDED
The diff for this file is too large to render.
|
|
package.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "name": "
+  "name": "parakeet-js-hf-spaces-demo",
   "version": "0.1.0",
   "private": true,
   "dependencies": {
@@ -7,6 +7,8 @@
     "@testing-library/jest-dom": "^6.6.3",
     "@testing-library/react": "^16.3.0",
     "@testing-library/user-event": "^13.5.0",
+    "parakeet.js": "^0.0.1",
+    "onnxruntime-web": "1.22.0-dev.20250409-89f8206ba4",
     "react": "^19.1.0",
     "react-dom": "^19.1.0",
     "react-scripts": "5.0.1",
src/App.css
CHANGED
@@ -1,38 +1,170 @@
|
|
1 |
-
|
2 |
-
|
|
|
|
|
|
|
3 |
}
|
4 |
|
5 |
-
.
|
6 |
-
|
7 |
-
|
|
|
|
|
|
|
|
|
8 |
}
|
9 |
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
|
|
|
|
|
14 |
}
|
15 |
|
16 |
-
.
|
17 |
-
|
18 |
-
min-height: 100vh;
|
19 |
display: flex;
|
20 |
-
flex-direction: column;
|
21 |
align-items: center;
|
22 |
-
|
23 |
-
|
24 |
-
|
|
25 |
}
|
26 |
|
27 |
-
.
|
28 |
-
color: #
|
|
|
29 |
}
|
30 |
|
31 |
-
|
32 |
-
|
33 |
-
transform: rotate(0deg);
|
34 |
-
}
|
35 |
-
to {
|
36 |
-
transform: rotate(360deg);
|
37 |
-
}
|
38 |
}
|
|
|
1 |
+
:root {
|
2 |
+
font-family: Inter, system-ui, sans-serif;
|
3 |
+
line-height: 1.4;
|
4 |
+
color: #222;
|
5 |
+
background: #f3f6f8;
|
6 |
}
|
7 |
|
8 |
+
.app {
|
9 |
+
max-width: 760px;
|
10 |
+
margin: 2rem auto;
|
11 |
+
background: #ffffff;
|
12 |
+
border-radius: 8px;
|
13 |
+
padding: 1.5rem 2rem;
|
14 |
+
box-shadow: 0 4px 14px rgba(0, 0, 0, 0.06);
|
15 |
}
|
16 |
|
17 |
+
.controls {
|
18 |
+
display: flex;
|
19 |
+
flex-wrap: wrap;
|
20 |
+
gap: 0.75rem;
|
21 |
+
align-items: center;
|
22 |
+
margin-bottom: 1rem;
|
23 |
}
|
24 |
|
25 |
+
.controls label {
|
26 |
+
font-size: 0.9rem;
|
|
|
27 |
display: flex;
|
|
|
28 |
align-items: center;
|
29 |
+
gap: 0.35rem;
|
30 |
+
}
|
31 |
+
|
32 |
+
.controls select,
|
33 |
+
.controls input[type="number"] {
|
34 |
+
padding: 0.25rem 0.5rem;
|
35 |
+
border: 1px solid #d1d5db;
|
36 |
+
border-radius: 4px;
|
37 |
+
background: #fff;
|
38 |
+
}
|
39 |
+
|
40 |
+
button.primary {
|
41 |
+
padding: 0.4rem 0.9rem;
|
42 |
+
background: #3b82f6;
|
43 |
+
color: #ffffff;
|
44 |
+
border: none;
|
45 |
+
border-radius: 4px;
|
46 |
+
cursor: pointer;
|
47 |
+
}
|
48 |
+
|
49 |
+
button.primary:hover {
|
50 |
+
background: #2563eb;
|
51 |
+
}
|
52 |
+
|
53 |
+
.status {
|
54 |
+
margin-top: 0.5rem;
|
55 |
+
font-weight: 500;
|
56 |
+
}
|
57 |
+
|
58 |
+
.progress-wrapper {
|
59 |
+
margin: 0.5rem 0;
|
60 |
+
}
|
61 |
+
|
62 |
+
.progress-bar {
|
63 |
+
height: 8px;
|
64 |
+
background: #e2e8f0;
|
65 |
+
border-radius: 4px;
|
66 |
+
overflow: hidden;
|
67 |
+
}
|
68 |
+
|
69 |
+
.progress-bar > div {
|
70 |
+
height: 100%;
|
71 |
+
background: #10b981;
|
72 |
+
transition: width 0.2s;
|
73 |
+
}
|
74 |
+
|
75 |
+
.progress-text {
|
76 |
+
font-size: 0.8rem;
|
77 |
+
color: #555;
|
78 |
+
margin-top: 0.25rem;
|
79 |
+
}
|
80 |
+
|
81 |
+
.textarea {
|
82 |
+
width: 100%;
|
83 |
+
height: 6rem;
|
84 |
+
resize: vertical;
|
85 |
+
padding: 0.75rem;
|
86 |
+
border: 1px solid #d1d5db;
|
87 |
+
border-radius: 4px;
|
88 |
+
font-family: inherit;
|
89 |
+
font-size: 0.9rem;
|
90 |
+
}
|
91 |
+
|
92 |
+
.performance {
|
93 |
+
font-size: 0.85rem;
|
94 |
+
background: #ecfdf5;
|
95 |
+
padding: 0.5rem 0.75rem;
|
96 |
+
border-radius: 6px;
|
97 |
+
border: 1px solid #d1fae5;
|
98 |
+
margin-bottom: 1rem;
|
99 |
+
}
|
100 |
+
|
101 |
+
.history {
|
102 |
+
margin-top: 1rem;
|
103 |
+
}
|
104 |
+
|
105 |
+
.history h3 {
|
106 |
+
margin-bottom: 0.5rem;
|
107 |
+
color: #333;
|
108 |
+
}
|
109 |
+
|
110 |
+
.history-item {
|
111 |
+
padding: 1rem;
|
112 |
+
border-bottom: 1px solid #f1f5f9;
|
113 |
+
background: #ffffff;
|
114 |
+
}
|
115 |
+
|
116 |
+
.history-item:last-child {
|
117 |
+
border-bottom: none;
|
118 |
+
}
|
119 |
+
|
120 |
+
.history-meta {
|
121 |
+
display: flex;
|
122 |
+
justify-content: space-between;
|
123 |
+
font-size: 0.9rem;
|
124 |
+
color: #666;
|
125 |
+
margin-bottom: 0.5rem;
|
126 |
+
}
|
127 |
+
|
128 |
+
.history-stats {
|
129 |
+
font-size: 0.75rem;
|
130 |
+
color: #666;
|
131 |
+
margin-bottom: 0.5rem;
|
132 |
+
}
|
133 |
+
|
134 |
+
.history-text {
|
135 |
+
background: #f9fafb;
|
136 |
+
padding: 0.5rem 0.75rem;
|
137 |
+
border-radius: 4px;
|
138 |
+
border: 1px solid #e5e7eb;
|
139 |
+
font-size: 0.9rem;
|
140 |
+
}
|
141 |
+
|
142 |
+
/* HF Spaces specific styles */
|
143 |
+
.app h2 {
|
144 |
+
margin-top: 0;
|
145 |
+
color: #1f2937;
|
146 |
+
}
|
147 |
+
|
148 |
+
.app p {
|
149 |
+
margin-bottom: 1rem;
|
150 |
+
color: #6b7280;
|
151 |
+
}
|
152 |
+
|
153 |
+
.app h3 {
|
154 |
+
color: #374151;
|
155 |
+
margin-bottom: 0.5rem;
|
156 |
+
}
|
157 |
+
|
158 |
+
.app h4 {
|
159 |
+
color: #374151;
|
160 |
+
margin-bottom: 0.5rem;
|
161 |
}
|
162 |
|
163 |
+
.app a {
|
164 |
+
color: #3b82f6;
|
165 |
+
text-decoration: none;
|
166 |
}
|
167 |
|
168 |
+
.app a:hover {
|
169 |
+
text-decoration: underline;
|
170 |
}
|
src/App.js
CHANGED
@@ -1,25 +1,320 @@
|
|
1 |
-
import
|
|
|
2 |
import './App.css';
|
3 |
|
4 |
-
function App() {
|
5 |
return (
|
6 |
-
<div className="
|
7 |
-
<
|
8 |
-
|
|
|
|
|
9 |
<p>
|
10 |
-
|
11 |
</p>
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
>
|
18 |
-
|
19 |
-
</
|
20 |
-
</
|
21 |
</div>
|
22 |
);
|
23 |
}
|
24 |
-
|
25 |
-
export default App;
|
|
|
1 |
+
import React, { useState, useRef, useEffect } from 'react';
|
2 |
+
import { ParakeetModel, getParakeetModel } from 'parakeet.js';
|
3 |
import './App.css';
|
4 |
|
5 |
+
export default function App() {
|
6 |
+
const repoId = 'ysdede/parakeet-tdt-0.6b-v2-onnx';
|
7 |
+
const [backend, setBackend] = useState('webgpu-hybrid');
|
8 |
+
const [quant, setQuant] = useState('fp32');
|
9 |
+
const [preprocessor, setPreprocessor] = useState('nemo128');
|
10 |
+
const [status, setStatus] = useState('Idle');
|
11 |
+
const [progress, setProgress] = useState('');
|
12 |
+
const [progressText, setProgressText] = useState('');
|
13 |
+
const [progressPct, setProgressPct] = useState(null);
|
14 |
+
const [text, setText] = useState('');
|
15 |
+
const [latestMetrics, setLatestMetrics] = useState(null);
|
16 |
+
const [transcriptions, setTranscriptions] = useState([]);
|
17 |
+
const [isTranscribing, setIsTranscribing] = useState(false);
|
18 |
+
const [verboseLog, setVerboseLog] = useState(false);
|
19 |
+
const [decoderInt8, setDecoderInt8] = useState(true);
|
20 |
+
const [frameStride, setFrameStride] = useState(1);
|
21 |
+
const [dumpDetail, setDumpDetail] = useState(false);
|
22 |
+
const maxCores = navigator.hardwareConcurrency || 8;
|
23 |
+
const [cpuThreads, setCpuThreads] = useState(Math.max(1, maxCores - 2));
|
24 |
+
const modelRef = useRef(null);
|
25 |
+
const fileInputRef = useRef(null);
|
26 |
+
|
27 |
+
// Auto-adjust quant preset when backend changes
|
28 |
+
useEffect(() => {
|
29 |
+
if (backend.startsWith('webgpu')) {
|
30 |
+
setQuant('fp32');
|
31 |
+
} else if (backend === 'wasm') {
|
32 |
+
setQuant('int8');
|
33 |
+
}
|
34 |
+
}, [backend]);
|
35 |
+
|
36 |
+
async function loadModel() {
|
37 |
+
setStatus('Loading model…');
|
38 |
+
setProgress('');
|
39 |
+
setProgressText('');
|
40 |
+
setProgressPct(0);
|
41 |
+
console.time('LoadModel');
|
42 |
+
|
43 |
+
try {
|
44 |
+
const progressCallback = ({ loaded, total, file }) => {
|
45 |
+
const pct = total > 0 ? Math.round((loaded / total) * 100) : 0;
|
46 |
+
setProgressText(`${file}: ${pct}%`);
|
47 |
+
setProgressPct(pct);
|
48 |
+
};
|
49 |
+
|
50 |
+
// 1. Download all model files from HuggingFace Hub
|
51 |
+
const modelUrls = await getParakeetModel(repoId, {
|
52 |
+
quantization: quant,
|
53 |
+
preprocessor,
|
54 |
+
backend, // Pass backend to enable automatic fp32 selection for WebGPU
|
55 |
+
decoderInt8,
|
56 |
+
progress: progressCallback
|
57 |
+
});
|
58 |
+
|
59 |
+
// Show compiling sessions stage
|
60 |
+
setStatus('Creating sessions…');
|
61 |
+
setProgressText('Compiling model (this may take ~10 s)…');
|
62 |
+
setProgressPct(null);
|
63 |
+
|
64 |
+
// 2. Create the model instance with all file URLs
|
65 |
+
modelRef.current = await ParakeetModel.fromUrls({
|
66 |
+
...modelUrls.urls,
|
67 |
+
filenames: modelUrls.filenames,
|
68 |
+
backend,
|
69 |
+
verbose: verboseLog,
|
70 |
+
decoderOnWasm: decoderInt8, // if we selected int8 decoder, keep it on WASM
|
71 |
+
decoderInt8,
|
72 |
+
cpuThreads,
|
73 |
+
});
|
74 |
+
|
75 |
+
// 3. Warm-up and verify
|
76 |
+
setStatus('Warming up & verifying…');
|
77 |
+
setProgressText('Model ready! Upload an audio file to transcribe.');
|
78 |
+
setProgressPct(null);
|
79 |
+
|
80 |
+
console.timeEnd('LoadModel');
|
81 |
+
setStatus('Model ready ✔');
|
82 |
+
setProgressText('');
|
83 |
+
} catch (e) {
|
84 |
+
console.error(e);
|
85 |
+
setStatus(`Failed: ${e.message}`);
|
86 |
+
setProgress('');
|
87 |
+
}
|
88 |
+
}
|
89 |
+
|
90 |
+
async function transcribeFile(e) {
|
91 |
+
if (!modelRef.current) return alert('Load model first');
|
92 |
+
const file = e.target.files?.[0];
|
93 |
+
if (!file) return;
|
94 |
+
|
95 |
+
setIsTranscribing(true);
|
96 |
+
setStatus(`Transcribing "${file.name}"…`);
|
97 |
+
|
98 |
+
try {
|
99 |
+
const buf = await file.arrayBuffer();
|
100 |
+
const audioCtx = new AudioContext({ sampleRate: 16000 });
|
101 |
+
const decoded = await audioCtx.decodeAudioData(buf);
|
102 |
+
const pcm = decoded.getChannelData(0);
|
103 |
+
|
104 |
+
console.time(`Transcribe-${file.name}`);
|
105 |
+
const res = await modelRef.current.transcribe(pcm, 16_000, {
|
106 |
+
returnTimestamps: true,
|
107 |
+
returnConfidences: true,
|
108 |
+
frameStride
|
109 |
+
});
|
110 |
+
console.timeEnd(`Transcribe-${file.name}`);
|
111 |
+
|
112 |
+
if (dumpDetail) {
|
113 |
+
console.log('[Parakeet] Detailed transcription output', res);
|
114 |
+
}
|
115 |
+
setLatestMetrics(res.metrics);
|
116 |
+
// Add to transcriptions list
|
117 |
+
const newTranscription = {
|
118 |
+
id: Date.now(),
|
119 |
+
filename: file.name,
|
120 |
+
text: res.utterance_text,
|
121 |
+
timestamp: new Date().toLocaleTimeString(),
|
122 |
+
duration: pcm.length / 16000, // duration in seconds
|
123 |
+
wordCount: res.words?.length || 0,
|
124 |
+
confidence: res.confidence_scores?.overall_log_prob || null,
|
125 |
+
metrics: res.metrics
|
126 |
+
};
|
127 |
+
|
128 |
+
setTranscriptions(prev => [newTranscription, ...prev]);
|
129 |
+
setText(res.utterance_text); // Show latest transcription
|
130 |
+
setStatus('Model ready ✔'); // Ready for next file
|
131 |
+
|
132 |
+
} catch (error) {
|
133 |
+
console.error('Transcription failed:', error);
|
134 |
+
setStatus('Transcription failed');
|
135 |
+
alert(`Failed to transcribe "${file.name}": ${error.message}`);
|
136 |
+
} finally {
|
137 |
+
setIsTranscribing(false);
|
138 |
+
// Clear the file input so the same file can be selected again
|
139 |
+
if (fileInputRef.current) {
|
140 |
+
fileInputRef.current.value = '';
|
141 |
+
}
|
142 |
+
}
|
143 |
+
}
|
144 |
+
|
145 |
+
function clearTranscriptions() {
|
146 |
+
setTranscriptions([]);
|
147 |
+
setText('');
|
148 |
+
}
|
149 |
+
|
150 |
return (
|
151 |
+
<div className="app">
|
152 |
+
<h2>🐠 Parakeet.js - HF Spaces Demo</h2>
|
153 |
+
<p>NVIDIA Parakeet speech recognition for the browser using WebGPU/WASM</p>
|
154 |
+
|
155 |
+
<div className="controls">
|
156 |
<p>
|
157 |
+
<strong>Model:</strong> {repoId}
|
158 |
</p>
|
159 |
+
</div>
|
160 |
+
|
161 |
+
<div className="controls">
|
162 |
+
<label>
|
163 |
+
Backend:
|
164 |
+
<select value={backend} onChange={e=>setBackend(e.target.value)}>
|
165 |
+
<option value="webgpu-hybrid">WebGPU (Hybrid)</option>
|
166 |
+
<option value="webgpu-strict">WebGPU (Strict)</option>
|
167 |
+
<option value="wasm">WASM (CPU)</option>
|
168 |
+
</select>
|
169 |
+
</label>
|
170 |
+
{' '}
|
171 |
+
<label>
|
172 |
+
Quant:
|
173 |
+
<select value={quant} onChange={e=>setQuant(e.target.value)}>
|
174 |
+
<option value="int8">int8 (faster)</option>
|
175 |
+
<option value="fp32">fp32 (higher quality)</option>
|
176 |
+
</select>
|
177 |
+
</label>
|
178 |
+
{' '}
|
179 |
+
{backend.startsWith('webgpu') && (
|
180 |
+
<label style={{ fontSize:'0.9em' }}>
|
181 |
+
<input type="checkbox" checked={decoderInt8} onChange={e=>setDecoderInt8(e.target.checked)} />
|
182 |
+
Decoder INT8 on CPU
|
183 |
+
</label>
|
184 |
+
)}
|
185 |
+
{' '}
|
186 |
+
<label>
|
187 |
+
Preprocessor:
|
188 |
+
<select value={preprocessor} onChange={e=>setPreprocessor(e.target.value)}>
|
189 |
+
<option value="nemo80">nemo80 (smaller)</option>
|
190 |
+
<option value="nemo128">nemo128 (default)</option>
|
191 |
+
</select>
|
192 |
+
</label>
|
193 |
+
{' '}
|
194 |
+
<label>
|
195 |
+
Stride:
|
196 |
+
<select value={frameStride} onChange={e=>setFrameStride(Number(e.target.value))}>
|
197 |
+
<option value={1}>1</option>
|
198 |
+
<option value={2}>2</option>
|
199 |
+
<option value={4}>4</option>
|
200 |
+
</select>
|
201 |
+
</label>
|
202 |
+
{' '}
|
203 |
+
<label>
|
204 |
+
<input type="checkbox" checked={verboseLog} onChange={e => setVerboseLog(e.target.checked)} />
|
205 |
+
Verbose Log
|
206 |
+
</label>
|
207 |
+
{' '}
|
208 |
+
<label style={{fontSize:'0.9em'}}>
|
209 |
+
<input type="checkbox" checked={dumpDetail} onChange={e=>setDumpDetail(e.target.checked)} />
|
210 |
+
Dump result to console
|
211 |
+
</label>
|
212 |
+
{(backend === 'wasm' || decoderInt8) && (
|
213 |
+
<label style={{fontSize:'0.9em'}}>
|
214 |
+
Threads:
|
215 |
+
<input type="number" min="1" max={maxCores} value={cpuThreads} onChange={e=>setCpuThreads(Number(e.target.value))} style={{width:'4rem'}} />
|
216 |
+
</label>
|
217 |
+
)}
|
218 |
+
<button
|
219 |
+
onClick={loadModel}
|
220 |
+
disabled={!status.toLowerCase().includes('fail') && status !== 'Idle'}
|
221 |
+
className="primary"
|
222 |
>
|
223 |
+
{status === 'Model ready ✔' ? 'Model Loaded' : 'Load Model'}
|
224 |
+
</button>
|
225 |
+
</div>
|
226 |
+
|
227 |
+
{typeof SharedArrayBuffer === 'undefined' && backend === 'wasm' && (
|
228 |
+
<div style={{
|
229 |
+
marginBottom: '1rem',
|
230 |
+
padding: '0.5rem',
|
231 |
+
backgroundColor: '#fff3cd',
|
232 |
+
border: '1px solid #ffeaa7',
|
233 |
+
borderRadius: '4px',
|
234 |
+
fontSize: '0.9em'
|
235 |
+
}}>
|
236 |
+
⚠️ <strong>Performance Note:</strong> SharedArrayBuffer is not available.
|
237 |
+
WASM will run single-threaded. For better performance, use WebGPU.
|
238 |
+
</div>
|
239 |
+
)}
|
240 |
+
|
241 |
+
<div className="controls">
|
242 |
+
<input
|
243 |
+
ref={fileInputRef}
|
244 |
+
type="file"
|
245 |
+
accept="audio/*"
|
246 |
+
onChange={transcribeFile}
|
247 |
+
disabled={status !== 'Model ready ✔' || isTranscribing}
|
248 |
+
/>
|
249 |
+
{transcriptions.length > 0 && (
|
250 |
+
<button
|
251 |
+
onClick={clearTranscriptions}
|
252 |
+
style={{ marginLeft: '1rem', padding: '0.25rem 0.5rem' }}
|
253 |
+
>
|
254 |
+
Clear History
|
255 |
+
</button>
|
256 |
+
)}
|
257 |
+
</div>
|
258 |
+
|
259 |
+
<p>Status: {status}</p>
|
260 |
+
{progressPct!==null && (
|
261 |
+
<div className="progress-wrapper">
|
262 |
+
<div className="progress-bar"><div style={{ width: `${progressPct}%` }} /></div>
|
263 |
+
<p className="progress-text">{progressText}</p>
|
264 |
+
</div>
|
265 |
+
)}
|
266 |
+
|
267 |
+
{/* Latest transcription */}
|
268 |
+
<div className="controls">
|
269 |
+
<h3>Latest Transcription:</h3>
|
270 |
+
<textarea
|
271 |
+
value={text}
|
272 |
+
readOnly
|
273 |
+
className="textarea"
|
274 |
+
placeholder="Transcribed text will appear here..."
|
275 |
+
/>
|
276 |
+
</div>
|
277 |
+
|
278 |
+
{/* Latest transcription performance info */}
|
279 |
+
{latestMetrics && (
|
280 |
+
<div className="performance">
|
281 |
+
<strong>RTF:</strong> {latestMetrics.rtf?.toFixed(2)}x | Total: {latestMetrics.total_ms} ms<br/>
|
282 |
+
Preprocess {latestMetrics.preprocess_ms} ms · Encode {latestMetrics.encode_ms} ms · Decode {latestMetrics.decode_ms} ms · Tokenize {latestMetrics.tokenize_ms} ms
|
283 |
+
</div>
|
284 |
+
)}
|
285 |
+
|
286 |
+
{/* Transcription history */}
|
287 |
+
{transcriptions.length > 0 && (
|
288 |
+
<div className="history">
|
289 |
+
<h3>Transcription History ({transcriptions.length} files):</h3>
|
290 |
+
<div style={{ maxHeight: '400px', overflowY: 'auto', border: '1px solid #ddd', borderRadius: '4px' }}>
|
291 |
+
{transcriptions.map((trans) => (
|
292 |
+
<div className="history-item" key={trans.id}>
|
293 |
+
<div className="history-meta"><strong>{trans.filename}</strong><span>{trans.timestamp}</span></div>
|
294 |
+
<div className="history-stats">Duration: {trans.duration.toFixed(1)}s | Words: {trans.wordCount}{trans.confidence && ` | Confidence: ${trans.confidence.toFixed(2)}`}{trans.metrics && ` | RTF: ${trans.metrics.rtf?.toFixed(2)}x`}</div>
|
295 |
+
<div className="history-text">{trans.text}</div>
|
296 |
+
</div>
|
297 |
+
))}
|
298 |
+
</div>
|
299 |
+
</div>
|
300 |
+
)}
|
301 |
+
|
302 |
+
<div style={{ marginTop: '2rem', padding: '1rem', backgroundColor: '#f8f9fa', borderRadius: '4px', fontSize: '0.9em' }}>
|
303 |
+
<h4>🔗 Links:</h4>
|
304 |
+
<p>
|
305 |
+
<a href="https://github.com/ysdede/parakeet.js" target="_blank" rel="noopener noreferrer">
|
306 |
+
GitHub Repository
|
307 |
+
</a>
|
308 |
+
{' | '}
|
309 |
+
<a href="https://www.npmjs.com/package/parakeet.js" target="_blank" rel="noopener noreferrer">
|
310 |
+
npm Package
|
311 |
+
</a>
|
312 |
+
{' | '}
|
313 |
+
<a href="https://huggingface.co/nvidia/parakeet-tdt-1.1b" target="_blank" rel="noopener noreferrer">
|
314 |
+
NVIDIA Parakeet Model
|
315 |
+
</a>
|
316 |
+
</p>
|
317 |
+
</div>
|
318 |
</div>
|
319 |
);
|
320 |
}
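The `Threads` input in the component above passes `Number(e.target.value)` straight into state, so an empty field yields 0 and non-numeric text yields NaN. A minimal sketch of a guard, assuming a hypothetical `clampThreads` helper that is not part of this commit:

```javascript
// Hypothetical helper: coerce user input to an integer in [1, maxCores].
function clampThreads(value, maxCores) {
  const n = Math.floor(Number(value));
  if (!Number.isFinite(n)) return 1; // NaN or Infinity falls back to 1
  return Math.min(Math.max(n, 1), Math.max(1, maxCores));
}

// Possible use in the input handler:
// onChange={e => setCpuThreads(clampThreads(e.target.value, maxCores))}
```

Clamping on every keystroke snaps an empty field back to 1, so applying the guard just before the value is passed to `ParakeetModel.fromUrls` may be the gentler choice.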