Update README.md
README.md
CHANGED
@@ -22,22 +22,9 @@ tags:
 ---
 
 ASR model + pitch aware relative positional embeddings.
+Nothing in these repositories is intended for production.
 
-
-
-def _compute_freqs_base(self):
-    mel_scale = torch.pow(10, torch.linspace(0, 2595 * torch.log10(torch.tensor(1 + 4000/200)), self.head_dim // 2, device=device, dtype=dtype) / 2595) - 1
-    return 200 * mel_scale / 1000
-
-### Standard inv freqs: 'eval_wer': 61.6
-freqs = 1.0 / (self.theta ** (torch.arange(0, self.head_dim, 2, device=device, dtype=dtype) / (self.head_dim // 2)))
-
-
-<img width="1363" height="732" alt="pitch_spectrogram" src="https://github.com/user-attachments/assets/ceb65e94-7df4-41b7-aa3d-c4aa4c6c0717" />
-
-<img width="233" height="77" alt="legend" src="https://github.com/user-attachments/assets/fad84550-a199-43b3-8471-d011a9fd6f94" />
-
-https://huggingface.co/Sin2pi/asr-model/tensorboard
+This particular model uses internal dynamic local attention windowing for variable-length sequences in the cross-attention, cross-modal, and cross-talking steps, all of which are decoder-causal. However, the model does away with the decoder/encoder distinction in favor of a more unified, less transformer-like design.
 
 Questions:
 
@@ -80,7 +67,7 @@ Reference: [PyTorch Documentation - torch.polar]https:pytorch.orgdocsstablegener
 
 
 
-
+https://huggingface.co/Sin2pi/asr-model/tensorboard
 ```python
 
 
@@ -260,3 +247,6 @@ The Complex Frequency Result:
 
 
 
+
+
+
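The two frequency bases that this commit removes from the README (the mel-spaced `_compute_freqs_base` and the "standard inv freqs" line) can be compared numerically. The following is a minimal sketch, not the repository's implementation: it uses plain Python instead of `torch` so that it runs without dependencies. The names `linspace`, `mel_freqs_base`, `inv_freqs`, `f_ref`, and `f_max`, and the default `theta=10000.0`, are assumptions made for illustration, since the diff does not show the value of `self.theta`. The constants 200, 4000, 2595, and 1000 are taken from the removed lines.

```python
import math


def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (like torch.linspace)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def mel_freqs_base(head_dim, f_ref=200.0, f_max=4000.0):
    """Mel-spaced base frequencies, mirroring the removed _compute_freqs_base.

    f_ref and f_max are hypothetical parameter names for the constants
    200 and 4000 that appear in the removed line.
    """
    top = 2595.0 * math.log10(1.0 + f_max / f_ref)
    mel_scale = [10.0 ** (m / 2595.0) - 1.0
                 for m in linspace(0.0, top, head_dim // 2)]
    return [f_ref * s / 1000.0 for s in mel_scale]


def inv_freqs(head_dim, theta=10000.0):
    """Inverse frequencies as written in the removed line.

    The exponent is divided by head_dim // 2, exactly as in the diff.
    theta=10000.0 is an assumed default; the diff only shows self.theta.
    """
    half = head_dim // 2
    return [1.0 / (theta ** (i / half)) for i in range(0, head_dim, 2)]
```

With `head_dim = 64`, both functions return 32 values. The mel-spaced base rises from 0 to 4.0, because 200 * ((1 + 4000/200) - 1) / 1000 = 4, while the inverse-frequency base starts at 1.0 and decays. The removed line divides the exponent by `head_dim // 2`, whereas the common rotary-embedding formulation divides by `head_dim`; the sketch follows the removed line as written.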