Sin2pi committed · Commit 891e9f3 · verified · 1 Parent(s): 4c53133

Update README.md

  1. README.md +6 -17
README.md CHANGED
@@ -21,10 +21,6 @@ tags:
21
  ---
22
 
23
 
24
-
25
- ![sp](https://github.com/user-attachments/assets/a29f8c97-71c7-4bfc-9c11-76005614822c)
26
-
27
-
28
  ## Echo - NLP/ASR model with acoustic variable radii relative position embedding (vRoPE) that maps pitch to token. And some other stuff...
29
 
30
  https://github.com/sine2pi/asr_model_echo
@@ -40,20 +36,19 @@ To highlight the relationship between pitch and rotary embeddings echo implement
40
  2. The second adds direct similarity bias to attention
41
  3. Variable radii added in place of unit circle radius(1.0) associated with torch.polar. The frequencies (f0) are time aligned with tokens creating acoustically-weighted positional encodings where the "loudness" of each position in the embedding space reflects the acoustic prominence in the original speech.
42
 
43
- * -- Initial tests indicate that direct use of f0 without mapping results in better WER.. 10k arbitrary? no one-size-fits-all.
44
 
 
45
 
 
46
 
47
- By modulating the RoPE frequencies based on pitch (F0), we are essentially telling the model to pay attention to the acoustic features relate to sequence position in a way that's proportional to the voice characteristics. This approach creates a more speech-aware positional representation that helps the model better understand the relationship between acoustic features and text.
48
 
49
 
50
- 1000 steps no f0
51
 
52
- <img width="470" alt="123" src="https://github.com/user-attachments/assets/1b3ca1e8-0b7d-47dd-802b-5eda9537ae13" />
53
 
54
- 1000 steps with f0
55
 
56
- <img width="470" alt="321" src="https://github.com/user-attachments/assets/24a68910-b316-4cfc-8927-5c6fd846b919" />
57
 
58
  The patterns below show how positions "see" each other in relation to theta and f0.
59
 
@@ -83,13 +78,7 @@ The theoretical foundation:
83
  - Varying the rotation frequency based on pitch creates a more speech-aware positional encoding
84
 
85
  ---
86
-
87
- ### Diagnostic test run with google/fleurs - Spectrogram + f0_rotary:
88
-
89
- <img width="689" alt="graph" src="https://github.com/user-attachments/assets/c161a89d-539c-4983-8d24-12ec41ebc859" />
90
-
91
- <img width="277" alt="321" src="https://github.com/user-attachments/assets/4cc71b43-3e48-4241-b381-5bda17ed9d0d" />
92
-
93
 
94
  ## The F0-Conditioned Rotation Mechanism
95
 
 
21
  ---
22
 
23
 
 
 
 
 
24
  ## Echo - NLP/ASR model with acoustic variable radii relative position embedding (vRoPE) that maps pitch to token. And some other stuff...
25
 
26
  https://github.com/sine2pi/asr_model_echo
 
36
  2. The second adds direct similarity bias to attention
37
  3. Variable radii added in place of unit circle radius(1.0) associated with torch.polar. The frequencies (f0) are time aligned with tokens creating acoustically-weighted positional encodings where the "loudness" of each position in the embedding space reflects the acoustic prominence in the original speech.
38
 
39
+ 1000 steps no f0:
40
 
41
+ <img width="470" alt="123" src="https://github.com/user-attachments/assets/1b3ca1e8-0b7d-47dd-802b-5eda9537ae13" />
42
 
43
+ 1000 steps with f0 / theta substitutions:
44
 
45
+ <img width="470" alt="321" src="https://github.com/user-attachments/assets/24a68910-b316-4cfc-8927-5c6fd846b919" />
46
 
47
 
 
48
 
49
+ By modulating the RoPE frequencies based on pitch (F0), we are essentially telling the model to pay attention to the acoustic features relate to sequence position in a way that's proportional to the voice characteristics. This approach creates a more speech-aware positional representation that helps the model better understand the relationship between acoustic features and text.
50
 
 
51
 
 
52
 
53
  The patterns below show how positions "see" each other in relation to theta and f0.
54
 
 
78
  - Varying the rotation frequency based on pitch creates a more speech-aware positional encoding
79
 
80
  ---
81
+ ![sp](https://github.com/user-attachments/assets/a29f8c97-71c7-4bfc-9c11-76005614822c)
 
 
 
 
 
 
82
 
83
  ## The F0-Conditioned Rotation Mechanism
84
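
As an illustration of item 3 in the README text above (variable radii in place of the unit radius that `torch.polar` would otherwise receive), here is a minimal pure-Python sketch. It is not taken from the Echo repository. The function names `f0_to_radii`, `rope_factors`, and `apply_rotation`, the peak normalization of f0, and the choice to give unvoiced frames a radius of 1.0 are all assumptions made for the example. `cmath.rect(r, phi)` stands in for `torch.polar(abs, angle)`; both build `r * (cos(phi) + i*sin(phi))`.

```python
import cmath
import math


def f0_to_radii(f0):
    """Map per-token f0 values to radii (hypothetical scheme).

    Assumption: voiced frames are scaled by the peak f0 of the utterance,
    and unvoiced frames (f0 == 0) fall back to the unit radius.
    """
    peak = max(f0, default=0.0)
    if peak <= 0.0:
        return [1.0] * len(f0)
    return [x / peak if x > 0.0 else 1.0 for x in f0]


def rope_factors(positions, dim, theta=10000.0, radii=None):
    """Complex rotation factors r * exp(i * pos * inv_freq) per position.

    With radii=None every factor lies on the unit circle (standard RoPE).
    Passing radii scales each position's factors, which is the
    "variable radii" idea sketched here.
    """
    half = dim // 2
    inv_freq = [theta ** (-2.0 * k / dim) for k in range(half)]
    if radii is None:
        radii = [1.0] * len(positions)
    return [
        [cmath.rect(r, p * w) for w in inv_freq]
        for p, r in zip(positions, radii)
    ]


def apply_rotation(vec, factors):
    """Rotate a real vector, treating (vec[2k], vec[2k+1]) as one complex number."""
    out = []
    for k, f in enumerate(factors):
        z = complex(vec[2 * k], vec[2 * k + 1]) * f
        out.extend([z.real, z.imag])
    return out


if __name__ == "__main__":
    f0 = [0.0, 110.0, 220.0]            # one f0 value per token, in Hz
    radii = f0_to_radii(f0)
    factors = rope_factors([0, 1, 2], dim=4, radii=radii)
    rotated = apply_rotation([3.0, 4.0, 0.0, 0.0], factors[1])
    # The first pair's length is scaled by the radius of position 1.
    print(radii, math.hypot(rotated[0], rotated[1]))
```

With unit radii the rotation preserves vector length, so position affects only phase. With f0-derived radii the magnitude of each rotated pair also varies by position, which is one way to read the "loudness" wording in item 3. Whether to normalize f0 at all is left open here.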