Sin2pi committed
Commit b4f3b2e · verified · 1 Parent(s): 05c7231

Update README.md

Files changed (1)
  1. README.md +4 -18
README.md CHANGED
@@ -46,7 +46,7 @@ To highlight the relationship between pitch and rotary embeddings echo implement
 
 
 
- Whisper: STEP 1000 • WER:91.89% • Loss:7.8292 • LR:0.00098035
+
 
  By modulating the RoPE frequencies based on pitch (F0), we are essentially telling the model to pay attention to the acoustic features relate to sequence position in a way that's proportional to the voice characteristics. This approach creates a more speech-aware positional representation that helps the model better understand the relationship between acoustic features and text.
 
@@ -61,14 +61,10 @@ Narrow bands: More focus on nearby positions (good for local patterns)
  <img width="470" alt="cc" src="https://github.com/user-attachments/assets/28d00fc5-2676-41ed-a971-e4d857af43f8" />
  <img width="470" alt="cc2" src="https://github.com/user-attachments/assets/9089e806-966b-41aa-8793-bee03a6e6be1" />
 
- Static 10k theta is perfectly fine for a text model but probably not for a NLP ai.
-
- Echos rotary implementation maps the perceptual properties of audio to the mathematical properties of the rotary embeddings, creating a more adaptive and context-aware representation system. Pitch is optionally extracted from audio in the data processing pipeline and can be used for an additional feature along with spectrograms and or used to inform the rotary and or pitch bias.
-
  Pitch bias
 
- The pitch bias implementation creates an attention bias matrix:
- This makes tokens with similar pitch attend to each other more, which helps:
+ The pitch bias implementation creates an attention bias matrix.
+ This makes tokens with similar pitch attend to each other more.
 
  - Track speaker consistency
  - Maintain coherent pitch patterns
@@ -79,14 +75,4 @@ The theoretical foundation:
  - Speech has inherent rhythmic and tonal patterns that correlate with semantic content
  - Varying the rotation frequency based on pitch creates a more speech-aware positional encoding
 
- ---
-
- <img width="470" alt="cc2" src="https://github.com/user-attachments/assets/d52a48b1-8717-4d29-9452-cfdf43c92fe8" />
-
- ## The F0-Conditioned Rotation Mechanism
-
- The high gate usage validates the fundamental frequency conditioning approach:
-
- - Pitch-adaptive rotary embeddings are providing meaningful signal that the gates are actively utilizing
- - The decoder is learning to selectively attend to pitch-relevant patterns
- - The gates are functioning as a kind of "pitch-aware filter" that determines which information should flow through the network
+ ---
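
The README text touched by this commit describes the pitch bias and the F0-modulated RoPE frequencies only in prose. The sketch below is one possible reading of those two ideas, not the repository's actual implementation. The function names (`pitch_bias`, `pitch_scaled_inv_freq`), the parameters `scale` and `f0_ref`, and both formulas (a negative scaled pitch distance for the bias, a base theta rescaled by F0 for the rotary frequencies) are assumptions made for illustration.

```python
import math


def pitch_bias(f0, scale=50.0):
    """Hypothetical additive attention bias built from per-frame pitch (Hz).

    The entry for frames (i, j) is 0 when their pitch matches and grows more
    negative as the pitch difference increases, so adding it to the attention
    scores favors frames with similar pitch.
    """
    return [[-abs(a - b) / scale for b in f0] for a in f0]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pitch_scaled_inv_freq(f0_hz, dim, base_theta=10000.0, f0_ref=200.0):
    """Hypothetical pitch-conditioned RoPE inverse frequencies for one frame.

    Instead of a static theta of 10k, theta is rescaled by the frame's F0
    relative to a reference pitch. Assumes a voiced frame (f0_hz > 0).
    """
    theta = base_theta * (f0_hz / f0_ref)
    return [theta ** (-2.0 * k / dim) for k in range(dim // 2)]


# Four frames: two near 110 Hz and two near 220 Hz (made-up values).
f0 = [110.0, 112.0, 220.0, 225.0]
bias = pitch_bias(f0)

# Uniform content scores, so the attention weights reflect only the bias.
scores = [[0.0] * len(f0) for _ in f0]
weights = [
    softmax([s + b for s, b in zip(score_row, bias_row)])
    for score_row, bias_row in zip(scores, bias)
]

# Rotary frequencies for a low-pitched and a high-pitched frame.
low = pitch_scaled_inv_freq(110.0, dim=8)
high = pitch_scaled_inv_freq(220.0, dim=8)
```

With uniform content scores, each frame puts most of its attention weight on itself and on the frame closest in pitch, which is the behavior the "Pitch bias" text describes. In this particular parameterization a higher F0 raises theta and therefore lowers the rotation frequencies. The opposite direction would be an equally valid choice, and the README does not say which one is used.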