Sin2pi committed · Commit 1813939 · verified · 1 Parent(s): 7f3d41a

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -31,10 +31,12 @@ Experimental - research model. Some of the modules and functions in the code are
 
  Pitch-Aware Processing: Integrates F0/pitch information throughout the processing pipeline, making the model sensitive to prosodic features of speech.
 
+
  To highlight the relationship between pitch and rotary embeddings, echo implements three complementary pitch-based enhancements:
 
  1. The first uses pitch to modify theta (rotary frequency).
  2. The second adds a direct similarity bias to attention.
+ 3. The third replaces the unit-circle radius (1.0) passed to torch.polar with variable radii. The frequencies (F0) are time-aligned with tokens, creating acoustically weighted positional encodings in which the "loudness" of each position in the embedding space reflects its acoustic prominence in the original speech.
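The variable-radius idea in item 3 can be sketched as follows. This is a minimal, dependency-free illustration and not echo's actual code: `cmath.rect(r, phi)` stands in for `torch.polar(abs, angle)`, and the helper names (`rotary_phasors`, `normalized_radii`), the peak-normalization rule, and the `floor` parameter are all assumptions made for the example.

```python
import cmath
import math


def rotary_phasors(positions, inv_freqs, radii=None):
    """Return one complex rotary factor per (position, frequency) pair.

    With radii=None every factor lies on the unit circle (standard RoPE).
    Otherwise radii[t] scales the magnitude of every factor at position t,
    while the rotation angle is left unchanged.
    """
    if radii is None:
        radii = [1.0] * len(positions)
    return [
        [cmath.rect(r, pos * w) for w in inv_freqs]
        for pos, r in zip(positions, radii)
    ]


def normalized_radii(f0, floor=0.1):
    """Hypothetical mapping from per-token F0 (Hz) to radii in [floor, 1]."""
    peak = max(f0)
    if peak <= 0:
        # No voiced frames: fall back to the unit circle.
        return [1.0] * len(f0)
    return [max(floor, f / peak) for f in f0]


# Toy inputs: 4 token positions, head dimension 4 (2 rotary frequencies).
positions = [0, 1, 2, 3]
dim, theta = 4, 10000.0
inv_freqs = [theta ** (-2 * i / dim) for i in range(dim // 2)]
f0 = [0.0, 110.0, 220.0, 165.0]  # made-up values, assumed time-aligned to tokens

unit = rotary_phasors(positions, inv_freqs)
weighted = rotary_phasors(positions, inv_freqs, normalized_radii(f0))
```

In this sketch the angle of each factor is the same in `unit` and `weighted`; only the magnitude differs, so positions with higher F0 carry larger rotary factors. The `floor` keeps unvoiced positions from collapsing to zero, which is a design choice of the example rather than something the README states.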
 
  By modulating the RoPE frequencies based on pitch (F0), we are essentially telling the model to attend to how acoustic features relate to sequence position, in a way that is proportional to the voice characteristics. This creates a more speech-aware positional representation that helps the model better understand the relationship between acoustic features and text.
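As a rough illustration of pitch-modulated RoPE frequencies, the sketch below scales the rotary base theta per token by that token's F0 before computing rotation angles. The scaling rule `base_theta * (1 + f0 / f0_ref)`, the reference pitch `f0_ref`, and all function names are hypothetical choices for this example; the README does not specify how echo maps pitch to theta.

```python
import math


def inverse_frequencies(theta, dim):
    """Standard RoPE inverse frequencies: theta ** (-2i / dim)."""
    return [theta ** (-2 * i / dim) for i in range(dim // 2)]


def pitch_modulated_angles(f0, dim, base_theta=10000.0, f0_ref=200.0):
    """Return angles[t][i] for per-token F0 values given in Hz.

    Assumed rule: theta_t = base_theta * (1 + f0_t / f0_ref), so an
    unvoiced token (f0 == 0) falls back to the standard RoPE base.
    """
    angles = []
    for pos, f in enumerate(f0):
        theta_t = base_theta * (1.0 + f / f0_ref)
        angles.append([pos * w for w in inverse_frequencies(theta_t, dim)])
    return angles


def rotate_pair(x0, x1, angle):
    """Apply a 2-D rotation to one (even, odd) feature pair."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


# Made-up per-token F0 values, assumed already aligned to token positions.
f0 = [0.0, 200.0, 100.0]
angles = pitch_modulated_angles(f0, dim=4)
rotated = rotate_pair(1.0, 0.0, angles[1][1])
```

Under this particular rule the first frequency (i = 0) is always 1 regardless of theta, so pitch only changes the rotation of the higher-index feature pairs. A different mapping from F0 to theta would behave differently.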