Update README.md
README.md
@@ -31,10 +31,12 @@ Experimental - research model. Some of the modules and functions in the code are
Pitch-Aware Processing: Integrates F0/pitch information throughout the processing pipeline, making the model sensitive to prosodic features of speech.

To highlight the relationship between pitch and rotary embeddings, echo implements three complementary pitch-based enhancements:

1. The first uses pitch to modify theta (the rotary frequency).
2. The second adds a direct pitch-similarity bias to attention.
3. The third replaces the unit-circle radius (1.0) normally passed to `torch.polar` with variable radii. The frequencies (f0) are time-aligned with tokens, creating acoustically weighted positional encodings in which the "loudness" of each position in the embedding space reflects its acoustic prominence in the original speech.
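The second enhancement, a pitch-similarity bias on attention, can be illustrated as an additive term on the attention logits. This is a minimal sketch, not echo's implementation: the helper names `pitch_similarity_bias` and `attention_with_pitch_bias`, the `scale` parameter, and the choice of negative absolute log-pitch distance as the similarity measure are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def pitch_similarity_bias(f0: torch.Tensor, scale: float = 1.0, eps: float = 1e-5) -> torch.Tensor:
    # Hypothetical helper. f0: (batch, seq) per-token pitch in Hz, assumed time-aligned with tokens.
    # Unvoiced frames (f0 == 0) are only clamped to eps here; a real model would likely mask them.
    log_f0 = torch.log(f0.clamp(min=eps))
    # Pairwise log-pitch distance depends on the pitch ratio, so an octave is the same distance anywhere.
    diff = log_f0.unsqueeze(-1) - log_f0.unsqueeze(-2)  # (batch, seq, seq)
    # Similar pitch -> bias near 0; dissimilar pitch -> more negative bias.
    return -scale * diff.abs()


def attention_with_pitch_bias(q, k, v, f0, scale=1.0):
    # Hypothetical helper. q, k, v: (batch, seq, dim); standard scaled dot-product logits.
    logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # The pitch bias is added before the softmax, like any additive attention bias.
    logits = logits + pitch_similarity_bias(f0, scale)
    weights = F.softmax(logits, dim=-1)
    return weights @ v, weights


# Example usage with random features and a made-up pitch contour.
torch.manual_seed(0)
q = k = v = torch.randn(1, 4, 8)
f0 = torch.tensor([[110.0, 220.0, 110.0, 440.0]])
out, weights = attention_with_pitch_bias(q, k, v, f0)
```

Working in log space means a bias of `-scale * log(2)` for any octave jump, regardless of the speaker's register.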

By modulating the RoPE frequencies with pitch (F0), we are essentially telling the model to attend to how acoustic features relate to sequence position, in proportion to the voice characteristics. This creates a more speech-aware positional representation that helps the model better understand the relationship between acoustic features and text.
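The rotary side of this (enhancements 1 and 3) can be sketched in PyTorch. This is a hedged sketch, not echo's code: the function names `pitch_rope_freqs` and `apply_rope`, the parameters `base_theta` and `ref_f0`, the rule that scales theta by mean pitch over a reference pitch, and the rule that sets each position's radius to its pitch relative to the utterance mean are all assumptions chosen for illustration. Only `torch.polar`, `torch.view_as_complex`, and `torch.view_as_real` are real PyTorch APIs.

```python
import torch


def pitch_rope_freqs(f0: torch.Tensor, dim: int, base_theta: float = 10000.0,
                     ref_f0: float = 220.0, eps: float = 1e-5) -> torch.Tensor:
    # Hypothetical helper. f0: (seq,) per-token pitch in Hz for one utterance.
    f0 = f0.float()
    seq = f0.shape[0]
    mean_f0 = f0.mean().clamp(min=eps)
    # (1) Pitch-modulated theta (assumed rule): scale the base by mean pitch / reference pitch.
    theta = base_theta * (mean_f0 / ref_f0)
    inv_freq = 1.0 / theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim // 2,)
    angles = torch.outer(torch.arange(seq, dtype=torch.float32), inv_freq)         # (seq, dim // 2)
    # (3) Variable radii (assumed rule): standard RoPE passes torch.ones_like(angles) here;
    # instead, each position's radius is its pitch relative to the utterance mean.
    radii = torch.ones_like(angles) * (f0 / mean_f0).unsqueeze(-1)
    return torch.polar(radii, angles)  # complex64, (seq, dim // 2)


def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (seq, dim) with even dim; treat consecutive feature pairs as complex numbers.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    # Multiplying by r * exp(i * angle) rotates each pair and scales it by r.
    return torch.view_as_real(x_complex * freqs).flatten(-2)
```

With a constant pitch equal to `ref_f0`, this reduces to standard RoPE (unit radii, base 10000). When the same factors are applied to both queries and keys, the logit between positions i and j is scaled by r_i · r_j, which is one way acoustically prominent positions could come to dominate attention.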