Sofia Casadei committed on
Commit
9784bd2
·
1 Parent(s): 950ef75
Files changed (2) hide show
  1. main.py +3 -3
  2. static/index-screen.html +1 -1
main.py CHANGED
@@ -150,17 +150,17 @@ stream = Stream(
150
  # If, after the user started speaking, there is a chunk with less than speech_threshold seconds of speech, the user stopped speaking. (default 0.1)
151
  speech_threshold=0.1,
152
  # Max duration of speech chunks before the handler is triggered, even if a pause is not detected by the VAD model. (default -inf)
153
- max_continuous_speech_s=5
154
  ),
155
  model_options=SileroVadOptions(
156
  # Threshold for what is considered speech (default 0.5)
157
  threshold=0.5,
158
  # Final speech chunks shorter min_speech_duration_ms are thrown out (default 250)
159
- min_speech_duration_ms=200,
160
  # Max duration of speech chunks, longer will be split at the timestamp of the last silence that lasts more than 100ms (if any) or just before max_speech_duration_s (default float('inf')) (used internally in the VAD algorithm to split the audio that's passed to the algorithm)
161
  max_speech_duration_s=5,
162
  # Wait for ms at the end of each speech chunk before separating it (default 2000)
163
- min_silence_duration_ms=100,
164
  # Chunk size for VAD model. Can be 512, 1024, 1536 for 16k s.r. (default 1024)
165
  window_size_samples=1024,
166
  # Final speech chunks are padded by speech_pad_ms each side (default 400)
 
150
  # If, after the user started speaking, there is a chunk with less than speech_threshold seconds of speech, the user stopped speaking. (default 0.1)
151
  speech_threshold=0.1,
152
  # Max duration of speech chunks before the handler is triggered, even if a pause is not detected by the VAD model. (default -inf)
153
+ max_continuous_speech_s=15
154
  ),
155
  model_options=SileroVadOptions(
156
  # Threshold for what is considered speech (default 0.5)
157
  threshold=0.5,
158
  # Final speech chunks shorter min_speech_duration_ms are thrown out (default 250)
159
+ min_speech_duration_ms=250,
160
  # Max duration of speech chunks, longer will be split at the timestamp of the last silence that lasts more than 100ms (if any) or just before max_speech_duration_s (default float('inf')) (used internally in the VAD algorithm to split the audio that's passed to the algorithm)
161
  max_speech_duration_s=5,
162
  # Wait for ms at the end of each speech chunk before separating it (default 2000)
163
+ min_silence_duration_ms=200,
164
  # Chunk size for VAD model. Can be 512, 1024, 1536 for 16k s.r. (default 1024)
165
  window_size_samples=1024,
166
  # Final speech chunks are padded by speech_pad_ms each side (default 400)
static/index-screen.html CHANGED
@@ -54,7 +54,7 @@
54
  background: transparent; /* Transparent background (no highlighting) */
55
  border-radius: 0; /* No rounded corners */
56
  line-height: 1.6; /* Increases line spacing for readability */
57
- font-size: 3.5rem; /* rem means relative to the root font size */
58
  font-weight: 500; /* 500 = medium weight, 700 = bold */
59
  max-width: 98%; /* Full width within container */
60
  white-space: normal; /* Allows text to wrap normally */
 
54
  background: transparent; /* Transparent background (no highlighting) */
55
  border-radius: 0; /* No rounded corners */
56
  line-height: 1.6; /* Increases line spacing for readability */
57
+ font-size: 6rem; /* rem means relative to the root font size */
58
  font-weight: 500; /* 500 = medium weight, 700 = bold */
59
  max-width: 98%; /* Full width within container */
60
  white-space: normal; /* Allows text to wrap normally */