Removed duplicate variable self.last_chunked_at
I tried to find the difference between self.last_chunked_at and self.buffer_time_offset, and it took me a while to understand that they are exactly the same. It is better to remove one of the duplicates to make the code more readable.
- whisper_online.py +1 -3
whisper_online.py
CHANGED
@@ -328,7 +328,6 @@ class OnlineASRProcessor:
|
|
328 |
|
329 |
self.transcript_buffer = HypothesisBuffer(logfile=self.logfile)
|
330 |
self.commited = []
|
331 |
-
self.last_chunked_at = 0
|
332 |
|
333 |
self.silence_iters = 0
|
334 |
|
@@ -340,7 +339,7 @@ class OnlineASRProcessor:
|
|
340 |
"context" is the commited text that is inside the audio buffer. It is transcribed again and skipped. It is returned only for debugging and logging reasons.
|
341 |
"""
|
342 |
k = max(0,len(self.commited)-1)
|
343 |
-
while k > 0 and self.commited[k-1][1] > self.last_chunked_at:
|
344 |
k -= 1
|
345 |
|
346 |
p = self.commited[:k]
|
@@ -451,7 +450,6 @@ class OnlineASRProcessor:
|
|
451 |
cut_seconds = time - self.buffer_time_offset
|
452 |
self.audio_buffer = self.audio_buffer[int(cut_seconds*self.SAMPLING_RATE):]
|
453 |
self.buffer_time_offset = time
|
454 |
-
self.last_chunked_at = time
|
455 |
|
456 |
def words_to_sentences(self, words):
|
457 |
"""Uses self.tokenizer for sentence segmentation of words.
|
|
|
328 |
|
329 |
self.transcript_buffer = HypothesisBuffer(logfile=self.logfile)
|
330 |
self.commited = []
|
|
|
331 |
|
332 |
self.silence_iters = 0
|
333 |
|
|
|
339 |
"context" is the commited text that is inside the audio buffer. It is transcribed again and skipped. It is returned only for debugging and logging reasons.
|
340 |
"""
|
341 |
k = max(0,len(self.commited)-1)
|
342 |
+
while k > 0 and self.commited[k-1][1] > self.buffer_time_offset:
|
343 |
k -= 1
|
344 |
|
345 |
p = self.commited[:k]
|
|
|
450 |
cut_seconds = time - self.buffer_time_offset
|
451 |
self.audio_buffer = self.audio_buffer[int(cut_seconds*self.SAMPLING_RATE):]
|
452 |
self.buffer_time_offset = time
|
|
|
453 |
|
454 |
def words_to_sentences(self, words):
|
455 |
"""Uses self.tokenizer for sentence segmentation of words.
|