Unverified Commit d54d7598 authored by Nicolas Patry's avatar Nicolas Patry Committed by GitHub

Microphone live inference catching up when inference is too slow (whisper). (#21219)

* Microphone live inference catching up when inference is too slow
(whisper).

* Adding copyright.
parent 7fc1cb15
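
For context, this is roughly how the patched generator is consumed in practice: a Whisper ASR pipeline pulls chunks from ffmpeg_microphone_live, and when transcription runs slower than real time the generator now drops stale chunks instead of lagging ever further behind the microphone. The sketch below is illustrative only; the model name, chunk lengths, and printing loop are assumptions, not part of this commit.

from transformers import pipeline
from transformers.pipelines.audio_utils import ffmpeg_microphone_live

# Illustrative consumer of the generator patched in this commit; the model name
# and chunk lengths are assumptions, not part of the change itself.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")
sampling_rate = asr.feature_extractor.sampling_rate

microphone = ffmpeg_microphone_live(
    sampling_rate=sampling_rate,
    chunk_length_s=5.0,
    stream_chunk_s=1.0,
)
# If transcribing a chunk takes longer than real time for too long, the
# generator skips buffered chunks so the live transcription stays near "now".
for prediction in asr(microphone):
    print(prediction["text"])
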
# Copyright 2023 The HuggingFace Team. All rights reserved.
import datetime
import platform
import subprocess
from typing import Optional, Tuple, Union
@@ -154,6 +156,8 @@ def ffmpeg_microphone_live(
    stride_left = int(round(sampling_rate * stride_length_s[0])) * size_of_sample
    stride_right = int(round(sampling_rate * stride_length_s[1])) * size_of_sample
    audio_time = datetime.datetime.now()
    delta = datetime.timedelta(seconds=chunk_s)
    for item in chunk_bytes_iter(microphone, chunk_len, stride=(stride_left, stride_right), stream=True):
        # Put everything back in numpy scale
        item["raw"] = np.frombuffer(item["raw"], dtype=dtype)
@@ -162,6 +166,10 @@ def ffmpeg_microphone_live(
            item["stride"][1] // size_of_sample,
        )
        item["sampling_rate"] = sampling_rate
        audio_time += delta
        if datetime.datetime.now() > audio_time + 10 * delta:
            # We're late !! SKIP: more than 10 chunk-durations behind real time, drop this chunk to catch up
            continue
        yield item
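
The catch-up logic can be understood in isolation: the generator tracks how much audio it has handed out (audio_time) and compares it against wall-clock time; once the consumer has fallen more than 10 chunk-durations behind, chunks are dropped until the two clocks realign. Below is a standalone sketch of the same idea with simulated timing; the helper names (fake_backlog, drop_when_late) and the numbers are made up for illustration, not code from the library.

import datetime
import time


def fake_backlog(n):
    # Stand-in for chunks that ffmpeg has already buffered while inference was busy.
    return (f"chunk-{i}" for i in range(n))


def drop_when_late(chunks, chunk_s, max_lag_chunks=10):
    # Same pattern as the diff above: advance a virtual audio clock by one chunk
    # per item and skip items once wall-clock time is too far ahead of it.
    audio_time = datetime.datetime.now()
    delta = datetime.timedelta(seconds=chunk_s)
    for item in chunks:
        audio_time += delta
        if datetime.datetime.now() > audio_time + max_lag_chunks * delta:
            # Consumer fell more than max_lag_chunks chunks behind real time: skip.
            continue
        yield item


if __name__ == "__main__":
    for item in drop_when_late(fake_backlog(40), chunk_s=0.05, max_lag_chunks=2):
        time.sleep(0.2)  # simulate inference that is slower than real time
        print("processed", item)

Running the sketch shows some chunks being processed and others silently skipped whenever the slow consumer drags wall-clock time too far ahead of the virtual audio clock, which is exactly the behaviour this commit adds for the live microphone stream.
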