    Adding `AutomaticSpeechRecognitionPipeline`. (#11337) · db9dd09c
    Nicolas Patry authored
    
    
    * Adding `AutomaticSpeechRecognitionPipeline`.
    
    - Since everything needed to enable this pipeline has already been
    added, the pipeline itself should probably live in `transformers` too.
    - This PR tries to limit the scope and focuses only on the pipeline part
    (what should go in, and out).
    - The tests are very specific to S2T and Wav2Vec2, to make sure both
    architectures are supported by the pipeline. We don't use the test mixin
    for now, because that requires more work in the `pipeline` function
    (to be done in a follow-up PR).
    - Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
    sense from a user perspective, and it does not add any hard dependency
    (users can always use their own loading mechanism). On the other hand,
    it feels slightly clunky to have so much optional preprocessing.
    - The pipeline is not designed to support streaming audio yet.
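    For illustration, an `ffmpeg_read`-style helper can be sketched as
    follows. This is a minimal sketch, not the code from this PR: it assumes
    an `ffmpeg` binary on `PATH`, pipes the encoded file through ffmpeg to
    obtain mono float32 little-endian samples at the requested sampling rate,
    and returns a plain Python list (rather than an `np.ndarray`) to stay
    stdlib-only. The names `build_ffmpeg_command`, `decode_f32le` and
    `ffmpeg_read_sketch` are hypothetical.

```python
import shutil
import struct
import subprocess
from typing import List


def build_ffmpeg_command(sampling_rate: int) -> List[str]:
    """Hypothetical helper: ffmpeg args that decode stdin to mono f32le on stdout."""
    return [
        "ffmpeg",
        "-i", "pipe:0",              # read the encoded audio file from stdin
        "-ac", "1",                  # downmix to a single channel
        "-ar", str(sampling_rate),   # resample to the model's sampling rate
        "-f", "f32le",               # raw 32-bit float, little-endian samples
        "-hide_banner",
        "-loglevel", "quiet",
        "pipe:1",                    # write the raw samples to stdout
    ]


def decode_f32le(raw: bytes) -> List[float]:
    """Hypothetical helper: turn raw f32le bytes into a list of float samples."""
    if len(raw) % 4 != 0:
        raise ValueError("Payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def ffmpeg_read_sketch(payload: bytes, sampling_rate: int) -> List[float]:
    """Hypothetical end-to-end decode; requires ffmpeg on PATH."""
    if shutil.which("ffmpeg") is None:
        raise ValueError("ffmpeg is required to decode audio files")
    proc = subprocess.run(
        build_ffmpeg_command(sampling_rate),
        input=payload,
        stdout=subprocess.PIPE,
        check=True,  # raise if ffmpeg exits with an error
    )
    samples = decode_f32le(proc.stdout)
    if not samples:
        raise ValueError("ffmpeg produced no audio; the file may be malformed")
    return samples
```

    Keeping the byte decoding separate from the subprocess call makes it
    testable without ffmpeg installed, and users who prefer their own loader
    can skip the helper entirely.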
    
    Future work:
    
    - Add `automatic-speech-recognition` as a `task`, and call
    `FeatureExtractor.from_pretrained` within the `pipeline` function.
    - Add small models to the tests.
    - Add the mixin to the tests.
    - Improve the logic that distinguishes `ForCTC` from
    `ForConditionalGeneration` models.
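    To make the `ForCTC` vs `ForConditionalGeneration` split concrete: a CTC
    head (e.g. Wav2Vec2) emits per-frame logits that are commonly decoded
    greedily by taking the argmax per frame, merging repeats and dropping the
    blank token, whereas a sequence-to-sequence model (e.g. S2T) produces
    token ids through autoregressive generation. The sketch below is
    hypothetical: `predict_token_ids`, the `"ctc"`/`"seq2seq"` labels and the
    default `blank_id=0` are assumptions, not the pipeline's API.

```python
from typing import Callable, List, Optional, Sequence


def ctc_greedy_decode(frame_logits: Sequence[Sequence[float]], blank_id: int = 0) -> List[int]:
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=lambda i: frame[i]) for frame in frame_logits]
    tokens: List[int] = []
    previous: Optional[int] = None
    for token in best:
        # A repeated token is only emitted again if a blank (or another token) separates it.
        if token != previous and token != blank_id:
            tokens.append(token)
        previous = token
    return tokens


def predict_token_ids(
    architecture: str,
    frame_logits: Optional[Sequence[Sequence[float]]] = None,
    generate: Optional[Callable[[], List[int]]] = None,
) -> List[int]:
    """Hypothetical dispatch between the two model families."""
    if architecture == "ctc":  # e.g. a *ForCTC model such as Wav2Vec2
        if frame_logits is None:
            raise ValueError("CTC decoding needs per-frame logits")
        return ctc_greedy_decode(frame_logits)
    if architecture == "seq2seq":  # e.g. a *ForConditionalGeneration model such as S2T
        if generate is None:
            raise ValueError("seq2seq decoding needs a generate callable")
        return generate()
    raise ValueError(f"Unknown architecture: {architecture!r}")
```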
    
    * Update tests/test_pipelines_automatic_speech_recognition.py
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    
    * Adding docs + main import + type checking + LICENSE.
    
    * Doc style.
    
    * Fixing TYPE_HINT.
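    As a hedged illustration of the type-checking items above, one common
    pattern is to import a class only under `typing.TYPE_CHECKING` and refer
    to it through a string annotation, so static type checkers see the real
    type while the import never runs at runtime. `SpeechPipelineSketch` is a
    hypothetical class; only the guarded `SequenceFeatureExtractor` name comes
    from `transformers`, and it is never imported when the module executes.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated only by static type checkers (TYPE_CHECKING is False at runtime),
    # so this import never executes when the module is loaded.
    from transformers import SequenceFeatureExtractor


class SpeechPipelineSketch:
    """Hypothetical class showing a forward-referenced type hint."""

    def __init__(
        self,
        feature_extractor: "SequenceFeatureExtractor",
        sampling_rate: Optional[int] = None,
    ) -> None:
        # The string annotation is not resolved at runtime, so any object is accepted here.
        self.feature_extractor = feature_extractor
        self.sampling_rate = sampling_rate
```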
    
    * Specifying waveform shape in the docs.
    
    * Adding asserts + specifying the shape of the input `np.ndarray` in the
    documentation.
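    A minimal sketch of the kind of input check described above, assuming
    the pipeline expects a single-channel waveform as a 1-D float array of
    shape `(n_samples,)`. `check_waveform` is a hypothetical name, and unlike
    the asserts mentioned in this commit it raises `ValueError`, which is not
    stripped when Python runs with `-O`.

```python
import numpy as np


def check_waveform(inputs: np.ndarray) -> np.ndarray:
    """Hypothetical validation: accept only a mono waveform of shape (n_samples,)."""
    if not isinstance(inputs, np.ndarray):
        raise ValueError(f"Expected a numpy.ndarray, got {type(inputs).__name__}")
    if inputs.ndim != 1:
        raise ValueError(f"Expected a 1-D (mono) waveform, got shape {inputs.shape}")
    # Cast to float32 (an assumption about the expected dtype); no copy if it already is.
    return inputs.astype(np.float32, copy=False)
```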
    
    * Update src/transformers/pipelines/automatic_speech_recognition.py
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * Adding `require` decorators to tests + moving the `feature_extractor` doc.
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
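    The `transformers` test utilities provide skip decorators such as
    `require_torch`. The sketch below shows the general idea with a
    hypothetical `require_module` decorator built on
    `importlib.util.find_spec` and `unittest.skipUnless`; it is not the
    library's implementation.

```python
import importlib.util
import unittest


def require_module(name: str):
    """Hypothetical decorator: skip the test unless `name` can be imported."""
    available = importlib.util.find_spec(name) is not None
    return unittest.skipUnless(available, f"test requires {name}")


class RequireModuleExample(unittest.TestCase):
    @require_module("json")  # stdlib module, always importable
    def test_runs_when_available(self):
        self.assertEqual(len("wav"), 3)

    @require_module("surely_not_an_installed_package")  # hypothetical missing dependency
    def test_skipped_when_missing(self):
        self.fail("this test should have been skipped")
```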