1. 19 Jan, 2024 1 commit
  2. 22 Dec, 2023 1 commit
  3. 04 Dec, 2023 1 commit
  4. 22 Nov, 2023 1 commit
    • [Whisper] Add sequential longform decoding (#27492) · 4151fbb4
      Patrick von Platen authored
      * [Whisper] Add seq gen
      
      * [Whisper] Add seq gen
      
      * more debug
      
      * Fix whisper logit processor
      
      * Improve whisper code further
      
      * Fix more
      
      * more debug
      
      * more debug
      
      * Improve further
      
      * Add tests
      
      * Prep for batch size > 1
      
      * Get batch_size>1 working
      
      * Correct more
      
      * Add extensive tests
      
      * more debug
      
      * more debug
      
      * more debug
      
      * add more tests
      
      * more debug
      
      * Apply suggestions from code review
      
      * more debug
      
      * add comments to explain the code better
      
      * add comments to explain the code better
      
      * add comments to explain the code better
      
      * Add more examples
      
      * add comments to explain the code better
      
      * fix more
      
      * add comments to explain the code better
      
      * add comments to explain the code better
      
      * correct
      
      * correct
      
      * finalize
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      4151fbb4
  5. 16 Nov, 2023 1 commit
    • [`Styling`] stylify using ruff (#27144) · 651408a0
      Arthur authored
      
      
      * try to stylify using ruff
      
      * might need to remove these changes?
      
      * use ruff format and ruff check
      
      * use isinstance instead of type comparison
      
      * use # fmt: skip
      
      * use # fmt: skip
      
      * nits
      
      * some styling changes
      
      * update ci job
      
      * nits isinstance
      
      * more files update
      
      * nits
      
      * more nits
      
      * small nits
      
      * check and format
      
      * revert wrong changes
      
      * actually use formatter instead of checker
      
      * nits
      
      * well docbuilder is overwriting this commit
      
      * revert notebook changes
      
      * try to nuke docbuilder
      
      * style
      
      * fix feature extraction test
      
      * remove `indent-width = 4`
      
      * fixup
      
      * more nits
      
      * update the ruff version that we use
      
      * style
      
      * nuke docbuilder styling
      
      * leave the print for detected changes
      
      * nits
      
      * Remove file I/O
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      
      * style
      
      * nits
      
      * revert notebook changes
      
      * Add # fmt skip when possible
      
      * Add # fmt skip when possible
      
      * Fix
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * Nits
      
      * more fixes
      
      * fix tapas
      
      * Another way to skip
      
      * Recommended way
      
      * Fix two more files
      
      * Remove asynch
      
      ---------
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      651408a0
  6. 14 Nov, 2023 1 commit
  7. 07 Nov, 2023 1 commit
  8. 31 Oct, 2023 1 commit
  9. 12 Oct, 2023 1 commit
  10. 14 Sep, 2023 1 commit
  11. 05 Sep, 2023 2 commits
  12. 24 Aug, 2023 1 commit
  13. 16 Aug, 2023 1 commit
  14. 08 Aug, 2023 1 commit
  15. 21 Jun, 2023 1 commit
    • add word-level timestamps to Whisper (#23205) · cd927a47
      Matthijs Hollemans authored
      * let's go!
      
      * initial implementation of token-level timestamps
      
      * only return a single timestamp per token
      
      * remove token probabilities
      
      * fix return type
      
      * fix doc comment
      
      * strip special tokens
      
      * rename
      
      * revert to not stripping special tokens
      
      * only support models that have alignment_heads
      
      * add integration test
      
      * consistently name it token-level timestamps
      
      * small DTW tweak
      
      * initial support for ASR pipeline
      
      * fix pipeline doc comments
      
      * resolve token timestamps in pipeline with chunking
      
      * change warning when no final timestamp is found
      
      * return word-level timestamps
      
      * fixup
      
      * fix bug that skipped final word in each chunk
      
      * fix failing unit tests
      
      * merge punctuations into the words
      
      * also return word tokens
      
      * also return token indices
      
      * add (failing) unit test for combine_tokens_into_words
      
      * make combine_tokens_into_words private
      
      * restore OpenAI's punctuation rules
      
      * add pipeline tests
      
      * make requested changes
      
      * PR review changes
      
      * fix failing pipeline test
      
      * small stuff from PR
      
      * only return words and their timestamps, not segments
      
      * move alignment_heads into generation config
      
      * forgot to set alignment_heads in pipeline tests
      
      * tiny comment fix
      
      * grr
      cd927a47
  16. 04 Apr, 2023 1 commit
  17. 23 Mar, 2023 2 commits
  18. 02 Mar, 2023 2 commits
  19. 28 Feb, 2023 1 commit
    • 🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 (#21516) · 871c31a6
      Yih-Dar authored
      
      
      * Add PipelineTesterMixin
      
      * remove class PipelineTestCaseMeta
      
      * move validate_test_components
      
      * Add for ViT
      
      * Add to SPECIAL_MODULE_TO_TEST_MAP
      
      * style and quality
      
      * Add feature-extraction
      
      * update
      
      * raise instead of skip
      
      * add tiny_model_summary.json
      
      * more explicit
      
      * skip tasks not in mapping
      
      * add availability check
      
      * Add Copyright
      
      * A way to disable irrelevant tests
      
      * update with main
      
      * remove disable_irrelevant_tests
      
      * skip tests
      
      * better skip message
      
      * better skip message
      
      * Add all pipeline task tests
      
      * revert
      
      * Import PipelineTesterMixin
      
      * subclass test classes with PipelineTesterMixin
      
      * Add pipeline_model_mapping
      
      * Fix import after adding pipeline_model_mapping
      
      * Fix style and quality after adding pipeline_model_mapping
      
      * Fix one more import after adding pipeline_model_mapping
      
      * Fix style and quality after adding pipeline_model_mapping
      
      * Fix test issues
      
      * Fix import requirements
      
      * Fix mapping for MobileViTModelTest
      
      * Update
      
      * Better skip message
      
      * pipeline_model_mapping could not be None
      
      * Remove some PipelineTesterMixin
      
      * Fix typo
      
      * revert tests_fetcher.py
      
      * update
      
      * rename
      
      * revert
      
      * Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
      
      * style and quality
      
      * test fetcher for all pipeline/model tests
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      871c31a6
  20. 24 Feb, 2023 1 commit
  21. 21 Feb, 2023 1 commit
  22. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
      6f79d264
  23. 30 Jan, 2023 1 commit
  24. 25 Jan, 2023 2 commits
  25. 23 Jan, 2023 1 commit
  26. 20 Jan, 2023 1 commit
  27. 19 Jan, 2023 1 commit
    • [Whisper] Fix timestamp processor (#21187) · e9b4800d
      Arthur authored
      
      
      * add draft logit processor
      
      * add template functions
      
      * update timestamp processor parameters
      
      * draft script
      
      * simplify code
      
      * cleanup
      
      * fixup and clean
      
      * update pipeline
      
      * style
      
      * clean up previous idea
      
      * add tokenization utils
      
      * update tokenizer and asr output
      
      * fit whisper type
      
      * style and update test
      
      * clean test
      
      * style test
      
      * update tests
      
      * update error test
      
      * update code (not based on review yet)
      
      * update tokenization
      
      * update asr pipeline
      
      * update code
      
      * cleanup and update test
      
      * fmt
      
      * remove text verification
      
      * cleanup
      
      * cleanup
      
      * add model test
      
      * update tests
      
      * update code add docstring
      
      * update code and add docstring
      
      * fix pipeline tests
      
      * Small update.
      
      * Fixup.
      
      * Tmp.
      
      * More support.
      
      * Making `forced_decoder_ids` non mandatory for users to set.
      
      * update and fix first bug
      
      * properly process sequence right after merge if last
      
      * tofo
      
      * allow list inputs + compute begin index better
      
      * start adding tests
      
      * add the 3 edge cases
      
      * style
      
      * format sequences
      
      * fixup
      
      * update
      
      * update
      
      * style
      
      * test passes, edge cases should be good
      
      * update last value
      
      * remove Trie
      
      * update tests and expected values
      
      * handle bigger chunk_length
      
      * clean tests a bit
      
      * refactor chunk iter and clean pipeline
      
      * update tests
      
      * style
      
      * refactor chunk iter and clean pipeline
      
      * update
      
      * resolve comments
      
      * Apply suggestions from code review
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      
      * take stride right into account
      
      * update test expected values
      
      * Update code based on review
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      
      * major refactor
      
      * add correct strides for tests
      
      * Update src/transformers/pipelines/automatic_speech_recognition.py
      
      * fix whisper timestamp test
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      e9b4800d
  28. 17 Jan, 2023 1 commit
    • Whisper Timestamp processor and prediction (#20620) · bb300ac6
      Arthur authored
      
      
      * add draft logit processor
      
      * add template functions
      
      * update timestamp processor parameters
      
      * draft script
      
      * simplify code
      
      * cleanup
      
      * fixup and clean
      
      * update pipeline
      
      * style
      
      * clean up previous idea
      
      * add tokenization utils
      
      * update tokenizer and asr output
      
      * fit whisper type
      
      * style and update test
      
      * clean test
      
      * style test
      
      * update tests
      
      * update error test
      
      * update code (not based on review yet)
      
      * update tokenization
      
      * update asr pipeline
      
      * update code
      
      * cleanup and update test
      
      * fmt
      
      * remove text verification
      
      * cleanup
      
      * cleanup
      
      * add model test
      
      * update tests
      
      * update code add docstring
      
      * update code and add docstring
      
      * fix pipeline tests
      
      * Small update.
      
      * Fixup.
      
      * Tmp.
      
      * More support.
      
      * Making `forced_decoder_ids` non mandatory for users to set.
      
      * update and fix first bug
      
      * properly process sequence right after merge if last
      
      * tofo
      
      * allow list inputs + compute begin index better
      
      * start adding tests
      
      * add the 3 edge cases
      
      * style
      
      * format sequences
      
      * fixup
      
      * update
      
      * update
      
      * style
      
      * test passes, edge cases should be good
      
      * update last value
      
      * remove Trie
      
      * update tests and expected values
      
      * handle bigger chunk_length
      
      * clean tests a bit
      
      * refactor chunk iter and clean pipeline
      
      * update tests
      
      * style
      
      * refactor chunk iter and clean pipeline
      
      * update
      
      * resolve comments
      
      * Apply suggestions from code review
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      
      * take stride right into account
      
      * update test expected values
      
      * Update code based on review
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      bb300ac6
  29. 31 Dec, 2022 1 commit
  30. 23 Dec, 2022 1 commit
    • Adding support for `fp16` for asr pipeline. (#20864) · f7f0ec2f
      Nicolas Patry authored
      * Supporting `fp16` for asr pipeline
      
      * Adding test.
      
      * Style.
      
      * Oops.
      
      * Flake8 update ?
      
      * Fixing flake8 ?
      
      * Revert "Flake8 update ?"
      
      This reverts commit 0b917fcb520e5f34d1933d9d37d8f32b64553048.
      
      * Style (accidentally deleted flake8 F401.)
      
      * Move to a bigger test (no small whisper model, and s2t doesn't seem to
      accept torch_dtype=fp16).
      
      Also we need to use a GPU to actually compute on fp16.
      
      * Using BatchFeature capability.
      f7f0ec2f
  31. 06 Dec, 2022 1 commit
  32. 05 Dec, 2022 1 commit
  33. 14 Nov, 2022 1 commit
  34. 18 Oct, 2022 1 commit
  35. 14 Oct, 2022 1 commit
    • Improve error messaging for ASR pipeline. (#19570) · 463226e2
      Nicolas Patry authored
      * Improve error messaging for ASR pipeline.
      
      - Raise error early (in `_sanitize`) so users don't waste time trying to
        run queries with invalid params.
      
      - Fix: the error was raised after using `config.inputs_to_logits_ratio`, so our
        check was masked by the failure that the property does not exist.
      
      - Added some manual check on s2t for the error message.
        No non ctc model seems to be used by the default runner (they are all
        skipped).
      
      * Removing pdb.
      
      * Stop the early error it doesn't really work :(.
      463226e2
  36. 11 Oct, 2022 1 commit
    • Fix whisper for `pipeline` (#19482) · b722a6be
      Arthur authored
      * update feature extractor params
      
      * update attention mask handling
      
      * fix doc and pipeline test
      
      * add warning when skipping test
      
      * add whisper translation and transcription test
      
      * fix build doc test
      b722a6be