Commits · 5da3db3fd5c070107df717a13382ccf1fe1efbe4 · chenpangpang / transformers

22 Dec, 2023 1 commit

[Whisper] Fix word-level timestamps with bs>1 or num_beams>1 (#28114) · 5da3db3f

Yoach Lacombe authored Dec 22, 2023



* fix frames

* use smaller chunk length

* correct beam search + tentative stride

* fix whisper word timestamp in batch

* add test batch generation with return token timestamps

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* clean a test

* make style + correct typo

* write clearer comments

* explain test in comment

---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

5da3db3f

22 Nov, 2023 1 commit

[Whisper] Add sequential longform decoding (#27492) · 4151fbb4

Patrick von Platen authored Nov 22, 2023

* [Whisper] Add seq gen

* [Whisper] Add seq gen

* more debug

* Fix whisper logit processor

* Improve whisper code further

* Fix more

* more debug

* more debug

* Improve further

* Add tests

* Prep for batch size > 1

* Get batch_size>1 working

* Correct more

* Add extensive tests

* more debug

* more debug

* more debug

* add more tests

* more debug

* Apply suggestions from code review

* more debug

* add comments to explain the code better

* add comments to explain the code better

* add comments to explain the code better

* Add more examples

* add comments to explain the code better

* fix more

* add comments to explain the code better

* add comments to explain the code better

* correct

* correct

* finalize

* Apply suggestions from code review

* Apply suggestions from code review

4151fbb4

12 Oct, 2023 1 commit

Add many missing spaces in adjacent strings (#26751) · 40ea9ab2

Tom Aarsen authored Oct 12, 2023

* Add missing spaces in adjacent strings

40ea9ab2

29 Sep, 2023 1 commit

[ASR Pipe] Improve docs and error messages (#26476) · 0b192de1

Sanchit Gandhi authored Sep 29, 2023



* improve docs/errors

* why whisper

* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Lysandre Debut <hi@lysand.re>

* specify pt only

---------
Co-authored-by: Lysandre Debut <hi@lysand.re>

0b192de1

14 Sep, 2023 1 commit

[Whisper] Fix word-level timestamps for audio < 30 seconds (#25607) · 95fe0f5d

Joshua Lochner authored Sep 14, 2023



* Fix word-level timestamps for audio < 30 seconds

* Fix code quality

* fix unit tests

* Fix unit tests

* Fix unit test

* temp: print out result

* temp: set max diff to None

* fix unit tests

* fix typo

* Fix typo
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Use generation config for `num_frames`

* fix docs

* Move `num_frames` to kwargs

* compute stride/attn_mask once

* mark test as slow

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>

95fe0f5d

24 Aug, 2023 1 commit

[ASR Pipe Test] Fix CTC timestamps error message (#25727) · 02188768

Sanchit Gandhi authored Aug 24, 2023

02188768

16 Aug, 2023 1 commit

[ASR Pipeline] Fix init with timestamps (#25438) · 36f183eb

Sanchit Gandhi authored Aug 16, 2023

* [ASR Pipeline] Fix init

* refactor test

* change default kwarg setting

* only perform checks if we have to

* override init

* move pre/forward/post checks to sanitize

36f183eb

08 Aug, 2023 1 commit

[ASR Pipeline] Clarify return timestamps (#25344) · dedd1116

Sanchit Gandhi authored Aug 08, 2023

* [ASR Pipeline] Clarify return timestamps

* fix indentation

* fix ctc check

* fix ctc error message!

* fix test

* fix other test

* add new tests

* final comment

dedd1116

21 Jul, 2023 1 commit

Avoid importing all models when instantiating a pipeline (#24960) · 5b7ffd54

Sylvain Gugger authored Jul 21, 2023

* Avoid importing all models when instantiating a pipeline

* Remove sums that don't work

5b7ffd54

22 Jun, 2023 1 commit

[ASR pipeline] Check for torchaudio (#23953) · 7e03e469

Sanchit Gandhi authored Jun 22, 2023



* [ASR pipeline] Check for torchaudio

* add pip instructions
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>

---------
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>

7e03e469

21 Jun, 2023 1 commit

add word-level timestamps to Whisper (#23205) · cd927a47

Matthijs Hollemans authored Jun 21, 2023

* let's go!

* initial implementation of token-level timestamps

* only return a single timestamp per token

* remove token probabilities

* fix return type

* fix doc comment

* strip special tokens

* rename

* revert to not stripping special tokens

* only support models that have alignment_heads

* add integration test

* consistently name it token-level timestamps

* small DTW tweak

* initial support for ASR pipeline

* fix pipeline doc comments

* resolve token timestamps in pipeline with chunking

* change warning when no final timestamp is found

* return word-level timestamps

* fixup

* fix bug that skipped final word in each chunk

* fix failing unit tests

* merge punctuations into the words

* also return word tokens

* also return token indices

* add (failing) unit test for combine_tokens_into_words

* make combine_tokens_into_words private

* restore OpenAI's punctuation rules

* add pipeline tests

* make requested changes

* PR review changes

* fix failing pipeline test

* small stuff from PR

* only return words and their timestamps, not segments

* move alignment_heads into generation config

* forgot to set alignment_heads in pipeline tests

* tiny comment fix

* grr

cd927a47

23 Mar, 2023 1 commit

Fix various imports (#22281) · 506e7c63

Sylvain Gugger authored Mar 23, 2023

* Fix various imports

* Fix copies

* Fix import

506e7c63

02 Mar, 2023 1 commit

Refactor whisper asr pipeline to include language too. (#21427) · 13254591

Nicolas Patry authored Mar 02, 2023



* [WIP] whisper refacto to support language output.

* Handling merges.

* A bit more cleanup and comments.

* Many improvements.

Lots of details everywhere.

* Cleanup old code and tests.

* Handle lone timestamp tokens (just recover when something bad happens).

* Adding return_language example.

* No ffmpeg.

* Hmm.

* Some corrections.

* Both fast and slow.

* New black.

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove print.

* Undoing tests modifications.

* Smaller test modifications.

* Rename.

* Remove maxDiff.

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

13254591

24 Feb, 2023 1 commit

fix: Change is_last chunk calc and add conditional break in chunk_iter (#21612) · 279008ad

Connor Henderson authored Feb 24, 2023

* fix: Change is_last chunk calc and add conditional break

* format fix

* account for 0 and full stride_rights, add comment

* add new test

* make style

* update slow whisper asr test timestamps

* use nested_simplify on output and round timestamp to hundredths place

279008ad

10 Feb, 2023 1 commit

[`pipeline`] A simple fix for half-precision & 8bit models (#21479) · f8394268

Younes Belkada authored Feb 10, 2023



* v1 fix

* adapt from suggestions

* make style

* fix tests

* add gpu tests

* update docs

* fix other tests

* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* better fix

* make fixup

* better example

* revert changes

* proposal

* more elegant solution

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

f8394268

06 Feb, 2023 1 commit

Update quality tooling for formatting (#21480) · 6f79d264

Sylvain Gugger authored Feb 06, 2023

* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies

6f79d264

27 Jan, 2023 1 commit

[Whisper] another patch (#21324) · 0dff407d

Arthur authored Jan 27, 2023

* another patch

* fix timestamp test modeling

* let it be negative when the token is None

0dff407d

25 Jan, 2023 1 commit

[Whisper] Refactor whisper (#21252) · 255257f3

Arthur authored Jan 25, 2023

* update whisper logit processor

* add generate for whisper

* remove part of the whisper specific code from pipeline

* update logit processes

* major update

* enforce first timestamp

* update generate

* add more tests

* update new decoding strategy

* Apply suggestions from code review

* update docstring

* fixup

* default config will not have multilingual ar

* update expected tokenizer size, see pull on the hub for whisper-tiny

255257f3

23 Jan, 2023 1 commit

[ci-daily] Fix pipeline tests (#21257) · b80b2218

Arthur authored Jan 23, 2023

* use streaming dataset

* fix whisper's test

* add rescale argument to chunk_iter

b80b2218

20 Jan, 2023 2 commits

[Whisper] Fix pipeline after timestamp merges (#21198) · 5d3cb760

Arthur authored Jan 20, 2023

* pass return_timestamps to pre-process

* add a test to test it

* test does not need device 0

* remove failing bit

* update test

5d3cb760

Enabling live `automatic-speech-recognition` asr for Whisper. (#21196) · 5326460f

Nicolas Patry authored Jan 20, 2023

* Enabling live `automatic-speech-recognition` asr for Whisper.

* Dummy change.

5326460f

19 Jan, 2023 1 commit

[Whisper] Fix timestamp processor (#21187) · e9b4800d

Arthur authored Jan 19, 2023



* add draft logit processor

* add template functions

* update timestamp processor parameters

* draft script

* simplify code

* cleanup

* fixup and clean

* update pipeline

* style

* clean up previous idea

* add tokenization utils

* update tokenizer and asr output

* fit whisper type

* style and update test

* clean test

* style test

* update tests

* update error test

* update code (not based on review yet)

* update tokenization

* update asr pipeline

* update code

* cleanup and update test

* fmt

* remove text verification

* cleanup

* cleanup

* add model test

* update tests

* update code add docstring

* update code and add docstring

* fix pipeline tests

* Small update.

* Fixup.

* Tmp.

* More support.

* Making `forced_decoder_ids` non mandatory for users to set.

* update and fix first bug

* properly process sequence right after merge if last

* tofo

* allow list inputs + compute begin index better

* start adding tests

* add the 3 edge cases

* style

* format sequences

* fixup

* update

* update

* style

* test passes, edge cases should be good

* update last value

* remove Trie

* update tests and expected values

* handle bigger chunk_length

* clean tests a bit

* refactor chunk iter and clean pipeline

* update tests

* style

* refactor chunk iter and clean pipeline

* update

* resolve comments

* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* take stride right into account

* update test expected values

* Update code based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* major refactor

* add correct strides for tests

* Update src/transformers/pipelines/automatic_speech_recognition.py

* fix whisper timestamp test
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

e9b4800d

17 Jan, 2023 1 commit

Whisper Timestamp processor and prediction (#20620) · bb300ac6

Arthur authored Jan 17, 2023



* add draft logit processor

* add template functions

* update timestamp processor parameters

* draft script

* simplify code

* cleanup

* fixup and clean

* update pipeline

* style

* clean up previous idea

* add tokenization utils

* update tokenizer and asr output

* fit whisper type

* style and update test

* clean test

* style test

* update tests

* update error test

* update code (not based on review yet)

* update tokenization

* update asr pipeline

* update code

* cleanup and update test

* fmt

* remove text verification

* cleanup

* cleanup

* add model test

* update tests

* update code add docstring

* update code and add docstring

* fix pipeline tests

* Small update.

* Fixup.

* Tmp.

* More support.

* Making `forced_decoder_ids` non mandatory for users to set.

* update and fix first bug

* properly process sequence right after merge if last

* tofo

* allow list inputs + compute begin index better

* start adding tests

* add the 3 edge cases

* style

* format sequences

* fixup

* update

* update

* style

* test passes, edge cases should be good

* update last value

* remove Trie

* update tests and expected values

* handle bigger chunk_length

* clean tests a bit

* refactor chunk iter and clean pipeline

* update tests

* style

* refactor chunk iter and clean pipeline

* update

* resolve comments

* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* take stride right into account

* update test expected values

* Update code based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

bb300ac6

07 Jan, 2023 1 commit

fix typo (#21042) · bd9d5126

Kaito Sugimoto authored Jan 07, 2023

bd9d5126

31 Dec, 2022 1 commit

Add generate kwargs to `AutomaticSpeechRecognitionPipeline` (#20952) · 47c9b22d

bofeng huang authored Dec 31, 2022

* Add generate kwargs to AutomaticSpeechRecognitionPipeline

* Add test for generation kwargs

47c9b22d

29 Dec, 2022 1 commit

Fix FP16 inference in TextGenerationPipeline (#20913) · fe65657d

bofeng huang authored Dec 29, 2022



* add torch_dtype attribute to Pipeline

* Use torch_dtype to cast input tensor type in AutomaticSpeechRecognitionPipeline

* Fix code quality

* Add TextGenerationPipeline fp16 test

* Fix code quality

* Remove useless require in tests
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

fe65657d

23 Dec, 2022 1 commit

Adding support for `fp16` for asr pipeline. (#20864) · f7f0ec2f

Nicolas Patry authored Dec 23, 2022

* Supporting `fp16` for asr pipeline

* Adding test.

* Style.

* Oops.

* Flake8 update ?

* Fixing flake8 ?

* Revert "Flake8 update ?"

This reverts commit 0b917fcb520e5f34d1933d9d37d8f32b64553048.

* Style (accidentally deleted flake8 F401.)

* Move to a bigger test (no small whisper model, and s2t doesn't seem to
accept torch_dtype=fp16).

Also we need to use a GPU to actually compute on fp16.

* Using BatchFeature capability.

f7f0ec2f

30 Nov, 2022 1 commit

Update `AutomaticSpeechRecognitionPipeline` doc example (#20512) · afb66749

Yih-Dar authored Nov 30, 2022

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

afb66749

18 Nov, 2022 1 commit

Remove double brackets (#20307) · b2c863a3

Steven Liu authored Nov 18, 2022

* remove double brackets

* oops get other bracket

b2c863a3

16 Nov, 2022 2 commits

Rephrasing the link. (#20253) · a239bdd2

Nicolas Patry authored Nov 16, 2022

* Rephrasing the link.

* Removing `nested_simplify` within doctests.

* Fixup.

a239bdd2

Adding ASR pipeline example. (#20226) · 443aaaa1

Nicolas Patry authored Nov 16, 2022

* Adding ASR pipeline example.

* De indent.

* Example deindent.

* Fixing example ?

* Putting the example in a more prominent place.

* Fixup.

* Adding the file.

* Adding the doctest to the daily test.

* Fixing comments.

* transcriber name.

* Adding `>>>`.

* Removing assert.

443aaaa1

14 Nov, 2022 2 commits

Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. (#20104) · 25c451e5

Nicolas Patry authored Nov 14, 2022

* Very crude matching algorithm.

* Fixing tests.

* Removing comments

* Adding warning + fix short matches.

* Cleanup tests.

* Quality.

* Less noisy.

* Fixup.

25c451e5

Proposal Remove the weird `inspect` in ASR pipeline and make WhisperEncoder... · 03bc6ece

Nicolas Patry authored Nov 14, 2022


Proposal Remove the weird `inspect` in ASR pipeline and make WhisperEncoder just nice to use. (#19571)

* Proposal Remove the weird `inspect` in ASR pipeline and make
WhisperEncoder just nice to use.

It seems that accepting `attention_mask` is kind of an invariant of our
models. For Seq2Seq ASR models, we had a special comment on how it
actually was important to send it.

`inspect`ing seems a pretty brittle way to handle this case.
My suggestion is to simply add it as a kwarg and just ignore
it, with the docstring explaining why it's ignored.

* Fixup.

* Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Doc fixing .
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
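
The trade-off described in this commit message can be illustrated with a small sketch. It is not the code from the PR: `InspectedEncoder`, `TolerantEncoder`, `call_with_inspect` and `call_uniform` are hypothetical names, and plain lists stand in for tensors. It only contrasts signature inspection with a model that accepts and ignores `attention_mask`.

```python
import inspect


class InspectedEncoder:
    """Hypothetical model whose forward() does not accept attention_mask."""

    def forward(self, input_features):
        return len(input_features)


class TolerantEncoder:
    """Hypothetical model that accepts attention_mask but ignores it."""

    def forward(self, input_features, attention_mask=None):
        # attention_mask is accepted only so every model shares one
        # call signature; this encoder does not use it.
        return len(input_features)


def call_with_inspect(model, features, mask):
    # Brittle approach: look at the signature to decide what to pass.
    params = inspect.signature(model.forward).parameters
    kwargs = {"attention_mask": mask} if "attention_mask" in params else {}
    return model.forward(features, **kwargs)


def call_uniform(model, features, mask):
    # Simpler approach: always pass the mask and let the model ignore it.
    return model.forward(features, attention_mask=mask)


result_inspect = call_with_inspect(InspectedEncoder(), [0.1, 0.2, 0.3], None)
result_uniform = call_uniform(TolerantEncoder(), [0.1, 0.2, 0.3], [1, 1, 1])
```

With the second approach the caller needs no knowledge of individual model signatures, which is the invariant the message argues for.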

03bc6ece

14 Oct, 2022 1 commit

Improve error messaging for ASR pipeline. (#19570) · 463226e2

Nicolas Patry authored Oct 14, 2022

* Improve error messaging for ASR pipeline.

- Raise error early (in `_sanitize`) so users don't waste time trying to
  run queries with invalid params.

- Fix: the check ran after the use of `config.inputs_to_logits_ratio`, so our
  error was masked by the failure on the missing property.

- Added some manual check on s2t for the error message.
  No non ctc model seems to be used by the default runner (they are all
  skipped).

* Removing pdb.

* Stop the early error it doesn't really work :(.

463226e2

07 Oct, 2022 1 commit

update attention mask handling (#19385) · 994b7a4e

Arthur authored Oct 07, 2022

* update feature extractor params

* update attention mask handling

994b7a4e

28 Jul, 2022 1 commit

Update automatic_speech_recognition.py (#18339) · 5d1fed07

bhuang authored Jul 28, 2022

5d1fed07

21 Apr, 2022 1 commit

Adding support for `array` key in raw dictionaries in ASR pipeline. (#16827) · e789418e

Nicolas Patry authored Apr 21, 2022



* Adding support for `array` key in raw dictionaries in ASR pipeline.

* ES .

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Making it work by not popping `array` first.

* Black 22.3
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

e789418e

19 Apr, 2022 1 commit

[ASR Pipeline] Correct init docs (#16833) · db9f1891

Patrick von Platen authored Apr 19, 2022

* correct

* up

db9f1891

12 Apr, 2022 1 commit

Change the chunk_iter function to handle (#16730) · a192f61e

Nicolas Patry authored Apr 12, 2022

* Change the chunk_iter function to handle

the subtle cases where the last chunk gets ignored since all the
data is in the `left_strided` data.

We need to remove the right striding on the previous item.

* Remove commented line.
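
The stride handling this message describes can be sketched in a few lines. This is a simplified illustration on plain lists, not the pipeline's actual `chunk_iter`: `iter_chunks` and `kept` are hypothetical names, and feature extraction is left out. It assumes each chunk overlaps its neighbours by `stride_left` and `stride_right` samples, and that the chunk reaching the end of the input drops its right stride so no trailing data is lost.

```python
def iter_chunks(samples, chunk_len, stride_left, stride_right):
    """Yield (chunk, (length, left, right)) tuples covering `samples`."""
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must exceed stride_left + stride_right")
    for start in range(0, len(samples), step):
        end = start + chunk_len
        chunk = samples[start:end]
        # The chunk that reaches the end of the input is the last one.
        is_last = end >= len(samples)
        # No left overlap on the first chunk, no right overlap on the last.
        left = 0 if start == 0 else stride_left
        right = 0 if is_last else stride_right
        if len(chunk) > left:
            yield chunk, (len(chunk), left, right)
        if is_last:
            # Later windows would only contain already-covered samples.
            break


def kept(chunk, stride):
    """Return the part of a chunk that is not overlap."""
    length, left, right = stride
    return chunk[left:length - right]


audio = list(range(10))
chunks = list(iter_chunks(audio, chunk_len=6, stride_left=2, stride_right=2))
rebuilt = [x for chunk, stride in chunks for x in kept(chunk, stride)]
```

Here the third chunk ends exactly at the end of the input, so it keeps its right-hand samples and the non-overlapping parts reassemble the original sequence.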

a192f61e

23 Mar, 2022 1 commit

Reorganize file utils (#16264) · 4975002d

Sylvain Gugger authored Mar 23, 2022

* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wrong move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit

4975002d