Commits · 25c451e5a044969eb91e1e481574a2bfca5130ca · chenpangpang / transformers

14 Nov, 2022 2 commits

Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. (#20104) · 25c451e5

Nicolas Patry authored Nov 14, 2022

* Very crude matching algorithm.

* Fixing tests.

* Removing comments

* Adding warning + fix short matches.

* Cleanup tests.

* Quality.

* Less noisy.

* Fixup.

25c451e5

Proposal Remove the weird `inspect` in ASR pipeline and make WhisperEncoder... · 03bc6ece

Nicolas Patry authored Nov 14, 2022


Proposal Remove the weird `inspect` in ASR pipeline and make WhisperEncoder just nice to use. (#19571)

* Proposal Remove the weird `inspect` in ASR pipeline and make
WhisperEncoder just nice to use.

It seems that accepting `attention_mask` is kind of an invariant of our
models. For Seq2Seq ASR models, we had a special comment on how it
actually was important to send it.

`inspecting` seems pretty brittle way to handle this case.
My suggestion is to simply add it as an kwarg that and just ignoring
it with the docstring explaining why it's ignored.

* Fixup.

* Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Doc fixing .
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

03bc6ece

14 Oct, 2022 1 commit

Improve error messaging for ASR pipeline. (#19570) · 463226e2

Nicolas Patry authored Oct 14, 2022

* Improve error messaging for ASR pipeline.

- Raise error early (in `_sanitize`) so users don't waste time trying to
  run queries with invalid params.

- Fix the error was after using `config.inputs_to_logits_ratio` so our
  check was masked by the failing property does not exist.

- Added some manual check on s2t for the error message.
  No non ctc model seems to be used by the default runner (they are all
  skipped).

* Removing pdb.

* Stop the early error it doesn't really work :(.

463226e2

07 Oct, 2022 1 commit
- update attention mask handling (#19385) · 994b7a4e
  Arthur authored Oct 07, 2022
```
* update feature extractor params

* update attention mask handling
```
  994b7a4e
28 Jul, 2022 1 commit
- Update automatic_speech_recognition.py (#18339) · 5d1fed07
  bhuang authored Jul 28, 2022
  
  5d1fed07
21 Apr, 2022 1 commit

Adding support for `array` key in raw dictionnaries in ASR pipeline. (#16827) · e789418e

Nicolas Patry authored Apr 21, 2022



* Adding support for `array` key in raw dictionnaries in ASR pipeline.

* ES .

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Making it work by not popping `array` first.

* Black 22.3
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

e789418e

19 Apr, 2022 1 commit
- [ASR Pipeline] Correct init docs (#16833) · db9f1891
  Patrick von Platen authored Apr 19, 2022
```
* correct

* up
```
  db9f1891
12 Apr, 2022 1 commit

Change the chunk_iter function to handle (#16730) · a192f61e

Nicolas Patry authored Apr 12, 2022

* Change the chunk_iter function to handle

the subtle cases where the last chunk gets ignored since all the
data is in the `left_strided` data.

We need to remove the right striding on the previous item.

* Remove commented line.

a192f61e

23 Mar, 2022 1 commit

Reorganize file utils (#16264) · 4975002d

Sylvain Gugger authored Mar 23, 2022

* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit

4975002d

02 Mar, 2022 1 commit
- Adding timestamps for CTC with LM in ASR pipeline. (#15863) · 6e57a569
  Nicolas Patry authored Mar 02, 2022
```
* Adding timestamps for CTC with LM in ASR pipeline.

* iRemove print.

* Nit change.
```
  6e57a569
28 Feb, 2022 1 commit

Fixing the timestamps with chunking. (#15843) · 97f9b8a2

Nicolas Patry authored Feb 28, 2022



* Fixing the timestamps with chunking.

* The changes modified (and fixed) the striding tests.

* Adding a tokenizer test.

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Defense -> comment.

* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

97f9b8a2

25 Feb, 2022 1 commit

Adding the option to return_timestamps on pure CTC ASR models. (#15792) · ad0d7d17

Nicolas Patry authored Feb 25, 2022



* Adding the option to return_timestamps on pure CTC ASR models.

* Remove `math.prod` which was introduced in Python 3.8

* int are not floats.

* Reworking the PR to support "char" vs "word" output.

* Fixup!

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

ad0d7d17

15 Feb, 2022 1 commit

Add `decoder_kwargs` to send to LM on asr pipeline. (#15646) · a3dbbc34

Nicolas Patry authored Feb 15, 2022


Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>
Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

a3dbbc34

07 Feb, 2022 1 commit
- [ASR pipeline] correct asr pipeline for seq2seq models (#15541) · 5f1918a4
  Patrick von Platen authored Feb 07, 2022
  
  5f1918a4
02 Feb, 2022 2 commits

Fic docstring of ASR pipeline (#15481) · 13297ac7
Sylvain Gugger authored Feb 02, 2022

13297ac7

Adding support for `microphone` streaming within pipeline. (#15046) · 623d8cb4

Nicolas Patry authored Feb 02, 2022



* Adding support for `microphone` streaming within pipeline.

- Uses `ffmpeg` to get microphone data.
- Makes sure alignment is made to `size_of_sample`.
- Works by sending `{"raw": ..data.., "stride": (n, left, right),
"partial": bool}`
directly to the pipeline enabling to stream partial results and still
get inference.
- Let's `partial` information flow through the pipeline to enable caller
  to get it back and choose to display text or not.

- The striding reconstitution is bound to have errors since CTC does not
keep previous state. Currently most of the errors are we don't know if
there's a space or not between two chunks.
Since we have some left striding info, we could use that during decoding
to choose what to do with those spaces and even extra letters maybe (if
the stride is long enough, it's bound to cover at least a few symbols)

Fixing tests.

Protecting with `require_torch`.

`raw_ctc` support for nicer demo.

Post rebase fixes.

Revamp to split raw_mic_data from it's live chunking.

- Requires a refactor to make everything a bit cleaner.

Automatic resampling.

Small fix.

Small fix.

* Post rebase fix (need to let super handle more logic, reorder args.)

* Update docstrings

* Docstring format.

* Remove print.

* Prevent flow of `input_values`.

* Fixing `stride` too.

* Fixing the PR by removing `raw_ctc`.

* Better docstrings.

* Fixing init.

* Update src/transformers/pipelines/audio_utils.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Quality.
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

623d8cb4

24 Jan, 2022 1 commit
- Remove old debug code leftover. (#15306) · eac4aecc
  Nicolas Patry authored Jan 24, 2022
  
  eac4aecc
19 Jan, 2022 1 commit

Make chuking smartly (long files) work on asr ctc_with_lm. (#15219) · 3fefee99

Nicolas Patry authored Jan 19, 2022



* [WIP] Make chuking smartly (long files) work on asr ctc_with_lm.

* Slow test with functionality.

* Fixing regular test.

* fix for batch size 1

* Handling batch outside `rescale_Stride`.

- Renamed to `rescale_stride`.

* Disable equality in the test.

* Remove print.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3fefee99

18 Jan, 2022 1 commit
- `is_ctc` needs to be updated to `self.type == "ctc". (#15194) · dea563c9
  Nicolas Patry authored Jan 18, 2022
```
* `is_ctc` needs to be updated to `self.type == "ctc".

* Adding fast test for this functionality.
```
  dea563c9
12 Jan, 2022 1 commit

Pipeline ASR with LM. (#15071) · 68cc4ccd

Nicolas Patry authored Jan 12, 2022



* Pipeline ASR with LM.

* Revamped into `self.decoder`.

* Fixing.

* 2nd fix.

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fixing.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

68cc4ccd

04 Jan, 2022 1 commit

Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d

Nicolas Patry authored Jan 04, 2022

* Hotfix `chunk_length_s` instead of `_ms`.

* Adding fix of `pad_token` which should be last/previous token for CTC

proper decoding

* Fixing ChunkPipeline unwrapping.

* Adding a PackIterator specific test.

19d37c2d

03 Jan, 2022 1 commit

Large audio chunking for the existing ASR pipeline (#14896) · 38f95d18

Anton Lozhkov authored Jan 03, 2022



* Naive ASR chunking

* Fixing batching for ASR.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

38f95d18

27 Dec, 2021 1 commit

Doc styler v2 (#14950) · 87e6e4fe

Sylvain Gugger authored Dec 27, 2021

* New doc styler

* Fix issue with args at the start

* Code sample fixes

* Style code examples in MDX

* Fix more patterns

* Typo

* Typo

* More patterns

* Do without black for now

* Get more info in error

* Docstring style

* Re-enable check

* Quality

* Fix add_end_docstring decorator

* Fix docstring

87e6e4fe

21 Dec, 2021 1 commit

Mass conversion of documentation from rst to Markdown (#14866) · 27b3031d

Sylvain Gugger authored Dec 21, 2021

* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever

27b3031d

06 Dec, 2021 1 commit
- Fix syntax for class references (#14644) · e513c16e
  Sylvain Gugger authored Dec 06, 2021
  
  e513c16e
29 Oct, 2021 1 commit

Adding `batch_size` support for (almost) all pipelines (#13724) · be236361

Nicolas Patry authored Oct 29, 2021



* Tentative enabling of `batch_size` for pipelines.

* Add systematic test for pipeline batching.

* Enabling batch_size on almost all pipelines

- Not `zero-shot` (it's already passing stuff as batched so trickier)
- Not `QA` (preprocess uses squad features, we need to switch to real
tensors at this boundary.

* Adding `min_length_for_response` for conversational.

* Making CTC, speech mappings avaiable regardless of framework.

* Attempt at fixing automatic tests (ffmpeg not enabled for fast tests)

* Removing ffmpeg dependency in tests.

* Small fixes.

* Slight cleanup.

* Adding docs

and adressing comments.

* Quality.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Improving docs.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

* N -> oberved_batch_size

softmax trick.

* Follow `padding_side`.

* Supporting image pipeline batching (and padding).

* Rename `unbatch` -> `loader_batch`.

* unbatch_size forgot.

* Custom padding for offset mappings.

* Attempt to remove librosa.

* Adding require_audio.

* torchaudio.

* Back to using datasets librosa.

* Adding help to set a pad_token on the tokenizer.

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Quality.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

be236361

27 Oct, 2021 1 commit
- [Pipelines] Fix ASR model types check (#14178) · 25ceb818
  Anton Lozhkov authored Oct 27, 2021
  
  25ceb818
21 Sep, 2021 1 commit
- Add Speech AutoModels (#13655) · 48fa42e5
  Patrick von Platen authored Sep 21, 2021
```
* upload

* correct

* correct

* correct

* finish

* up

* up

* up again
```
  48fa42e5
10 Sep, 2021 1 commit

[Large PR] Entire rework of pipelines. (#13308) · c63fcabf

Nicolas Patry authored Sep 10, 2021



* Enabling dataset iteration on pipelines.

Enabling dataset iteration on pipelines.

Unifying parameters under `set_parameters` function.

Small fix.

Last fixes after rebase

Remove print.

Fixing text2text `generate_kwargs`

No more `self.max_length`.

Fixing tf only conversational.

Consistency in start/stop index over TF/PT.

Speeding up drastically on TF (nasty bug where max_length would increase
a ton.)

Adding test for support for non fast tokenizers.

Fixign GPU usage on zero-shot.

Fix working on Tf.

Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Small cleanup.

Remove all asserts + simple format.

* Fixing audio-classification for large PR.

* Overly explicity null checking.

* Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.

* Removed internal state for parameters of the  pipeline.

Instead of overriding implicitly internal state, we moved
to real named arguments on every `preprocess`, `_forward`,
`postprocess` function.

Instead `_sanitize_parameters` will be used to split all kwargs
of both __init__ and __call__ into the 3 kinds of named parameters.

* Move import warnings.

* Small fixes.

* Quality.

* Another small fix, using the CI to debug faster.

* Last fixes.

* Last fix.

* Small cleanup of tensor moving.

* is not None.

* Adding a bunch of docs + a iteration test.

* Fixing doc style.

* KeyDataset = None guard.

* RRemoving the Cuda test for pipelines (was testing).

* Even more simple iteration test.

* Correct import .

* Long day.

* Fixes in docs.

* [WIP] migrating object detection.

* Fixed the target_size bug.

* Fixup.

* Bad variable name.

* Fixing `ensure_on_device` respects original ModelOutput.

c63fcabf

01 Sep, 2021 1 commit

Add SpeechEncoderDecoder & Speech2Text2 (#13186) · 0b8c84e1

Patrick von Platen authored Sep 01, 2021



* fix_torch_device_generate_test

* remove @

* up

* correct some bugs

* correct model

* finish speech2text extension

* up

* up

* up

* up

* Update utils/custom_init_isort.py

* up

* up

* update with tokenizer

* correct old tok

* correct old tok

* fix bug

* up

* up

* add more tests

* up

* fix docs

* up

* fix some more tests

* add better config

* correct some more things
"

* fix tests

* improve docs

* Apply suggestions from code review

* Apply suggestions from code review

* final fixes

* finalize

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* apply suggestions Lysandre and Sylvain

* apply nicos suggestions

* upload everything

* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

0b8c84e1

26 May, 2021 1 commit

Ensure input tensor are on device. (#11874) · 0b0a5984

francescorubbo authored May 26, 2021

The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.

0b0a5984

30 Apr, 2021 1 commit

Adding `AutomaticSpeechRecognitionPipeline`. (#11337) · db9dd09c

Nicolas Patry authored Apr 30, 2021



* Adding `AutomaticSpeechRecognitionPipeline`.

- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
  sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.

Future work:

- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.

* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding docs + main import + type checking + LICENSE.

* Doc style !.

* Fixing TYPE_HINT.

* Specifying waveform shape in the docs.

* Adding asserts + specify in the documentation the shape of the input
np.ndarray.

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adding require to tests + move the `feature_extractor` doc.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

db9dd09c