Commits · 25ddd91b249014d818fb2ed3d4ba856ed9a5653e · chenpangpang / transformers

31 Dec, 2022 1 commit
- Add generate kwargs to `AutomaticSpeechRecognitionPipeline` (#20952) · 47c9b22d
  bofeng huang authored Dec 31, 2022
```
* Add generate kwargs to AutomaticSpeechRecognitionPipeline

* Add test for generation kwargs
```
  47c9b22d
23 Dec, 2022 1 commit

Adding support for `fp16` for asr pipeline. (#20864) · f7f0ec2f

Nicolas Patry authored Dec 23, 2022

* Supporting `fp16` for asr pipeline

* Adding test.

* Style.

* Oops.

* Flake8 update ?

* Fixing flake8 ?

* Revert "Flake8 update ?"

This reverts commit 0b917fcb520e5f34d1933d9d37d8f32b64553048.

* Style (accidentally deleted flake8 F401.)

* Move to a bigger test (no small whisper model, and s2t doesn't seem to
accept torch_dtype=fp16).

Also we need to use a GPU to actually compute on fp16.

* Using BatchFeature capability.

f7f0ec2f

06 Dec, 2022 1 commit

Fix `AutomaticSpeechRecognitionPipelineTests.run_pipeline_test` (#20597) · 9b14c1b6

Yih-Dar authored Dec 06, 2022



* Remove assert exception not triggered

* Fix wrong expected exception string

* fix

* use assertRaisesRegex
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

9b14c1b6

05 Dec, 2022 1 commit
- Ci-whisper-asr (#20588) · 538e5248
  Arthur authored Dec 05, 2022
```
* Expected output for the test changed

* fix failing asr test
```
  538e5248
14 Nov, 2022 1 commit

Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. (#20104) · 25c451e5

Nicolas Patry authored Nov 14, 2022

* Very crude matching algorithm.

* Fixing tests.

* Removing comments

* Adding warning + fix short matches.

* Cleanup tests.

* Quality.

* Less noisy.

* Fixup.

25c451e5
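As a rough illustration of what stitching chunked seq2seq output involves, the sketch below merges two token lists by their longest suffix/prefix overlap. It is a simplified stand-in and not the matching algorithm from this commit; the function name `merge_overlapping` and the sample tokens are hypothetical.

```python
def merge_overlapping(left, right):
    """Join two token lists, dropping the overlap shared by the end of
    `left` and the start of `right` (hypothetical helper)."""
    max_overlap = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(max_overlap, 0, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    # No shared tokens at the boundary: fall back to plain concatenation.
    return left + right


first_chunk = ["the", "cat", "sat"]
second_chunk = ["cat", "sat", "down"]
print(merge_overlapping(first_chunk, second_chunk))
```

An exact-match rule like this breaks as soon as the two chunks transcribe the overlapping audio differently, which is one reason a real implementation needs a more tolerant comparison.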

18 Oct, 2022 1 commit
- fix test whisper with new max length (#19668) · d356b89f
  Arthur authored Oct 18, 2022
  
  d356b89f
14 Oct, 2022 1 commit

Improve error messaging for ASR pipeline. (#19570) · 463226e2

Nicolas Patry authored Oct 14, 2022

* Improve error messaging for ASR pipeline.

- Raise error early (in `_sanitize`) so users don't waste time trying to
  run queries with invalid params.

- Fix: the error was raised after using `config.inputs_to_logits_ratio`, so our
  check was masked by the failure on the nonexistent property.

- Added some manual check on s2t for the error message.
  No non ctc model seems to be used by the default runner (they are all
  skipped).

* Removing pdb.

* Stop the early error, it doesn't really work :(.

463226e2

11 Oct, 2022 1 commit

Fix whisper for `pipeline` (#19482) · b722a6be

Arthur authored Oct 11, 2022

* update feature extractor params

* update attention mask handling

* fix doc and pipeline test

* add warning when skipping test

* add whisper translation and transcription test

* fix build doc test

b722a6be

07 Oct, 2022 1 commit

Rework pipeline tests (#19366) · 9ac586b3

Sylvain Gugger authored Oct 07, 2022

* Rework pipeline tests

* Try to fix Flax tests

* Try to put it before

* Use a new decorator instead

* Remove ignore marker since it doesn't work

* Filter pipeline tests

* Woopsie

* Use the filtered list

* Clean up and fake modif

* Remove init

* Revert fake modif

9ac586b3

05 Aug, 2022 1 commit

Fixing issue where generic model types wouldn't load properly with the pipeline (#18392) · 586dcf6b

Nicolas Patry authored Aug 05, 2022

* Adding a better error message when the model is improperly configured within transformers.

* Update src/transformers/pipelines/__init__.py

* Black version.

* Overriding task aliases so that tokenizer+feature_extractor values are correct.

* Fixing task aliases by overriding their names early

* X.

* Fixing feature-extraction.

* black again.

* Normalizing `translation` too.

* Fixing last few corner cases.

translation needs to use its non-normalized name (translation_XX_to_YY),
so that the task_specific_params are correctly overloaded.
This can be removed and cleaned up in a later PR.

`speech-encode-decoder` actually REQUIRES passing a `tokenizer` manually,
so the error needs to be discarded when the `tokenizer` is already
there.

* doc-builder fix.

* Fixing the real issue.

* Removing dead code.

* Do not import the actual config classes.

586dcf6b

12 May, 2022 1 commit

Black preview (#17217) · afe5d42d

Sylvain Gugger authored May 12, 2022

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black

afe5d42d

12 Apr, 2022 1 commit

Change the chunk_iter function to handle (#16730) · a192f61e

Nicolas Patry authored Apr 12, 2022

* Change the chunk_iter function to handle the subtle cases where the last
chunk gets ignored since all the data is in the `left_strided` data.

We need to remove the right striding on the previous item.

* Remove commented line.

a192f61e
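The edge case described in this commit can be illustrated with a minimal, self-contained sketch of strided chunk iteration. This is not the pipeline's `chunk_iter`; the names `iter_chunks` and `keep_unstrided` are hypothetical, and the rule used here (a chunk that reaches the end of the input gets no right stride) is just one way to make sure no trailing samples are dropped.

```python
def iter_chunks(samples, chunk_len, stride_left, stride_right):
    """Yield (chunk, (n, left, right)) windows over `samples`.

    `left` and `right` count the overlapping samples that should be
    discarded after inference (hypothetical helper).
    """
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must be larger than stride_left + stride_right")
    total = len(samples)
    for start in range(0, total, step):
        chunk = samples[start:start + chunk_len]
        # The first chunk has no left context to discard.
        left = 0 if start == 0 else stride_left
        # A chunk that reaches the end of the input is the last one,
        # so its right edge is kept instead of being treated as stride.
        is_last = start + chunk_len >= total
        right = 0 if is_last else stride_right
        yield chunk, (len(chunk), left, right)
        if is_last:
            break


def keep_unstrided(chunk, stride):
    """Drop the strided edges of a chunk."""
    n, left, right = stride
    return chunk[left:n - right]


samples = list(range(10))
kept = []
for chunk, stride in iter_chunks(samples, chunk_len=6, stride_left=2, stride_right=2):
    kept.extend(keep_unstrided(chunk, stride))
print(kept == samples)
```

Stopping at the first chunk that touches the end of the input avoids emitting a final window made up only of left-stride data.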

02 Mar, 2022 1 commit
- Adding timestamps for CTC with LM in ASR pipeline. (#15863) · 6e57a569
  Nicolas Patry authored Mar 02, 2022
```
* Adding timestamps for CTC with LM in ASR pipeline.

* Remove print.

* Nit change.
```
  6e57a569
28 Feb, 2022 1 commit

Fixing the timestamps with chunking. (#15843) · 97f9b8a2

Nicolas Patry authored Feb 28, 2022



* Fixing the timestamps with chunking.

* The changes modified (and fixed) the striding tests.

* Adding a tokenizer test.

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Defense -> comment.

* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

97f9b8a2

25 Feb, 2022 1 commit

Adding the option to return_timestamps on pure CTC ASR models. (#15792) · ad0d7d17

Nicolas Patry authored Feb 25, 2022



* Adding the option to return_timestamps on pure CTC ASR models.

* Remove `math.prod` which was introduced in Python 3.8

* int are not floats.

* Reworking the PR to support "char" vs "word" output.

* Fixup!

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

ad0d7d17
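To make the "char" vs "word" distinction concrete, here is a minimal sketch of deriving timestamps from per-frame CTC predictions. It is not the pipeline's implementation: the helper names, the `_` blank symbol, the `|` word delimiter, and the `seconds_per_frame` value are all assumptions chosen for illustration.

```python
def ctc_char_timestamps(frame_tokens, blank, seconds_per_frame):
    """Collapse repeated frames and blanks into characters with (start, end) times."""
    spans = []  # each entry is [token, first_frame, end_frame_exclusive]
    previous = None
    for index, token in enumerate(frame_tokens):
        if token == previous:
            # Same symbol as the previous frame: extend the current character.
            if token != blank:
                spans[-1][2] = index + 1
        elif token != blank:
            # A new non-blank symbol starts a new character.
            spans.append([token, index, index + 1])
        previous = token
    return [
        {"text": text, "timestamp": (start * seconds_per_frame, end * seconds_per_frame)}
        for text, start, end in spans
    ]


def group_into_words(chars, delimiter="|"):
    """Merge character entries into words, splitting on the delimiter symbol."""
    words, current = [], []
    # A trailing delimiter flushes the final word.
    for char in chars + [{"text": delimiter, "timestamp": None}]:
        if char["text"] == delimiter:
            if current:
                words.append({
                    "text": "".join(c["text"] for c in current),
                    "timestamp": (current[0]["timestamp"][0], current[-1]["timestamp"][1]),
                })
                current = []
        else:
            current.append(char)
    return words


frames = list("hh_i|_yy_o")
chars = ctc_char_timestamps(frames, blank="_", seconds_per_frame=0.5)
print(group_into_words(chars))
```

In a real model the frame duration would follow from the ratio between input samples and output logits together with the sampling rate, rather than being passed in by hand.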

23 Feb, 2022 1 commit

[Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41

Lysandre Debut authored Feb 23, 2022



* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>

29c10a41

15 Feb, 2022 2 commits

Add `decoder_kwargs` to send to LM on asr pipeline. (#15646) · a3dbbc34

Nicolas Patry authored Feb 15, 2022


Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

a3dbbc34

Fix ASR pipelines from local directories with wav2vec models that have... · 9eb7e9ba

Javier de la Rosa authored Feb 15, 2022

Fix ASR pipelines from local directories with wav2vec models that have language models attached (#15590)

* Fix loading pipelines with wav2vec models with lm when in local paths

* Adding tests

* Fix test

* Adding tests

* Flake8 fixes

* Removing conflict files :(

* Adding task type to test

* Remove unnecessary test and imports

9eb7e9ba

07 Feb, 2022 1 commit
- [ASR pipeline] correct asr pipeline for seq2seq models (#15541) · 5f1918a4
  Patrick von Platen authored Feb 07, 2022
  
  5f1918a4
02 Feb, 2022 1 commit

Adding support for `microphone` streaming within pipeline. (#15046) · 623d8cb4

Nicolas Patry authored Feb 02, 2022



* Adding support for `microphone` streaming within pipeline.

- Uses `ffmpeg` to get microphone data.
- Makes sure alignment is made to `size_of_sample`.
- Works by sending `{"raw": ..data.., "stride": (n, left, right),
"partial": bool}`
directly to the pipeline, which makes it possible to stream partial results
and still get inference.
- Lets `partial` information flow through the pipeline so the caller
  can get it back and choose whether to display text.

- The striding reconstitution is bound to have errors since CTC does not
keep previous state. Currently most of the errors come from not knowing
whether there's a space between two chunks.
Since we have some left striding info, we could use that during decoding
to choose what to do with those spaces and even extra letters maybe (if
the stride is long enough, it's bound to cover at least a few symbols)

Fixing tests.

Protecting with `require_torch`.

`raw_ctc` support for nicer demo.

Post rebase fixes.

Revamp to split raw_mic_data from its live chunking.

- Requires a refactor to make everything a bit cleaner.

Automatic resampling.

Small fix.

Small fix.

* Post rebase fix (need to let super handle more logic, reorder args.)

* Update docstrings

* Docstring format.

* Remove print.

* Prevent flow of `input_values`.

* Fixing `stride` too.

* Fixing the PR by removing `raw_ctc`.

* Better docstrings.

* Fixing init.

* Update src/transformers/pipelines/audio_utils.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Quality.
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

623d8cb4
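The payload shape and the sample alignment mentioned in this commit can be sketched without `ffmpeg` or a microphone. The generator below is a hypothetical illustration, not the code in `audio_utils.py`: the name `stream_chunks` is made up, the strides are always zero for simplicity, and `partial=True` is assumed to mean "this chunk is still growing and will be sent again once complete".

```python
def stream_chunks(pieces, chunk_len, size_of_sample):
    """Turn an iterable of raw byte pieces into pipeline-style payloads.

    Every emitted `raw` buffer has a length that is a multiple of
    `size_of_sample`, so no audio sample is ever split in half.
    """
    if chunk_len % size_of_sample != 0:
        raise ValueError("chunk_len must be a multiple of size_of_sample")

    def payload(raw, partial):
        n_samples = len(raw) // size_of_sample
        # Zero left/right stride: a simplification for this sketch.
        return {"raw": raw, "stride": (n_samples, 0, 0), "partial": partial}

    buffer = b""
    for piece in pieces:
        buffer += piece
        # Emit every complete chunk that is now available.
        while len(buffer) >= chunk_len:
            yield payload(buffer[:chunk_len], partial=False)
            buffer = buffer[chunk_len:]
        # Emit the aligned part of the unfinished chunk as a provisional result.
        usable = len(buffer) - len(buffer) % size_of_sample
        if usable:
            yield payload(buffer[:usable], partial=True)
    # End of stream: flush whatever aligned data is left as a final chunk.
    usable = len(buffer) - len(buffer) % size_of_sample
    if usable:
        yield payload(buffer[:usable], partial=False)


for item in stream_chunks([b"\x00" * 5, b"\x01" * 5], chunk_len=8, size_of_sample=2):
    print(len(item["raw"]), item["stride"], item["partial"])
```

A caller can display text from `partial` payloads immediately and overwrite it when the corresponding complete chunk arrives.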

19 Jan, 2022 1 commit

Make chunking smartly (long files) work on asr ctc_with_lm. (#15219) · 3fefee99

Nicolas Patry authored Jan 19, 2022



* [WIP] Make chunking smartly (long files) work on asr ctc_with_lm.

* Slow test with functionality.

* Fixing regular test.

* fix for batch size 1

* Handling batch outside `rescale_Stride`.

- Renamed to `rescale_stride`.

* Disable equality in the test.

* Remove print.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3fefee99

18 Jan, 2022 2 commits
- [ASR pipeline] correct with lm pipeline (#15200) · 497346d0
  Patrick von Platen authored Jan 18, 2022
```
* [ASR pipeline] correct with lm pipeline

* improve error
```
  497346d0
- `is_ctc` needs to be updated to `self.type == "ctc"`. (#15194) · dea563c9
  Nicolas Patry authored Jan 18, 2022
```
* `is_ctc` needs to be updated to `self.type == "ctc"`.

* Adding fast test for this functionality.
```
  dea563c9
12 Jan, 2022 1 commit

Pipeline ASR with LM. (#15071) · 68cc4ccd

Nicolas Patry authored Jan 12, 2022



* Pipeline ASR with LM.

* Revamped into `self.decoder`.

* Fixing.

* 2nd fix.

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fixing.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

68cc4ccd

04 Jan, 2022 1 commit

Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d

Nicolas Patry authored Jan 04, 2022

* Hotfix `chunk_length_s` instead of `_ms`.

* Adding fix of `pad_token` which should be last/previous token for CTC proper decoding

* Fixing ChunkPipeline unwrapping.

* Adding a PackIterator specific test.

19d37c2d

03 Jan, 2022 1 commit

Large audio chunking for the existing ASR pipeline (#14896) · 38f95d18

Anton Lozhkov authored Jan 03, 2022



* Naive ASR chunking

* Fixing batching for ASR.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

38f95d18

30 Dec, 2021 1 commit
- [Generate] correct encoder_outputs are passed without attention_mask (#14980) · c043ce6c
  Patrick von Platen authored Dec 30, 2021
```
* [Generate] correct encoder_outputs are passed without attention_mask

* Apply suggestions from code review

* up
```
  c043ce6c
27 Dec, 2021 1 commit

ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines). (#14225) · b058490c

Nicolas Patry authored Dec 27, 2021



* Pipeline chunks.

* Batching for Chunking pipelines ?

* Batching for `question-answering` and `zero-shot-cls`.

* Fixing for FNet.

* Making ASR a chunk pipeline.

* Chunking ASR API.

* doc style.

* Fixing ASR test.

* Fixing QA error (p_mask, padding is 1, not 0).

* Enable both vad and simple chunking.

* Max length for vad.

* remove inference mode, crashing on s2t.

* Revert ChunkPipeline for ASRpipeline.

Too many knobs for simple integration within the pipeline, better stick
to external convenience functions instead, more control to be had,
simpler pipeline and also easier to replace with other things later.

* Drop necessity for PT for these.

* Enabling generators.

* Add mic + cleanup.

* Typo.

* Typo2.

* Remove ASR work, it does not belong in this PR anymore.

* Update src/transformers/pipelines/pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding many comments.

* Doc quality.

* `hidden_states` handling.

* Adding doc.

* Bad rebase.

* Autofixing docs.

* Fixing CRITICAL bug in the new Zerocls pipeline.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

b058490c
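The general shape of a chunked pipeline, where one input fans out into several chunks that go through the model and are regrouped before postprocessing, can be sketched in plain Python. Everything here is an assumption for illustration (the `is_last` marker, the function names, and the uppercase "model"); it is not the code in `pt_utils.py`.

```python
def preprocess(text, chunk_size):
    """Split one input into chunks and flag the final chunk of that input."""
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)] or [""]
    for index, piece in enumerate(pieces):
        yield {"chunk": piece, "is_last": index == len(pieces) - 1}


def forward(item):
    """Stand-in for the model call: works on a single chunk."""
    return {"output": item["chunk"].upper(), "is_last": item["is_last"]}


def pack(items):
    """Regroup a flat stream of chunk outputs into one list per original input."""
    group = []
    for item in items:
        group.append(item["output"])
        if item["is_last"]:
            yield group
            group = []


def run(texts, chunk_size):
    # Chunks from all inputs travel through `forward` as one flat stream,
    # which is what would let them be batched regardless of their origin.
    flat = (forward(item) for text in texts for item in preprocess(text, chunk_size))
    return ["".join(group) for group in pack(flat)]


print(run(["abcdefg", "hi"], chunk_size=3))
```

Carrying the end-of-input flag alongside each chunk means the regrouping step does not need to know in advance how many chunks an input produced.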

16 Dec, 2021 1 commit
- Remove datasets requirement (#14795) · d194d639
  Lysandre Debut authored Dec 16, 2021
  
  d194d639
17 Nov, 2021 1 commit

[Wav2Vec2] Add New Wav2Vec2 Translation (#14392) · 700a748f

Patrick von Platen authored Nov 17, 2021

* add new wav2vec2 translation

* correct

* up

* add tests

* correct end copy

* correct more

* up

* correct unispeech sat

* finish

* finalize

* finish

* up

700a748f

29 Oct, 2021 1 commit

Adding `batch_size` support for (almost) all pipelines (#13724) · be236361

Nicolas Patry authored Oct 29, 2021



* Tentative enabling of `batch_size` for pipelines.

* Add systematic test for pipeline batching.

* Enabling batch_size on almost all pipelines

- Not `zero-shot` (it's already passing stuff as batched so trickier)
- Not `QA` (preprocess uses squad features, we need to switch to real
tensors at this boundary).

* Adding `min_length_for_response` for conversational.

* Making CTC, speech mappings available regardless of framework.

* Attempt at fixing automatic tests (ffmpeg not enabled for fast tests)

* Removing ffmpeg dependency in tests.

* Small fixes.

* Slight cleanup.

* Adding docs and addressing comments.

* Quality.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Improving docs.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

* N -> observed_batch_size

softmax trick.

* Follow `padding_side`.

* Supporting image pipeline batching (and padding).

* Rename `unbatch` -> `loader_batch`.

* unbatch_size forgot.

* Custom padding for offset mappings.

* Attempt to remove librosa.

* Adding require_audio.

* torchaudio.

* Back to using datasets librosa.

* Adding help to set a pad_token on the tokenizer.

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Quality.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

be236361
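Two ingredients this commit touches on, grouping inputs by `batch_size` and padding each batch according to `padding_side`, can be sketched as follows. This is an illustrative outline only; the helper names `batched` and `pad_batch` are hypothetical, and the real pipelines operate on tensors and use the tokenizer's pad token rather than plain lists.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to `batch_size` items; the last batch may be smaller."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def pad_batch(sequences, pad_value, padding_side="right"):
    """Pad every sequence to the length of the longest one in the batch."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    longest = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        filler = [pad_value] * (longest - len(seq))
        padded.append(filler + list(seq) if padding_side == "left" else list(seq) + filler)
    return padded


token_ids = [[1, 2, 3], [4], [5, 6]]
for batch in batched(token_ids, batch_size=2):
    print(pad_batch(batch, pad_value=0, padding_side="left"))
```

Padding per batch, not to a global maximum, keeps the amount of wasted computation tied to the longest item in each batch.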

14 Oct, 2021 1 commit
- up (#14008) · 7fb2a8b3
  Patrick von Platen authored Oct 14, 2021
  
  7fb2a8b3
21 Sep, 2021 2 commits

[SequenceFeatureExtractor] Rewrite padding logic from pure python to numpy (#13650) · 1417978c

Anton Lozhkov authored Sep 21, 2021

* Test np padding

* Pass feature extraction tests

* Update type hints

* Fix flaky integration tests

* Try a more stable waveform

* Add to_numpy jax support

* int32 attention masks

* Refactor normalization tests

1417978c

Add Speech AutoModels (#13655) · 48fa42e5
Patrick von Platen authored Sep 21, 2021
```
* upload

* correct

* correct

* correct

* finish

* up

* up

* up again
```
48fa42e5

01 Sep, 2021 1 commit

Add SpeechEncoderDecoder & Speech2Text2 (#13186) · 0b8c84e1

Patrick von Platen authored Sep 01, 2021



* fix_torch_device_generate_test

* remove @

* up

* correct some bugs

* correct model

* finish speech2text extension

* up

* up

* up

* up

* Update utils/custom_init_isort.py

* up

* up

* update with tokenizer

* correct old tok

* correct old tok

* fix bug

* up

* up

* add more tests

* up

* fix docs

* up

* fix some more tests

* add better config

* correct some more things

* fix tests

* improve docs

* Apply suggestions from code review

* Apply suggestions from code review

* final fixes

* finalize

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* apply suggestions Lysandre and Sylvain

* apply nicos suggestions

* upload everything

* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

0b8c84e1

07 Jul, 2021 1 commit

Adding support for `pipeline("automatic-speech-recognition")`. (#11525) · ebc69afc

Nicolas Patry authored Jul 07, 2021

* Adding support for `pipeline("automatic-speech-recognition")`.

- Ugly `"config"` choice for AutoModel. It would be great to have the
possibility to have something like `AutoModelFor` that would implement
the same logic (Load the config, check Architectures and load the first
one)

* Remove `model_id`, which was not needed in the end.

* Rebased !

* Remove old code.

* Rename `nlp`.

ebc69afc

30 Apr, 2021 1 commit

Adding `AutomaticSpeechRecognitionPipeline`. (#11337) · db9dd09c

Nicolas Patry authored Apr 30, 2021



* Adding `AutomaticSpeechRecognitionPipeline`.

- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
  sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not designed to support streaming audio right now.

Future work:

- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.

* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding docs + main import + type checking + LICENSE.

* Doc style !.

* Fixing TYPE_HINT.

* Specifying waveform shape in the docs.

* Adding asserts + specify in the documentation the shape of the input
np.ndarray.

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adding require to tests + move the `feature_extractor` doc.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

db9dd09c