Commits · 365313edd2658e5d048d97fad04c7729deb9815b · OpenDAS / Torchaudio

26 Feb, 2022 1 commit

Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The interface changes are:
1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
2. Make the `fill_buffer` method private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
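
The retry behaviour described above can be illustrated with a small sketch. This is a Python illustration only, not the C++ implementation from this commit; `retry_on_eagain` and `FlakyDevice` are hypothetical names, and the exact semantics of `timeout` and `backoff` in torchaudio are assumed here (total wait budget in seconds, and sleep between attempts).

```python
import errno
import time


def retry_on_eagain(operation, timeout=None, backoff=0.01):
    """Call `operation` until it stops raising EAGAIN or `timeout` seconds pass.

    Assumed semantics: timeout=None retries indefinitely, and `backoff`
    is the number of seconds to sleep between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return operation()
        except OSError as err:
            if err.errno != errno.EAGAIN:
                raise  # unrelated failure: do not retry
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("device buffer was not ready in time") from err
            time.sleep(backoff)


class FlakyDevice:
    """Hypothetical stand-in for a device that is busy for the first few reads."""

    def __init__(self, busy_reads):
        self.busy_reads = busy_reads

    def read(self):
        if self.busy_reads > 0:
            self.busy_reads -= 1
            raise OSError(errno.EAGAIN, "resource temporarily unavailable")
        return b"packet"


device = FlakyDevice(busy_reads=3)
packet = retry_on_eagain(device.read, timeout=1.0, backoff=0.001)
```

Keeping the loop inside one blocking call means the caller does not have to catch and re-issue the request itself each time the buffer is not ready.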

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59


24 Feb, 2022 1 commit

Fix style check (#2258) · 20488dd8

Caroline Chen authored Feb 24, 2022

Summary:
Fix a style check failure from an internal diff.

Pull Request resolved: https://github.com/pytorch/audio/pull/2258

Reviewed By: nateanl

Differential Revision: D34459526

Pulled By: carolineechen

fbshipit-source-id: d0e6782b5689c3bf63214a4ec6a75dd757678e0d


23 Feb, 2022 1 commit

[lightning] Replace deprecated DDP accelerator with ddp_find_unused_parameters_false · 1fb10077

Binh Tang authored Feb 23, 2022

Summary: We proactively remove references to the deprecated DDP accelerator to prepare for the breaking changes following the release of PyTorch Lightning 1.6 (see T112240890).

Differential Revision: D34295318

fbshipit-source-id: 7b2245ca9c7c2900f510722b33af8d8eeda49919


17 Feb, 2022 2 commits

Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15

Zhaoheng Ni authored Feb 17, 2022

Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
- Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2240

Reviewed By: mthrok

Differential Revision: D34285195

Pulled By: nateanl

fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f


Update online ASR tutorial (#2226) · c5c4bbfd

moto authored Feb 16, 2022

Summary:
https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

1. Add figure to explain the caching
2. Fix the initialization of stream iterator

Pull Request resolved: https://github.com/pytorch/audio/pull/2226

Reviewed By: carolineechen

Differential Revision: D34265971

Pulled By: mthrok

fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4


16 Feb, 2022 6 commits

Add EMFORMER_RNNT_BASE_MUSTC into pipeline demo script (#2248) · 38569ef0

Zhaoheng Ni authored Feb 16, 2022

Summary:
This PR adds `EMFORMER_RNNT_BASE_MUSTC` support in `pipeline_demo.py`. The bundle is trained on the MuST-C release 2.0 dataset. The model preserves the casing and punctuation in the transcript.

Here is a screen recording of how it works in streaming and non-streaming modes:

https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2248

Reviewed By: hwangjeff

Differential Revision: D34282598

Pulled By: nateanl

fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4


Refactor pipeline_demo script in emformer_rnnt recipes (#2239) · fdea0a7c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Use a dictionary to select the `RNNTBundle` and the corresponding dataset.
- Use the dictionary's keys as the choices in ArgumentParser.
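
The two points above can be sketched as follows. This is an illustration under assumptions, not the code in `pipeline_demo.py`: the `Config` dataclass, the `_CONFIGS` mapping, the `--model-type` flag, and the string placeholders standing in for bundles and datasets are all hypothetical.

```python
from argparse import ArgumentParser
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical pairing of a bundle name with its dataset name."""

    bundle: str
    dataset: str


# Keys are the user-facing model types; values carry what the script needs.
_CONFIGS = {
    "librispeech": Config("EMFORMER_RNNT_BASE_LIBRISPEECH", "LIBRISPEECH"),
    "tedlium3": Config("EMFORMER_RNNT_BASE_TEDLIUM3", "TEDLIUM"),
}


def parse_args(argv=None):
    parser = ArgumentParser()
    # The dictionary keys double as the list of valid choices.
    parser.add_argument("--model-type", type=str, choices=list(_CONFIGS), required=True)
    return parser.parse_args(argv)


args = parse_args(["--model-type", "tedlium3"])
config = _CONFIGS[args.model_type]
```

With this layout, supporting another bundle only requires a new dictionary entry; the command-line choices follow automatically.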

Pull Request resolved: https://github.com/pytorch/audio/pull/2239

Reviewed By: mthrok

Differential Revision: D34267070

Pulled By: nateanl

fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220


Refactor eval and pipeline_demo scripts in emformer_rnnt (#2238) · e3b40d1c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Add docstring to `eval.py` and `pipeline_demo.py` under `emformer_rnnt` directory.
- Refactor logger and ArgumentParser

Pull Request resolved: https://github.com/pytorch/audio/pull/2238

Reviewed By: mthrok

Differential Revision: D34267059

Pulled By: nateanl

fbshipit-source-id: 4b8d3d183ee7bc0ad71ce305cab87bfa90208b2e


Fix lm used for ctc decoder example (#2235) · c2decba4

Caroline Chen authored Feb 16, 2022

Summary:
The LM in the example script was unintentionally changed to None when no-LM support was added. This change restores it, which is consistent with the WERs listed in the readme.

Pull Request resolved: https://github.com/pytorch/audio/pull/2235

Reviewed By: nateanl

Differential Revision: D34273042

Pulled By: carolineechen

fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812


Add shebang lines to scripts in emformer_rnnt recipes (#2237) · aac83fe5

Zhaoheng Ni authored Feb 16, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237

Reviewed By: mthrok

Differential Revision: D34267000

Pulled By: nateanl

fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d


Refactor ArgumentParser arguments in emformer_rnnt recipes (#2236) · 81f56f64

Zhaoheng Ni authored Feb 16, 2022

Summary:
Replace underscore with dash in ArgumentParser's arguments.
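
As a brief illustration of what this rename means for callers, `argparse` converts dashes in a long option name to underscores when it derives the attribute name, so code reading the parsed values does not change. The flag names below are hypothetical examples, not necessarily those used in the recipes.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# Dashed flags on the command line (hypothetical names for illustration).
parser.add_argument("--global-stats-path", type=str, default="global_stats.json")
parser.add_argument("--sp-model-path", type=str, required=True)

args = parser.parse_args(["--sp-model-path", "spm.model"])

# Attributes still use underscores.
sp_model_path = args.sp_model_path
global_stats_path = args.global_stats_path
```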

Pull Request resolved: https://github.com/pytorch/audio/pull/2236

Reviewed By: mthrok

Differential Revision: D34266977

Pulled By: nateanl

fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21


15 Feb, 2022 1 commit

Update context building to not delay the inference (#2213) · 8e3c6144

moto authored Feb 14, 2022

Summary:
Updates the context cacher so that each fetched audio chunk is used for inference immediately.
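
One way to read this change is sketched below. It is an assumption about the general idea, not the tutorial's code: the cacher keeps the tail of the previous chunk and prepends it to the newly fetched chunk, so the new chunk is part of the very next model input instead of waiting for a later one. Plain Python lists stand in for audio tensors, and the class and parameter names are illustrative.

```python
class ContextCacher:
    """Prepend cached context from the previous chunk to each new chunk."""

    def __init__(self, segment_length, context_length):
        self.segment_length = segment_length
        self.context_length = context_length
        self.context = [0.0] * context_length  # silence before the first chunk

    def __call__(self, chunk):
        chunk = list(chunk)
        # Zero-pad a short (typically final) chunk to the full segment length.
        chunk += [0.0] * (self.segment_length - len(chunk))
        chunk_with_context = self.context + chunk
        # Keep the tail of this chunk as context for the next call.
        self.context = chunk[len(chunk) - self.context_length:]
        return chunk_with_context


cacher = ContextCacher(segment_length=4, context_length=2)
first = cacher([1.0, 2.0, 3.0, 4.0])
second = cacher([5.0, 6.0, 7.0, 8.0])
third = cacher([9.0])
```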

https://github.com/pytorch/audio/pull/2202#discussion_r802838174

Pull Request resolved: https://github.com/pytorch/audio/pull/2213

Reviewed By: hwangjeff

Differential Revision: D34235230

Pulled By: mthrok

fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e


11 Feb, 2022 5 commits

Add training recipe for Emformer RNNT trained on MuST-C release v2.0 dataset (#2219) · 4d0095a5

nateanl authored Feb 11, 2022

Summary:
- Add a MUSTC dataset under examples
- Add a lightning module for MuST-C dataset
- Refactor `train.py`, `eval.py`, and `global_stats.py` scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2219

Reviewed By: hwangjeff

Differential Revision: D34180466

Pulled By: nateanl

fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164


Add SentencePiece model training script for LibriSpeech Emformer RNN-T (#2218) · 825a5976

hwangjeff authored Feb 11, 2022

Summary:
Adds SentencePiece model training script for LibriSpeech Emformer RNN-T example recipe; updates readme with references.

Pull Request resolved: https://github.com/pytorch/audio/pull/2218

Reviewed By: nateanl

Differential Revision: D34177295

Pulled By: hwangjeff

fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43


Refactor pipeline_demo.py to support variant EMFORMER_RNNT bundles (#2203) · 16d02a9e

nateanl authored Feb 11, 2022

Summary:
We refactored the demo script so that it can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3`, in both streaming and non-streaming modes. (The first hypothesis prediction is streaming and the second one is non-streaming.)

We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2203

Reviewed By: carolineechen

Differential Revision: D34006388

Pulled By: nateanl

fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632


Add unit tests for Emformer RNN-T LibriSpeech recipe (#2216) · bbdbd582

hwangjeff authored Feb 11, 2022

Summary:
Adds unit tests for Emformer RNN-T LibriSpeech recipe. Also makes changes to recipe to resolve errors with pickling lambda functions in Windows.
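
The pickling problem mentioned above can be reproduced in isolation. The sketch below is not taken from the recipe; it only shows the general pattern of swapping a lambda for `functools.partial`, using `operator.mul` as a stand-in for whatever function the recipe wraps.

```python
import operator
import pickle
from functools import partial

# Two callables with the same behaviour: double the input.
double_lambda = lambda value: operator.mul(2, value)
double_partial = partial(operator.mul, 2)

# pickle stores functions by qualified name, and a lambda has no importable name.
try:
    pickle.dumps(double_lambda)
    lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    lambda_pickles = False

# A partial over an importable function round-trips without trouble.
restored = pickle.loads(pickle.dumps(double_partial))
result = restored(21)
```

This matters when worker processes are started with the spawn method, which is the only option on Windows, because objects handed to the workers have to be pickled.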

Pull Request resolved: https://github.com/pytorch/audio/pull/2216

Reviewed By: nateanl

Differential Revision: D34171480

Pulled By: hwangjeff

fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e


Fix bugs from Emformer RNN-T recipes merge (#2217) · 2b991225

hwangjeff authored Feb 11, 2022

Summary:
- Removes 100-batch truncation in TEDLIUM3 recipe.
- Reinstates `train_spm.py` for TEDLIUM3.

Pull Request resolved: https://github.com/pytorch/audio/pull/2217

Reviewed By: nateanl

Differential Revision: D34171525

Pulled By: hwangjeff

fbshipit-source-id: 54698e5e1b094c26c28eec9b8b1722223077876c


10 Feb, 2022 1 commit

Refactor Emformer RNNT recipes (#2212) · 33bcb7b0

hwangjeff authored Feb 09, 2022

Summary:
Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2212

Reviewed By: mthrok

Differential Revision: D34120104

Pulled By: hwangjeff

fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456


09 Feb, 2022 1 commit

Fix librosa calls (#2208) · e5d567c9

hwangjeff authored Feb 08, 2022

Summary:
Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2208

Reviewed By: mthrok

Differential Revision: D34099793

Pulled By: hwangjeff

fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc


04 Feb, 2022 1 commit

Add RNNTBundle with weights pre-trained on tedlium3 dataset (#2177) · a1dc9e0a

Zhaoheng Ni authored Feb 04, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177

Reviewed By: hwangjeff

Differential Revision: D33893052

Pulled By: nateanl

fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7


03 Feb, 2022 3 commits

Fix TimeMasking argument in TED-LIUM Emformer recipe (#2199) · b986e9ef

Zhaoheng Ni authored Feb 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2199

Reviewed By: hwangjeff

Differential Revision: D33979923

Pulled By: nateanl

fbshipit-source-id: 566ba1944dd3511fee740ac17fea2dcb0e5810fa


Add training recipe of Emformer trained on TED-LIUM release 3 dataset (#2195) · 8f68b3f0

Zhaoheng Ni authored Feb 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2195

Reviewed By: hwangjeff

Differential Revision: D33950179

Pulled By: nateanl

fbshipit-source-id: 5fcfa4f433fffdcbb3b8e97f7c90fb8f723a30a2


Add tutorials with streaming API (#2193) · c00f65da

moto authored Feb 03, 2022

Summary:
* tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193

Reviewed By: hwangjeff

Differential Revision: D33971312

Pulled By: mthrok

fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f


02 Feb, 2022 2 commits

Add timesteps visualization to CTC decoder tutorial (#2188) · 94f4ef0f

Caroline Chen authored Feb 02, 2022

Summary:
resulting tutorial: https://538358-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html
- add visualization for timestep alignments
- modify section organization for decoder construction

Pull Request resolved: https://github.com/pytorch/audio/pull/2188

Reviewed By: mthrok

Differential Revision: D33954937

Pulled By: carolineechen

fbshipit-source-id: 8f397229d74c994b8793a30623e1de4c19ebd401


Revise RNN-T pipeline streaming decoding logic (#2192) · 612de66b

hwangjeff authored Feb 01, 2022

Summary:
Rather than apply SentencePiece's `decode` to directly convert each hypothesis's token id sequence to an output string, we convert each token id sequence to word pieces and then manually join the word pieces ourselves. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.
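
The effect can be shown with a toy example that does not require SentencePiece. It assumes the usual convention that a piece starting a new word carries a leading U+2581 marker; `decode_stripped` is a simplified stand-in for a decoder that drops leading whitespace, not the library's `decode`, and the pieces are made up for illustration.

```python
WORD_BOUNDARY = "\u2581"  # marker placed on a piece that starts a new word


def join_pieces(pieces):
    """Concatenate word pieces, turning boundary markers into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ")


def decode_stripped(pieces):
    """Simplified stand-in for a decoder that drops leading whitespace."""
    return join_pieces(pieces).lstrip()


# Hypothetical streaming output: "world" is split across two invocations.
invocations = [["\u2581hel", "lo"], ["\u2581wor"], ["ld"]]

naive = "".join(decode_stripped(pieces) for pieces in invocations)
manual = "".join(join_pieces(pieces) for pieces in invocations)
```

Here `naive` loses the break between the two words, while `manual` keeps the leading space of each invocation, so concatenating the outputs yields correctly separated words.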

https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov

Versus the previous behavior visualized in https://github.com/pytorch/audio/issues/2093, the scheme here properly constructs words comprising multiple pieces.

Pull Request resolved: https://github.com/pytorch/audio/pull/2192

Reviewed By: mthrok

Differential Revision: D33936622

Pulled By: hwangjeff

fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9


01 Feb, 2022 3 commits

Update stale prototype references (#2189) · 1a0935c6

hwangjeff authored Feb 01, 2022

Summary:
Missed a couple of spots in https://github.com/pytorch/audio/issues/2187.

Pull Request resolved: https://github.com/pytorch/audio/pull/2189

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D33926342

Pulled By: hwangjeff

fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06


Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c


Add global stats script and new json for LibriSpeech RNN-T training recipe (#2183) · 157cb2a2

hwangjeff authored Jan 31, 2022

Summary:
Adds script for generating global feature statistics along with new feature statistics json for LibriSpeech RNN-T training recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/2183

Reviewed By: mthrok

Differential Revision: D33902377

Pulled By: hwangjeff

fbshipit-source-id: ec347a685ae67aefc485084aac6ed2efd653250f


31 Jan, 2022 1 commit

Use download.pytorch.org for asset URL (#2182) · f654b2c9

moto authored Jan 31, 2022

Summary:
Changes the URL of tutorial assets to `download.pytorch.org`, which is more appropriate for user-facing materials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2182

Reviewed By: nateanl

Differential Revision: D33887839

Pulled By: mthrok

fbshipit-source-id: 30569672e8caf30aae5476036dfdadc8ebd436bf


27 Jan, 2022 2 commits

Add no lm support for CTC decoder (#2174) · 4c3fa875

Caroline Chen authored Jan 27, 2022

Summary:
Adds support for the CTC lexicon decoder without an LM by adding a non-language model, `ZeroLM`, that returns a score of 0 for everything. Generalizes the decoder class/API a bit to support this, adding it as an option for the kenlm decoder for now (it will likely be separated from kenlm when support for other kinds of LMs is added).
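
The idea of a language model that contributes nothing can be sketched in a few lines. The `start`/`score`/`finish` interface and the `hypothesis_score` helper below are assumptions made for illustration; they are not the decoder's actual classes or signatures.

```python
class ZeroLM:
    """Stand-in language model that scores every token as 0."""

    def start(self):
        return ()  # empty history

    def score(self, state, token):
        # Extend the history, but add nothing to the score.
        return state + (token,), 0.0

    def finish(self, state):
        return state, 0.0


def hypothesis_score(lm, tokens, emission_scores, lm_weight=2.0):
    """Sum emission scores plus weighted LM scores for one hypothesis."""
    state = lm.start()
    total = 0.0
    for token, emission in zip(tokens, emission_scores):
        state, lm_score = lm.score(state, token)
        total += emission + lm_weight * lm_score
    _, final_score = lm.finish(state)
    return total + lm_weight * final_score


score = hypothesis_score(ZeroLM(), ["a", "b", "c"], [-0.5, -1.25, -0.25])
```

Because every LM score is 0, the total depends only on the emission scores regardless of `lm_weight`, which lets the same decoding loop run with or without a real LM.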

Pull Request resolved: https://github.com/pytorch/audio/pull/2174

Reviewed By: hwangjeff, nateanl

Differential Revision: D33798674

Pulled By: carolineechen

fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde


Refactor RNNT factory function to support num_symbols argument (#2178) · 2cb87c6b

Zhaoheng Ni authored Jan 26, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178

Reviewed By: mthrok

Differential Revision: D33797649

Pulled By: nateanl

fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603


26 Jan, 2022 1 commit

Add beam search description to tutorial (#2173) · bcf04839

Caroline Chen authored Jan 26, 2022

Summary:
Following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465, this adds a brief beam search description and links to resources.

Pull Request resolved: https://github.com/pytorch/audio/pull/2173

Reviewed By: nateanl

Differential Revision: D33791731

Pulled By: carolineechen

fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb


22 Jan, 2022 1 commit

[Example] Refactor BucketizeBatchSampler and HuBERTDataset (#2150) · 576b02b1

Zhaoheng Ni authored Jan 22, 2022

Summary:
- Rename `BucketizeSampler` to `BucketizeBatchSampler`
- Fix bugs in `BucketizeBatchSampler`
- Adjust HuBERTDataset based on the latest `BucketizeBatchSampler`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2150

Reviewed By: mthrok

Differential Revision: D33689963

Pulled By: nateanl

fbshipit-source-id: 203764e9af5b7577ba08ebaa30ba5da3b67fb7e7


20 Jan, 2022 1 commit

Remove multiprocessing from audio dataset tutorial (#2163) · dc4f76fd

yonMaor authored Jan 20, 2022

Summary:
Closes https://github.com/pytorch/audio/issues/2162

Pull Request resolved: https://github.com/pytorch/audio/pull/2163

Reviewed By: nateanl

Differential Revision: D33666354

Pulled By: mthrok

fbshipit-source-id: 3e7a963b9ac85046317df8d5dab91af363e5668b


18 Jan, 2022 1 commit

Add more CTC decoding WERs (#2161) · 7a83f84f

Caroline Chen authored Jan 18, 2022

Summary:
Adds decoding results for wav2vec2 large, as well as results on the test-clean dataset.

Pull Request resolved: https://github.com/pytorch/audio/pull/2161

Reviewed By: mthrok

Differential Revision: D33644670

Pulled By: carolineechen

fbshipit-source-id: a219a15af46f82a6bd90169bb3001dbad8f0a96e


08 Jan, 2022 1 commit

[PyTorchLightning/pytorch-lightning] Add deprecation path for renamed training... · 7b6b2d00

Binh Tang authored Jan 08, 2022

[PyTorchLightning/pytorch-lightning] Add deprecation path for renamed training type plugins (#11227)

Summary:
### New commit log messages
  4eede7c30 Add deprecation path for renamed training type plugins (#11227)

Reviewed By: edward-io, daniellepintz

Differential Revision: D33409991

fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0


07 Jan, 2022 1 commit

Add parameter usage to CTC inference tutorial (#2141) · ffbfe74a

Caroline Chen authored Jan 06, 2022

Summary:
Add an explanation and demonstration of the different beam search decoder parameters.
Additionally, use a better sample audio file and load tokens from a token list instead of a tokens file.

Pull Request resolved: https://github.com/pytorch/audio/pull/2141

Reviewed By: mthrok

Differential Revision: D33463230

Pulled By: carolineechen

fbshipit-source-id: d3dd6452b03d4fc2e095d778189c66f7161e4c68


06 Jan, 2022 2 commits

[Example] abstracts BucketizeSampler to be usable outside of HuBERT example. (#2147) · 8c16529b

Elijah Rippeth authored Jan 06, 2022

Summary:
This PR:

- Replaces the `data_source` with `lengths`
- Adds a `shuffle` argument to decide whether to shuffle the samples in the buckets.
- Adds `max_len` and `min_len` to filter out samples that are longer than `max_len` or shorter than `min_len`.
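
The list above can be illustrated with a compact sketch that works from `lengths` alone. It is not the sampler from the HuBERT example: the function name, the equal-width bucketing rule, and the `num_buckets`, `batch_size`, and `seed` parameters are assumptions chosen to keep the example small.

```python
import random


def bucketize_batches(lengths, num_buckets, batch_size,
                      min_len=0, max_len=None, shuffle=True, seed=0):
    """Group sample indices into batches of similar length."""
    # Drop samples shorter than min_len or longer than max_len.
    indices = [i for i, n in enumerate(lengths)
               if n >= min_len and (max_len is None or n <= max_len)]
    if not indices:
        return []

    # Assign each remaining sample to one of num_buckets equal-width length ranges.
    lo = min(lengths[i] for i in indices)
    hi = max(lengths[i] for i in indices)
    width = max((hi - lo) / num_buckets, 1e-9)
    buckets = [[] for _ in range(num_buckets)]
    for i in indices:
        bucket_id = min(int((lengths[i] - lo) / width), num_buckets - 1)
        buckets[bucket_id].append(i)

    # Optionally shuffle within each bucket, then cut buckets into batches.
    rng = random.Random(seed)
    batches = []
    for bucket in buckets:
        if shuffle:
            rng.shuffle(bucket)
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])
    if shuffle:
        rng.shuffle(batches)
    return batches


lengths = [5, 100, 12, 98, 7, 500, 11, 95]
batches = bucketize_batches(lengths, num_buckets=2, batch_size=2,
                            min_len=6, max_len=200, shuffle=False)
```

Taking `lengths` rather than a dataset object is what makes a sampler like this reusable: any caller that can list its sample lengths can use it.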

cc nateanl

Pull Request resolved: https://github.com/pytorch/audio/pull/2147

Reviewed By: carolineechen

Differential Revision: D33454369

Pulled By: nateanl

fbshipit-source-id: 3835169ec7f808f8dd9650e7f183f79091efe886


[PyTorchLightning/pytorch-lightning] Rename `DDPPlugin` to `DDPStrategy` (#11142) · e4f508a3

Binh Tang authored Jan 05, 2022

Summary:
### New commit log messages
  b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142)

Reviewed By: jjenniferdai

Differential Revision: D33259306

fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0


05 Jan, 2022 1 commit

Add librispeech inference script (#2130) · 5c4c61b2

Caroline Chen authored Jan 04, 2022

Summary:
Add a script for running the CTC beam search decoder on the LibriSpeech dataset with torchaudio's pretrained wav2vec2 models.

Pull Request resolved: https://github.com/pytorch/audio/pull/2130

Reviewed By: mthrok

Differential Revision: D33419436

Pulled By: carolineechen

fbshipit-source-id: 0a0d00f4c17ecdbb497c9eda78673aa939d73c57
