Commits · d62875cc67f0ecae75c6edeffa1c74178308e034 · OpenDAS / Torchaudio

15 May, 2022 1 commit

[codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting

Reviewed By: lisroach

Differential Revision: D36402214

fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c

d62875cc

12 May, 2022 1 commit

[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc

595dc5d3

11 May, 2022 1 commit

Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5

hwangjeff authored May 10, 2022

Summary:
Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
- Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
- Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
- Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
- Modifies training script to allow for specifying path model checkpoint to restart training from.

Pull Request resolved: https://github.com/pytorch/audio/pull/2366

Reviewed By: mthrok

Differential Revision: D36305028

Pulled By: hwangjeff

fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba

69467ea5

21 Apr, 2022 1 commit

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5

6b242c29

13 Apr, 2022 1 commit

Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b

hwangjeff authored Apr 13, 2022

Summary:
Adds Conformer RNN-T LibriSpeech training recipe to examples directory.

Produces 30M-parameter model that achieves the following WER:

|                     |          WER |
|:-------------------:|-------------:|
| test-clean          |       0.0310 |
| test-other          |       0.0805 |
| dev-clean           |       0.0314 |
| dev-other           |       0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329

Reviewed By: xiaohui-zhang

Differential Revision: D35578727

Pulled By: hwangjeff

fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d

c262758b

04 Apr, 2022 2 commits

Use pretrained LM API for decoder example (#2317) · 66185e00

Caroline Chen authored Apr 04, 2022

Summary:
update example ASR pipeline to use the recently added pretrained LM API for decoding

Pull Request resolved: https://github.com/pytorch/audio/pull/2317

Reviewed By: mthrok

Differential Revision: D35361354

Pulled By: carolineechen

fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46

66185e00

Fix arguments in CTC decoding script (#2315) · 4a749e2d

Zhaoheng Ni authored Apr 04, 2022

Summary:
Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser.

Pull Request resolved: https://github.com/pytorch/audio/pull/2315

Reviewed By: carolineechen

Differential Revision: D35357678

Pulled By: nateanl

fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92

4a749e2d

17 Feb, 2022 1 commit

Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15

Zhaoheng Ni authored Feb 17, 2022

Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
- Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2240

Reviewed By: mthrok

Differential Revision: D34285195

Pulled By: nateanl

fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f

b5d77b15

16 Feb, 2022 6 commits

Add EMFORMER_RNNT_BASE_MUSTC into pipeline demo script (#2248) · 38569ef0

Zhaoheng Ni authored Feb 16, 2022

Summary:
This PR adds ``EMFORMER_RNNT_BASE_MUSTC`` support in `pipeline_demo.py`. The bundle is trained on MuST-C release 2.0 dataset. The model preserves the casing and punctuations in the transcript.

Here is a screen recording of how it works in streaming and non-streaming modes:

https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2248

Reviewed By: hwangjeff

Differential Revision: D34282598

Pulled By: nateanl

fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4

38569ef0

Refactor pipeline_demo script in emformer_rnnt recipes (#2239) · fdea0a7c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Use dictionary to select the `RNNTBundle` and the corresponding dataset.
- Use the dictionary's keys as choices in ArgumentParser

Pull Request resolved: https://github.com/pytorch/audio/pull/2239

Reviewed By: mthrok

Differential Revision: D34267070

Pulled By: nateanl

fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220

fdea0a7c

Refactor eval and pipeline_demo scripts in emformer_rnnt (#2238) · e3b40d1c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Add docstring to `eval.py` and `pipeline_demo.py` under `emformer_rnnt` directory.
- Refactor logger and ArgumentParser

Pull Request resolved: https://github.com/pytorch/audio/pull/2238

Reviewed By: mthrok

Differential Revision: D34267059

Pulled By: nateanl

fbshipit-source-id: 4b8d3d183ee7bc0ad71ce305cab87bfa90208b2e

e3b40d1c

Fix lm used for ctc decoder example (#2235) · c2decba4

Caroline Chen authored Feb 16, 2022

Summary:
LM in example script was unintentionally changed to None when adding no LM support previously. this changes it back and is consistent with the WERs listed in the readme

Pull Request resolved: https://github.com/pytorch/audio/pull/2235

Reviewed By: nateanl

Differential Revision: D34273042

Pulled By: carolineechen

fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812

c2decba4

Add shebang lines to scripts in emformer_rnnt recipes (#2237) · aac83fe5

Zhaoheng Ni authored Feb 16, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237

Reviewed By: mthrok

Differential Revision: D34267000

Pulled By: nateanl

fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d

aac83fe5

Refactor ArgumentParser arguments in emformer_rnnt recipes (#2236) · 81f56f64

Zhaoheng Ni authored Feb 16, 2022

Summary:
Replace underscore with dash in ArgumentParser's arguments.

Pull Request resolved: https://github.com/pytorch/audio/pull/2236

Reviewed By: mthrok

Differential Revision: D34266977

Pulled By: nateanl

fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21

81f56f64

11 Feb, 2022 5 commits

Add training recipe for Emformer RNNT trained on MuST-C release v2.0 dataset (#2219) · 4d0095a5

nateanl authored Feb 11, 2022

Summary:
- Add a MUSTC dataset under examples
- Add a lightning module for MuST-C dataset
- Refactor `train.py`, `eval.py`, and `global_stats.py` scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2219

Reviewed By: hwangjeff

Differential Revision: D34180466

Pulled By: nateanl

fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164

4d0095a5

Add SentencePiece model training script for LibriSpeech Emformer RNN-T (#2218) · 825a5976

hwangjeff authored Feb 11, 2022

Summary:
Adds SentencePiece model training script for LibriSpeech Emformer RNN-T example recipe; updates readme with references.

Pull Request resolved: https://github.com/pytorch/audio/pull/2218

Reviewed By: nateanl

Differential Revision: D34177295

Pulled By: hwangjeff

fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43

825a5976

Refactor pipeline_demo.py to support variant EMFORMER_RNNT bundles (#2203) · 16d02a9e

nateanl authored Feb 11, 2022

Summary:
We refactored the demo script that can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming mode. (The first hypothesis prediction is streaming and the second one is non-streaming).

We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2203

Reviewed By: carolineechen

Differential Revision: D34006388

Pulled By: nateanl

fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632

16d02a9e

Add unit tests for Emformer RNN-T LibriSpeech recipe (#2216) · bbdbd582

hwangjeff authored Feb 11, 2022

Summary:
Adds unit tests for Emformer RNN-T LibriSpeech recipe. Also makes changes to recipe to resolve errors with pickling lambda functions in Windows.

Pull Request resolved: https://github.com/pytorch/audio/pull/2216

Reviewed By: nateanl

Differential Revision: D34171480

Pulled By: hwangjeff

fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e

bbdbd582

Fix bugs from Emformer RNN-T recipes merge (#2217) · 2b991225

hwangjeff authored Feb 11, 2022

Summary:
- Removes 100-batch truncation in TEDLIUM3 recipe.
- Reinstates `train_spm.py` for TEDLIUM3.

Pull Request resolved: https://github.com/pytorch/audio/pull/2217

Reviewed By: nateanl

Differential Revision: D34171525

Pulled By: hwangjeff

fbshipit-source-id: 54698e5e1b094c26c28eec9b8b1722223077876c

2b991225

10 Feb, 2022 1 commit

Refactor Emformer RNNT recipes (#2212) · 33bcb7b0

hwangjeff authored Feb 09, 2022

Summary:
Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2212

Reviewed By: mthrok

Differential Revision: D34120104

Pulled By: hwangjeff

fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456

33bcb7b0

04 Feb, 2022 1 commit

Add RNNTBundle with weights pre-trained on tedlium3 dataset (#2177) · a1dc9e0a

Zhaoheng Ni authored Feb 04, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177

Reviewed By: hwangjeff

Differential Revision: D33893052

Pulled By: nateanl

fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7

a1dc9e0a

03 Feb, 2022 2 commits

Fix TimeMasking argument in TED-LIUM Emformer recipe (#2199) · b986e9ef

Zhaoheng Ni authored Feb 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2199

Reviewed By: hwangjeff

Differential Revision: D33979923

Pulled By: nateanl

fbshipit-source-id: 566ba1944dd3511fee740ac17fea2dcb0e5810fa

b986e9ef

Add training recipe of Emformer trained on TED-LIUM release 3 dataset (#2195) · 8f68b3f0

Zhaoheng Ni authored Feb 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2195

Reviewed By: hwangjeff

Differential Revision: D33950179

Pulled By: nateanl

fbshipit-source-id: 5fcfa4f433fffdcbb3b8e97f7c90fb8f723a30a2

8f68b3f0

02 Feb, 2022 1 commit

Revise RNN-T pipeline streaming decoding logic (#2192) · 612de66b

hwangjeff authored Feb 01, 2022

Summary:
Rather than apply SentencePiece's `decode` to directly convert each hypothesis's token id sequence to an output string, we convert each token id sequence to word pieces and then manually join the word pieces ourselves. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov

Versus the previous behavior visualized in https://github.com/pytorch/audio/issues/2093, the scheme here properly constructs words comprising multiple pieces.

Pull Request resolved: https://github.com/pytorch/audio/pull/2192

Reviewed By: mthrok

Differential Revision: D33936622

Pulled By: hwangjeff

fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9

612de66b

01 Feb, 2022 3 commits

Update stale prototype references (#2189) · 1a0935c6

hwangjeff authored Feb 01, 2022

Summary:
Missed a couple of spots in https://github.com/pytorch/audio/issues/2187.

Pull Request resolved: https://github.com/pytorch/audio/pull/2189

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D33926342

Pulled By: hwangjeff

fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06

1a0935c6

Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c

aca5591c

Add global stats script and new json for LibriSpeech RNN-T training recipe (#2183) · 157cb2a2

hwangjeff authored Jan 31, 2022

Summary:
Adds script for generating global feature statistics along with new feature statistics json for LibriSpeech RNN-T training recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/2183

Reviewed By: mthrok

Differential Revision: D33902377

Pulled By: hwangjeff

fbshipit-source-id: ec347a685ae67aefc485084aac6ed2efd653250f

157cb2a2

27 Jan, 2022 2 commits

Add no lm support for CTC decoder (#2174) · 4c3fa875

Caroline Chen authored Jan 27, 2022

Summary:
Add support for CTC lexicon decoder without LM support by adding a non language model `ZeroLM` that returns score 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (will likely be separated out from kenlm when adding support for other kinds of LMs in the future)

Pull Request resolved: https://github.com/pytorch/audio/pull/2174

Reviewed By: hwangjeff, nateanl

Differential Revision: D33798674

Pulled By: carolineechen

fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde

4c3fa875

Refactor RNNT factory function to support num_symbols argument (#2178) · 2cb87c6b

Zhaoheng Ni authored Jan 26, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178

Reviewed By: mthrok

Differential Revision: D33797649

Pulled By: nateanl

fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603

2cb87c6b

18 Jan, 2022 1 commit

Add more CTC decoding WERs (#2161) · 7a83f84f

Caroline Chen authored Jan 18, 2022

Summary:
additionally add decoding results for wav2vec2 large and also on the test-clean dataset

Pull Request resolved: https://github.com/pytorch/audio/pull/2161

Reviewed By: mthrok

Differential Revision: D33644670

Pulled By: carolineechen

fbshipit-source-id: a219a15af46f82a6bd90169bb3001dbad8f0a96e

7a83f84f

05 Jan, 2022 1 commit

Add librispeech inference script (#2130) · 5c4c61b2

Caroline Chen authored Jan 04, 2022

Summary:
add script for running CTC beam search decoder on librispeech dataset with torchaudio pretrained wav2vec2 models

Pull Request resolved: https://github.com/pytorch/audio/pull/2130

Reviewed By: mthrok

Differential Revision: D33419436

Pulled By: carolineechen

fbshipit-source-id: 0a0d00f4c17ecdbb497c9eda78673aa939d73c57

5c4c61b2

30 Dec, 2021 1 commit

Enforce lint checks and fix/mute lint errors (#2116) · 8ed14782

Joao Gomes authored Dec 30, 2021

Summary:
cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/2116

Reviewed By: mthrok

Differential Revision: D33368453

Pulled By: jdsgomes

fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c

8ed14782

29 Dec, 2021 3 commits

Reorganize RNN-T components in prototype module (#2110) · 67cdf882

hwangjeff authored Dec 29, 2021

Summary:
Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`.

Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2110

Reviewed By: carolineechen, mthrok

Differential Revision: D33354116

Pulled By: hwangjeff

fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb

67cdf882

[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 697f92f1
CodemodService Bot authored Dec 29, 2021
```
Reviewed By: zertosh

Differential Revision: D33347867

fbshipit-source-id: 7672f65392e363c0359de2d86e745782a09cf9dc
```
697f92f1

Add pretrained Emformer RNN-T streaming ASR inference pipeline (#2093) · 72a98a86

hwangjeff authored Dec 28, 2021

Summary:
Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR.

Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below).

https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2093

Reviewed By: mthrok

Differential Revision: D33340776

Pulled By: hwangjeff

fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac

72a98a86

23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

03 Dec, 2021 1 commit

Add training recipe for RNN-T Emformer ASR model (#2052) · 7ac525e7

hwangjeff authored Dec 03, 2021

Summary:
Add training recipe for RNN-T Emformer ASR model to examples directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2052

Reviewed By: nateanl

Differential Revision: D32814096

Pulled By: hwangjeff

fbshipit-source-id: a5153044efc16cb39f0e6413369a6791637af76a

7ac525e7