Commits · 27aa52fb47503c70216a47e038db990fe2a1fee6 · OpenDAS / Torchaudio

06 Jun, 2023 1 commit

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059

27aa52fb

26 May, 2023 1 commit

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality
1. The `init_b` hypothesis is only regenerated from blank token if no initial hypotheses are provided.
2. Allows the decoder to receive top-K hypothesis to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa

25 May, 2023 1 commit

Add LRS3 AV-ASR recipe (#3278) · c6624fa6

Pingchuan Ma authored May 25, 2023

Summary:
This PR adds AV-ASR recipe which contains sample implementations of training and evaluation pipelines for RNNT based automatic, visual, and audio-visual (ASR, VSR, AV-ASR) models on LRS3. This repository includes both streaming/non-streaming modes.

CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff

Pull Request resolved: https://github.com/pytorch/audio/pull/3278

Reviewed By: nateanl

Differential Revision: D46121550

Pulled By: mpc001

fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6

c6624fa6

28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA based ctc prefix beam search decoder.

Attach serveral benchmark results using V100 below:
|decoder type| model |datasets       | decoding time (secs)| beam size | batch size | model unit | subsampling times | vocab size |
|--------------|---------|------|-----------------|------------|-------------|------------|-----------------------|------------|
| cuctc |  conformer nemo    |dev clean        |7.68s | 8           |  32       | bpe         |    4  | 1000|
| cuctc |  conformer nemo   |dev clean  (sort by length)      |1.6s | 8           |  32       | bpe         |    4  | 1000|
| cuctc |  wav2vec2.0 torchaudio |dev clean                                |22s | 10           |  1       | char         |    2  | 29|
| cuctc |   conformer espnet   |aishell1 test                             | 5s | 10           |  24       | char         |    4  | 4233|

Note:
1.  The design is to parallel computation through batch and vocab axis, for loop the frames axis. So it's more friendly with smaller sequence lengths, larger vocab size comparing with CPU implementations.
2. WER is the same as CPU implementations. However, it can't decode with LM now.

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155

0a1801ed

23 Feb, 2023 1 commit

Add TCPGen context-biasing Conformer RNN-T (#2890) · 1ed330b5

G. Sun authored Feb 23, 2023

Summary:
This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing.

An example for Librispeech can be found in audio/examples/asr/librispeech_biasing.

Maintainer's note (mthrok):
It seems that TrieNode should be better typed as tuple, but changing the implementation from list to tuple
could cause some issue without running the code, so the code is not changed, though the annotation uses tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2890

Reviewed By: nateanl

Differential Revision: D43171447

Pulled By: mthrok

fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e

1ed330b5

16 Feb, 2023 1 commit

Update WER results for CTC n-gram decoding (#3070) · 11bdafc3

Zhaoheng Ni authored Feb 16, 2023

Summary:
In https://github.com/pytorch/audio/issues/2873, layer normalization is applied to waveforms for SSL models trained on large scale datasets. The word error rate is significantly reduced after the change. The PR updates the results for the affected models.

Without the change in https://github.com/pytorch/audio/issues/2873, here is the WER result table:
|                                                                                            Model | dev-clean | dev-other | test-clean | test-other |
|:------------------------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) |        10.59|        15.62|        9.58|        16.33|
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) |        2.80|        6.01|        2.82|        6.34|
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) |        2.36|        4.43|        2.41|        4.96|
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) |        1.85|        3.46|        2.09|        3.89|
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) |         2.21|        3.40|        2.26|        4.05|

After applying layer normalization, here is the updated result:
|                                                                                            Model | dev-clean | dev-other | test-clean | test-other |
|:------------------------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) |        6.77|        10.03|        6.87|        10.51|
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) |        2.19|        4.55|        2.32|        4.64|
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) |        1.78|        3.51|        2.03|        3.68|
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) |        1.77|        3.32|        2.03|        3.68|
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) |         1.73|        2.72|        1.90|        3.16|

Pull Request resolved: https://github.com/pytorch/audio/pull/3070

Reviewed By: mthrok

Differential Revision: D43365313

Pulled By: nateanl

fbshipit-source-id: 34a60ad2e5eb1299da64ef88ff0208ec8ec76e91

11bdafc3

19 Jan, 2023 1 commit

Simplify train step in Conformer RNN-T LibriSpeech recipe (#2981) · c6a52355

hwangjeff authored Jan 19, 2023

Summary:
In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead.

Pull Request resolved: https://github.com/pytorch/audio/pull/2981

Reviewed By: mthrok

Differential Revision: D42507228

Pulled By: hwangjeff

fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309

c6a52355

17 Nov, 2022 1 commit

fix import bug in global_stats.py (#2858) · d912dcd7

vasiliy authored Nov 17, 2022

Summary:
This code was added by
https://github.com/pytorch/audio/commit/4d0095a528412cfec2a549204fc01d9ebb15df7a

Seems that the original code had a typo?

Pull Request resolved: https://github.com/pytorch/audio/pull/2858

Test Plan:
```
// the import of `mustc` now succeeds, previously crashed
python examples/asr/emformer_rnnt/global_stats.py --model-type librispeech --dataset-path /home/vasiliy/local/librispeech/
```

Reviewed By: carolineechen

Differential Revision: D41355663

Pulled By: nateanl

fbshipit-source-id: 92507e529d41b984b9dd400ad24a55d130372b7d

d912dcd7

09 Sep, 2022 1 commit

Fix LibriSpeech Conforner RNN-T eval script (#2666) · 3b194f68

hwangjeff authored Sep 09, 2022

Summary:
`ConformerRNNTModule`'s initializer now accepts a SentencePiece model rather than a path to a model as input. This PR corrects `eval.py` accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2666

Reviewed By: carolineechen

Differential Revision: D39386968

Pulled By: hwangjeff

fbshipit-source-id: 95a94dd898263d648650f7376c29810b1456d6c1

3b194f68

10 Aug, 2022 1 commit

Fix bug in Conformer RNN-T recipe (#2611) · cd4d6607

hwangjeff authored Aug 10, 2022

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

cd4d6607

11 Jul, 2022 1 commit

Revise LibriSpeech Conformer RNN-T recipe (#2535) · a7d1b31c

Jeff Hwang authored Jul 11, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2535

Modifies LibriSpeech Conformer RNN-T example recipe to make the Lightning module and datamodule more generic and reusable.

Reviewed By: mthrok

Differential Revision: D36731576

fbshipit-source-id: 4643e86fac78f3c2bacc15f5d385bc7b10f410a2

a7d1b31c

23 Jun, 2022 1 commit

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · fee994ce

CodemodService FBSourceBlackLinterBot authored Jun 23, 2022

Summary:
Meta:
**If you take no action, this diff will be automatically accepted on 2022-06-23.**
(To remove yourself from auto-accept diffs and just let them all land, add yourself to [this Butterfly rule](https://www.internalfb.com/butterfly/rule/904302247110220))

Produced by `tools/arcanist/lint/codemods/black-fbsource`.

#nocancel

Rules run:
- CodemodTransformerSimpleShell

Config Oncall: [lint](https://our.intern.facebook.com/intern/oncall3/?shortname=lint)
CodemodConfig: [CodemodConfigFBSourceBlackLinter](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/fbsource_arc_f/CodemodConfigFBSourceBlackLinter.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/13510799586951394/
This diff was automatically created with CodemodService.
To learn more about CodemodService, check out the [CodemodService wiki](https://fburl.com/CodemodService).

_____

## Questions / Comments / Feedback?

**[Click here to give feedback about this diff](https://www.internalfb.com/codemod_service/feedback?sandcastle_job_id=13510799586951394).**

* Returning back to author or abandoning this diff will only cause the diff to be regenerated in the future.
* Do **NOT** post in the CodemodService Feedback group about this specific diff.

drop-conflicts

Reviewed By: adamjernst

Differential Revision: D37375235

fbshipit-source-id: 3d7eb39e5c0539a78d1412f37562dec90b0fc759

fee994ce

04 Jun, 2022 1 commit

Refactor LibriSpeech Lightning datamodule to accommodate different dataset implementations (#2437) · a63629b6

Jeff Hwang authored Jun 04, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2437

Refactors LibriSpeech Lightning datamodule to accommodate different dataset implementations.

Reviewed By: carolineechen, nateanl

Differential Revision: D36731577

fbshipit-source-id: 4ba91044311fa3f99a928aef6ef411316955f6b5

a63629b6

01 Jun, 2022 1 commit

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

15 May, 2022 1 commit

[codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting

Reviewed By: lisroach

Differential Revision: D36402214

fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c

d62875cc

12 May, 2022 1 commit

[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc

595dc5d3

11 May, 2022 1 commit

Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5

hwangjeff authored May 10, 2022

Summary:
Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
- Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
- Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
- Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
- Modifies training script to allow for specifying path model checkpoint to restart training from.

Pull Request resolved: https://github.com/pytorch/audio/pull/2366

Reviewed By: mthrok

Differential Revision: D36305028

Pulled By: hwangjeff

fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba

69467ea5

21 Apr, 2022 1 commit

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5

6b242c29

13 Apr, 2022 1 commit

Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b

hwangjeff authored Apr 13, 2022

Summary:
Adds Conformer RNN-T LibriSpeech training recipe to examples directory.

Produces 30M-parameter model that achieves the following WER:

|                     |          WER |
|:-------------------:|-------------:|
| test-clean          |       0.0310 |
| test-other          |       0.0805 |
| dev-clean           |       0.0314 |
| dev-other           |       0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329

Reviewed By: xiaohui-zhang

Differential Revision: D35578727

Pulled By: hwangjeff

fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d

c262758b

04 Apr, 2022 2 commits

Use pretrained LM API for decoder example (#2317) · 66185e00

Caroline Chen authored Apr 04, 2022

Summary:
update example ASR pipeline to use the recently added pretrained LM API for decoding

Pull Request resolved: https://github.com/pytorch/audio/pull/2317

Reviewed By: mthrok

Differential Revision: D35361354

Pulled By: carolineechen

fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46

66185e00

Fix arguments in CTC decoding script (#2315) · 4a749e2d

Zhaoheng Ni authored Apr 04, 2022

Summary:
Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser.

Pull Request resolved: https://github.com/pytorch/audio/pull/2315

Reviewed By: carolineechen

Differential Revision: D35357678

Pulled By: nateanl

fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92

4a749e2d

17 Feb, 2022 1 commit

Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15

Zhaoheng Ni authored Feb 17, 2022

Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
- Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2240

Reviewed By: mthrok

Differential Revision: D34285195

Pulled By: nateanl

fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f

b5d77b15

16 Feb, 2022 6 commits

Add EMFORMER_RNNT_BASE_MUSTC into pipeline demo script (#2248) · 38569ef0

Zhaoheng Ni authored Feb 16, 2022

Summary:
This PR adds ``EMFORMER_RNNT_BASE_MUSTC`` support in `pipeline_demo.py`. The bundle is trained on MuST-C release 2.0 dataset. The model preserves the casing and punctuations in the transcript.

Here is a screen recording of how it works in streaming and non-streaming modes:

https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2248

Reviewed By: hwangjeff

Differential Revision: D34282598

Pulled By: nateanl

fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4

38569ef0

Refactor pipeline_demo script in emformer_rnnt recipes (#2239) · fdea0a7c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Use dictionary to select the `RNNTBundle` and the corresponding dataset.
- Use the dictionary's keys as choices in ArgumentParser

Pull Request resolved: https://github.com/pytorch/audio/pull/2239

Reviewed By: mthrok

Differential Revision: D34267070

Pulled By: nateanl

fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220

fdea0a7c

Refactor eval and pipeline_demo scripts in emformer_rnnt (#2238) · e3b40d1c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Add docstring to `eval.py` and `pipeline_demo.py` under `emformer_rnnt` directory.
- Refactor logger and ArgumentParser

Pull Request resolved: https://github.com/pytorch/audio/pull/2238

Reviewed By: mthrok

Differential Revision: D34267059

Pulled By: nateanl

fbshipit-source-id: 4b8d3d183ee7bc0ad71ce305cab87bfa90208b2e

e3b40d1c

Fix lm used for ctc decoder example (#2235) · c2decba4

Caroline Chen authored Feb 16, 2022

Summary:
LM in example script was unintentionally changed to None when adding no LM support previously. this changes it back and is consistent with the WERs listed in the readme

Pull Request resolved: https://github.com/pytorch/audio/pull/2235

Reviewed By: nateanl

Differential Revision: D34273042

Pulled By: carolineechen

fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812

c2decba4

Add shebang lines to scripts in emformer_rnnt recipes (#2237) · aac83fe5

Zhaoheng Ni authored Feb 16, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237

Reviewed By: mthrok

Differential Revision: D34267000

Pulled By: nateanl

fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d

aac83fe5

Refactor ArgumentParser arguments in emformer_rnnt recipes (#2236) · 81f56f64

Zhaoheng Ni authored Feb 16, 2022

Summary:
Replace underscore with dash in ArgumentParser's arguments.

Pull Request resolved: https://github.com/pytorch/audio/pull/2236

Reviewed By: mthrok

Differential Revision: D34266977

Pulled By: nateanl

fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21

81f56f64

11 Feb, 2022 5 commits

Add training recipe for Emformer RNNT trained on MuST-C release v2.0 dataset (#2219) · 4d0095a5

nateanl authored Feb 11, 2022

Summary:
- Add a MUSTC dataset under examples
- Add a lightning module for MuST-C dataset
- Refactor `train.py`, `eval.py`, and `global_stats.py` scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2219

Reviewed By: hwangjeff

Differential Revision: D34180466

Pulled By: nateanl

fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164

4d0095a5

Add SentencePiece model training script for LibriSpeech Emformer RNN-T (#2218) · 825a5976

hwangjeff authored Feb 11, 2022

Summary:
Adds SentencePiece model training script for LibriSpeech Emformer RNN-T example recipe; updates readme with references.

Pull Request resolved: https://github.com/pytorch/audio/pull/2218

Reviewed By: nateanl

Differential Revision: D34177295

Pulled By: hwangjeff

fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43

825a5976

Refactor pipeline_demo.py to support variant EMFORMER_RNNT bundles (#2203) · 16d02a9e

nateanl authored Feb 11, 2022

Summary:
We refactored the demo script that can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming mode. (The first hypothesis prediction is streaming and the second one is non-streaming).

We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2203

Reviewed By: carolineechen

Differential Revision: D34006388

Pulled By: nateanl

fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632

16d02a9e

Add unit tests for Emformer RNN-T LibriSpeech recipe (#2216) · bbdbd582

hwangjeff authored Feb 11, 2022

Summary:
Adds unit tests for Emformer RNN-T LibriSpeech recipe. Also makes changes to recipe to resolve errors with pickling lambda functions in Windows.

Pull Request resolved: https://github.com/pytorch/audio/pull/2216

Reviewed By: nateanl

Differential Revision: D34171480

Pulled By: hwangjeff

fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e

bbdbd582

Fix bugs from Emformer RNN-T recipes merge (#2217) · 2b991225

hwangjeff authored Feb 11, 2022

Summary:
- Removes 100-batch truncation in TEDLIUM3 recipe.
- Reinstates `train_spm.py` for TEDLIUM3.

Pull Request resolved: https://github.com/pytorch/audio/pull/2217

Reviewed By: nateanl

Differential Revision: D34171525

Pulled By: hwangjeff

fbshipit-source-id: 54698e5e1b094c26c28eec9b8b1722223077876c

2b991225

10 Feb, 2022 1 commit

Refactor Emformer RNNT recipes (#2212) · 33bcb7b0

hwangjeff authored Feb 09, 2022

Summary:
Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2212

Reviewed By: mthrok

Differential Revision: D34120104

Pulled By: hwangjeff

fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456

33bcb7b0

04 Feb, 2022 1 commit

Add RNNTBundle with weights pre-trained on tedlium3 dataset (#2177) · a1dc9e0a

Zhaoheng Ni authored Feb 04, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177

Reviewed By: hwangjeff

Differential Revision: D33893052

Pulled By: nateanl

fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7

a1dc9e0a

03 Feb, 2022 2 commits

Fix TimeMasking argument in TED-LIUM Emformer recipe (#2199) · b986e9ef

Zhaoheng Ni authored Feb 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2199

Reviewed By: hwangjeff

Differential Revision: D33979923

Pulled By: nateanl

fbshipit-source-id: 566ba1944dd3511fee740ac17fea2dcb0e5810fa

b986e9ef

Add training recipe of Emformer trained on TED-LIUM release 3 dataset (#2195) · 8f68b3f0

Zhaoheng Ni authored Feb 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2195

Reviewed By: hwangjeff

Differential Revision: D33950179

Pulled By: nateanl

fbshipit-source-id: 5fcfa4f433fffdcbb3b8e97f7c90fb8f723a30a2

8f68b3f0

02 Feb, 2022 1 commit

Revise RNN-T pipeline streaming decoding logic (#2192) · 612de66b

hwangjeff authored Feb 01, 2022

Summary:
Rather than apply SentencePiece's `decode` to directly convert each hypothesis's token id sequence to an output string, we convert each token id sequence to word pieces and then manually join the word pieces ourselves. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov

Versus the previous behavior visualized in https://github.com/pytorch/audio/issues/2093, the scheme here properly constructs words comprising multiple pieces.

Pull Request resolved: https://github.com/pytorch/audio/pull/2192

Reviewed By: mthrok

Differential Revision: D33936622

Pulled By: hwangjeff

fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9

612de66b

01 Feb, 2022 2 commits

Update stale prototype references (#2189) · 1a0935c6

hwangjeff authored Feb 01, 2022

Summary:
Missed a couple of spots in https://github.com/pytorch/audio/issues/2187.

Pull Request resolved: https://github.com/pytorch/audio/pull/2189

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D33926342

Pulled By: hwangjeff

fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06

1a0935c6

Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c

aca5591c