Commits · 87d7694d08d9873b1c9e8cf7e8fe92ea398a1488 · OpenDAS / Torchaudio

02 Feb, 2022 1 commit

Revise RNN-T pipeline streaming decoding logic (#2192) · 612de66b

hwangjeff authored Feb 01, 2022

Summary:
Rather than applying SentencePiece's `decode` to convert each hypothesis's token id sequence directly to an output string, we convert each token id sequence to word pieces and then join the word pieces manually. This preserves leading whitespace on output strings and therefore accounts for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov

Compared with the previous behavior shown in https://github.com/pytorch/audio/issues/2093, this scheme correctly constructs words that span multiple pieces.
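
The word-piece joining described in this summary can be sketched as follows. This is a minimal illustration, not the code from the pull request: the toy vocabulary `ID_TO_PIECE` and the helper `ids_to_text` are hypothetical, and a real pipeline would obtain pieces from a trained SentencePiece model instead. The only convention taken from SentencePiece is that a piece starting a new word begins with the marker character U+2581.

```python
# SentencePiece marks the start of a word with U+2581 (lower one eighth block).
WORD_BOUNDARY = "\u2581"

# Hypothetical toy vocabulary; a real one comes from a trained SentencePiece model.
ID_TO_PIECE = [
    WORD_BOUNDARY + "hel",
    "lo",
    WORD_BOUNDARY + "wor",
    "ld",
    WORD_BOUNDARY + "stream",
    "ing",
]


def ids_to_text(token_ids):
    """Join word pieces manually, keeping any leading whitespace."""
    pieces = [ID_TO_PIECE[i] for i in token_ids]
    # Concatenate first, then turn every boundary marker into a space.
    # A leading marker becomes a leading space instead of being stripped.
    return "".join(pieces).replace(WORD_BOUNDARY, " ")


# Two consecutive streaming chunks; the first one ends in the middle of a word.
chunk1 = ids_to_text([0, 1, 2])
chunk2 = ids_to_text([3, 4, 5])

# Because chunk2 has no leading space, it continues the word begun in chunk1.
transcript = chunk1 + chunk2
print(repr(transcript))
```

Keeping the leading space is what lets a caller tell "this chunk starts a new word" apart from "this chunk continues the previous word" when outputs from successive invocations are concatenated.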

Pull Request resolved: https://github.com/pytorch/audio/pull/2192

Reviewed By: mthrok

Differential Revision: D33936622

Pulled By: hwangjeff

fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9


01 Feb, 2022 3 commits

Update stale prototype references (#2189) · 1a0935c6

hwangjeff authored Feb 01, 2022

Summary:
Fixes a couple of spots missed in https://github.com/pytorch/audio/issues/2187.

Pull Request resolved: https://github.com/pytorch/audio/pull/2189

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D33926342

Pulled By: hwangjeff

fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06


Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.
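
For code that has to run both before and after this move, one possible approach is to look a name up in the new location first and fall back to the old one. The sketch below is an assumption about how a caller might handle the transition, not something this commit provides; the helper `resolve_attribute` is hypothetical, and the demonstration uses standard-library modules as stand-ins so that it runs without torchaudio installed.

```python
import importlib


def resolve_attribute(module_names, attribute):
    """Return `attribute` from the first importable module that defines it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is absent in this installation; try the next candidate.
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(
        f"{attribute!r} not found in any of: {', '.join(module_names)}"
    )


# Stand-in demonstration: the first module does not exist, the second exists
# but lacks the attribute, and the third provides it.
sqrt = resolve_attribute(["no_such_module_xyz", "json", "math"], "sqrt")
print(sqrt(9.0))
```

With torchaudio, the candidate list would be `["torchaudio.models", "torchaudio.prototype.models"]` and the attribute a class such as `RNNT`, assuming that name is exported from both locations. Checking for the attribute, rather than only for the module, matters when the new module already existed before the move but did not yet contain the ASR features.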

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c


Add global stats script and new json for LibriSpeech RNN-T training recipe (#2183) · 157cb2a2

hwangjeff authored Jan 31, 2022

Summary:
Adds a script for generating global feature statistics, along with a new feature statistics JSON file, for the LibriSpeech RNN-T training recipe.
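
As a rough illustration of what computing global feature statistics can involve, the sketch below accumulates a per-dimension mean and inverse standard deviation over all frames and serializes them to JSON. It is not the script added by this commit: the function name `global_stats`, the JSON keys `mean` and `invstddev`, the `eps` guard, and the toy input are all assumptions made for the example.

```python
import json
import math


def global_stats(utterances, eps=1e-8):
    """Compute per-dimension mean and inverse stddev over all frames.

    `utterances` is an iterable of utterances, each a list of frames,
    each frame a list of floats of equal length.
    """
    count = 0
    sums = None
    sq_sums = None
    for frames in utterances:
        for frame in frames:
            if sums is None:
                sums = [0.0] * len(frame)
                sq_sums = [0.0] * len(frame)
            for i, value in enumerate(frame):
                sums[i] += value
                sq_sums[i] += value * value
            count += 1
    if count == 0:
        raise ValueError("no frames to compute statistics from")
    mean = [s / count for s in sums]
    # Population variance via E[x^2] - E[x]^2, clamped at zero for safety.
    var = [max(sq / count - m * m, 0.0) for sq, m in zip(sq_sums, mean)]
    invstddev = [1.0 / math.sqrt(v + eps) for v in var]
    return {"mean": mean, "invstddev": invstddev}


# Toy data: two utterances with one 2-dimensional frame each.
stats = global_stats([[[1.0, 2.0]], [[3.0, 4.0]]])
print(json.dumps(stats, indent=2))
```

Statistics of this kind are typically applied at training and inference time as `(feature - mean) * invstddev`, so that each feature dimension has roughly zero mean and unit variance.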

Pull Request resolved: https://github.com/pytorch/audio/pull/2183

Reviewed By: mthrok

Differential Revision: D33902377

Pulled By: hwangjeff

fbshipit-source-id: ec347a685ae67aefc485084aac6ed2efd653250f


27 Jan, 2022 1 commit

Refactor RNNT factory function to support num_symbols argument (#2178) · 2cb87c6b

Zhaoheng Ni authored Jan 26, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2178

Reviewed By: mthrok

Differential Revision: D33797649

Pulled By: nateanl

fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603


30 Dec, 2021 1 commit

Enforce lint checks and fix/mute lint errors (#2116) · 8ed14782

Joao Gomes authored Dec 30, 2021

Summary:
cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/2116

Reviewed By: mthrok

Differential Revision: D33368453

Pulled By: jdsgomes

fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c


29 Dec, 2021 3 commits

Reorganize RNN-T components in prototype module (#2110) · 67cdf882

hwangjeff authored Dec 29, 2021

Summary:
Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`.

Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2110

Reviewed By: carolineechen, mthrok

Differential Revision: D33354116

Pulled By: hwangjeff

fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb


[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 697f92f1

CodemodService Bot authored Dec 29, 2021

Reviewed By: zertosh

Differential Revision: D33347867

fbshipit-source-id: 7672f65392e363c0359de2d86e745782a09cf9dc


Add pretrained Emformer RNN-T streaming ASR inference pipeline (#2093) · 72a98a86

hwangjeff authored Dec 28, 2021

Summary:
Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR.

Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below).

https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2093

Reviewed By: mthrok

Differential Revision: D33340776

Pulled By: hwangjeff

fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac


23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8


03 Dec, 2021 1 commit

Add training recipe for RNN-T Emformer ASR model (#2052) · 7ac525e7

hwangjeff authored Dec 03, 2021

Summary:
Add training recipe for RNN-T Emformer ASR model to examples directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2052

Reviewed By: nateanl

Differential Revision: D32814096

Pulled By: hwangjeff

fbshipit-source-id: a5153044efc16cb39f0e6413369a6791637af76a
