Commits · aed5eb88adeba872cf9859bd5b5bfe10ba77e835 · OpenDAS / Torchaudio

10 May, 2022 2 commits

Add SoudenMVDR module (#2367) · aed5eb88

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are:
- multi-channel spectrum.
- PSD matrix of target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
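
The `diagonal_loading` and `diag_eps` arguments concern regularizing a PSD matrix before it is inverted. A minimal NumPy sketch of one common form of diagonal loading follows. It assumes the loading term is `diag_eps` times the trace of the matrix; the function name is hypothetical and this is not necessarily how the module computes it.

```python
import numpy as np


def diagonal_loading_sketch(psd, diag_eps=1e-7):
    """Add a small trace-scaled value to the diagonal of a (channel, channel) PSD matrix.

    Assumption: the loading term is diag_eps * trace(psd).
    """
    n_channels = psd.shape[-1]
    loading = diag_eps * np.trace(psd).real
    return psd + loading * np.eye(n_channels, dtype=psd.dtype)


# A rank-1 PSD matrix is singular; after loading it becomes invertible.
a = np.array([1.0 + 1.0j, 0.5 - 0.2j])
psd_singular = np.outer(a, a.conj())
psd_loaded = diagonal_loading_sketch(psd_singular, diag_eps=1e-3)
```

The loaded matrix stays Hermitian, so it can still be treated as a PSD matrix downstream.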

Pull Request resolved: https://github.com/pytorch/audio/pull/2367

Reviewed By: hwangjeff

Differential Revision: D36198015

Pulled By: nateanl

fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527


Add citations for datasets (#2371) · 638120ca

Caroline Chen authored May 09, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371

Reviewed By: xiaohui-zhang

Differential Revision: D36246167

Pulled By: carolineechen

fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4


26 Apr, 2022 2 commits

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon-free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.
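
Lexicon-free decoding emits token sequences directly instead of restricting the output to words from a lexicon. The decoder added here is a beam search; the sketch below swaps in the much simpler greedy best-path CTC decoding to illustrate the idea. The function name, token set, and scores are made up for illustration.

```python
def greedy_ctc_decode(emissions, tokens, blank="-"):
    """Greedy CTC decoding without a lexicon.

    emissions: list of per-frame score lists, one score per token.
    tokens: list of token strings, aligned with the score columns.
    """
    # Pick the best-scoring token index for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    out = []
    prev = None
    for idx in best:
        # Collapse repeated indices and drop the blank token.
        if idx != prev and tokens[idx] != blank:
            out.append(tokens[idx])
        prev = idx
    return "".join(out)


tokens = ["-", "a", "b"]
emissions = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a again (collapsed into the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # a (kept, because a blank separates it)
    [0.1, 0.2, 0.7],    # b
]
decoded = greedy_ctc_decode(emissions, tokens)
```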

Follow-ups:
- Add pretrained LM support for lexicon-free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7


Fix LibriMix documentation (#2351) · 892d6d34

Zhaoheng Ni authored Apr 26, 2022

Summary:
The `LibriMix` dataset is missing on the [documentation webpage](https://pytorch.org/audio/stable/datasets.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2351

Reviewed By: carolineechen

Differential Revision: D35926695

Pulled By: nateanl

fbshipit-source-id: 168aed3bb15510d1b1ec57d77727932e481aca48


21 Apr, 2022 1 commit

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
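
A minimal sketch of the pattern: a plain tuple type alias plus accessor functions replaces a custom class. The two-field layout and the helper names below are hypothetical; the actual `Hypothesis` may carry more fields.

```python
from typing import List, Tuple

# Hypothetical layout: (tokens, score).
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


# A list of plain tuples contains no custom classes.
hypos: List[Hypothesis] = [([1, 5, 9], -3.2), ([1, 5], -4.0)]
best = max(hypos, key=get_score)
```

Accessor functions keep call sites readable even though fields are no longer reachable by name.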

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5


18 Apr, 2022 1 commit

Add QUESST14 dataset (#2290) · aebcf6af

Caroline Chen authored Apr 18, 2022

Summary:
The implementation is adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py).

Modifying the s3prl downstream expert as in [this commit](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5) to use this dataset implementation produces the same results as the original s3prl pipeline.

Pull Request resolved: https://github.com/pytorch/audio/pull/2290

Reviewed By: nateanl

Differential Revision: D35692551

Pulled By: carolineechen

fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c


12 Apr, 2022 1 commit

Add Conformer RNN-T model prototype (#2322) · b0c8e239

hwangjeff authored Apr 11, 2022

Summary:
Adds the Conformer RNN-T model as a prototype feature by way of the factory functions `conformer_rnnt_model` and `conformer_rnnt_base`; the latter instantiates a baseline version of the model. Also includes the following:
- Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
- Makes `_Transcriber` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
- Introduces tests for `conformer_rnnt_model`.
- Adds docs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2322

Reviewed By: xiaohui-zhang

Differential Revision: D35565987

Pulled By: hwangjeff

fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789


08 Apr, 2022 1 commit

Add devices/properties badges (#2321) · 72ae755a

moto authored Apr 07, 2022

Summary:
Add badges of supported properties and devices to functionals and transforms.

This commit adds `.. devices::` and `.. properties::` directives to sphinx.

APIs with these directives get badges (based on shields.io) that link to the
page describing these features.

Continuation of https://github.com/pytorch/audio/issues/2316
Dtype badges are left out for now, pending further improvement; badges are added to most of the functionals and transforms.
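
As a rough illustration, the directives would sit inside an API's docstring. The function below is hypothetical, and the directive arguments (`CPU CUDA`, `Autograd TorchScript`) are assumptions about typical values rather than a definitive list.

```python
def my_transform(waveform):
    """Hypothetical function used only to show where the directives go.

    .. devices:: CPU CUDA

    .. properties:: Autograd TorchScript

    Args:
        waveform: input audio samples.
    """
    return waveform
```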

Pull Request resolved: https://github.com/pytorch/audio/pull/2321

Reviewed By: hwangjeff

Differential Revision: D35489063

Pulled By: mthrok

fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762


26 Mar, 2022 1 commit

Update decoder pretrained lm docs (#2291) · 46ed2b98

Caroline Chen authored Mar 26, 2022

Summary:
The `build_docs` test is failing on CI with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`, but a local build produces the following:

<img width="902" alt="Screen Shot 2022-03-25 at 4 02 53 PM" src="https://user-images.githubusercontent.com/16568633/160157472-c91ff9b2-a2be-4c5d-959e-53b9f45425c6.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2291

Reviewed By: mthrok

Differential Revision: D35147098

Pulled By: carolineechen

fbshipit-source-id: 682b3800d0ed5c56b402d83f221136725051ba7e


25 Mar, 2022 1 commit

Pin jinja2 version for build_docs (#2292) · d484516e

Caroline Chen authored Mar 25, 2022

Summary:
The `build_docs` CircleCI job is currently failing with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`. Pin Jinja2<3.1 to resolve this issue; see https://github.com/sphinx-doc/sphinx/issues/10291#issuecomment-1078046986

Pull Request resolved: https://github.com/pytorch/audio/pull/2292

Reviewed By: mthrok

Differential Revision: D35148397

Pulled By: carolineechen

fbshipit-source-id: 963efe2fcdee13dead4a4d542c903913c6eaa505


24 Mar, 2022 1 commit

Update CTC decoder docs and add citation (#2278) · 05592dff

Caroline Chen authored Mar 24, 2022

Summary:
rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2278

Reviewed By: mthrok

Differential Revision: D35097734

Pulled By: carolineechen

fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c


26 Feb, 2022 2 commits

Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``.
The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
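
A minimal NumPy sketch of this operation, computing `w^H Y` per frequency bin. The shapes, (freq, channel) for the weight and (channel, freq, time) for the spectrum, and the conjugation convention are assumptions for illustration; the function name is hypothetical.

```python
import numpy as np


def apply_beamforming_sketch(beamform_weights, specgram):
    """Apply per-frequency weights to a multi-channel spectrum.

    beamform_weights: (freq, channel), complex.
    specgram: (channel, freq, time), complex.
    Returns the single-channel spectrum with shape (freq, time).
    """
    # Sum over channels of conj(w[f, c]) * Y[c, f, t].
    return np.einsum("fc,cft->ft", beamform_weights.conj(), specgram)


rng = np.random.default_rng(0)
n_channels, n_freqs, n_frames = 4, 3, 5
shape = (n_channels, n_freqs, n_frames)
specgram = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Weights that select channel 0 reproduce that channel's spectrum.
weights = np.zeros((n_freqs, n_channels), dtype=complex)
weights[:, 0] = 1.0
enhanced = apply_beamforming_sketch(weights, specgram)
```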

Pull Request resolved: https://github.com/pytorch/audio/pull/2232

Reviewed By: mthrok

Differential Revision: D34474561

Pulled By: nateanl

fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d


Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The interface changes are:
1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
2. Move `fill_buffer` method to private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
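
The retry logic can be sketched in Python, although the actual retry lives in the C++ layer. Everything here is illustrative: `try_read` is a hypothetical callable that returns `None` in the `EAGAIN` case, and the units (seconds) are an assumption.

```python
import time


def read_with_retry(try_read, timeout=None, backoff=0.01):
    """Call try_read() until it yields a packet or the timeout expires.

    try_read: hypothetical callable; returns None when the device buffer
        is not ready (the EAGAIN case), otherwise the packet.
    timeout: seconds to keep retrying; None retries indefinitely.
    backoff: seconds to sleep between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        packet = try_read()
        if packet is not None:
            return packet
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("device buffer was not ready before the timeout")
        time.sleep(backoff)


# Simulated device that becomes ready on the third attempt.
attempts = {"count": 0}


def fake_device():
    attempts["count"] += 1
    return b"packet" if attempts["count"] >= 3 else None


result = read_with_retry(fake_device, timeout=1.0, backoff=0.001)
```

Blocking inside the retry loop spares the caller from catching an exception and re-entering the call each time the buffer is not ready.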

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59


25 Feb, 2022 5 commits

Add rtf_power method to torchaudio.functional (#2231) · ea74813d

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``rtf_power`` method to ``torchaudio.functional``.
The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206).
[This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English.
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, an int or one-hot Tensor indicating the reference channel, and the number of iterations.
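
A minimal NumPy sketch of the idea for a single frequency bin: power iteration on `Φ_nn^{-1} Φ_ss` finds its dominant eigenvector, which is then mapped back through `Φ_nn`. The per-step normalization and the final scaling by the reference-channel entry are illustrative choices, not necessarily what `rtf_power` does; the function name is hypothetical.

```python
import numpy as np


def rtf_power_sketch(psd_s, psd_n, reference_channel=0, n_iter=3):
    """Estimate the RTF for one frequency bin by power iteration.

    psd_s, psd_n: (channel, channel) complex PSD matrices.
    """
    phi = np.linalg.solve(psd_n, psd_s)   # phi = inv(psd_n) @ psd_s
    rtf = phi[:, reference_channel]       # start from the reference column
    for _ in range(n_iter):
        rtf = phi @ rtf
        rtf = rtf / np.linalg.norm(rtf)   # keep the iterate well scaled
    rtf = psd_n @ rtf
    # Illustrative choice: scale so the reference-channel entry equals 1.
    return rtf / rtf[reference_channel]


# Rank-1 speech PSD built from a known steering vector, white noise.
steering = np.array([1.0 + 0.0j, 0.5 + 0.5j, -0.3 + 0.8j])
psd_s = np.outer(steering, steering.conj())
psd_n = np.eye(3, dtype=complex)
rtf = rtf_power_sketch(psd_s, psd_n)
```

In this toy setup the estimate recovers the steering vector up to the reference-channel scaling.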

Pull Request resolved: https://github.com/pytorch/audio/pull/2231

Reviewed By: mthrok

Differential Revision: D34474503

Pulled By: nateanl

fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb


Add rtf_evd method to torchaudio.functional (#2230) · 86fe4fa7

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds `rtf_evd` method to `torchaudio.functional`.
The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
The input argument is the power spectral density (PSD) matrix of the target speech.
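
A minimal NumPy sketch for one frequency bin: the RTF estimate is taken as the eigenvector of the speech PSD matrix with the largest eigenvalue. The function name is hypothetical, and any scaling or phase normalization the actual method applies is not reproduced here.

```python
import numpy as np


def rtf_evd_sketch(psd_s):
    """Return the principal eigenvector of a (channel, channel) Hermitian PSD matrix."""
    # eigh returns eigenvalues in ascending order, so the last column
    # of the eigenvector matrix belongs to the largest eigenvalue.
    _, eigenvectors = np.linalg.eigh(psd_s)
    return eigenvectors[:, -1]


steering = np.array([1.0 + 0.0j, 0.5 + 0.5j, -0.3 + 0.8j])
psd_s = np.outer(steering, steering.conj())
rtf = rtf_evd_sketch(psd_s)
```

The eigenvector is unit-norm and defined only up to a complex phase, so it matches the steering vector in direction rather than entry by entry.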

Pull Request resolved: https://github.com/pytorch/audio/pull/2230

Reviewed By: mthrok

Differential Revision: D34474188

Pulled By: nateanl

fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134


Add mvdr_weights_rtf to torchaudio.functional (#2229) · 3566ffc5

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution that applies relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for the reference.
The input arguments are the complex-valued RTF Tensor of the target speech, the power spectral density (PSD) matrix of noise, and an int or one-hot Tensor indicating the reference channel.
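
A minimal NumPy sketch of the textbook RTF-based MVDR solution for one frequency bin, `w = Φ_nn^{-1} d / (d^H Φ_nn^{-1} d)`. It omits the reference-channel handling that the method takes as input; the function name and the example matrices are made up for illustration.

```python
import numpy as np


def mvdr_weights_rtf_sketch(rtf, psd_n):
    """Compute MVDR weights for one frequency bin.

    rtf: (channel,) complex relative transfer function d.
    psd_n: (channel, channel) Hermitian noise PSD matrix.
    """
    numerator = np.linalg.solve(psd_n, rtf)   # inv(psd_n) @ d
    denominator = np.vdot(rtf, numerator)     # d^H inv(psd_n) d
    return numerator / denominator


rtf = np.array([1.0 + 0.0j, 0.5 + 0.5j, -0.3 + 0.8j])
noise = np.array(
    [[2.0, 0.3j, 0.0],
     [-0.3j, 1.5, 0.1],
     [0.0, 0.1, 1.0]],
    dtype=complex,
)
weights = mvdr_weights_rtf_sketch(rtf, noise)
```

These weights satisfy the distortionless constraint `w^H d = 1`, which is the defining property of MVDR.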

Pull Request resolved: https://github.com/pytorch/audio/pull/2229

Reviewed By: mthrok

Differential Revision: D34474119

Pulled By: nateanl

fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2


Add mvdr_weights_souden to torchaudio.functional (#2228) · 5d06a369

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, and an int or one-hot Tensor indicating the reference channel.
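
A minimal NumPy sketch of the Souden solution for one frequency bin, `w = (Φ_nn^{-1} Φ_ss / trace(Φ_nn^{-1} Φ_ss)) u`, where `u` is a one-hot vector selecting the reference channel. Regularization (diagonal loading, `eps`) is left out, and the function name is hypothetical.

```python
import numpy as np


def mvdr_weights_souden_sketch(psd_s, psd_n, reference_channel=0):
    """Compute Souden MVDR weights for one frequency bin.

    psd_s, psd_n: (channel, channel) complex PSD matrices.
    """
    phi = np.linalg.solve(psd_n, psd_s)      # inv(psd_n) @ psd_s
    weights_matrix = phi / np.trace(phi)
    # Multiplying by a one-hot vector is the same as picking a column.
    return weights_matrix[:, reference_channel]


steering = np.array([0.8 + 0.2j, 0.5 + 0.5j, -0.3 + 0.8j])
psd_s = np.outer(steering, steering.conj())
psd_n = np.eye(3, dtype=complex)
weights = mvdr_weights_souden_sketch(psd_s, psd_n)
```

In this toy setup, applying the weights to the clean steering vector returns its reference-channel entry, so the beamformer output equals the speech as observed at the reference microphone.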

Pull Request resolved: https://github.com/pytorch/audio/pull/2228

Reviewed By: mthrok

Differential Revision: D34474018

Pulled By: nateanl

fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb


Add psd method to torchaudio.functional (#2227) · 07bd1aa3

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``psd`` method to ``torchaudio.functional``.
It computes the power spectral density (PSD) matrix of the complex-valued spectrum.
The method also supports normalization of the time-frequency mask.
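
A minimal NumPy sketch of a mask-weighted PSD estimate. The shapes, (channel, freq, time) for the spectrum and (freq, time) for the mask, and normalizing the mask along the time axis are assumptions for illustration; the function name is hypothetical.

```python
import numpy as np


def psd_sketch(specgram, mask=None, normalize=True, eps=1e-10):
    """Estimate per-frequency PSD matrices.

    specgram: (channel, freq, time), complex.
    mask: optional (freq, time) real-valued time-frequency mask.
    Returns an array of shape (freq, channel, channel).
    """
    if mask is None:
        # Sum of outer products Y Y^H over time frames.
        return np.einsum("cft,eft->fce", specgram, specgram.conj())
    if normalize:
        # Make the mask weights sum to (almost) one along time.
        mask = mask / (mask.sum(axis=-1, keepdims=True) + eps)
    return np.einsum("ft,cft,eft->fce", mask, specgram, specgram.conj())


rng = np.random.default_rng(0)
n_channels, n_freqs, n_frames = 2, 3, 5
shape = (n_channels, n_freqs, n_frames)
specgram = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

psd_plain = psd_sketch(specgram)
psd_masked = psd_sketch(specgram, mask=np.ones((n_freqs, n_frames)))
```

With an all-ones mask, normalization turns the sum over frames into an average.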

Pull Request resolved: https://github.com/pytorch/audio/pull/2227

Reviewed By: mthrok

Differential Revision: D34473908

Pulled By: nateanl

fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9


16 Feb, 2022 1 commit

Add EMFORMER_RNNT_BASE_MUSTC bundle to torchaudio.prototype (#2241) · 99b5ef5c

Zhaoheng Ni authored Feb 16, 2022

Summary:
This PR provides an `RNNTBundle` that is pre-trained on the MuST-C release v2.0 dataset.
The casing and punctuation of the transcripts are preserved when training the SentencePiece model.

Here is the model performance on the dev and test sets of MuST-C 2.0:
|                   |          WER |
|:-----------------:|-------------:|
| dev               |       0.190  |
| tst-COMMON        |       0.213  |
| tst-HE            |       0.186  |

Pull Request resolved: https://github.com/pytorch/audio/pull/2241

Reviewed By: mthrok

Differential Revision: D34267792

Pulled By: nateanl

fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167


04 Feb, 2022 1 commit

Add RNNTBundle with weights pre-trained on tedlium3 dataset (#2177) · a1dc9e0a

Zhaoheng Ni authored Feb 04, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177

Reviewed By: hwangjeff

Differential Revision: D33893052

Pulled By: nateanl

fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7


03 Feb, 2022 1 commit

Add tutorials with streaming API (#2193) · c00f65da

moto authored Feb 03, 2022

Summary:
* tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193

Reviewed By: hwangjeff

Differential Revision: D33971312

Pulled By: mthrok

fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f


02 Feb, 2022 1 commit

Add Streaming API (#2164) · 7a3e262d

moto authored Feb 01, 2022

Summary:
This PR adds the prototype streaming API.
The implementation is based on ffmpeg libraries.

For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2164

Reviewed By: hwangjeff

Differential Revision: D33934457

Pulled By: mthrok

fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe


01 Feb, 2022 3 commits

Update stale prototype references (#2189) · 1a0935c6

hwangjeff authored Feb 01, 2022

Summary:
Missed a couple of spots in https://github.com/pytorch/audio/issues/2187.

Pull Request resolved: https://github.com/pytorch/audio/pull/2189

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D33926342

Pulled By: hwangjeff

fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06


Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c


Fix lexicon decoder docs (#2185) · ff15ba1b

Caroline Chen authored Feb 01, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2185

Reviewed By: hwangjeff, mthrok

Differential Revision: D33905767

Pulled By: carolineechen

fbshipit-source-id: 964576ab3f4a12b91fa3960b2aa2337239356513


27 Jan, 2022 1 commit

Add no lm support for CTC decoder (#2174) · 4c3fa875

Caroline Chen authored Jan 27, 2022

Summary:
Add support for running the CTC lexicon decoder without a language model by adding `ZeroLM`, a placeholder language model that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the KenLM decoder for now (it will likely be separated out from KenLM when support for other kinds of LMs is added).
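
The idea can be sketched as a language model object whose every score is zero, so that the beam search ranks hypotheses by acoustic scores alone. The `start`/`score`/`finish` interface below is a hypothetical simplification, not the decoder's actual LM API.

```python
class ZeroLMSketch:
    """Placeholder language model that contributes nothing to the score."""

    def start(self):
        # Empty state: there is no history to track.
        return ()

    def score(self, state, token_index):
        # Every token is equally likely as far as this "LM" is concerned.
        return state, 0.0

    def finish(self, state):
        return state, 0.0


lm = ZeroLMSketch()
state = lm.start()
total = 0.0
for token in [3, 7, 1]:
    state, token_score = lm.score(state, token)
    total += token_score
state, final_score = lm.finish(state)
total += final_score
```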

Pull Request resolved: https://github.com/pytorch/audio/pull/2174

Reviewed By: hwangjeff, nateanl

Differential Revision: D33798674

Pulled By: carolineechen

fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde


14 Jan, 2022 1 commit

Tweak documentation (#2152) · 7f859111

moto authored Jan 14, 2022

Summary:
- Change the version of nightly build to `Nightly Build (VERSION)`.
- Use `BUILD_VERSION` env var for release.
- Automatically change copyright year.
- Update the link to nightly in README so that the main branch directs to the corresponding document.

Because of the way the CI job is set up, the resulting documentation says 0.8.0. This is fixed by https://github.com/pytorch/audio/issues/2151.
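
A rough sketch of how a Sphinx `conf.py` could implement the first three items. The helper name, the way the installed version is passed in, and the copyright holder string are hypothetical; only the `BUILD_VERSION` variable and the `Nightly Build (VERSION)` label come from the change list above.

```python
import os
from datetime import datetime


def resolve_version(env, installed_version):
    """Pick the version string shown in the documentation.

    env: mapping of environment variables (for example os.environ).
    installed_version: version of the package being documented.
    """
    build_version = env.get("BUILD_VERSION")
    if build_version:
        # Release builds set BUILD_VERSION explicitly.
        return build_version
    return f"Nightly Build ({installed_version})"


# Hypothetical installed version, for illustration only.
version = resolve_version(os.environ, "0.0.0.dev0")

# Compute the copyright year at build time instead of hard-coding it.
copyright_notice = f"{datetime.now().year}, Torchaudio Contributors"
```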

Pull Request resolved: https://github.com/pytorch/audio/pull/2152

Reviewed By: carolineechen, nateanl

Differential Revision: D33585053

Pulled By: mthrok

fbshipit-source-id: 3c2bf9fc3214c89f989f5ac65b74bc1e276a7161


06 Jan, 2022 1 commit

[DOC] Update prototype pipeline documentation (#2148) · 1ccd33ec

moto authored Jan 06, 2022

Summary:
- Unindent RNNTBundle components so that they show up on the right sidebar
- Overwrite the signature of RNNTBundle methods so that back links are available

 ---

## Before

<img width="1440" alt="Screen Shot 2022-01-06 at 1 36 16 PM" src="https://user-images.githubusercontent.com/855818/148433552-9ba3051d-38b1-4825-9a8f-9173b23650ea.png">

## After

<img width="1436" alt="Screen Shot 2022-01-06 at 1 35 39 PM" src="https://user-images.githubusercontent.com/855818/148433525-733d138d-9a8b-43d6-bdf5-444b52d6a7a9.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2148

Reviewed By: hwangjeff

Differential Revision: D33458574

Pulled By: mthrok

fbshipit-source-id: ac34ffc4070261563a1f4ea9337997f0fe7b2212


04 Jan, 2022 1 commit

Add custom CSS to make signatures appear in multi-line (#2123) · 832f055a

moto authored Jan 04, 2022

Summary:
* Before

https://pytorch.org/audio/main/models.html

<img width="852" alt="Screen Shot 2022-01-04 at 11 00 12 AM" src="https://user-images.githubusercontent.com/855818/148087255-3b94e63b-9870-4c7e-95c6-17acc1e65fef.png">

* After

https://503135-90321822-gh.circle-artifacts.com/0/docs/models.html

<img width="842" alt="Screen Shot 2022-01-04 at 10 59 40 AM" src="https://user-images.githubusercontent.com/855818/148087148-b951c7b0-d9cf-4014-8a50-b88c749f12ba.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2123

Reviewed By: carolineechen

Differential Revision: D33409661

Pulled By: mthrok

fbshipit-source-id: bb2dffea25ccc4356d257b2ab4a6e88f7f4e2bb3


31 Dec, 2021 1 commit

Update CTC Hypothesis docs (#2117) · 64c7e065

Caroline Chen authored Dec 30, 2021

Summary:
Add documentation for the CTC decoder `Hypothesis` and include it in the docs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2117

Reviewed By: mthrok

Differential Revision: D33370381

Pulled By: carolineechen

fbshipit-source-id: cf6501a499e5303cda0410f733f0fab4e1c39aff


30 Dec, 2021 1 commit

Enforce lint checks and fix/mute lint errors (#2116) · 8ed14782

Joao Gomes authored Dec 30, 2021

Summary:
cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/2116

Reviewed By: mthrok

Differential Revision: D33368453

Pulled By: jdsgomes

fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c


29 Dec, 2021 3 commits

Reorganize RNN-T components in prototype module (#2110) · 67cdf882

hwangjeff authored Dec 29, 2021

Summary:
Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`.

Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2110

Reviewed By: carolineechen, mthrok

Differential Revision: D33354116

Pulled By: hwangjeff

fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb


Update prototype documentations (#2108) · 10cce198

moto authored Dec 28, 2021

Summary:
### Change list

* Split the documentation of prototypes
* Add a new API reference section dedicated to prototypes.
* Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of RNNT constructor. (cc hwangjeff )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak CTC tutorial
  * Replace hyperlinks to API reference with backlinks
  * Add `progress=False` to download

### Follow-up

The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the `Hypothesis` of the CTC decoder to the documentation, the build process complained that the name is ambiguous.
I think each `Hypothesis` class could be placed inside its decoder (if TorchScript supports it) or given a different name, but in that case the interface of each `Hypothesis` has to be generic enough.

### Before

https://pytorch.org/audio/main/prototype.html

<img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png">

### After

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html

<img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html

<img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html

<img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2108

Reviewed By: hwangjeff, carolineechen, nateanl

Differential Revision: D33340816

Pulled By: mthrok

fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187


Add pretrained Emformer RNN-T streaming ASR inference pipeline (#2093) · 72a98a86

hwangjeff authored Dec 28, 2021

Summary:
Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR.

Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below).

https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2093

Reviewed By: mthrok

Differential Revision: D33340776

Pulled By: hwangjeff

fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac


28 Dec, 2021 4 commits

Add ASR CTC inference tutorial (#2106) · 133d0065

Caroline Chen authored Dec 28, 2021

Summary:
Demonstrate usage of the CTC beam search decoder with lexicon constraint and KenLM support on a LibriSpeech sample, using a pretrained wav2vec2 model.

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: https://github.com/pytorch/audio/pull/2106

Reviewed By: mthrok

Differential Revision: D33340946

Pulled By: carolineechen

fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7


Add HuBERT pretrain model to enable training from scratch (#2064) · 37a2555f

Zhaoheng Ni authored Dec 28, 2021

Summary:
- Add three factory functions, `hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`, to enable training the HuBERT model from scratch.
- Add a `num_classes` argument to the `hubert_pretrain_base` factory function because the base model is trained in two iterations: `num_cluster` is 100 in the first iteration and 500 in the second.
- The model takes `waveforms`, `labels`, and `lengths` as inputs.
- The model outputs the last-layer transformer embedding, `logit_m`, and `logit_u`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2064

Reviewed By: hwangjeff, mthrok

Differential Revision: D33338587

Pulled By: nateanl

fbshipit-source-id: 534bc17c576c5f344043d8ba098204b8da6e630a


Disable matplotlib warning in tutorial rendering (#2107) · 7bf04d1e

moto authored Dec 28, 2021

Summary:
*Before:*

https://pytorch.org/audio/main/tutorials/audio_data_augmentation_tutorial.html#effects-applied

<img width="831" alt="Screen Shot 2021-12-28 at 11 25 08 AM" src="https://user-images.githubusercontent.com/855818/147586457-55d566bf-f016-4327-a07e-5de68f80e984.png">

*After:*

https://484994-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html#effects-applied

<img width="830" alt="Screen Shot 2021-12-28 at 11 25 57 AM" src="https://user-images.githubusercontent.com/855818/147586531-90333201-b9e3-450f-a2d7-6fb987b7e9d9.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2107

Reviewed By: carolineechen

Differential Revision: D33337164

Pulled By: mthrok

fbshipit-source-id: 20e3309f0d11d46619f516dc46d967b34f22ec95


Add Sphinx gallery automatically (#2101) · eb8e8dc8

moto authored Dec 28, 2021

Summary:
This commit updates the documentation configuration so that if an API (function or class) is used in tutorials, links to those tutorials are automatically added to its documentation.

It also adds `py:func:` so that it's easy to jump from tutorials to the API reference.

Note: the use of `py:func:` is not required for an API to be recognized by Sphinx-Gallery.

* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions

<img width="776" alt="Screen Shot 2021-12-24 at 12 41 43 PM" src="https://user-images.githubusercontent.com/855818/147367407-cd86f114-7177-426a-b5ee-a25af17ae476.png">

* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr

<img width="769" alt="Screen Shot 2021-12-24 at 12 42 31 PM" src="https://user-images.githubusercontent.com/855818/147367422-01fd245f-2f25-4875-a206-910e17ae0161.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2101

Reviewed By: hwangjeff

Differential Revision: D33311283

Pulled By: mthrok

fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288


23 Dec, 2021 2 commits

Add Python CTC decoder API (#2089) · a76b0066

Caroline Chen authored Dec 23, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/2072 -- splitting up PR for easier review

This PR adds the Python decoder API and a basic README.

Pull Request resolved: https://github.com/pytorch/audio/pull/2089

Reviewed By: mthrok

Differential Revision: D33299818

Pulled By: carolineechen

fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc


Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
