Commits · 17c6af7f24b72228a608ae3f60ca4b57954497a5 · OpenDAS / Torchaudio

27 Feb, 2022 1 commit

Simplify setup_env.sh (#2265) · 17c6af7f

Nikita Shulga authored Feb 27, 2022

Summary:
Align the `setup_env.sh` scripts more closely with the one in
https://github.com/pytorch/vision/blob/main/.circleci/unittest/linux/scripts/setup_env.sh

This is a preliminary step towards eradicating unneeded conda-forge dependencies; see https://github.com/pytorch/audio/pull/2260

Pull Request resolved: https://github.com/pytorch/audio/pull/2265

Reviewed By: mthrok

Differential Revision: D34499635

Pulled By: malfet

fbshipit-source-id: f87a3e4568aeeab9c6787a777c3231153c4539f0


26 Feb, 2022 3 commits

Enable ffmpeg prototype unit test (#2261) · 955ffb47

Moto Hira authored Feb 25, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2261

Enables prototype ffmpeg io tests in fbcode.

Reviewed By: nateanl

Differential Revision: D33698353

fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036


Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``.
The method applies the beamforming weights to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
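
As a rough illustration of the operation described above, the sketch below combines the channels of one spectrum in plain Python, one time-frequency bin at a time. The function name `apply_weights`, the nested-list layout, and the use of conjugated weights (the common w^H y convention) are assumptions made for illustration; this is not the code added in this PR.

```python
def apply_weights(weights, spectrum):
    """Combine a multi-channel spectrum into a single channel.

    weights:  weights[f][c]      complex weight per frequency and channel
    spectrum: spectrum[c][f][t]  complex noisy spectrum per channel
    returns:  enhanced[f][t]     single-channel complex spectrum
    """
    n_channels = len(spectrum)
    n_freqs = len(spectrum[0])
    n_frames = len(spectrum[0][0])
    enhanced = []
    for f in range(n_freqs):
        row = []
        for t in range(n_frames):
            # Inner product w^H y over the channel axis for this bin.
            value = sum(
                weights[f][c].conjugate() * spectrum[c][f][t]
                for c in range(n_channels)
            )
            row.append(value)
        enhanced.append(row)
    return enhanced


# Two channels, one frequency bin, two frames.
weights = [[0.5 + 0j, 0.5j]]
spectrum = [
    [[1 + 1j, 2 + 0j]],  # channel 0
    [[1 - 1j, 0 + 2j]],  # channel 1
]
enhanced = apply_weights(weights, spectrum)
```

The output keeps the frequency and time axes and drops the channel axis, which is what "single-channel enhanced spectrum" refers to.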

Pull Request resolved: https://github.com/pytorch/audio/pull/2232

Reviewed By: mthrok

Differential Revision: D34474561

Pulled By: nateanl

fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d


Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The changes to the interface are:
1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
2. Make the `fill_buffer` method private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
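
The retry idea can be sketched in Python, even though the commit performs the retry in the C++ layer. Everything here is hypothetical: `process_with_retry`, `read_packet`, and the choice of seconds as the unit for `timeout` and `backoff` are assumptions for illustration, not the signature introduced by this commit.

```python
import errno
import time


def process_with_retry(read_packet, timeout=None, backoff=0.01):
    """Call `read_packet` until it succeeds, retrying on EAGAIN.

    read_packet: hypothetical callable that raises BlockingIOError(EAGAIN)
                 while the device buffer is not ready.
    timeout:     seconds to keep retrying; None retries indefinitely.
    backoff:     seconds to sleep between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_packet()
        except BlockingIOError as err:
            if err.errno != errno.EAGAIN:
                raise
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("device buffer not ready") from err
            time.sleep(backoff)


# Simulated device that becomes ready on the third attempt.
attempts = {"count": 0}


def fake_device():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise BlockingIOError(errno.EAGAIN, "resource temporarily unavailable")
    return b"packet"


packet = process_with_retry(fake_device, timeout=1.0, backoff=0.001)
```

Keeping the loop inside one blocking call means the caller does not have to catch an exception and re-enter the function on every `EAGAIN`.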

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59


25 Feb, 2022 6 commits

Add rtf_power method to torchaudio.functional (#2231) · ea74813d

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the ``rtf_power`` method to ``torchaudio.functional``.
The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206).
[This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English.
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of the noise, an int or one-hot Tensor indicating the reference channel, and the number of iterations.
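
For readers unfamiliar with power iteration, here is a generic plain-Python sketch that approximates the dominant eigenvector of a square matrix. It is not the `rtf_power` implementation, which also involves the noise PSD matrix and a reference channel; the function name and the example matrix are illustrative assumptions.

```python
def power_iteration(matrix, n_iter=50):
    """Approximate the dominant eigenvector of a square complex matrix."""
    size = len(matrix)
    vector = [1.0 + 0j] * size  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        # Multiply by the matrix, then renormalise to unit length.
        product = [
            sum(matrix[i][j] * vector[j] for j in range(size))
            for i in range(size)
        ]
        norm = sum(abs(x) ** 2 for x in product) ** 0.5
        vector = [x / norm for x in product]
    return vector


# Diagonal example: the dominant eigenvalue is 2, with eigenvector [1, 0].
example = [[2.0, 0.0], [0.0, 1.0]]
dominant = power_iteration(example)
```

Each iteration amplifies the component along the dominant eigenvector relative to the others, so the number of iterations trades accuracy against cost.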

Pull Request resolved: https://github.com/pytorch/audio/pull/2231

Reviewed By: mthrok

Differential Revision: D34474503

Pulled By: nateanl

fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb


ci: Limit scope of unittest to one python version (#2256) · 8c1db721

Eli Uriegas authored Feb 25, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2256



Limits the scope of unit testing to one Python version for both macOS and
Windows. These workflows are particularly expensive and take a
long time, so running them on every PR and every push is wasteful
considering that the difference in signal between Python versions is
probably negligible.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: mthrok

Differential Revision: D34459626

Pulled By: seemethere

fbshipit-source-id: 47f5c317027f1b395edf9c1720b1b33ba689cad5


Add rtf_evd method to torchaudio.functional (#2230) · 86fe4fa7

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the `rtf_evd` method to `torchaudio.functional`.
The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
The input argument is the power spectral density (PSD) matrix of the target speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2230

Reviewed By: mthrok

Differential Revision: D34474188

Pulled By: nateanl

fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134


Add mvdr_weights_rtf to torchaudio.functional (#2229) · 3566ffc5

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the ``mvdr_weights_rtf`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution that uses the relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for reference.
The input arguments are the complex-valued RTF Tensor of the target speech, the power spectral density (PSD) matrix of the noise, and an int or one-hot Tensor indicating the reference channel.
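
For orientation, the textbook RTF-based MVDR solution is usually written as below, where $\mathbf{v}(f)$ is the RTF (steering vector) and $\boldsymbol{\Phi}_{NN}(f)$ is the noise PSD matrix at frequency $f$. The notation is an assumption for illustration, and any reference-channel scaling applied by the method is omitted.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f) =
  \frac{\boldsymbol{\Phi}_{NN}^{-1}(f)\,\mathbf{v}(f)}
       {\mathbf{v}^{\mathsf{H}}(f)\,\boldsymbol{\Phi}_{NN}^{-1}(f)\,\mathbf{v}(f)}
```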

Pull Request resolved: https://github.com/pytorch/audio/pull/2229

Reviewed By: mthrok

Differential Revision: D34474119

Pulled By: nateanl

fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2


Add mvdr_weights_souden to torchaudio.functional (#2228) · 5d06a369

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the ``mvdr_weights_souden`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of the noise, and an int or one-hot Tensor indicating the reference channel.
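
The Souden solution is commonly written as follows, with $\boldsymbol{\Phi}_{SS}(f)$ and $\boldsymbol{\Phi}_{NN}(f)$ the PSD matrices of speech and noise and $\mathbf{u}$ the one-hot vector selecting the reference channel. The symbols are chosen here for illustration and may differ from the notation in the code or the paper.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f) =
  \frac{\boldsymbol{\Phi}_{NN}^{-1}(f)\,\boldsymbol{\Phi}_{SS}(f)}
       {\operatorname{trace}\!\left(\boldsymbol{\Phi}_{NN}^{-1}(f)\,\boldsymbol{\Phi}_{SS}(f)\right)}\,\mathbf{u}
```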

Pull Request resolved: https://github.com/pytorch/audio/pull/2228

Reviewed By: mthrok

Differential Revision: D34474018

Pulled By: nateanl

fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb


Add psd method to torchaudio.functional (#2227) · 07bd1aa3

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds the ``psd`` method to ``torchaudio.functional``.
It computes the power spectral density (PSD) matrix of a complex-valued spectrum.
The method also supports normalization of the time-frequency mask.
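
A minimal plain-Python sketch of a mask-weighted PSD estimate for a single frequency bin follows. The function name, the `frames[t][c]` layout, and the choice to always normalise the mask weights are simplifying assumptions; the actual method operates on Tensors and may normalise differently.

```python
def psd_matrix(frames, mask=None):
    """Estimate a PSD matrix for one frequency bin.

    frames: frames[t][c], complex spectrum value of channel c at frame t
    mask:   optional per-frame weights (a time-frequency mask for this bin)
    """
    n_channels = len(frames[0])
    if mask is None:
        mask = [1.0] * len(frames)
    total = sum(mask)  # normalise so the mask weights sum to one
    psd = [[0j] * n_channels for _ in range(n_channels)]
    for weight, frame in zip(mask, frames):
        for i in range(n_channels):
            for j in range(n_channels):
                # Accumulate the weighted outer product y y^H.
                psd[i][j] += weight / total * frame[i] * frame[j].conjugate()
    return psd


# Two frames, two channels.
frames = [[1 + 0j, 1j], [2 + 0j, 0j]]
psd = psd_matrix(frames, mask=[1.0, 1.0])
```

The result is Hermitian by construction, since each accumulated term is an outer product of a vector with its own conjugate.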

Pull Request resolved: https://github.com/pytorch/audio/pull/2227

Reviewed By: mthrok

Differential Revision: D34473908

Pulled By: nateanl

fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9


24 Feb, 2022 4 commits

Update release notes retrieve PRs script (#2257) · 34b53ee7

Caroline Chen authored Feb 24, 2022

Summary:
As discussed offline with nateanl, cherry-picked PRs are currently included when retrieving PRs between a release branch and newer commits. This PR fixes this by removing duplicates in the commit paths.

Pull Request resolved: https://github.com/pytorch/audio/pull/2257

Reviewed By: nateanl

Differential Revision: D34459533

Pulled By: carolineechen

fbshipit-source-id: 3497c1d2dca6f8067e2068146a6e28cce591d3c8


Fix style check (#2258) · 20488dd8

Caroline Chen authored Feb 24, 2022

Summary:
Fix a style check failure from an internal diff.

Pull Request resolved: https://github.com/pytorch/audio/pull/2258

Reviewed By: nateanl

Differential Revision: D34459526

Pulled By: carolineechen

fbshipit-source-id: d0e6782b5689c3bf63214a4ec6a75dd757678e0d


ci: Remove CUDA 11.1 from CI (#2259) · f6585d9e

Eli Uriegas authored Feb 24, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2259



We're deprecating support for CUDA 11.1 binaries since CUDA 11.3 should
be forwards compatible with CUDA 11.1 drivers
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D34458400

Pulled By: seemethere

fbshipit-source-id: 105d96a9a175a94d85ffe6e9abcce3c77163a72f


Add Python 3.10 (build and test) (#2224) · 27dff6ba

Andrey Talman authored Feb 24, 2022

Summary:
Adding py3.10 to audio

Pull Request resolved: https://github.com/pytorch/audio/pull/2224

Reviewed By: malfet, atalman, mthrok

Differential Revision: D34442377

Pulled By: seemethere

fbshipit-source-id: 2656de73427063958d609a74c01b526a476cb06a


23 Feb, 2022 1 commit

[lightning] Replace deprecated DDP accelerator with ddp_find_unused_parameters_false · 1fb10077

Binh Tang authored Feb 23, 2022

Summary: We proactively remove references to the deprecated DDP accelerator to prepare for the breaking changes following the release of PyTorch Lightning 1.6 (see T112240890).

Differential Revision: D34295318

fbshipit-source-id: 7b2245ca9c7c2900f510722b33af8d8eeda49919


18 Feb, 2022 2 commits

Apply minor fixes to Emformer implementation (#2252) · cbf1b839

hwangjeff authored Feb 17, 2022

Summary:
Noticed some items to clean up in `Emformer`.
- Make `segment_length` a required argument in `_EmformerLayer`.
- Remove unused variables from `_unpack_state` and `_gen_attention_mask`.

These don't affect `Emformer`'s functionality or public API.

Pull Request resolved: https://github.com/pytorch/audio/pull/2252

Reviewed By: carolineechen, mthrok

Differential Revision: D34321430

Pulled By: hwangjeff

fbshipit-source-id: 38a5046f633a3e625352c476ef71c78380ccc597


Update release notes labeling (#2249) · 3184aebc

Caroline Chen authored Feb 17, 2022

Summary:
- Fix the retrieve PR script to handle commits with unrecognized or invalid PR numbers, such as in 7b6b2d00.
- Add modifications similar to pytorch's [#71917](https://github.com/pytorch/pytorch/pull/71917) and [#72085](https://github.com/pytorch/pytorch/pull/72085).
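
One way such handling could look is sketched below: PR numbers are read from commit titles with a regular expression, titles without a recognisable number yield `None`, and repeated numbers are skipped. The helper names and the `(#1234)` title convention are assumptions for illustration; the actual script may work differently.

```python
import re

# Matches a trailing "(#1234)" in a commit title.
PR_PATTERN = re.compile(r"\(#(\d+)\)\s*$")


def extract_pr_number(title):
    """Return the PR number in a commit title, or None if there is none."""
    match = PR_PATTERN.search(title)
    return int(match.group(1)) if match else None


def unique_pr_numbers(titles):
    """Collect PR numbers in order, skipping duplicates and missing ones."""
    seen = set()
    ordered = []
    for title in titles:
        number = extract_pr_number(title)
        if number is not None and number not in seen:
            seen.add(number)
            ordered.append(number)
    return ordered


titles = [
    "Update release notes labeling (#2249)",
    "[lightning] Replace deprecated DDP accelerator",  # no PR number
    "Update release notes labeling (#2249)",  # e.g. a cherry-picked copy
]
numbers = unique_pr_numbers(titles)
```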

Pull Request resolved: https://github.com/pytorch/audio/pull/2249

Reviewed By: nateanl, mthrok

Differential Revision: D34304210

Pulled By: carolineechen

fbshipit-source-id: 245784219317e355b5cece4a139dee71d65bfdd1


17 Feb, 2022 4 commits

Refactor batch consistency test in functional (#2245) · 9cf59e75

Zhaoheng Ni authored Feb 17, 2022

Summary:
In batch_consistency tests, the `assert_batch_consistency` method only accepts a single Tensor, which is not applicable to some methods. For example, `lfilter` and `filtfilt` require three Tensors as arguments, so their tests cannot use `assert_batch_consistency`.
This PR refactors the test to accept a tuple of Tensors that have a `batch` dimension. Other arguments, such as `int` or `str` values, are given as `*args` after the tuple.
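
The calling convention described above can be sketched with plain lists standing in for Tensors. The helper below is a hypothetical simplification, not the test utility in the repository: it takes a tuple of batched inputs followed by non-batched `*args`, and checks that each item processed alone matches the corresponding slice of the batched result.

```python
def assert_batch_consistency(func, inputs, *args):
    """Check that `func` on a batch equals `func` on each item separately.

    inputs: tuple of batched inputs, each a list whose first axis is the batch
    args:   remaining non-batched arguments (ints, strings, ...)
    """
    batch_size = len(inputs[0])
    batched_output = func(*inputs, *args)
    for index in range(batch_size):
        # Slice out one item from every batched input, keeping a batch of one.
        single = tuple([batched[index]] for batched in inputs)
        single_output = func(*single, *args)
        assert single_output[0] == batched_output[index]


def scaled_sum(xs, ys, scale):
    """Toy batched function: element-wise scale * (x + y)."""
    return [scale * (x + y) for x, y in zip(xs, ys)]


# Two batched inputs in the tuple, one extra non-batched argument.
assert_batch_consistency(scaled_sum, ([1, 2, 3], [10, 20, 30]), 2)
```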

Pull Request resolved: https://github.com/pytorch/audio/pull/2245

Reviewed By: mthrok

Differential Revision: D34273035

Pulled By: nateanl

fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056


Update the main version to 0.12.0 (#2250) · 27a6dccc

Zhaoheng Ni authored Feb 17, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2250

Reviewed By: mthrok

Differential Revision: D34302192

Pulled By: nateanl

fbshipit-source-id: 4ea7047503ef87e22b5ef6075ad010314d5e3885


Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15

Zhaoheng Ni authored Feb 17, 2022

Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
- Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.
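
The reason `partial` helps is that the standard `pickle` module serialises functions by reference to an importable name, which a lambda does not have. The snippet below demonstrates this with standard-library functions only; the variable names are illustrative and unrelated to the recipe code.

```python
import functools
import operator
import pickle

# A lambda has no importable name, so pickle cannot serialise it.
scale_lambda = lambda x: 2 * x
try:
    pickle.dumps(scale_lambda)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False

# functools.partial over an importable function pickles by reference.
scale_partial = functools.partial(operator.mul, 2)
restored = pickle.loads(pickle.dumps(scale_partial))
result = restored(21)
```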

Pull Request resolved: https://github.com/pytorch/audio/pull/2240

Reviewed By: mthrok

Differential Revision: D34285195

Pulled By: nateanl

fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f


Update online ASR tutorial (#2226) · c5c4bbfd

moto authored Feb 16, 2022

Summary:
https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

1. Add figure to explain the caching
2. Fix the initialization of stream iterator

Pull Request resolved: https://github.com/pytorch/audio/pull/2226

Reviewed By: carolineechen

Differential Revision: D34265971

Pulled By: mthrok

fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4


16 Feb, 2022 10 commits

Add EMFORMER_RNNT_BASE_MUSTC into pipeline demo script (#2248) · 38569ef0

Zhaoheng Ni authored Feb 16, 2022

Summary:
This PR adds ``EMFORMER_RNNT_BASE_MUSTC`` support in `pipeline_demo.py`. The bundle is trained on the MuST-C release 2.0 dataset. The model preserves the casing and punctuation in the transcript.

Here is a screen recording of how it works in streaming and non-streaming modes:

https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2248

Reviewed By: hwangjeff

Differential Revision: D34282598

Pulled By: nateanl

fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4


Refactor torchscript consistency test in functional (#2246) · 87d79889

Zhaoheng Ni authored Feb 16, 2022

Summary:
In torchscript_consistency tests, the `func` in each test method only accepts one `tensor` as its argument; the other arguments of the `F.xyz` method need to be defined inside `func`. If `F.xyz` has no `Tensor` argument, the tests use a `dummy` tensor that is not used anywhere. In this PR, we refactor ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of just one `tensor`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2246

Reviewed By: carolineechen

Differential Revision: D34273057

Pulled By: nateanl

fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b


Refactor pipeline_demo script in emformer_rnnt recipes (#2239) · fdea0a7c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Use dictionary to select the `RNNTBundle` and the corresponding dataset.
- Use the dictionary's keys as choices in ArgumentParser
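
The pattern in the two points above might look roughly like this. The `Config` dataclass, the registry keys, the string placeholders, and the `--model` option name are all hypothetical stand-ins; the real script maps to actual bundle objects and dataset classes.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Config:
    bundle_name: str   # placeholder for an RNNTBundle object
    dataset_name: str  # placeholder for the matching dataset class


# Hypothetical registry; keys and values are illustrative only.
MODEL_CONFIGS = {
    "librispeech": Config("EMFORMER_RNNT_BASE_LIBRISPEECH", "LIBRISPEECH"),
    "tedlium3": Config("EMFORMER_RNNT_BASE_TEDLIUM3", "TEDLIUM"),
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model",
        choices=sorted(MODEL_CONFIGS),  # the keys double as valid choices
        required=True,
    )
    return parser.parse_args(argv)


args = parse_args(["--model", "tedlium3"])
config = MODEL_CONFIGS[args.model]
```

Because the choices are derived from the dictionary, adding a new bundle only requires a new entry in the registry.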

Pull Request resolved: https://github.com/pytorch/audio/pull/2239

Reviewed By: mthrok

Differential Revision: D34267070

Pulled By: nateanl

fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220


Refactor eval and pipeline_demo scripts in emformer_rnnt (#2238) · e3b40d1c

Zhaoheng Ni authored Feb 16, 2022

Summary:
- Add docstring to `eval.py` and `pipeline_demo.py` under `emformer_rnnt` directory.
- Refactor logger and ArgumentParser

Pull Request resolved: https://github.com/pytorch/audio/pull/2238

Reviewed By: mthrok

Differential Revision: D34267059

Pulled By: nateanl

fbshipit-source-id: 4b8d3d183ee7bc0ad71ce305cab87bfa90208b2e


Add complex dtype support in functional autograd test (#2244) · eeba91dc

Zhaoheng Ni authored Feb 16, 2022

Summary:
In autograd tests, to guarantee precision, Tensors are converted to `torch.float64` if they are real. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support to the inputs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2244

Reviewed By: mthrok

Differential Revision: D34272998

Pulled By: nateanl

fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5


Fix lm used for ctc decoder example (#2235) · c2decba4

Caroline Chen authored Feb 16, 2022

Summary:
The LM in the example script was unintentionally changed to None when no-LM support was added. This PR changes it back, making it consistent with the WERs listed in the readme.

Pull Request resolved: https://github.com/pytorch/audio/pull/2235

Reviewed By: nateanl

Differential Revision: D34273042

Pulled By: carolineechen

fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812


Add shebang lines to scripts in emformer_rnnt recipes (#2237) · aac83fe5

Zhaoheng Ni authored Feb 16, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237

Reviewed By: mthrok

Differential Revision: D34267000

Pulled By: nateanl

fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d


Add EMFORMER_RNNT_BASE_MUSTC bundle to torchaudio.prototype (#2241) · 99b5ef5c

Zhaoheng Ni authored Feb 16, 2022

Summary:
This PR provides an RNNTBundle that is pre-trained on the MuST-C release v2.0 dataset.
The casing and punctuation of the transcripts are preserved when training the SentencePiece model.

Here is the model performance on the dev and test sets of MuST-C 2.0:
|                   |          WER |
|:-----------------:|-------------:|
| dev               |       0.190  |
| tst-COMMON        |       0.213  |
| tst-HE            |       0.186  |

Pull Request resolved: https://github.com/pytorch/audio/pull/2241

Reviewed By: mthrok

Differential Revision: D34267792

Pulled By: nateanl

fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167


Refactor ArgumentParser arguments in emformer_rnnt recipes (#2236) · 81f56f64

Zhaoheng Ni authored Feb 16, 2022

Summary:
Replace underscore with dash in ArgumentParser's arguments.
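
With `argparse`, dashes in long option names are converted to underscores in the parsed attribute names, so a rename like this changes the command line but not how the values are read in code. The option names below are hypothetical and used only to show the mapping.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option names, used only to show the dash/underscore mapping.
parser.add_argument("--librispeech-path", type=str)
parser.add_argument("--num-workers", type=int, default=0)

args = parser.parse_args(["--librispeech-path", "/data/ls", "--num-workers", "4"])
# argparse converts dashes in long option names to underscores in attributes.
path = args.librispeech_path
workers = args.num_workers
```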

Pull Request resolved: https://github.com/pytorch/audio/pull/2236

Reviewed By: mthrok

Differential Revision: D34266977

Pulled By: nateanl

fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21


Fix prototype exclusion in release (#2225) · a007e922

moto authored Feb 15, 2022

Summary:
This commit fixes the feature to exclude `torchaudio.prototype` module.

In `setup.py` there is a special case, triggered when the commit is on a release branch or release tag, that excludes `torchaudio.prototype`. This was introduced to simplify release-related work.
It turned out that the submodules under `torchaudio.prototype`, such as `torchaudio.prototype.pipelines`, were not properly excluded from packaging.
These submodules did not exist in previous releases, so it was not an issue until now.
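
The behaviour can be reproduced with wildcard matching on dotted package names, which is how `setuptools.find_packages` interprets its `exclude` patterns. The sketch below uses `fnmatch` directly and a hand-written package list; whether the fix in this commit uses exactly these patterns is an assumption.

```python
from fnmatch import fnmatchcase


def is_excluded(package, patterns):
    """Return True if the dotted package name matches any exclude pattern."""
    return any(fnmatchcase(package, pattern) for pattern in patterns)


packages = [
    "torchaudio",
    "torchaudio.models",
    "torchaudio.prototype",
    "torchaudio.prototype.io",
    "torchaudio.prototype.pipelines",
]

# Excluding only the parent name leaves its subpackages in the list.
parent_only = [p for p in packages if not is_excluded(p, ["torchaudio.prototype"])]

# Adding a wildcard pattern removes the subpackages as well.
with_wildcard = [
    p
    for p in packages
    if not is_excluded(p, ["torchaudio.prototype", "torchaudio.prototype.*"])
]
```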

**Note** This feature is triggered only on a release branch, so the fix is not visible in the CI of this PR.
https://app.circleci.com/pipelines/github/pytorch/audio/9674/workflows/d0c9a6f1-8ca9-441a-a5f5-08926075fa39/jobs/553985?invite=true#step-104-193

The following outputs were observed when running it in a local environment.

* Before the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```
```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
 --- Initializing submodules
 --- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/streamer.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/rnnt_pipeline.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/ctc_decoder.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
warning: build_py: byte-compiling is disabled, skipping.
```

* After the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
 --- Initializing submodules
 --- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/model.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/components.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
warning: build_py: byte-compiling is disabled, skipping.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2225

Reviewed By: nateanl

Differential Revision: D34257128

Pulled By: mthrok

fbshipit-source-id: a3d6eca5803356e5aa3fe0eda82f6a9f5affb8e8


15 Feb, 2022 3 commits

Improve ffmpeg library discovery (#2204) · 963905e4

moto authored Feb 15, 2022

Summary:
This commit fixes the issue with ffmpeg discovery at build time.
The original implementation had the following issues:

1. Incorrect usage of FindFFMPEG, which caused a mixture of ffmpeg libraries from the system directory and the user directory.
2. The optional `FFMPEG_ROOT` variable was not set within cmake.

Issue 1 is problematic when a user does not have permission to
modify the environment. For example, if an old version of ffmpeg is
installed in a directory managed by the system (such as `/usr/local/lib`),
there is no way to specify the path in which the user has installed a supported version
of ffmpeg.

This commit changes the behavior by first searching for the library
in the location given by the `FFMPEG_ROOT` environment variable, then
falling back to the original behavior of searching the custom paths together with
the system default path.
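
The search order can be illustrated in Python, although the real logic lives in the CMake build scripts. The function name, the `lib` subdirectory, and the example paths are assumptions for illustration only.

```python
import os
from pathlib import Path


def candidate_dirs(default_dirs, environ=os.environ):
    """List library directories to search, FFMPEG_ROOT first."""
    candidates = []
    root = environ.get("FFMPEG_ROOT")
    if root:
        # Assumed layout: libraries live under <FFMPEG_ROOT>/lib.
        candidates.append(Path(root) / "lib")
    candidates.extend(Path(d) for d in default_dirs)
    return candidates


dirs = candidate_dirs(
    ["/usr/local/lib", "/usr/lib"],
    environ={"FFMPEG_ROOT": "/opt/ffmpeg"},
)
```

Putting the user-specified location first lets a supported ffmpeg build take precedence over an older copy in a system directory.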

Also, this commit removes support for `libavresample`, which is deprecated in
ffmpeg 4 and removed in ffmpeg 5.

Pull Request resolved: https://github.com/pytorch/audio/pull/2204

Reviewed By: carolineechen

Differential Revision: D34225769

Pulled By: mthrok

fbshipit-source-id: 95b0bfaaef31e2e69e6df29f789010f48a48210b


Update context building to not delay the inference (#2213) · 8e3c6144

moto authored Feb 14, 2022

Summary:
Updates the context cacher so that a fetched audio chunk is used for inference immediately.

https://github.com/pytorch/audio/pull/2202#discussion_r802838174
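
A context cacher of this kind can be sketched with plain lists: each new chunk is returned right away, prefixed with the tail of the previous chunk, so inference does not wait for a later chunk. The class below is a simplified assumption of how such a cacher could look, not the tutorial's code, which works on Tensors.

```python
class ContextCacher:
    """Prepend the tail of the previous chunk to each new chunk."""

    def __init__(self, context_length):
        self.context_length = context_length
        self.context = [0.0] * context_length  # silence before the first chunk

    def __call__(self, chunk):
        chunk_with_context = self.context + chunk
        # Remember the tail of this chunk for the next call.
        self.context = chunk[-self.context_length:]
        return chunk_with_context


cacher = ContextCacher(context_length=2)
first = cacher([1.0, 2.0, 3.0, 4.0])
second = cacher([5.0, 6.0, 7.0, 8.0])
```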

Pull Request resolved: https://github.com/pytorch/audio/pull/2213

Reviewed By: hwangjeff

Differential Revision: D34235230

Pulled By: mthrok

fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e


Adjust Conformer args (#2223) · 411b5dcf

hwangjeff authored Feb 14, 2022

Summary:
Orders and names Conformer's initializer args to be more consistent with Emformer's.

Pull Request resolved: https://github.com/pytorch/audio/pull/2223

Reviewed By: mthrok

Differential Revision: D34226177

Pulled By: hwangjeff

fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829


11 Feb, 2022 6 commits

Add fixed random seed for Emformer RNN-T recipe test (#2220) · bc0fcadb

hwangjeff authored Feb 11, 2022

Summary:
Adds fixed random seed to Emformer RNN-T training recipe test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2220

Reviewed By: nateanl

Differential Revision: D34180644

Pulled By: hwangjeff

fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9


Add training recipe for Emformer RNNT trained on MuST-C release v2.0 dataset (#2219) · 4d0095a5

nateanl authored Feb 11, 2022

Summary:
- Add a MUSTC dataset under examples
- Add a lightning module for MuST-C dataset
- Refactor `train.py`, `eval.py`, and `global_stats.py` scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2219

Reviewed By: hwangjeff

Differential Revision: D34180466

Pulled By: nateanl

fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164


Add SentencePiece model training script for LibriSpeech Emformer RNN-T (#2218) · 825a5976

hwangjeff authored Feb 11, 2022

Summary:
Adds SentencePiece model training script for LibriSpeech Emformer RNN-T example recipe; updates readme with references.

Pull Request resolved: https://github.com/pytorch/audio/pull/2218

Reviewed By: nateanl

Differential Revision: D34177295

Pulled By: hwangjeff

fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43


Pass bias and dropout args to Conformer convolution block (#2215) · 738d2f8e

hwangjeff authored Feb 11, 2022

Summary:
Modifies `ConformerLayer` to pass `bias=True` (to be consistent with feedforward network defaults) and `dropout=dropout` (omission was a bug) to the convolution block.

Pull Request resolved: https://github.com/pytorch/audio/pull/2215

Reviewed By: carolineechen, nateanl

Differential Revision: D34164345

Pulled By: hwangjeff

fbshipit-source-id: 59fc804a1fe3b96e69e9fa5a2f9de94194d7bc55


Refactor pipeline_demo.py to support variant EMFORMER_RNNT bundles (#2203) · 16d02a9e

nateanl authored Feb 11, 2022

Summary:
We refactored the demo script so that it can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming modes. (The first hypothesis prediction is streaming and the second one is non-streaming.)

We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.
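
The manual join can be sketched as follows, assuming SentencePiece-style pieces in which the symbol U+2581 marks the start of a word. The helper name and the example pieces are illustrative, not taken from the demo script.

```python
WORD_BOUNDARY = "\u2581"  # SentencePiece marks a word start with this symbol


def pieces_to_text(pieces):
    """Join word pieces, turning each boundary marker into a space."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ")


# Pieces from two consecutive streaming segments (illustrative values).
first = pieces_to_text(["\u2581hel", "lo", "\u2581wor"])
second = pieces_to_text(["ld", "\u2581again"])
transcript = first + second
```

Because the second segment starts without a boundary marker, its text continues the word begun in the first segment, while a leading marker would have produced a space.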

https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2203

Reviewed By: carolineechen

Differential Revision: D34006388

Pulled By: nateanl

fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632


Add unit tests for Emformer RNN-T LibriSpeech recipe (#2216) · bbdbd582

hwangjeff authored Feb 11, 2022

Summary:
Adds unit tests for the Emformer RNN-T LibriSpeech recipe. Also changes the recipe to resolve errors with pickling lambda functions on Windows.

Pull Request resolved: https://github.com/pytorch/audio/pull/2216

Reviewed By: nateanl

Differential Revision: D34171480

Pulled By: hwangjeff

fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e
