- 16 Feb, 2022 6 commits
-
-
Zhaoheng Ni authored
Summary: In autograd tests, to guarantee precision, the dtype of Tensors is converted to `torch.float64` if they are real. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support to the inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/2244 Reviewed By: mthrok Differential Revision: D34272998 Pulled By: nateanl fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
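For context, a minimal sketch of the dtype promotion described above, assuming a helper along these lines (the names are illustrative, not the test suite's actual code):

```python
import torch

def promote_for_gradcheck(tensor: torch.Tensor) -> torch.Tensor:
    # Real floating-point inputs are promoted to float64, and complex inputs
    # to complex128 (two float64 components), so autograd checks run at
    # high precision for both real and complex cases.
    if tensor.is_complex():
        return tensor.to(torch.complex128)
    if tensor.is_floating_point():
        return tensor.to(torch.float64)
    return tensor

x = torch.randn(3, dtype=torch.complex64)
assert promote_for_gradcheck(x).dtype == torch.complex128
```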
-
Caroline Chen authored
Summary: The LM in the example script was unintentionally changed to None when no-LM support was added previously. This changes it back, consistent with the WERs listed in the README. Pull Request resolved: https://github.com/pytorch/audio/pull/2235 Reviewed By: nateanl Differential Revision: D34273042 Pulled By: carolineechen fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237 Reviewed By: mthrok Differential Revision: D34267000 Pulled By: nateanl fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d
-
Zhaoheng Ni authored
Summary: This PR provides an RNNTBundle that is pre-trained on the MuST-C release v2.0 dataset. The model preserves the casing and punctuation of the transcripts when training the SentencePiece model. Here is the model performance on the dev and test sets of MuST-C 2.0:

|            |   WER |
|:----------:|------:|
| dev        | 0.190 |
| tst-COMMON | 0.213 |
| tst-HE     | 0.186 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2241 Reviewed By: mthrok Differential Revision: D34267792 Pulled By: nateanl fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167
-
Zhaoheng Ni authored
Summary: Replace underscores with dashes in ArgumentParser arguments. Pull Request resolved: https://github.com/pytorch/audio/pull/2236 Reviewed By: mthrok Differential Revision: D34266977 Pulled By: nateanl fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21
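As a generic illustration (not the recipe's actual flags): `argparse` maps dashes in option names to underscores in the parsed namespace, so the change only affects how flags are spelled on the command line.

```python
import argparse

parser = argparse.ArgumentParser()
# Dash-style flag on the command line ...
parser.add_argument("--checkpoint-path", default="model.pt")
args = parser.parse_args(["--checkpoint-path", "best.pt"])
# ... is still accessed via an underscore attribute name.
print(args.checkpoint_path)  # "best.pt"
```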
-
moto authored
Summary: This commit fixes the feature that excludes the `torchaudio.prototype` module. In `setup.py` there is a special case, triggered when the commit is on a release branch or release tag, that excludes `torchaudio.prototype`. This was introduced to make release-related work easier. It turned out that the submodules under `torchaudio.prototype`, such as `torchaudio.prototype.pipelines`, were not properly excluded from packaging. These submodules did not exist in previous releases, so this was not an issue before.

**Note**: This feature is triggered only on a release branch, so the fix is not visible in the CI of this PR. https://app.circleci.com/pipelines/github/pytorch/audio/9674/workflows/d0c9a6f1-8ca9-441a-a5f5-08926075fa39/jobs/553985?invite=true#step-104-193

The following outputs were observed when running it in a local environment.

* Before the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
--- Initializing submodules
--- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/streamer.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/rnnt_pipeline.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/ctc_decoder.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
warning: build_py: byte-compiling is disabled, skipping.
```

* After the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
--- Initializing submodules
--- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/model.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/components.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
warning: build_py: byte-compiling is disabled, skipping.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2225 Reviewed By: nateanl Differential Revision: D34257128 Pulled By: mthrok fbshipit-source-id: a3d6eca5803356e5aa3fe0eda82f6a9f5affb8e8
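The underlying pitfall is that a `setuptools` exclude pattern matches only the named package, not its children; subpackages need an explicit wildcard. A hedged sketch of the idea (torchaudio's actual `setup.py` logic is more involved):

```python
from setuptools import find_packages

# Excluding only "torchaudio.prototype" still keeps torchaudio.prototype.io,
# torchaudio.prototype.pipelines, etc. in the package list.
incomplete = find_packages(exclude=["torchaudio.prototype"])

# Adding the wildcard pattern excludes the subpackages as well.
complete = find_packages(exclude=["torchaudio.prototype", "torchaudio.prototype.*"])
```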
-
- 15 Feb, 2022 3 commits
-
-
moto authored
Summary: This commit fixes the issue with ffmpeg discovery at build time. The original implementation had the following issues: 1. Incorrect usage of FindFFMPEG, which caused a mixture of ffmpeg libraries from the system directory and the user directory. 2. The optional `FFMPEG_ROOT` variable was not honored within CMake. Issue 1 is problematic when a user does not have permission to modify the environment: if an old version of ffmpeg is installed in a directory managed by the system (such as `/usr/local/lib`), there is no way to point the build at a supported version of ffmpeg installed elsewhere. This commit changes the behavior to first search for the libraries in the `FFMPEG_ROOT` environment variable, then fall back to the original behavior of searching the custom paths along with the system default paths. This commit also removes support for `libavresample`, which is deprecated in ffmpeg 4 and removed in ffmpeg 5. Pull Request resolved: https://github.com/pytorch/audio/pull/2204 Reviewed By: carolineechen Differential Revision: D34225769 Pulled By: mthrok fbshipit-source-id: 95b0bfaaef31e2e69e6df29f789010f48a48210b
-
moto authored
Summary: Updating the context cacher so that the fetched audio chunk is used for inference immediately. https://github.com/pytorch/audio/pull/2202#discussion_r802838174 Pull Request resolved: https://github.com/pytorch/audio/pull/2213 Reviewed By: hwangjeff Differential Revision: D34235230 Pulled By: mthrok fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e
-
hwangjeff authored
Summary: Orders and names Conformer's initializer args to be more consistent with Emformer's. Pull Request resolved: https://github.com/pytorch/audio/pull/2223 Reviewed By: mthrok Differential Revision: D34226177 Pulled By: hwangjeff fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829
-
- 11 Feb, 2022 7 commits
-
-
hwangjeff authored
Summary: Adds a fixed random seed to the Emformer RNN-T training recipe test. Pull Request resolved: https://github.com/pytorch/audio/pull/2220 Reviewed By: nateanl Differential Revision: D34180644 Pulled By: hwangjeff fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9
-
nateanl authored
Summary:
- Add a MUSTC dataset under examples
- Add a lightning module for the MuST-C dataset
- Refactor `train.py`, `eval.py`, and `global_stats.py` scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2219 Reviewed By: hwangjeff Differential Revision: D34180466 Pulled By: nateanl fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164
-
hwangjeff authored
Summary: Adds a SentencePiece model training script for the LibriSpeech Emformer RNN-T example recipe; updates the README with references. Pull Request resolved: https://github.com/pytorch/audio/pull/2218 Reviewed By: nateanl Differential Revision: D34177295 Pulled By: hwangjeff fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43
-
hwangjeff authored
Summary: Modifies `ConformerLayer` to pass `bias=True` (to be consistent with the feedforward network defaults) and `dropout=dropout` (the omission was a bug) to the convolution block. Pull Request resolved: https://github.com/pytorch/audio/pull/2215 Reviewed By: carolineechen, nateanl Differential Revision: D34164345 Pulled By: hwangjeff fbshipit-source-id: 59fc804a1fe3b96e69e9fa5a2f9de94194d7bc55
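A rough sketch of the pattern being fixed, with hypothetical module names (this is not torchaudio's internal code): the layer forwards `dropout` and `bias` explicitly to its convolution block instead of relying on the block's defaults.

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """Hypothetical depthwise convolution block used inside a Conformer-style layer."""

    def __init__(self, input_dim: int, kernel_size: int, dropout: float, bias: bool) -> None:
        super().__init__()
        self.conv = nn.Conv1d(
            input_dim, input_dim, kernel_size,
            padding=kernel_size // 2, groups=input_dim, bias=bias,
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, time, feature) -> convolve over time -> back to (batch, time, feature)
        return self.dropout(self.conv(x.transpose(1, 2)).transpose(1, 2))


class ConformerLayerSketch(nn.Module):
    def __init__(self, input_dim: int, kernel_size: int, dropout: float) -> None:
        super().__init__()
        # The fix: pass bias=True and dropout=dropout through, rather than
        # leaving the convolution block at its defaults.
        self.conv_block = ConvBlock(input_dim, kernel_size, dropout=dropout, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection; assumes an odd kernel_size so shapes match.
        return x + self.conv_block(x)
```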
-
nateanl authored
Summary: We refactored the demo script so that it can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming mode (the first hypothesis prediction is streaming and the second one is non-streaming). We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR. https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2203 Reviewed By: carolineechen Differential Revision: D34006388 Pulled By: nateanl fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632
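Roughly, the two modes look like the sketch below, assuming the `RNNTBundle` interface (`get_decoder`, `get_token_processor`, `get_feature_extractor`, `get_streaming_feature_extractor`) and the `segment_length`/`hop_length` bundle attributes used in the online ASR tutorial; the input file, beam width, and the omission of a right-context cacher are simplifications, not the demo script itself.

```python
import torch
import torchaudio
from torchaudio.pipelines import EMFORMER_RNNT_BASE_LIBRISPEECH

bundle = EMFORMER_RNNT_BASE_LIBRISPEECH
decoder = bundle.get_decoder()
token_processor = bundle.get_token_processor()

waveform, sample_rate = torchaudio.load("sample.wav")  # placeholder input, assumed 16 kHz mono
waveform = waveform[0]

# Non-streaming: decode the whole utterance in one call.
feature_extractor = bundle.get_feature_extractor()
with torch.no_grad():
    features, length = feature_extractor(waveform)
    hypotheses = decoder(features, length, beam_width=10)
print(token_processor(hypotheses[0][0]))

# Streaming: feed fixed-size segments and carry the decoder state and current
# best hypothesis across calls (a real setup also buffers right-context frames).
streaming_extractor = bundle.get_streaming_feature_extractor()
segment_samples = bundle.segment_length * bundle.hop_length
state, hypothesis = None, None
with torch.no_grad():
    for start in range(0, waveform.numel(), segment_samples):
        segment = waveform[start : start + segment_samples]
        segment = torch.nn.functional.pad(segment, (0, segment_samples - segment.numel()))
        features, length = streaming_extractor(segment)
        hypos, state = decoder.infer(features, length, beam_width=10, state=state, hypothesis=hypothesis)
        hypothesis = hypos[0]
        print(token_processor(hypothesis[0]), end="", flush=True)
```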
-
hwangjeff authored
Summary: Adds unit tests for the Emformer RNN-T LibriSpeech recipe. Also makes changes to the recipe to resolve errors with pickling lambda functions on Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/2216 Reviewed By: nateanl Differential Revision: D34171480 Pulled By: hwangjeff fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e
-
hwangjeff authored
Summary:
- Removes 100-batch truncation in TEDLIUM3 recipe.
- Reinstates `train_spm.py` for TEDLIUM3.

Pull Request resolved: https://github.com/pytorch/audio/pull/2217 Reviewed By: nateanl Differential Revision: D34171525 Pulled By: hwangjeff fbshipit-source-id: 54698e5e1b094c26c28eec9b8b1722223077876c
-
- 10 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Consolidates the LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes into a single directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2212 Reviewed By: mthrok Differential Revision: D34120104 Pulled By: hwangjeff fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456
-
- 09 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary:
- Make `segment_length` a required argument rather than an optional one, to force users to consciously choose input segment lengths for their use cases.
- Clarify expected input shapes in the API documentation.
- Adjust `infer` tests to reflect expected usage.
- Add an assertion on input shape for `infer`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2207 Reviewed By: mthrok Differential Revision: D34101205 Pulled By: hwangjeff fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
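A minimal usage sketch under these semantics, assuming the `torchaudio.models.Emformer` signature with `segment_length` as a required constructor argument and inputs shaped `(batch, frames, feature_dim)`; the hyperparameter values and length handling are illustrative only.

```python
import torch
from torchaudio.models import Emformer

emformer = Emformer(
    input_dim=80,
    num_heads=4,
    ffn_dim=512,
    num_layers=4,
    segment_length=16,           # now required: choose it consciously per use case
    right_context_length=4,
)

# forward(): utterance frames right-padded with right-context frames,
# i.e. (batch, num_valid_frames + right_context_length, input_dim).
inputs = torch.rand(8, 400 + 4, 80)
lengths = torch.randint(1, 400, (8,))
output, output_lengths = emformer(inputs, lengths)

# infer(): one streaming step consumes segment_length + right_context_length frames.
segment = torch.rand(8, 16 + 4, 80)
segment_lengths = torch.full((8,), 16 + 4, dtype=torch.int64)  # assumed: frames per segment
output, output_lengths, states = emformer.infer(segment, segment_lengths, states=None)
```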
-
hwangjeff authored
Summary: Yesterday's release of librosa 0.9.0 made args keyword-only and changed the default padding from "reflect" to "zero" for some functions. This PR adjusts call sites in our tutorials and tests accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2208 Reviewed By: mthrok Differential Revision: D34099793 Pulled By: hwangjeff fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
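For illustration, a hedged sketch of the kind of adjustment this implies: pass arguments by keyword and pin the padding mode explicitly rather than relying on defaults (the exact functions touched are listed in the PR; `librosa.stft` is used here only as an example).

```python
import numpy as np
import librosa

waveform = np.random.randn(16000).astype(np.float32)

# librosa >= 0.9 makes most arguments keyword-only, so pass them by name,
# and pin pad_mode explicitly instead of relying on the (changed) default.
spec = librosa.stft(y=waveform, n_fft=400, hop_length=160, pad_mode="reflect")
mel = librosa.feature.melspectrogram(y=waveform, sr=16000, n_fft=400, hop_length=160)
```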
-
- 04 Feb, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177 Reviewed By: hwangjeff Differential Revision: D33893052 Pulled By: nateanl fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7
-
- 03 Feb, 2022 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2199 Reviewed By: hwangjeff Differential Revision: D33979923 Pulled By: nateanl fbshipit-source-id: 566ba1944dd3511fee740ac17fea2dcb0e5810fa
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2195 Reviewed By: hwangjeff Differential Revision: D33950179 Pulled By: nateanl fbshipit-source-id: 5fcfa4f433fffdcbb3b8e97f7c90fb8f723a30a2
-
moto authored
Summary:
* tutorial for streaming API: https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T: https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193 Reviewed By: hwangjeff Differential Revision: D33971312 Pulled By: mthrok fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f
-
- 02 Feb, 2022 5 commits
-
-
Caroline Chen authored
Summary: resulting tutorial: https://538358-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html
- add visualization for timestep alignments
- modify section organization for decoder construction

Pull Request resolved: https://github.com/pytorch/audio/pull/2188 Reviewed By: mthrok Differential Revision: D33954937 Pulled By: carolineechen fbshipit-source-id: 8f397229d74c994b8793a30623e1de4c19ebd401
-
hwangjeff authored
Summary: Rather than apply SentencePiece's `decode` to directly convert each hypothesis's token id sequence to an output string, we convert each token id sequence to word pieces and then manually join the word pieces ourselves. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR. https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov Versus the previous behavior visualized in https://github.com/pytorch/audio/issues/2093, the scheme here properly constructs words comprising multiple pieces. Pull Request resolved: https://github.com/pytorch/audio/pull/2192 Reviewed By: mthrok Differential Revision: D33936622 Pulled By: hwangjeff fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9
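A minimal sketch of the piece-joining idea, assuming a standard `sentencepiece` processor; the model path and `lstrip` flag are hypothetical.

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")  # hypothetical model path

def token_ids_to_text(token_ids, lstrip=False):
    # Convert ids to word pieces and join them manually, instead of calling
    # sp.decode(). SentencePiece marks word boundaries with U+2581 ("▁"),
    # so replacing it with a space preserves the leading whitespace that
    # signals a word break or continuation across streaming invocations.
    pieces = [sp.id_to_piece(i) for i in token_ids]
    text = "".join(pieces).replace("\u2581", " ")
    return text.lstrip() if lstrip else text
```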
-
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on ffmpeg libraries. For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
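A minimal usage sketch, assuming the prototype `Streamer` interface shown in the linked tutorial (`add_basic_audio_stream` and `stream`); the source path and chunk size are placeholders.

```python
from torchaudio.prototype.io import Streamer

# Open a media source (file, URL, or device) and pull decoded audio in chunks.
streamer = Streamer("example.wav")  # placeholder source
streamer.add_basic_audio_stream(frames_per_chunk=16000, sample_rate=16000)

for (chunk,) in streamer.stream():
    # chunk is a Tensor of shape (frames_per_chunk, num_channels)
    _ = chunk.mean()  # stand-in for real per-chunk processing
```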
-
Nikita Shulga authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2190 Reviewed By: mthrok Differential Revision: D33930129 Pulled By: malfet fbshipit-source-id: ddcbe79f6bdd3dc9b18c1dc337014142877b844b
-
Nikita Shulga authored
Summary: This fixes:

```
Installed flake8:
+ flake8 --version
Traceback (most recent call last):
  File "/root/project/env/bin/flake8", line 6, in <module>
    from flake8.main.cli import main
  File "/root/project/env/lib/python3.7/site-packages/flake8/main/cli.py", line 6, in <module>
    from flake8.main import application
  File "/root/project/env/lib/python3.7/site-packages/flake8/main/application.py", line 24, in <module>
    from flake8.plugins import manager as plugin_manager
  File "/root/project/env/lib/python3.7/site-packages/flake8/plugins/manager.py", line 11, in <module>
    from flake8._compat import importlib_metadata
  File "/root/project/env/lib/python3.7/site-packages/flake8/_compat.py", line 7, in <module>
    import importlib_metadata
ModuleNotFoundError: No module named 'importlib_metadata'
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2191 Reviewed By: atalman Differential Revision: D33930583 Pulled By: malfet fbshipit-source-id: 68026743c29434113893cca38041596135d3bd53
-
- 01 Feb, 2022 6 commits
-
-
hwangjeff authored
Summary: Missed a couple of spots in https://github.com/pytorch/audio/issues/2187. Pull Request resolved: https://github.com/pytorch/audio/pull/2189 Reviewed By: carolineechen, nateanl, mthrok Differential Revision: D33926342 Pulled By: hwangjeff fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2185 Reviewed By: hwangjeff, mthrok Differential Revision: D33905767 Pulled By: carolineechen fbshipit-source-id: 964576ab3f4a12b91fa3960b2aa2337239356513
-
Caroline Chen authored
Summary: Add a `timesteps` field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur. Pull Request resolved: https://github.com/pytorch/audio/pull/2184 Reviewed By: mthrok Differential Revision: D33905530 Pulled By: carolineechen fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
-
Nikita Shulga authored
Summary: Also, retire cuda-10.2 Pull Request resolved: https://github.com/pytorch/audio/pull/2186 Reviewed By: mthrok Differential Revision: D33917595 Pulled By: malfet fbshipit-source-id: 060d3fa706279fe45ffd1f4f99e5727520612d56
-
hwangjeff authored
Summary: Adds a script for generating global feature statistics, along with a new feature statistics JSON, for the LibriSpeech RNN-T training recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/2183 Reviewed By: mthrok Differential Revision: D33902377 Pulled By: hwangjeff fbshipit-source-id: ec347a685ae67aefc485084aac6ed2efd653250f
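A rough sketch of what computing such global statistics can look like (per-bin mean and inverse standard deviation over mel-spectrogram frames, serialized to JSON); the feature settings, file list, and output keys are assumptions, not the recipe's actual format.

```python
import json
import torch
import torchaudio

# Accumulate sums over mel-spectrogram frames across a dataset split,
# then derive per-bin mean and inverse standard deviation.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

total = torch.zeros(80)
total_sq = torch.zeros(80)
count = 0
for path in ["a.wav", "b.wav"]:  # placeholder file list
    waveform, _ = torchaudio.load(path)
    feats = mel(waveform[0]).T  # (frames, n_mels)
    total += feats.sum(dim=0)
    total_sq += (feats ** 2).sum(dim=0)
    count += feats.size(0)

mean = total / count
invstddev = 1.0 / (total_sq / count - mean ** 2).clamp(min=1e-10).sqrt()
with open("global_stats.json", "w") as f:
    json.dump({"mean": mean.tolist(), "invstddev": invstddev.tolist()}, f)
```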
-
- 31 Jan, 2022 1 commit
-
-
moto authored
Summary: Changing the URL of tutorial assets to `download.pytorch.org`, which is more appropriate for user-facing materials. Pull Request resolved: https://github.com/pytorch/audio/pull/2182 Reviewed By: nateanl Differential Revision: D33887839 Pulled By: mthrok fbshipit-source-id: 30569672e8caf30aae5476036dfdadc8ebd436bf
-
- 27 Jan, 2022 4 commits
-
-
hwangjeff authored
Summary: This PR removes logic in `RNNTBeamSearch` that blanks out joiner output values corresponding to special tokens, e.g. \<unk\>, \<eos\>, for the following reasons:
- Provided that the model was configured and trained properly, it shouldn't be necessary, e.g. the model would naturally produce low probabilities for special tokens if they don't exist in the training set.
- For our pre-trained LibriSpeech training pipeline, the removal of the logic doesn't affect evaluation WER on any of the dev/test splits.
- The existing logic doesn't generalize to arbitrary token vocabularies.
- Internally, it seems to have been acknowledged that this logic was introduced to compensate for quirks in other parts of the modeling infra.

Pull Request resolved: https://github.com/pytorch/audio/pull/2180 Reviewed By: carolineechen, mthrok Differential Revision: D33822683 Pulled By: hwangjeff fbshipit-source-id: e7047e294f71c732c77ae0c20fec60412f26f05a
-
Caroline Chen authored
Summary: Add support for the CTC lexicon decoder without LM support by adding a non-language-model `ZeroLM` that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder for now (it will likely be separated out from kenlm when support for other kinds of LMs is added in the future). Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
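Conceptually, a zero LM just implements the decoder's language-model interface with no-op scoring, so the acoustic model alone drives the beam search. A rough sketch of the idea (not torchaudio's actual implementation; the method names are illustrative):

```python
class ZeroLM:
    """A stand-in language model that contributes nothing to the beam score."""

    def start(self, start_with_nothing: bool = True):
        # No LM state to track; return a trivial state object.
        return None

    def score(self, state, token_index: int):
        # Every continuation costs 0, so only acoustic scores rank hypotheses.
        return None, 0.0

    def finish(self, state):
        # End-of-sentence also contributes nothing.
        return None, 0.0
```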
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178 Reviewed By: mthrok Differential Revision: D33797649 Pulled By: nateanl fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable when ffmpeg features are not available, this commit adds `is_ffmpeg_available`. The availability of the features depends on two factors: 1. whether it was enabled at build time, and 2. whether the ffmpeg libraries are found at runtime. A simple way (for the OSS workflow) to detect these is to check whether `libtorchaudio_ffmpeg` is present and can be loaded without failure. To facilitate this, this commit changes `torchaudio._extension._load_lib` to return a boolean result. Pull Request resolved: https://github.com/pytorch/audio/pull/2170 Reviewed By: carolineechen Differential Revision: D33797695 Pulled By: mthrok fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
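A minimal sketch of the detection idea, assuming the internal `_load_lib` helper resolves the shared-library path and loads it via `torch.ops.load_library`; the exact internals of `torchaudio._extension` and the platform-specific suffix differ.

```python
from pathlib import Path

import torch
import torchaudio


def _load_lib(lib: str) -> bool:
    """Try to load a torchaudio extension library; return False if it is absent."""
    path = Path(torchaudio.__file__).parent / "lib" / f"{lib}.so"  # platform suffix simplified
    if not path.exists():
        return False
    torch.ops.load_library(str(path))  # raises if the library exists but cannot be loaded
    return True


def is_ffmpeg_available() -> bool:
    return _load_lib("libtorchaudio_ffmpeg")
```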
-
- 26 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: Following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465, add a brief beam search description and link to resources. Pull Request resolved: https://github.com/pytorch/audio/pull/2173 Reviewed By: nateanl Differential Revision: D33791731 Pulled By: carolineechen fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb
-