- 16 Feb, 2022 6 commits
-
-
Zhaoheng Ni authored
Summary: In autograd tests, to guarantee precision, the dtype of Tensors is converted to `torch.float64` if they are real. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support to the inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/2244 Reviewed By: mthrok Differential Revision: D34272998 Pulled By: nateanl fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
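For context, a minimal sketch of the dtype promotion described above, assuming a helper along these lines (the names are illustrative, not the test suite's actual code):

```python
import torch

def promote_for_gradcheck(tensor: torch.Tensor) -> torch.Tensor:
    # Real floating-point inputs are promoted to float64, and complex inputs
    # to complex128 (two float64 components), so autograd checks run at
    # high precision for both real and complex cases.
    if tensor.is_complex():
        return tensor.to(torch.complex128)
    if tensor.is_floating_point():
        return tensor.to(torch.float64)
    return tensor

x = torch.randn(3, dtype=torch.complex64)
assert promote_for_gradcheck(x).dtype == torch.complex128
```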
-
Caroline Chen authored
Summary: The LM in the example script was unintentionally changed to None when no-LM support was added previously. This changes it back, consistent with the WERs listed in the README. Pull Request resolved: https://github.com/pytorch/audio/pull/2235 Reviewed By: nateanl Differential Revision: D34273042 Pulled By: carolineechen fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237 Reviewed By: mthrok Differential Revision: D34267000 Pulled By: nateanl fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d
-
Zhaoheng Ni authored
Summary: This PR provides an RNNTBundle that is pre-trained on the MuST-C release v2.0 dataset. The model preserves the casing and punctuation of the transcripts when training the SentencePiece model. Here is the model performance on the dev and test sets of MuST-C 2.0:

|            |   WER |
|:----------:|------:|
| dev        | 0.190 |
| tst-COMMON | 0.213 |
| tst-HE     | 0.186 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2241 Reviewed By: mthrok Differential Revision: D34267792 Pulled By: nateanl fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167
-
Zhaoheng Ni authored
Summary: Replace underscores with dashes in ArgumentParser arguments. Pull Request resolved: https://github.com/pytorch/audio/pull/2236 Reviewed By: mthrok Differential Revision: D34266977 Pulled By: nateanl fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21
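As a generic illustration (not the recipe's actual flags): `argparse` maps dashes in option names to underscores in the parsed namespace, so the change only affects how flags are spelled on the command line.

```python
import argparse

parser = argparse.ArgumentParser()
# Dash-style flag on the command line ...
parser.add_argument("--checkpoint-path", default="model.pt")
args = parser.parse_args(["--checkpoint-path", "best.pt"])
# ... is still accessed via an underscore attribute name.
print(args.checkpoint_path)  # "best.pt"
```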
-
moto authored
Summary: This commit fixes the feature that excludes the `torchaudio.prototype` module. In `setup.py` there is a special case, triggered when the commit is on a release branch or release tag, that excludes `torchaudio.prototype`. This was introduced to make release-related work easier. It turned out that the submodules under `torchaudio.prototype`, such as `torchaudio.prototype.pipelines`, were not properly excluded from packaging. These submodules did not exist in previous releases, so this was not an issue before.

**Note**: This feature is triggered only on a release branch, so the fix is not visible in the CI of this PR. https://app.circleci.com/pipelines/github/pytorch/audio/9674/workflows/d0c9a6f1-8ca9-441a-a5f5-08926075fa39/jobs/553985?invite=true#step-104-193

The following outputs were observed when running it in a local environment.

* Before the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
--- Initializing submodules
--- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/streamer.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/rnnt_pipeline.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/ctc_decoder.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
warning: build_py: byte-compiling is disabled, skipping.
```

* After the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
--- Initializing submodules
--- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/model.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/components.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
warning: build_py: byte-compiling is disabled, skipping.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2225 Reviewed By: nateanl Differential Revision: D34257128 Pulled By: mthrok fbshipit-source-id: a3d6eca5803356e5aa3fe0eda82f6a9f5affb8e8
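The underlying pitfall is that a `setuptools` exclude pattern matches only the named package, not its children; subpackages need an explicit wildcard. A hedged sketch of the idea (torchaudio's actual `setup.py` logic is more involved):

```python
from setuptools import find_packages

# Excluding only "torchaudio.prototype" still keeps torchaudio.prototype.io,
# torchaudio.prototype.pipelines, etc. in the package list.
incomplete = find_packages(exclude=["torchaudio.prototype"])

# Adding the wildcard pattern excludes the subpackages as well.
complete = find_packages(exclude=["torchaudio.prototype", "torchaudio.prototype.*"])
```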
-
- 15 Feb, 2022 3 commits
-
-
moto authored
Summary: This commit fixes the issue with ffmpeg discovery at build time. The original implementation had the following issues: 1. Incorrect usage of FindFFMPEG, which caused a mixture of ffmpeg libraries from the system directory and the user directory. 2. The optional `FFMPEG_ROOT` variable was not honored within CMake. Issue 1 is problematic when a user does not have permission to modify the environment: if an old version of ffmpeg is installed in a directory managed by the system (such as `/usr/local/lib`), there is no way to point the build at a supported version of ffmpeg installed elsewhere. This commit changes the behavior to first search for the libraries in the `FFMPEG_ROOT` environment variable, then fall back to the original behavior of searching the custom paths along with the system default paths. This commit also removes support for `libavresample`, which is deprecated in ffmpeg 4 and removed in ffmpeg 5. Pull Request resolved: https://github.com/pytorch/audio/pull/2204 Reviewed By: carolineechen Differential Revision: D34225769 Pulled By: mthrok fbshipit-source-id: 95b0bfaaef31e2e69e6df29f789010f48a48210b
-
moto authored
Summary: Updating the context cacher so that the fetched audio chunk is used for inference immediately. https://github.com/pytorch/audio/pull/2202#discussion_r802838174 Pull Request resolved: https://github.com/pytorch/audio/pull/2213 Reviewed By: hwangjeff Differential Revision: D34235230 Pulled By: mthrok fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e
-
hwangjeff authored
Summary: Orders and names Conformer's initializer args to be more consistent with Emformer's. Pull Request resolved: https://github.com/pytorch/audio/pull/2223 Reviewed By: mthrok Differential Revision: D34226177 Pulled By: hwangjeff fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829
-
- 11 Feb, 2022 7 commits
-
-
hwangjeff authored
Summary: Adds a fixed random seed to the Emformer RNN-T training recipe test. Pull Request resolved: https://github.com/pytorch/audio/pull/2220 Reviewed By: nateanl Differential Revision: D34180644 Pulled By: hwangjeff fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9
-
nateanl authored
Summary:
- Add a MUSTC dataset under examples
- Add a lightning module for the MuST-C dataset
- Refactor `train.py`, `eval.py`, and `global_stats.py` scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2219 Reviewed By: hwangjeff Differential Revision: D34180466 Pulled By: nateanl fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164
-
hwangjeff authored
Summary: Adds a SentencePiece model training script for the LibriSpeech Emformer RNN-T example recipe; updates the README with references. Pull Request resolved: https://github.com/pytorch/audio/pull/2218 Reviewed By: nateanl Differential Revision: D34177295 Pulled By: hwangjeff fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43
-
hwangjeff authored
Summary: Modifies `ConformerLayer` to pass `bias=True` (to be consistent with the feedforward network defaults) and `dropout=dropout` (the omission was a bug) to the convolution block. Pull Request resolved: https://github.com/pytorch/audio/pull/2215 Reviewed By: carolineechen, nateanl Differential Revision: D34164345 Pulled By: hwangjeff fbshipit-source-id: 59fc804a1fe3b96e69e9fa5a2f9de94194d7bc55
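A rough sketch of the pattern being fixed, with hypothetical module names (this is not torchaudio's internal code): the layer forwards `dropout` and `bias` explicitly to its convolution block instead of relying on the block's defaults.

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """Hypothetical depthwise convolution block used inside a Conformer-style layer."""

    def __init__(self, input_dim: int, kernel_size: int, dropout: float, bias: bool) -> None:
        super().__init__()
        self.conv = nn.Conv1d(
            input_dim, input_dim, kernel_size,
            padding=kernel_size // 2, groups=input_dim, bias=bias,
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, time, feature) -> convolve over time -> back to (batch, time, feature)
        return self.dropout(self.conv(x.transpose(1, 2)).transpose(1, 2))


class ConformerLayerSketch(nn.Module):
    def __init__(self, input_dim: int, kernel_size: int, dropout: float) -> None:
        super().__init__()
        # The fix: pass bias=True and dropout=dropout through, rather than
        # leaving the convolution block at its defaults.
        self.conv_block = ConvBlock(input_dim, kernel_size, dropout=dropout, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection; assumes an odd kernel_size so shapes match.
        return x + self.conv_block(x)
```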
-
nateanl authored
Summary: We refactored the demo script so that it can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming mode (the first hypothesis prediction is streaming and the second one is non-streaming). We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR. https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2203 Reviewed By: carolineechen Differential Revision: D34006388 Pulled By: nateanl fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632
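Roughly, the two modes look like the sketch below, assuming the `RNNTBundle` interface (`get_decoder`, `get_token_processor`, `get_feature_extractor`, `get_streaming_feature_extractor`) and the `segment_length`/`hop_length` bundle attributes used in the online ASR tutorial; the input file, beam width, and the omission of a right-context cacher are simplifications, not the demo script itself.

```python
import torch
import torchaudio
from torchaudio.pipelines import EMFORMER_RNNT_BASE_LIBRISPEECH

bundle = EMFORMER_RNNT_BASE_LIBRISPEECH
decoder = bundle.get_decoder()
token_processor = bundle.get_token_processor()

waveform, sample_rate = torchaudio.load("sample.wav")  # placeholder input, assumed 16 kHz mono
waveform = waveform[0]

# Non-streaming: decode the whole utterance in one call.
feature_extractor = bundle.get_feature_extractor()
with torch.no_grad():
    features, length = feature_extractor(waveform)
    hypotheses = decoder(features, length, beam_width=10)
print(token_processor(hypotheses[0][0]))

# Streaming: feed fixed-size segments and carry the decoder state and current
# best hypothesis across calls (a real setup also buffers right-context frames).
streaming_extractor = bundle.get_streaming_feature_extractor()
segment_samples = bundle.segment_length * bundle.hop_length
state, hypothesis = None, None
with torch.no_grad():
    for start in range(0, waveform.numel(), segment_samples):
        segment = waveform[start : start + segment_samples]
        segment = torch.nn.functional.pad(segment, (0, segment_samples - segment.numel()))
        features, length = streaming_extractor(segment)
        hypos, state = decoder.infer(features, length, beam_width=10, state=state, hypothesis=hypothesis)
        hypothesis = hypos[0]
        print(token_processor(hypothesis[0]), end="", flush=True)
```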
-
hwangjeff authored
Summary: Adds unit tests for the Emformer RNN-T LibriSpeech recipe. Also makes changes to the recipe to resolve errors with pickling lambda functions on Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/2216 Reviewed By: nateanl Differential Revision: D34171480 Pulled By: hwangjeff fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e
-
hwangjeff authored
Summary:
- Removes 100-batch truncation in TEDLIUM3 recipe.
- Reinstates `train_spm.py` for TEDLIUM3.

Pull Request resolved: https://github.com/pytorch/audio/pull/2217 Reviewed By: nateanl Differential Revision: D34171525 Pulled By: hwangjeff fbshipit-source-id: 54698e5e1b094c26c28eec9b8b1722223077876c
-
- 10 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Consolidates the LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes into a single directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2212 Reviewed By: mthrok Differential Revision: D34120104 Pulled By: hwangjeff fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456
-
- 09 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary:
- Make `segment_length` a required argument rather than an optional one, to force users to consciously choose input segment lengths for their use cases.
- Clarify expected input shapes in the API documentation.
- Adjust `infer` tests to reflect expected usage.
- Add an assertion on input shape for `infer`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2207 Reviewed By: mthrok Differential Revision: D34101205 Pulled By: hwangjeff fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
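A minimal usage sketch under these semantics, assuming the `torchaudio.models.Emformer` signature with `segment_length` as a required constructor argument and inputs shaped `(batch, frames, feature_dim)`; the hyperparameter values and length handling are illustrative only.

```python
import torch
from torchaudio.models import Emformer

emformer = Emformer(
    input_dim=80,
    num_heads=4,
    ffn_dim=512,
    num_layers=4,
    segment_length=16,           # now required: choose it consciously per use case
    right_context_length=4,
)

# forward(): utterance frames right-padded with right-context frames,
# i.e. (batch, num_valid_frames + right_context_length, input_dim).
inputs = torch.rand(8, 400 + 4, 80)
lengths = torch.randint(1, 400, (8,))
output, output_lengths = emformer(inputs, lengths)

# infer(): one streaming step consumes segment_length + right_context_length frames.
segment = torch.rand(8, 16 + 4, 80)
segment_lengths = torch.full((8,), 16 + 4, dtype=torch.int64)  # assumed: frames per segment
output, output_lengths, states = emformer.infer(segment, segment_lengths, states=None)
```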
-
hwangjeff authored
Summary: Yesterday's release of librosa 0.9.0 made args keyword-only and changed the default padding from "reflect" to "zero" for some functions. This PR adjusts call sites in our tutorials and tests accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2208 Reviewed By: mthrok Differential Revision: D34099793 Pulled By: hwangjeff fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
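For illustration, a hedged sketch of the kind of adjustment this implies: pass arguments by keyword and pin the padding mode explicitly rather than relying on defaults (the exact functions touched are listed in the PR; `librosa.stft` is used here only as an example).

```python
import numpy as np
import librosa

waveform = np.random.randn(16000).astype(np.float32)

# librosa >= 0.9 makes most arguments keyword-only, so pass them by name,
# and pin pad_mode explicitly instead of relying on the (changed) default.
spec = librosa.stft(y=waveform, n_fft=400, hop_length=160, pad_mode="reflect")
mel = librosa.feature.melspectrogram(y=waveform, sr=16000, n_fft=400, hop_length=160)
```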
-
- 04 Feb, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177 Reviewed By: hwangjeff Differential Revision: D33893052 Pulled By: nateanl fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7
-
- 03 Feb, 2022 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2199 Reviewed By: hwangjeff Differential Revision: D33979923 Pulled By: nateanl fbshipit-source-id: 566ba1944dd3511fee740ac17fea2dcb0e5810fa
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2195 Reviewed By: hwangjeff Differential Revision: D33950179 Pulled By: nateanl fbshipit-source-id: 5fcfa4f433fffdcbb3b8e97f7c90fb8f723a30a2
-
moto authored
Summary:
* tutorial for streaming API: https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T: https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193 Reviewed By: hwangjeff Differential Revision: D33971312 Pulled By: mthrok fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f
-
- 02 Feb, 2022 5 commits
-
-
Caroline Chen authored
Summary: resulting tutorial: https://538358-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html
- add visualization for timestep alignments
- modify section organization for decoder construction

Pull Request resolved: https://github.com/pytorch/audio/pull/2188 Reviewed By: mthrok Differential Revision: D33954937 Pulled By: carolineechen fbshipit-source-id: 8f397229d74c994b8793a30623e1de4c19ebd401
-
hwangjeff authored
Summary: Rather than apply SentencePiece's `decode` to directly convert each hypothesis's token id sequence to an output string, we convert each token id sequence to word pieces and then manually join the word pieces ourselves. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR. https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov Versus the previous behavior visualized in https://github.com/pytorch/audio/issues/2093, the scheme here properly constructs words comprising multiple pieces. Pull Request resolved: https://github.com/pytorch/audio/pull/2192 Reviewed By: mthrok Differential Revision: D33936622 Pulled By: hwangjeff fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9
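A minimal sketch of the piece-joining idea, assuming a standard `sentencepiece` processor; the model path and `lstrip` flag are hypothetical.

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")  # hypothetical model path

def token_ids_to_text(token_ids, lstrip=False):
    # Convert ids to word pieces and join them manually, instead of calling
    # sp.decode(). SentencePiece marks word boundaries with U+2581 ("▁"),
    # so replacing it with a space preserves the leading whitespace that
    # signals a word break or continuation across streaming invocations.
    pieces = [sp.id_to_piece(i) for i in token_ids]
    text = "".join(pieces).replace("\u2581", " ")
    return text.lstrip() if lstrip else text
```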
-
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on ffmpeg libraries. For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
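A minimal usage sketch, assuming the prototype `Streamer` interface shown in the linked tutorial (`add_basic_audio_stream` and `stream`); the source path and chunk size are placeholders.

```python
from torchaudio.prototype.io import Streamer

# Open a media source (file, URL, or device) and pull decoded audio in chunks.
streamer = Streamer("example.wav")  # placeholder source
streamer.add_basic_audio_stream(frames_per_chunk=16000, sample_rate=16000)

for (chunk,) in streamer.stream():
    # chunk is a Tensor of shape (frames_per_chunk, num_channels)
    _ = chunk.mean()  # stand-in for real per-chunk processing
```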
-
Nikita Shulga authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2190 Reviewed By: mthrok Differential Revision: D33930129 Pulled By: malfet fbshipit-source-id: ddcbe79f6bdd3dc9b18c1dc337014142877b844b
-
Nikita Shulga authored
Summary: This fixes:

```
Installed flake8:
+ flake8 --version
Traceback (most recent call last):
  File "/root/project/env/bin/flake8", line 6, in <module>
    from flake8.main.cli import main
  File "/root/project/env/lib/python3.7/site-packages/flake8/main/cli.py", line 6, in <module>
    from flake8.main import application
  File "/root/project/env/lib/python3.7/site-packages/flake8/main/application.py", line 24, in <module>
    from flake8.plugins import manager as plugin_manager
  File "/root/project/env/lib/python3.7/site-packages/flake8/plugins/manager.py", line 11, in <module>
    from flake8._compat import importlib_metadata
  File "/root/project/env/lib/python3.7/site-packages/flake8/_compat.py", line 7, in <module>
    import importlib_metadata
ModuleNotFoundError: No module named 'importlib_metadata'
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2191 Reviewed By: atalman Differential Revision: D33930583 Pulled By: malfet fbshipit-source-id: 68026743c29434113893cca38041596135d3bd53
-
- 01 Feb, 2022 6 commits
-
-
hwangjeff authored
Summary: Missed a couple of spots in https://github.com/pytorch/audio/issues/2187. Pull Request resolved: https://github.com/pytorch/audio/pull/2189 Reviewed By: carolineechen, nateanl, mthrok Differential Revision: D33926342 Pulled By: hwangjeff fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2185 Reviewed By: hwangjeff, mthrok Differential Revision: D33905767 Pulled By: carolineechen fbshipit-source-id: 964576ab3f4a12b91fa3960b2aa2337239356513
-
Caroline Chen authored
Summary: Add a `timesteps` field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur. Pull Request resolved: https://github.com/pytorch/audio/pull/2184 Reviewed By: mthrok Differential Revision: D33905530 Pulled By: carolineechen fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
-
Nikita Shulga authored
Summary: Also, retire cuda-10.2 Pull Request resolved: https://github.com/pytorch/audio/pull/2186 Reviewed By: mthrok Differential Revision: D33917595 Pulled By: malfet fbshipit-source-id: 060d3fa706279fe45ffd1f4f99e5727520612d56
-
hwangjeff authored
Summary: Adds a script for generating global feature statistics, along with a new feature statistics JSON, for the LibriSpeech RNN-T training recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/2183 Reviewed By: mthrok Differential Revision: D33902377 Pulled By: hwangjeff fbshipit-source-id: ec347a685ae67aefc485084aac6ed2efd653250f
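A rough sketch of what computing such global statistics can look like (per-bin mean and inverse standard deviation over mel-spectrogram frames, serialized to JSON); the feature settings, file list, and output keys are assumptions, not the recipe's actual format.

```python
import json
import torch
import torchaudio

# Accumulate sums over mel-spectrogram frames across a dataset split,
# then derive per-bin mean and inverse standard deviation.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

total = torch.zeros(80)
total_sq = torch.zeros(80)
count = 0
for path in ["a.wav", "b.wav"]:  # placeholder file list
    waveform, _ = torchaudio.load(path)
    feats = mel(waveform[0]).T  # (frames, n_mels)
    total += feats.sum(dim=0)
    total_sq += (feats ** 2).sum(dim=0)
    count += feats.size(0)

mean = total / count
invstddev = 1.0 / (total_sq / count - mean ** 2).clamp(min=1e-10).sqrt()
with open("global_stats.json", "w") as f:
    json.dump({"mean": mean.tolist(), "invstddev": invstddev.tolist()}, f)
```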
-
- 31 Jan, 2022 1 commit
-
-
moto authored
Summary: Changing the URL of tutorial assets to `download.pytorch.org`, which is more appropriate for user-facing materials. Pull Request resolved: https://github.com/pytorch/audio/pull/2182 Reviewed By: nateanl Differential Revision: D33887839 Pulled By: mthrok fbshipit-source-id: 30569672e8caf30aae5476036dfdadc8ebd436bf
-
- 27 Jan, 2022 4 commits
-
-
hwangjeff authored
Summary: This PR removes logic in `RNNTBeamSearch` that blanks out joiner output values corresponding to special tokens, e.g. \<unk\>, \<eos\>, for the following reasons:
- Provided that the model was configured and trained properly, it shouldn't be necessary, e.g. the model would naturally produce low probabilities for special tokens if they don't exist in the training set.
- For our pre-trained LibriSpeech training pipeline, the removal of the logic doesn't affect evaluation WER on any of the dev/test splits.
- The existing logic doesn't generalize to arbitrary token vocabularies.
- Internally, it seems to have been acknowledged that this logic was introduced to compensate for quirks in other parts of the modeling infra.

Pull Request resolved: https://github.com/pytorch/audio/pull/2180 Reviewed By: carolineechen, mthrok Differential Revision: D33822683 Pulled By: hwangjeff fbshipit-source-id: e7047e294f71c732c77ae0c20fec60412f26f05a
-
Caroline Chen authored
Summary: Add support for the CTC lexicon decoder without LM support by adding a non-language-model `ZeroLM` that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder for now (it will likely be separated out from kenlm when support for other kinds of LMs is added in the future). Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
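Conceptually, a zero LM just implements the decoder's language-model interface with no-op scoring, so the acoustic model alone drives the beam search. A rough sketch of the idea (not torchaudio's actual implementation; the method names are illustrative):

```python
class ZeroLM:
    """A stand-in language model that contributes nothing to the beam score."""

    def start(self, start_with_nothing: bool = True):
        # No LM state to track; return a trivial state object.
        return None

    def score(self, state, token_index: int):
        # Every continuation costs 0, so only acoustic scores rank hypotheses.
        return None, 0.0

    def finish(self, state):
        # End-of-sentence also contributes nothing.
        return None, 0.0
```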
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178 Reviewed By: mthrok Differential Revision: D33797649 Pulled By: nateanl fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable when ffmpeg features are not available, this commit adds `is_ffmpeg_available`. The availability of the features depends on two factors: 1. whether it was enabled at build time, and 2. whether the ffmpeg libraries are found at runtime. A simple way (for the OSS workflow) to detect these is to check whether `libtorchaudio_ffmpeg` is present and can be loaded without failure. To facilitate this, this commit changes `torchaudio._extension._load_lib` to return a boolean result. Pull Request resolved: https://github.com/pytorch/audio/pull/2170 Reviewed By: carolineechen Differential Revision: D33797695 Pulled By: mthrok fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
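A minimal sketch of the detection idea, assuming the internal `_load_lib` helper resolves the shared-library path and loads it via `torch.ops.load_library`; the exact internals of `torchaudio._extension` and the platform-specific suffix differ.

```python
from pathlib import Path

import torch
import torchaudio


def _load_lib(lib: str) -> bool:
    """Try to load a torchaudio extension library; return False if it is absent."""
    path = Path(torchaudio.__file__).parent / "lib" / f"{lib}.so"  # platform suffix simplified
    if not path.exists():
        return False
    torch.ops.load_library(str(path))  # raises if the library exists but cannot be loaded
    return True


def is_ffmpeg_available() -> bool:
    return _load_lib("libtorchaudio_ffmpeg")
```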
-
- 26 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: Following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465, add a brief beam search description and link to resources. Pull Request resolved: https://github.com/pytorch/audio/pull/2173 Reviewed By: nateanl Differential Revision: D33791731 Pulled By: carolineechen fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb
-