- 23 Feb, 2022 1 commit
-
-
Binh Tang authored
Summary: We proactively remove references to the deprecated DDP accelerator to prepare for the breaking changes following the release of PyTorch Lightning 1.6 (see T112240890). Differential Revision: D34295318 fbshipit-source-id: 7b2245ca9c7c2900f510722b33af8d8eeda49919
-
- 18 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Noticed some items to clean up in `Emformer`. - Make `segment_length` a required argument in `_EmformerLayer`. - Remove unused variables from `_unpack_state` and `_gen_attention_mask`. These don't affect `Emformer`'s functionality or public API. Pull Request resolved: https://github.com/pytorch/audio/pull/2252 Reviewed By: carolineechen, mthrok Differential Revision: D34321430 Pulled By: hwangjeff fbshipit-source-id: 38a5046f633a3e625352c476ef71c78380ccc597
-
Caroline Chen authored
Summary: - fix retrieve PR script to handle commits with unrecognized/invalid PR numbers, such as in 7b6b2d00 - add modifications similar to pytorch's [#71917](https://github.com/pytorch/pytorch/pull/71917), [#72085](https://github.com/pytorch/pytorch/pull/72085) Pull Request resolved: https://github.com/pytorch/audio/pull/2249 Reviewed By: nateanl, mthrok Differential Revision: D34304210 Pulled By: carolineechen fbshipit-source-id: 245784219317e355b5cece4a139dee71d65bfdd1
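Handling commits with a missing or malformed PR number could look roughly like the sketch below. It is not the repository's script: the function name `extract_pr_number` and the regular expression are assumptions, and only the `Pull Request resolved:` trailer format is taken from the commit messages in this log.

```python
import re
from typing import Optional

# Assumed trailer format, as it appears in the commit messages of this log.
_PR_TRAILER = re.compile(
    r"Pull Request resolved: https://github\.com/pytorch/audio/pull/(\d+)"
)


def extract_pr_number(commit_message: str) -> Optional[int]:
    """Return the PR number from a commit message, or None if absent or invalid."""
    match = _PR_TRAILER.search(commit_message)
    if match is None:
        # No recognizable trailer: report "unknown" instead of raising.
        return None
    return int(match.group(1))
```

Returning `None` lets the caller decide how to report commits without a usable PR number, rather than aborting on the first one.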
-
- 17 Feb, 2022 4 commits
-
-
Zhaoheng Ni authored
Summary: In batch_consistency tests, the `assert_batch_consistency` method accepts only a single Tensor, which does not work for some functions. For example, `lfilter` and `filtfilt` require three Tensors as arguments, so their tests cannot use `assert_batch_consistency`. This PR refactors the method to accept a tuple of Tensors that have a `batch` dimension. Other arguments, such as `int` or `str`, are passed as `*args` after the tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2245 Reviewed By: mthrok Differential Revision: D34273035 Pulled By: nateanl fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056
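The idea of a batch-consistency check that takes a tuple of batched inputs can be sketched as follows. This is an illustration only: plain nested lists stand in for Tensors, and the signature and the `scaled_add` function are hypothetical, not the test utilities in the repository.

```python
from typing import Callable, Sequence, Tuple


def assert_batch_consistency(
    func: Callable, inputs: Tuple[Sequence, ...], *args
) -> None:
    """Check that `func` on a whole batch matches `func` applied item by item.

    `inputs` is a tuple of batched inputs sharing the same batch size (plain
    lists stand in for Tensors here); non-batched arguments such as `int` or
    `str` follow as `*args`.
    """
    batch_size = len(inputs[0])
    assert all(len(x) == batch_size for x in inputs), "batch sizes differ"

    batched_output = func(*inputs, *args)
    for i in range(batch_size):
        # Slice every input to a batch of one and compare with row i.
        single = tuple(x[i : i + 1] for x in inputs)
        item_output = func(*single, *args)
        assert item_output[0] == batched_output[i], f"mismatch at item {i}"


def scaled_add(a, b, scale):
    """Hypothetical batched function: scale * (a + b), element-wise per item."""
    return [[scale * (x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two batched inputs in the tuple, one non-batched argument after it.
assert_batch_consistency(scaled_add, ([[1, 2], [3, 4]], [[5, 6], [7, 8]]), 2)
```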
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2250 Reviewed By: mthrok Differential Revision: D34302192 Pulled By: nateanl fbshipit-source-id: 4ea7047503ef87e22b5ef6075ad010314d5e3885
-
Zhaoheng Ni authored
Summary: - Refactor the current `LibriSpeechRNNTModule`'s unit test. - Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule` - Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test. Pull Request resolved: https://github.com/pytorch/audio/pull/2240 Reviewed By: mthrok Differential Revision: D34285195 Pulled By: nateanl fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
-
moto authored
Summary: https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html 1. Add figure to explain the caching 2. Fix the initialization of stream iterator Pull Request resolved: https://github.com/pytorch/audio/pull/2226 Reviewed By: carolineechen Differential Revision: D34265971 Pulled By: mthrok fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4
-
- 16 Feb, 2022 10 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``EMFORMER_RNNT_BASE_MUSTC`` support in `pipeline_demo.py`. The bundle is trained on MuST-C release 2.0 dataset. The model preserves the casing and punctuations in the transcript. Here is a screen recording of how it works in streaming and non-streaming modes: https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2248 Reviewed By: hwangjeff Differential Revision: D34282598 Pulled By: nateanl fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4
-
Zhaoheng Ni authored
Summary: In torchscript_consistency tests, the `func` in each test method accepts only one `tensor` argument; any other arguments of the `F.xyz` function have to be defined inside `func`. If `F.xyz` takes no `Tensor` argument, the tests pass a `dummy` tensor that is never used. In this PR, we refactor ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of just one `tensor`. Pull Request resolved: https://github.com/pytorch/audio/pull/2246 Reviewed By: carolineechen Differential Revision: D34273057 Pulled By: nateanl fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b
-
Zhaoheng Ni authored
Summary: - Use dictionary to select the `RNNTBundle` and the corresponding dataset. - Use the dictionary's keys as choices in ArgumentParser Pull Request resolved: https://github.com/pytorch/audio/pull/2239 Reviewed By: mthrok Differential Revision: D34267070 Pulled By: nateanl fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220
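A minimal sketch of this pattern, assuming a hypothetical `Config` record and placeholder strings in place of the real `RNNTBundle` and dataset objects (the option name `--model-type` and the dictionary keys are also assumptions):

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical pairing of a bundle with its dataset."""

    bundle: str
    dataset: str


# Placeholder strings stand in for real RNNTBundle and dataset objects.
MODEL_CONFIGS = {
    "librispeech": Config("EMFORMER_RNNT_BASE_LIBRISPEECH", "LIBRISPEECH"),
    "tedlium3": Config("EMFORMER_RNNT_BASE_TEDLIUM3", "TEDLIUM"),
    "mustc": Config("EMFORMER_RNNT_BASE_MUSTC", "MUSTC"),
}


def select_config(argv=None) -> Config:
    """Parse the command line and return the matching configuration."""
    parser = argparse.ArgumentParser(description="Select a model configuration.")
    # The dictionary's keys double as the allowed choices, so adding a new
    # entry to MODEL_CONFIGS automatically extends the command-line interface.
    parser.add_argument("--model-type", choices=sorted(MODEL_CONFIGS), required=True)
    args = parser.parse_args(argv)
    return MODEL_CONFIGS[args.model_type]
```

Keeping the mapping in one dictionary avoids a separate `if`/`elif` chain that has to be kept in sync with the parser's choices.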
-
Zhaoheng Ni authored
Summary: - Add docstring to `eval.py` and `pipeline_demo.py` under `emformer_rnnt` directory. - Refactor logger and ArgumentParser Pull Request resolved: https://github.com/pytorch/audio/pull/2238 Reviewed By: mthrok Differential Revision: D34267059 Pulled By: nateanl fbshipit-source-id: 4b8d3d183ee7bc0ad71ce305cab87bfa90208b2e
-
Zhaoheng Ni authored
Summary: In autograd tests, to guarantee precision, the dtype of real Tensors is converted to `torch.float64`. However, complex dtypes are not handled. This PR adds `self.complex_dtype` support to the inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/2244 Reviewed By: mthrok Differential Revision: D34272998 Pulled By: nateanl fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
-
Caroline Chen authored
Summary: The LM in the example script was unintentionally changed to None when no-LM support was added. This change restores it, making the script consistent with the WERs listed in the readme. Pull Request resolved: https://github.com/pytorch/audio/pull/2235 Reviewed By: nateanl Differential Revision: D34273042 Pulled By: carolineechen fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2237 Reviewed By: mthrok Differential Revision: D34267000 Pulled By: nateanl fbshipit-source-id: 4c264aea6cf3fba5d8728d5fe60f9f471815852d
-
Zhaoheng Ni authored
Summary: This PR provides a RNNTBundle that is pre-trained on the MuST-C release v2.0 dataset. The model preserves the casing and punctuations of the transcripts when training the SentencePiece model. Here is the model performance on the dev and test sets of MuST-C 2.0:

| | WER |
|:-----------------:|-------------:|
| dev | 0.190 |
| tst-COMMON | 0.213 |
| tst-HE | 0.186 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2241 Reviewed By: mthrok Differential Revision: D34267792 Pulled By: nateanl fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167
-
Zhaoheng Ni authored
Summary: Replace underscore with dash in ArgumentParser's arguments. Pull Request resolved: https://github.com/pytorch/audio/pull/2236 Reviewed By: mthrok Differential Revision: D34266977 Pulled By: nateanl fbshipit-source-id: ceacac12c04016a8dbf2a1a7d6bbcf65d4d53d21
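The rename does not change how the parsed values are accessed, because `argparse` derives the attribute name from the long option by replacing `-` with `_`. A short illustration, using hypothetical option names rather than the ones in the actual scripts:

```python
import argparse

parser = argparse.ArgumentParser()
# Dashes on the command line; argparse converts them to underscores
# when it builds the attribute names on the parsed namespace.
parser.add_argument("--checkpoint-path", type=str, default=None)
parser.add_argument("--num-nodes", type=int, default=1)

args = parser.parse_args(["--checkpoint-path", "model.ckpt", "--num-nodes", "2"])
print(args.checkpoint_path, args.num_nodes)
```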
-
moto authored
Summary: This commit fixes the feature that excludes the `torchaudio.prototype` module. In `setup.py` there is a special case, triggered when the commit is on a release branch or release tag, that excludes `torchaudio.prototype`. This was introduced to simplify release-related work. It turned out that the submodules under `torchaudio.prototype`, such as `torchaudio.prototype.pipelines`, are not properly excluded from packaging. These submodules did not exist in previous releases, so it was not an issue.

**Note** This feature is triggered only on a release branch, so the fix is not visible in the CI of this PR. https://app.circleci.com/pipelines/github/pytorch/audio/9674/workflows/d0c9a6f1-8ca9-441a-a5f5-08926075fa39/jobs/553985?invite=true#step-104-193

The following outputs were observed when running it in a local environment.

* Before the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
--- Initializing submodules
--- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/streamer.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
copying torchaudio/prototype/io/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/io
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
copying torchaudio/prototype/pipelines/rnnt_pipeline.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/pipelines
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/ctc_decoder.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
copying torchaudio/prototype/ctc_decoder/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/prototype/ctc_decoder
warning: build_py: byte-compiling is disabled, skipping.
```

* After the change

```
$ BUILD_FFMPEG=0 BUILD_SOX=0 BUILD_CTC_DECODER=0 BUILD_RNNT=0 BUILD_KALDI=0 python setup.py clean bdist_wheel
```

```
-- Git branch: prototype-exclusion
-- Git SHA: 0af1edaa420c46be10292cbea7150c34ef80a0e1
-- Git tag: None
-- PyTorch dependency: torch
-- Building version 0.11.0+0af1eda
--- Initializing submodules
--- Initialized submodule
Excluding torchaudio.prototype from the package.
...
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/model.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
copying torchaudio/models/wav2vec2/components.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2
creating build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/__init__.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
copying torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/lib.macosx-11.0-arm64-3.9/torchaudio/models/wav2vec2/utils
warning: build_py: byte-compiling is disabled, skipping.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2225 Reviewed By: nateanl Differential Revision: D34257128 Pulled By: mthrok fbshipit-source-id: a3d6eca5803356e5aa3fe0eda82f6a9f5affb8e8
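One plausible way for submodules to slip past an exclusion is a pattern that matches only the exact package name. The sketch below assumes shell-style pattern matching of the kind `setuptools.find_packages(exclude=...)` applies to package names; the actual change in `setup.py` is not shown in this log and may differ. The helper `is_excluded` and the package list are illustrative.

```python
from fnmatch import fnmatchcase


def is_excluded(package: str, patterns) -> bool:
    """Return True if `package` matches any exclusion pattern."""
    return any(fnmatchcase(package, pattern) for pattern in patterns)


packages = [
    "torchaudio",
    "torchaudio.models",
    "torchaudio.prototype",
    "torchaudio.prototype.io",
    "torchaudio.prototype.pipelines",
]

# Exact name only: the submodules do not match and are still packaged.
exact = ["torchaudio.prototype"]
# Exact name plus a wildcard for everything beneath it.
with_submodules = ["torchaudio.prototype", "torchaudio.prototype.*"]

kept_exact = [p for p in packages if not is_excluded(p, exact)]
kept_fixed = [p for p in packages if not is_excluded(p, with_submodules)]

print(kept_exact)
print(kept_fixed)
```

With the exact-name pattern, `torchaudio.prototype.io` and `torchaudio.prototype.pipelines` remain in the list; with the wildcard added, only `torchaudio` and `torchaudio.models` remain.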
-
- 15 Feb, 2022 3 commits
-
-
moto authored
Summary: This commit fixes issues with ffmpeg discovery at build time. The original implementation had the following issues: 1. Wrong usage of FindFFMPEG, which caused a mixture of ffmpeg libraries from the system directory and the user directory. 2. The optional `FFMPEG_ROOT` variable was not set within cmake. Issue 1 is problematic when a user does not have permission to modify the environment. For example, if an old version of ffmpeg is installed in a directory managed by the system (such as `/usr/local/lib`), there is no way to point the build at a path where the user has installed a supported version of ffmpeg. This commit changes the behavior by first searching for the libraries in the location given by the `FFMPEG_ROOT` environment variable, then falling back to the original behavior of searching the custom paths together with the system default paths. This commit also removes support for `libavresample`, which is deprecated in ffmpeg 4 and removed in ffmpeg 5. Pull Request resolved: https://github.com/pytorch/audio/pull/2204 Reviewed By: carolineechen Differential Revision: D34225769 Pulled By: mthrok fbshipit-source-id: 95b0bfaaef31e2e69e6df29f789010f48a48210b
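The search order described above can be illustrated in Python. The real logic lives in the cmake build files, so this only sketches the precedence rule: the function name `candidate_lib_dirs` and the assumption that libraries sit under `<FFMPEG_ROOT>/lib` are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence


def candidate_lib_dirs(
    default_dirs: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[Path]:
    """Return library directories in the order they should be searched."""
    env = os.environ if env is None else env
    candidates: List[Path] = []

    root = env.get("FFMPEG_ROOT")
    if root:
        # The user-provided location takes precedence over system paths.
        # Assumption: libraries are installed under <FFMPEG_ROOT>/lib.
        candidates.append(Path(root) / "lib")

    # Fall back to the default locations only after FFMPEG_ROOT.
    candidates.extend(Path(d) for d in default_dirs)
    return candidates
```

Because the user directory always comes first, a supported ffmpeg installed there wins over an older copy in a system-managed directory.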
-
moto authored
Summary: Updating the context cacher so that fetched audio chunk is used for inference immediately. https://github.com/pytorch/audio/pull/2202#discussion_r802838174 Pull Request resolved: https://github.com/pytorch/audio/pull/2213 Reviewed By: hwangjeff Differential Revision: D34235230 Pulled By: mthrok fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e
-
hwangjeff authored
Summary: Orders and names Conformer's initializer args to be more consistent with Emformer's. Pull Request resolved: https://github.com/pytorch/audio/pull/2223 Reviewed By: mthrok Differential Revision: D34226177 Pulled By: hwangjeff fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829
-
- 11 Feb, 2022 7 commits
-
-
hwangjeff authored
Summary: Adds fixed random seed to Emformer RNN-T training recipe test. Pull Request resolved: https://github.com/pytorch/audio/pull/2220 Reviewed By: nateanl Differential Revision: D34180644 Pulled By: hwangjeff fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9
-
nateanl authored
Summary: - Add a MUSTC dataset under examples - Add a lightning module for MuST-C dataset - Refactor `train.py`, `eval.py`, and `global_stats.py` scripts Pull Request resolved: https://github.com/pytorch/audio/pull/2219 Reviewed By: hwangjeff Differential Revision: D34180466 Pulled By: nateanl fbshipit-source-id: 9fc74ce7527da1a81dd0738e124428f9d516d164
-
hwangjeff authored
Summary: Adds SentencePiece model training script for LibriSpeech Emformer RNN-T example recipe; updates readme with references. Pull Request resolved: https://github.com/pytorch/audio/pull/2218 Reviewed By: nateanl Differential Revision: D34177295 Pulled By: hwangjeff fbshipit-source-id: 9f32805af792fb8c6f834f2812e20104177a6c43
-
hwangjeff authored
Summary: Modifies `ConformerLayer` to pass `bias=True` (to be consistent with feedforward network defaults) and `dropout=dropout` (omission was a bug) to the convolution block. Pull Request resolved: https://github.com/pytorch/audio/pull/2215 Reviewed By: carolineechen, nateanl Differential Revision: D34164345 Pulled By: hwangjeff fbshipit-source-id: 59fc804a1fe3b96e69e9fa5a2f9de94194d7bc55
-
nateanl authored
Summary: We refactored the demo script that can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming mode. (The first hypothesis prediction is streaming and the second one is non-streaming). We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR. https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2203 Reviewed By: carolineechen Differential Revision: D34006388 Pulled By: nateanl fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632
-
hwangjeff authored
Summary: Adds unit tests for Emformer RNN-T LibriSpeech recipe. Also makes changes to recipe to resolve errors with pickling lambda functions in Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/2216 Reviewed By: nateanl Differential Revision: D34171480 Pulled By: hwangjeff fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e
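Pickling matters on Windows because new processes there are started by spawning, so objects handed to worker processes must be serialized with `pickle`, and lambdas cannot be. A common workaround, shown below as a sketch, is to replace the lambda with `functools.partial` over an importable function. Whether the recipe uses exactly this pattern is an assumption; `is_picklable` and the example callables are illustrative.

```python
import operator
import pickle
from functools import partial


def is_picklable(obj) -> bool:
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot serialize it.
scale_lambda = lambda x: 2 * x
# A partial over an importable function pickles by reference to that function.
scale_partial = partial(operator.mul, 2)

print(is_picklable(scale_lambda))   # False
print(is_picklable(scale_partial))  # True
```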
-
hwangjeff authored
Summary: - Removes 100-batch truncation in TEDLIUM3 recipe. - Reinstates `train_spm.py` for TEDLIUM3. Pull Request resolved: https://github.com/pytorch/audio/pull/2217 Reviewed By: nateanl Differential Revision: D34171525 Pulled By: hwangjeff fbshipit-source-id: 54698e5e1b094c26c28eec9b8b1722223077876c
-
- 10 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Consolidates LibriSpeech and TED-LIUM Release 3 Emformer RNN-T training recipes in a single directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2212 Reviewed By: mthrok Differential Revision: D34120104 Pulled By: hwangjeff fbshipit-source-id: 29c6e27195d5998f76d67c35b718110e73529456
-
- 09 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: - Make `segment_length` a required argument rather than optional argument to force users to consciously choose input segment lengths for their use cases. - Clarify expected input shapes in API documentation. - Adjust `infer` tests to reflect expected usage. - Add assertion for input shape for `infer`. Pull Request resolved: https://github.com/pytorch/audio/pull/2207 Reviewed By: mthrok Differential Revision: D34101205 Pulled By: hwangjeff fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
-
hwangjeff authored
Summary: Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2208 Reviewed By: mthrok Differential Revision: D34099793 Pulled By: hwangjeff fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
-
- 04 Feb, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177 Reviewed By: hwangjeff Differential Revision: D33893052 Pulled By: nateanl fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7
-
- 03 Feb, 2022 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2199 Reviewed By: hwangjeff Differential Revision: D33979923 Pulled By: nateanl fbshipit-source-id: 566ba1944dd3511fee740ac17fea2dcb0e5810fa
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2195 Reviewed By: hwangjeff Differential Revision: D33950179 Pulled By: nateanl fbshipit-source-id: 5fcfa4f433fffdcbb3b8e97f7c90fb8f723a30a2
-
moto authored
Summary: * tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html * tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2193 Reviewed By: hwangjeff Differential Revision: D33971312 Pulled By: mthrok fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f
-
- 02 Feb, 2022 5 commits
-
-
Caroline Chen authored
Summary: resulting tutorial: https://538358-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html - add visualization for timestep alignments - modify section organization for decoder construction Pull Request resolved: https://github.com/pytorch/audio/pull/2188 Reviewed By: mthrok Differential Revision: D33954937 Pulled By: carolineechen fbshipit-source-id: 8f397229d74c994b8793a30623e1de4c19ebd401
-
hwangjeff authored
Summary: Rather than apply SentencePiece's `decode` to directly convert each hypothesis's token id sequence to an output string, we convert each token id sequence to word pieces and then manually join the word pieces ourselves. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR. https://user-images.githubusercontent.com/8345689/152093668-11fb775a-bf7b-4b1d-9516-9f8d5a9b6683.mov Versus the previous behavior visualized in https://github.com/pytorch/audio/issues/2093, the scheme here properly constructs words comprising multiple pieces. Pull Request resolved: https://github.com/pytorch/audio/pull/2192 Reviewed By: mthrok Differential Revision: D33936622 Pulled By: hwangjeff fbshipit-source-id: e550980c7d4cac9e982315508f793a6b816752e9
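The joining scheme can be sketched without the SentencePiece library. The toy vocabulary and the function `pieces_to_text` are hypothetical; a dictionary lookup stands in for the model's id-to-piece conversion, and the only property relied on is that SentencePiece marks the start of a word with the character U+2581.

```python
from typing import Dict, List

# SentencePiece marks the start of a word with U+2581.
WORD_BOUNDARY = "\u2581"


def pieces_to_text(token_ids: List[int], id_to_piece: Dict[int, str]) -> str:
    """Join word pieces manually, keeping any leading whitespace."""
    pieces = [id_to_piece[i] for i in token_ids]
    return "".join(pieces).replace(WORD_BOUNDARY, " ")


# Hypothetical toy vocabulary.
VOCAB = {0: "\u2581hel", 1: "lo", 2: "\u2581wor", 3: "ld"}

# Three successive invocations, as in streaming decoding.
chunk_1 = pieces_to_text([0], VOCAB)      # " hel"
chunk_2 = pieces_to_text([1, 2], VOCAB)   # "lo wor"
chunk_3 = pieces_to_text([3], VOCAB)      # "ld"

transcript = chunk_1 + chunk_2 + chunk_3
print(repr(transcript))  # ' hello world'
```

Because each chunk keeps its leading space only when its first piece starts a new word, concatenating the chunks reconstructs words that span invocations.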
-
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on ffmpeg libraries. For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
-
Nikita Shulga authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2190 Reviewed By: mthrok Differential Revision: D33930129 Pulled By: malfet fbshipit-source-id: ddcbe79f6bdd3dc9b18c1dc337014142877b844b
-
Nikita Shulga authored
Summary: This fixes:

```
Installed flake8:
+ flake8 --version
Traceback (most recent call last):
  File "/root/project/env/bin/flake8", line 6, in <module>
    from flake8.main.cli import main
  File "/root/project/env/lib/python3.7/site-packages/flake8/main/cli.py", line 6, in <module>
    from flake8.main import application
  File "/root/project/env/lib/python3.7/site-packages/flake8/main/application.py", line 24, in <module>
    from flake8.plugins import manager as plugin_manager
  File "/root/project/env/lib/python3.7/site-packages/flake8/plugins/manager.py", line 11, in <module>
    from flake8._compat import importlib_metadata
  File "/root/project/env/lib/python3.7/site-packages/flake8/_compat.py", line 7, in <module>
    import importlib_metadata
ModuleNotFoundError: No module named 'importlib_metadata'
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2191 Reviewed By: atalman Differential Revision: D33930583 Pulled By: malfet fbshipit-source-id: 68026743c29434113893cca38041596135d3bd53
-
- 01 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Missed a couple of spots in https://github.com/pytorch/audio/issues/2187. Pull Request resolved: https://github.com/pytorch/audio/pull/2189 Reviewed By: carolineechen, nateanl, mthrok Differential Revision: D33926342 Pulled By: hwangjeff fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06
-