- 01 Feb, 2022 5 commits
-
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2185 Reviewed By: hwangjeff, mthrok Differential Revision: D33905767 Pulled By: carolineechen fbshipit-source-id: 964576ab3f4a12b91fa3960b2aa2337239356513
-
Caroline Chen authored
Summary: add timesteps field to CTC decoder hypotheses, corresponding to the time step of occurrences of non-blank tokens Pull Request resolved: https://github.com/pytorch/audio/pull/2184 Reviewed By: mthrok Differential Revision: D33905530 Pulled By: carolineechen fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
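To illustrate what a `timesteps` field of this kind records, the sketch below applies a plain greedy CTC collapse to a per-frame token sequence and notes the frame index at which each emitted non-blank token first appears. The function name `collapse_with_timesteps` and the blank index `0` are assumptions made for illustration; the decoder in the PR is a beam search decoder, not this greedy procedure.

```python
from typing import List, Tuple


def collapse_with_timesteps(
    frame_tokens: List[int], blank: int = 0
) -> Tuple[List[int], List[int]]:
    """Greedy CTC collapse that also records when each token is emitted.

    `frame_tokens` holds one token index per frame (hypothetical input).
    Repeated tokens are merged and blanks are dropped; for every token that
    survives, the index of the frame where it first occurs is kept.
    """
    tokens: List[int] = []
    timesteps: List[int] = []
    prev = blank
    for t, tok in enumerate(frame_tokens):
        # Emit only on a transition into a non-blank token.
        if tok != blank and tok != prev:
            tokens.append(tok)
            timesteps.append(t)
        prev = tok
    return tokens, timesteps
```

With this sketch, the two lists always have the same length, so the i-th time step can be paired with the i-th emitted token.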
-
Nikita Shulga authored
Summary: Also, retire cuda-10.2 Pull Request resolved: https://github.com/pytorch/audio/pull/2186 Reviewed By: mthrok Differential Revision: D33917595 Pulled By: malfet fbshipit-source-id: 060d3fa706279fe45ffd1f4f99e5727520612d56
-
hwangjeff authored
Summary: Adds script for generating global feature statistics along with new feature statistics json for LibriSpeech RNN-T training recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/2183 Reviewed By: mthrok Differential Revision: D33902377 Pulled By: hwangjeff fbshipit-source-id: ec347a685ae67aefc485084aac6ed2efd653250f
-
- 31 Jan, 2022 1 commit
-
-
moto authored
Summary: Changing the URL of tutorial assets to `download.pytorch.org` which is more appropriate for user facing materials. Pull Request resolved: https://github.com/pytorch/audio/pull/2182 Reviewed By: nateanl Differential Revision: D33887839 Pulled By: mthrok fbshipit-source-id: 30569672e8caf30aae5476036dfdadc8ebd436bf
-
- 27 Jan, 2022 4 commits
-
-
hwangjeff authored
Summary: This PR removes logic in `RNNTBeamSearch` that blanks out joiner output values corresponding to special tokens, e.g. \<unk\>, \<eos\>, for the following reasons: - Provided that the model was configured and trained properly, it shouldn't be necessary, e.g. the model would naturally produce low probabilities for special tokens if they don't exist in the training set. - For our pre-trained LibriSpeech training pipeline, the removal of the logic doesn't affect evaluation WER on any of the dev/test splits. - The existing logic doesn't generalize to arbitrary token vocabularies. - Internally, it seems to have been acknowledged that this logic was introduced to compensate for quirks in other parts of the modeling infra. Pull Request resolved: https://github.com/pytorch/audio/pull/2180 Reviewed By: carolineechen, mthrok Differential Revision: D33822683 Pulled By: hwangjeff fbshipit-source-id: e7047e294f71c732c77ae0c20fec60412f26f05a
-
Caroline Chen authored
Summary: Add support for CTC lexicon decoder without LM support by adding a non language model `ZeroLM` that returns score 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (will likely be separated out from kenlm when adding support for other kinds of LMs in the future) Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
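The idea of a language model that contributes nothing to the score can be sketched as follows. The `LanguageModel` interface, its method names, and the `sequence_lm_score` helper are hypothetical and only serve to show the pattern; they are not the decoder's actual API.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Tuple


class LanguageModel(ABC):
    """Hypothetical minimal LM interface used by a beam search decoder."""

    @abstractmethod
    def start(self) -> Tuple[int, ...]:
        """Return the initial LM state."""

    @abstractmethod
    def score(self, state: Tuple[int, ...], token: int) -> Tuple[Tuple[int, ...], float]:
        """Return the next state and the score of `token` given `state`."""

    @abstractmethod
    def finish(self, state: Tuple[int, ...]) -> Tuple[Tuple[int, ...], float]:
        """Return the final state and the end-of-sentence score."""


class ZeroLM(LanguageModel):
    """LM that assigns score 0 to everything, so decoding is LM-free."""

    def start(self) -> Tuple[int, ...]:
        return ()

    def score(self, state, token):
        return state + (token,), 0.0

    def finish(self, state):
        return state, 0.0


def sequence_lm_score(lm: LanguageModel, tokens: Iterable[int]) -> float:
    """Accumulate the LM score of a full token sequence."""
    state = lm.start()
    total = 0.0
    for token in tokens:
        state, s = lm.score(state, token)
        total += s
    _, s = lm.finish(state)
    return total + s
```

Because the decoder only talks to the interface, swapping in `ZeroLM` leaves the search code unchanged while the hypothesis ranking depends on the acoustic scores alone.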
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178 Reviewed By: mthrok Differential Revision: D33797649 Pulled By: nateanl fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable if ffmpeg features are not available, this commit adds `is_ffmpeg_available`. The availability of the features depends on two factors: 1. Whether they were enabled at build time. 2. Whether the ffmpeg libraries are found at runtime. A simple way (for the OSS workflow) to detect both is to check whether `libtorchaudio_ffmpeg` is present and can be loaded without failure. To facilitate this, this commit changes `torchaudio._extension._load_lib` to return a boolean result. Pull Request resolved: https://github.com/pytorch/audio/pull/2170 Reviewed By: carolineechen Differential Revision: D33797695 Pulled By: mthrok fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
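The "try to load, report a boolean" pattern can be sketched with the standard library alone. The helper names `load_lib` and `is_feature_available` and the use of `ctypes` are assumptions for illustration; they do not reproduce how `torchaudio._extension._load_lib` loads its extension.

```python
import ctypes


def load_lib(path: str) -> bool:
    """Try to load a shared library and report success instead of raising.

    A missing file and a library whose own dependencies cannot be resolved
    both surface as OSError, which covers the two failure modes described
    above (not built, or runtime libraries not found).
    """
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


def is_feature_available(lib_path: str) -> bool:
    """Hypothetical availability check built on top of `load_lib`."""
    return load_lib(lib_path)
```

A test suite could then pass the result to `unittest.skipUnless` so that feature-specific tests are skipped when the library cannot be loaded.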
-
- 26 Jan, 2022 6 commits
-
-
Caroline Chen authored
Summary: following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465, adding brief beam search description and linking to resources Pull Request resolved: https://github.com/pytorch/audio/pull/2173 Reviewed By: nateanl Differential Revision: D33791731 Pulled By: carolineechen fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb
-
hwangjeff authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2179 Reviewed By: carolineechen, mthrok Differential Revision: D33797937 Pulled By: hwangjeff fbshipit-source-id: a14030d513cb1324d4742992eac0e88f1b358b65
-
hwangjeff authored
Summary: Currently, `mean` and `invstddev` exist as vanilla object attributes in the global stats normalization module that uses them. This, however, would preclude them from being moved to the same device that the module is moved to. To resolve this, this PR registers them as buffers. Pull Request resolved: https://github.com/pytorch/audio/pull/2175 Reviewed By: nateanl Differential Revision: D33794239 Pulled By: hwangjeff fbshipit-source-id: 78eb699ab5e0844f9436afc529b851e651f4f451
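The difference buffers make can be sketched with a small module. The class below is a simplified stand-in written for illustration (its name and constructor arguments are assumptions), not the module changed in the PR.

```python
import torch


class GlobalStatsNormalization(torch.nn.Module):
    """Normalize features with fixed, precomputed statistics."""

    def __init__(self, mean, invstddev):
        super().__init__()
        # Buffers are not trainable, but unlike plain attributes they follow
        # the module through .to(), .cuda(), .double() and appear in state_dict().
        self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float32))
        self.register_buffer("invstddev", torch.as_tensor(invstddev, dtype=torch.float32))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) * self.invstddev
```

Calling `module.to(device)` on such a module moves `mean` and `invstddev` together with it, which a plain `self.mean = tensor` assignment would not do.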
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2176 Reviewed By: carolineechen, mthrok Differential Revision: D33794216 Pulled By: nateanl fbshipit-source-id: e039c1fc03a89f1e8130a5c4dbc4beceff4081eb
-
hwangjeff authored
Summary: Adds integration test for pretrained ASR pipeline `EMFORMER_RNNT_BASE_LIBRISPEECH`. Pull Request resolved: https://github.com/pytorch/audio/pull/2172 Reviewed By: carolineechen, nateanl Differential Revision: D33793324 Pulled By: hwangjeff fbshipit-source-id: d0613e2ab98fe5afa7b16ca39b67f0a0304d13fc
-
hwangjeff authored
Summary: To facilitate experimenting with different strategies, this PR removes the existing subsampling and positional embedding logic from `Conformer`. Pull Request resolved: https://github.com/pytorch/audio/pull/2171 Reviewed By: nateanl Differential Revision: D33793338 Pulled By: hwangjeff fbshipit-source-id: 9f97614b09964a101a891b9c840b61a26fc1541f
-
- 24 Jan, 2022 1 commit
-
-
popcornell authored
Summary: It seems to me that the current Tacotron2 model does not allow decoding examples with batch size 1; e.g. the following code fails. I may have a fix for that.

```python
import torch
from torchaudio.models.tacotron2 import _Decoder  # private decoder class

if __name__ == "__main__":
    max_length = 400
    n_batch = 1
    hdim = 32
    dec = _Decoder(
        encoder_embedding_dim=hdim,
        n_mels=hdim,
        n_frames_per_step=1,
        decoder_rnn_dim=1024,
        decoder_max_step=2000,
        decoder_dropout=0.1,
        decoder_early_stopping=True,
        attention_rnn_dim=1024,
        attention_hidden_dim=128,
        attention_location_n_filter=32,
        attention_location_kernel_size=31,
        attention_dropout=0.1,
        prenet_dim=256,
        gate_threshold=0.5,
    )
    inp = torch.rand((n_batch, max_length, hdim))
    lengths = torch.tensor([max_length]).expand(n_batch).to(inp.device, inp.dtype)
    # Teacher-forced forward pass, then free-running inference.
    dec(inp, torch.rand((n_batch, hdim, max_length)), lengths)[0]
    dec.infer(inp, lengths)[0]
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2156 Reviewed By: carolineechen Differential Revision: D33744006 Pulled By: nateanl fbshipit-source-id: 7d04726dfe7e45951ab0007f22f10f90f26379a7
-
- 22 Jan, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - Rename `BucketizeSampler` to `BucketizeBatchSampler` - Fix bugs in `BucketizeBatchSampler` - Adjust HuBERTDataset based on the latest `BucketizeBatchSampler`. Pull Request resolved: https://github.com/pytorch/audio/pull/2150 Reviewed By: mthrok Differential Revision: D33689963 Pulled By: nateanl fbshipit-source-id: 203764e9af5b7577ba08ebaa30ba5da3b67fb7e7
-
- 21 Jan, 2022 3 commits
-
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. Removes debug code and associated arguments/fields left in previous PRs. Pull Request resolved: https://github.com/pytorch/audio/pull/2168 Reviewed By: hwangjeff Differential Revision: D33712999 Pulled By: mthrok fbshipit-source-id: 0729e9fbc146c48887379b6231e4d6e8cb520c44
-
moto authored
Summary: Split from https://github.com/pytorch/audio/issues/2164. Add new test assets. Adding this commit separately so that this commit message about the origin of the file is easier to find. The original video is in the public domain per

- https://svs.gsfc.nasa.gov/13013
- https://www.nasa.gov/multimedia/guidelines/index.html (The YouTube page directly says so)
- https://www.youtube.com/watch?v=6zNsc0e3Zns

So, the video is modified to fit the needs of testing:

1. Multiple audio/video streams
2. Non-audio/video (subtitle) streams
3. Different FPS and sampling rates
4. Versions without audio or without video

```
#!/usr/bin/env bash
original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt

# Fetch the original video, embed the subtitle
ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y

# Extract, rescale video and resample audio
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000 tmp2.mp4 -y

# Merge them, retaining all the streams (6 in total)
ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y

# Make versions without audio / video
ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y
ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_audio.mp4 -y
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2167 Reviewed By: carolineechen Differential Revision: D33712954 Pulled By: mthrok fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211
-
Nikita Shulga authored
Summary: s/gpu.small/gpu.nvidia.medium/ Pull Request resolved: https://github.com/pytorch/audio/pull/1791 Reviewed By: mthrok Differential Revision: D33697984 Pulled By: malfet fbshipit-source-id: 0aacad6d4badf023753fa874c8b80c7f65170d0d
-
- 20 Jan, 2022 2 commits
-
-
yonMaor authored
Summary: Closes https://github.com/pytorch/audio/issues/2162 Pull Request resolved: https://github.com/pytorch/audio/pull/2163 Reviewed By: nateanl Differential Revision: D33666354 Pulled By: mthrok fbshipit-source-id: 3e7a963b9ac85046317df8d5dab91af363e5668b
-
Nikita Shulga authored
Summary: Found that tests are failing after the change of the tester GPU class, see https://github.com/pytorch/audio/pull/1791 Pull Request resolved: https://github.com/pytorch/audio/pull/2165 Reviewed By: mthrok Differential Revision: D33674802 Pulled By: malfet fbshipit-source-id: 2e39386c0f129cf44a30d5dfea67e9e2d0e875cf
-
- 19 Jan, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: According to [the dataset description](https://paperswithcode.com/dataset/ted-lium-3), the ``dev`` and ``test`` subsets of the TEDLIUM v3 dataset are the same as in v2 (under the ``TEDLIUM_release-3/legacy`` directory). The ``train`` subset is under the ``TEDLIUM_release-3/data`` directory. This PR adds subset support for it. This also aligns with [TensorFlow's tedlium/release3](https://www.tensorflow.org/datasets/catalog/tedlium#tedliumrelease3) dataset. Pull Request resolved: https://github.com/pytorch/audio/pull/2157 Reviewed By: mthrok Differential Revision: D33585211 Pulled By: nateanl fbshipit-source-id: 87cfe0d02b3a4c2cf7e2da0ccb7443fff5c43689
-
Caroline Chen authored
Summary: update the labeling reminder to be triggered when PRs are closed rather than merged, because of the transition of merging through fbcode Pull Request resolved: https://github.com/pytorch/audio/pull/2160 Reviewed By: nateanl Differential Revision: D33642490 Pulled By: carolineechen fbshipit-source-id: bb39c66653782694d967303065d40386689789a8
-
- 18 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: additionally add decoding results for wav2vec2 large and also on the test-clean dataset Pull Request resolved: https://github.com/pytorch/audio/pull/2161 Reviewed By: mthrok Differential Revision: D33644670 Pulled By: carolineechen fbshipit-source-id: a219a15af46f82a6bd90169bb3001dbad8f0a96e
-
- 14 Jan, 2022 2 commits
-
-
moto authored
Summary: Currently, the doc build job uses `pytorch/manylinux-cuda100` as the base environment image. This PR replaces it with `python:3.X`. The problems with the previous image are: - The image is unnecessarily huge, with tools not needed for building docs. (+3GB) - There is no easy way to install ffmpeg>=4.1. https://518849-90321822-gh.circle-artifacts.com/0/docs/index.html Pull Request resolved: https://github.com/pytorch/audio/pull/2151 Reviewed By: carolineechen Differential Revision: D33585043 Pulled By: mthrok fbshipit-source-id: d6d2f6ab33511b8f5c7ca358bc6545e253c1b752
-
moto authored
Summary: - Change the version of nightly build to `Nightly Build (VERSION)`. - Use `BUILD_VERSION` env var for release. - Automatically change copyright year. - Update the link to nightly in README so that the main branch directs to the corresponding document. Because of the way the CI job is set up, the resulting documentation says 0.8.0. This is fixed by https://github.com/pytorch/audio/issues/2151. Pull Request resolved: https://github.com/pytorch/audio/pull/2152 Reviewed By: carolineechen, nateanl Differential Revision: D33585053 Pulled By: mthrok fbshipit-source-id: 3c2bf9fc3214c89f989f5ac65b74bc1e276a7161
-
- 08 Jan, 2022 1 commit
-
-
Binh Tang authored
[PyTorchLightning/pytorch-lightning] Add deprecation path for renamed training type plugins (#11227) Summary: ### New commit log messages 4eede7c30 Add deprecation path for renamed training type plugins (#11227) Reviewed By: edward-io, daniellepintz Differential Revision: D33409991 fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
-
- 07 Jan, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add explanation and demonstration of different beam search decoder parameters. Additionally use a better sample audio file and load in with token list instead of tokens file. Pull Request resolved: https://github.com/pytorch/audio/pull/2141 Reviewed By: mthrok Differential Revision: D33463230 Pulled By: carolineechen fbshipit-source-id: d3dd6452b03d4fc2e095d778189c66f7161e4c68
-
moto authored
Summary: This commit enables ffmpeg-feature build in tests and binary builds of all platforms. (Linux/macOS/Windows x conda/wheel) It also moves the definition of BUILD_FFMPEG env vars to the top level `config.yml`. --- Manual checking if all the build log contains `libtorchaudio_ffmpeg`. ### binary build - [x] `binary_linux_conda_py3.7_cpu` - [x] `binary_linux_conda_py3.7_cu102` - [x] `binary_linux_wheel_py3.7_cpu` - [x] `binary_linux_wheel_py3.7_cu102` - [x] `binary_macos_conda_py3.7_cpu` - [x] `binary_macos_wheel_py3.7_cpu` - [x] `binary_windows_conda_py3.7_cpu` - [x] `binary_windows_conda_py3.7_cu113` - [x] `binary_windows_wheel_py3.7_cpu` - [x] `binary_windows_wheel_py3.7_cu113` ### test - [x] `unittest_linux_cpu_py3.7` - [x] `unittest_linux_gpu_py3.7` - [x] `unittest_macos_cpu_py3.7` - [x] `unittest_windows_cpu_py3.7` - [x] `unittest_windows_gpu_py3.7` - [x] `integration test` Pull Request resolved: https://github.com/pytorch/audio/pull/2140 Reviewed By: hwangjeff Differential Revision: D33464430 Pulled By: mthrok fbshipit-source-id: 2c5b72be75d49019bf1599036180d4e56074e46b
-
- 06 Jan, 2022 5 commits
-
-
moto authored
Summary: - Unindent RNNTBundle components so that they show up on the right side bar - Overwrite the signature of RNNTBundle methods so that back links are available --- ## Before <img width="1440" alt="Screen Shot 2022-01-06 at 1 36 16 PM" src="https://user-images.githubusercontent.com/855818/148433552-9ba3051d-38b1-4825-9a8f-9173b23650ea.png"> ## After <img width="1436" alt="Screen Shot 2022-01-06 at 1 35 39 PM" src="https://user-images.githubusercontent.com/855818/148433525-733d138d-9a8b-43d6-bdf5-444b52d6a7a9.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2148 Reviewed By: hwangjeff Differential Revision: D33458574 Pulled By: mthrok fbshipit-source-id: ac34ffc4070261563a1f4ea9337997f0fe7b2212
-
Elijah Rippeth authored
Summary: This PR: - Replaces the `data_source` with `lengths` - Adds a `shuffle` argument to decide whether to shuffle the samples in the buckets. - Add `max_len` and `min_len` to filter out samples that are > max_len or < min_len. cc nateanl Pull Request resolved: https://github.com/pytorch/audio/pull/2147 Reviewed By: carolineechen Differential Revision: D33454369 Pulled By: nateanl fbshipit-source-id: 3835169ec7f808f8dd9650e7f183f79091efe886
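How `lengths`, `min_len`, `max_len` and `shuffle` can interact in a bucketing batch sampler is sketched below. This is a simplified pure-Python illustration under stated assumptions: the function name `bucketize_batches`, the `seed` argument, and the equal-count bucketing strategy are all hypothetical and may differ from what the sampler in the PR does.

```python
import random
from typing import List, Optional, Sequence


def bucketize_batches(
    lengths: Sequence[int],
    num_buckets: int,
    batch_size: int,
    min_len: int = 0,
    max_len: Optional[int] = None,
    shuffle: bool = False,
    seed: int = 0,
) -> List[List[int]]:
    """Group sample indices into batches of similar length."""
    # Drop samples that are shorter than min_len or longer than max_len.
    indices = [
        i for i, n in enumerate(lengths)
        if n >= min_len and (max_len is None or n <= max_len)
    ]
    if not indices:
        return []
    # Sort by length, then cut the sorted list into roughly equal buckets.
    indices.sort(key=lambda i: lengths[i])
    bucket_size = -(-len(indices) // num_buckets)  # ceiling division
    buckets = [indices[s:s + bucket_size] for s in range(0, len(indices), bucket_size)]

    rng = random.Random(seed)
    batches: List[List[int]] = []
    for bucket in buckets:
        if shuffle:
            rng.shuffle(bucket)  # shuffle only within a bucket
        for s in range(0, len(bucket), batch_size):
            batches.append(bucket[s:s + batch_size])
    return batches
```

Taking `lengths` rather than the dataset itself means the sampler never has to load audio just to decide how to batch it.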
-
moto authored
Summary: This commit suppresses the stderr output when shell commands are executed in `setup.py`. `setup.py` performs multiple git commands to gather metadata. When the `git tag` command tries to find a tag (which fails unless on a release), it produces `fatal: No names found, cannot describe anything.` This is confusing, especially when it is executed from a higher level, like `pip install git+https://...`. Pull Request resolved: https://github.com/pytorch/audio/pull/2133 Reviewed By: nateanl Differential Revision: D33455339 Pulled By: mthrok fbshipit-source-id: 3e24451eb6fedcd0ad90f7e16e38fcdb70dc9704
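One way to run a metadata command quietly is sketched below. The helper name `run_cmd` is an assumption for illustration and is not taken from the actual `setup.py`.

```python
import subprocess
from typing import List, Optional


def run_cmd(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stripped stdout, or None on failure.

    stderr is discarded, so an expected failure (for example, no tag on the
    current commit) does not print a confusing `fatal: ...` message.
    """
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit status, or the executable is not installed at all.
        return None
    return out.decode("utf-8", errors="replace").strip()
```

A caller could then write something like `tag = run_cmd(["git", "describe", "--tags", "--exact-match", "@"])` and treat `None` as "not on a release".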
-
Werner Chao authored
Summary: Drop support for python 3.6, and update dependencies documentation. More details [Issue 2051](https://github.com/pytorch/audio/issues/2051). Pull Request resolved: https://github.com/pytorch/audio/pull/2139 Reviewed By: mthrok Differential Revision: D33454583 Pulled By: wernerchao fbshipit-source-id: 64eccb38e26853ba63f72fb92723e3f0155e806e
-
Binh Tang authored
Summary: ### New commit log messages b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142) Reviewed By: jjenniferdai Differential Revision: D33259306 fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
-
- 05 Jan, 2022 4 commits
-
-
moto authored
Summary: Update the internals of the `skipIfXXX` decorators so that tests in CI will not be automatically skipped. Currently we automatically skip some tests based on the availability of related features/test tools. This causes issues where we miss signals on certain important features (CUDA on Windows). https://github.com/pytorch/audio/issues/1565 The new `skipIf` decorator will fail in CI unless it is explicitly allowed to skip tests. It does so by checking the `CI` and `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` environment variables. For non-CI environments, the behavior is the same as before, but users can now set `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX=false` to disallow the automatic skip. Results without `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` https://app.circleci.com/pipelines/github/pytorch/audio/9112/workflows/4e6db046-a1a2-4965-b0fe-d5baf4a1efac Pull Request resolved: https://github.com/pytorch/audio/pull/2127 Reviewed By: hwangjeff Differential Revision: D33430711 Pulled By: mthrok fbshipit-source-id: d8954dd720469c5ab0f34ea062fd8cf04a8afa3e
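The decision logic described above can be sketched as a decorator for test functions. Only the `CI` and `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` variable names come from the commit message; the helper names `_env_flag` and `skip_if`, the accepted truthy strings, and the use of `RuntimeError` are assumptions for illustration.

```python
import functools
import os
import unittest


def _env_flag(name: str, default: bool) -> bool:
    """Read a boolean environment variable (hypothetical parsing rules)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes")


def skip_if(condition: bool, reason: str, key: str):
    """Skip a test function when `condition` holds, unless skipping is disallowed.

    In CI the default is to fail instead of skipping; outside CI the default
    is to skip. TORCHAUDIO_TEST_ALLOW_SKIP_IF_<key> overrides either default.
    """
    def decorator(test_func):
        if not condition:
            return test_func
        in_ci = _env_flag("CI", default=False)
        allow = _env_flag(f"TORCHAUDIO_TEST_ALLOW_SKIP_IF_{key}", default=not in_ci)
        if allow:
            return unittest.skip(reason)(test_func)

        @functools.wraps(test_func)
        def fail(*args, **kwargs):
            raise RuntimeError(
                f"{reason} Set TORCHAUDIO_TEST_ALLOW_SKIP_IF_{key}=true to allow skipping."
            )
        return fail
    return decorator
```

Failing loudly by default in CI is what keeps a missing feature from silently turning a whole test group into skips.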
-
Caroline Chen authored
Summary: remove unnecessary RNNT Loss variables and comment as indicated in https://github.com/pytorch/audio/issues/1479 review comments (will follow up on `workspace` comments separately depending on complexity) Pull Request resolved: https://github.com/pytorch/audio/pull/2142 Reviewed By: mthrok Differential Revision: D33433764 Pulled By: carolineechen fbshipit-source-id: be0ecb77dabd63d733f0d33ff258eae32305eeaf
-
moto authored
Summary: This change adds a minimal ffmpeg installation step to the build wheel job so that later, we can use the resulting ffmpeg libraries for building torchaudio's ffmpeg-features. The linux wheel build jobs run in CentOS 8 based environment, which does not provide an easy way to install ffmpeg without conda. After https://github.com/pytorch/audio/pull/2124 is merged, then we can enable the ffmpeg-feature build in Linux wheel. Pull Request resolved: https://github.com/pytorch/audio/pull/2137 Reviewed By: carolineechen Differential Revision: D33430032 Pulled By: mthrok fbshipit-source-id: bf946d394c0718ddbdc679d7970befc3221982b9
-
moto authored
Summary: Update ffmpeg discovery logic. Previously, the build process used pkg-config to locate an installation of ffmpeg, which does not work well on Windows/CentOS. This commit updates the discovery process to use the custom FindFFMPEG.cmake adopted from the Kitware/VTK repository, with the addition of conda environment support. The custom discovery logic can support Windows and CentOS. Pull Request resolved: https://github.com/pytorch/audio/pull/2124 Reviewed By: carolineechen Differential Revision: D33429564 Pulled By: mthrok fbshipit-source-id: 6cb50c1d8c58f51e0f3f3af5c5b541aa3a699bba
-