- 01 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: add timesteps field to CTC decoder hypotheses, corresponding to the time step of occurrences of non-blank tokens Pull Request resolved: https://github.com/pytorch/audio/pull/2184 Reviewed By: mthrok Differential Revision: D33905530 Pulled By: carolineechen fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
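As a rough illustration of what a per-token time step means, the sketch below collapses a frame-level CTC path and records the frame index at which each non-blank token first appears. It is a greedy collapse written for this note, not the lexicon beam search the decoder runs; the function name and the choice of `0` as the blank index are assumptions.

```python
from typing import List, Tuple


def collapse_with_timesteps(frame_tokens: List[int], blank: int = 0) -> Tuple[List[int], List[int]]:
    """Collapse a frame-level CTC path and keep the frame index of each emitted token.

    Hypothetical helper: `blank=0` is an assumption, not torchaudio's convention.
    """
    tokens: List[int] = []
    timesteps: List[int] = []
    prev = blank
    for t, tok in enumerate(frame_tokens):
        # Emit a token only when it is non-blank and differs from the previous frame.
        if tok != blank and tok != prev:
            tokens.append(tok)
            timesteps.append(t)
        prev = tok
    return tokens, timesteps


tokens, timesteps = collapse_with_timesteps([0, 3, 3, 0, 3, 5, 5, 0])
```

Here the repeated `3` at frames 1 and 2 is merged into one token, while the `3` at frame 4 is kept because a blank separates it from the first one.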
-
- 27 Jan, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add support for CTC lexicon decoder without LM support by adding a non language model `ZeroLM` that returns score 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (will likely be separated out from kenlm when adding support for other kinds of LMs in the future) Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
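The idea of a language model that "returns score 0 for everything" can be sketched as follows. The `start` / `score` / `finish` interface and the `combined_score` helper are hypothetical names chosen for illustration; they are not taken from the torchaudio or decoder sources.

```python
from typing import Iterable, Tuple


class ZeroLM:
    """Hypothetical no-op language model: every transition contributes a score of 0."""

    def start(self) -> Tuple[int, ...]:
        return ()

    def score(self, state: Tuple[int, ...], token: int) -> Tuple[Tuple[int, ...], float]:
        # Track the history only so that states stay distinguishable; the score is always 0.
        return state + (token,), 0.0

    def finish(self, state: Tuple[int, ...]) -> Tuple[Tuple[int, ...], float]:
        return state, 0.0


def combined_score(emission_score: float, lm: ZeroLM, tokens: Iterable[int], lm_weight: float = 2.0) -> float:
    """Add the weighted LM score of `tokens` to an acoustic (emission) score."""
    state = lm.start()
    lm_total = 0.0
    for tok in tokens:
        state, s = lm.score(state, tok)
        lm_total += s
    _, s = lm.finish(state)
    lm_total += s
    return emission_score + lm_weight * lm_total


result = combined_score(-3.5, ZeroLM(), [1, 2, 3])
```

Because the model always returns 0, the decoder can keep its LM-aware code path unchanged while the ranking is driven by the emission scores alone, whatever the LM weight is.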
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable when ffmpeg features are not available, this commit adds `is_ffmpeg_available`. The availability of the features depends on two factors: 1. whether they were enabled at build time; 2. whether the ffmpeg libraries are found at runtime. A simple way (for the OSS workflow) to detect both is to check whether `libtorchaudio_ffmpeg` is present and can be loaded without a failure. To facilitate this, this commit changes `torchaudio._extension._load_lib` to return a boolean result. Pull Request resolved: https://github.com/pytorch/audio/pull/2170 Reviewed By: carolineechen Differential Revision: D33797695 Pulled By: mthrok fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
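A minimal sketch of the "load it and report success as a boolean" pattern described above. It uses `ctypes` from the standard library as a stand-in loader; that is an assumption made to keep the example self-contained, and `try_load_library` is a hypothetical name, not the actual `_load_lib` implementation.

```python
import ctypes


def try_load_library(path: str) -> bool:
    """Return True if the shared library at `path` loads, False otherwise.

    Hypothetical stand-in: ctypes is used here only for illustration.
    """
    try:
        ctypes.CDLL(path)
    except OSError:
        # Raised when the file is missing or one of its dependencies cannot be resolved,
        # which covers both "not built" and "ffmpeg not found at runtime".
        return False
    return True


available = try_load_library("/nonexistent/path/libdoes_not_exist.so")
```

A test helper can then use the returned flag as the condition of a skip decorator instead of catching the load error itself.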
-
- 26 Jan, 2022 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2176 Reviewed By: carolineechen, mthrok Differential Revision: D33794216 Pulled By: nateanl fbshipit-source-id: e039c1fc03a89f1e8130a5c4dbc4beceff4081eb
-
hwangjeff authored
Summary: Adds integration test for pretrained ASR pipeline `EMFORMER_RNNT_BASE_LIBRISPEECH`. Pull Request resolved: https://github.com/pytorch/audio/pull/2172 Reviewed By: carolineechen, nateanl Differential Revision: D33793324 Pulled By: hwangjeff fbshipit-source-id: d0613e2ab98fe5afa7b16ca39b67f0a0304d13fc
-
hwangjeff authored
Summary: To facilitate experimenting with different strategies, this PR removes the existing subsampling and positional embedding logic from `Conformer`. Pull Request resolved: https://github.com/pytorch/audio/pull/2171 Reviewed By: nateanl Differential Revision: D33793338 Pulled By: hwangjeff fbshipit-source-id: 9f97614b09964a101a891b9c840b61a26fc1541f
-
- 21 Jan, 2022 1 commit
-
-
moto authored
Summary: Split from https://github.com/pytorch/audio/issues/2164

Add new test assets. Adding this commit separately so that the commit message about the origin of the file is easier to find.

The original video is in the public domain, per
- https://svs.gsfc.nasa.gov/13013
- https://www.nasa.gov/multimedia/guidelines/index.html

(The YouTube page directly says so)
- https://www.youtube.com/watch?v=6zNsc0e3Zns

So, the video is modified to fit the needs of testing:
1. Multiple audio/video streams
2. Non-audio/video (subtitle) streams
3. Different FPS and sampling rate
4. Versions without audio and without video

```
#!/usr/bin/env bash
original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt

# Fetch the original video, embed the subtitle
ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y

# Extract, rescale video and resample audio
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000 tmp2.mp4 -y

# Merge them, retaining all the streams (6 in total)
ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y

# Make versions without audio / video
ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y
ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_audio.mp4 -y
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2167 Reviewed By: carolineechen Differential Revision: D33712954 Pulled By: mthrok fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211
-
- 20 Jan, 2022 1 commit
-
-
Nikita Shulga authored
Summary: Found that tests were failing after the change of the GPU tester class, see https://github.com/pytorch/audio/pull/1791 Pull Request resolved: https://github.com/pytorch/audio/pull/2165 Reviewed By: mthrok Differential Revision: D33674802 Pulled By: malfet fbshipit-source-id: 2e39386c0f129cf44a30d5dfea67e9e2d0e875cf
-
- 05 Jan, 2022 1 commit
-
-
moto authored
Summary: Update the internals of the `skipIfXXX` decorators so that tests in CI will not be automatically skipped. Currently we automatically skip some tests based on the availability of related features/test tools. This causes issues where we miss signals on certain important features (CUDA on Windows), see https://github.com/pytorch/audio/issues/1565. The new `skipIf` decorator will fail in CI unless it is explicitly allowed to skip tests. It does so by checking the `CI` and `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` environment variables. For non-CI environments, the behavior is the same as before, but users can now set `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX=false` to disallow the automatic skip. Results without `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX`: https://app.circleci.com/pipelines/github/pytorch/audio/9112/workflows/4e6db046-a1a2-4965-b0fe-d5baf4a1efac Pull Request resolved: https://github.com/pytorch/audio/pull/2127 Reviewed By: hwangjeff Differential Revision: D33430711 Pulled By: mthrok fbshipit-source-id: d8954dd720469c5ab0f34ea062fd8cf04a8afa3e
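The decision logic described in this message can be sketched roughly as below. The helper names, the `NO_CUDA` key, and the set of strings treated as true are assumptions for illustration; only the `CI` and `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` variable names come from the message.

```python
import functools
import os
import unittest
from typing import Callable, Mapping


def _is_true(value: str) -> bool:
    # Assumption: which strings count as "true" is not stated in the commit message.
    return value.strip().lower() in ("1", "true", "yes")


def skip_allowed(key: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Decide whether a test guarded by `key` may be skipped."""
    var = f"TORCHAUDIO_TEST_ALLOW_SKIP_IF_{key}"
    if var in environ:
        # An explicit setting wins, both in CI and locally.
        return _is_true(environ[var])
    # Default: skipping is allowed outside CI and disallowed in CI.
    return not _is_true(environ.get("CI", "false"))


def skip_if(condition: bool, reason: str, key: str) -> Callable:
    """Skip the test function when allowed; otherwise make it fail loudly."""
    def decorator(test_func: Callable) -> Callable:
        if not condition:
            return test_func
        if skip_allowed(key):
            return unittest.skip(reason)(test_func)

        @functools.wraps(test_func)
        def fail(*args, **kwargs):
            raise RuntimeError(f"{reason} (skipping is not allowed; set TORCHAUDIO_TEST_ALLOW_SKIP_IF_{key})")
        return fail
    return decorator


in_ci = skip_allowed("NO_CUDA", {"CI": "true"})
in_ci_allowed = skip_allowed("NO_CUDA", {"CI": "true", "TORCHAUDIO_TEST_ALLOW_SKIP_IF_NO_CUDA": "true"})
local_default = skip_allowed("NO_CUDA", {})
local_disallowed = skip_allowed("NO_CUDA", {"TORCHAUDIO_TEST_ALLOW_SKIP_IF_NO_CUDA": "false"})
```

Turning a would-be skip into a failure is what restores the missing signal: a CI job that silently lacks CUDA now shows up as red instead of as a list of skipped tests.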
-
- 30 Dec, 2021 2 commits
-
-
Joao Gomes authored
Summary: cc mthrok Pull Request resolved: https://github.com/pytorch/audio/pull/2116 Reviewed By: mthrok Differential Revision: D33368453 Pulled By: jdsgomes fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c
-
hwangjeff authored
Summary: * Removes redundant declaration `right_context_blocks = []`, as flagged by kobenaxie. * Adds random seed to tests, as flagged by carolineechen in other PRs. Pull Request resolved: https://github.com/pytorch/audio/pull/2091 Reviewed By: mthrok Differential Revision: D33340964 Pulled By: hwangjeff fbshipit-source-id: a9de43e28d1bae7bd4806b280717b0d822bb42fc
-
- 29 Dec, 2021 3 commits
-
-
hwangjeff authored
Summary: Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779). Pull Request resolved: https://github.com/pytorch/audio/pull/2090 Reviewed By: carolineechen Differential Revision: D33344772 Pulled By: hwangjeff fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf
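To make the effect of the upper bound concrete, here is a simplified sketch of how a mask width could be capped at a proportion `p` of the time axis. It is written with the standard `random` module for illustration and does not reproduce torchaudio's sampling code; `sample_time_mask` is a hypothetical name.

```python
import random
from typing import Tuple


def sample_time_mask(num_frames: int, time_mask_param: int, p: float = 1.0,
                     rng: random.Random = random.Random()) -> Tuple[int, int]:
    """Return (start, end) of a time mask whose width is at most p * num_frames.

    Simplified sketch: integer widths drawn uniformly, unlike the tensor-based original.
    """
    # The width is bounded both by the absolute parameter and by the proportion p.
    max_width = min(time_mask_param, int(num_frames * p))
    width = rng.randint(0, max_width)           # inclusive on both ends
    start = rng.randint(0, num_frames - width)  # keep the mask inside the spectrogram
    return start, start + width


rng = random.Random(0)
masks = [sample_time_mask(100, time_mask_param=80, p=0.2, rng=rng) for _ in range(1000)]
```

With 100 frames, `time_mask_param=80` and `p=0.2`, no sampled mask is wider than 20 frames, whereas without `p` a single mask could cover up to 80% of the utterance.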
-
Caroline Chen authored
Summary: Additionally accept list of tokens as CTC decoder input. This makes it possible to directly pass in something like `bundles.get_labels()` into the decoder factory function instead of requiring a separate tokens file. Pull Request resolved: https://github.com/pytorch/audio/pull/2112 Reviewed By: hwangjeff, nateanl, mthrok Differential Revision: D33352909 Pulled By: carolineechen fbshipit-source-id: 6d22072e34f6cd7c6f931ce4eaf294ae4cf0c5cc
-
hwangjeff authored
Summary: Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`. Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html Pull Request resolved: https://github.com/pytorch/audio/pull/2110 Reviewed By: carolineechen, mthrok Differential Revision: D33354116 Pulled By: hwangjeff fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb
-
- 23 Dec, 2021 3 commits
-
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072 -- splitting up PR for easier review This PR adds Python decoder API and basic README Pull Request resolved: https://github.com/pytorch/audio/pull/2089 Reviewed By: mthrok Differential Revision: D33299818 Pulled By: carolineechen fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
hwangjeff authored
Summary: Adds implementation of Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
-
- 21 Dec, 2021 1 commit
-
-
moto authored
Summary:

## Bug description

When 24 bits-per-sample audio is loaded via a file-like object, the loaded Tensor is wrong. It was fine if the audio was loaded from a local file.

## The cause of the bug

The core of sox's decoding mechanism is the `sox_read` function, one of whose parameters is the maximum number of samples to decode from the given buffer. https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f

The `sox_read` function is called in the callback of what is called the `drain` effect, and this callback receives the output buffer and its size in bytes. The previous implementation passed this size value as the argument of `sox_read` for the maximum number of samples to read. Since the buffer size in bytes is larger than the number of samples that fit in the buffer, the `sox_read` function always consumed the entire buffer. (This behavior is not wrong except when the input is 24 bits-per-sample and comes from a file-like object.)

When the input is read from a file-like object, inside the drain callback, new data are fetched via Python's `read` method and loaded into a fixed-size memory region. The size of this memory region can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`, but the default value is 8096. If the input format is 24 bits-per-sample, the end of the memory region does not necessarily correspond to the end of a valid sample. When `sox_read` consumes all the data in the buffer region, the data at the end introduces some unexpected values. This causes the aforementioned bug.

## Fix

Pass a proper (better estimated) maximum number of decodable samples to `sox_read`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2084 Reviewed By: carolineechen Differential Revision: D33236947 Pulled By: mthrok fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
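The misalignment can be checked with a small piece of arithmetic. The sketch below only counts how many whole samples fit in a byte buffer and how many bytes are left over; it is an illustration of the problem, not the C++ change made in the PR, and `whole_samples` is a hypothetical helper. The buffer size is the default quoted in the message.

```python
from typing import Tuple


def whole_samples(buffer_bytes: int, bits_per_sample: int) -> Tuple[int, int]:
    """Return (number of complete samples, leftover bytes) for a byte buffer."""
    bytes_per_sample = bits_per_sample // 8
    return divmod(buffer_bytes, bytes_per_sample)


samples_24, leftover_24 = whole_samples(8096, 24)
samples_16, leftover_16 = whole_samples(8096, 16)
```

For 16-bit audio the buffer ends exactly on a sample boundary, but for 24-bit audio two bytes of an incomplete sample remain at the end of the region, which is the tail that produced the unexpected values.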
-
- 30 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time.

For one of the four tests:

Before:
```
test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
======================== 1 passed in 517.35s (0:08:37) =========================
```

After:
```
test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
======================== 1 passed in 104.59s (0:01:44) =========================
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2037 Reviewed By: mthrok Differential Revision: D32726213 Pulled By: hwangjeff fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
-
- 24 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference. Pull Request resolved: https://github.com/pytorch/audio/pull/2028 Reviewed By: mthrok Differential Revision: D32627919 Pulled By: hwangjeff fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
-
- 23 Nov, 2021 1 commit
-
-
moto authored
Summary: The sox_effects test in `concurrent.futures.ThreadPoolExecutor` started failing a couple of days ago. Skip the test while this is being investigated. Pull Request resolved: https://github.com/pytorch/audio/pull/2025 Reviewed By: nateanl Differential Revision: D32615933 Pulled By: mthrok fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
-
- 22 Nov, 2021 2 commits
-
-
Zhaoheng Ni authored
Summary: Allow users to use `torch.cfloat` dtype input for the MVDR module. It internally converts the spectrogram into `torch.cdouble` and outputs the tensor with the original dtype of the spectrogram. Pull Request resolved: https://github.com/pytorch/audio/pull/2024 Reviewed By: carolineechen Differential Revision: D32594051 Pulled By: nateanl fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
-
Zhaoheng Ni authored
Summary: Division first, multiplication second. This helps avoid the value overflow issue. It also helps the ``stv_evd`` solution pass the gradient check. Pull Request resolved: https://github.com/pytorch/audio/pull/2004 Reviewed By: mthrok Differential Revision: D32539827 Pulled By: nateanl fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
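Why the order of operations matters can be seen with plain floating-point scalars. The values below are chosen only to trigger an overflow and are not taken from the MVDR code, which operates on complex tensors; the function names are hypothetical.

```python
import math


def multiply_then_divide(a: float, b: float, c: float) -> float:
    # The intermediate product a * b can exceed the float range before the division happens.
    return (a * b) / c


def divide_then_multiply(a: float, b: float, c: float) -> float:
    # Dividing first keeps the intermediate value small.
    return (a / c) * b


a, b, c = 1e200, 1e200, 1e200
overflowed = multiply_then_divide(a, b, c)
stable = divide_then_multiply(a, b, c)
```

Both expressions are algebraically equal, but the first one produces `inf` because `a * b` overflows a 64-bit float, while the second returns the expected `1e200`.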
-
- 18 Nov, 2021 2 commits
-
-
hwangjeff authored
Summary: Adds streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions — one that allows for building a custom model, and one that builds a preconfigured base model. Pull Request resolved: https://github.com/pytorch/audio/pull/2003 Reviewed By: nateanl Differential Revision: D32440879 Pulled By: hwangjeff fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360
-
Facebook Community Bot authored
Co-authored-by: Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
-
- 17 Nov, 2021 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2015 as titled Reviewed By: hwangjeff, mthrok Differential Revision: D32495691 fbshipit-source-id: 60d8a2337585e3147f24ca9f0b6518e30cd9134a
-
- 04 Nov, 2021 2 commits
-
-
Caroline Chen authored
-
moto authored
This commit changes all the `torch.hub` network utility functions to be imported from `torchaudio._internal`, so that later we can replace the function within fbcode.
-
- 03 Nov, 2021 3 commits
-
-
moto authored
Following the plan #1337, this commit drops the support for pseudo complex type from `F.phase_vocoder` and `T.TimeStretch`.
-
moto authored
Following the plan #1337, this commit drops the support for pseudo complex type from `F.spectrogram` and `T.Spectrogram`. It also deprecates the use of `return_complex` argument.
-
moto authored
-
- 02 Nov, 2021 3 commits
- 28 Oct, 2021 1 commit
-
-
S Harish authored
-
- 27 Oct, 2021 1 commit
-
-
moto authored
-
- 25 Oct, 2021 1 commit
-
-
moto authored
-
- 22 Oct, 2021 1 commit
-
-
moto authored
- Make the test support other languages - Fetch test asset on-the-fly
-
- 21 Oct, 2021 1 commit
-
-
moto authored
* [BC-breaking] Remove unused dimension from pretrained Wav2Vec2 ASR. The Wav2Vec2 ASR pretrained weights that originated from fairseq have extra dimensions that have nothing to do with the ASR task, https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/data/dictionary.py#L18-L37 and which are masked during the loss computation as in https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/criterions/ctc.py#L126-L128 This change removes them. * Use '-' for blank token representation.
-