- 25 Feb, 2022 4 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds `rtf_evd` method to `torchaudio.functional`. The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. The input argument is the power spectral density (PSD) matrix of the target speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2230 Reviewed By: mthrok Differential Revision: D34474188 Pulled By: nateanl fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134
-
Zhaoheng Ni authored
Summary: This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution that applies relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for the reference. The input arguments are the complex-valued RTF Tensor of the target speech, power spectral density (PSD) matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2229 Reviewed By: mthrok Differential Revision: D34474119 Pulled By: nateanl fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2
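For orientation, the RTF-based MVDR solution is commonly written as below. This is a sketch of the standard formulation, not text taken from the PR. The symbols are notational assumptions: $\mathbf{v}(f)$ is the RTF vector of the target speech and $\mathbf{\Phi}_{\mathbf{NN}}(f)$ is the noise PSD matrix at frequency bin $f$.

```latex
\mathbf{w}_{\text{MVDR}}(f) =
  \frac{\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{v}(f)}
       {\mathbf{v}^{\mathsf{H}}(f)\,\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{v}(f)}
```

How the reference channel enters the final scaling is left out here, since the commit message does not spell it out.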
-
Zhaoheng Ni authored
Summary: This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, and an int or one-hot Tensor that indicates the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
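As a reading aid, the Souden et al. solution is usually stated as follows. The notation is an assumption made for this sketch: $\mathbf{\Phi}_{\mathbf{SS}}(f)$ and $\mathbf{\Phi}_{\mathbf{NN}}(f)$ are the PSD matrices of speech and noise, and $\mathbf{u}$ is the one-hot vector that selects the reference channel.

```latex
\mathbf{w}_{\text{MVDR}}(f) =
  \frac{\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{\Phi}_{\mathbf{SS}}(f)}
       {\operatorname{trace}\!\left(\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{\Phi}_{\mathbf{SS}}(f)\right)}
  \,\mathbf{u}
```

An integer reference channel corresponds to the one-hot $\mathbf{u}$ with a single 1 at that index.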
-
Zhaoheng Ni authored
Summary: This PR adds ``psd`` method to ``torchaudio.functional``. It computes the power spectral density (PSD) matrix of the complex-valued spectrum. The method also supports normalization of Time-Frequency mask. Pull Request resolved: https://github.com/pytorch/audio/pull/2227 Reviewed By: mthrok Differential Revision: D34473908 Pulled By: nateanl fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9
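One common way to write a mask-weighted PSD estimate with a time-normalized mask is sketched below. The exact normalization used in the PR (for example, whether a small constant is added to the denominator) is not stated in the commit message, so treat this as an assumption. Here $\mathbf{y}(t, f)$ is the multi-channel complex spectrum and $m(t, f)$ is the Time-Frequency mask.

```latex
\tilde{m}(t, f) = \frac{m(t, f)}{\sum_{\tau} m(\tau, f)},
\qquad
\mathbf{\Phi}(f) = \sum_{t} \tilde{m}(t, f)\,\mathbf{y}(t, f)\,\mathbf{y}^{\mathsf{H}}(t, f)
```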
-
- 24 Feb, 2022 1 commit
-
-
Andrey Talman authored
Summary: Adding py3.10 to audio Pull Request resolved: https://github.com/pytorch/audio/pull/2224 Reviewed By: malfet, atalman, mthrok Differential Revision: D34442377 Pulled By: seemethere fbshipit-source-id: 2656de73427063958d609a74c01b526a476cb06a
-
- 17 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: In batch_consistency tests, the `assert_batch_consistency` method only accepts a single Tensor, which does not work for some methods. For example, `lfilter` and `filtfilt` require three Tensors as arguments, so their tests don't use `assert_batch_consistency`. This PR refactors the test to accept a tuple of Tensors that have a `batch` dimension. Other arguments, such as `int` or `str`, are passed as `*args` after the tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2245 Reviewed By: mthrok Differential Revision: D34273035 Pulled By: nateanl fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056
-
Zhaoheng Ni authored
Summary:
- Refactor the current `LibriSpeechRNNTModule`'s unit test.
- Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`.
- Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.

Pull Request resolved: https://github.com/pytorch/audio/pull/2240 Reviewed By: mthrok Differential Revision: D34285195 Pulled By: nateanl fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
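The lambda-to-partial replacement mentioned above is most likely about picklability: objects handed to worker processes must be pickled, and a lambda has no importable name. The sketch below illustrates the general Python behavior only. `try_pickle` and the two callables are hypothetical names, not code from the recipe.

```python
import functools
import pickle


def try_pickle(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A lambda cannot be looked up by name, so pickle refuses to serialize it.
round_lambda = lambda x: round(x, ndigits=2)

# functools.partial over an importable callable pickles cleanly.
round_partial = functools.partial(round, ndigits=2)

print(try_pickle(round_lambda))
print(try_pickle(round_partial))
```

Both callables compute the same thing; only the partial can be shipped to another process.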
-
- 16 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: In torchscript_consistency tests, the `func` in each test method accepts only one `tensor` as its argument; the other arguments of the `F.xyz` method have to be defined inside `func`. If `F.xyz` has no `Tensor` argument, the tests use a `dummy` tensor that is not used anywhere. In this PR, we refactor ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of just one `tensor`. Pull Request resolved: https://github.com/pytorch/audio/pull/2246 Reviewed By: carolineechen Differential Revision: D34273057 Pulled By: nateanl fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b
-
Zhaoheng Ni authored
Summary: In autograd tests, real Tensors are converted to `torch.float64` to guarantee precision. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support to the inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/2244 Reviewed By: mthrok Differential Revision: D34272998 Pulled By: nateanl fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
-
- 15 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Orders and names Conformer's initializer args to be more consistent with Emformer's. Pull Request resolved: https://github.com/pytorch/audio/pull/2223 Reviewed By: mthrok Differential Revision: D34226177 Pulled By: hwangjeff fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829
-
- 11 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Adds fixed random seed to Emformer RNN-T training recipe test. Pull Request resolved: https://github.com/pytorch/audio/pull/2220 Reviewed By: nateanl Differential Revision: D34180644 Pulled By: hwangjeff fbshipit-source-id: 2dc364f3f7cd666fa490514ae460538231c097e9
-
hwangjeff authored
Summary: Adds unit tests for Emformer RNN-T LibriSpeech recipe. Also makes changes to recipe to resolve errors with pickling lambda functions in Windows. Pull Request resolved: https://github.com/pytorch/audio/pull/2216 Reviewed By: nateanl Differential Revision: D34171480 Pulled By: hwangjeff fbshipit-source-id: 5fcebb457051f3041766324863728411180f5e1e
-
- 09 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary:
- Make `segment_length` a required argument rather than an optional one, to force users to consciously choose input segment lengths for their use cases.
- Clarify expected input shapes in API documentation.
- Adjust `infer` tests to reflect expected usage.
- Add assertion for input shape for `infer`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2207 Reviewed By: mthrok Differential Revision: D34101205 Pulled By: hwangjeff fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
-
hwangjeff authored
Summary: Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2208 Reviewed By: mthrok Differential Revision: D34099793 Pulled By: hwangjeff fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
-
- 02 Feb, 2022 1 commit
-
-
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on ffmpeg libraries. For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
-
- 01 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: Add a `timesteps` field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur. Pull Request resolved: https://github.com/pytorch/audio/pull/2184 Reviewed By: mthrok Differential Revision: D33905530 Pulled By: carolineechen fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
-
- 27 Jan, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add support for CTC lexicon decoder without LM support by adding a non language model `ZeroLM` that returns score 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (will likely be separated out from kenlm when adding support for other kinds of LMs in the future) Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
-
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/2164. To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable if ffmpeg features are not available, this commit adds `is_ffmpeg_available`. The availability of the features depends on two factors: 1. Whether it was enabled at build time. 2. Whether the ffmpeg libraries are found at runtime. A simple way (for the OSS workflow) to detect both is to check whether `libtorchaudio_ffmpeg` is present and can be loaded without failure. To facilitate this, this commit changes `torchaudio._extension._load_lib` to return a boolean result. Pull Request resolved: https://github.com/pytorch/audio/pull/2170 Reviewed By: carolineechen Differential Revision: D33797695 Pulled By: mthrok fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
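The "load it and report a boolean" idea can be sketched with the standard library alone. This is not the actual `_load_lib` implementation: `load_lib` and `is_feature_available` are hypothetical names, and `ctypes.CDLL` stands in for whatever loading mechanism the extension module really uses.

```python
import ctypes


def load_lib(path: str) -> bool:
    """Hypothetical loader: return True if the shared library loads, False otherwise."""
    try:
        ctypes.CDLL(path)
    except OSError:
        # Covers both "file not found" and "found but a dependency is missing".
        return False
    return True


def is_feature_available(path: str) -> bool:
    """A feature counts as available only if its library loaded successfully."""
    return load_lib(path)


print(is_feature_available("surely_missing_library_for_demo.so"))
```

A single load attempt covers both factors at once: a library that was never built is missing, and a library whose runtime dependencies are absent fails to load.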
-
- 26 Jan, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2176 Reviewed By: carolineechen, mthrok Differential Revision: D33794216 Pulled By: nateanl fbshipit-source-id: e039c1fc03a89f1e8130a5c4dbc4beceff4081eb
-
hwangjeff authored
Summary: To facilitate experimenting with different strategies, this PR removes the existing subsampling and positional embedding logic from `Conformer`. Pull Request resolved: https://github.com/pytorch/audio/pull/2171 Reviewed By: nateanl Differential Revision: D33793338 Pulled By: hwangjeff fbshipit-source-id: 9f97614b09964a101a891b9c840b61a26fc1541f
-
- 21 Jan, 2022 1 commit
-
-
moto authored
Summary: Split from https://github.com/pytorch/audio/issues/2164

Add new test assets. This commit is kept separate so that the commit message about the origin of the file is easier to find.

The original video is in the public domain per
- https://svs.gsfc.nasa.gov/13013
- https://www.nasa.gov/multimedia/guidelines/index.html
  (The YouTube page directly says so)
- https://www.youtube.com/watch?v=6zNsc0e3Zns

So, the video is modified to fit the needs of testing:
1. Multiple audio/video streams
2. Non-audio/video (subtitle) streams
3. Different FPS and sampling rate
4. Versions without audio and without video

```
#!/usr/bin/env bash

original=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
subtitle=https://svs.gsfc.nasa.gov/vis/a010000/a013000/a013013/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-SRT-CC.en_US.srt

# Fetch the original video, embed the subtitle
ffmpeg -i "${original}" -i "${subtitle}" -c:v copy -c:a copy -c:s mov_text -metadata:s:2 language=eng original.mp4 -y

# Extract, rescale video and resample audio
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=480:270 -af aresample=16000 tmp1.mp4 -y
ffmpeg -i original.mp4 -ss 29 -to 42 -c:s copy -vf scale=320:180 -r 25 -af aresample=8000 tmp2.mp4 -y

# Merge them, retaining all the streams (6 in total)
ffmpeg -i tmp2.mp4 -i tmp1.mp4 -map 0 -map 1 -c:s copy nasa_13013.mp4 -y

# Make versions without video (-vn) and without audio (-an)
ffmpeg -i tmp2.mp4 -c copy -vn nasa_13013_no_video.mp4 -y
ffmpeg -i tmp2.mp4 -c copy -an nasa_13013_no_audio.mp4 -y
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2167 Reviewed By: carolineechen Differential Revision: D33712954 Pulled By: mthrok fbshipit-source-id: b7cfc1358043a4abd1c0b416e8a8fb0039867211
-
- 20 Jan, 2022 1 commit
-
-
Nikita Shulga authored
Summary: Found that tests are failing after the change of the GPU tester class, see https://github.com/pytorch/audio/pull/1791 Pull Request resolved: https://github.com/pytorch/audio/pull/2165 Reviewed By: mthrok Differential Revision: D33674802 Pulled By: malfet fbshipit-source-id: 2e39386c0f129cf44a30d5dfea67e9e2d0e875cf
-
- 05 Jan, 2022 1 commit
-
-
moto authored
Summary: Update the internals of the `skipIfXXX` decorators so that tests in CI are not skipped automatically. Currently we automatically skip some tests based on the availability of related features/test tools. This causes issues where we miss signals on certain important features (CUDA on Windows): https://github.com/pytorch/audio/issues/1565 The new `skipIf` decorator fails in CI unless it is explicitly allowed to skip tests. It does so by checking the `CI` and `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX` environment variables. In non-CI environments the behavior is the same as before, but users can now set `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX=false` to disallow the automatic skip. Results without `TORCHAUDIO_TEST_ALLOW_SKIP_IF_XXX`: https://app.circleci.com/pipelines/github/pytorch/audio/9112/workflows/4e6db046-a1a2-4965-b0fe-d5baf4a1efac Pull Request resolved: https://github.com/pytorch/audio/pull/2127 Reviewed By: hwangjeff Differential Revision: D33430711 Pulled By: mthrok fbshipit-source-id: d8954dd720469c5ab0f34ea062fd8cf04a8afa3e
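The decision logic described above can be sketched roughly as follows. This is an illustration under assumptions, not the decorator from the PR: the name `skip_if`, the `key` parameter, the way `XXX` is substituted into the variable name, and the set of values treated as "true" are all hypothetical.

```python
import os
import unittest


def _is_true(value):
    """Hypothetical truthiness rule for environment variable values."""
    return value is not None and value.strip().lower() in ("1", "true", "yes")


def skip_if(condition, reason, key):
    """Skip the test when condition holds, unless skipping is disallowed.

    In CI the default is to disallow skipping, so the test fails instead.
    Outside CI the default is to allow it, as before.
    """
    def decorator(test_fn):
        if not condition:
            return test_fn  # nothing to skip, run the test normally
        in_ci = _is_true(os.environ.get("CI"))
        default = "false" if in_ci else "true"
        allow_var = f"TORCHAUDIO_TEST_ALLOW_SKIP_IF_{key}"  # assumed naming scheme
        if _is_true(os.environ.get(allow_var, default)):
            return unittest.skip(reason)(test_fn)

        def fail(*args, **kwargs):
            raise RuntimeError(f"{reason} (set {allow_var}=true to allow skipping)")

        return fail

    return decorator
```

With `CI=true` and no override, a decorated test raises instead of being skipped silently, which is what surfaces the missing signal.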
-
- 30 Dec, 2021 2 commits
-
-
Joao Gomes authored
Summary: cc mthrok Pull Request resolved: https://github.com/pytorch/audio/pull/2116 Reviewed By: mthrok Differential Revision: D33368453 Pulled By: jdsgomes fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c
-
hwangjeff authored
Summary:
* Removes redundant declaration `right_context_blocks = []`, as flagged by kobenaxie.
* Adds random seed to tests, as flagged by carolineechen in other PRs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2091 Reviewed By: mthrok Differential Revision: D33340964 Pulled By: hwangjeff fbshipit-source-id: a9de43e28d1bae7bd4806b280717b0d822bb42fc
-
- 29 Dec, 2021 3 commits
-
-
hwangjeff authored
Summary: Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779). Pull Request resolved: https://github.com/pytorch/audio/pull/2090 Reviewed By: carolineechen Differential Revision: D33344772 Pulled By: hwangjeff fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf
-
Caroline Chen authored
Summary: Additionally accept list of tokens as CTC decoder input. This makes it possible to directly pass in something like `bundles.get_labels()` into the decoder factory function instead of requiring a separate tokens file. Pull Request resolved: https://github.com/pytorch/audio/pull/2112 Reviewed By: hwangjeff, nateanl, mthrok Differential Revision: D33352909 Pulled By: carolineechen fbshipit-source-id: 6d22072e34f6cd7c6f931ce4eaf294ae4cf0c5cc
-
hwangjeff authored
Summary: Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`. Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html Pull Request resolved: https://github.com/pytorch/audio/pull/2110 Reviewed By: carolineechen, mthrok Differential Revision: D33354116 Pulled By: hwangjeff fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb
-
- 23 Dec, 2021 3 commits
-
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072 -- splitting up PR for easier review This PR adds Python decoder API and basic README Pull Request resolved: https://github.com/pytorch/audio/pull/2089 Reviewed By: mthrok Differential Revision: D33299818 Pulled By: carolineechen fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
hwangjeff authored
Summary: Adds implementation of Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
-
- 21 Dec, 2021 1 commit
-
-
moto authored
Summary:

## Bug description

When 24 bits-per-sample audio is loaded via a file-like object, the loaded Tensor is wrong. It is fine if the audio is loaded from a local file.

## The cause of the bug

The core of sox's decoding mechanism is the `sox_read` function, one of whose parameters is the maximum number of samples to decode from the given buffer. https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f

The `sox_read` function is called in the callback of what is called the `drain` effect, and this callback receives the output buffer and its size in bytes. The previous implementation passed this size value as the argument of `sox_read` for the maximum number of samples to read. Since the buffer size is larger than the number of samples that fit in the buffer, the `sox_read` function always consumed the entire buffer. (This behavior is not wrong except when the input is 24 bits-per-sample and a file-like object.)

When the input is read from a file-like object, new data are fetched inside the drain callback via Python's `read` method and loaded into a fixed-size memory region. The size of this memory region can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`, but the default value is 8096. If the input format is 24 bits-per-sample, the end of the memory region does not necessarily correspond to the end of a valid sample. When `sox_read` consumes all the data in the buffer region, the data at the end introduces some unexpected values. This causes the aforementioned bug.

## Fix

Pass a proper (better estimated) maximum number of decodable samples to `sox_read`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2084 Reviewed By: carolineechen Differential Revision: D33236947 Pulled By: mthrok fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
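The alignment problem can be checked with simple arithmetic. The sketch below only illustrates why 24-bit input is special, using the buffer size quoted in the message. It does not reproduce the estimate the fix actually passes to `sox_read`, and `whole_samples` is a hypothetical helper.

```python
BUFFER_SIZE_BYTES = 8096  # default value quoted in the commit message


def whole_samples(buffer_bytes, bits_per_sample):
    """Return (complete samples that fit, leftover bytes) for a packed buffer."""
    bytes_per_sample = bits_per_sample // 8
    return divmod(buffer_bytes, bytes_per_sample)


for bits in (8, 16, 24, 32):
    count, leftover = whole_samples(BUFFER_SIZE_BYTES, bits)
    print(f"{bits:>2} bits: {count} whole samples, {leftover} leftover byte(s)")
```

Only the 24-bit case leaves leftover bytes, so only there can the end of the buffer fall in the middle of a sample.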
-
- 30 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time. For one of the four tests:

Before:
```
test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
======================== 1 passed in 517.35s (0:08:37) =========================
```

After:
```
test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
======================== 1 passed in 104.59s (0:01:44) =========================
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2037 Reviewed By: mthrok Differential Revision: D32726213 Pulled By: hwangjeff fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
-
- 24 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference. Pull Request resolved: https://github.com/pytorch/audio/pull/2028 Reviewed By: mthrok Differential Revision: D32627919 Pulled By: hwangjeff fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
-
- 23 Nov, 2021 1 commit
-
-
moto authored
Summary: The sox_effects test in `concurrent.futures.ThreadPoolExecutor` started failing a couple of days ago. Skipping the test while this is investigated. Pull Request resolved: https://github.com/pytorch/audio/pull/2025 Reviewed By: nateanl Differential Revision: D32615933 Pulled By: mthrok fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
-
- 22 Nov, 2021 2 commits
-
-
Zhaoheng Ni authored
Summary: Allow users to use `torch.cfloat` dtype input for the MVDR module. It internally converts the spectrogram to `torch.cdouble` and outputs a tensor with the original dtype of the spectrogram. Pull Request resolved: https://github.com/pytorch/audio/pull/2024 Reviewed By: carolineechen Differential Revision: D32594051 Pulled By: nateanl fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
-
Zhaoheng Ni authored
Summary: Division first, multiplication second. This helps avoid the value overflow issue. It also helps the ``stv_evd`` solution pass the gradient check. Pull Request resolved: https://github.com/pytorch/audio/pull/2004 Reviewed By: mthrok Differential Revision: D32539827 Pulled By: nateanl fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
-
- 18 Nov, 2021 2 commits
-
-
hwangjeff authored
Summary: Adds streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions — one that allows for building a custom model, and one that builds a preconfigured base model. Pull Request resolved: https://github.com/pytorch/audio/pull/2003 Reviewed By: nateanl Differential Revision: D32440879 Pulled By: hwangjeff fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360
-
Facebook Community Bot authored
Co-authored-by: Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
-