- 24 Mar, 2022 2 commits
-
-
Caroline Chen authored
Summary: rendered: - [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html) - [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288 Reviewed By: hwangjeff Differential Revision: D35099492 Pulled By: mthrok fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f
-
- 22 Mar, 2022 3 commits
-
-
moto authored
Summary: Originally, the global property TORCHAUDIO_THIRD_PARTIES was introduced to handle the optional third party dependencies that can change based on the build config. After revising the CMake, it turned out this is not really necessary, as our torchaudio/csrc/CMakeLists.txt properly branches out for conditional dependencies. Rather we should leave the global scope untouched. Pull Request resolved: https://github.com/pytorch/audio/pull/2282 Reviewed By: hwangjeff Differential Revision: D35059838 Pulled By: mthrok fbshipit-source-id: ed3557eaa9a669e4466d64893beab5089eca78b8
-
moto authored
Summary: In recent updates, torchaudio added features that download assets/models from download.pytorch.org/torchaudio. To reduce code duplication, the implementation uses utilities from ``torch.hub``, but some patterns are still repeated in implementing the fetch mechanism, notably cache and local file path handling. This commit introduces a utility function that handles download/cache/local path management and can be used for fetching pre-trained model data. Pull Request resolved: https://github.com/pytorch/audio/pull/2283 Reviewed By: carolineechen Differential Revision: D35050469 Pulled By: mthrok fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
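The download/cache/local-path pattern described above can be sketched roughly as follows. This is a minimal illustration, not the utility added by the PR: the name `fetch_asset`, its parameters, and the use of `urllib` in place of the ``torch.hub`` helpers are all assumptions made for the example.

```python
import os
from urllib.request import urlretrieve


def fetch_asset(key, root_url, cache_dir, download=urlretrieve):
    """Return a local path for `key`, downloading into `cache_dir` only if missing.

    Hypothetical helper: `download` is injectable so the cache logic can be
    exercised without network access.
    """
    local_path = os.path.join(cache_dir, key)
    if os.path.exists(local_path):
        # Cache hit: reuse the file that is already on disk.
        return local_path
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    download(f"{root_url}/{key}", local_path)
    return local_path
```

Callers would pass only the asset key and receive a path, so cache handling lives in one place instead of being repeated per model.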
-
Hagen Wierstorf authored
Summary: The calculation of the SNR in the data augmentation examples seems to be wrong to me: If we start from the definition of the signal-to-noise ratio using the root mean square value we get: ``` SNR = 20 log10 ( rms(scale * speech) / rms(noise) ) ``` this can be transformed to ``` scale = 10^(SNR/20) rms(noise) / rms(speech) ``` The example uses `lambda x: x.norm(p=2)` rather than `rms`, but as the speech and noise signals have the same length, we have ``` rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2) ``` For the example's `e^(SNR / 10)` to be correct, we would therefore need: ``` 10^(SNR/20) = e^(SNR / 10) ``` which is not true. Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`. For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41. Pull Request resolved: https://github.com/pytorch/audio/pull/2285 Reviewed By: nateanl Differential Revision: D35047737 Pulled By: mthrok fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3
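The derivation above can be checked numerically. The sketch below uses plain Python lists instead of tensors; the helper names `rms`, `snr_scale`, and `measured_snr_db` are made up for this illustration and are not part of the tutorial.

```python
import math


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_scale(speech, noise, snr_db):
    """Scale for `speech` so that the mix with `noise` has the requested SNR."""
    return 10 ** (snr_db / 20) * rms(noise) / rms(speech)


def measured_snr_db(speech, noise, scale):
    """SNR from the definition: 20 log10(rms(scale * speech) / rms(noise))."""
    return 20 * math.log10(rms([scale * v for v in speech]) / rms(noise))


speech = [0.5, -0.25, 0.75, -1.0]
noise = [0.1, 0.2, -0.1, -0.2]
for snr_db in (20, 10, 3):
    scale = snr_scale(speech, noise, snr_db)
    # The measured value matches the requested one up to rounding error.
    print(snr_db, measured_snr_db(speech, noise, scale))
```

Replacing `10 ** (snr_db / 20)` with `math.exp(snr_db / 10)` in `snr_scale` makes the measured SNR differ from the requested one, which is the discrepancy the commit fixes.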
-
- 17 Mar, 2022 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281 Reviewed By: carolineechen Differential Revision: D34939494 Pulled By: mthrok fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d
-
- 10 Mar, 2022 3 commits
-
-
moto authored
Summary: Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202 Pull Request resolved: https://github.com/pytorch/audio/pull/2270 Reviewed By: hwangjeff Differential Revision: D34793460 Pulled By: mthrok fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2273 Reviewed By: mthrok Differential Revision: D34799335 Pulled By: carolineechen fbshipit-source-id: d0eea79448efdbd84758a3f433ab9350b4c94e91
-
Zhaoheng Ni authored
Summary: Add torchaudio 0.11.0 version to the table. Pull Request resolved: https://github.com/pytorch/audio/pull/2272 Reviewed By: carolineechen Differential Revision: D34790836 Pulled By: nateanl fbshipit-source-id: af9ec1a4b470b04b793f39d12dbf722d67c62fce
-
- 08 Mar, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2143 Reviewed By: carolineechen Differential Revision: D34722238 Pulled By: nateanl fbshipit-source-id: 72809c9db91c94d8e853c80ed8522eeffe5ff136
-
- 06 Mar, 2022 1 commit
-
-
moto authored
Summary: Building the Kaldi submodule requires running `get_version.sh` so that the version header is available. It was pointed out that the script should run with `bash` instead of `sh`. Fixes https://github.com/pytorch/audio/issues/2268 Pull Request resolved: https://github.com/pytorch/audio/pull/2269 Reviewed By: carolineechen Differential Revision: D34667726 Pulled By: mthrok fbshipit-source-id: 761b82c54b58af2bfb2836cbe18c9708f853f1e1
-
- 04 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame is decoded. 1. Flush the decoder buffer. 2. Recreate filter graphs (so that internal state is re-initialized) 3. Discard the buffered tensor. (decoded chunks) Also it disallows negative values for seek timestamp. Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
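The three reset steps and the negative-timestamp check can be pictured with a toy model. The class below is purely illustrative: `ToyStreamer` and its attributes are hypothetical stand-ins and do not mirror the actual C++ implementation.

```python
class ToyStreamer:
    """Toy stand-in for a streaming decoder (hypothetical, for illustration)."""

    def __init__(self):
        self.decoder_buffer = []                 # packets held by the decoder
        self.filter_state = {"frames_seen": 0}   # internal filter graph state
        self.chunks = []                         # decoded chunks not yet consumed
        self.position = 0.0

    def decode(self, packet):
        self.decoder_buffer.append(packet)
        self.filter_state["frames_seen"] += 1
        self.chunks.append(f"chunk:{packet}")

    def seek(self, timestamp):
        if timestamp < 0:
            raise ValueError("seek timestamp must be non-negative")
        self.decoder_buffer.clear()              # 1. flush the decoder buffer
        self.filter_state = {"frames_seen": 0}   # 2. recreate filter state
        self.chunks.clear()                      # 3. discard buffered chunks
        self.position = float(timestamp)
```

Without steps 1 to 3, data decoded before the seek could leak into the output produced after it.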
-
moto authored
Summary: The `torchaudio.prototype.io.Streamer` class takes context-dependent options as the `option` argument in the form of a mapping of strings. Currently there is no check that the provided options are valid for the given input. This commit adds the check and raises an error if an invalid option is given. This is analogous to `ffmpeg` command error handling. ``` $ ffmpeg -foo ... Unrecognized option 'foo'. ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2263 Reviewed By: hwangjeff Differential Revision: D34495111 Pulled By: mthrok fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
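A check of this kind might look like the sketch below. The function name `check_options` and the explicit `recognized` set are assumptions for the example; the commit does not show how the set of valid options is determined for a given input.

```python
def check_options(options, recognized):
    """Raise if `options` contains keys that the input does not recognize.

    Hypothetical helper: `options` is a mapping of strings, `recognized`
    is the collection of option names valid for the current input.
    """
    unknown = sorted(set(options) - set(recognized))
    if unknown:
        names = ", ".join(repr(name) for name in unknown)
        raise ValueError(f"Unrecognized option(s): {names}")
```

Failing early with the offending names gives the same kind of feedback as the `ffmpeg` message quoted above, instead of silently ignoring a misspelled key.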
-
- 27 Feb, 2022 1 commit
-
-
Nikita Shulga authored
Summary: Make them more aligned with ones in https://github.com/pytorch/vision/blob/main/.circleci/unittest/linux/scripts/setup_env.sh This is a preliminary step towards eradicating unneeded conda-forge dependencies, see https://github.com/pytorch/audio/pull/2260 Pull Request resolved: https://github.com/pytorch/audio/pull/2265 Reviewed By: mthrok Differential Revision: D34499635 Pulled By: malfet fbshipit-source-id: f87a3e4568aeeab9c6787a777c3231153c4539f0
-
- 26 Feb, 2022 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2261 Enables prototype ffmpeg io tests in fbcode. Reviewed By: nateanl Differential Revision: D33698353 fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036
-
Zhaoheng Ni authored
Summary: This PR adds ``apply_beamforming`` method to ``torchaudio.functional``. The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum. Pull Request resolved: https://github.com/pytorch/audio/pull/2232 Reviewed By: mthrok Differential Revision: D34474561 Pulled By: nateanl fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
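For a single time-frequency bin, applying a beamforming weight amounts to an inner product across channels. The sketch below assumes the common convention y = w^H x (conjugated weights); the helper `apply_beamforming_bin` is hypothetical and operates on plain Python complex numbers rather than Tensors.

```python
def apply_beamforming_bin(weights, frame):
    """Combine one multi-channel bin into a single value: sum_c conj(w_c) * x_c.

    `weights` and `frame` are per-channel complex sequences of equal length.
    """
    return sum(w.conjugate() * x for w, x in zip(weights, frame))


# Two channels, one time-frequency bin.
enhanced = apply_beamforming_bin([1 + 1j, 2 + 0j], [1j, 3 + 0j])
```

The Tensor version would apply the same reduction over the channel dimension for every frequency and time index at once.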
-
moto authored
Summary: This commit adds a tutorial for device ASR and updates the API for device streaming. The changes to the interface are 1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods. 2. Move `fill_buffer` method to private. When dealing with a device stream, there are situations where the device buffer is not ready and the system returns `EAGAIN`. In such cases, the previous implementation of the `process_packet` method raised an exception in the Python layer, but for device ASR, this is inefficient. A better approach is to retry within the C++ layer in a blocking manner. The new `timeout` parameter serves this purpose. Pull Request resolved: https://github.com/pytorch/audio/pull/2202 Reviewed By: nateanl Differential Revision: D34475829 Pulled By: mthrok fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
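The retry-with-timeout idea can be sketched in Python, even though the commit implements it in the C++ layer. Everything here is an assumption for illustration: the function `process_with_retry`, the use of seconds for `timeout` and `backoff`, and `BlockingIOError` standing in for `EAGAIN`.

```python
import time


def process_with_retry(read_packet, timeout=None, backoff=0.01,
                       clock=time.monotonic, sleep=time.sleep):
    """Call `read_packet` until it succeeds, retrying while the device is busy.

    `timeout=None` retries indefinitely; otherwise the last BlockingIOError
    is re-raised once `timeout` seconds have elapsed.
    """
    start = clock()
    while True:
        try:
            return read_packet()
        except BlockingIOError:
            if timeout is not None and clock() - start >= timeout:
                raise
            sleep(backoff)  # wait before asking the device again
```

Keeping the loop inside one call avoids raising and catching an exception in user code for every packet that is not ready yet.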
-
- 25 Feb, 2022 6 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``rtf_power`` method to ``torchaudio.functional``. The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206). [This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English. The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, number of iterations, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2231 Reviewed By: mthrok Differential Revision: D34474503 Pulled By: nateanl fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb
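As background, the generic power iteration method repeatedly multiplies a vector by a matrix so that it converges toward the dominant eigenvector. The sketch below shows only that general idea on a small real matrix; it is not the `rtf_power` implementation, which works on complex PSD matrices and takes a reference channel, and the per-step normalization is a choice made for this example.

```python
import math


def power_iteration(matrix, vector, n_iter=50):
    """Approximate the dominant eigenvector of `matrix` (list of rows)."""
    for _ in range(n_iter):
        # Matrix-vector product.
        vector = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
        # Normalize so the values neither overflow nor vanish.
        norm = math.sqrt(sum(abs(v) ** 2 for v in vector))
        vector = [v / norm for v in vector]
    return vector


# The dominant eigenvector of diag(2, 1) is the first basis vector.
estimate = power_iteration([[2.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

More iterations give a better estimate, which is why the number of iterations is exposed as an argument.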
-
Eli Uriegas authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2256 Limits scope of unittesting to one python version for both macOS and Windows. These types of workflows are particularly expensive and take a long time so running them on every PR / every push is a bit wasteful considering the value in signal between different python versions is probably negligible. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: mthrok Differential Revision: D34459626 Pulled By: seemethere fbshipit-source-id: 47f5c317027f1b395edf9c1720b1b33ba689cad5
-
Zhaoheng Ni authored
Summary: This PR adds `rtf_evd` method to `torchaudio.functional`. The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. The input argument is the power spectral density (PSD) matrix of the target speech. Pull Request resolved: https://github.com/pytorch/audio/pull/2230 Reviewed By: mthrok Differential Revision: D34474188 Pulled By: nateanl fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134
-
Zhaoheng Ni authored
Summary: This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution that applies relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for the reference. The input arguments are the complex-valued RTF Tensor of the target speech, power spectral density (PSD) matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2229 Reviewed By: mthrok Differential Revision: D34474119 Pulled By: nateanl fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2
-
Zhaoheng Ni authored
Summary: This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
-
Zhaoheng Ni authored
Summary: This PR adds ``psd`` method to ``torchaudio.functional``. It computes the power spectral density (PSD) matrix of the complex-valued spectrum. The method also supports normalization of Time-Frequency mask. Pull Request resolved: https://github.com/pytorch/audio/pull/2227 Reviewed By: mthrok Differential Revision: D34473908 Pulled By: nateanl fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9
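For one frequency bin, a PSD matrix can be formed by accumulating outer products of the per-channel spectrum over time, optionally weighted by a mask. The sketch below is an illustration under assumptions: the helper `psd_bin`, the choice to sum (rather than average) when no mask is given, and the `eps` default are not taken from the PR.

```python
def psd_bin(frames, mask=None, normalize=True, eps=1e-10):
    """PSD matrix for a single frequency bin.

    `frames` is a list over time; each entry holds one complex value per
    channel. `mask` is an optional list of per-frame weights.
    """
    n_ch = len(frames[0])
    if mask is None:
        weights = [1.0] * len(frames)
    elif normalize:
        total = sum(mask) + eps
        weights = [m / total for m in mask]  # mask normalized over time
    else:
        weights = list(mask)
    psd = [[0j] * n_ch for _ in range(n_ch)]
    for x, w in zip(frames, weights):
        for c in range(n_ch):
            for e in range(n_ch):
                psd[c][e] += w * x[c] * x[e].conjugate()
    return psd
```

The result is Hermitian by construction, since entry (c, e) is the conjugate of entry (e, c).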
-
- 24 Feb, 2022 4 commits
-
-
Caroline Chen authored
Summary: as discussed offline w/ nateanl, cherry-picked PRs are currently being included when retrieving PRs between a release branch and newer commits. this PR fixes this by removing duplicates in the commit paths Pull Request resolved: https://github.com/pytorch/audio/pull/2257 Reviewed By: nateanl Differential Revision: D34459533 Pulled By: carolineechen fbshipit-source-id: 3497c1d2dca6f8067e2068146a6e28cce591d3c8
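One way to picture the fix: a cherry-picked PR shows up on both the release branch and the main branch, so it should be excluded and each remaining PR listed once. The function below is a hypothetical sketch of that filtering, not the logic of the actual script.

```python
def prs_only_in_main(main_prs, release_prs):
    """PR numbers found on main but not on the release branch, without repeats.

    Order of first appearance on main is preserved.
    """
    released = set(release_prs)
    # dict.fromkeys drops repeated PR numbers while keeping their order.
    return [pr for pr in dict.fromkeys(main_prs) if pr not in released]
```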
-
Caroline Chen authored
Summary: fix a style check failure from internal diff Pull Request resolved: https://github.com/pytorch/audio/pull/2258 Reviewed By: nateanl Differential Revision: D34459526 Pulled By: carolineechen fbshipit-source-id: d0e6782b5689c3bf63214a4ec6a75dd757678e0d
-
Eli Uriegas authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2259 We're deprecating support for CUDA 11.1 binaries since CUDA 11.3 should be forwards compatible with CUDA 11.1 drivers Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: atalman Differential Revision: D34458400 Pulled By: seemethere fbshipit-source-id: 105d96a9a175a94d85ffe6e9abcce3c77163a72f
-
Andrey Talman authored
Summary: Adding py3.10 to audio Pull Request resolved: https://github.com/pytorch/audio/pull/2224 Reviewed By: malfet, atalman, mthrok Differential Revision: D34442377 Pulled By: seemethere fbshipit-source-id: 2656de73427063958d609a74c01b526a476cb06a
-
- 23 Feb, 2022 1 commit
-
-
Binh Tang authored
Summary: We proactively remove references to the deprecated DDP accelerator to prepare for the breaking changes following the release of PyTorch Lightning 1.6 (see T112240890). Differential Revision: D34295318 fbshipit-source-id: 7b2245ca9c7c2900f510722b33af8d8eeda49919
-
- 18 Feb, 2022 2 commits
-
-
hwangjeff authored
Summary: Noticed some items to clean up in `Emformer`. - Make `segment_length` a required argument in `_EmformerLayer`. - Remove unused variables from `_unpack_state` and `_gen_attention_mask`. These don't affect `Emformer`'s functionality or public API. Pull Request resolved: https://github.com/pytorch/audio/pull/2252 Reviewed By: carolineechen, mthrok Differential Revision: D34321430 Pulled By: hwangjeff fbshipit-source-id: 38a5046f633a3e625352c476ef71c78380ccc597
-
Caroline Chen authored
Summary: - fix retrieve PR script to handle commits with unrecognized/invalid PR numbers, such as in 7b6b2d00 - add modifications similar to pytorch's [#71917](https://github.com/pytorch/pytorch/pull/71917), [#72085](https://github.com/pytorch/pytorch/pull/72085) Pull Request resolved: https://github.com/pytorch/audio/pull/2249 Reviewed By: nateanl, mthrok Differential Revision: D34304210 Pulled By: carolineechen fbshipit-source-id: 245784219317e355b5cece4a139dee71d65bfdd1
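Commit messages in this log carry the PR number in a `Pull Request resolved:` line, and some commits have none. A tolerant parser could look like the sketch below; the pattern and the function name `extract_pr_number` are assumptions for illustration and are not copied from the script.

```python
import re

# Matches the trailer used by commits in this repository's log.
PR_PATTERN = re.compile(
    r"Pull Request resolved: https://github\.com/pytorch/audio/pull/(\d+)"
)


def extract_pr_number(message):
    """Return the PR number from a commit message, or None if there is none."""
    match = PR_PATTERN.search(message)
    return int(match.group(1)) if match else None
```

Returning `None` instead of raising lets the caller skip or report such commits and continue with the rest of the history.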
-
- 17 Feb, 2022 4 commits
-
-
Zhaoheng Ni authored
Summary: In batch_consistency tests, the `assert_batch_consistency` method only accepts a single Tensor, which is not applicable to some methods. For example, `lfilter` and `filtfilt` require three Tensors as arguments, hence they don't use `assert_batch_consistency` in the tests. This PR refactors the test to accept a tuple of Tensors which have a `batch` dimension. Other arguments, such as `int` or `str`, are given as `*args` after the tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2245 Reviewed By: mthrok Differential Revision: D34273035 Pulled By: nateanl fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056
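The refactored calling convention can be sketched with plain lists standing in for batched Tensors. The helper below only mirrors the shape of the interface described above (a tuple of batched inputs followed by `*args`); its body and the toy function `scale_add` are assumptions, not the test utility from the PR.

```python
def assert_batch_consistency(func, inputs, *args, **kwargs):
    """Check that `func` gives the same result per item as on the whole batch.

    `inputs` is a tuple of batched sequences with the same batch size;
    non-batched arguments follow as `*args` / `**kwargs`.
    """
    batch_size = len(inputs[0])
    batched_out = func(*inputs, *args, **kwargs)
    for i in range(batch_size):
        # Slice every batched input down to a batch of size one.
        single = tuple([x[i]] for x in inputs)
        item_out = func(*single, *args, **kwargs)
        assert item_out[0] == batched_out[i], f"mismatch at batch index {i}"


def scale_add(a, b, factor):
    """Toy batched function taking two batched inputs and one scalar."""
    return [factor * x + y for x, y in zip(a, b)]


assert_batch_consistency(scale_add, ([1, 2, 3], [10, 20, 30]), 2)
```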
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2250 Reviewed By: mthrok Differential Revision: D34302192 Pulled By: nateanl fbshipit-source-id: 4ea7047503ef87e22b5ef6075ad010314d5e3885
-
Zhaoheng Ni authored
Summary: - Refactor the current `LibriSpeechRNNTModule`'s unit test. - Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule` - Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test. Pull Request resolved: https://github.com/pytorch/audio/pull/2240 Reviewed By: mthrok Differential Revision: D34285195 Pulled By: nateanl fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
-
moto authored
Summary: https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html 1. Add figure to explain the caching 2. Fix the initialization of stream iterator Pull Request resolved: https://github.com/pytorch/audio/pull/2226 Reviewed By: carolineechen Differential Revision: D34265971 Pulled By: mthrok fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4
-
- 16 Feb, 2022 6 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``EMFORMER_RNNT_BASE_MUSTC`` support in `pipeline_demo.py`. The bundle is trained on MuST-C release 2.0 dataset. The model preserves the casing and punctuation in the transcript. Here is a screen recording of how it works in streaming and non-streaming modes: https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2248 Reviewed By: hwangjeff Differential Revision: D34282598 Pulled By: nateanl fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4
-
Zhaoheng Ni authored
Summary: In torchscript_consistency tests, the `func` in each test method only accepts one `tensor` as the argument; the other arguments of the `F.xyz` method need to be defined inside the `func`. If there is no `Tensor` argument in `F.xyz`, the tests use a `dummy` tensor which is not used anywhere. In this PR, we refactor ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of just one `tensor`. Pull Request resolved: https://github.com/pytorch/audio/pull/2246 Reviewed By: carolineechen Differential Revision: D34273057 Pulled By: nateanl fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b
-
Zhaoheng Ni authored
Summary: - Use dictionary to select the `RNNTBundle` and the corresponding dataset. - Use the dictionary's keys as choices in ArgumentParser Pull Request resolved: https://github.com/pytorch/audio/pull/2239 Reviewed By: mthrok Differential Revision: D34267070 Pulled By: nateanl fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220
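The dictionary-driven selection can be sketched as below. The dictionary contents, the `--model-type` flag, and the string placeholders for bundles and datasets are hypothetical; the real script maps to actual `RNNTBundle` objects and dataset classes.

```python
import argparse

# Hypothetical mapping: model type -> (bundle name, dataset name).
MODEL_CONFIGS = {
    "librispeech": ("librispeech_bundle", "librispeech_dataset"),
    "tedlium3": ("tedlium3_bundle", "tedlium3_dataset"),
    "mustc": ("mustc_bundle", "mustc_dataset"),
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Select a bundle by model type.")
    # The dictionary keys double as the allowed command-line choices.
    parser.add_argument("--model-type", choices=list(MODEL_CONFIGS), required=True)
    return parser.parse_args(argv)


args = parse_args(["--model-type", "mustc"])
bundle_name, dataset_name = MODEL_CONFIGS[args.model_type]
```

Adding a new model then means adding one dictionary entry, and the argument parser picks it up automatically.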
-
Zhaoheng Ni authored
Summary: - Add docstring to `eval.py` and `pipeline_demo.py` under `emformer_rnnt` directory. - Refactor logger and ArgumentParser Pull Request resolved: https://github.com/pytorch/audio/pull/2238 Reviewed By: mthrok Differential Revision: D34267059 Pulled By: nateanl fbshipit-source-id: 4b8d3d183ee7bc0ad71ce305cab87bfa90208b2e
-
Zhaoheng Ni authored
Summary: In autograd tests, to guarantee precision, Tensors are converted to `torch.float64` if they are real. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support to the inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/2244 Reviewed By: mthrok Differential Revision: D34272998 Pulled By: nateanl fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
-
Caroline Chen authored
Summary: LM in example script was unintentionally changed to None when adding no LM support previously. this changes it back and is consistent with the WERs listed in the readme Pull Request resolved: https://github.com/pytorch/audio/pull/2235 Reviewed By: nateanl Differential Revision: D34273042 Pulled By: carolineechen fbshipit-source-id: 824b1ce18195e39dc534b2ec9c5312bbe3bb1812
-