- 08 Apr, 2022 1 commit
-
-
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to sphinx. APIs with these directives will have badges (based off of shields.io) which link to the page with description of these features. Continuation of https://github.com/pytorch/audio/issues/2316 Excluded dtypes for further improvement, and actually added badges to most of functional/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
-
- 06 Apr, 2022 2 commits
-
-
Xiaohui Zhang authored
Summary: Add an option to use GroupNorm rather than BatchNorm1d, and another option to re-order Convolution/MHA modules in Conformer model. Pull Request resolved: https://github.com/pytorch/audio/pull/2320 Reviewed By: hwangjeff Differential Revision: D35422112 Pulled By: xiaohui-zhang fbshipit-source-id: 360a8aaa37b883b0f656da2e4f654e86688ac270
-
Xiaohui Zhang authored
Summary: Add an option to use Tanh instead of ReLU in RNNT joiner, which enables better training performance sometimes. --- Pull Request resolved: https://github.com/pytorch/audio/pull/2319 Reviewed By: hwangjeff Differential Revision: D35422122 Pulled By: xiaohui-zhang fbshipit-source-id: c6a0f8b25936e47081110af046b57d0e8751f9a2
-
- 05 Apr, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: The multi-processing works well on MFCC features. However, it sometimes makes the script hang when dumping HuBERT features. Changing it to a for-loop resolves the issue. Pull Request resolved: https://github.com/pytorch/audio/pull/2311 Reviewed By: mthrok Differential Revision: D35393813 Pulled By: nateanl fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11
-
Caroline Chen authored
Summary: Resolves https://github.com/pytorch/audio/issues/2294 Raise an error if the waveform to be resampled is not of floating point type. The `conv1d` operation used in resampling and `nn.Module` used for the transforms don't support integer type. Pull Request resolved: https://github.com/pytorch/audio/pull/2318 Reviewed By: mthrok Differential Revision: D35379276 Pulled By: carolineechen fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0
-
- 04 Apr, 2022 2 commits
-
-
Caroline Chen authored
Summary: update example ASR pipeline to use the recently added pretrained LM API for decoding Pull Request resolved: https://github.com/pytorch/audio/pull/2317 Reviewed By: mthrok Differential Revision: D35361354 Pulled By: carolineechen fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46
-
Zhaoheng Ni authored
Summary: Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser. Pull Request resolved: https://github.com/pytorch/audio/pull/2315 Reviewed By: carolineechen Differential Revision: D35357678 Pulled By: nateanl fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92
-
- 01 Apr, 2022 5 commits
-
-
Zhaoheng Ni authored
Summary: When the checkpoint is on a GPU device and preprocessing is on CPU, the script throws an exception. Fix it by loading the model state dictionary onto CPU by default. Pull Request resolved: https://github.com/pytorch/audio/pull/2310 Reviewed By: mthrok Differential Revision: D35316903 Pulled By: nateanl fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed
-
moto authored
Summary: This commit 1. updates the config.guess and config.sub files and 2. applies them to all the third party libraries that use them. This resolves the following build failure on M1 Mac with a newer SDK. On a MacBook Pro with the M1 chip, a recent OS update changed something in the development environment (probably a newer version of Xcode), and the build stopped working with the following error from third party dependencies. ``` checking build system type... Invalid configuration ‘arm64-apple-darwin20.0.0': machine ‘arm64-apple' not recognized ``` Note: config files are taken from https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html Pull Request resolved: https://github.com/pytorch/audio/pull/2307 Reviewed By: nateanl Differential Revision: D35318273 Pulled By: mthrok fbshipit-source-id: 746ac51dd1816767aa78b88445f76a29acfd29e8
-
moto authored
Summary: Change the cmake logic to search CONDA_PREFIX before falling back to the other default paths and system paths. The search order is: 1. FFMPEG_ROOT 2. CONDA_PREFIX 3. Other locations (package managers and system paths). For users with a regular conda installation, ffmpeg from conda should be picked automatically. Anyone who wants to use a specific ffmpeg can set the FFMPEG_ROOT variable to the location of the desired installation. Pull Request resolved: https://github.com/pytorch/audio/pull/2312 Reviewed By: hwangjeff Differential Revision: D35317383 Pulled By: mthrok fbshipit-source-id: 52aef8f3f7f0f8f1eaf7a89a2d1ccfb6265e2c50
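The search order described above can be sketched in Python as follows. This is only an illustration of the precedence, not the actual cmake logic: the function name `find_ffmpeg_root`, the `fallback_dirs` parameter, and the header file used as a marker are all assumptions made for the example.

```python
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Hypothetical marker: a header we expect to exist in any usable ffmpeg installation.
MARKER = Path("include") / "libavcodec" / "avcodec.h"


def find_ffmpeg_root(
    env: Mapping[str, str],
    fallback_dirs: Sequence[str] = (),
) -> Optional[Path]:
    """Return the first candidate directory that looks like an ffmpeg installation.

    Precedence: 1. FFMPEG_ROOT, 2. CONDA_PREFIX, 3. the other locations.
    """
    candidates = []
    for variable in ("FFMPEG_ROOT", "CONDA_PREFIX"):
        value = env.get(variable)
        if value:
            candidates.append(Path(value))
    candidates.extend(Path(d) for d in fallback_dirs)

    for root in candidates:
        if (root / MARKER).is_file():
            return root
    return None
```

With `env=os.environ`, an explicitly set `FFMPEG_ROOT` wins over the active conda environment, which in turn wins over the fallback directories.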
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2309 For upcoming improved Kaldi features which are comprised of multiple classes / functions, put all the transforms implementations in dedicated directory. Reviewed By: nateanl Differential Revision: D35303682 fbshipit-source-id: 5bc8c07ef639683008c0f76ffe56e3941f772659
-
moto authored
Summary: The `transforms.batch_consistency_test.TestTransforms` test is failing on Windows. https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272 ``` > self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol) E AssertionError: Tensor-likes are not close! E E Mismatched elements: 28 / 196608 (0.0%) E Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed) E Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed) ``` The value of atol==1e-08 seems very strict, but all the other batch consistency tests are passing. The violation affects only a very small number of samples, which looks suspicious, but I think it is okay to relax atol to `1e-06` for Windows. `1e-06` is still stricter than the majority of the comparison tests we have. Pull Request resolved: https://github.com/pytorch/audio/pull/2305 Reviewed By: hwangjeff Differential Revision: D35298056 Pulled By: mthrok fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
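To see why relaxing `atol` is enough, here is a minimal scalar sketch of the usual closeness criterion, `|actual - expected| <= atol + rtol * |expected|`. The helper `is_close` and the sample values are illustrative assumptions, not values taken from the failing test.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Scalar version of the combined relative/absolute tolerance check."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Illustrative values: a small expected value and an absolute error of about 2e-07,
# similar in magnitude to the greatest absolute difference reported above.
expected = 1e-4
actual = expected + 2e-7

strict = is_close(actual, expected, rtol=1e-5, atol=1e-8)   # allowed error: about 1.1e-08
relaxed = is_close(actual, expected, rtol=1e-5, atol=1e-6)  # allowed error: about 1.0e-06
```

For values close to zero the relative term contributes almost nothing, so `atol` effectively decides the outcome, which is consistent with only a handful of elements failing.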
-
- 31 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit updates the `get_sinusoid` function in the test utility so that when multiple channels are requested, the non-primary channels have a randomized initial phase. This adds some variety to the test data, which should not break the tests. Currently `get_sinusoid` returns identical waveforms for all the channels. The multi-channel support was added just to mock the input data so that it is easy to test features with multi-channel inputs, so tests should not be expecting all the channels to be identical. When working on numerical parity, it is more useful if the raw waveforms are somewhat different. Image: waveforms generated by `get_sinusoid` after the change. left: 1st channel, right: 2nd channel <img width="524" alt="Screen Shot 2022-03-31 at 10 06 17 AM" src="https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2301 Reviewed By: hwangjeff Differential Revision: D35291689 Pulled By: mthrok fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
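A rough pure-Python sketch of the idea follows. The real test utility returns tensors; the function name `sinusoid_channels`, its parameters, and the seeded generator here are assumptions for illustration only.

```python
import math
import random
from typing import List


def sinusoid_channels(
    frequency: float = 300.0,
    sample_rate: int = 16000,
    duration: float = 1.0,
    n_channels: int = 1,
    seed: int = 0,
) -> List[List[float]]:
    """Generate sine waves; every channel after the first gets a random initial phase."""
    rng = random.Random(seed)  # seeded so that test data stays reproducible
    num_frames = int(sample_rate * duration)
    waveform = []
    for channel in range(n_channels):
        # The first channel keeps phase 0, the others start at a random phase.
        phase = 0.0 if channel == 0 else rng.uniform(0.0, 2.0 * math.pi)
        waveform.append(
            [
                math.sin(2.0 * math.pi * frequency * t / sample_rate + phase)
                for t in range(num_frames)
            ]
        )
    return waveform
```

All channels share frequency and amplitude, so tests that only depend on those properties keep passing, while tests that compare per-channel outputs now see distinct inputs.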
-
moto authored
Summary: Tests on `torchaudio.compliance.kaldi` were scattered across different places. This commit puts all of them in the dedicated `test/torchaudio_unittest/compliance/kaldi/` directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2303 Reviewed By: nateanl Differential Revision: D35288400 Pulled By: mthrok fbshipit-source-id: 1426f236bc7786539d7a3110f992ad6220a52f46
-
- 30 Mar, 2022 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2300 Reviewed By: xiaohui-zhang Differential Revision: D35258323 Pulled By: nateanl fbshipit-source-id: 4b9f86600399ba0f5ec47f1c402968a812aa557d
-
Xiaohui Zhang authored
Summary: Addressing the issue https://github.com/pytorch/audio/issues/2274: Raise Runtime errors when the input tensors to the CTC decoder are GPU tensors since the CTC decoder only runs on CPU. Also update the data type check to use "raise" rather than "assert". --- Pull Request resolved: https://github.com/pytorch/audio/pull/2289 Reviewed By: mthrok Differential Revision: D35255630 Pulled By: xiaohui-zhang fbshipit-source-id: d6c6e88d9ad4b9690bb741557fa9a9504e60872e
-
Zhaoheng Ni authored
Summary: This PR addresses https://github.com/pytorch/audio/issues/2295 by updating `zlib`'s url to the one on sourceforge.net. The `zlib` 1.2.11 source code has been removed from the official site. According to https://zlib.net, ```Due to the bug fixes, any installations of 1.2.11 should be replaced with 1.2.12.``` sourceforge preserves the older versions and is thus more stable. The PR keeps 1.2.11, as there is currently no 1.2.12 on sourceforge. Pull Request resolved: https://github.com/pytorch/audio/pull/2297 Reviewed By: mthrok Differential Revision: D35251361 Pulled By: nateanl fbshipit-source-id: 174c2c2e1c34bef9799bbacfd1e12c8ff13ff15d
-
- 26 Mar, 2022 1 commit
-
-
Caroline Chen authored
Summary: `build_docs` test is failing on CI with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`, but with local build: <img width="902" alt="Screen Shot 2022-03-25 at 4 02 53 PM" src="https://user-images.githubusercontent.com/16568633/160157472-c91ff9b2-a2be-4c5d-959e-53b9f45425c6.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2291 Reviewed By: mthrok Differential Revision: D35147098 Pulled By: carolineechen fbshipit-source-id: 682b3800d0ed5c56b402d83f221136725051ba7e
-
- 25 Mar, 2022 3 commits
-
-
moto authored
Summary: Following the issue https://github.com/pytorch/text/issues/1662, add more clarification on LTS. Also tidy up a bit by moving older versions in to details. cc Nayef211 --- <img width="794" alt="Screen Shot 2022-03-25 at 2 30 49 PM" src="https://user-images.githubusercontent.com/855818/160203327-acc5cbcb-ca86-43ee-b59f-48795b9e676c.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2293 Reviewed By: hwangjeff Differential Revision: D35159211 Pulled By: mthrok fbshipit-source-id: 18908c62440fc02773634c2700020fc407893dd3
-
Caroline Chen authored
Summary: `build_docs` CircleCI currently failing with `ImportError: cannot import name 'environmentfilter' from 'jinja2'`. Pin Jinja2<3.1 to resolve this issue, see https://github.com/sphinx-doc/sphinx/issues/10291#issuecomment-1078046986 Pull Request resolved: https://github.com/pytorch/audio/pull/2292 Reviewed By: mthrok Differential Revision: D35148397 Pulled By: carolineechen fbshipit-source-id: 963efe2fcdee13dead4a4d542c903913c6eaa505
-
Caroline Chen authored
Summary: add function to download pretrained files for LibriSpeech 3-gram/4-gram KenLM, tests, and updated tutorial Pull Request resolved: https://github.com/pytorch/audio/pull/2275 Reviewed By: mthrok Differential Revision: D35115418 Pulled By: carolineechen fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0
-
- 24 Mar, 2022 2 commits
-
-
Caroline Chen authored
Summary: rendered: - [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html) - [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288 Reviewed By: hwangjeff Differential Revision: D35099492 Pulled By: mthrok fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f
-
- 22 Mar, 2022 3 commits
-
-
moto authored
Summary: Originally, the global property TORCHAUDIO_THIRD_PARTIES was introduced to handle the optional third party dependencies that can change based on the build config. After revising the CMake, it turned out this is not really necessary, as our torchaudio/csrc/CMakeLists.txt properly branches out for conditional dependencies. Rather we should leave the global scope untouched. Pull Request resolved: https://github.com/pytorch/audio/pull/2282 Reviewed By: hwangjeff Differential Revision: D35059838 Pulled By: mthrok fbshipit-source-id: ed3557eaa9a669e4466d64893beab5089eca78b8
-
moto authored
Summary: In recent updates, torchaudio added features that download assets/models from download.pytorch.org/torchaudio. To reduce code duplication, the implementation uses utilities from ``torch.hub``, but there are still patterns repeated in implementing the fetch mechanism, notably cache and local file path handling. This commit introduces a utility function that handles download/cache/local path management and can be used for fetching pre-trained model data. Pull Request resolved: https://github.com/pytorch/audio/pull/2283 Reviewed By: carolineechen Differential Revision: D35050469 Pulled By: mthrok fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
-
Hagen Wierstorf authored
Summary: The calculation of the SNR in the data augmentation examples seems to be wrong to me: If we start from the definition of the signal-to-noise ratio using the root mean square value we get: ``` SNR = 20 log10 ( rms(scale * speech) / rms(noise) ) ``` this can be transformed to ``` scale = 10^(SNR/20) rms(noise) / rms(speech) ``` The example uses `lambda x: x.norm(p=2)` rather than `rms`, but as the speech and noise signals have the same length, we have ``` rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2) ``` For the example's factor `e^(SNR / 10)` to be correct, this would require ``` 10^(SNR/20) = e^(SNR / 10) ``` which is not true. Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`. For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of this factor changes from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41. Pull Request resolved: https://github.com/pytorch/audio/pull/2285 Reviewed By: nateanl Differential Revision: D35047737 Pulled By: mthrok fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3
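The derivation above can be checked numerically with a small pure-Python sketch. The helper names (`rms`, `speech_scale`, `measured_snr_db`) and the sample signals are assumptions for illustration; the tutorial itself works on tensors.

```python
import math
from typing import Sequence


def rms(signal: Sequence[float]) -> float:
    """Root mean square of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def speech_scale(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> float:
    """Scale for the speech so that scale * speech has the requested SNR against noise."""
    return 10 ** (snr_db / 20) * rms(noise) / rms(speech)


def measured_snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB, following the definition SNR = 20 log10(rms(speech) / rms(noise))."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Arbitrary sample signals of equal length.
speech = [math.sin(0.05 * n) for n in range(1000)]
noise = [0.3 * math.cos(0.37 * n) for n in range(1000)]

for target_db in (20.0, 10.0, 3.0):
    scale = speech_scale(speech, noise, target_db)
    scaled_speech = [scale * x for x in speech]
    # The measured SNR matches the target up to floating point error.
    assert abs(measured_snr_db(scaled_speech, noise) - target_db) < 1e-9
```

Replacing `10 ** (snr_db / 20)` with `math.exp(snr_db / 10)` makes the check fail, since the two expressions differ for every non-zero SNR.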
-
- 17 Mar, 2022 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281 Reviewed By: carolineechen Differential Revision: D34939494 Pulled By: mthrok fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d
-
- 10 Mar, 2022 3 commits
-
-
moto authored
Summary: Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202 Pull Request resolved: https://github.com/pytorch/audio/pull/2270 Reviewed By: hwangjeff Differential Revision: D34793460 Pulled By: mthrok fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2273 Reviewed By: mthrok Differential Revision: D34799335 Pulled By: carolineechen fbshipit-source-id: d0eea79448efdbd84758a3f433ab9350b4c94e91
-
Zhaoheng Ni authored
Summary: Add torchaudio 0.11.0 version to the table. Pull Request resolved: https://github.com/pytorch/audio/pull/2272 Reviewed By: carolineechen Differential Revision: D34790836 Pulled By: nateanl fbshipit-source-id: af9ec1a4b470b04b793f39d12dbf722d67c62fce
-
- 08 Mar, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2143 Reviewed By: carolineechen Differential Revision: D34722238 Pulled By: nateanl fbshipit-source-id: 72809c9db91c94d8e853c80ed8522eeffe5ff136
-
- 06 Mar, 2022 1 commit
-
-
moto authored
Summary: Building the Kaldi submodule requires running `get_version.sh` so that the version header is available. It was pointed out that the script should run with `bash` instead of `sh`. Fixes https://github.com/pytorch/audio/issues/2268 Pull Request resolved: https://github.com/pytorch/audio/pull/2269 Reviewed By: carolineechen Differential Revision: D34667726 Pulled By: mthrok fbshipit-source-id: 761b82c54b58af2bfb2836cbe18c9708f853f1e1
-
- 04 Mar, 2022 2 commits
-
-
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame is decoded. 1. Flush the decoder buffer. 2. Recreate filter graphs (so that internal state is re-initialized) 3. Discard the buffered tensor. (decoded chunks) Also it disallows negative values for seek timestamp. Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
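The three reset steps can be pictured with a toy class. This is not the actual implementation, which lives in the C++ layer on top of ffmpeg; `ToyStreamer`, its attributes, and the choice of `ValueError` for negative timestamps are all assumptions made for this sketch.

```python
from typing import Dict, List


class ToyStreamer:
    """Toy stand-in for a streaming decoder with internal state."""

    def __init__(self) -> None:
        self.decoder_buffer: List[str] = []  # packets still held by the decoder
        self.filter_state: Dict[str, int] = self._new_filter_graph()
        self.chunks: List[str] = []  # decoded chunks not yet fetched by the caller
        self.position = 0.0

    @staticmethod
    def _new_filter_graph() -> Dict[str, int]:
        # A fresh filter graph starts with clean internal state.
        return {"frames_seen": 0}

    def decode(self, packet: str) -> None:
        self.decoder_buffer.append(packet)
        self.filter_state["frames_seen"] += 1
        self.chunks.append(f"decoded:{packet}")

    def seek(self, timestamp: float) -> None:
        if timestamp < 0:
            raise ValueError("seek timestamp must be non-negative")
        self.decoder_buffer.clear()                    # 1. flush the decoder buffer
        self.filter_state = self._new_filter_graph()   # 2. recreate the filter graph
        self.chunks.clear()                            # 3. discard buffered chunks
        self.position = timestamp
```

Without these steps, frames decoded before the seek would leak into the output produced after it.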
-
moto authored
Summary: The `torchaudio.prototype.io.Streamer` class takes context-dependent options as the `option` argument in the form of a mapping of strings. Currently there is no check that the provided options are valid for the given input. This commit adds the check and raises an error if an invalid option is given. This is analogous to `ffmpeg` command error handling. ``` $ ffmpeg -foo ... Unrecognized option 'foo'. ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2263 Reviewed By: hwangjeff Differential Revision: D34495111 Pulled By: mthrok fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
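A minimal sketch of this kind of check, in plain Python, could look like the following. The function `check_options`, the `supported` set, and the use of `RuntimeError` are assumptions for illustration; in the real code the set of valid options depends on the input and is determined by ffmpeg.

```python
from typing import Iterable, Mapping


def check_options(options: Mapping[str, str], supported: Iterable[str]) -> None:
    """Raise if any provided option key is not recognized for the given input."""
    unknown = sorted(set(options) - set(supported))
    if unknown:
        names = ", ".join(repr(name) for name in unknown)
        raise RuntimeError(f"Unrecognized option(s): {names}")


# Hypothetical set of options accepted by some input format.
supported_options = {"framerate", "pixel_format", "video_size"}

check_options({"framerate": "30"}, supported_options)  # passes silently
```

Failing loudly on an unknown key catches typos such as `frame_rate` that would otherwise be silently ignored.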
-
- 27 Feb, 2022 1 commit
-
-
Nikita Shulga authored
Summary: Make them more aligned with ones in https://github.com/pytorch/vision/blob/main/.circleci/unittest/linux/scripts/setup_env.sh This is preliminary step towards eradicating unneeded conda-forge dependencies, see https://github.com/pytorch/audio/pull/2260 Pull Request resolved: https://github.com/pytorch/audio/pull/2265 Reviewed By: mthrok Differential Revision: D34499635 Pulled By: malfet fbshipit-source-id: f87a3e4568aeeab9c6787a777c3231153c4539f0
-
- 26 Feb, 2022 3 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2261 Enables prototype ffmpeg io tests in fbcode. Reviewed By: nateanl Differential Revision: D33698353 fbshipit-source-id: 61de997c564135e677cd68e34fd7cc5dc0c5e036
-
Zhaoheng Ni authored
Summary: This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``. The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum. Pull Request resolved: https://github.com/pytorch/audio/pull/2232 Reviewed By: mthrok Differential Revision: D34474561 Pulled By: nateanl fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
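As a rough illustration of the operation, the sketch below applies per-frequency weights to a multi-channel spectrum using nested lists of Python complex numbers. It assumes the common convention of conjugating the weights (`w^H y`), a weight layout of `[freq][channel]`, and a spectrum layout of `[channel][freq][time]`; the name `apply_beamforming_sketch` is hypothetical and this is not the torchaudio implementation.

```python
from typing import List

Spectrum = List[List[List[complex]]]  # [channel][freq][time]
Weights = List[List[complex]]         # [freq][channel]


def apply_beamforming_sketch(weights: Weights, specgram: Spectrum) -> List[List[complex]]:
    """Combine channels into one enhanced spectrum of layout [freq][time]."""
    n_channels = len(specgram)
    n_freq = len(weights)
    n_time = len(specgram[0][0])
    return [
        [
            # Weighted sum over channels with conjugated weights (assumed convention).
            sum(
                weights[f][c].conjugate() * specgram[c][f][t]
                for c in range(n_channels)
            )
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]
```

With weights `[1, 0]` at every frequency the output is simply the first channel, which makes a convenient sanity check.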
-
moto authored
Summary: This commit adds a tutorial for device ASR and updates the API for device streaming. The changes to the interface are 1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods. 2. Make the `fill_buffer` method private. When dealing with a device stream, there are situations where the device buffer is not ready and the system returns `EAGAIN`. In such a case, the previous implementation of the `process_packet` method raised an exception in the Python layer, but for device ASR, this is inefficient. A better approach is to retry within the C++ layer in a blocking manner. The new `timeout` parameter serves this purpose. Pull Request resolved: https://github.com/pytorch/audio/pull/2202 Reviewed By: nateanl Differential Revision: D34475829 Pulled By: mthrok fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
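The retry behavior can be sketched in Python as a blocking loop. The real retry happens in the C++ layer; the function `read_with_retry`, its `read_once` callback, and the units and semantics chosen here (seconds, `timeout=None` meaning retry indefinitely) are assumptions for this example and may differ from the actual `timeout` and `backoff` parameters.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(
    read_once: Callable[[], T],
    timeout: Optional[float] = None,
    backoff: float = 0.01,
) -> T:
    """Call read_once until it succeeds, sleeping `backoff` seconds between attempts.

    BlockingIOError is what Python raises for EAGAIN. With timeout=None the call
    retries indefinitely; otherwise it gives up once `timeout` seconds have passed.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_once()
        except BlockingIOError:
            if deadline is not None and time.monotonic() >= deadline:
                raise
            time.sleep(backoff)
```

Keeping the loop inside one call avoids raising and catching an exception in user code for every attempt while the device buffer is not ready.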
-
- 25 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``rtf_power`` method to ``torchaudio.functional``. The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206). [This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English. The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, number of iterations, respectively. Pull Request resolved: https://github.com/pytorch/audio/pull/2231 Reviewed By: mthrok Differential Revision: D34474503 Pulled By: nateanl fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb
-
Eli Uriegas authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2256 Limits scope of unittesting to one python version for both macOS and Windows. These types of workflows are particularly expensive and take a long time, so running them on every PR / every push is a bit wasteful considering the value in signal between different python versions is probably negligible. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: mthrok Differential Revision: D34459626 Pulled By: seemethere fbshipit-source-id: 47f5c317027f1b395edf9c1720b1b33ba689cad5
-