- 07 Aug, 2023 1 commit
-
-
moto authored
Summary: Currently, the `torchaudio.functional.forced_align` function requires full information on input/target lengths. When performing non-batched alignment, these can be inferred from the size of the Tensor. Pull Request resolved: https://github.com/pytorch/audio/pull/3533 Reviewed By: nateanl Differential Revision: D48111041 Pulled By: mthrok fbshipit-source-id: fbf07124d3959c5cc5533dcd86296851587082fb
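As a rough illustration of the inference described in this commit, the following pure-Python sketch fills in missing lengths from the input shapes when the batch size is 1. The helper name `infer_lengths` and its shape-tuple arguments are hypothetical and are not part of torchaudio; the assumed layouts are `(batch, frames, classes)` for `log_probs` and `(batch, target_len)` for `targets`.

```python
from typing import Optional, Sequence, Tuple


def infer_lengths(
    log_probs_shape: Tuple[int, int, int],
    targets_shape: Tuple[int, int],
    input_lengths: Optional[Sequence[int]] = None,
    target_lengths: Optional[Sequence[int]] = None,
) -> Tuple[Sequence[int], Sequence[int]]:
    """Hypothetical helper: fill in missing lengths for a single-item batch.

    log_probs_shape is assumed to be (batch, frames, classes) and
    targets_shape is assumed to be (batch, target_len).
    """
    batch, num_frames, _ = log_probs_shape
    target_batch, target_len = targets_shape
    if batch != target_batch:
        raise ValueError("log_probs and targets must have the same batch size")
    if input_lengths is None:
        # Without padding information, only a single item is unambiguous.
        if batch != 1:
            raise ValueError("input_lengths is required when batch size > 1")
        input_lengths = [num_frames]
    if target_lengths is None:
        if batch != 1:
            raise ValueError("target_lengths is required when batch size > 1")
        target_lengths = [target_len]
    return input_lengths, target_lengths


print(infer_lengths((1, 50, 29), (1, 12)))  # ([50], [12])
```

Explicitly passed lengths are returned unchanged, so callers that already supply them would see no difference in this sketch.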
-
- 31 Jul, 2023 1 commit
-
-
moto authored
Summary: torch.norm is now deprecated. The usages in torchaudio seem to be vector norms, so this commit replaces them with torch.linalg.vector_norm. Resolves https://github.com/pytorch/audio/issues/3484 Pull Request resolved: https://github.com/pytorch/audio/pull/3522 Reviewed By: huangruizhe Differential Revision: D47926659 Pulled By: mthrok fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084
-
- 28 Jul, 2023 1 commit
-
-
moto authored
Summary: Context: https://github.com/pytorch/audio/issues/3448 The documentation of amplitude_to_DB is ambiguous about how cut-off values are computed when the input tensor is 3D. This commit clarifies that. Closes: https://github.com/pytorch/audio/issues/3448 Pull Request resolved: https://github.com/pytorch/audio/pull/3519 Reviewed By: huangruizhe Differential Revision: D47875505 Pulled By: mthrok fbshipit-source-id: e06bb997e7a27e2abe35c8e2ac91ddfbded4e641
-
- 25 Jul, 2023 1 commit
-
-
moto authored
Summary: Resolves https://github.com/pytorch/audio/issues/3486 Pull Request resolved: https://github.com/pytorch/audio/pull/3487 Differential Revision: D47724733 Pulled By: mthrok fbshipit-source-id: 26f5641a8271a7e50c4a33861d09b0c8274b29e4
-
- 12 Jul, 2023 1 commit
-
-
Bogdan Teleaga authored
Summary: This is a port of https://github.com/adefossez/julius/pull/17 for torchaudio. Not sure if it's possible/desirable to add tests to test the functionality of ONNX exports, but I did a quick test on my machine to ensure this works. The logic is a bit simpler compared to the other PR because the torchaudio version does not support the additional flags available in julius. Pull Request resolved: https://github.com/pytorch/audio/pull/3473 Differential Revision: D47401988 Pulled By: mthrok fbshipit-source-id: 62fa1e4388923f6a62cef2c0f902a79ea179cec4
-
- 11 Jul, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3468 Differential Revision: D47368070 Pulled By: mthrok fbshipit-source-id: 9b5d57b0cb861a2556a1903121f526f8011a0e2d
-
- 05 Jul, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3433 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To simplify the API, this PR changes it to support only batched Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`). Reviewed By: mthrok Differential Revision: D46657526 fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89
-
- 13 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3435 Reviewed By: nateanl Differential Revision: D46659362 Pulled By: mthrok fbshipit-source-id: ffa033ad6759de6fd958b63ac51a4a1153ffb45d
-
- 07 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415 Differential Revision: D46526437 Pulled By: mthrok fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b
-
- 06 Jun, 2023 2 commits
-
-
Moto Hira authored
Differential Revision: D46126226 Original commit changeset: 42cb52b19d91 Original Phabricator Diff: D46126226 fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3365 The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To simplify the API, this PR changes it to support only batched Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`). Reviewed By: vineelpratap Differential Revision: D46126226 fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2
-
- 02 Jun, 2023 1 commit
-
-
moto authored
Summary: This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation instead of reimplementing it in PyTorch. The Kaldi integration employed a hack that replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio. We have recently been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it. See some of the discussion at https://github.com/pytorch/audio/issues/1269 Pull Request resolved: https://github.com/pytorch/audio/pull/3368 Differential Revision: D46406176 Pulled By: mthrok fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
-
- 01 Jun, 2023 2 commits
-
-
moto authored
Summary: Follow-up to https://github.com/pytorch/audio/issues/3386 The intended change was to use the path of a temporary file instead of a file-like object. Pull Request resolved: https://github.com/pytorch/audio/pull/3397 Reviewed By: hwangjeff Differential Revision: D46346189 Pulled By: mthrok fbshipit-source-id: 44da799c6587bcb63a118a6313b7299bad742a40
-
moto authored
Summary: To prepare for the upcoming removal of file-like object support from the sox_io backend, this commit changes the `apply_codec` function to use tempfile. The `apply_codec` function is now deprecated and users are encouraged to use `torchaudio.io.AudioEffector`. We will not remove the function itself, but will remove the entry from the doc. Pull Request resolved: https://github.com/pytorch/audio/pull/3386 Reviewed By: hwangjeff Differential Revision: D46330610 Pulled By: mthrok fbshipit-source-id: 3071bdefa05b4cbb9f00629bef50f0981eae89b4
-
- 24 May, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3366 Reviewed By: nateanl Differential Revision: D46136238 Pulled By: mthrok fbshipit-source-id: 3432f5d007293831bab21460a79ae26b1bbc81a8
-
- 22 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: - Fix latex formula rendering issue - Add `devices` and `properties` tags - Fix grammar Pull Request resolved: https://github.com/pytorch/audio/pull/3357 Reviewed By: mthrok Differential Revision: D46068633 Pulled By: nateanl fbshipit-source-id: 80cb84508396fbcaf81c068228d46a24bb63b975
-
- 20 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3348 The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA devices. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame. Reviewed By: vineelpratap, xiaohui-zhang Differential Revision: D45867265 fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb
-
- 04 May, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed) The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems: - Only zero masking can be done; masking by mean value is not supported. - mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor. - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding. - For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding. - It's not straightforward to apply multiple time/frequency masks with the current design. To solve these issues, here we - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor. The introduction of the SpecAugment transform will be done in another PR. Pull Request resolved: https://github.com/pytorch/audio/pull/3289 Reviewed By: hwangjeff Differential Revision: D45460357 Pulled By: xiaohui-zhang fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
-
- 08 Mar, 2023 1 commit
-
-
cai525 authored
Summary: Address #3101. The documentation for `power=1` should represent magnitude instead of energy. Pull Request resolved: https://github.com/pytorch/audio/pull/3134 Reviewed By: mthrok Differential Revision: D43910652 Pulled By: nateanl fbshipit-source-id: e0768438e819222a5dde6b86c5123ab0e8af59fb
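To make the magnitude/power distinction concrete, here is a small pure-Python sketch for a single complex STFT bin. The helper `spectrogram_value` is hypothetical and only illustrates the exponent applied to the absolute value; it is not torchaudio code.

```python
def spectrogram_value(stft_bin: complex, power: float) -> float:
    """Return |X| ** power for one complex STFT bin (illustrative helper)."""
    return abs(stft_bin) ** power


bin_value = complex(3.0, 4.0)
print(spectrogram_value(bin_value, 1.0))  # 5.0, the magnitude
print(spectrogram_value(bin_value, 2.0))  # 25.0, the power (energy)
```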
-
- 17 Feb, 2023 1 commit
-
-
hwangjeff authored
Summary: Makes lengths input optional for `torchaudio.functional.speed`, `torchaudio.transforms.Speed`, and `torchaudio.transforms.SpeedPerturbation`. Pull Request resolved: https://github.com/pytorch/audio/pull/3072 Reviewed By: nateanl, mthrok Differential Revision: D43371406 Pulled By: hwangjeff fbshipit-source-id: ecb38bcc2bfff5c5a396a37eff238b22238e795a
-
- 15 Feb, 2023 1 commit
-
-
hwangjeff authored
Summary: Relaxes input dimension matching constraint on `convolve` to enable broadcasting for inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/3061 Reviewed By: mthrok Differential Revision: D43298078 Pulled By: hwangjeff fbshipit-source-id: a6cc36674754523b88390fac0a05f06562921319
-
- 24 Jan, 2023 1 commit
-
-
hwangjeff authored
Summary: Moves `add_noise`, `fftconvolve`, `convolve`, `speed`, `preemphasis`, and `deemphasis` out of `torchaudio.prototype.functional` and into `torchaudio.functional`. Pull Request resolved: https://github.com/pytorch/audio/pull/3001 Reviewed By: mthrok Differential Revision: D42688971 Pulled By: hwangjeff fbshipit-source-id: 43280bd3ffeccddae57f1092ac45afb64dd426cc
-
- 12 Jan, 2023 1 commit
-
-
mthrok authored
Summary: * Refactor _extension module so that * the implementation of initialization logic and its execution are separated. * logic goes to `_extension.utils` * the execution is at `_extension.__init__` * global variables are defined and modified in `__init__`. * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED` * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE` * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`. * Merge the sox-related initialization logic in `_extension.utils` module. Pull Request resolved: https://github.com/pytorch/audio/pull/2968 Reviewed By: hwangjeff Differential Revision: D42387251 Pulled By: mthrok fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
-
- 16 Dec, 2022 1 commit
-
-
Caroline Chen authored
Summary: Resolves https://github.com/pytorch/audio/issues/2891 Rename `resampling_method` options to more accurately describe what is happening. Previously the methods were set to `sinc_interpolation` and `kaiser_window`, which can be confusing as both options actually use sinc interpolation methodology, but differ in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will throw a warning, and those options will be deprecated in 2 releases. The numerical behavior is unchanged. Pull Request resolved: https://github.com/pytorch/audio/pull/2922 Reviewed By: mthrok Differential Revision: D42083619 Pulled By: carolineechen fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
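The rename-with-warning pattern described in this commit can be sketched in plain Python as follows. The helper `normalize_resampling_method`, the warning category, and the message text are assumptions made for illustration; only the old and new option names come from the commit message.

```python
import warnings

# Old option names mapped to the new names given in the commit message.
_RENAMED_METHODS = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name: str) -> str:
    """Hypothetical helper: translate an old option name and warn about it."""
    if name in _RENAMED_METHODS:
        new_name = _RENAMED_METHODS[name]
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if name not in _RENAMED_METHODS.values():
        raise ValueError(f"Unknown resampling method: {name!r}")
    return name


print(normalize_resampling_method("sinc_interp_hann"))
```

Because the old name is mapped to the new one before any computation, a sketch like this keeps the numerical behavior unchanged for callers that still pass an old option.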
-
- 14 Nov, 2022 1 commit
-
-
Caroline Chen authored
Summary: follow up to https://github.com/pytorch/audio/issues/2823 - move bark spectrogram to prototype - decrease autograd test tolerance (passing on circle ci) - add diagram for bark fbanks cc jdariasl Pull Request resolved: https://github.com/pytorch/audio/pull/2843 Reviewed By: nateanl Differential Revision: D41199522 Pulled By: carolineechen fbshipit-source-id: 8e6c2e20fb7b14f39477683b3c6ed8356359a213
-
- 10 Nov, 2022 1 commit
-
-
Julián D. Arias-Londoño authored
Summary: I have added the BarkScale transform, which can transform a regular Spectrogram into a BarkSpectrogram, similar to MelScale. ahmed-fau opened this request in December 2021 (https://github.com/pytorch/audio/issues/2103). The new functionality includes three different well-known approximations of the Bark scale. Pull Request resolved: https://github.com/pytorch/audio/pull/2823 Reviewed By: nateanl Differential Revision: D41162100 Pulled By: carolineechen fbshipit-source-id: b2670c4972e49c9ef424da5d5982576f7a4df831
-
- 08 Nov, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add `fused_log_softmax` argument (default/current behavior = True) to rnnt loss. If setting it to `False`, call `log_softmax` on the logits prior to passing it in to the rnnt loss function. The following should produce the same output:
```
rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
```
```
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
```
testing -- unit tests + get same results on the conformer rnnt recipe Pull Request resolved: https://github.com/pytorch/audio/pull/2798 Reviewed By: xiaohui-zhang Differential Revision: D41083523 Pulled By: carolineechen fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61
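The fused/unfused equivalence can be illustrated without torchaudio using a toy negative log-likelihood. This is only a stand-in chosen for illustration: `toy_loss` and the pure-Python `log_softmax` below are hypothetical and do not implement the RNN-T loss; they only show why normalizing inside the loss or beforehand should give the same value.

```python
import math
from typing import List


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def toy_loss(values: List[float], target: int, fused_log_softmax: bool = True) -> float:
    """Toy negative log-likelihood; a stand-in for illustration, not RNN-T loss.

    With fused_log_softmax=True the inputs are raw logits and are normalized
    here; with False they are assumed to be log-probabilities already.
    """
    log_probs = log_softmax(values) if fused_log_softmax else values
    return -log_probs[target]


logits = [2.0, 0.5, -1.0]
fused = toy_loss(logits, target=0, fused_log_softmax=True)
unfused = toy_loss(log_softmax(logits), target=0, fused_log_softmax=False)
print(math.isclose(fused, unfused))  # True
```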
-
- 15 Sep, 2022 1 commit
-
-
moto authored
Summary: Preparation for the adoption of `autosummary`. Replace `:footcite:` with `:cite:` and introduce a dedicated reference page, as `:footcite:` does not work well with `autosummary`. Example: https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html Pull Request resolved: https://github.com/pytorch/audio/pull/2676 Reviewed By: carolineechen Differential Revision: D39509431 Pulled By: mthrok fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a
-
- 16 Aug, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: To make the code consistent, we should use double quotation marks for all strings. This PR makes such changes in functional and transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2618 Reviewed By: carolineechen Differential Revision: D38744137 Pulled By: nateanl fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd
-
- 03 Aug, 2022 1 commit
-
-
bshall authored
Summary: I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details: - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`). - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything. - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature. - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support? I hope this is helpful! Looking forward to hearing from you. Pull Request resolved: https://github.com/pytorch/audio/pull/2472 Reviewed By: hwangjeff Differential Revision: D38389155 Pulled By: carolineechen fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
-
- 28 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Allow the `normalized` parameter to accept a str, enabling frame_length-based normalization that aligns with the torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104 Pull Request resolved: https://github.com/pytorch/audio/pull/2554 Reviewed By: carolineechen, mthrok Differential Revision: D38247554 Pulled By: skim0514 fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602
-
- 27 Jul, 2022 1 commit
-
-
Piyush Soni authored
Summary: `assert` is not executed when running in optimized mode. This commit replaces all instances of "assert" in /fbcode/pytorch/audio/torchaudio/functional/functional.py Pull Request resolved: https://github.com/pytorch/audio/pull/2579 Reviewed By: mthrok Differential Revision: D38158280 fbshipit-source-id: f8d7fca1c8f9b3955c6ca312b16947eb12894d81
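The difference between the two styles of validation can be sketched as below. The function names and the choice of `ValueError` are assumptions for illustration; the commit message does not say which exception types replaced the assertions.

```python
def check_rate_with_assert(sample_rate: int) -> int:
    # Skipped entirely when Python runs with -O, so bad input slips through.
    assert sample_rate > 0, "sample_rate must be positive"
    return sample_rate


def check_rate_with_raise(sample_rate: int) -> int:
    # Always executed, regardless of optimization flags.
    if sample_rate <= 0:
        raise ValueError(f"sample_rate must be positive, got {sample_rate}")
    return sample_rate


print(check_rate_with_raise(16000))  # 16000
```

Running the first function under `python -O` with a non-positive value would return it unchanged, while the second still raises.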
-
- 25 Jul, 2022 1 commit
-
-
proxyphi authored
Summary: The momentum in the GriffinLim transform is modified before being passed to the functional, causing inconsistency between functional and transforms. Fix this by passing it through unchanged in the transform. Fixes https://github.com/pytorch/audio/issues/2567 Pull Request resolved: https://github.com/pytorch/audio/pull/2568 Reviewed By: nateanl Differential Revision: D38117632 Pulled By: mthrok fbshipit-source-id: 99754be4b3b6dea45ba115aaea9fb6d7285bc2c9
-
- 21 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Added back device in case of tensor creation Pull Request resolved: https://github.com/pytorch/audio/pull/2561 Reviewed By: mthrok Differential Revision: D38035351 Pulled By: skim0514 fbshipit-source-id: bdea07cbb34d0aa487187cded1a5636da6623d96
-
- 20 Jul, 2022 1 commit
-
-
Sean Kim authored
Summary: Modification from pull request https://github.com/pytorch/audio/issues/2415 to improve resample. Benchmarked for an 89% time reduction, tested in comparison to the original resample method. Pull Request resolved: https://github.com/pytorch/audio/pull/2553 Reviewed By: carolineechen Differential Revision: D37997533 Pulled By: skim0514 fbshipit-source-id: ef4b719450ac26794db6ea01f9882509f4fda5cf
-
- 12 Jul, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: The docstring of `apply_beamforming` triggers a warning when building the documentation page. This PR fixes it. Pull Request resolved: https://github.com/pytorch/audio/pull/2540 Reviewed By: mthrok Differential Revision: D37763745 Pulled By: nateanl fbshipit-source-id: 0e9f1e098865af032b00ac56d918cb9d2ffc5024
-
- 13 Jun, 2022 1 commit
-
-
Reviewed By: ivanmurashko Differential Revision: D37103342 fbshipit-source-id: adc908c790a413384bd88a75d3c2b4b0974c6674
-
- 10 Jun, 2022 1 commit
-
-
Sean Kim authored
Summary: Split existing Pitchshift into multiple helper functions in order to cache kernel and speed up overall process addressing https://github.com/pytorch/audio/issues/2359. Existing unit tests pass. edit: functional and transforms unit test pass. Adopted lazy initialization to avoid BC-breaking. Pull Request resolved: https://github.com/pytorch/audio/pull/2441 Reviewed By: carolineechen Differential Revision: D36905582 Pulled By: skim0514 fbshipit-source-id: 6780db3ac8a29d59017a6abe7e82ce1fd17aaac2
-
- 02 Jun, 2022 1 commit
-
-
moto authored
Summary: Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354 In https://github.com/pytorch/audio/issues/2419, we moved mp3 decoding to ffmpeg, but CI tests were still using libmad. This commit completely removes libmad from torchaudio. This is a BC-breaking change, as the `apply_sox_effects_file` function can no longer handle MP3 and cannot fall back to ffmpeg. The workaround is to use `torchaudio.load` followed by `apply_sox_effects_tensor`. Pull Request resolved: https://github.com/pytorch/audio/pull/2428 Reviewed By: carolineechen Differential Revision: D36851805 Pulled By: mthrok fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55
-
- 23 May, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices. - The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask. - The shapes of input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc. - The autograd unittest of `apply_beamforming` has wrong dimensions for beamform_weights, as detected by the assertion check. Fix it in this PR. Pull Request resolved: https://github.com/pytorch/audio/pull/2401 Reviewed By: carolineechen Differential Revision: D36597689 Pulled By: nateanl fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6
-