- 10 Nov, 2022 1 commit
-
-
Julián D. Arias-Londoño authored
Summary: I have added the BarkScale transform, which converts a regular Spectrogram into a BarkSpectrogram, similar to MelScale. ahmed-fau opened this request in December 2021 (https://github.com/pytorch/audio/issues/2103). The new functionality includes three different well-known approximations of the Bark scale.
Pull Request resolved: https://github.com/pytorch/audio/pull/2823
Reviewed By: nateanl
Differential Revision: D41162100
Pulled By: carolineechen
fbshipit-source-id: b2670c4972e49c9ef424da5d5982576f7a4df831
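For orientation, here is a minimal sketch of three commonly cited Hz-to-Bark approximations. The commit message does not name the three approximations it adds, so the choice of Traunmüller, Schroeder and Wang, the function name `hz_to_bark` and the `bark_scale` argument are all assumptions for illustration, not a copy of the merged code.

```python
import math


def hz_to_bark(freq_hz: float, bark_scale: str = "traunmuller") -> float:
    """Convert a frequency in Hz to Bark (hypothetical helper)."""
    if bark_scale == "traunmuller":
        # Rational approximation; the usual end-of-range corrections
        # for very low and very high Bark values are omitted here.
        return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53
    if bark_scale == "schroeder":
        return 7.0 * math.asinh(freq_hz / 650.0)
    if bark_scale == "wang":
        return 6.0 * math.asinh(freq_hz / 600.0)
    raise ValueError(f"unknown bark_scale: {bark_scale}")


# All three curves grow monotonically with frequency.
for scale in ("traunmuller", "schroeder", "wang"):
    print(scale, round(hz_to_bark(1000.0, scale), 3))
```

A Bark filter bank would then place triangular filters at points equally spaced on this warped axis, in the same way MelScale does on the mel axis.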
-
- 21 Sep, 2022 1 commit
-
-
moto authored
Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials
https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html
<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">
<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">
https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle
<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">
Pull Request resolved: https://github.com/pytorch/audio/pull/2689
Reviewed By: carolineechen
Differential Revision: D39691253
Pulled By: mthrok
fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49
-
- 16 Sep, 2022 1 commit
-
-
moto authored
Summary:
* Introduce the mini-index at `torchaudio.transforms` page.
* Add "Augmentations" subsection.
* Also updated the overall introduction.
https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html
<img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png">
<img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png">
Pull Request resolved: https://github.com/pytorch/audio/pull/2683
Reviewed By: carolineechen
Differential Revision: D39574255
Pulled By: mthrok
fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627
-
- 15 Sep, 2022 1 commit
-
-
moto authored
Summary: Preparation for the adoption of `autosummary`. Replace `:footcite:` with `:cite:` and introduce a dedicated reference page, as `:footcite:` does not work well with `autosummary`.
Example:
https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic
https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html
Pull Request resolved: https://github.com/pytorch/audio/pull/2676
Reviewed By: carolineechen
Differential Revision: D39509431
Pulled By: mthrok
fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a
-
- 18 Aug, 2022 1 commit
-
-
moto authored
Summary: Resolves the following warning:
```
/torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short.

:hidden:`Loudness`
-----------------
```
Pull Request resolved: https://github.com/pytorch/audio/pull/2627
Reviewed By: carolineechen
Differential Revision: D38814802
Pulled By: mthrok
fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749
-
- 03 Aug, 2022 1 commit
-
-
bshall authored
Summary: I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
I hope this is helpful! Looking forward to hearing from you.
Pull Request resolved: https://github.com/pytorch/audio/pull/2472
Reviewed By: hwangjeff
Differential Revision: D38389155
Pulled By: carolineechen
fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
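The gating step that the internal constants above control can be sketched roughly as follows. This is a simplified illustration, not the merged function: it assumes the signal has already been K-weighted and cut into overlapping blocks (400 ms blocks with 75% overlap in the recommendation), and that each block is reduced to one mean-square energy per channel. The names `loudness`, `mean_energies` and `gated_loudness` are hypothetical.

```python
import math

# Per-channel weights for up to five channels (L, R, C, Ls, Rs).
CHANNEL_WEIGHTS = [1.0, 1.0, 1.0, 1.41, 1.41]
ABSOLUTE_GATE_LKFS = -70.0  # absolute threshold (gamma)
RELATIVE_GATE_LU = -10.0    # offset of the relative threshold


def loudness(energies):
    """Loudness in LKFS from per-channel mean-square energies."""
    weighted = sum(g * z for g, z in zip(CHANNEL_WEIGHTS, energies))
    if weighted <= 0.0:
        return float("-inf")
    return -0.691 + 10.0 * math.log10(weighted)


def mean_energies(blocks):
    """Average the per-channel energies over a list of blocks."""
    n_channels = len(blocks[0])
    return [sum(b[c] for b in blocks) / len(blocks) for c in range(n_channels)]


def gated_loudness(blocks):
    """Two-stage gating: absolute threshold first, then relative threshold."""
    loud = [b for b in blocks if loudness(b) > ABSOLUTE_GATE_LKFS]
    if not loud:
        return float("-inf")
    relative_gate = loudness(mean_energies(loud)) + RELATIVE_GATE_LU
    kept = [b for b in loud if loudness(b) > relative_gate]
    return loudness(mean_energies(kept))


# Four loud mono blocks followed by four near-silent ones:
# the quiet blocks are gated out and do not lower the result.
print(gated_loudness([[0.5]] * 4 + [[1e-12]] * 4))
```

Keeping the thresholds as module-level constants mirrors the choice described in the message of not exposing them in the signature.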
-
- 10 May, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: Add a new design of MVDR module. The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise. The input arguments are:
- multi-channel spectrum.
- RTF vector of the target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.
The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
Pull Request resolved: https://github.com/pytorch/audio/pull/2368
Reviewed By: carolineechen
Differential Revision: D36214940
Pulled By: nateanl
fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
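To make the role of these arguments concrete, here is a pure-Python sketch of RTF-based MVDR weights for a single frequency bin. It is an illustration under assumptions, not the module's code: the helper names `solve`, `rtf_mvdr_weights` and `apply_beamformer` are hypothetical, and the exact form of the diagonal loading (a small multiple of the trace added to the diagonal) is assumed.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs for a small complex system (Gauss-Jordan)."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * p for a, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rtf_mvdr_weights(rtf, psd_n, reference_channel=0,
                     diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """Beamforming weights w = psd_n^-1 v / (v^H psd_n^-1 v), scaled to the reference."""
    n = len(rtf)
    if diagonal_loading:
        # Assumed regularisation: add diag_eps * trace + eps to the diagonal.
        load = diag_eps * sum(psd_n[i][i].real for i in range(n)) + eps
        psd_n = [[psd_n[i][j] + (load if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    numerator = solve(psd_n, rtf)
    denominator = sum(v.conjugate() * x for v, x in zip(rtf, numerator))
    scale = rtf[reference_channel].conjugate()
    return [x * scale / (denominator.real + eps) for x in numerator]


def apply_beamformer(weights, frame):
    """Single-channel output y = w^H x for one multi-channel frame."""
    return sum(w.conjugate() * x for w, x in zip(weights, frame))


rtf = [1 + 0j, 0.5 - 0.5j]
psd_n = [[2 + 0j, 0.3 + 0.1j], [0.3 - 0.1j, 1 + 0j]]
weights = rtf_mvdr_weights(rtf, psd_n)
speech = 0.7 + 0.2j
# A noise-free frame passes through undistorted at the reference channel.
print(apply_beamformer(weights, [speech * v for v in rtf]))
```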
-
Zhaoheng Ni authored
Summary: Add a new design of MVDR module. The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are:
- multi-channel spectrum.
- PSD matrix of target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.
The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
Pull Request resolved: https://github.com/pytorch/audio/pull/2367
Reviewed By: hwangjeff
Differential Revision: D36198015
Pulled By: nateanl
fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
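As a rough illustration of the Souden formulation, the weights for one frequency bin can be written as the reference column of `psd_n^-1 @ psd_s` divided by its trace. The sketch below is restricted to the two-microphone case so that a closed-form 2x2 inverse suffices, and it leaves out diagonal loading; the helper names are hypothetical and this is not the module's implementation.

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def souden_mvdr_weights(psd_s, psd_n, reference_channel=0, eps=1e-8):
    """w = (psd_n^-1 psd_s / trace(psd_n^-1 psd_s)) u, with u one-hot at the reference."""
    ratio = matmul_2x2(inverse_2x2(psd_n), psd_s)
    trace = ratio[0][0] + ratio[1][1]
    # Multiplying by the one-hot vector u selects the reference column.
    return [ratio[i][reference_channel] / (trace + eps) for i in range(2)]


# Rank-1 speech PSD built from a steering vector v (v[0] = 1 at the reference).
v = [1 + 0j, 0.5 - 0.5j]
psd_s = [[2.0 * v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]
psd_n = [[2 + 0j, 0.3 + 0.1j], [0.3 - 0.1j, 1 + 0j]]
weights = souden_mvdr_weights(psd_s, psd_n)

speech = 0.7 + 0.2j
enhanced = sum(w.conjugate() * speech * x for w, x in zip(weights, v))
print(enhanced)  # close to `speech` for a noise-free frame
```

Unlike the RTF variant, this form needs only the two PSD matrices and never estimates a steering vector explicitly.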
-
- 04 Nov, 2021 1 commit
-
-
moto authored
-
- 28 Oct, 2021 1 commit
-
-
S Harish authored
-
- 20 Sep, 2021 1 commit
-
-
nateanl authored
-
- 20 Aug, 2021 1 commit
-
-
Caroline Chen authored
-
- 19 Aug, 2021 1 commit
-
-
Caroline Chen authored
-
- 14 Aug, 2021 1 commit
-
-
nateanl authored
-
- 29 Jul, 2021 1 commit
-
-
Joel Frank authored
Summary:
- Add linear_fbank method
- Add LFCC in transforms
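A linear filter bank differs from a mel filter bank only in that the triangular filters are spaced evenly in Hz. The sketch below shows the idea in plain Python; the function name `linear_filterbank_sketch` and its signature are assumptions made for this example and are not taken from the commit.

```python
def linear_filterbank_sketch(n_freqs, f_min, f_max, n_filter, sample_rate):
    """Return an (n_freqs x n_filter) matrix of triangular filters, evenly spaced in Hz."""
    nyquist = sample_rate // 2
    all_freqs = [i * nyquist / (n_freqs - 1) for i in range(n_freqs)]
    # n_filter triangles need n_filter + 2 equally spaced edge points.
    step = (f_max - f_min) / (n_filter + 1)
    f_pts = [f_min + i * step for i in range(n_filter + 2)]

    fbank = []
    for freq in all_freqs:
        row = []
        for m in range(n_filter):
            left, center, right = f_pts[m], f_pts[m + 1], f_pts[m + 2]
            rising = (freq - left) / (center - left)
            falling = (right - freq) / (right - center)
            row.append(max(0.0, min(rising, falling)))
        fbank.append(row)
    return fbank


fbank = linear_filterbank_sketch(n_freqs=9, f_min=0.0, f_max=8000.0,
                                 n_filter=3, sample_rate=16000)
for row in fbank:
    print(row)
```

LFCC features would then follow the same recipe as MFCC: apply the filter bank to a power spectrogram, take the log, and apply a DCT.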
-
- 16 Jul, 2021 1 commit
-
-
nateanl authored
-
- 03 Jun, 2021 1 commit
-
-
moto authored
* Use `bibtex` for paper citations.
* add `override.css` for fixing back reference.
* wav2vec2
* wav2letter
* convtasnet
* deepspeech
* rnnt-loss
* griffinlim
* Fix broken references in `filtering`.
* Fix note in soundfile backends.
* Tweak wav2vec2 example.
* Removes unused `pytorch_theme.css`
-
- 26 Feb, 2021 1 commit
-
-
Vincent QB authored
-
- 28 Apr, 2020 1 commit
-
-
Artyom Astafurov authored
* initial test, stub function, transform and docstring
* add draft working implementation, update docstrings
* merge VadSate into Vad class, move Channel into Vad class
* remove functional stub for vad
* add wav file for test
* refactor _measure() to improve performance
* rename argument
* replace copy_ with assignment
* refactor init, update documentation, update test for readability
* clean up default values
* move code from transforms.py to functional.py and integrate state into a function
* remove Channel state class
* fix calculation of a flush point
* make multiple channels work
* clean up multi-channel, update test
* rename variables and re-org arguments for _measure
* fix linting errors
* add torchscript consistency test and fix errors
* support and test batch consistency, fix normalization
* update documentation, switch torchscript consistency test to use transform to improve coverage
* fix linting errors
* remove unused imports
* address PR comments
* add doc references into rst
-
- 17 Apr, 2020 1 commit
-
-
wanglong001 authored
* add cmvn
* Update transforms.rst add cmvn
* Correct the format
* Correct the format
* Correct the format
* add unit test and change cmvn to cmn
* fix bug
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>
-
- 24 Mar, 2020 1 commit
-
-
Tomás Osório authored
* Add Vol with gain_type amplitude
* add gain in db and add tests
* add gain_type "power" and tests
* add functional DB_to_amplitude
* simplify
* remove functional
* improve docstring
* add to documentation
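The three gain types listed above differ only in how the gain value is turned into an amplitude factor. A minimal sketch, assuming samples in the range [-1, 1] and assuming the result is clamped back to that range (the helper name `apply_gain` is hypothetical):

```python
import math


def apply_gain(samples, gain, gain_type="amplitude"):
    """Scale samples by a gain given as an amplitude ratio, a power ratio, or decibels."""
    if gain_type == "amplitude":
        factor = gain
    elif gain_type == "db":
        # Decibels to amplitude ratio.
        factor = 10.0 ** (gain / 20.0)
    elif gain_type == "power":
        # A power ratio corresponds to the square root as an amplitude ratio.
        factor = math.sqrt(gain)
    else:
        raise ValueError(f"unknown gain_type: {gain_type}")
    # Assumed behaviour: keep the output inside the valid sample range.
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


print(apply_gain([0.1, -0.2], 2.0, "amplitude"))
print(apply_gain([0.1, -0.2], 6.0, "db"))
print(apply_gain([0.1, -0.2], 4.0, "power"))
```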
-
- 10 Mar, 2020 1 commit
-
-
Tomás Osório authored
* add basics for Fade
* add fade possibilities: at start, end or both
* add different types of fade
* add docstrings, add overriding possibility
* remove unnecessary logic
* correct typing
* agnostic to batch size or n_channels
* add batch test to Fade
* add transform to options
* add test_script_module
* add coherency with test batch
* remove extra step for waveform_length
* update docstring
* add test to compare fade with sox
* change name of fade_shape
* update test fade vs sox with new nomenclature for fade_shape
* add Documentation
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>
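A fade at the start, the end, or both amounts to multiplying the waveform by a gain ramp. The sketch below illustrates this with three ramp shapes; the shape names and formulas, as well as the helpers `fade_curve` and `apply_fade`, are assumptions for illustration and may not match the shapes added in this commit.

```python
import math


def fade_curve(length, shape="linear"):
    """Return `length` gain values rising from 0 to 1."""
    if length <= 0:
        return []
    ramp = [i / (length - 1) if length > 1 else 1.0 for i in range(length)]
    if shape == "linear":
        return ramp
    if shape == "quarter_sine":
        return [math.sin(r * math.pi / 2) for r in ramp]
    if shape == "half_sine":
        return [math.sin(r * math.pi - math.pi / 2) / 2 + 0.5 for r in ramp]
    raise ValueError(f"unknown fade shape: {shape}")


def apply_fade(samples, fade_in_len=0, fade_out_len=0, shape="linear"):
    """Fade in over the first samples and fade out over the last ones."""
    n = len(samples)
    if max(fade_in_len, fade_out_len) > n:
        raise ValueError("fade length exceeds waveform length")
    gains = [1.0] * n
    for i, g in enumerate(fade_curve(fade_in_len, shape)):
        gains[i] *= g
    for i, g in enumerate(fade_curve(fade_out_len, shape)):
        gains[n - 1 - i] *= g  # mirrored ramp at the end
    return [s * g for s, g in zip(samples, gains)]


print(apply_fade([1.0] * 5, fade_in_len=3, fade_out_len=3))
```

Because the gain vector depends only on the number of samples, the same ramp can be applied to every channel or batch entry, which is what makes such a transform agnostic to batch size and channel count.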
-
- 28 Feb, 2020 1 commit
-
-
moto authored
* Inverse Mel Scale Implementation
* Inverse Mel Scale Docs
* Better working version.
* GPU fix
* These shouldn't go on git..
* Even better one, but does not support JITability.
* Remove JITability test
* Flake8
* n_stft is a must
* minor clean up of initialization
* Add librosa consistency test
This PR follows up #366 and adds tests for `InverseMelScale` (and `MelScale`) for librosa compatibility.
For the `MelScale` compatibility test:
1. Generate a spectrogram.
2. Feed the spectrogram to a `torchaudio.transforms.MelScale` instance.
3. Feed the spectrogram to the `librosa.feature.melspectrogram` function.
4. Compare the results from 2 and 3 element-wise.
Element-wise numerical comparison is possible because under the hood their implementations use the same algorithm.
The `InverseMelScale` compatibility test is more elaborate:
1. Generate the original spectrogram.
2. Convert the original spectrogram to Mel scale using a `torchaudio.transforms.MelScale` instance.
3. Reconstruct the spectrogram using the torchaudio implementation.
3.1. Feed the Mel spectrogram to a `torchaudio.transforms.InverseMelScale` instance and get the reconstructed spectrogram.
3.2. Compute the sum of the element-wise P1 distance between the original spectrogram and that from 3.1.
4. Reconstruct the spectrogram using librosa.
4.1. Feed the Mel spectrogram to the `librosa.feature.inverse.mel_to_stft` function and get the reconstructed spectrogram.
4.2. Compute the sum of the element-wise P1 distance between the original spectrogram and that from 4.1. (This is the reference.)
5. Check that the resulting P1 distances are in roughly the same value range.
Element-wise numerical comparison is not possible because different algorithms are used to compute the inverse. Some values of the reconstructed spectrograms can vary in magnitude. Therefore the strategy here is to check that the P1 distance (reconstruction loss) is not that different from the value obtained using `librosa`.
For this purpose, the threshold was chosen empirically:
```
print('p1 dist (orig <-> ta):', torch.dist(spec_orig, spec_ta, p=1))
print('p1 dist (orig <-> lr):', torch.dist(spec_orig, spec_lr, p=1))
>>> p1 dist (orig <-> ta): tensor(1482.1917)
>>> p1 dist (orig <-> lr): tensor(1420.7103)
```
This value can vary based on the length and the kind of the signal being processed, so it was handpicked.
* Address review feedback
* Support arbitrary batch dimensions.
* Add batch test
* Use view for batch
* fix sgd
* Use negative indices and update docstring
* Update threshold
Co-authored-by: Charles J.Y. Yoon <jaeyeun97@gmail.com>
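The comparison strategy described in this message (check that the torchaudio reconstruction loss is in the same range as the librosa one, instead of comparing element-wise) can be sketched in plain Python as follows. The helper names and the 10% relative threshold are hypothetical; the message only states that the threshold was handpicked, and the two distances are the ones printed above.

```python
def p1_distance(a, b):
    """Sum of element-wise absolute differences between two flat sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def reconstruction_is_comparable(dist_candidate, dist_reference, rel_threshold):
    """True when the candidate loss is within rel_threshold of the reference loss."""
    return abs(dist_candidate - dist_reference) <= rel_threshold * dist_reference


# Distances reported in the commit message above.
dist_torchaudio = 1482.1917
dist_librosa = 1420.7103

# Hypothetical threshold: accept losses within 10% of the librosa reference.
print(reconstruction_is_comparable(dist_torchaudio, dist_librosa, rel_threshold=0.1))
```

Using a relative rather than an absolute tolerance is one way to keep such a check meaningful when the signal length changes, since the summed distance grows with the number of spectrogram bins.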
-
- 26 Dec, 2019 1 commit
-
-
Charles J.Y. Yoon authored
* Griffin-Lim Transformation Implementation
* Griffin-Lim Docs
* Remove f-string from backwards compatibility
* iSTFT is now jit-able.
* Comment changes
* Functional Implementation & now jitable
* flake8
* Doc & GPU Fix
* Librosa comparison test
* test directly griffinlim's output. tighter atol.
* matching signature to docstring.
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>
-
- 21 Nov, 2019 2 commits
-
-
Vincent QB authored
* since we no longer use decoration, this fixes #165.
* remove import of _docs.
-
Vincent QB authored
* sync docs with functionals.
* Adding transforms to documentation. Moving augmentations into transforms.
-
- 29 Jul, 2019 1 commit
-
-
jamarshon authored
-
- 18 Dec, 2017 1 commit
-
-
Soumith Chintala authored
-