Commits · 8e0c2a3bab2c09c1d489377b66c1dfbf4c79498d · OpenDAS / Torchaudio

10 May, 2022 2 commits

Add RTFMVDR module (#2368) · 4b021ae3

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
The input arguments are:
- multi-channel spectrum.
- RTF vector of the target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.
The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
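
As a rough illustration of the RTF-based formulation, the sketch below computes the beamforming weight as `psd_n^{-1} v / (v^H psd_n^{-1} v)`, scaled by the conjugate of the RTF at the reference channel. It is written in NumPy rather than torchaudio, and the function name `rtf_mvdr`, the tensor layouts, and the exact form of the diagonal loading are assumptions made for this example, not a copy of the module's code.

```python
import numpy as np


def rtf_mvdr(specgram, rtf, psd_n, reference_channel,
             diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """Hypothetical NumPy sketch of RTF-based MVDR beamforming.

    Assumed layouts (not taken from the commit message):
      specgram: (channel, freq, time) complex
      rtf:      (freq, channel) complex
      psd_n:    (freq, channel, channel) complex
    """
    if diagonal_loading:
        # Assumed loading scheme: add (trace * diag_eps + eps) to the diagonal.
        trace = np.trace(psd_n, axis1=-2, axis2=-1).real
        load = (trace * diag_eps + eps)[:, None, None]
        psd_n = psd_n + load * np.eye(psd_n.shape[-1])
    # numerator = psd_n^{-1} v, solved per frequency bin -> (freq, channel)
    numerator = np.linalg.solve(psd_n, rtf[..., None])[..., 0]
    # denominator = v^H psd_n^{-1} v -> (freq,)
    denominator = np.einsum("fc,fc->f", rtf.conj(), numerator)
    weight = numerator / (denominator.real[:, None] + eps)
    # Scale so the output matches the target as seen at the reference channel.
    weight = weight * rtf[:, reference_channel].conj()[:, None]
    # Apply the weights: y[f, t] = sum_c conj(w[f, c]) * x[c, f, t]
    return np.einsum("fc,cft->ft", weight.conj(), specgram)
```

With a noise-free input built from a single source, this sketch returns the source as observed at the reference channel, which is the distortionless property MVDR is named for.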

Pull Request resolved: https://github.com/pytorch/audio/pull/2368

Reviewed By: carolineechen

Differential Revision: D36214940

Pulled By: nateanl

fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937

4b021ae3

Add SoudenMVDR module (#2367) · aed5eb88

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are:
- multi-channel spectrum.
- PSD matrix of target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
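
A minimal NumPy sketch of the Souden-style weight, `psd_n^{-1} psd_s / trace(psd_n^{-1} psd_s)` with the reference-channel column selected, may help make the argument list concrete. The function name `souden_mvdr`, the tensor layouts, and the diagonal-loading formula are assumptions for illustration and are not taken from the module itself.

```python
import numpy as np


def souden_mvdr(specgram, psd_s, psd_n, reference_channel,
                diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """Hypothetical NumPy sketch of the Souden et al. MVDR solution.

    Assumed layouts (not taken from the commit message):
      specgram:     (channel, freq, time) complex
      psd_s, psd_n: (freq, channel, channel) complex
    """
    if diagonal_loading:
        # Assumed loading scheme: add (trace * diag_eps + eps) to the diagonal.
        trace_n = np.trace(psd_n, axis1=-2, axis2=-1).real
        load = (trace_n * diag_eps + eps)[:, None, None]
        psd_n = psd_n + load * np.eye(psd_n.shape[-1])
    # numerator = psd_n^{-1} psd_s, solved per frequency bin
    numerator = np.linalg.solve(psd_n, psd_s)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    ws = numerator / (trace[:, None, None] + eps)
    # Picking the reference column equals multiplying by a one-hot vector.
    weight = ws[:, :, reference_channel]  # (freq, channel)
    # Apply the weights: y[f, t] = sum_c conj(w[f, c]) * x[c, f, t]
    return np.einsum("fc,cft->ft", weight.conj(), specgram)
```

When the speech PSD matrix is rank one, this weight reduces to the RTF-based form, so both formulations give the same enhanced spectrum in that case.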

Pull Request resolved: https://github.com/pytorch/audio/pull/2367

Reviewed By: hwangjeff

Differential Revision: D36198015

Pulled By: nateanl

fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527

aed5eb88

04 Nov, 2021 1 commit
- Add Sphinx-gallery to doc (#1967) · a3363539
  moto authored Nov 04, 2021
  
  a3363539
28 Oct, 2021 1 commit
- Remove F.complex_norm and T.ComplexNorm (#1942) · ab50909d
  S Harish authored Oct 28, 2021
  
  ab50909d
20 Sep, 2021 1 commit
- Move MVDR and PSD modules to transforms (#1771) · ac97ad82
  nateanl authored Sep 20, 2021
  
  ac97ad82
20 Aug, 2021 1 commit
- Add sections to transforms docs (#1720) · ecfaac11
  Caroline Chen authored Aug 20, 2021
  
  ecfaac11
19 Aug, 2021 1 commit
- Move RNNT Loss out of prototype (#1711) · 2c115821
  Caroline Chen authored Aug 19, 2021
  
  2c115821
14 Aug, 2021 1 commit
- Add doc for InverseSpectrogram (#1706) · ee74056f
  nateanl authored Aug 14, 2021
  
  ee74056f
29 Jul, 2021 1 commit
- Add LFCC feature to transforms (#1611) · 86370639
  Joel Frank authored Jul 29, 2021
  Summary:
  - Add linear_fbank method
  - Add LFCC in transforms
  86370639
16 Jul, 2021 1 commit
- Add PitchShift to functional and transform (#1629) · f5dbb002
  nateanl authored Jul 16, 2021
  
  f5dbb002
03 Jun, 2021 1 commit

Update docs (#1550) · 0166a851

moto authored Jun 03, 2021

* Use `bibtex` for paper citations.
  * add `override.css` for fixing back reference.
  * wav2vec2
  * wav2letter
  * convtasnet
  * deepspeech
  * rnnt-loss
  * griffinlim
* Fix broken references in `filtering`.
* Fix note in soundfile backends.
* Tweak wav2vec2 example.
* Removes unused `pytorch_theme.css`

0166a851

26 Feb, 2021 1 commit
- Fixes #1314 (#1316) · 457148ea
  Vincent QB authored Feb 26, 2021
  
  457148ea
28 Apr, 2020 1 commit

Port sox::vad (#578) · 3ecc7016

Artyom Astafurov authored Apr 28, 2020

* initial test, stub function, transform and docstring

* add draft working implementation, update docstrings

* merge VadState into Vad class, move Channel into Vad class

* remove functional stub for vad

* add wav file for test

* refactor _measure() to improve performance

* rename argument

* replace copy_ with assignment

* refactor init, update documentation, update test for readability

* clean up default values

* move code from transforms.py to functional.py and integrate state into a function

* remove Channel state class

* fix calculation of a flush point

* make multiple channels work

* clean up multi-channel, update test

* rename variables and re-org arguments for _measure

* fix linting errors

* add torchscript consistency test and fix errors

* support and test batch consistency, fix normalization

* update documentation, switch torchscript consistency test to use transform to improve coverage

* fix linting errors

* remove un-used imports

* address PR comments

* add doc references into rst

3ecc7016

17 Apr, 2020 1 commit

add cmvn (#540) · b42d6100

wanglong001 authored Apr 17, 2020



* add cmvn

* Update transforms.rst

add cmvn

* Correct the format

* Correct the format

* Correct the format

* add test unit and cmvn change to cmn

* fix bug
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

b42d6100

24 Mar, 2020 1 commit

Add Vol Transformation (#468) · 11fb22aa

Tomás Osório authored Mar 24, 2020

* Add Vol with gain_type amplitude

* add gain in db and add tests

* add gain_type "power" and tests

* add functional DB_to_amplitude

* simplify

* remove functional

* improve docstring

* add to documentation
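
The three gain types named above can be sketched in plain Python as follows. The function name `apply_vol`, the list-of-floats waveform, the square-root mapping for power gain, and the final clamp to [-1, 1] are assumptions made for illustration; the commit message does not spell out these details.

```python
import math


def apply_vol(waveform, gain, gain_type="amplitude"):
    """Hypothetical sketch: scale samples by a gain given in one of three units."""
    if gain_type in ("amplitude", "power") and gain < 0:
        raise ValueError("amplitude and power gains must be non-negative")
    if gain_type == "amplitude":
        factor = gain  # direct amplitude ratio
    elif gain_type == "power":
        factor = math.sqrt(gain)  # power ratio -> amplitude ratio
    elif gain_type == "db":
        factor = 10.0 ** (gain / 20.0)  # decibels -> amplitude ratio
    else:
        raise ValueError(f"unknown gain_type: {gain_type!r}")
    # Assumed: keep samples within the normalized range [-1, 1].
    return [max(-1.0, min(1.0, sample * factor)) for sample in waveform]
```

For example, an amplitude gain of 2, a power gain of 4, and a gain of about 6.02 dB all describe roughly the same scaling.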

11fb22aa

10 Mar, 2020 1 commit

Add fade (#449) · 9efc3503

Tomás Osório authored Mar 10, 2020



* add basics for Fade

* add fade possibilities: at start, end or both

* add different types of fade

* add docstrings, add overriding possibility

* remove unnecessary logic

* correct typing

* agnostic to batch size or n_channels

* add batch test to Fade

* add transform to options

* add test_script_module

* add coherency with test batch

* remove extra step for waveform_length

* update docstring

* add test to compare fade with sox

* change name of fade_shape

* update test fade vs sox with new nomenclature for fade_shape

* add Documentation
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

9efc3503

28 Feb, 2020 1 commit

Add test for InverseMelScale (#448) · babc24af

moto authored Feb 28, 2020



* Inverse Mel Scale Implementation

* Inverse Mel Scale Docs

* Better working version.

* GPU fix

* These shouldn't go on git..

* Even better one, but does not support JITability.

* Remove JITability test

* Flake8

* n_stft is a must

* minor clean up of initialization

* Add librosa consistency test

This PR follows up #366 and adds tests for `InverseMelScale` (and `MelScale`) to check librosa compatibility.

For the `MelScale` compatibility test:
1. Generate spectrogram
2. Feed the spectrogram to `torchaudio.transforms.MelScale` instance
3. Feed the spectrogram to `librosa.feature.melspectrogram` function.
4. Compare the result from 2 and 3 elementwise.
Element-wise numerical comparison is possible because, under the hood, both implementations use the same algorithm.

For the `InverseMelScale` compatibility test, the procedure is more elaborate:
1. Generate the original spectrogram
2. Convert the original spectrogram to Mel scale using `torchaudio.transforms.MelScale` instance
3. Reconstruct spectrogram using torchaudio implementation
3.1. Feed the Mel spectrogram to `torchaudio.transforms.InverseMelScale` instance and get reconstructed spectrogram.
3.2. Compute the sum of element-wise P1 distance of the original spectrogram and that from 3.1.
4. Reconstruct spectrogram using librosa
4.1. Feed the Mel spectrogram to `librosa.feature.inverse.mel_to_stft` function and get reconstructed spectrogram.
4.2. Compute the sum of element-wise P1 distance of the original spectrogram and that from 4.1. (this is the reference.)
5. Check that the resulting P1 distances are in roughly the same value range.

Element-wise numerical comparison is not possible because different algorithms are used to compute the inverse. Some values in the reconstructed spectrograms can differ in magnitude.
Therefore, the strategy here is to check that the P1 distance (reconstruction loss) is not far from the value obtained using `librosa`. For this purpose, a threshold was chosen empirically.

```
print('p1 dist (orig <-> ta):', torch.dist(spec_orig, spec_ta, p=1))
print('p1 dist (orig <-> lr):', torch.dist(spec_orig, spec_lr, p=1))
# p1 dist (orig <-> ta): tensor(1482.1917)
# p1 dist (orig <-> lr): tensor(1420.7103)
```

This value can vary based on the length and the kind of the signal being processed, so it was handpicked.
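
The comparison strategy described above can be sketched in plain Python as follows. The helper names and the acceptance rule (absolute difference between the two P1 distances below a threshold) are assumptions for illustration; the commit message does not state the exact criterion or threshold used in the test suite.

```python
def p1_distance(a, b):
    """Sum of element-wise absolute differences between two 2-D lists."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def reconstruction_is_comparable(orig, recon_test, recon_ref, threshold):
    """Hypothetical check: the tested reconstruction loss should be close
    to the loss of the reference (e.g. librosa) reconstruction."""
    dist_test = p1_distance(orig, recon_test)
    dist_ref = p1_distance(orig, recon_ref)
    return abs(dist_test - dist_ref) < threshold


# Toy spectrograms standing in for the original and the two reconstructions.
orig = [[1.0, 2.0], [3.0, 4.0]]
recon_test = [[1.5, 2.0], [2.5, 4.5]]  # P1 distance to orig: 1.5
recon_ref = [[1.0, 2.5], [3.0, 3.5]]   # P1 distance to orig: 1.0
print(reconstruction_is_comparable(orig, recon_test, recon_ref, threshold=1.0))
```

Because the distances scale with signal length and content, any such threshold has to be tuned for the specific test input.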

* Address review feedback

* Support arbitrary batch dimensions.

* Add batch test

* Use view for batch

* fix sgd

* Use negative indices and update docstring

* Update threshold
Co-authored-by: Charles J.Y. Yoon <jaeyeun97@gmail.com>

babc24af

26 Dec, 2019 1 commit

Griffin-Lim Transformation Implementation (#365) · 4a934693

Charles J.Y. Yoon authored Dec 27, 2019



* Griffin-Lim Transformation Implementation

* Griffin-Lim Docs

* Remove f-string from backwards compatibility

* iSTFT is now jit-able.

* Comment changes

* Functional Implementation & now jitable

* flake8

* Doc & GPU Fix

* Librosa comparison test

* test directly griffinlim's output. tighter atol.

* matching signature to docstring.
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>

4a934693

21 Nov, 2019 2 commits

Remove _docs.py (#349) · c74e580f

Vincent QB authored Nov 21, 2019

* since we no longer use decoration, this fixes #165.

* remove import of _docs.

c74e580f

Move augmentations in transforms (#348) · 99ed0521

Vincent QB authored Nov 21, 2019

* sync docs with functionals.

* Adding transforms to documentation. Moving augmentations in transforms.

99ed0521

29 Jul, 2019 1 commit
- Large re-amp on the torchaudio/docs (#166) · 95235f31
  jamarshon authored Jul 29, 2019
  
  95235f31
18 Dec, 2017 1 commit
- improve README and add sphinx docs generator · 088d5674
  Soumith Chintala authored Dec 17, 2017
  
  088d5674