Commits · f5036c7182f43e06ddf12f2b327cafaecf5763cf · hehl2 / Torchaudio

12 May, 2022 2 commits

Zhaoheng Ni authored May 12, 2022

Summary:
- Use `apply_beamforming`, `rtf_evd`, `rtf_power`, `mvdr_weights_souden`, `mvdr_weights_rtf` methods under `torchaudio.functional` to replace the class methods.
- Refactor docstrings in `PSD` and `MVDR`.
- Put `_get_mvdr_vector` outside of `MVDR` class as it doesn't call self methods inside.
- Since MVDR uses einsum for matrix operations, packing and unpacking batches is unnecessary, so that code is removed. Batch handling can be verified with the [batch_consistency_test](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/transforms/batch_consistency_test.py#L202).

Pull Request resolved: https://github.com/pytorch/audio/pull/2383

Reviewed By: carolineechen, mthrok

Differential Revision: D36338373

Pulled By: nateanl

fbshipit-source-id: a48a6ae2825657e5967a19656245596cdf037c5f

f5036c71

[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc

595dc5d3

10 May, 2022 3 commits

Add RTFMVDR module (#2368) · 4b021ae3

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
The input arguments are:
- multi-channel spectrum.
- RTF vector of the target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2368

Reviewed By: carolineechen

Differential Revision: D36214940

Pulled By: nateanl

fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937

4b021ae3

Add diagonal_loading optional to rtf_power (#2369) · da1e83cc

Zhaoheng Ni authored May 10, 2022

Summary:
When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. The same applies when computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). It also keeps `rtf_power` consistent with the current `torchaudio.transforms.MVDR` module.

This PR adds the `diagonal_loading` argument with `True` as default value to `torchaudio.functional.rtf_power`.
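
As a rough illustration of what diagonal loading does, the sketch below adds a small, trace-scaled value to the diagonal of a matrix so that a singular PSD matrix becomes invertible. It is a dependency-free sketch in plain Python, not torchaudio's implementation; the function name `diagonal_loading` and the default values of `reg` and `eps` are assumptions made for this example.

```python
def diagonal_loading(mat, reg=1e-7, eps=1e-8):
    """Return mat + (reg * trace(mat).real + eps) * I for a square nested-list matrix.

    `reg` and `eps` are illustrative defaults, not values taken from the commit.
    """
    n = len(mat)
    trace = sum(mat[i][i] for i in range(n))
    # Scale the loading by the trace so that it tracks the overall signal level.
    loading = complex(trace).real * reg + eps
    return [
        [mat[i][j] + (loading if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# A rank-deficient 2x2 matrix: its determinant is exactly zero.
psd_n = [[1 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]]
loaded = diagonal_loading(psd_n, reg=1e-3)
# After loading, the determinant is no longer zero, so the matrix can be inverted.
det = loaded[0][0] * loaded[1][1] - loaded[0][1] * loaded[1][0]
```

Only the diagonal entries change; the off-diagonal (cross-channel) terms are left untouched.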

Pull Request resolved: https://github.com/pytorch/audio/pull/2369

Reviewed By: carolineechen

Differential Revision: D36204130

Pulled By: nateanl

fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420

da1e83cc

Add SoudenMVDR module (#2367) · aed5eb88

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are:
- multi-channel spectrum.
- PSD matrix of target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2367

Reviewed By: hwangjeff

Differential Revision: D36198015

Pulled By: nateanl

fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527

aed5eb88

08 Apr, 2022 1 commit

Add devices/properties badges (#2321) · 72ae755a

moto authored Apr 07, 2022

Summary:
Add badges of supported properties and devices to functionals and transforms.

This commit adds `.. devices::` and `.. properties::` directives to Sphinx.

APIs with these directives get badges (based on shields.io) that link to the page describing these features.

Continuation of https://github.com/pytorch/audio/issues/2316
Dtypes are excluded for now, pending further improvement; badges are added to most functionals and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2321

Reviewed By: hwangjeff

Differential Revision: D35489063

Pulled By: mthrok

fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762

72ae755a

05 Apr, 2022 1 commit

Raise error for resampling int waveform (#2318) · 11328d23

Caroline Chen authored Apr 05, 2022

Summary:
Resolves https://github.com/pytorch/audio/issues/2294

Raise an error if the waveform to be resampled is not of a floating-point type. The `conv1d` operation used in resampling and the `nn.Module` used for the transforms do not support integer types.

Pull Request resolved: https://github.com/pytorch/audio/pull/2318

Reviewed By: mthrok

Differential Revision: D35379276

Pulled By: carolineechen

fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0

11328d23

26 Feb, 2022 1 commit

Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``apply_beamforming`` method to ``torchaudio.functional``.
The method applies the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
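
The operation described above can be sketched in plain Python as a per-frequency weighted sum over channels, using the conjugated weights. This is a minimal illustration and not the torchaudio code; the nested-list layouts (`weights[f][c]` and `specgram[c][f][t]`) are assumptions chosen for the example.

```python
def apply_beamforming(weights, specgram):
    """Combine channels into one spectrum.

    weights[f][c]:     complex beamforming weight for frequency f, channel c
    specgram[c][f][t]: complex multi-channel spectrum
    Returns enhanced[f][t] = sum_c conj(weights[f][c]) * specgram[c][f][t].
    """
    n_channels = len(specgram)
    n_freqs = len(specgram[0])
    n_frames = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t] for c in range(n_channels))
            for t in range(n_frames)
        ]
        for f in range(n_freqs)
    ]


# Two channels, one frequency bin, two frames.
specgram = [[[1 + 1j, 2 + 0j]], [[1 - 1j, 0 + 2j]]]
weights = [[0.5 + 0j, 0.5 + 0j]]  # a plain averaging beamformer
enhanced = apply_beamforming(weights, specgram)
```

With averaging weights, each output frame is simply the mean of the two channels at that frame.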

Pull Request resolved: https://github.com/pytorch/audio/pull/2232

Reviewed By: mthrok

Differential Revision: D34474561

Pulled By: nateanl

fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d

9c56ffb4

25 Feb, 2022 5 commits

Add rtf_power method to torchaudio.functional (#2231) · ea74813d

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``rtf_power`` method to ``torchaudio.functional``.
The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206).
[This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English.
The input arguments are, in order: the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, an int or one-hot Tensor indicating the reference channel, and the number of iterations.
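
For readers unfamiliar with power iteration, here is a generic sketch in plain Python: repeatedly multiplying a vector by a matrix and renormalizing converges to the dominant eigenvector. It is not torchaudio's `rtf_power` routine; the helper names, the fixed example matrix, and the iteration count are assumptions for illustration. In the MVDR setting the matrix would be built from the two PSD matrices rather than given directly.

```python
def matvec(mat, vec):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def power_iteration(mat, start, n_iter=20):
    """Approximate the dominant eigenvector of `mat`, starting from `start`."""
    vec = list(start)
    for _ in range(n_iter):
        vec = matvec(mat, vec)
        # Renormalize so the vector neither overflows nor vanishes.
        norm = sum(abs(x) ** 2 for x in vec) ** 0.5
        vec = [x / norm for x in vec]
    return vec


# Hermitian 2x2 matrix with eigenvalues 3 and 1.
# Its dominant eigenvector is [1, 1] / sqrt(2).
mat = [[2 + 0j, 1 + 0j], [1 + 0j, 2 + 0j]]
start = [1 + 0j, 0 + 0j]  # one-hot vector, playing the role of a reference channel
rtf = power_iteration(mat, start)
```

The one-hot start vector mirrors the idea of selecting a reference channel; any start vector with a component along the dominant eigenvector would converge to the same direction.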

Pull Request resolved: https://github.com/pytorch/audio/pull/2231

Reviewed By: mthrok

Differential Revision: D34474503

Pulled By: nateanl

fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb

ea74813d

Add rtf_evd method to torchaudio.functional (#2230) · 86fe4fa7

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds `rtf_evd` method to `torchaudio.functional`.
The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
The input argument is the power spectral density (PSD) matrix of the target speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2230

Reviewed By: mthrok

Differential Revision: D34474188

Pulled By: nateanl

fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134

86fe4fa7

Add mvdr_weights_rtf to torchaudio.functional (#2229) · 3566ffc5

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution that uses the relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for reference.
The input arguments are, in order: the complex-valued RTF Tensor of the target speech, the power spectral density (PSD) matrix of noise, and an int or one-hot Tensor indicating the reference channel.
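
For orientation, the RTF-based MVDR solution is commonly written as below. The notation is an assumption introduced here and does not come from the commit message: $\mathbf{v}(f)$ is the RTF vector of the target speech and $\mathbf{\Phi}_{\mathbf{NN}}(f)$ is the PSD matrix of noise at frequency $f$.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f) =
  \frac{\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{v}(f)}
       {\mathbf{v}^{\mathsf{H}}(f)\,\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{v}(f)}
```

The reference channel does not appear in this textbook form; an implementation would typically use it to fix the overall scaling of the weights.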

Pull Request resolved: https://github.com/pytorch/audio/pull/2229

Reviewed By: mthrok

Differential Revision: D34474119

Pulled By: nateanl

fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2

3566ffc5

Add mvdr_weights_souden to torchaudio.functional (#2228) · 5d06a369

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are, in order: the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of noise, and an int or one-hot Tensor indicating the reference channel.
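
As a reminder of the form this solution usually takes, the Souden et al. MVDR weights can be written as below. The symbols are assumptions introduced for this note: $\mathbf{\Phi}_{\mathbf{SS}}(f)$ and $\mathbf{\Phi}_{\mathbf{NN}}(f)$ are the PSD matrices of the target speech and noise, and $\mathbf{u}$ is a one-hot vector selecting the reference channel.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f) =
  \frac{\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{\Phi}_{\mathbf{SS}}(f)}
       {\operatorname{trace}\!\left(\mathbf{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{\Phi}_{\mathbf{SS}}(f)\right)}
  \,\mathbf{u}
```

Unlike the RTF-based form, this expression needs no explicit steering vector; both inputs are PSD matrices.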

Pull Request resolved: https://github.com/pytorch/audio/pull/2228

Reviewed By: mthrok

Differential Revision: D34474018

Pulled By: nateanl

fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb

5d06a369

Add psd method to torchaudio.functional (#2227) · 07bd1aa3

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``psd`` method to ``torchaudio.functional``.
It computes the power spectral density (PSD) matrix of the complex-valued spectrum.
The method also supports normalization of Time-Frequency mask.
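
To make the computation concrete, the sketch below builds a PSD matrix for a single frequency bin as a (optionally mask-weighted) sum of outer products over time. It is a plain-Python illustration under stated assumptions, not the torchaudio function: the `specgram[c][t]` layout, the single-bin simplification, and the `eps` default are all choices made for this example.

```python
def psd(specgram, mask=None, normalize=True, eps=1e-10):
    """PSD matrix of one frequency bin.

    specgram[c][t]: complex spectrum for channel c, frame t
    mask[t]:        optional real-valued time-frequency mask for this bin
    Returns psd[c][e] = sum_t w[t] * specgram[c][t] * conj(specgram[e][t]).
    """
    n_channels = len(specgram)
    n_frames = len(specgram[0])
    weights = [1.0] * n_frames if mask is None else list(mask)
    if mask is not None and normalize:
        # Normalize the mask so that its weights sum to (almost exactly) one.
        total = sum(weights) + eps
        weights = [w / total for w in weights]
    return [
        [
            sum(
                weights[t] * specgram[c][t] * specgram[e][t].conjugate()
                for t in range(n_frames)
            )
            for e in range(n_channels)
        ]
        for c in range(n_channels)
    ]


specgram = [[1 + 0j, 0 + 1j], [1 + 0j, 1 + 0j]]  # two channels, two frames
phi = psd(specgram)
masked = psd(specgram, mask=[1.0, 0.0])  # keep only the first frame
```

The result is Hermitian by construction: `phi[c][e]` is the complex conjugate of `phi[e][c]`.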

Pull Request resolved: https://github.com/pytorch/audio/pull/2227

Reviewed By: mthrok

Differential Revision: D34473908

Pulled By: nateanl

fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9

07bd1aa3

29 Dec, 2021 1 commit

Add parameter p to TimeMasking (#2090) · 1ec7ff73

hwangjeff authored Dec 29, 2021

Summary:
Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779).
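
One way to picture the effect of `p` is as a cap on the maximum mask length, expressed as a proportion of the number of time steps. The sketch below is a plain-Python illustration of that idea and not the torchaudio code; the helper names `bounded_mask_param` and `sample_time_mask` are hypothetical.

```python
import random


def bounded_mask_param(mask_param, p, axis_length):
    """Cap the maximum mask length at a proportion `p` of the axis length."""
    if p >= 1.0:
        return mask_param
    return min(mask_param, int(axis_length * p))


def sample_time_mask(n_frames, mask_param, p=1.0, rng=random):
    """Return (start, end) of a mask no longer than the bounded mask_param."""
    max_len = bounded_mask_param(mask_param, p, n_frames)
    length = int(rng.uniform(0, max_len))
    start = int(rng.uniform(0, n_frames - length))
    return start, start + length


# With 100 frames and p=0.2, at most 20 frames can be masked,
# even though mask_param alone would allow up to 80.
start, end = sample_time_mask(n_frames=100, mask_param=80, p=0.2, rng=random.Random(0))
```

Setting `p=1.0` leaves the bound at `mask_param`, which corresponds to the behavior without the new parameter.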

Pull Request resolved: https://github.com/pytorch/audio/pull/2090

Reviewed By: carolineechen

Differential Revision: D33344772

Pulled By: hwangjeff

fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf

1ec7ff73

23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

03 Dec, 2021 1 commit

Adding warnings in mu_law* for the wrong input type (#2034) · 338d38a2

Joao Gomes authored Dec 03, 2021

Summary:
Addresses https://github.com/pytorch/audio/issues/1493

cc mthrok hwangjeff

Pull Request resolved: https://github.com/pytorch/audio/pull/2034

Reviewed By: hwangjeff

Differential Revision: D32807006

Pulled By: mthrok

fbshipit-source-id: badf148646c5f768328c5a4e51bd6016b0be46f3

338d38a2

10 Nov, 2021 1 commit
- [BC-Breaking] Remove deprecated create_fb_matrix (#1998) · 22379d14
  Krishna Kalyan authored Nov 10, 2021
  
  22379d14
03 Nov, 2021 2 commits

[BC-Breaking] Drop pseudo complex support from phase_vocoder / TimeStretch (#1957) · d3e146fd

moto authored Nov 03, 2021

Following the plan #1337, this commit drops the support for pseudo complex type from `F.phase_vocoder` and `T.TimeStretch`.

d3e146fd

[BC-Breaking] Drop pseudo complex support from spectrogram (#1958) · 5ec6ada6

moto authored Nov 03, 2021

Following the plan #1337, this commit drops the support for pseudo complex type from `F.spectrogram` and `T.Spectrogram`.

It also deprecates the use of `return_complex` argument.

5ec6ada6

28 Oct, 2021 1 commit
- Remove F.complex_norm and T.ComplexNorm (#1942) · ab50909d
  S Harish authored Oct 28, 2021
  
  ab50909d
27 Oct, 2021 1 commit
- Remove deprecated F.angle (#1935) · 1d3dcdbd
  S Harish authored Oct 27, 2021
  
  1d3dcdbd
26 Oct, 2021 1 commit
- Remove deprecated `F.magphase` (#1934) · d35ea80e
  S Harish authored Oct 26, 2021
  
  d35ea80e
18 Oct, 2021 1 commit
- [DOC] Standardization and minor fixes (#1892) · cb40dd72
  Caroline Chen authored Oct 18, 2021
  
  cb40dd72
16 Oct, 2021 1 commit
- Add filter bank figures (#1891) · 89aeb686
  moto authored Oct 16, 2021
  
  89aeb686
13 Oct, 2021 1 commit
- [BC-Breaking] Ensure integer input frequencies for resample (#1857) · 25a8adf6
  Caroline Chen authored Oct 13, 2021
  
  25a8adf6
12 Oct, 2021 2 commits
- Use integer rates in pitch shift resample (#1861) · e8ed8f46
  Caroline Chen authored Oct 12, 2021
  
  e8ed8f46
- [BC-Breaking] Replace waveform with specgram in SlidingWindowCmn (#1859) · 0cc28748
  nateanl authored Oct 12, 2021
  
  0cc28748
07 Oct, 2021 2 commits
- Standardize tensor shapes format in docs (#1838) · 21a0d29e
  Caroline Chen authored Oct 07, 2021
  
  21a0d29e
- Update RNNT Loss docs and add example (#1835) · 33a655fd
  Caroline Chen authored Oct 07, 2021
  
  33a655fd
02 Sep, 2021 1 commit
- Standardize optional types in docstrings (#1746) · 768432c3
  Caroline Chen authored Sep 02, 2021
  
  768432c3
19 Aug, 2021 1 commit
- Move RNNT Loss out of prototype (#1711) · 2c115821
  Caroline Chen authored Aug 19, 2021
  
  2c115821
11 Aug, 2021 1 commit

Add InverseSpectrogram to transforms and functional (#1652) · 6e0af713

nateanl authored Aug 11, 2021



- Provide InverseSpectrogram module that corresponds to Spectrogram module
- Add length parameter to the forward method in transforms

Co-authored-by: dgenzel <dgenzel@fb.com>
Co-authored-by: Zhaoheng Ni <zni@fb.com>

6e0af713

02 Aug, 2021 1 commit

Add melscale_fbanks and deprecate create_fb_matrix (#1653) · 83dc5ec7

Joel Frank authored Aug 02, 2021

- Renamed `torchaudio.functional.create_fb_matrix` to `torchaudio.functional.melscale_fbanks`.
- Kept `create_fb_matrix` as an interface that emits a warning.
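
The rename-with-warning pattern described in the list above can be sketched as follows. This is a generic illustration, not the code from the commit: the `melscale_fbanks` body here is a stand-in that just echoes its arguments, and the warning category and message are assumptions.

```python
import warnings


def melscale_fbanks(n_freqs, f_min, f_max, n_mels, sample_rate):
    """Stand-in for the renamed function; echoes its arguments to show delegation."""
    return (n_freqs, f_min, f_max, n_mels, sample_rate)


def create_fb_matrix(*args, **kwargs):
    """Old name kept as a thin wrapper: warn, then forward to the new function."""
    warnings.warn(
        "create_fb_matrix is deprecated; use melscale_fbanks instead.",
        DeprecationWarning,  # assumed category, chosen for this sketch
        stacklevel=2,
    )
    return melscale_fbanks(*args, **kwargs)


# Calling the old name still works, but a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = create_fb_matrix(201, 0.0, 8000.0, 64, 16000)
```

Keeping the old name as a wrapper lets existing callers keep working for a deprecation period before the name is removed.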

83dc5ec7

29 Jul, 2021 1 commit
- Add LFCC feature to transforms (#1611) · 86370639
  Joel Frank authored Jul 29, 2021
  
  Summary:
  - Add linear_fbank method
  - Add LFCC in transforms
  
  86370639
27 Jul, 2021 1 commit
- Simplify axis value checks (#1501) · d1d6dbc6
  Zack Kneupper authored Jul 27, 2021
  
  d1d6dbc6
16 Jul, 2021 1 commit
- Add PitchShift to functional and transform (#1629) · f5dbb002
  nateanl authored Jul 16, 2021
  
  f5dbb002
25 Jun, 2021 1 commit
- Add edit_distance · 6bfd83b4
  yangarbiter authored Jun 25, 2021
  
  6bfd83b4
14 Jun, 2021 1 commit
- add name of paper before reference. (#1575) · e39ece66
  Vincent QB authored Jun 14, 2021
  
  e39ece66
04 Jun, 2021 2 commits

[BC-Breaking] Default to native complex type when returning raw spect… (#1549) · 5432a3f5

moto authored Jun 04, 2021

* [BC-Breaking] Default to native complex type when returning raw spectrogram

Part of https://github.com/pytorch/audio/issues/1337 .

- This change makes spectrogram return the native complex dtype when (and only when) returning a raw (complex-valued) spectrogram.
- Change the default from `return_complex=False` to `return_complex=True` in spectrogram ops.
- `return_complex` is only effective when `power` is `None`. It is ignored when `power` is not `None`, because the returned Tensor is then a real-valued power spectrogram.

5432a3f5

Set removal version of pseudo complex support (#1553) · f2a4aac0
moto authored Jun 04, 2021

f2a4aac0