Commits · 8a893fb3bbcb4b9707c7496b5e7547a0a6b2e288 · OpenDAS / Torchaudio

22 May, 2023 1 commit

Fix CPU kernel of forced_align function (#3354) · 8a893fb3

Zhaoheng Ni authored May 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3354

when start ==0, the first item instead of Sth item of t row in backPtr_a should be 0.

Reviewed By: xiaohui-zhang

Differential Revision: D46059971

fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8

8a893fb3

20 May, 2023 1 commit

[audio][PR] Add forced_align function to torchaudio (#3348) · e7935cff

Zhaoheng Ni authored May 19, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3348

The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA deviced. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame.

Reviewed By: vineelpratap, xiaohui-zhang

Differential Revision: D45867265

fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb

e7935cff

04 May, 2023 1 commit

Extend mask_along_axis{,_iid} (#3289) · 74bd971a

Xiaohui Zhang authored May 04, 2023

Summary:
(1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-code to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only support 4D tensors, because of the above hard-coding
- For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only support 3D tensors, because of the above hard-coding.
- It's not straightforward to apply multiple time/frequency masks by the current design.

To solve these issues, here we
- Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.

The introduction of SpecAugment transform will be done in another PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/3289

Reviewed By: hwangjeff

Differential Revision: D45460357

Pulled By: xiaohui-zhang

fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3

74bd971a

17 Feb, 2023 1 commit

Make lengths optional for speed functions and modules (#3072) · 5af309d3

hwangjeff authored Feb 16, 2023

Summary:
Makes lengths input optional for `torchaudio.functional.speed`, `torchaudio.transforms.Speed`, and `torchaudio.transforms.SpeedPerturbation`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3072

Reviewed By: nateanl, mthrok

Differential Revision: D43371406

Pulled By: hwangjeff

fbshipit-source-id: ecb38bcc2bfff5c5a396a37eff238b22238e795a

5af309d3

15 Feb, 2023 1 commit

Enable broadcasting for inputs to convolve (#3061) · a49edea5

hwangjeff authored Feb 15, 2023

Summary:
Relaxes input dimension matching constraint on `convolve` to enable broadcasting for inputs.

Pull Request resolved: https://github.com/pytorch/audio/pull/3061

Reviewed By: mthrok

Differential Revision: D43298078

Pulled By: hwangjeff

fbshipit-source-id: a6cc36674754523b88390fac0a05f06562921319

a49edea5

24 Jan, 2023 1 commit

Move data augmentation functions out of prototype (#3001) · 41b88314

hwangjeff authored Jan 23, 2023

Summary:
Moves `add_noise`, `fftconvolve`, `convolve`, `speed`, `preemphasis`, and `deemphasis` out of `torchaudio.prototype.functional` and into `torchaudio.functional`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3001

Reviewed By: mthrok

Differential Revision: D42688971

Pulled By: hwangjeff

fbshipit-source-id: 43280bd3ffeccddae57f1092ac45afb64dd426cc

41b88314

16 Dec, 2022 1 commit

Rename resampling_method options (#2922) · e6bebe6a

Caroline Chen authored Dec 16, 2022

Summary:
resolves https://github.com/pytorch/audio/issues/2891

Rename `resampling_method` options to more accurately describe what is happening. Previously the methods were set to `sinc_interpolation` and `kaiser_window`, which can be confusing as both options actually use sinc interpolation methodology, but differ in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will throw a warning, and those options will be deprecated in 2 released. The numerical behavior is unchanged.

Pull Request resolved: https://github.com/pytorch/audio/pull/2922

Reviewed By: mthrok

Differential Revision: D42083619

Pulled By: carolineechen

fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70

e6bebe6a

08 Nov, 2022 1 commit

Enable log probs input for rnnt loss (#2798) · ca478823

Caroline Chen authored Nov 08, 2022

Summary:
Add `fused_log_softmax` argument (default/current behavior = True) to rnnt loss.

If setting it to `False`, call `log_softmax` on the logits prior to passing it in to the rnnt loss function.

The following should produce the same output:
```
rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
```

```
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
```

testing -- unit tests + get same results on the conformer rnnt recipe

Pull Request resolved: https://github.com/pytorch/audio/pull/2798

Reviewed By: xiaohui-zhang

Differential Revision: D41083523

Pulled By: carolineechen

fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61

ca478823

28 Jul, 2022 1 commit

Add Union normalization parameter on spectrogram and inverse spectrogram (#2554) · 0fde7c57

Sean Kim authored Jul 28, 2022

Summary:
Add str to normalized parameter to enable frame_length based normalization to align with torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104

Pull Request resolved: https://github.com/pytorch/audio/pull/2554

Reviewed By: carolineechen, mthrok

Differential Revision: D38247554

Pulled By: skim0514

fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602

0fde7c57

01 Jun, 2022 1 commit

Move Seed to Setup (#2425) · ac82bdc4

Sean Kim authored Jun 01, 2022

Summary:
Bringing in move seed commit from previous open commit https://github.com/pytorch/audio/issues/2267. Organizes seed to utils.

Pull Request resolved: https://github.com/pytorch/audio/pull/2425

Reviewed By: carolineechen, nateanl

Differential Revision: D36787599

Pulled By: skim0514

fbshipit-source-id: 37a0d632d13d4336a830c4b98bdb04828ed88c20

ac82bdc4

15 May, 2022 1 commit

[codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting

Reviewed By: lisroach

Differential Revision: D36402214

fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c

d62875cc

10 May, 2022 1 commit

Add diagonal_loading optional to rtf_power (#2369) · da1e83cc

Zhaoheng Ni authored May 10, 2022

Summary:
When computing the MVDR beamforming weights using the power iteration method, the PSD matrix of noise can be applied with diagonal loading to improve the robustness. This is also applicable to computing the RTF matrix (See https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). This also aligns with current `torchaudio.transforms.MVDR` module to keep the consistency.

This PR adds the `diagonal_loading` argument with `True` as default value to `torchaudio.functional.rtf_power`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2369

Reviewed By: carolineechen

Differential Revision: D36204130

Pulled By: nateanl

fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420

da1e83cc

26 Feb, 2022 1 commit

Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``apply_beamforming`` method to ``torchaudio.functional``.
The method employs the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.

Pull Request resolved: https://github.com/pytorch/audio/pull/2232

Reviewed By: mthrok

Differential Revision: D34474561

Pulled By: nateanl

fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d

9c56ffb4

25 Feb, 2022 5 commits

Add rtf_power method to torchaudio.functional (#2231) · ea74813d

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``rtf_power`` method to ``torchaudio.functional``.
The method computes the relative transfer function (RTF) or the steering vector by [the power iteration method](https://onlinelibrary.wiley.com/doi/abs/10.1002/zamm.19290090206).
[This paper](https://arxiv.org/pdf/2011.15003.pdf) describes the power iteration method in English.
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, number of iterations, respectively.

Pull Request resolved: https://github.com/pytorch/audio/pull/2231

Reviewed By: mthrok

Differential Revision: D34474503

Pulled By: nateanl

fbshipit-source-id: 47011427ec4373f808755f0e8eff1efca57655eb

ea74813d

Add rtf_evd method to torchaudio.functional (#2230) · 86fe4fa7

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds `rtf_evd` method to `torchaudio.functional`.
The method computes the relative transfer function (RTF) or the steering vector by eigenvalue decomposition.
The input argument is the power spectral density (PSD) matrix of the target speech.

Pull Request resolved: https://github.com/pytorch/audio/pull/2230

Reviewed By: mthrok

Differential Revision: D34474188

Pulled By: nateanl

fbshipit-source-id: 888df4b187608ed3c2b7271b34d2231cdabb0134

86fe4fa7

Add mvdr_weights_rtf to torchaudio.functional (#2229) · 3566ffc5

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_rtf`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution that applies relative transfer function (RTF). See [the paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf) for the reference.
The input arguments are the complex-valued RTF Tensor of the target speech, power spectral density (PSD) matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively.

Pull Request resolved: https://github.com/pytorch/audio/pull/2229

Reviewed By: mthrok

Differential Revision: D34474119

Pulled By: nateanl

fbshipit-source-id: 2d6f62cd0858f29ed6e4e03c23dcc11c816204e2

3566ffc5

Add mvdr_weights_souden to torchaudio.functional (#2228) · 5d06a369

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``mvdr_weights_souden`` method to ``torchaudio.functional``.
It computes the MVDR weight matrix based on the solution proposed by [``Souden et, al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, PSD matrix of noise, int or one-hot Tensor to indicate the reference channel, respectively.

Pull Request resolved: https://github.com/pytorch/audio/pull/2228

Reviewed By: mthrok

Differential Revision: D34474018

Pulled By: nateanl

fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb

5d06a369

Add psd method to torchaudio.functional (#2227) · 07bd1aa3

Zhaoheng Ni authored Feb 25, 2022

Summary:
This PR adds ``psd`` method to ``torchaudio.functional``.
It computes the power spectral density (PSD) matrix of the complex-valued spectrum.
The method also supports normalization of Time-Frequency mask.

Pull Request resolved: https://github.com/pytorch/audio/pull/2227

Reviewed By: mthrok

Differential Revision: D34473908

Pulled By: nateanl

fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9

07bd1aa3

29 Dec, 2021 1 commit

Add parameter p to TimeMasking (#2090) · 1ec7ff73

hwangjeff authored Dec 29, 2021

Summary:
Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779).

Pull Request resolved: https://github.com/pytorch/audio/pull/2090

Reviewed By: carolineechen

Differential Revision: D33344772

Pulled By: hwangjeff

fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf

1ec7ff73

23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

04 Nov, 2021 1 commit
- Doc fixes (#1982) · c670898c
  Caroline Chen authored Nov 04, 2021
  
  c670898c
03 Nov, 2021 1 commit
- [BC-Breaking] Drop pseudo complex support from phase_vocoder / TimeStretch (#1957) · d3e146fd
  moto authored Nov 03, 2021
```
Following the plan #1337, this commit drops the support for pseudo complex type from `F.phase_vocoder` and `T.TimeStretch`.
```
  d3e146fd
28 Oct, 2021 1 commit
- Remove F.complex_norm and T.ComplexNorm (#1942) · ab50909d
  S Harish authored Oct 28, 2021
  
  ab50909d
13 Oct, 2021 1 commit
- [BC-Breaking] Ensure integer input frequencies for resample (#1857) · 25a8adf6
  Caroline Chen authored Oct 13, 2021
  
  25a8adf6
20 Aug, 2021 1 commit

Add basic filtfilt implementation (#1681) · 496b381a

hwangjeff authored Aug 20, 2021



* Add basic filtfilt implementation

* Add filtfilt to functional package; add tests
Co-authored-by: V G <vladislav.goncharenko@phystech.edu>

496b381a

19 Aug, 2021 1 commit
- Move RNNT Loss out of prototype (#1711) · 2c115821
  Caroline Chen authored Aug 19, 2021
  
  2c115821
10 Aug, 2021 1 commit
- Add batch support to lfilter (#1638) · 8094751f
  Chin-Yun Yu authored Aug 11, 2021
  
  8094751f
02 Aug, 2021 1 commit

Add melscale_fbanks and deprecate create_fb_matrix (#1653) · 83dc5ec7

Joel Frank authored Aug 02, 2021

- Renamed torchaudio.functional.create_fb_matrix to torchaudio.functional.melscale_fbanks.
- Added interface with a warning for create_fb_matrix

83dc5ec7

21 Jul, 2021 1 commit
- Add filterbanks support to lfilter (#1587) · aa0dd03b
  Chin-Yun Yu authored Jul 22, 2021
  
  aa0dd03b
16 Jul, 2021 1 commit
- Add PitchShift to functional and transform (#1629) · f5dbb002
  nateanl authored Jul 16, 2021
  
  f5dbb002
25 Jun, 2021 1 commit
- Add edit_distance · 6bfd83b4
  yangarbiter authored Jun 25, 2021
  
  6bfd83b4
04 Jun, 2021 1 commit
- Migrate resample tests from kaldi to functional (#1520) · 15a7f78c
  Caroline Chen authored Jun 03, 2021
  
  15a7f78c
01 Jun, 2021 1 commit
- Ensure resampling identity is unchanged (#1537) · fad19fab
  Caroline Chen authored Jun 01, 2021
  
  fad19fab
22 May, 2021 1 commit

fbsync (#1524) · ae9560da

parmeet authored May 22, 2021

* Remove `class FunctionalComplex` header accidentally re-introduced in #1490

ae9560da

11 May, 2021 1 commit
- Add warning for non-integer resampling frequencies (#1490) · 4b2de71f
  Caroline Chen authored May 11, 2021
  
  4b2de71f
06 May, 2021 1 commit
- Merge test classes for complex (#1491) · 7d45851d
  moto authored May 06, 2021
  
  7d45851d
03 May, 2021 1 commit

Ensure axis masking operations are not in-place (#1481) · 7fd5fce4

Caroline Chen authored May 03, 2021

It was reported in #1478 that spectrogram masking operations were done in-place and modified the original input tensors. This PR fixes this behavior and adds tests to ensure that the input tensor is not changed.

7fd5fce4

26 Apr, 2021 1 commit
- Run functional tests on GPU as well as CPU (#1475) · b5d80279
  Mark Saroufim authored Apr 26, 2021
  
  b5d80279
19 Apr, 2021 1 commit
- Refactor functional test (#1463) · b059f087
  dhthompson authored Apr 19, 2021
```
- Put functional test logic into one place, `functional_impl.py`
- Tidy imports
```
  b059f087
06 Apr, 2021 1 commit

Refactors functional test (#1435) · e9726f08

steveplazafb authored Apr 06, 2021

Merges lfilter and spectrogram classes together in the common implementation and modifies the cpu and gpu test definitions accordingly

e9726f08