Commits · 667c6a9ee4eeb5f786426e2821ebc31731b4847d · OpenDAS / Torchaudio

10 May, 2023 2 commits

Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit prepares for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 lands, to accommodate the change.

Since the IO tutorial needs to mention the migration-related changes,
I also updated the IO documentation to cover the migration work so that it is easy to redirect readers there.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133

667c6a9e

[BC-Breaking] Update InverseMelScale solution (#3280) · 5a85a461

Zhaoheng Ni authored May 09, 2023

Summary:
Address https://github.com/pytorch/audio/issues/2643

- Replace `SGD` optimization with `torch.linalg.lstsq`, which is much faster.
- Add autograd test for `InverseMelScale`
- Update other tests
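
To illustrate why a direct solve can replace iterative optimization here: recovering a linear spectrogram `s` from mel values `m = F s` is a linear problem. The sketch below is plain Python and is not the torchaudio implementation. The tiny filterbank `fb`, the helper names, and the minimum-norm formula `s = F^T (F F^T)^-1 m` (which assumes `F` has full row rank) are assumptions chosen for illustration, standing in for what `torch.linalg.lstsq` does on tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a @ x = b for square, non-singular a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def min_norm_lstsq(fb, mel):
    """Minimum-norm solution of fb @ spec = mel, assuming fb has full row rank."""
    fb_t = transpose(fb)
    gram = matmul(fb, fb_t)  # (n_mels x n_mels)
    return matmul(fb_t, solve(gram, mel))


# Hypothetical 2-band filterbank over 3 frequency bins, one time frame.
fb = [[1.0, 0.5, 0.0],
      [0.0, 0.5, 1.0]]
mel = [[2.0], [4.0]]
spec = min_norm_lstsq(fb, mel)
reconstructed = matmul(fb, spec)  # should match mel up to rounding
```

The point of the sketch is that the answer comes from one linear solve rather than many gradient steps, which is the source of the speed-up the commit describes.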

Pull Request resolved: https://github.com/pytorch/audio/pull/3280

Reviewed By: hwangjeff

Differential Revision: D45679988

Pulled By: nateanl

fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59

5a85a461

09 May, 2023 6 commits

Remove NumPy from conda build env (#3315) · 282ed27a

moto authored May 09, 2023

Summary:
NumPy is an optional runtime dependency of TorchAudio, and it is not required at build time.

Pull Request resolved: https://github.com/pytorch/audio/pull/3315

Reviewed By: nateanl

Differential Revision: D45702243

Pulled By: mthrok

fbshipit-source-id: 6ca6598931764c46be6323868e8cce7c8adc5024

282ed27a

Refactor StreamReader/Writer PyBinding (#3296) · 8d7268f1

Moto Hira authored May 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3296

Reviewed By: hwangjeff

Differential Revision: D45503774

fbshipit-source-id: 806c22bd0f54fd0cea43d61ef3dbedd67ffeb012

8d7268f1

Add StreamReaderCustomIO (#3320) · 007cca23

Moto Hira authored May 09, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3320

Add StreamReaderCustomIO, which is analogous to StreamWriterCustomIO and which takes custom read/seek functions to fetch media data.
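
As a rough illustration of the read/seek callback pattern, the sketch below wraps an in-memory buffer in an object that exposes `read` and `seek`. The commit message does not give the actual callback signatures of `StreamReaderCustomIO`, so the class `InMemorySource`, the helper `read_all`, and their signatures are hypothetical and only show the general idea.

```python
import io


class InMemorySource:
    """Hypothetical source exposing read/seek callbacks over a byte buffer."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, size: int) -> bytes:
        # Return up to `size` bytes; an empty result signals end of data.
        return self._buf.read(size)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Move the read position and return the new absolute offset.
        return self._buf.seek(offset, whence)


def read_all(read_fn, chunk_size=4):
    """Drain a read callback in fixed-size chunks, as a demuxer might."""
    chunks = []
    while True:
        chunk = read_fn(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


src = InMemorySource(b"RIFF....WAVEfmt ")
header = src.read(4)          # peek at the first bytes
src.seek(0)                   # rewind before handing the source to a reader
payload = read_all(src.read)
```

Because the consumer only sees the two callbacks, the same pattern works for data fetched from a network socket, an archive member, or any other source that is not a plain file path.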

Reviewed By: hwangjeff

Differential Revision: D45482843

fbshipit-source-id: 3ccf771c0fdce153aaa7551053e9a77facedc983

007cca23

Refactor StreamWriterCustomIO (#3319) · 51767917

Moto Hira authored May 09, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3319

* Merge the source with StreamWriter
* Add docstrings
* Move CustomIO to detail::CustomOutput to prepare for adding CustomInput

Reviewed By: hwangjeff

Differential Revision: D45481807

fbshipit-source-id: 4a9ac8a57acda47b126f8ae18e607b72919f9988

51767917

Fix batch consistency test for InverseBarkScale (#3322) · 51cc1cbf

Zhaoheng Ni authored May 09, 2023

Summary:
The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3322

Reviewed By: mthrok

Differential Revision: D45691769

Pulled By: nateanl

fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045

51cc1cbf

[BE] Add description to wheel package (#3321) · 3a49a2d2

Nikita Shulga authored May 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3321

Reviewed By: atalman, mthrok

Differential Revision: D45673225

Pulled By: malfet

fbshipit-source-id: f2b915f3307ba95445702e3018254ad254fe2bb3

3a49a2d2

05 May, 2023 6 commits

fix doc of specaugment transform (#3314) · a8dc4de5

Xiaohui Zhang authored May 05, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3314

Reviewed By: nateanl

Differential Revision: D45621958

Pulled By: xiaohui-zhang

fbshipit-source-id: 17555a865790adadc2abd40a86571596386a12fc

a8dc4de5

Update squim tutorial (#3313) · 05ef7dc6

Zhaoheng Ni authored May 05, 2023

Summary:
Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.

Pull Request resolved: https://github.com/pytorch/audio/pull/3313

Reviewed By: hwangjeff

Differential Revision: D45620311

Pulled By: nateanl

fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557

05ef7dc6

Add SpecAugment transform (#3309) · 82febc59

Xiaohui Zhang authored May 05, 2023

Summary:
(2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors as a result of the above hard-coding.
- For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors as a result of the above hard-coding.
- Applying multiple time/frequency masks is not straightforward with the current design. To get N masks along the time/frequency axis, N Frequency/TimeMasking transforms have to be applied sequentially to the input tensors, which makes for an inconvenient API. A separate SpecAugment transform is needed to handle this.

To solve these issues, here we
- [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
- [done in this PR] Introduce the SpecAugment transform.
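
The idea of applying several masks in one call, with either zero or mean fill, can be sketched in plain Python on a 2D spectrogram stored as nested lists. This is only an illustration of the concept, not the torchaudio `SpecAugment` implementation; the function `apply_masks` and its parameters are hypothetical names.

```python
import random


def apply_masks(spec, n_masks, max_width, axis, fill="zero", rng=None):
    """Return a copy of a 2D spectrogram (freq x time) with random bands masked.

    axis=0 masks frequency rows, axis=1 masks time columns.
    fill="mean" uses the mean of the input instead of zero.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    value = sum(map(sum, spec)) / (n_freq * n_time) if fill == "mean" else 0.0
    out = [row[:] for row in spec]  # leave the input untouched
    size = n_freq if axis == 0 else n_time
    for _ in range(n_masks):
        width = rng.randint(0, min(max_width, size))
        start = rng.randint(0, size - width)
        for i in range(start, start + width):
            if axis == 0:
                out[i] = [value] * n_time
            else:
                for row in out:
                    row[i] = value
    return out


# A 4 (freq) x 8 (time) toy spectrogram with strictly positive entries.
spec = [[float(f * 10 + t) + 1.0 for t in range(8)] for f in range(4)]
rng = random.Random(0)
masked = apply_masks(spec, n_masks=2, max_width=3, axis=1, rng=rng)
masked = apply_masks(masked, n_masks=1, max_width=2, axis=0, fill="mean", rng=rng)
```

A single call with `n_masks=N` replaces the N chained masking transforms that the list above describes as inconvenient.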

Pull Request resolved: https://github.com/pytorch/audio/pull/3309

Reviewed By: nateanl

Differential Revision: D45592926

Pulled By: xiaohui-zhang

fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2

82febc59

Fix missing PTS initialization with NVIDIA encoder (#3312) · 1e3af12f

huyao authored May 05, 2023

Summary:
Fix **Failed to write packet (Invalid argument)** error when encoding FLV video streams using NVIDIA hardware encoders.

Resolve https://github.com/pytorch/audio/issues/3311

Pull Request resolved: https://github.com/pytorch/audio/pull/3312

Reviewed By: nateanl

Differential Revision: D45611656

Pulled By: mthrok

fbshipit-source-id: 531a83a27d3b19ed9e9aedd161769c60aa0bd175

1e3af12f

Fix doc version (#3310) · bfb47017

moto authored May 05, 2023

Summary:
Fixes the regression caused by the migration of the build_doc job to GHA: the version number was not being set properly.

Pull Request resolved: https://github.com/pytorch/audio/pull/3310

Reviewed By: nateanl

Differential Revision: D45607829

Pulled By: mthrok

fbshipit-source-id: 3450a38fa6982fcc56676a80144e9eed1aad02ec

bfb47017

Fix MKL issue on Intel mac build (#3307) · 3e897ca7

moto authored May 05, 2023

Summary:
* Remove MKL and NumPy from Conda build env
* Remove `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel Mac.

TorchAudio does not use BLAS libraries directly, so all mentions of MKL should be removed from the codebase.
However, this was causing an issue on Intel Mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the linker on macOS keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but in the meantime we can modify the target and remove it.

We also don't need NumPy at build time or run time, so it is removed as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/3307

Reviewed By: atalman

Differential Revision: D45606944

Pulled By: mthrok

fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd

3e897ca7

04 May, 2023 3 commits

Add older mkl build constraint only (#3302) · 1e48af06

atalman authored May 04, 2023

Summary:
Similar to what we used to have here:
https://github.com/pytorch/test-infra/pull/3896/files

Pull Request resolved: https://github.com/pytorch/audio/pull/3302

Reviewed By: nateanl

Differential Revision: D45574845

Pulled By: atalman

fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb

1e48af06

Add mkl dependency to torchaudio MacOS x86 builds (#3300) · b5795943

atalman authored May 04, 2023

Summary:
Add mkl dependency to torchaudio MacOS x86 builds

Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137

Pull Request resolved: https://github.com/pytorch/audio/pull/3300

Reviewed By: jeanschmidt, mthrok

Differential Revision: D45566352

Pulled By: atalman

fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b

b5795943

Extend mask_along_axis{,_iid} (#3289) · 74bd971a

Xiaohui Zhang authored May 04, 2023

Summary:
(1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors as a result of the above hard-coding.
- For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors as a result of the above hard-coding.
- Applying multiple time/frequency masks is not straightforward with the current design.

To solve these issues, here we
- Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
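
The difference between a mask shared across a batch and independent (iid) masks per example can be sketched in plain Python. This is a conceptual illustration, not the torchaudio functions themselves; `mask_time`, `sample_span`, `mask_shared`, and `mask_iid` are hypothetical helpers, and `max_width` is assumed not to exceed the number of time steps.

```python
import random


def mask_time(spec, start, width, value=0.0):
    """Mask columns [start, start + width) of one 2D spectrogram (freq x time)."""
    return [[value if start <= t < start + width else v for t, v in enumerate(row)]
            for row in spec]


def sample_span(n_time, max_width, rng):
    """Sample a (start, width) span that fits inside n_time steps."""
    width = rng.randint(0, max_width)
    start = rng.randint(0, n_time - width)
    return start, width


def mask_shared(batch, max_width, rng):
    """One span is sampled once and applied to every example in the batch."""
    start, width = sample_span(len(batch[0][0]), max_width, rng)
    return [mask_time(spec, start, width) for spec in batch]


def mask_iid(batch, max_width, rng):
    """An independent span is sampled for each example in the batch."""
    return [mask_time(spec, *sample_span(len(spec[0]), max_width, rng))
            for spec in batch]


# Batch of 3 spectrograms, each 2 (freq) x 6 (time).
batch = [[[1.0] * 6 for _ in range(2)] for _ in range(3)]
shared = mask_shared(batch, max_width=3, rng=random.Random(0))
iid = mask_iid(batch, max_width=3, rng=random.Random(0))
```

Both helpers operate on the last (time) dimension of each example, which mirrors the stated goal of masking one of the last two dimensions regardless of leading batch or channel dimensions.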

The introduction of SpecAugment transform will be done in another PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/3289

Reviewed By: hwangjeff

Differential Revision: D45460357

Pulled By: xiaohui-zhang

fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3

74bd971a

03 May, 2023 4 commits

Fix lint and format PR label message (#3299) · c51f20f9

moto authored May 03, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3299

Reviewed By: xiaohui-zhang

Differential Revision: D45530945

Pulled By: mthrok

fbshipit-source-id: 3443e4de693898534687b26ee1a9376ff86651f9

c51f20f9

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 76f91135

generatedunixname89002005367269 authored May 03, 2023

Reviewed By: adamjernst

Differential Revision: D45522319

fbshipit-source-id: d73a137c8738a215cc711ad39461f5b2f9ba76da

76f91135

Remove build doc job from CCI (#3293) · bc9451e6

moto authored May 03, 2023

Summary:
https://github.com/pytorch/audio/pull/3292 migrates the doc deployment to GHA.

Pull Request resolved: https://github.com/pytorch/audio/pull/3293

Reviewed By: xiaohui-zhang

Differential Revision: D45527256

Pulled By: mthrok

fbshipit-source-id: 18eb2580243b6b842147caaac10b3d28aa3d6dd0

bc9451e6

Fix doc deployment condition (#3294) · 171a65d3

moto authored May 03, 2023

Summary:
Follow-up of https://github.com/pytorch/audio/issues/3292

Doc deployment is gated by branch_name == nightly, but the nightly branch fires both push and PR events, so two deployment jobs would run.

This commit restricts deployment to the push event.
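
The gating logic can be expressed as a small predicate. The sketch below is plain Python rather than the actual workflow configuration; the function name `should_deploy` and the event strings are assumptions used only to show why checking the branch alone triggers two jobs.

```python
def should_deploy(event_name: str, branch: str) -> bool:
    """Deploy docs only for push events on the nightly branch."""
    return event_name == "push" and branch == "nightly"


# The nightly branch produces both a push and a pull_request event.
events = [("push", "nightly"), ("pull_request", "nightly"), ("push", "main")]
deployments = [e for e in events if should_deploy(*e)]
```

With the event check in place, only one of the two nightly events results in a deployment.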

Pull Request resolved: https://github.com/pytorch/audio/pull/3294

Reviewed By: hwangjeff

Differential Revision: D45501983

Pulled By: mthrok

fbshipit-source-id: 8eb66b463800f6a30affafb27f5f2448a561cfe1

171a65d3

02 May, 2023 3 commits

[Nova] Add windows conda workflows (#3288) · d8a095ef

atalman authored May 02, 2023

Summary:
[Nova] Add windows conda workflows
Same as: https://github.com/pytorch/vision/pull/7547

Pull Request resolved: https://github.com/pytorch/audio/pull/3288

Reviewed By: osalpekar

Differential Revision: D45456203

Pulled By: atalman

fbshipit-source-id: 067fd3b9abaeb9b7b0cd45c05b7c72982dfbfe0f

d8a095ef

Deploy documentation from GHA (#3292) · 79d6795d

moto authored May 02, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3292

Reviewed By: nateanl

Differential Revision: D45492729

Pulled By: mthrok

fbshipit-source-id: 11578166854c01deb50a6011550a91b87b426385

79d6795d

adjust reference PR labels and add labeling guidance (#3162) · 37f2b4f0

Xiaohui Zhang authored May 01, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3162

Reviewed By: mthrok

Differential Revision: D43964995

Pulled By: xiaohui-zhang

fbshipit-source-id: bba8fffe25f2f39f558f080fef319b1df4c6e440

37f2b4f0

01 May, 2023 2 commits

Adding win wheels builds (#3287) · 27471a02

atalman authored May 01, 2023

Summary:
Adding win wheels builds
Same as: https://github.com/pytorch/vision/pull/7540

Pull Request resolved: https://github.com/pytorch/audio/pull/3287

Reviewed By: osalpekar

Differential Revision: D45452770

Pulled By: atalman

fbshipit-source-id: e70ad3a8f456e805b46da3d1752c42208dadb8da

27471a02

Add CUDA 12.1 builds (#3284) · 795bdc2e

pbialecki authored May 01, 2023

Summary:
CC atalman malfet

Pull Request resolved: https://github.com/pytorch/audio/pull/3284

Reviewed By: mthrok

Differential Revision: D45444670

Pulled By: atalman

fbshipit-source-id: d0cf8696a99000c2b9a7e41ceeb781f5a54daeda

795bdc2e

29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903

9b93e7df

28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA-based CTC prefix beam search decoder.

Several benchmark results on a V100 are attached below:
| decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|---------|----------------------|-----------|------------|------------|-------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation across the batch and vocab axes and loops sequentially over the frames axis. Compared with CPU implementations, it therefore favors shorter sequence lengths and larger vocab sizes.
2. WER is the same as with CPU implementations. However, decoding with an LM is not supported yet.
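
For readers unfamiliar with the algorithm, here is a minimal CPU reference sketch of CTC prefix beam search without an LM, written in plain Python for a single utterance. It is not the CUDA decoder added by this PR; the function name and the use of raw probabilities instead of log-probabilities are simplifying assumptions. The comments mark which loop is inherently sequential (frames) and which work is independent per vocabulary entry, matching the note above.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size, blank=0):
    """Return the most probable label sequence for one utterance.

    probs: list of frames, each a list of per-token probabilities.
    """
    # Each prefix maps to (p_blank, p_non_blank): the probability of all
    # paths for that prefix ending in blank / ending in its last label.
    beams = {(): (1.0, 0.0)}
    for frame in probs:  # sequential: each frame depends on the previous beams
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for token, p in enumerate(frame):  # independent per vocab entry
                if token == blank:
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and token == prefix[-1]:
                    # A repeat with no blank in between stays on the same prefix,
                    next_beams[prefix][1] += p_nb * p
                    # while a repeat after a blank extends the prefix.
                    next_beams[prefix + (token,)][1] += p_b * p
                else:
                    next_beams[prefix + (token,)][1] += (p_b + p_nb) * p
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(ps) for prefix, ps in ranked[:beam_size]}
    best_prefix, _ = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best_prefix)


# Three frames over a vocabulary of {0: blank, 1, 2}.
frames = [[0.1, 0.8, 0.1],
          [0.8, 0.1, 0.1],
          [0.1, 0.8, 0.1]]
decoded = ctc_prefix_beam_search(frames, beam_size=3)
```

In this toy input the blank in the middle frame separates two occurrences of token 1, so the decoder keeps both rather than collapsing them.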

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155

0a1801ed

25 Apr, 2023 1 commit

Introduce StreamWriterCustomIO (#3277) · 151ac4d8

Jeff Hwang authored Apr 25, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3277

Adds `StreamWriterCustomIO` to support encoding and writing media to arbitrary destinations.

Reviewed By: mthrok

Differential Revision: D44904807

fbshipit-source-id: 23a47531973a7dce0638feb825d38c81d46dc02f

151ac4d8

19 Apr, 2023 2 commits

Update url for collect_env.py (#3271) · 5472cdae

Zhaoheng Ni authored Apr 19, 2023

Summary:
The `master` branch of PyTorch was recently renamed to `main`. The URL of `collect_env.py` on the new issue page should be updated as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/3271

Reviewed By: xiaohui-zhang

Differential Revision: D45087038

Pulled By: nateanl

fbshipit-source-id: 167262ae6ed179baabcf55064fc5f0f0ac3b0be9

5472cdae

Amend StreamReader docs to reflect deprecation of tensor decoding (#3272) · 70350a69

hwangjeff authored Apr 18, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3272

Reviewed By: mthrok

Differential Revision: D45095440

Pulled By: hwangjeff

fbshipit-source-id: 135eb0f5d9047bf172563a9a05a9d2e323796d4d

70350a69

18 Apr, 2023 1 commit

Add multi-channel DNN beamforming training recipe (#3036) · 94f5027e

nateanl authored Apr 18, 2023

Summary:
The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement.

Pull Request resolved: https://github.com/pytorch/audio/pull/3036

Reviewed By: hwangjeff

Differential Revision: D45061841

Pulled By: nateanl

fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4

94f5027e

12 Apr, 2023 3 commits

Merge key_padding_mask into attn_mask_rel_pos in WavLM (#3265) · d5b2996b

Zhaoheng Ni authored Apr 12, 2023

Summary:
When `key_padding_mask` is not `None`, it needs to be combined with `attn_mask_rel_pos` as one mask for `scaled_dot_product_attention` function.
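
A minimal sketch of the merge, in plain Python on nested lists for a single attention head. It is not the WavLM code; the helper names, the convention that `True` in the padding mask means "padded", and the additive form of the relative-position bias are assumptions made for illustration. It also assumes every query has at least one unpadded key.

```python
import math


def merge_masks(attn_bias, key_padding_mask):
    """Fold a boolean key padding mask (True = padded) into an additive bias.

    attn_bias: [query][key] floats; key_padding_mask: [key] bools.
    """
    return [[-math.inf if key_padding_mask[k] else b for k, b in enumerate(row)]
            for row in attn_bias]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask):
    """softmax(q k^T / sqrt(d) + mask) v, with all matrices as lists of rows."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi, mask_row in zip(q, mask):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) + m
                  for kj, m in zip(k, mask_row)]
        weights = softmax(scores)  # padded keys get weight exactly 0
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]]   # last value belongs to a padded key
rel_pos_bias = [[0.0, -0.5, 0.2], [0.3, 0.0, -0.1]]
mask = merge_masks(rel_pos_bias, [False, False, True])
out = attention(q, k, v, mask)
```

Because the padded key's score becomes negative infinity, its large value vector never leaks into the output, while the relative-position bias still shapes the weights of the remaining keys.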

Pull Request resolved: https://github.com/pytorch/audio/pull/3265

Reviewed By: hwangjeff

Differential Revision: D44901093

Pulled By: nateanl

fbshipit-source-id: 73ca7af48faf7f4eb36b35b603187a11e5582c70

d5b2996b

Allow overwrite temp data in ffmpeg test (#3263) · cc7b8bd4

moto authored Apr 11, 2023

Summary:
When `TORCHAUDIO_TEST_TEMP_DIR` is set, all the unit test temporary data are stored in the given directory.
Running unit tests multiple times reuses the directory, and the temporary files from the previous test runs are found there.

The FFmpeg save test writes reference data to the temporary directory, but it is not given the overwrite flag ("-y"), so it fails in such cases.

This commit fixes that.
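
As a small illustration of the fix, the sketch below builds an `ffmpeg` command line that includes `-y`, which tells FFmpeg to overwrite an existing output file without prompting. The helper `build_ffmpeg_command` and the file names are hypothetical; the actual test code is not shown in the commit message.

```python
def build_ffmpeg_command(src, dst, overwrite=True):
    """Build an ffmpeg command line as an argument list (the command is not run)."""
    cmd = ["ffmpeg"]
    if overwrite:
        cmd.append("-y")  # overwrite the output file if it already exists
    cmd += ["-i", src, dst]
    return cmd


cmd = build_ffmpeg_command("input.wav", "reference.flac")
```

Without `-y`, a second test run that finds `reference.flac` left over from a previous run would stop at FFmpeg's overwrite prompt or fail, which is the situation described above.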

Pull Request resolved: https://github.com/pytorch/audio/pull/3263

Reviewed By: hwangjeff

Differential Revision: D44859003

Pulled By: mthrok

fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b

cc7b8bd4

Specify backend directly in test (#3262) · 563e409c

moto authored Apr 11, 2023

Summary:
Preparation to land https://github.com/pytorch/audio/pull/3241

This commit applies a patch to make the sox_io TorchScript test pass when the dispatcher is enabled.

Pull Request resolved: https://github.com/pytorch/audio/pull/3262

Reviewed By: hwangjeff

Differential Revision: D44897513

Pulled By: mthrok

fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32

563e409c

11 Apr, 2023 2 commits

Fix nightly doc build (CircleCI) (#3258) · 4b0254ba

moto authored Apr 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3258

Reviewed By: nateanl

Differential Revision: D44859397

Pulled By: mthrok

fbshipit-source-id: 361ac6a8c7092cc753f77d7745ec178760e8b9c3

4b0254ba

Update windows build doc (#3257) · 623e33d9

moto authored Apr 11, 2023

Summary:
GCC should not be used when building FFmpeg for torchaudio, as torchaudio uses MSVC (cl.exe)

Pull Request resolved: https://github.com/pytorch/audio/pull/3257

Reviewed By: nateanl

Differential Revision: D44835169

Pulled By: mthrok

fbshipit-source-id: 038c70caae58cec47dd2d6d08b8244c193104eda

623e33d9

10 Apr, 2023 3 commits

Use scaled_dot_product_attention in WavLM attention (#3252) · adb03385

Zhaoheng Ni authored Apr 10, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3219.

`torch.nn.MultiheadAttention` throws an error when it is called under `torch.no_grad()` and a mask is given. This pull request fixes that by replacing the forward method with `torch.nn.functional.scaled_dot_product_attention`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3252

Reviewed By: mthrok

Differential Revision: D44798634

Pulled By: nateanl

fbshipit-source-id: abfa7fb84b7bd71848a92ab26da5a5f0f095c665

adb03385

Use scaled_dot_product_attention in Wav2vec2/HuBERT's SelfAttention (#3253) · 94cc4bd9

Zhaoheng Ni authored Apr 10, 2023

Summary:
Replace the attention computation with `torch.nn.functional.scaled_dot_product_attention` to improve running efficiency.

Pull Request resolved: https://github.com/pytorch/audio/pull/3253

Reviewed By: mthrok

Differential Revision: D44800353

Pulled By: nateanl

fbshipit-source-id: 41550d868c809099aadbe812b0ebe2c38121efb8

94cc4bd9

Update description of Squim pipelines (#3254) · 5a5b0fc3

Zhaoheng Ni authored Apr 10, 2023

Summary:
- Add citations of [`TorchAudio-Squim`](https://arxiv.org/abs/2304.01448) publication.
- Update descriptions in the `SQUIM_OBJECTIVE` and `SQUIM_SUBJECTIVE` pipelines.

Pull Request resolved: https://github.com/pytorch/audio/pull/3254

Reviewed By: hwangjeff

Differential Revision: D44802015

Pulled By: nateanl

fbshipit-source-id: ca08298ec1eafefdd671ff2e010ef18f7372f9f8

5a5b0fc3