Commits · 3e897ca79d3c8081a575fb1dd31ccc57d23a77d7 · OpenDAS / Torchaudio

05 May, 2023 1 commit

Fix MKL issue on Intel mac build (#3307) · 3e897ca7

moto authored May 05, 2023

Summary:
* Remove MKL and NumPy from Conda build env
* Remove the `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel Mac.

TorchAudio does not use BLAS libraries directly, so all mentions of MKL should be removed from the codebase.
However, removing them was causing an issue on Intel Mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the linker on macOS keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but in the meantime we can modify the target and remove it.

We also don't need NumPy at build time or run time, so it is removed as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/3307

Reviewed By: atalman

Differential Revision: D45606944

Pulled By: mthrok

fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd


04 May, 2023 3 commits

Add older mkl build constraint only (#3302) · 1e48af06

atalman authored May 04, 2023

Summary:
Similar to what we used to have here:
https://github.com/pytorch/test-infra/pull/3896/files

Pull Request resolved: https://github.com/pytorch/audio/pull/3302

Reviewed By: nateanl

Differential Revision: D45574845

Pulled By: atalman

fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb


Add mkl dependency to torchaudio MacOS x86 builds (#3300) · b5795943

atalman authored May 04, 2023

Summary:
Add mkl dependency to torchaudio MacOS x86 builds

Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137

Pull Request resolved: https://github.com/pytorch/audio/pull/3300

Reviewed By: jeanschmidt, mthrok

Differential Revision: D45566352

Pulled By: atalman

fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b


Extend mask_along_axis{,_iid} (#3289) · 74bd971a

Xiaohui Zhang authored May 04, 2023

Summary:
(1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because the above hard-coding means mask_along_axis_iid only supports 4D tensors.
- For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, because the above hard-coding means mask_along_axis only supports 3D tensors.
- It's not straightforward to apply multiple time/frequency masks by the current design.

To solve these issues, here we
- Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.

The introduction of SpecAugment transform will be done in another PR.
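
As a rough illustration of masking one of the last two dimensions of a 2D spectrogram, here is a minimal pure-Python sketch. It is not the torchaudio implementation: the helper name `mask_band`, the nested-list representation, and the fixed `start`/`width` arguments are assumptions made for this example (the real functions operate on tensors and sample the mask position randomly).

```python
def mask_band(spec, start, width, mask_value=0.0, axis=1):
    """Return a copy of a 2D spectrogram with one band replaced by mask_value.

    spec is a list of rows (frequency x time). axis=0 masks rows
    [start, start + width), axis=1 masks columns in the same range.
    Hypothetical helper for illustration only.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (frequency) or 1 (time)")
    masked = [list(row) for row in spec]  # copy so the input is untouched
    for i, row in enumerate(masked):
        for j in range(len(row)):
            index = i if axis == 0 else j
            if start <= index < start + width:
                row[j] = mask_value
    return masked


spec = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Zero masking along the time axis.
time_masked = mask_band(spec, start=1, width=1, mask_value=0.0, axis=1)
# Masking along the frequency axis with the mean value instead of zero.
mean_value = sum(sum(row) for row in spec) / sum(len(row) for row in spec)
freq_masked = mask_band(spec, start=0, width=1, mask_value=mean_value, axis=0)
```

Passing the mean as `mask_value` corresponds to the "masking by mean value" case listed among the problems above.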

Pull Request resolved: https://github.com/pytorch/audio/pull/3289

Reviewed By: hwangjeff

Differential Revision: D45460357

Pulled By: xiaohui-zhang

fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3


03 May, 2023 4 commits

Fix lint and format PR label message (#3299) · c51f20f9

moto authored May 03, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3299

Reviewed By: xiaohui-zhang

Differential Revision: D45530945

Pulled By: mthrok

fbshipit-source-id: 3443e4de693898534687b26ee1a9376ff86651f9


[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 76f91135

generatedunixname89002005367269 authored May 03, 2023

Reviewed By: adamjernst

Differential Revision: D45522319

fbshipit-source-id: d73a137c8738a215cc711ad39461f5b2f9ba76da


Remove build doc job from CCI (#3293) · bc9451e6

moto authored May 03, 2023

Summary:
https://github.com/pytorch/audio/pull/3292 migrates the doc deployment to GHA.

Pull Request resolved: https://github.com/pytorch/audio/pull/3293

Reviewed By: xiaohui-zhang

Differential Revision: D45527256

Pulled By: mthrok

fbshipit-source-id: 18eb2580243b6b842147caaac10b3d28aa3d6dd0


Fix doc deployment condition (#3294) · 171a65d3

moto authored May 03, 2023

Summary:
Follow-up of https://github.com/pytorch/audio/issues/3292

Doc deployment is gated by branch_name == nightly, but the nightly branch fires both push and PR events, so there would be two deployment jobs.

This commit restricts deployment to the push event.
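
The gating condition can be pictured with a small Python predicate. This is only a sketch of the logic; the actual condition lives in the CI workflow configuration, and the function and argument names here are hypothetical.

```python
def should_deploy_docs(event_name: str, branch_name: str) -> bool:
    """Deploy only once: for push events on the nightly branch.

    Hypothetical helper mirroring the condition described above.
    """
    return event_name == "push" and branch_name == "nightly"


# Before the fix, only the branch name was checked, so both of these
# events on the nightly branch would have triggered a deployment.
push_result = should_deploy_docs("push", "nightly")
pr_result = should_deploy_docs("pull_request", "nightly")
```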

Pull Request resolved: https://github.com/pytorch/audio/pull/3294

Reviewed By: hwangjeff

Differential Revision: D45501983

Pulled By: mthrok

fbshipit-source-id: 8eb66b463800f6a30affafb27f5f2448a561cfe1


02 May, 2023 3 commits

[Nova] Add windows conda workflows (#3288) · d8a095ef

atalman authored May 02, 2023

Summary:
[Nova] Add windows conda workflows
Same as: https://github.com/pytorch/vision/pull/7547

Pull Request resolved: https://github.com/pytorch/audio/pull/3288

Reviewed By: osalpekar

Differential Revision: D45456203

Pulled By: atalman

fbshipit-source-id: 067fd3b9abaeb9b7b0cd45c05b7c72982dfbfe0f


Deploy documentation from GHA (#3292) · 79d6795d

moto authored May 02, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3292

Reviewed By: nateanl

Differential Revision: D45492729

Pulled By: mthrok

fbshipit-source-id: 11578166854c01deb50a6011550a91b87b426385


adjust reference PR labels and add labeling guidance (#3162) · 37f2b4f0

Xiaohui Zhang authored May 01, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3162

Reviewed By: mthrok

Differential Revision: D43964995

Pulled By: xiaohui-zhang

fbshipit-source-id: bba8fffe25f2f39f558f080fef319b1df4c6e440


01 May, 2023 2 commits

Adding win wheels builds (#3287) · 27471a02

atalman authored May 01, 2023

Summary:
Adding win wheels builds
Same as : https://github.com/pytorch/vision/pull/7540

Pull Request resolved: https://github.com/pytorch/audio/pull/3287

Reviewed By: osalpekar

Differential Revision: D45452770

Pulled By: atalman

fbshipit-source-id: e70ad3a8f456e805b46da3d1752c42208dadb8da


Add CUDA 12.1 builds (#3284) · 795bdc2e

pbialecki authored May 01, 2023

Summary:
CC atalman malfet

Pull Request resolved: https://github.com/pytorch/audio/pull/3284

Reviewed By: mthrok

Differential Revision: D45444670

Pulled By: atalman

fbshipit-source-id: d0cf8696a99000c2b9a7e41ceeb781f5a54daeda


29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903


28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA based ctc prefix beam search decoder.

Several benchmark results on a V100 are attached below:

| Decoder type | Model | Dataset | Decoding time (s) | Beam size | Batch size | Model unit | Subsampling factor | Vocab size |
|---|---|---|---|---|---|---|---|---|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation along the batch and vocab axes and loops sequentially over the frame axis, so compared with CPU implementations it favors shorter sequences and larger vocabularies.
2. WER is the same as with CPU implementations. However, decoding with an LM is not yet supported.
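
For readers unfamiliar with the algorithm, the following is a minimal single-utterance CPU sketch of CTC prefix beam search in plain Python. It is not the CUDA decoder added in this PR; the function names and data layout are assumptions for illustration. It does show the loop structure described in note 1: the outer loop over frames is inherently sequential, while the work over beams and vocabulary entries inside each frame is independent and is the part a GPU implementation can parallelize.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf inputs."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_prefix_beam_search(log_probs, beam_size, blank=0):
    """log_probs: list of frames, each a list of per-token log-probabilities."""
    # prefix -> (log prob of paths ending in blank, ending in a non-blank token)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential loop over the frame axis
        candidates = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_blank, p_token) in beams.items():
            for token, lp in enumerate(frame):  # independent work per vocab entry
                if token == blank:
                    b, t = candidates[prefix]
                    candidates[prefix] = (logsumexp(b, p_blank + lp, p_token + lp), t)
                elif prefix and token == prefix[-1]:
                    # Repeated token with no blank in between: prefix is unchanged.
                    b, t = candidates[prefix]
                    candidates[prefix] = (b, logsumexp(t, p_token + lp))
                    # Repeated token after a blank: prefix grows by one token.
                    grown = prefix + (token,)
                    b, t = candidates[grown]
                    candidates[grown] = (b, logsumexp(t, p_blank + lp))
                else:
                    grown = prefix + (token,)
                    b, t = candidates[grown]
                    candidates[grown] = (b, logsumexp(t, p_blank + lp, p_token + lp))
        ranked = sorted(candidates.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])  # keep only the best prefixes
    best_prefix, best_scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*best_scores)


# Toy input: 3 frames over a vocabulary of {0: blank, 1, 2}.
frames = [[math.log(p) for p in row]
          for row in ([0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8])]
tokens, score = ctc_prefix_beam_search(frames, beam_size=4)
```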

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155


25 Apr, 2023 1 commit

Introduce StreamWriterCustomIO (#3277) · 151ac4d8

Jeff Hwang authored Apr 25, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3277

Adds `StreamWriterCustomIO` to support encoding and writing media to arbitrary destinations.

Reviewed By: mthrok

Differential Revision: D44904807

fbshipit-source-id: 23a47531973a7dce0638feb825d38c81d46dc02f


19 Apr, 2023 2 commits

Update url for collect_env.py (#3271) · 5472cdae

Zhaoheng Ni authored Apr 19, 2023

Summary:
The `master` branch of PyTorch has been updated to `main` recently. The url of `collect_env.py` in the new issue page should be updated as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/3271

Reviewed By: xiaohui-zhang

Differential Revision: D45087038

Pulled By: nateanl

fbshipit-source-id: 167262ae6ed179baabcf55064fc5f0f0ac3b0be9


Amend StreamReader docs to reflect deprecation of tensor decoding (#3272) · 70350a69

hwangjeff authored Apr 18, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3272

Reviewed By: mthrok

Differential Revision: D45095440

Pulled By: hwangjeff

fbshipit-source-id: 135eb0f5d9047bf172563a9a05a9d2e323796d4d


18 Apr, 2023 1 commit

Add multi-channel DNN beamforming training recipe (#3036) · 94f5027e

nateanl authored Apr 18, 2023

Summary:
The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement.

Pull Request resolved: https://github.com/pytorch/audio/pull/3036

Reviewed By: hwangjeff

Differential Revision: D45061841

Pulled By: nateanl

fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4


12 Apr, 2023 3 commits

Merge key_padding_mask into attn_mask_rel_pos in WavLM (#3265) · d5b2996b

Zhaoheng Ni authored Apr 12, 2023

Summary:
When `key_padding_mask` is not `None`, it needs to be combined with `attn_mask_rel_pos` as one mask for `scaled_dot_product_attention` function.
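
One common way to combine a boolean padding mask with an additive attention bias is to overwrite the bias with negative infinity at padded key positions, so those keys receive zero weight after the softmax. The sketch below illustrates that idea for a single attention head using nested lists; it is an assumption about the general technique, not the code from this PR, and `merge_masks` is a hypothetical helper.

```python
NEG_INF = float("-inf")


def merge_masks(position_bias, key_padding_mask):
    """Fold a key padding mask into an additive attention bias.

    position_bias: [query][key] floats (relative position bias).
    key_padding_mask: [key] booleans, True where the key is padding.
    Returns a new [query][key] bias with -inf at padded keys.
    """
    return [
        [NEG_INF if padded else bias for bias, padded in zip(row, key_padding_mask)]
        for row in position_bias
    ]


bias = [[0.5, 0.2, 0.1], [0.0, 0.3, 0.4]]
padding = [False, False, True]  # the last key is padding
merged = merge_masks(bias, padding)
```

The merged result can then be passed as the single mask argument of an attention function that accepts an additive float mask.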

Pull Request resolved: https://github.com/pytorch/audio/pull/3265

Reviewed By: hwangjeff

Differential Revision: D44901093

Pulled By: nateanl

fbshipit-source-id: 73ca7af48faf7f4eb36b35b603187a11e5582c70


Allow overwrite temp data in ffmpeg test (#3263) · cc7b8bd4

moto authored Apr 11, 2023

Summary:
When `TORCHAUDIO_TEST_TEMP_DIR` is set, all the unit test temporary data are stored in the given directory. Running unit tests multiple times reuses the directory, and the temporary files from previous test runs are found there.

The FFmpeg save test writes reference data to the temporary directory, but it is not given the overwrite flag ("-y"), so it fails in such cases.

This commit fixes that.
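
As an illustration of the overwrite flag, the following sketch builds an `ffmpeg` command line with `-y` so that an existing output file is replaced without prompting. The helper name and its arguments are hypothetical and do not reproduce the test utility changed in this commit.

```python
def build_ffmpeg_command(src, dst, overwrite=True):
    """Build an ffmpeg argument list that converts src into dst.

    With overwrite=True the "-y" flag is added, so an output file left
    over from a previous run is overwritten instead of causing a failure.
    """
    command = ["ffmpeg"]
    if overwrite:
        command.append("-y")
    command += ["-i", src, dst]
    return command


command = build_ffmpeg_command("input.wav", "reference.wav")
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where `ffmpeg` is installed.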

Pull Request resolved: https://github.com/pytorch/audio/pull/3263

Reviewed By: hwangjeff

Differential Revision: D44859003

Pulled By: mthrok

fbshipit-source-id: 2db92fbdec1c015455f3779e10a18f7f1146166b


Specify backend directly in test (#3262) · 563e409c

moto authored Apr 11, 2023

Summary:
Preparation to land https://github.com/pytorch/audio/pull/3241

This commit applies a patch to make the sox_io TorchScript test pass when the dispatcher is enabled.

Pull Request resolved: https://github.com/pytorch/audio/pull/3262

Reviewed By: hwangjeff

Differential Revision: D44897513

Pulled By: mthrok

fbshipit-source-id: 9b65f705cd02324328a2bc1c414aa4b7ca0fed32


11 Apr, 2023 2 commits

Fix nightly doc build (CircleCI) (#3258) · 4b0254ba

moto authored Apr 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3258

Reviewed By: nateanl

Differential Revision: D44859397

Pulled By: mthrok

fbshipit-source-id: 361ac6a8c7092cc753f77d7745ec178760e8b9c3


Update windows build doc (#3257) · 623e33d9

moto authored Apr 11, 2023

Summary:
GCC should not be used when building FFmpeg for torchaudio, as torchaudio uses MSVC (cl.exe)

Pull Request resolved: https://github.com/pytorch/audio/pull/3257

Reviewed By: nateanl

Differential Revision: D44835169

Pulled By: mthrok

fbshipit-source-id: 038c70caae58cec47dd2d6d08b8244c193104eda


10 Apr, 2023 4 commits

Use scaled_dot_product_attention in WavLM attention (#3252) · adb03385

Zhaoheng Ni authored Apr 10, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3219.

`torch.nn.MultiheadAttention` will throw an error if `torch.no_grad()` and mask are both given. The pull request fixes it by replacing the forward method with `torch.nn.functional.scaled_dot_product_attention`.
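
For reference, scaled dot-product attention computes `softmax(Q K^T / sqrt(d) + mask) V`. The following pure-Python sketch spells that out for a single head with an optional additive mask. It is a didactic stand-in and not the PyTorch kernel; the nested-list layout and argument names are assumptions for this example.

```python
import math


def scaled_dot_product_attention(query, key, value, attn_bias=None):
    """Single-head attention over nested lists.

    query: [num_queries][dim], key: [num_keys][dim], value: [num_keys][out_dim].
    attn_bias: optional [num_queries][num_keys] additive mask (-inf hides a key).
    """
    dim = len(query[0])
    output = []
    for i, q in enumerate(query):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in key]
        if attn_bias is not None:
            scores = [s + b for s, b in zip(scores, attn_bias[i])]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        output.append([
            sum(w / total * v[c] for w, v in zip(weights, value))
            for c in range(len(value[0]))
        ])
    return output


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
# Hide the second key with an additive -inf mask entry.
masked_out = scaled_dot_product_attention(q, k, v, attn_bias=[[0.0, float("-inf")]])
plain_out = scaled_dot_product_attention(q, k, v)
```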

Pull Request resolved: https://github.com/pytorch/audio/pull/3252

Reviewed By: mthrok

Differential Revision: D44798634

Pulled By: nateanl

fbshipit-source-id: abfa7fb84b7bd71848a92ab26da5a5f0f095c665


Use scaled_dot_product_attention in Wav2vec2/HuBERT's SelfAttention (#3253) · 94cc4bd9

Zhaoheng Ni authored Apr 10, 2023

Summary:
Replace the attention computation with `torch.nn.functional.scaled_dot_product_attention` to improve running efficiency.

Pull Request resolved: https://github.com/pytorch/audio/pull/3253

Reviewed By: mthrok

Differential Revision: D44800353

Pulled By: nateanl

fbshipit-source-id: 41550d868c809099aadbe812b0ebe2c38121efb8


Update description of Squim pipelines (#3254) · 5a5b0fc3

Zhaoheng Ni authored Apr 10, 2023

Summary:
- Add citations of [`TorchAudio-Squim`](https://arxiv.org/abs/2304.01448) publication.
- Update descriptions in the `SQUIM_OBJECTIVE` and `SQUIM_SUBJECTIVE` pipelines.

Pull Request resolved: https://github.com/pytorch/audio/pull/3254

Reviewed By: hwangjeff

Differential Revision: D44802015

Pulled By: nateanl

fbshipit-source-id: ca08298ec1eafefdd671ff2e010ef18f7372f9f8


Get rid of (pseudo) hungarian notation (#3255) · a6602715

Moto Hira authored Apr 10, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3255

Prefixing what is always pointer with `p` does not improve readability...

Reviewed By: hwangjeff

Differential Revision: D44799531

fbshipit-source-id: bc2ce4e534009e2cb577719953207ddb82cf2d3d


07 Apr, 2023 5 commits

Simplify FilterGraph interface (#3251) · 631bcc9f

Moto Hira authored Apr 07, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3251

Removes the unnecessary media type check in FilterGraph.
Allows defining filters that have different media types for input and output.

Reviewed By: nateanl

Differential Revision: D44792340

fbshipit-source-id: e00497e0d30b5b3c3aacc66dd9b8c401757af288


Tweak managed pointer interface (#3249) · ea78478e

Moto Hira authored Apr 07, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3249

- Make the `ptr` member private so that it is safer and subclasses cannot modify it
- Remove unused `reset` method
- Do not default construct the managed object
  - Introduce a helper function for default allocation
    (for AVFrame and AVPacket, as they are allocated in both reader and writer)
  - For the others, the allocation logic is moved to where it is used.
- Remove unused `pHWBufferRef` attribute from `StreamWriter`.

Reviewed By: hwangjeff

Differential Revision: D44775297

fbshipit-source-id: ff6db528152cd54c1ae398191110c30b9c1e238c


Remove temp channel for python 3.11, simplify logic around cuda (#3250) · f7c8a7d3

atalman authored Apr 07, 2023

Summary:
Remove temp channel for python 3.11, simplify logic around cuda

Pull Request resolved: https://github.com/pytorch/audio/pull/3250

Reviewed By: mthrok

Differential Revision: D44788219

Pulled By: atalman

fbshipit-source-id: 421ff9e0bf1818b41e395708cc4589d4a9c865bd


Introduce packet passthrough feature to streaming api (#3220) · 000878e0

Jeff Hwang authored Apr 07, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3220

Introduces methods to `StreamReader` and `StreamWriter` that allow for reading and writing `AVPacket` instances rather than tensors. Useful for efficiently remuxing data pulled as is from source.

Reviewed By: mthrok

Differential Revision: D44271536

fbshipit-source-id: 9b9d743c0119a5eb564fa628fd6a67806d120985


Fix path normalization for StreamWriter-based save operation (#3248) · 9da92cdb

moto authored Apr 07, 2023

Summary:
Follow-up of https://github.com/pytorch/audio/issues/3243. The save compat module had different semantics than info and load, which requires a different way of performing path normalization.

Pull Request resolved: https://github.com/pytorch/audio/pull/3248

Reviewed By: hwangjeff

Differential Revision: D44774997

Pulled By: mthrok

fbshipit-source-id: 4b967ae3ca6b45850d455b8e95aaa31762c5457e


06 Apr, 2023 2 commits

Remove custom flashlight import (#3246) · ae614ed3

moto authored Apr 06, 2023

Summary:
In https://github.com/pytorch/audio/pull/3232, the CTC decoder is excluded from binary distribution.
To use CTCDecoder, users need to install flashlight-text.

Currently, if flashlight-text is not available, torchaudio still attempts to import the custom bundle.
This commit cleans up this behavior by delaying the error until one of the components is actually used
and by providing a better message.
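
A generic way to delay an import error until first use is a small lazy proxy, sketched below with only the standard library. This illustrates the pattern and is not torchaudio's actual code; the class name and the error message are made up for the example.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access.

    Hypothetical helper: creating the proxy never fails, and a missing
    dependency is reported with a helpful message only when it is used.
    """

    def __init__(self, name, hint):
        self._name = name
        self._hint = hint
        self._module = None

    def __getattr__(self, attribute):
        # Only called for attributes not found on the proxy itself.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError as error:
                raise RuntimeError(self._hint) from error
        return getattr(self._module, attribute)


# Creating the proxy succeeds even though the module does not exist.
missing = LazyModule(
    "a_module_that_is_not_installed_xyz",
    "This feature requires an optional dependency; please install it first.",
)
# A module that is available behaves like the real thing.
json_proxy = LazyModule("json", "json should always be available")
encoded = json_proxy.dumps([1, 2])
```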

Pull Request resolved: https://github.com/pytorch/audio/pull/3246

Test Plan:
Binary smoke tests import torchaudio without installing flashlight.
Unit test CI jobs run the CTC decoder with flashlight installed.

Reviewed By: jacobkahn

Differential Revision: D44748413

Pulled By: mthrok

fbshipit-source-id: 21d2cbd9961ed88405a739cc682071066712f5e4


Add frame writing API to StreamWriter (#3244) · f4d94cab

Jeff Hwang authored Apr 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3244

Adds methods to `StreamWriter` that allow for passing in `AVFrame` instances rather than tensors.

Reviewed By: mthrok

Differential Revision: D44589256

fbshipit-source-id: f100e0d349708482b873a9a4bae1eaf5eb65301a


05 Apr, 2023 2 commits

Fix path-like object support in FFmpeg dispatcher (#3243) · d69e8857

moto authored Apr 05, 2023

Summary:
In dispatcher mode, the FFmpeg backend does not handle path-like objects, and the C++ implementation raises an error.

This commit fixes it by normalizing path-like objects to strings.
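
In Python, path-like objects such as `pathlib.Path` can be converted to plain strings with `os.fspath`. The sketch below shows that kind of normalization; the helper name `normalize_uri` is hypothetical and the code is not taken from this commit.

```python
import os
from pathlib import Path


def normalize_uri(uri):
    """Convert str or os.PathLike inputs to their filesystem path form.

    Anything else (for example an open file object) is returned unchanged
    in this sketch.
    """
    if isinstance(uri, (str, os.PathLike)):
        return os.fspath(uri)
    return uri


from_path = normalize_uri(Path("data") / "sample.wav")
from_str = normalize_uri("sample.wav")
```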

Pull Request resolved: https://github.com/pytorch/audio/pull/3243

Reviewed By: nateanl

Differential Revision: D44719280

Pulled By: mthrok

fbshipit-source-id: 9dae459e2a5fb4992b4ef53fe4829fe8c35b2edd


Remove source for flashlight-text bundle (#3236) · 5053aa7f

moto authored Apr 05, 2023

Summary:
Following https://github.com/pytorch/audio/pull/3232, static build of flashlight-text has been disabled and removed from nightly build.

This commit removes the related source/build from torchaudio code base.

Pull Request resolved: https://github.com/pytorch/audio/pull/3236

Reviewed By: jacobkahn

Differential Revision: D44712539

Pulled By: mthrok

fbshipit-source-id: a201c89b5046f224526309cd4e17a5105e58a949


04 Apr, 2023 3 commits

[BC-breaking] Make I/O optional arguments kw-only (#3227) · ab40a3a3

moto authored Apr 04, 2023

Summary:
Recently, we added a bunch of options to make StreamReader/Writer flexible. As a result, their methods have many arguments, and some of them have semantic grouping.

For example, the arguments of ``StreamWriter.add_video_stream`` are roughly grouped as follows:

- Information about input media format
   `frame_rate`, `width`, `height`, `format`
- Information about encoder
   `encoder`, `encoder_option`
- Information about codec configuration
   `codec_config`
- Information about encoded media format
   `encoder_format`, `encoder_frame_rate`, `encoder_width`, `encoder_height`
- Information about additional processing
   `filter_desc`
- Hardware acceleration
   `hw_accel`

We do not know what arguments will be added in the future, but when we do add them,
we want to keep them roughly grouped by inserting the new argument
somewhere in the middle without breaking backward compatibility.

This commit makes most of them keyword-only arguments, so that we can
rearrange them without breaking backward compatibility.
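
In Python, parameters listed after a bare `*` in a signature are keyword-only. The sketch below uses a made-up function with a subset of the argument names listed above to show the effect; the signature, defaults, and return value are assumptions and do not reproduce the real `StreamWriter.add_video_stream`.

```python
def add_video_stream_sketch(
    frame_rate,
    width,
    height,
    format="rgb24",
    *,  # everything after this marker must be passed by keyword
    encoder=None,
    encoder_option=None,
    encoder_format=None,
    filter_desc=None,
    hw_accel=None,
):
    """Hypothetical stand-in that just records its configuration."""
    return {
        "frame_rate": frame_rate,
        "width": width,
        "height": height,
        "format": format,
        "encoder": encoder,
        "encoder_option": encoder_option,
        "encoder_format": encoder_format,
        "filter_desc": filter_desc,
        "hw_accel": hw_accel,
    }


# Keyword usage keeps working no matter how the optional arguments are ordered.
config = add_video_stream_sketch(30, 640, 480, encoder="libx264")

# Passing an optional argument positionally is rejected.
try:
    add_video_stream_sketch(30, 640, 480, "rgb24", "libx264")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Because callers can no longer rely on the position of the optional arguments, new ones can be inserted next to related arguments without breaking existing code.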

Pull Request resolved: https://github.com/pytorch/audio/pull/3227

Reviewed By: hwangjeff

Differential Revision: D44681620

Pulled By: mthrok

fbshipit-source-id: b55f6168f4c2f3d0f59731b9bb0db4ae54e5a90f


Disable CTC decoder bundle by default (#3232) · 3844a2bd

moto authored Apr 04, 2023

Summary:
As we migrate to use upstream flashlight-text and KenLM, this PR disables building the CTC decoder by default.
This stops shipping the flashlight-text and KenLM bundle in the torchaudio binary.

Ref: https://github.com/pytorch/audio/issues/3088

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/3232

Reviewed By: hwangjeff

Differential Revision: D44650872

Pulled By: mthrok

fbshipit-source-id: 2415623abaf3cafa181135db5112d3c711137cd7


Swap in assertions for decoder setup checks (#3235) · ea212c6e

hwangjeff authored Apr 03, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3235

Reviewed By: mthrok

Differential Revision: D44653654

Pulled By: hwangjeff

fbshipit-source-id: f28a6068e826581d76ed4a216adb6019b6486e53
