Commits · c0702338f35f42d12e371b8f3bcb914b77470dd0 · OpenDAS / Torchaudio

22 May, 2023 3 commits

Update forced_align document (#3357) · c0702338

Zhaoheng Ni authored May 22, 2023

Summary:
- Fix latex formula rendering issue
- Add `devices` and `properties` tags
- Fix grammar

Pull Request resolved: https://github.com/pytorch/audio/pull/3357

Reviewed By: mthrok

Differential Revision: D46068633

Pulled By: nateanl

fbshipit-source-id: 80cb84508396fbcaf81c068228d46a24bb63b975

c0702338

Fix CPU kernel of forced_align function (#3354) · 8a893fb3

Zhaoheng Ni authored May 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3354

When `start == 0`, the first item, rather than the S-th item, of row `t` in `backPtr_a` should be 0.

Reviewed By: xiaohui-zhang

Differential Revision: D46059971

fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8

8a893fb3

Add doc for forced_align (#3355) · 011f7f3d

Zhaoheng Ni authored May 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3355

Reviewed By: xiaohui-zhang

Differential Revision: D46060254

Pulled By: nateanl

fbshipit-source-id: c2e44f994739755daf049fe350dd24a987a9cc29

011f7f3d

21 May, 2023 2 commits

Revert D45960556: add CTC forced alignment API tutorial to torchaudio · f9b4f74f

Moto Hira authored May 20, 2023

Differential Revision:
D45960556

Original commit changeset: 93f2271f7130

Original Phabricator Diff: D45960556

fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a

f9b4f74f

add CTC forced alignment API tutorial to torchaudio (#3351) · 93adc3e4

Xiaohui Zhang authored May 20, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3351

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: vineelpratap, nateanl

Differential Revision: D45960556

fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa

93adc3e4

20 May, 2023 1 commit

[audio][PR] Add forced_align function to torchaudio (#3348) · e7935cff

Zhaoheng Ni authored May 19, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3348

The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA devices. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame.
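To illustrate what "generating the corresponding labels for each frame" means, here is a minimal pure-Python sketch of CTC Viterbi alignment. It is not the CPU/CUDA kernel added by this PR; the function name, the argument layout (a `T x C` list of log-probabilities) and the default `blank=0` are assumptions made for the example.

```python
import math


def forced_align(log_probs, targets, blank=0):
    """Return the most likely per-frame label path that collapses to `targets`.

    log_probs: list of T frames, each a list of C log-probabilities (assumed layout).
    targets:   list of label indices, none of them equal to `blank`.
    """
    # Interleave blanks around the targets: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for label in targets:
        ext += [label, blank]
    num_states = len(ext)

    # score[s] = best log-probability of a path that is in state s at the current frame
    score = [-math.inf] * num_states
    score[0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[1] = log_probs[0][ext[1]]

    back = []  # back[t - 1][s] = predecessor of state s at frame t
    for frame in log_probs[1:]:
        new_score = [-math.inf] * num_states
        ptr = [0] * num_states
        for s in range(num_states):
            # Allowed moves: stay, advance by one, or skip the blank between two different labels
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best = max(candidates, key=score.__getitem__)
            ptr[s] = best
            new_score[s] = score[best] + frame[ext[s]]
        score = new_score
        back.append(ptr)

    # A valid path ends in the last label or in the trailing blank
    s = num_states - 1
    if num_states > 1 and score[num_states - 2] > score[s]:
        s = num_states - 2
    if score[s] == -math.inf:
        raise ValueError("targets cannot be aligned to this many frames")

    # Follow the back-pointers from the last frame to the first
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return [ext[s] for s in path]


probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
emission = [[math.log(p) for p in row] for row in probs]
print(forced_align(emission, [1, 2]))
```

The output has one label per frame, with blanks included, so it has the same length as the emission sequence.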

Reviewed By: vineelpratap, xiaohui-zhang

Differential Revision: D45867265

fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb

e7935cff

19 May, 2023 1 commit

Build and use GPU-enabled FFmpeg in doc CI (#3045) · 0db5ab25

moto authored May 19, 2023

Summary:
This commit adds a step to build FFmpeg with the GPU decoder in the build_doc job so that we can use the GPU decoder/encoder in the documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/3045

Reviewed By: nateanl

Differential Revision: D45965739

Pulled By: mthrok

fbshipit-source-id: c167eb3ef347860a51efa906068fa2daa556f017

0db5ab25

17 May, 2023 4 commits

Improve the performance of YUV420P frame conversion (#3342) · 72d3fe09

moto authored May 17, 2023

Summary:
This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor.

It changes two things:
1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
2. Get rid of the intermediate UV plane copy.
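As a rough illustration of the first change, the sketch below performs 2x nearest-neighbor upsampling of a chroma plane by copying each sample into a 2x2 block. The actual change is in the C++ conversion code; this pure-Python version, including the function names and the nested-list plane layout, is only an assumption for demonstration.

```python
def upsample_chroma_2x(plane):
    """Nearest-neighbor 2x upsampling: copy each sample into a 2x2 block."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]  # duplicate horizontally
        out.append(wide)
        out.append(list(wide))  # duplicate vertically
    return out


def yuv420p_to_chw(y, u, v):
    """Assemble a [3][H][W] image from a full-size Y plane and half-size U/V planes."""
    return [[list(row) for row in y], upsample_chroma_2x(u), upsample_chroma_2x(v)]


y = [[1, 2, 3, 4], [5, 6, 7, 8]]
u = [[10, 20]]
v = [[30, 40]]
image = yuv420p_to_chw(y, u, v)
print(image[1])
```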

The following compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS with a resolution of 320x240. The measured times are sorted by value.

Some observations
* `torch::nn::functional::interpolate` with `torch::kNearest` option is not as fast as copying data manually.
* switching from `interpolate` to manual data copy reduces the variance.

run | main | 1 | 1+2 | improvement (from main to 1+2)
-- | -- | -- | -- | --
1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%

(times in seconds)

Increasing the resolution, the improvement is smaller but is consistent.

run | main | 1+2 | improvement
-- | -- | -- | --
1 | 4.032393 | 3.991784667 | 1.01%
2 | 4.052248084 | 3.992672208 | 1.47%
3 | 4.07705575 | 4.000541666 | 1.88%
4 | 4.143954792 | 4.020671584 | 2.98%
5 | 4.170711959 | 4.025753125 | 3.48%
6 | 4.240229292 | 4.045504875 | 4.59%
7 | 4.267384042 | 4.045588125 | 5.20%
8 | 4.277025958 | 4.061980083 | 5.03%
9 | 4.312192042 | 4.163251959 | 3.45%
10 | 4.406109875 | 4.312560334 | 2.12%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader

def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=yuv420p")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)

for _ in range(10):
    test()
```
</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
[conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3342

Reviewed By: xiaohui-zhang

Differential Revision: D45947716

Pulled By: mthrok

fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68

72d3fe09

Improve the performance of NV12 frame conversion (#3344) · c11661e0

moto authored May 17, 2023

Summary:
Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion.

It changes two things:

- Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
- Get rid of the intermediate UV plane copy.
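For NV12, the chroma samples are stored in a single half-resolution plane with U and V interleaved. The sketch below shows one way to write the upsampled U and V values directly into full-size planes without first building separate half-size copies. It is a pure-Python illustration under assumed names and an assumed nested-list layout, not the C++ code from this commit.

```python
def nv12_uv_to_planes(uv, height, width):
    """Expand an interleaved half-resolution UV plane into full-size U and V planes.

    uv: list of height // 2 rows, each [U0, V0, U1, V1, ...] with `width` entries.
    """
    u = [[0] * width for _ in range(height)]
    v = [[0] * width for _ in range(height)]
    for i, row in enumerate(uv):
        for j in range(width // 2):
            cu, cv = row[2 * j], row[2 * j + 1]
            # Copy one chroma sample into the 2x2 block of luma positions it covers
            for di in (0, 1):
                for dj in (0, 1):
                    u[2 * i + di][2 * j + dj] = cu
                    v[2 * i + di][2 * j + dj] = cv
    return u, v


u_plane, v_plane = nv12_uv_to_planes([[10, 20, 30, 40]], height=2, width=4)
print(u_plane, v_plane)
```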

with 320x240

run | main | pr | improvement
-- | -- | -- | --
1 | 0.600671417 | 0.464993125 | 22.59%
2 | 0.638846084 | 0.456763542 | 28.50%
3 | 0.64158175 | 0.458295333 | 28.57%
4 | 0.649868584 | 0.455450583 | 29.92%
5 | 0.612171333 | 0.462435625 | 24.46%
6 | 0.6128095 | 0.456716166 | 25.47%
7 | 0.632084583 | 0.463357083 | 26.69%
8 | 0.610733083 | 0.46148625 | 24.44%
9 | 0.613825834 | 0.4559555 | 25.72%
10 | 0.653857458 | 0.455375375 | 30.36%

(times in seconds)

with 1080x720 video

run | main | pr | improvement
-- | -- | -- | --
1 | 4.984154333 | 4.21090375 | 15.51%
2 | 4.988090625 | 4.239649375 | 15.00%
3 | 4.988896375 | 4.227277458 | 15.27%
4 | 4.998186584 | 4.161077042 | 16.75%
5 | 5.06180425 | 4.191672584 | 17.19%
6 | 5.108769667 | 4.198468458 | 17.82%
7 | 5.151363625 | 4.181942167 | 18.82%
8 | 5.199527875 | 4.239319084 | 18.47%
9 | 5.224903708 | 4.194901959 | 19.71%
10 | 5.333422583 | 4.320925792 | 18.98%

(times in seconds)

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader

def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=nv12")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)

for _ in range(10):
    test()
```
</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
[conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3344

Reviewed By: xiaohui-zhang

Differential Revision: D45948511

Pulled By: mthrok

fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5

c11661e0

Fix for breadcrumbs displaying "Old version (stable)" on Nightly build (#3333) · 3ffd76c8

Carl Parker authored May 16, 2023

Summary:
Previously, `breadcrumbs.html` identified a nightly build version by the prefix "Nightly" which would normally be prepended to the version in `conf.py`. However, the version string is coming through without the "Nightly" prefix, so this change causes `breadcrumbs.html` to key on the substring "dev" instead.

The reason we aren't getting "Nightly" is apparently because the environment variable BUILD_VERSION is available, so `conf.py` is using the value of that env var instead of the version string imported from the `torchaudio` module itself, which actually appears to be incorrect; see below.

If I install torchaudio using

conda install torchaudio -c pytorch-nightly

then `torchaudio.__version__` returns the incorrect version string:

2.0.0.dev20230309
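The difference between the two checks can be sketched in Python. The real logic lives in the `breadcrumbs.html` template; the function names below are hypothetical and only mirror the description above.

```python
def is_nightly_old(version: str) -> bool:
    """Previous behavior: rely on a "Nightly" prefix in the version string."""
    return version.startswith("Nightly")


def is_nightly_new(version: str) -> bool:
    """New behavior: key on the substring "dev" instead."""
    return "dev" in version


version = "2.0.0.dev20230309"
print(is_nightly_old(version), is_nightly_new(version))
```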

Pull Request resolved: https://github.com/pytorch/audio/pull/3333

Reviewed By: mthrok

Differential Revision: D45926466

Pulled By: carljparker

fbshipit-source-id: d5516f2d9f1716c2400d3e9b285bd5d32b4b3a77

3ffd76c8

Add 420p10le CPU support to StreamReader (#3332) · c12f4734

moto authored May 16, 2023

Summary:
This commit adds support for decoding the YUV420P10LE format.

The image tensor returned for this format has:
- NCHW format (C == 3)
- int16 type
- value range [0, 2^10).

Note that the value range is different from what the "hevc_cuvid" decoder
returns. The "hevc_cuvid" decoder uses the full range of int16 (internally,
it's uint16) to express the color (with some intervals), but the values
returned by the CPU "hevc" decoder are within [0, 2^10).
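Client code that consumes these frames may want to rescale the 10-bit values. The helpers below are a hypothetical sketch of two common conversions for values in [0, 2^10); they are not part of TorchAudio.

```python
MAX_10BIT = (1 << 10) - 1  # 1023, the largest value in [0, 2^10)


def to_unit_float(value: int) -> float:
    """Map a 10-bit sample to a float in [0.0, 1.0]."""
    return value / MAX_10BIT


def to_8bit(value: int) -> int:
    """Reduce a 10-bit sample to 8 bits by dropping the two least significant bits."""
    return value >> 2


print(to_unit_float(512), to_8bit(512))
```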

Address https://github.com/pytorch/audio/issues/3331

Pull Request resolved: https://github.com/pytorch/audio/pull/3332

Reviewed By: hwangjeff

Differential Revision: D45925097

Pulled By: mthrok

fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508

c12f4734

16 May, 2023 3 commits

Upgrade to FFmpeg5 (#3298) · d38a7854

moto authored May 16, 2023

Summary:
This commit upgrades the version of FFmpeg that the TorchAudio binary distribution is compiled against to 5.0.4.

FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5.
Conda-forge lists 5.1 for all the platforms TorchAudio supports: https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298

Reviewed By: hwangjeff

Differential Revision: D45865599

Pulled By: mthrok

fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b

d38a7854

Remove obsolete third party dependencies of CTC decoder (#3339) · e4c1d70b

moto authored May 16, 2023

Summary:
TorchAudio has migrated the CTC decoder to flashlight-text, and the code related to the CTC decoder was removed in https://github.com/pytorch/audio/issues/3236.

This commit cleans up the residue: it removes the third-party libraries used for the CTC decoder and the mention of the environment variable for the CTC decoder.

Pull Request resolved: https://github.com/pytorch/audio/pull/3339

Reviewed By: nateanl

Differential Revision: D45920878

Pulled By: mthrok

fbshipit-source-id: 8d93e64138697781570e5b0b1c9f86e1a7923a89

e4c1d70b

[Doc] Fix a word in documents (#3334) · 04f67546

Amir Masoud Nourollah authored May 15, 2023

Summary:
Removed a redundant "and".

Pull Request resolved: https://github.com/pytorch/audio/pull/3334

Reviewed By: xiaohui-zhang

Differential Revision: D45864314

Pulled By: mthrok

fbshipit-source-id: ad67bde8fa73eac995fbd0d3809709cc38486884

04f67546

15 May, 2023 1 commit

Switch windows nightly builds to GHA (#3330) · 00247576

atalman authored May 15, 2023

Summary:
Switch windows nightly builds to GHA

Similar to: https://github.com/pytorch/vision/pull/7578

Pull Request resolved: https://github.com/pytorch/audio/pull/3330

Reviewed By: mthrok

Differential Revision: D45871892

Pulled By: atalman

fbshipit-source-id: 817490a2abcaffceec5174c624f9e7d0377bbc4a

00247576

11 May, 2023 3 commits

Clean-up StreamReader/StreamWriter interface (#3328) · d9643f50

Moto Hira authored May 11, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3328

Make the `AVIOContext`-based constructor protected for better encapsulation.
AVFormatContext and optional AVIOContext are managed by StreamReader/Writer, so it's better that they are abstracted away from client code.

Reviewed By: hwangjeff

Differential Revision: D45779629

fbshipit-source-id: 44c31e8af785447cb47aad0c44bf4ecf1aeebeaa

d9643f50

Add doc preview (#3326) · 1c7309d2

moto authored May 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3326

Reviewed By: hwangjeff

Differential Revision: D45760678

Pulled By: mthrok

fbshipit-source-id: 79b5d846c93516ca90c9700279124a9a04470242

1c7309d2

Add 2.0.1 to the version compatibility matrix (#3325) · 608775bf

moto authored May 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3325

Reviewed By: hwangjeff

Differential Revision: D45759434

Pulled By: mthrok

fbshipit-source-id: f3b1127fcf3b23beeab61fb7ff18f1b89b11ddc6

608775bf

10 May, 2023 4 commits

[BC-Breaking] Switch to the backend dispatcher (#3241) · 4463fbdf

moto authored May 10, 2023

Summary:
This commit makes the code use the backend dispatcher by default. Enabling the backend dispatcher gives the FFmpeg-based I/O implementation higher priority (if the corresponding FFmpeg is available), and allows individual function calls to specify the backend.
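The dispatching idea can be sketched as follows: pick the first available backend from a priority list unless the caller names one explicitly. This is a hypothetical illustration, not TorchAudio's implementation; the function name, the backend names other than FFmpeg, and their order are assumptions.

```python
# Hypothetical priority order: FFmpeg first, as described above; the rest is assumed.
PRIORITY = ["ffmpeg", "sox", "soundfile"]


def select_backend(available, backend=None):
    """Return the backend to use for one call.

    available: dict mapping backend name to True/False (assumed availability flags).
    backend:   optional per-call override.
    """
    if backend is not None:
        if not available.get(backend, False):
            raise RuntimeError(f"backend {backend!r} is not available")
        return backend
    for name in PRIORITY:
        if available.get(name, False):
            return name
    raise RuntimeError("no I/O backend is available")


available = {"ffmpeg": True, "sox": True, "soundfile": False}
print(select_backend(available))         # default: highest-priority available backend
print(select_backend(available, "sox"))  # per-call override
```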

See also https://github.com/pytorch/audio/issues/2950

Pull Request resolved: https://github.com/pytorch/audio/pull/3241

Reviewed By: hwangjeff

Differential Revision: D44709068

Pulled By: mthrok

fbshipit-source-id: 43aac3433f78a681df6669e9ac46e8ecf3beb1be

4463fbdf

Add AudioEffector tutorial (#3226) · 2ab49e5b

moto authored May 09, 2023

Summary:
https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3226

Reviewed By: nateanl

Differential Revision: D45402724

Pulled By: mthrok

fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262

2ab49e5b

Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit is preparation for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 is landed to accommodate the change.

Since it is necessary to mention the migration-related changes in the IO tutorial,
I also updated the IO documentation to include the migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133

667c6a9e

[BC-Breaking] Update InverseMelScale solution (#3280) · 5a85a461

Zhaoheng Ni authored May 09, 2023

Summary:
Address https://github.com/pytorch/audio/issues/2643

- Replace `SGD` optimization with `torch.linalg.lstsq`, which is much faster.
- Add autograd test for `InverseMelScale`
- Update other tests

Pull Request resolved: https://github.com/pytorch/audio/pull/3280

Reviewed By: hwangjeff

Differential Revision: D45679988

Pulled By: nateanl

fbshipit-source-id: a42e8bff9dc0f38e47e0482fd8a2aad902eedd59

5a85a461

09 May, 2023 6 commits

Remove NumPy from conda build env (#3315) · 282ed27a

moto authored May 09, 2023

Summary:
NumPy is an optional runtime dependency of TorchAudio, and it is not required at build time.

Pull Request resolved: https://github.com/pytorch/audio/pull/3315

Reviewed By: nateanl

Differential Revision: D45702243

Pulled By: mthrok

fbshipit-source-id: 6ca6598931764c46be6323868e8cce7c8adc5024

282ed27a

Refactor StreamReader/Writer PyBinding (#3296) · 8d7268f1

Moto Hira authored May 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3296

Reviewed By: hwangjeff

Differential Revision: D45503774

fbshipit-source-id: 806c22bd0f54fd0cea43d61ef3dbedd67ffeb012

8d7268f1

Add StreamReaderCustomIO (#3320) · 007cca23

Moto Hira authored May 09, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3320

Add StreamReaderCustomIO, which is analogous to StreamWriterCustomIO and which takes custom read/seek functions to fetch media data.
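To make the "custom read/seek functions" idea concrete, here is a small Python sketch of a data source that exposes such a pair over an in-memory buffer. `StreamReaderCustomIO` itself is a C++ class; the class and function names below are hypothetical and only illustrate the callback contract.

```python
import io


class InMemorySource:
    """Hypothetical media source exposing read/seek callbacks over a bytes buffer."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, size: int) -> bytes:
        # Return up to `size` bytes; an empty result signals end of data
        return self._buffer.read(size)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Move the read position and return the new absolute position
        return self._buffer.seek(offset, whence)


def peek_header(source, size=4):
    """Read the first `size` bytes through the callbacks, then rewind."""
    source.seek(0)
    header = source.read(size)
    source.seek(0)
    return header


source = InMemorySource(b"RIFF....WAVE")
print(peek_header(source))
```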

Reviewed By: hwangjeff

Differential Revision: D45482843

fbshipit-source-id: 3ccf771c0fdce153aaa7551053e9a77facedc983

007cca23

Refactor StreamWriterCustomIO (#3319) · 51767917

Moto Hira authored May 09, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3319

* Merge the source with StreamWriter
* Add docstrings
* Move CustomIO to detail::CustomOutput to prepare for adding CustomInput

Reviewed By: hwangjeff

Differential Revision: D45481807

fbshipit-source-id: 4a9ac8a57acda47b126f8ae18e607b72919f9988

51767917

Fix batch consistency test for InverseBarkScale (#3322) · 51cc1cbf

Zhaoheng Ni authored May 09, 2023

Summary:
The batch consistency test function should call `InverseBarkScale` instead of `InverseMelScale`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3322

Reviewed By: mthrok

Differential Revision: D45691769

Pulled By: nateanl

fbshipit-source-id: 4a1ed80c4a56c3a847a49a8d02f8b5cbe4f09045

51cc1cbf

[BE] Add description to wheel package (#3321) · 3a49a2d2

Nikita Shulga authored May 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3321

Reviewed By: atalman, mthrok

Differential Revision: D45673225

Pulled By: malfet

fbshipit-source-id: f2b915f3307ba95445702e3018254ad254fe2bb3

3a49a2d2

05 May, 2023 6 commits

fix doc of specaugment transform (#3314) · a8dc4de5

Xiaohui Zhang authored May 05, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3314

Reviewed By: nateanl

Differential Revision: D45621958

Pulled By: xiaohui-zhang

fbshipit-source-id: 17555a865790adadc2abd40a86571596386a12fc

a8dc4de5

Update squim tutorial (#3313) · 05ef7dc6

Zhaoheng Ni authored May 05, 2023

Summary:
Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.

Pull Request resolved: https://github.com/pytorch/audio/pull/3313

Reviewed By: hwangjeff

Differential Revision: D45620311

Pulled By: nateanl

fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557

05ef7dc6

Add SpecAugment transform (#3309) · 82febc59

Xiaohui Zhang authored May 05, 2023

Summary:
(2/2 of the previous https://github.com/pytorch/audio/pull/2360 which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
- For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding.
- It's not straightforward to apply multiple time/frequency masks with the current design. If we need N masks across the time/frequency axis, we need to sequentially apply N Frequency/TimeMasking transforms to the input tensors, which makes for an inconvenient API. We need to introduce a separate SpecAugment transform to handle this.

To solve these issues, here we
- [done in the previous [PR](https://github.com/pytorch/audio/pull/3289)] Extend mask_along_axis_iid to support 3D+ tensors and mask_along_axis to support 2D+ tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
- [done in this PR] Introduce the SpecAugment transform.
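The idea of applying several time and frequency masks in one call, with either zero or mean-value masking, can be sketched in pure Python. This is not the transform added by the PR: the function names, the parameter names and the `[freq][time]` nested-list layout are assumptions chosen for the illustration.

```python
import random


def mask_band(spec, max_width, axis, value, rng):
    """Mask one random band of `spec` ([freq][time]) in place.

    axis 0 masks frequency rows, axis 1 masks time columns.
    """
    n_freq, n_time = len(spec), len(spec[0])
    size = n_freq if axis == 0 else n_time
    width = rng.randint(0, min(max_width, size))
    start = rng.randint(0, size - width)
    for f in range(n_freq):
        for t in range(n_time):
            index = f if axis == 0 else t
            if start <= index < start + width:
                spec[f][t] = value


def spec_augment(spec, n_time_masks, time_mask_param, n_freq_masks, freq_mask_param,
                 zero_masking=True, seed=None):
    """Return a masked copy of `spec` with several time and frequency masks applied."""
    rng = random.Random(seed)
    out = [list(row) for row in spec]
    if zero_masking:
        value = 0.0
    else:
        # Mask with the mean of the whole spectrogram instead of zero
        value = sum(map(sum, spec)) / (len(spec) * len(spec[0]))
    for _ in range(n_time_masks):
        mask_band(out, time_mask_param, 1, value, rng)
    for _ in range(n_freq_masks):
        mask_band(out, freq_mask_param, 0, value, rng)
    return out


spec = [[1.0] * 6 for _ in range(4)]
for row in spec_augment(spec, n_time_masks=2, time_mask_param=2,
                        n_freq_masks=1, freq_mask_param=2, seed=0):
    print(row)
```

A single call replaces the chain of N separate masking transforms described above.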

Pull Request resolved: https://github.com/pytorch/audio/pull/3309

Reviewed By: nateanl

Differential Revision: D45592926

Pulled By: xiaohui-zhang

fbshipit-source-id: 97cd686dbb6c1c6ff604716b71a876e616aaf1a2

82febc59

Fix missing PTS initialization with NVIDIA encoder (#3312) · 1e3af12f

huyao authored May 05, 2023

Summary:
Fix **Failed to write packet (Invalid argument)** error when encoding FLV video streams using NVIDIA hardware encoders.

Resolve https://github.com/pytorch/audio/issues/3311

Pull Request resolved: https://github.com/pytorch/audio/pull/3312

Reviewed By: nateanl

Differential Revision: D45611656

Pulled By: mthrok

fbshipit-source-id: 531a83a27d3b19ed9e9aedd161769c60aa0bd175

1e3af12f

Fix doc version (#3310) · bfb47017

moto authored May 05, 2023

Summary:
Fixes the regression caused by the build_doc job GHA migration, where the version number was not properly set.

Pull Request resolved: https://github.com/pytorch/audio/pull/3310

Reviewed By: nateanl

Differential Revision: D45607829

Pulled By: mthrok

fbshipit-source-id: 3450a38fa6982fcc56676a80144e9eed1aad02ec

bfb47017

Fix MKL issue on Intel mac build (#3307) · 3e897ca7

moto authored May 05, 2023

Summary:
* Remove MKL and NumPy from Conda build env
* Remove the `caffe2::mkl` dependency from `torch_cpu`, which introduced an unnecessary and undesired dependency on Intel mac.

TorchAudio does not use BLAS libraries directly, thus all the mentions of MKL should be removed from the codebase.
However, this was causing an issue on Intel mac. It turned out that the `torch_cpu` target pulls in the `caffe2::mkl` dependency, and the linker on macOS keeps a library dependency even if no symbol from that library is used. This stray MKL dependency should be fixed on the core side, but we can also modify the target temporarily and remove it.

Also, we don't need NumPy at build time or run time, so that is removed as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/3307

Reviewed By: atalman

Differential Revision: D45606944

Pulled By: mthrok

fbshipit-source-id: 853411ccbbca31796b808a2b052b4cfa564718cd

3e897ca7

04 May, 2023 3 commits

Add older mkl build contraint only (#3302) · 1e48af06

atalman authored May 04, 2023

Summary:
Similar to what we used to have here:
https://github.com/pytorch/test-infra/pull/3896/files

Pull Request resolved: https://github.com/pytorch/audio/pull/3302

Reviewed By: nateanl

Differential Revision: D45574845

Pulled By: atalman

fbshipit-source-id: 142c35dfd811a5f5c170dcd082bec8d055edd9cb

1e48af06

Add mkl dependency to torchaudio MacOS x86 builds (#3300) · b5795943

atalman authored May 04, 2023

Summary:
Add mkl dependency to torchaudio MacOS x86 builds

Already tested here: https://github.com/pytorch/audio/actions/runs/4878179835/jobs/8703586137

Pull Request resolved: https://github.com/pytorch/audio/pull/3300

Reviewed By: jeanschmidt, mthrok

Differential Revision: D45566352

Pulled By: atalman

fbshipit-source-id: a0376016506891240b2dd03d4fa4889028bf764b

b5795943

Extend mask_along_axis{,_iid} (#3289) · 74bd971a

Xiaohui Zhang authored May 04, 2023

Summary:
(1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)

The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
- Only zero masking can be done; masking by mean value is not supported.
- mask_along_axis is hard-coded to mask the 1st dimension and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
- For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the above hard-coding.
- For 2D spectrogram tensors w/o a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors due to the above hard-coding.
- It's not straightforward to apply multiple time/frequency masks with the current design.

To solve these issues, here we
- Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.

The introduction of SpecAugment transform will be done in another PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/3289

Reviewed By: hwangjeff

Differential Revision: D45460357

Pulled By: xiaohui-zhang

fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3

74bd971a

03 May, 2023 3 commits

Fix lint and format PR label message (#3299) · c51f20f9

moto authored May 03, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3299

Reviewed By: xiaohui-zhang

Differential Revision: D45530945

Pulled By: mthrok

fbshipit-source-id: 3443e4de693898534687b26ee1a9376ff86651f9

c51f20f9

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 76f91135

generatedunixname89002005367269 authored May 03, 2023

Reviewed By: adamjernst

Differential Revision: D45522319

fbshipit-source-id: d73a137c8738a215cc711ad39461f5b2f9ba76da

76f91135

Remove build doc job from CCI (#3293) · bc9451e6

moto authored May 03, 2023

Summary:
https://github.com/pytorch/audio/pull/3292 migrates the doc deployment to GHA.

Pull Request resolved: https://github.com/pytorch/audio/pull/3293

Reviewed By: xiaohui-zhang

Differential Revision: D45527256

Pulled By: mthrok

fbshipit-source-id: 18eb2580243b6b842147caaac10b3d28aa3d6dd0

bc9451e6