Commits · 92d0fb55debb8bfd2643b3864e352e4b4d3ea73d · OpenDAS / Torchaudio

31 May, 2023 2 commits

Windows GPU workflows (#3364) · 92d0fb55

atalman authored May 31, 2023

Summary:
Windows GPU workflows

Pull Request resolved: https://github.com/pytorch/audio/pull/3364

Reviewed By: mthrok

Differential Revision: D46292403

Pulled By: atalman

fbshipit-source-id: ee3c6f8082ca77bdc1ffdb930c59fa5a9cb25a4a

92d0fb55

Fixes to #3295 Improve RNN-T streaming decoding (#3379) · b8016e44

Jeff Hwang authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3379

Fixes `RNNTBeamSearch.infer`'s docstring and removes unused import from tutorial.

Reviewed By: mthrok

Differential Revision: D46227174

fbshipit-source-id: 7c1c3f05a6476cb0437622dea6f3ae6cb3ea9468

b8016e44

30 May, 2023 3 commits

Disable failing GPU unit test (#3384) · caf3ac07

atalman authored May 30, 2023

Summary:
Disable failing GPU unit test.
See associated issue: https://github.com/pytorch/audio/issues/3376

Pull Request resolved: https://github.com/pytorch/audio/pull/3384

Reviewed By: mthrok

Differential Revision: D46279324

Pulled By: atalman

fbshipit-source-id: 3a606bb992e0261451f48d1fb458e054f7fd5583

caf3ac07

Use const reference (#3389) · 9cdf26fd

Moto Hira authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3389

Use const references in more places in the sox source code.

Differential Revision: D46264068

fbshipit-source-id: 809d34a6e16f621c856d4278ef7ce45a5868a717

9cdf26fd

Simplify sox namespace (#3383) · a81b0ed2

Moto Hira authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3383

This commit consolidates the `torchaudio::sox_*` namespaces into `torchaudio::sox`.
It also puts the Pybind11 registration and TorchBind registration into anonymous namespaces.

Differential Revision: D46257367

fbshipit-source-id: 0f0f181eaa72036916e223263daf4b7c298fca0d

a81b0ed2

29 May, 2023 1 commit

[Nova] Windows CPU Unittests on Nova (#3329) · 6425d46c

Omkar Salpekar authored May 29, 2023

Summary:
Continuing with the job migrations from CCI to Nova, this PR introduces the Windows CPU Unittest job as a Nova workflow.

The job is passing: https://github.com/pytorch/audio/actions/runs/5094569687/jobs/9159020192?pr=3329.

Pull Request resolved: https://github.com/pytorch/audio/pull/3329

Reviewed By: huydhn

Differential Revision: D46265649

Pulled By: atalman

fbshipit-source-id: 7659dfbcc8ad400f2e109ff64530e1f768e82ef9

6425d46c

27 May, 2023 1 commit

Fix AudioEffector for mulaw (#3372) · af932cc7

moto authored May 26, 2023

Summary:
When audio is encoded with mulaw, the resulting data does not have a header, and the StreamReader defaults to 16 kHz, which can stretch or shrink the resulting waveform.
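
To illustrate why a wrongly assumed sample rate stretches or shrinks audio, here is a minimal sketch of the arithmetic. The function name and the 8 kHz source rate are illustrative assumptions, not taken from the commit.

```python
def perceived_duration(num_samples: int, assumed_rate_hz: int) -> float:
    """Duration in seconds when `num_samples` are interpreted at `assumed_rate_hz`."""
    return num_samples / assumed_rate_hz


# One second of audio at a hypothetical 8 kHz source rate.
num_samples = 8000

# Interpreted at the true rate, the clip lasts one second.
true_duration = perceived_duration(num_samples, 8000)

# Interpreted at a 16 kHz default, the same samples last half as long.
misread_duration = perceived_duration(num_samples, 16000)
```

The same samples read at twice the true rate play back in half the time, which is the kind of distortion the summary describes.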

Pull Request resolved: https://github.com/pytorch/audio/pull/3372

Reviewed By: hwangjeff

Differential Revision: D46234772

Pulled By: mthrok

fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552

af932cc7

26 May, 2023 6 commits

Fix encoding g722 format (#3373) · 1b05ca7e

moto authored May 26, 2023

Summary:
The g722 format only supports 16 kHz, but AVCodec does not list this. The implementation does not insert resampling, so the resulting audio can be slowed down or sped up.

Pull Request resolved: https://github.com/pytorch/audio/pull/3373

Reviewed By: hwangjeff

Differential Revision: D46233181

Pulled By: mthrok

fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c

1b05ca7e

Use the same CUDNN version on Windows as PyTorch (#3380) · c120f316

Huy Do authored May 26, 2023

Summary:
CUDA 11.7 uses cuDNN 8.5.0; 11.8 uses 8.7.0; 12.1 uses 8.8.1. Otherwise, the Windows vision job (8.5.0) would overwrite the cuDNN version set up by PyTorch (8.7.0), leading to flaky failures such as https://github.com/pytorch/pytorch/actions/runs/5088860652/jobs/9146641450

```
RuntimeError: cuDNN version incompatibility: PyTorch was compiled  against (8, 7, 0) but found runtime version (8, 5, 0). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.
```
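
The version pairs listed in this summary can be captured as a small lookup table. This is only a sketch: the dictionary and function names are hypothetical, and the actual workflow configuration may express the mapping differently.

```python
# CUDA toolkit version -> cuDNN version, as listed in the commit summary.
CUDNN_FOR_CUDA = {
    "11.7": "8.5.0",
    "11.8": "8.7.0",
    "12.1": "8.8.1",
}


def cudnn_version_for(cuda_version: str) -> str:
    """Return the cuDNN version paired with `cuda_version`, or raise if unknown."""
    try:
        return CUDNN_FOR_CUDA[cuda_version]
    except KeyError:
        raise ValueError(f"No cuDNN version recorded for CUDA {cuda_version}") from None
```

Failing loudly on an unknown CUDA version avoids silently installing a cuDNN build that differs from the one PyTorch was compiled against.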

Pull Request resolved: https://github.com/pytorch/audio/pull/3380

Reviewed By: atalman

Differential Revision: D46236286

Pulled By: huydhn

fbshipit-source-id: 9ca12d5068c3029688347d52c5c284488f33728d

c120f316

Use cuda 11.8 for circleci tests (#3381) · 5c0249b0

atalman authored May 26, 2023

Summary:
Use CUDA 11.8 for CircleCI tests.
CUDA 11.7 was deprecated.

Pull Request resolved: https://github.com/pytorch/audio/pull/3381

Reviewed By: osalpekar

Differential Revision: D46236223

Pulled By: atalman

fbshipit-source-id: 6d6a8e09603807a07241f31c1bd1e6d3a2b67d9d

5c0249b0

Temporarily remove test for extract_features (#3378) · 05649ca3

Zhaoheng Ni authored May 26, 2023

Summary:
The tests failed for several bundles. Remove them for now; they will be re-added once the root cause is figured out.

Pull Request resolved: https://github.com/pytorch/audio/pull/3378

Reviewed By: atalman

Differential Revision: D46230884

Pulled By: nateanl

fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92

05649ca3

Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9

atalman authored May 26, 2023

Summary:
This reverts commit d38a7854.

This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.

Pull Request resolved: https://github.com/pytorch/audio/pull/3377

Reviewed By: mthrok

Differential Revision: D46230498

Pulled By: atalman

fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d

37779ef9

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality:
1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
2. Allows the decoder to receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.
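
The effect of carrying several hypotheses between chunks can be sketched with a toy beam search. This is not the RNN-T decoder or the `RNNTBeamSearch` API: the frame format (a table of log-probabilities keyed by previous and next token), the function names, and the scores are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (tokens so far, cumulative log-probability)
Hypothesis = Tuple[Tuple[str, ...], float]
# (previous token, next token) -> log-probability; a toy stand-in for a model
Frame = Dict[Tuple[Optional[str], str], float]


def decode_chunk(chunk: List[Frame], hypotheses: List[Hypothesis], beam_width: int) -> List[Hypothesis]:
    """Extend the incoming hypotheses over one chunk, keeping the best `beam_width`."""
    beam = hypotheses
    for frame in chunk:
        candidates = []
        for tokens, score in beam:
            prev = tokens[-1] if tokens else None
            for (frame_prev, token), logp in frame.items():
                if frame_prev == prev:
                    candidates.append((tokens + (token,), score + logp))
        if candidates:
            beam = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_width]
    return beam


def stream_decode(chunks: List[List[Frame]], beam_width: int, carry_k: int) -> List[Tuple[str, ...]]:
    """Decode chunk by chunk, carrying `carry_k` hypotheses into the next chunk.

    Returns the best full transcript after each chunk, not just the new tokens.
    """
    hypotheses: List[Hypothesis] = [((), 0.0)]  # the empty start is created only once
    transcripts = []
    for chunk in chunks:
        hypotheses = decode_chunk(chunk, hypotheses[:carry_k], beam_width)
        transcripts.append(hypotheses[0][0])
    return transcripts


# "a" looks best after the first chunk, but "b" leads to a better continuation.
chunks = [
    [{(None, "a"): -0.4, (None, "b"): -0.6}],
    [{("a", "x"): -2.0, ("b", "y"): -0.1}],
]
top1 = stream_decode(chunks, beam_width=2, carry_k=1)
top2 = stream_decode(chunks, beam_width=2, carry_k=2)
```

With `carry_k=1` the decoder is locked into the early leader, while `carry_k=2` lets the runner-up overtake it once more audio arrives. Each returned entry is the whole transcript up to that chunk.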

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa

25 May, 2023 1 commit

Add LRS3 AV-ASR recipe (#3278) · c6624fa6

Pingchuan Ma authored May 25, 2023

Summary:
This PR adds AV-ASR recipe which contains sample implementations of training and evaluation pipelines for RNNT based automatic, visual, and audio-visual (ASR, VSR, AV-ASR) models on LRS3. This repository includes both streaming/non-streaming modes.

CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff

Pull Request resolved: https://github.com/pytorch/audio/pull/3278

Reviewed By: nateanl

Differential Revision: D46121550

Pulled By: mpc001

fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6

c6624fa6

24 May, 2023 6 commits

Add StreamReader/Writer custom IO to doc (#3367) · f41ba26d

moto authored May 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3367

Reviewed By: nateanl

Differential Revision: D46148139

Pulled By: mthrok

fbshipit-source-id: 50f297ac69bb95562976eb452e4e382b8c064c3c

f41ba26d

Fix build doc (#3349) · 8b85ca5d

moto authored May 24, 2023

Summary:
Follow-up to https://github.com/pytorch/audio/issues/3045
- Revert the removal of the HW acceleration doc
- Comment out the FFmpeg CLI test run

Pull Request resolved: https://github.com/pytorch/audio/pull/3349

Reviewed By: nateanl

Differential Revision: D46121899

Pulled By: mthrok

fbshipit-source-id: dfc030a69f05addec73637cfb6a720c184e37323

8b85ca5d

Update smoke test (#3346) · 71b2634b

moto authored May 24, 2023

Summary:
* Delay the import of torchaudio until the CLI options are parsed.
* Add option to set log level to DEBUG so that it's easy to see the issue with external libraries.
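
The two changes above can be sketched as follows. The option names `--debug` and `--module` are hypothetical and chosen for this sketch (the latter only so that it runs without torchaudio installed); the real smoke test script may use different flags.

```python
import argparse
import importlib
import logging


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Smoke test sketch")
    parser.add_argument("--debug", action="store_true", help="Set the log level to DEBUG.")
    # Hypothetical option so the sketch can be exercised with any importable module.
    parser.add_argument("--module", default="torchaudio", help="Module to import after parsing.")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Configure logging first so messages emitted during import are visible.
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
    # Import only after the options are parsed and logging is configured.
    module = importlib.import_module(args.module)
    logging.getLogger(__name__).debug("Imported %s", module.__name__)
    return module
```

Deferring the import means `--help` and argument errors are handled without loading the library, and any extension-loading messages are logged at the level requested on the command line.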

Pull Request resolved: https://github.com/pytorch/audio/pull/3346

Reviewed By: nateanl

Differential Revision: D46022546

Pulled By: mthrok

fbshipit-source-id: 9f988bbd770c2fd2bb260c3cfe02b238a9da2808

71b2634b

Amend commit to gh-pages branch (#3345) · a79cf3ba

moto authored May 24, 2023

Summary:
This commit changes the way the doc is pushed.
It amends the existing commit instead of adding a new one.

Currently, each commit in gh-pages contains about 100MB of data. The gh-pages branch is fetched by default by `git clone`, so the size of the torchaudio repo grows significantly.

Pull Request resolved: https://github.com/pytorch/audio/pull/3345

Reviewed By: nateanl

Differential Revision: D46136612

Pulled By: mthrok

fbshipit-source-id: 39479ee5d1a6888254ef50f0db252453d976d183

a79cf3ba

Remove CUDA 11.7 builds; replace with 11.8 (#3360) · 5a6f4eba

pbialecki authored May 24, 2023

Summary:
CC atalman malfet

Pull Request resolved: https://github.com/pytorch/audio/pull/3360

Reviewed By: mthrok

Differential Revision: D46150898

Pulled By: atalman

fbshipit-source-id: 985a0ef69406f48fb15f239d6b16616c0a5379f5

5a6f4eba

Resolve lint issue on LaTeX (#3366) · 8690e6ec

moto authored May 23, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3366

Reviewed By: nateanl

Differential Revision: D46136238

Pulled By: mthrok

fbshipit-source-id: 3432f5d007293831bab21460a79ae26b1bbc81a8

8690e6ec

23 May, 2023 6 commits

[BugFix] Fix extract_features method for WavLM models (#3350) · 7d0f3369

Zhaoheng Ni authored May 23, 2023

Summary:
resolve https://github.com/pytorch/audio/issues/3347

`position_bias` is ignored in the `extract_features` method. This doesn't affect Wav2Vec2 or HuBERT models, but it changes the output of the transformer layers (except the first layer) in the WavLM model. This PR fixes it by adding `position_bias` to the method.
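
Why only the layers after the first are affected can be seen in a toy model. This is a sketch under stated assumptions, not the torchaudio implementation: `ToyLayer`, the scalar "bias", and the `propagate_bias` switch are all hypothetical stand-ins.

```python
class ToyLayer:
    """Toy stand-in for a transformer layer; only some layers can compute the bias."""

    def __init__(self, computes_bias: bool):
        self.computes_bias = computes_bias

    def __call__(self, x: float, position_bias=None):
        if position_bias is None and self.computes_bias:
            position_bias = 1.0  # stand-in for a learned relative position bias
        if position_bias is not None:
            x = x + position_bias
        return x, position_bias


def extract_features(layers, x: float, propagate_bias: bool = True):
    """Run the layers in order and collect every intermediate output."""
    outputs = []
    position_bias = None
    for layer in layers:
        x, bias = layer(x, position_bias)
        if propagate_bias:
            position_bias = bias  # hand the bias on to the next layer
        outputs.append(x)
    return outputs


layers = [ToyLayer(True), ToyLayer(False), ToyLayer(False)]
with_bias = extract_features(layers, 0.0, propagate_bias=True)
without_bias = extract_features(layers, 0.0, propagate_bias=False)
```

In this toy setup the first layer's output is identical in both runs, and the outputs diverge from the second layer onward, which mirrors the behaviour described in the summary.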

Pull Request resolved: https://github.com/pytorch/audio/pull/3350

Reviewed By: mthrok

Differential Revision: D46112148

Pulled By: nateanl

fbshipit-source-id: 3d21aa4b32b22da437b440097fd9b00238152596

7d0f3369

[Nova] MacOS Unittests on Nova (#3324) · fce54fd1

Omkar Salpekar authored May 23, 2023

Summary:
As discussed in the [Torchaudio Migration Proposal](https://docs.google.com/document/d/1PF8biwiGzsjzfEBM78mlLiRrkcsGsvuYkeqkI66Ym8A/edit), this PR moves MacOS unittest job to Nova tooling. Note that this does not touch anything within the existing CircleCI job at the moment.

Passing job: https://github.com/pytorch/audio/actions/runs/4932497525/jobs/8815581251?pr=3324

Pull Request resolved: https://github.com/pytorch/audio/pull/3324

Reviewed By: atalman, mthrok

Differential Revision: D46113524

Pulled By: osalpekar

fbshipit-source-id: d048d300489f992fa187628cb6744d95ab4fb68a

fce54fd1

Fix cuda test failure (#3363) · fa59855f

Zhaoheng Ni authored May 23, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3361

When adding FunctionalCUDAOnlyTest, the class should inherit from `TestBaseMixin` instead of `Functional`
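
One plausible reading of this fix is that inheriting from a class that already defines test methods pulls the whole shared suite into the new test class. The sketch below illustrates that mechanism with hypothetical class bodies; it does not reproduce the torchaudio test suite.

```python
import unittest


class TestBaseMixin:
    """Hypothetical mixin holding shared setup only (no test methods)."""

    device = "cpu"


class Functional(TestBaseMixin):
    """Hypothetical shared suite: every subclass inherits these tests."""

    def test_shared_a(self):
        self.assertEqual(self.device, "cpu")

    def test_shared_b(self):
        self.assertEqual(self.device, "cpu")


class CudaOnlyFromFunctional(Functional, unittest.TestCase):
    """Picks up the whole shared suite in addition to its own test."""

    def test_cuda_only(self):
        self.assertTrue(True)


class CudaOnlyFromMixin(TestBaseMixin, unittest.TestCase):
    """Runs only the test defined here."""

    def test_cuda_only(self):
        self.assertTrue(True)


def collected_tests(case):
    """List the test method names unittest would run for `case`."""
    return list(unittest.defaultTestLoader.getTestCaseNames(case))
```

Calling `collected_tests` on both classes shows that the mixin-based class runs one test, while the `Functional`-based class runs three.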

Pull Request resolved: https://github.com/pytorch/audio/pull/3363

Reviewed By: atalman, osalpekar

Differential Revision: D46112084

Pulled By: nateanl

fbshipit-source-id: 67c6472fda98cb718e0fc53ab248beda745feab5

fa59855f

Unset BPS when using sox vorbis (#3359) · d850ff61

moto authored May 23, 2023

Summary:
When saving audio with vorbis, BPS should not be specified, otherwise warnings that cannot be turned off are shown.

Address: https://github.com/pytorch/audio/issues/3358

Pull Request resolved: https://github.com/pytorch/audio/pull/3359

Reviewed By: nateanl

Differential Revision: D46095037

Pulled By: mthrok

fbshipit-source-id: 6885a12dc3ec84bf39f0159ee58d1a2a87cff7e4

d850ff61

[Nova] Linux CPU Unittests to Nova (#3323) · 2255a0fc

Omkar Salpekar authored May 23, 2023

Summary:
As discussed in the [Torchaudio Migration Proposal](https://docs.google.com/document/d/1PF8biwiGzsjzfEBM78mlLiRrkcsGsvuYkeqkI66Ym8A/edit), this PR moves the Linux CPU unittest job to Nova tooling. Note that this does not disable the existing CircleCI job at the moment.

Passing Job: https://github.com/pytorch/audio/actions/runs/4986115298/jobs/8926499354?pr=3323

Pull Request resolved: https://github.com/pytorch/audio/pull/3323

Reviewed By: atalman, mthrok

Differential Revision: D46113506

Pulled By: osalpekar

fbshipit-source-id: 1778c360e17b9d02c63bcc60100834c75798d380

2255a0fc

[audio] add CTC forced alignment API tutorial to torchaudio (#3356) · f046f7e3

Xiaohui Zhang authored May 22, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3356

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: mthrok

Differential Revision: D46060238

fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec

f046f7e3

22 May, 2023 4 commits

Cleaning up Deprecated Jobs from CCI Config (#3340) · 150234bd

Omkar Salpekar authored May 22, 2023

Summary:
Cleaning up CCI configs that are no longer used.

Pull Request resolved: https://github.com/pytorch/audio/pull/3340

Reviewed By: mthrok

Differential Revision: D46077882

Pulled By: osalpekar

fbshipit-source-id: 0dce08fc14b5efc4517ab1f559e7ef7eb245af64

150234bd

Update forced_align document (#3357) · c0702338

Zhaoheng Ni authored May 22, 2023

Summary:
- Fix latex formula rendering issue
- Add `devices` and `properties` tags
- Fix grammar

Pull Request resolved: https://github.com/pytorch/audio/pull/3357

Reviewed By: mthrok

Differential Revision: D46068633

Pulled By: nateanl

fbshipit-source-id: 80cb84508396fbcaf81c068228d46a24bb63b975

c0702338

Fix CPU kernel of forced_align function (#3354) · 8a893fb3

Zhaoheng Ni authored May 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3354

When `start == 0`, the first item, instead of the S-th item, of row `t` in `backPtr_a` should be 0.

Reviewed By: xiaohui-zhang

Differential Revision: D46059971

fbshipit-source-id: 89933134878513034eae033764b19f8562f24cb8

8a893fb3

Add doc for forced_align (#3355) · 011f7f3d

Zhaoheng Ni authored May 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3355

Reviewed By: xiaohui-zhang

Differential Revision: D46060254

Pulled By: nateanl

fbshipit-source-id: c2e44f994739755daf049fe350dd24a987a9cc29

011f7f3d

21 May, 2023 2 commits

Revert D45960556: add CTC forced alignment API tutorial to torchaudio · f9b4f74f

Moto Hira authored May 20, 2023

Differential Revision:
D45960556

Original commit changeset: 93f2271f7130

Original Phabricator Diff: D45960556

fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a

f9b4f74f

add CTC forced alignment API tutorial to torchaudio (#3351) · 93adc3e4

Xiaohui Zhang authored May 20, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3351

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: vineelpratap, nateanl

Differential Revision: D45960556

fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa

93adc3e4

20 May, 2023 1 commit

[audio][PR] Add forced_align function to torchaudio (#3348) · e7935cff

Zhaoheng Ni authored May 19, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3348

The pull request adds a CTC-based forced alignment function that supports both CPU and CUDA devices. The function takes the CTC emissions and target labels as inputs and generates the corresponding labels for each frame.
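
The idea behind CTC forced alignment can be sketched in plain Python as a Viterbi search over the target sequence interleaved with blanks. This is an illustrative sketch, not the `forced_align` kernel added by this commit: the function name, the list-based inputs, and the example emissions are assumptions.

```python
import math
from typing import List

NEG_INF = float("-inf")


def ctc_forced_align(emission: List[List[float]], targets: List[int], blank: int = 0) -> List[int]:
    """Return the most likely per-frame labels (blanks included) that collapse to `targets`.

    `emission[t][c]` is the log-probability of class `c` at frame `t`.
    """
    if not targets:
        return [blank] * len(emission)
    # Interleave blanks: [blank, t1, blank, t2, ..., blank].
    ext = [blank]
    for token in targets:
        ext += [token, blank]
    num_frames, num_states = len(emission), len(ext)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = emission[0][ext[0]]
    score[0][1] = emission[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Predecessors: stay, advance by one, or skip a blank between different tokens.
            prev_states = [s]
            if s >= 1:
                prev_states.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev_states.append(s - 2)
            best = max(prev_states, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > NEG_INF:
                score[t][s] = score[t - 1][best] + emission[t][ext[s]]
                back[t][s] = best

    # A valid path ends in the last token or the trailing blank.
    s = max((num_states - 1, num_states - 2), key=lambda p: score[-1][p])
    if score[-1][s] == NEG_INF:
        raise ValueError("targets cannot be aligned to this many frames")
    path = [ext[s]]
    for t in range(num_frames - 1, 0, -1):
        s = back[t][s]
        path.append(ext[s])
    return path[::-1]


# Four frames over three classes (0 is blank), aligned to the targets [1, 2].
probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
emission = [[math.log(p) for p in frame] for frame in probs]
path = ctc_forced_align(emission, [1, 2])
```

For these emissions the alignment assigns token 1 to the first frame, a blank to the second, and token 2 to the last two frames.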

Reviewed By: vineelpratap, xiaohui-zhang

Differential Revision: D45867265

fbshipit-source-id: 3e25b06bf9bc8bb1bdcdc08de7f4434d912154cb

e7935cff

19 May, 2023 1 commit

Build and use GPU-enabled FFmpeg in doc CI (#3045) · 0db5ab25

moto authored May 19, 2023

Summary:
This commit adds a step to build FFmpeg with the GPU decoder in the build_doc job so that we can use the GPU decoder/encoder in the documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/3045

Reviewed By: nateanl

Differential Revision: D45965739

Pulled By: mthrok

fbshipit-source-id: c167eb3ef347860a51efa906068fa2daa556f017

0db5ab25

17 May, 2023 4 commits

Improve the performance of YUV420P frame conversion (#3342) · 72d3fe09

moto authored May 17, 2023

Summary:
This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor.

It changes two things:
1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
2. Get rid of the intermediate UV plane copy.
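
Nearest-neighbor upsampling by manual copy can be sketched in plain Python. The actual change is in the C++ conversion code; the list-of-lists planes and function names here are assumptions for illustration.

```python
from typing import List

Plane = List[List[int]]


def upsample_2x_nearest(plane: Plane) -> Plane:
    """Double the height and width by copying each sample into a 2x2 block."""
    out: Plane = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]  # repeat horizontally
        out.append(wide)
        out.append(list(wide))  # repeat vertically
    return out


def yuv420p_to_chw(y: Plane, u: Plane, v: Plane) -> List[Plane]:
    """Stack Y with the upsampled U and V so all planes share Y's resolution."""
    return [y, upsample_2x_nearest(u), upsample_2x_nearest(v)]


# A 4x2 luma plane with 2x1 chroma planes, as in YUV420P.
y = [[1, 2, 3, 4], [5, 6, 7, 8]]
u = [[10, 30]]
v = [[20, 40]]
chw = yuv420p_to_chw(y, u, v)
```

Because every output sample is a direct copy of one input sample, no interpolation arithmetic is needed, which is what makes a plain copy loop a viable replacement for a general-purpose interpolation routine.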

The following compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS with a resolution of 320x240. The measured times are sorted by value.

Some observations
* `torch::nn::functional::interpolate` with `torch::kNearest` option is not as fast as copying data manually.
* switching from `interpolate` to manual data copy reduces the variance.

run | main | 1 | 1+2 | improvement (from main to 1+2)
-- | -- | -- | -- | --
1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%

[second]
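
The improvement column appears to be the relative reduction in elapsed time with respect to `main`. A minimal sketch of that calculation, using the first row of the table above (the function name is hypothetical):

```python
def improvement_percent(baseline_s: float, new_s: float) -> float:
    """Relative reduction in elapsed time, as a percentage of the baseline."""
    return (baseline_s - new_s) / baseline_s * 100.0


# Run 1 of the 320x240 table: `main` versus changes 1+2.
run1 = improvement_percent(0.452250583, 0.40155375)
print(f"{run1:.2f}%")
```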

Increasing the resolution, the improvement is smaller but is consistent.

run | main | 1+2 | improvement
-- | -- | -- | --
1 | 4.032393 | 3.991784667 | 1.01%
2 | 4.052248084 | 3.992672208 | 1.47%
3 | 4.07705575 | 4.000541666 | 1.88%
4 | 4.143954792 | 4.020671584 | 2.98%
5 | 4.170711959 | 4.025753125 | 3.48%
6 | 4.240229292 | 4.045504875 | 4.59%
7 | 4.267384042 | 4.045588125 | 5.20%
8 | 4.277025958 | 4.061980083 | 5.03%
9 | 4.312192042 | 4.163251959 | 3.45%
10 | 4.406109875 | 4.312560334 | 2.12%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader

def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=yuv420p")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)

for _ in range(10):
    test()
```
</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
[conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3342

Reviewed By: xiaohui-zhang

Differential Revision: D45947716

Pulled By: mthrok

fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68

72d3fe09

Improve the performance of NV12 frame conversion (#3344) · c11661e0

moto authored May 17, 2023

Summary:
Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion.

It changes two things:

- Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
- Get rid of intermediate UV plane copy
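
In NV12 the chroma samples are stored interleaved (U0 V0 U1 V1 ...) in a single half-height plane. The sketch below shows how each pair could be copied straight into full-resolution U and V planes without first building separate half-resolution planes. It is an illustration in plain Python with hypothetical names, not the C++ code changed by this commit.

```python
from typing import List, Tuple

Plane = List[List[int]]


def nv12_chroma_to_full_res(uv_plane: Plane) -> Tuple[Plane, Plane]:
    """Expand an interleaved half-resolution UV plane to full-resolution U and V planes.

    Each (U, V) pair is copied directly into a 2x2 block of the outputs.
    """
    height = 2 * len(uv_plane)
    width = len(uv_plane[0])  # (width / 2) pairs of two values each
    u_full = [[0] * width for _ in range(height)]
    v_full = [[0] * width for _ in range(height)]
    for i, row in enumerate(uv_plane):
        for j in range(0, len(row), 2):
            u, v = row[j], row[j + 1]
            for di in (0, 1):
                for dj in (0, 1):
                    u_full[2 * i + di][j + dj] = u
                    v_full[2 * i + di][j + dj] = v
    return u_full, v_full


# Chroma for a 4x2 image: one row holding two interleaved (U, V) pairs.
u_full, v_full = nv12_chroma_to_full_res([[10, 20, 30, 40]])
```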

with 320x240

run | main | pr | improvement
-- | -- | -- | --
1 | 0.600671417 | 0.464993125 | 22.59%
2 | 0.638846084 | 0.456763542 | 28.50%
3 | 0.64158175 | 0.458295333 | 28.57%
4 | 0.649868584 | 0.455450583 | 29.92%
5 | 0.612171333 | 0.462435625 | 24.46%
6 | 0.6128095 | 0.456716166 | 25.47%
7 | 0.632084583 | 0.463357083 | 26.69%
8 | 0.610733083 | 0.46148625 | 24.44%
9 | 0.613825834 | 0.4559555 | 25.72%
10 | 0.653857458 | 0.455375375 | 30.36%

[second]

with 1080x720 video

run | main | pr | improvement
-- | -- | -- | --
1 | 4.984154333 | 4.21090375 | 15.51%
2 | 4.988090625 | 4.239649375 | 15.00%
3 | 4.988896375 | 4.227277458 | 15.27%
4 | 4.998186584 | 4.161077042 | 16.75%
5 | 5.06180425 | 4.191672584 | 17.19%
6 | 5.108769667 | 4.198468458 | 17.82%
7 | 5.151363625 | 4.181942167 | 18.82%
8 | 5.199527875 | 4.239319084 | 18.47%
9 | 5.224903708 | 4.194901959 | 19.71%
10 | 5.333422583 | 4.320925792 | 18.98%

[second]

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader

def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=nv12")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)

for _ in range(10):
    test()
```
</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
[conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3344

Reviewed By: xiaohui-zhang

Differential Revision: D45948511

Pulled By: mthrok

fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5

c11661e0

Fix for breadcrumbs displaying "Old version (stable)" on Nightly build (#3333) · 3ffd76c8

Carl Parker authored May 16, 2023

Summary:
Previously, `breadcrumbs.html` identified a nightly build version by the prefix "Nightly" which would normally be prepended to the version in `conf.py`. However, the version string is coming through without the "Nightly" prefix, so this change causes `breadcrumbs.html` to key on the substring "dev" instead.

The reason we aren't getting "Nightly" is apparently because the environment variable BUILD_VERSION is available, so `conf.py` is using the value of that env var instead of the version string imported from the `torchaudio` module itself, which actually appears to be incorrect; see below.

If I install torchaudio using

`conda install torchaudio -c pytorch-nightly`

then `torchaudio.__version__` returns the incorrect version string:

`2.0.0.dev20230309`
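
The check described in this summary amounts to a substring test. The actual logic lives in the `breadcrumbs.html` template; the Python function below is a hypothetical stand-in that only illustrates the rule.

```python
def is_nightly(version: str) -> bool:
    """Treat any version string containing "dev" as a nightly build."""
    return "dev" in version


nightly = is_nightly("2.0.0.dev20230309")
stable = is_nightly("2.0.1")
```

Keying on "dev" works whether or not a "Nightly" prefix was prepended to the version string.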

Pull Request resolved: https://github.com/pytorch/audio/pull/3333

Reviewed By: mthrok

Differential Revision: D45926466

Pulled By: carljparker

fbshipit-source-id: d5516f2d9f1716c2400d3e9b285bd5d32b4b3a77

3ffd76c8

Add 420p10le CPU support to StreamReader (#3332) · c12f4734

moto authored May 16, 2023

Summary:
This commit adds support for decoding the YUV420P10LE format.

The image tensor returned for this format has:
- NCHW format (C == 3)
- int16 type
- value range [0, 2^10).

Note that the value range is different from what the "hevc_cuvid" decoder
returns. The "hevc_cuvid" decoder uses the full range of int16 (internally,
it's uint16) to express the color (with some intervals), but the values
returned by the CPU "hevc" decoder are within [0, 2^10).
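
Since the CPU decoder returns values in [0, 2^10), downstream code may want to rescale them. A minimal sketch, assuming plain integer samples; the helper names are hypothetical and not part of torchaudio.

```python
BIT_DEPTH = 10
MAX_VALUE = (1 << BIT_DEPTH) - 1  # 1023, the largest 10-bit value


def normalize_10bit(value: int) -> float:
    """Map a 10-bit sample in [0, 1023] to a float in [0.0, 1.0]."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError(f"{value} is outside the 10-bit range")
    return value / MAX_VALUE


def to_8bit(value: int) -> int:
    """Reduce a 10-bit sample to 8 bits by dropping the two least significant bits."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError(f"{value} is outside the 10-bit range")
    return value >> 2
```

The range check matters here because, as noted above, other decoders may return the same pixel format with a different value range.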

Address https://github.com/pytorch/audio/issues/3331

Pull Request resolved: https://github.com/pytorch/audio/pull/3332

Reviewed By: hwangjeff

Differential Revision: D45925097

Pulled By: mthrok

fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508

c12f4734

16 May, 2023 2 commits

Upgrade to FFmpeg5 (#3298) · d38a7854

moto authored May 16, 2023

Summary:
This commit upgrades the version of FFmpeg that the TorchAudio binary distribution is compiled against to 5.0.4.

FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5.
Conda-forge lists 5.1 for all the platforms TorchAudio supports: https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298

Reviewed By: hwangjeff

Differential Revision: D45865599

Pulled By: mthrok

fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b

d38a7854

Remove obsolete third party dependencies of CTC decoder (#3339) · e4c1d70b

moto authored May 16, 2023

Summary:
TorchAudio has migrated the CTC decoder to flashlight-text, and the code related to the CTC decoder was removed in https://github.com/pytorch/audio/issues/3236.

This commit cleans up the residue: it removes the third-party libraries used for the CTC decoder and the mention of the environment variable for the CTC decoder.

Pull Request resolved: https://github.com/pytorch/audio/pull/3339

Reviewed By: nateanl

Differential Revision: D45920878

Pulled By: mthrok

fbshipit-source-id: 8d93e64138697781570e5b0b1c9f86e1a7923a89

e4c1d70b