Commits · 989702b3638c4cb4b0690f906f688cde28821bc0 · OpenDAS / Torchaudio

12 Jul, 2023 3 commits

Use FFmpeg6 in build doc (#3475) · 989702b3

moto authored Jul 12, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3475

Differential Revision: D47403772

Pulled By: mthrok

fbshipit-source-id: 5cdde521dbbbbf33856470a9dc79419b4a3a1683

989702b3

Fix FFmpeg initialization logic (#3474) · 49e269ab

Moto Hira authored Jul 12, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3474

Differential Revision: D47398447

fbshipit-source-id: f77b685d54ddfc222b806475707d4a10239872f5

49e269ab

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4. This is inconvenient at every stage, from installation to runtime linking.
This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of only looking for v4.

The way it works is that we compile the FFmpeg extension three times, each against a different FFmpeg version, and ship all of them.
At runtime, we look for a specific version of libavutil, and when one is found, load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.
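
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not torchaudio's actual loader: the function names, the Linux-style `libavutil.so.N` file names, and the `ctypes`-based probe are all assumptions made for the example. The only facts relied on are the preference order from this commit message and the libavutil major versions that FFmpeg 4, 5 and 6 ship (56, 57 and 58).

```python
import ctypes
from typing import Callable, Optional

# FFmpeg major version -> libavutil major version.
# FFmpeg 4.x ships libavutil 56, 5.x ships 57, 6.x ships 58.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}

# Order of preference stated in the commit message.
PREFERENCE = (6, 5, 4)


def pick_ffmpeg_version(is_available: Callable[[str], bool]) -> Optional[int]:
    """Return the first preferred FFmpeg major version whose libavutil is found.

    `is_available` is a hypothetical probe that reports whether a shared
    library with the given file name can be loaded.
    """
    for ffmpeg_major in PREFERENCE:
        # Linux-style soname; other platforms use different file names.
        lib_name = f"libavutil.so.{LIBAVUTIL_MAJOR[ffmpeg_major]}"
        if is_available(lib_name):
            return ffmpeg_major
    return None


def can_dlopen(name: str) -> bool:
    """Probe that tries to load the library and reports success or failure."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True
```

Calling `pick_ffmpeg_version(can_dlopen)` returns 6, 5 or 4 depending on what is installed, or `None` when no supported libavutil is found. Passing the probe as an argument keeps the selection logic testable without FFmpeg installed.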

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL builds, downloaded from S3 at build time instead of being built every time.

The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
a single-FFmpeg-version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
so that only one specific version of FFmpeg is supported.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04

786066b4

11 Jul, 2023 4 commits

Clean up FFmpeg build scripts (#3470) · cc41178b

moto authored Jul 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3470

Differential Revision: D47374347

Pulled By: mthrok

fbshipit-source-id: 003b83e50a70f6e1d06eb196f0be5dbba1640226

cc41178b

Fix doc style (#3468) · 18b20f77

moto authored Jul 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3468

Differential Revision: D47368070

Pulled By: mthrok

fbshipit-source-id: 9b5d57b0cb861a2556a1903121f526f8011a0e2d

18b20f77

Update doc analytics (#3469) · 216146ab

moto authored Jul 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3469

Differential Revision: D47368140

Pulled By: mthrok

fbshipit-source-id: d82ddb91ae1f6612298486fb8401f95c48db5620

216146ab

Clean up FFMPEG env var and remove pre/post build script (#3466) · c825c019

moto authored Jul 11, 2023

Summary:
Now that we do not build FFmpeg as part of the CI build process, we can remove the pre/post build scripts.

Needs to land after https://github.com/pytorch/test-infra/pull/4358

Pull Request resolved: https://github.com/pytorch/audio/pull/3466

Reviewed By: atalman

Differential Revision: D47367022

Pulled By: mthrok

fbshipit-source-id: 17aafff74ee7d269236cffb8a88c803a8d4c44b7

c825c019

10 Jul, 2023 1 commit

Update package smoke test (#3465) · 589de109

moto authored Jul 10, 2023

Summary:
1. Update the smoke test script to change directory so that there is no `torchaudio` directory in CWD when the smoke test is executed.
2. Disable the part of the smoke test that requires FFmpeg for the wheel. This is preparation for https://github.com/pytorch/test-infra/pull/4358
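
The reason for item 1 is that a source directory named `torchaudio` in the current working directory can shadow the installed package on import. A minimal sketch of the idea, assuming a hypothetical helper name and using only the standard library (this is not the actual smoke test script):

```python
import importlib.util
import os
import tempfile
from typing import Optional


def locate_module_from_clean_cwd(module_name: str) -> Optional[str]:
    """Resolve `module_name` while the CWD is an empty temporary directory.

    Returns the file the module would be imported from, or None if the
    module cannot be found. The original CWD is always restored.
    """
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            # With an empty CWD, a local source tree cannot shadow the
            # installed package during lookup.
            spec = importlib.util.find_spec(module_name)
        finally:
            os.chdir(original_cwd)
    return None if spec is None else spec.origin
```

For instance, `locate_module_from_clean_cwd("torchaudio")` would report where the installed package lives, independent of the directory the check was launched from.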

Pull Request resolved: https://github.com/pytorch/audio/pull/3465

Reviewed By: nateanl

Differential Revision: D47345117

Pulled By: mthrok

fbshipit-source-id: 95aad0a22922d44ee9a24a05d9ece85166b8c17e

589de109

07 Jul, 2023 3 commits

Set the default #threads to 1 in StreamWriter (#3370) · 9c7bf1bc

moto authored Jul 07, 2023

Summary:
Similar to https://github.com/pytorch/audio/issues/2949

Pull Request resolved: https://github.com/pytorch/audio/pull/3370

Differential Revision: D47298746

Pulled By: mthrok

fbshipit-source-id: 0cc0f395772b33f8b2f5f55253d659e451f506c4

9c7bf1bc

Fix StreamWriter regression around RGB0/BGR0 (#3428) · 9210cba2

moto authored Jul 07, 2023

Summary:
- Add RGB0/BGR0 support to the CPU encoder
- Allow passing RGB/BGR when the expected format is RGB0/BGR0
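
For context, RGB0 and BGR0 are packed 32-bit pixel formats in which the fourth byte of each pixel is unused, so 24-bit RGB/BGR data can be adapted by padding. The sketch below illustrates that relationship at the byte level only. The function name is hypothetical, and this is not necessarily how the commit performs the conversion.

```python
def rgb_to_rgb0(packed_rgb: bytes) -> bytes:
    """Pad packed 24-bit RGB pixels to 32-bit RGB0 with a zero fourth byte.

    The same padding turns packed BGR into BGR0, since only the channel
    order differs.
    """
    if len(packed_rgb) % 3 != 0:
        raise ValueError("packed RGB data must be a multiple of 3 bytes")
    out = bytearray()
    for i in range(0, len(packed_rgb), 3):
        out += packed_rgb[i:i + 3]  # copy the three color bytes
        out.append(0)               # unused fourth byte
    return bytes(out)
```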

Pull Request resolved: https://github.com/pytorch/audio/pull/3428

Differential Revision: D47274370

Pulled By: mthrok

fbshipit-source-id: d34d940e04b07673bb86f518fe895c0735912444

9210cba2

Use pre-built binaries for ffmpeg extension (#3460) · f77c3e5b

moto authored Jul 07, 2023

Summary:
This commit changes the way the FFmpeg extension is built.

Originally, the build process expected the FFmpeg binaries to be somehow available in the build env.
This makes the build process unpredictable and prevents enabling the FFmpeg extension by default.

The proposed change uses pre-built FFmpeg binaries as a build-time-only scaffold, which are built in our CI job https://github.com/pytorch/audio/actions/workflows/ffmpeg.yml.

This makes the build process more predictable and removes the necessity to build FFmpeg in our CI.
Currently, it supports macOS (arm64, x86_64), unix (x86_64, aarch64) and windows (amd64).
The downside is that it no longer works with architectures not listed above.
We can potentially work around this by searching for the FFmpeg binaries available in the system (the old way) on
these systems, but since they are not supported by PyTorch, the priority is low.

Pull Request resolved: https://github.com/pytorch/audio/pull/3460

Differential Revision: D47261885

Pulled By: mthrok

fbshipit-source-id: 223a15e95c9140c95688af968beb35ff40354476

f77c3e5b

06 Jul, 2023 2 commits

Add ARM linux ffmpeg build (#3462) · d9f51ce5

moto authored Jul 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3462

Differential Revision: D47270241

Pulled By: mthrok

fbshipit-source-id: 6a3b02380dfb381ffb47c1f46b46f4833c765246

d9f51ce5

Fix mac ffmpeg build (#3459) · 2fa39dbd

moto authored Jul 06, 2023

Summary:
Follow-up to https://github.com/pytorch/audio/pull/3455

The FFMPEG_VERSION env var is not defined in existing CI jobs.

Pull Request resolved: https://github.com/pytorch/audio/pull/3459

Reviewed By: atalman

Differential Revision: D47249074

Pulled By: mthrok

fbshipit-source-id: 20f82d749adef5f45a984ab8125592ef36279e94

2fa39dbd

05 Jul, 2023 4 commits

Revert "[audio][PR] Add option to dlopen FFmpeg libraries (#3402)" (#3456) · ca66a1d3

moto authored Jul 05, 2023

Summary:
This reverts commit b7d3e89a.

We will use pre-built binaries instead of dlopen.

Pull Request resolved: https://github.com/pytorch/audio/pull/3456

Differential Revision: D47239681

Pulled By: mthrok

fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6

ca66a1d3

Add stand alone job to build FFmpeg binaries (#3455) · 662f067b

moto authored Jul 05, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3455

Differential Revision: D47242316

Pulled By: mthrok

fbshipit-source-id: 0eb4bdb0a45fccfe9ff97eaed79db63cd7bfc7d8

662f067b

Untangle third party inclusion in CMake (#3457) · c34a1d6d

moto authored Jul 05, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3457

Differential Revision: D47241343

Pulled By: mthrok

fbshipit-source-id: fd1bfd1531397cb59e9cf11de9dede6949f8517e

c34a1d6d

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).
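
For callers holding unbatched inputs, the change amounts to adding a leading batch dimension of size 1 to both arguments (with PyTorch tensors, `unsqueeze(0)` does this). The sketch below works on shape tuples only so that it runs without PyTorch. The helper name and the batch-size check are assumptions for illustration and are not part of the torchaudio API.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def to_batched_shapes(log_probs: Shape, targets: Shape) -> Tuple[Shape, Shape]:
    """Map old-style (2D, 1D) shapes to batch (3D, 2D) shapes.

    Shapes that are already batched pass through unchanged.
    """
    if len(log_probs) == 2 and len(targets) == 1:
        # Prepend a batch dimension of size 1 to both inputs.
        return (1, *log_probs), (1, *targets)
    if len(log_probs) == 3 and len(targets) == 2:
        if log_probs[0] != targets[0]:
            raise ValueError("batch sizes of log_probs and targets differ")
        return log_probs, targets
    raise ValueError(
        f"expected 3D log_probs and 2D targets, "
        f"got {len(log_probs)}D and {len(targets)}D"
    )
```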

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

cc164478

03 Jul, 2023 1 commit

Update README (#3434) · 163157d3

Zhaoheng Ni authored Jul 03, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3434

Add a bullet point for `torchaudio.functional`, with forced alignment as an example.

Reviewed By: mthrok

Differential Revision: D46658058

fbshipit-source-id: 6e037b7bb6ed2fc2e27ad1e55c5728c17ce69ce8

163157d3

28 Jun, 2023 2 commits

include a link to index.rst (#3441) · a8ce4a87

Pingchuan Ma authored Jun 28, 2023

Summary:
Include Conformer/Emformer RNN-T ASR/VSR/AV-ASR link to index.rst

Pull Request resolved: https://github.com/pytorch/audio/pull/3441

Differential Revision: D47094158

Pulled By: mthrok

fbshipit-source-id: 9ab42ac2bf52a5ce488003897ffba2f10a6ca941

a8ce4a87

Follow up on tutorial update (#3449) · 4a121aa5

moto authored Jun 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449

Differential Revision: D47094402

Pulled By: mthrok

fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622

4a121aa5

26 Jun, 2023 1 commit

Add more explanation about `n_fft` (#3442) · 105b77fe

moto authored Jun 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442

Differential Revision: D46797481

Pulled By: mthrok

fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe

105b77fe

21 Jun, 2023 2 commits

Introduce chroma spectrogram transform (#3427) · 70968293

Jeff Hwang authored Jun 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3427

Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms.

Reviewed By: mthrok

Differential Revision: D46547418

fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960

70968293

Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998

627c37a9

16 Jun, 2023 1 commit

Add LRS3 data preparation (#3421) · 77cdd160

Pingchuan Ma authored Jun 16, 2023

Summary:
This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.

This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.

Pull Request resolved: https://github.com/pytorch/audio/pull/3421

Reviewed By: mpc001

Differential Revision: D46799748

Pulled By: mthrok

fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9

77cdd160

15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix backtrack visualization (the coordinate was off by one.)
* Add note about the simplification and the new align API
* Explicitly handle SOS and EOS

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8

18601691

14 Jun, 2023 1 commit

Add resample option to AudioEffector (#3374) · 406e9c8d

moto authored Jun 14, 2023

Summary:
Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with any sample rate.

Pull Request resolved: https://github.com/pytorch/audio/pull/3374

Differential Revision: D46235358

Pulled By: mthrok

fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4

406e9c8d

13 Jun, 2023 2 commits

[SoX/Flac] disable xmms_plugin dependency (#3436) · 58a51b5b

Kyle Finn authored Jun 13, 2023

Summary:
This plugin pulls in glib and gtk, which breaks the build on some headless systems.

Since the plugin is not actually used, it seems right to disable it.

This change fixed the build on my system.

Pull Request resolved: https://github.com/pytorch/audio/pull/3436

Differential Revision: D46683297

Pulled By: mthrok

fbshipit-source-id: 5b1c1eee1929f4a69a1cc6c7d7bb3ed998ec5872

58a51b5b

Fix build doc (#3435) · 0f682c77

moto authored Jun 13, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3435

Reviewed By: nateanl

Differential Revision: D46659362

Pulled By: mthrok

fbshipit-source-id: ffa033ad6759de6fd958b63ac51a4a1153ffb45d

0f682c77

12 Jun, 2023 1 commit

feat: add guard in `lfilter` for a non-default cuda device (#3432) · c76d952e

Chin-Yun Yu authored Jun 12, 2023

Summary:
Should resolve https://github.com/pytorch/audio/issues/3425

cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/3432

Differential Revision: D46656180

Pulled By: mthrok

fbshipit-source-id: 5c534bee2f143ef5cb5e50ec74828012dbcab7e9

c76d952e

09 Jun, 2023 3 commits

Use torch/types.h where possible (#3422) · c5877157

moto authored Jun 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3422

Differential Revision: D46558184

Pulled By: mthrok

fbshipit-source-id: a775c4fb193496d9b2bf9db7bee186ee23512b99

c5877157

Disable HF integration test (#3431) · f5d7635e

moto authored Jun 09, 2023

Summary:
The new version of transformers changed the format of pre-trained weights. Fixing it is low priority for the maintenance team, so we disable the test.

See https://github.com/pytorch/audio/issues/3430

Pull Request resolved: https://github.com/pytorch/audio/pull/3431

Differential Revision: D46592883

Pulled By: mthrok

fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6

f5d7635e

Fix the input pixel format when using GPU video encoder (#3426) · 30afaa9b

moto authored Jun 09, 2023

Summary:
StreamWriter's encoding pipeline looks like the following:

1. convert tensor to AVFrame
2. pass AVFrame to AVFilter
3. pass the resulting AVFrame to AVCodecContext (encoder) and AVFormatContext (muxer)

When dealing with a CUDA tensor, the AVFilter becomes a no-op, as we have not added support for CUDA-compatible filters.

When a CUDA frame is passed, the existing solution passes the software pixel format to AVFilter, which later issues a warning because what AVFilter actually sees is AV_PIX_FMT_CUDA.

Since the filter itself is a no-op, encoding still functions as expected, but this commit fixes the pixel format passed to AVFilter.

See https://github.com/pytorch/audio/issues/3317

Pull Request resolved: https://github.com/pytorch/audio/pull/3426

Differential Revision: D46562370

Pulled By: mthrok

fbshipit-source-id: ce0131f1e50bcc826ee036fc0f35db2a5162b660

30afaa9b

08 Jun, 2023 8 commits

Introduce chroma filter bank function (#3395) · dfd0c5fd

Jeff Hwang authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3395

Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.

Reviewed By: mthrok

Differential Revision: D46307672

fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5

dfd0c5fd

[Nova] Add cache ffmpeg before building #2 (#3423) · 25e96f42

atalman authored Jun 08, 2023

Summary:
[Nova] Add cache ffmpeg before building - 2
Follow-up to https://github.com/pytorch/audio/pull/3417; new arguments need to be passed to the test-infra workflows.

Pull Request resolved: https://github.com/pytorch/audio/pull/3423

Reviewed By: mthrok

Differential Revision: D46559344

Pulled By: atalman

fbshipit-source-id: fa5cccc3bfb052688de4a05cc3b4f37fcbe3a6f5

25e96f42

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
StreamReader's decoding process is composed of three steps:

1. Decode the incoming AVPacket into AVFrame
2. Pass AVFrame through AVFilter to perform post process
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time the output stream is defined and the output stream shape can be retrieved.

For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved.
However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. The refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in before finalizing the frame shape.
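
The deferred initialization described above can be illustrated with a small sketch. This is not the StreamReader implementation: the class name and the use of nested lists as frames are assumptions for the example, and raising on a later shape change is one possible policy chosen here for simplicity.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # hypothetical stand-in for a decoded frame


class LazyFrameConverter:
    """Defers shape-dependent setup until the first frame is seen."""

    def __init__(self) -> None:
        # Unknown at construction time; filled in by the first frame.
        self.frame_shape: Optional[Tuple[int, int]] = None  # (height, width)

    def convert(self, frame: Frame) -> Frame:
        height, width = len(frame), len(frame[0])
        if self.frame_shape is None:
            # First frame: finalize the output shape now.
            self.frame_shape = (height, width)
        elif self.frame_shape != (height, width):
            # Assumption for this sketch: reject later shape changes.
            raise ValueError("frame shape changed after initialization")
        # Return a copy as a placeholder for the real conversion.
        return [list(row) for row in frame]
```

Constructing the converter no longer requires knowing the output shape up front, which is the property the GPU decoder path needs.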

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6

7dff24ca

Remove CCI badge from README (#3420) · a7fea8a6

moto authored Jun 08, 2023

Summary:
CI jobs have been migrated from CCI to GHA.

Pull Request resolved: https://github.com/pytorch/audio/pull/3420

Differential Revision: D46548562

Pulled By: mthrok

fbshipit-source-id: d7e17201e8b256efaa54543e445a0f139aa549b2

a7fea8a6

Clean up CI scripts (#3407) · f0803152

moto authored Jun 08, 2023

Summary:
- Move the unit test scripts from .circleci to .github
- Remove docker file for unit test base
- Use the Conda from Docker image in Linux jobs.

Remaining follow-up items:

- Reuse the unit test script in the Linux GPU job, as is done in the Linux CPU job.

The unit test script needs to be fixed before it can be used for the Linux GPU job
in the new GHA workflow. This is kept as a separate follow-up work item.

Pull Request resolved: https://github.com/pytorch/audio/pull/3407

Differential Revision: D46498263

Pulled By: mthrok

fbshipit-source-id: d8256717a55bb4257151d819d3b2ebd453601eac

f0803152

Optimize Torchaudio Vad (#3382) · 1e117f57

Kuba Rad authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3382

The voice activity detector function was unoptimized, confusingly written, and buggy.

The optimizations here make the function run roughly 17x faster.
The main optimization was to loop over windows of audio rather than individual audio samples. Reducing the number of copies also helped.

There was an off-by-one error where the array slice referenced was [1: 16001] (for the default settings) instead of [0: 16000].
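
Both points can be shown with plain Python slicing. The sketch below is not the VAD code itself. The helper name is hypothetical, and the window size of 16000 is taken only from the slice quoted above.

```python
from typing import List, Sequence


def iter_windows(samples: Sequence[int], window: int) -> List[Sequence[int]]:
    """Split `samples` into consecutive full windows, starting at index 0."""
    return [
        samples[start:start + window]
        for start in range(0, len(samples) - window + 1, window)
    ]


samples = list(range(32000))

# Intended slice: the first 16000 samples, indices 0..15999.
correct = samples[0:16000]

# Off-by-one slice: same length, but skips sample 0 and includes sample 16000.
shifted = samples[1:16001]
```

Both slices contain 16000 samples, which is why this kind of bug is easy to miss: only the boundaries differ. Iterating per window also means the loop body runs `len(samples) // window` times rather than once per sample.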

Reviewed By: hwangjeff

Differential Revision: D44749359

fbshipit-source-id: c76c9412e70cdc6fcd527d113603c88f78480558

1e117f57

Merge all the lint/style checks to pre-commit hook (#3414) · c3ca2562

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3414

Differential Revision: D46536717

Pulled By: mthrok

fbshipit-source-id: 505bdcdd1b59ca9fe5afc2c8516a0a821e2b8d7e

c3ca2562

[Nova] Add cache ffmpeg before building (#3417) · 5ca03f42

atalman authored Jun 07, 2023

Summary:
[Nova] Add cache ffmpeg before building

Pull Request resolved: https://github.com/pytorch/audio/pull/3417

Reviewed By: mthrok

Differential Revision: D46537892

Pulled By: atalman

fbshipit-source-id: 9f8dc0ecfc305c3b378557d46f89a5d7de67a165

5ca03f42