Commits · ca66a1d3a0a000031ee0927c17c5b223dc119077 · OpenDAS / Torchaudio

05 Jul, 2023 4 commits

Revert "[audio][PR] Add option to dlopen FFmpeg libraries (#3402)" (#3456) · ca66a1d3

moto authored Jul 05, 2023

Summary:
This reverts commit b7d3e89a.

We will use pre-built binaries instead of dlopen.

Pull Request resolved: https://github.com/pytorch/audio/pull/3456

Differential Revision: D47239681

Pulled By: mthrok

fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6

Add stand alone job to build FFmpeg binaries (#3455) · 662f067b

moto authored Jul 05, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3455

Differential Revision: D47242316

Pulled By: mthrok

fbshipit-source-id: 0eb4bdb0a45fccfe9ff97eaed79db63cd7bfc7d8

Untangle third party inclusion in CMake (#3457) · c34a1d6d

moto authored Jul 05, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3457

Differential Revision: D47241343

Pulled By: mthrok

fbshipit-source-id: fd1bfd1531397cb59e9cf11de9dede6949f8517e

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To simplify the API, this PR changes it to support only batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).
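The batched convention can be sketched as a plain shape check. This is a minimal, hypothetical helper (`check_forced_align_shapes` and `add_batch_dim` are not torchaudio functions); it works on shape tuples only and assumes the `(batch, time, num_classes)` layout for `log_probs` and `(batch, target_length)` for `targets`.

```python
from typing import Sequence, Tuple


def add_batch_dim(shape: Sequence[int]) -> Tuple[int, ...]:
    """Turn an unbatched shape such as (T, C) into a batch of one: (1, T, C)."""
    return (1, *shape)


def check_forced_align_shapes(
    log_probs_shape: Sequence[int], targets_shape: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Validate batched shapes and return (batch, time, num_classes, target_length)."""
    if len(log_probs_shape) != 3:
        raise ValueError(
            f"log_probs must be 3D (batch, time, num_classes), got {len(log_probs_shape)}D"
        )
    if len(targets_shape) != 2:
        raise ValueError(
            f"targets must be 2D (batch, target_length), got {len(targets_shape)}D"
        )
    batch, time, num_classes = log_probs_shape
    if targets_shape[0] != batch:
        raise ValueError(
            f"batch size mismatch: log_probs has {batch}, targets has {targets_shape[0]}"
        )
    return batch, time, num_classes, targets_shape[1]


# A caller with a single utterance adds the batch dimension itself.
print(check_forced_align_shapes(add_batch_dim((50, 29)), add_batch_dim((12,))))
```

Under this convention, unbatched `(T, C)` / `(L,)` inputs are rejected rather than silently accepted, which keeps a single code path for both single and multiple utterances.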

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

03 Jul, 2023 1 commit

Update README (#3434) · 163157d3

Zhaoheng Ni authored Jul 03, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3434

Add a bullet point for `torchaudio.functional`, with forced alignment as an example.

Reviewed By: mthrok

Differential Revision: D46658058

fbshipit-source-id: 6e037b7bb6ed2fc2e27ad1e55c5728c17ce69ce8

28 Jun, 2023 2 commits

include a link to index.rst (#3441) · a8ce4a87

Pingchuan Ma authored Jun 28, 2023

Summary:
Include the Conformer/Emformer RNN-T ASR/VSR/AV-ASR link in index.rst.

Pull Request resolved: https://github.com/pytorch/audio/pull/3441

Differential Revision: D47094158

Pulled By: mthrok

fbshipit-source-id: 9ab42ac2bf52a5ce488003897ffba2f10a6ca941

Follow up on tutorial update (#3449) · 4a121aa5

moto authored Jun 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449

Differential Revision: D47094402

Pulled By: mthrok

fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622

26 Jun, 2023 1 commit

Add more explanation about `n_fft` (#3442) · 105b77fe

moto authored Jun 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442
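For reference, `n_fft` fixes both the number of frequency bins and their spacing: a one-sided STFT has `n_fft // 2 + 1` bins spaced `sample_rate / n_fft` Hz apart, so a larger `n_fft` gives finer frequency resolution (and, when `win_length` is left at its default of `n_fft`, a longer analysis window). The helper below is a hypothetical illustration (`stft_bins` is not a torchaudio API) that only computes these quantities.

```python
from typing import List, Tuple


def stft_bins(n_fft: int, sample_rate: int) -> Tuple[int, float, List[float]]:
    """Return (number of bins, bin spacing in Hz, bin center frequencies) for a one-sided STFT."""
    n_freqs = n_fft // 2 + 1  # bins 0 (DC) .. n_fft // 2 (Nyquist)
    resolution_hz = sample_rate / n_fft  # spacing between adjacent bins
    centers = [k * resolution_hz for k in range(n_freqs)]
    return n_freqs, resolution_hz, centers


n_freqs, resolution_hz, centers = stft_bins(n_fft=400, sample_rate=16000)
print(n_freqs, resolution_hz, centers[-1])  # 201 40.0 8000.0
```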

Differential Revision: D46797481

Pulled By: mthrok

fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe

21 Jun, 2023 2 commits

Introduce chroma spectrogram transform (#3427) · 70968293

Jeff Hwang authored Jun 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3427

Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms.
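Conceptually, turning a linear-frequency spectrogram into a chromagram is a matrix product with a chroma filter bank that maps each frequency bin onto pitch classes. The pure-Python sketch below shows only that projection step; `spectrogram_to_chroma` is hypothetical, the layouts `(n_freqs, n_frames)` for the spectrogram and `(n_freqs, n_chroma)` for the filter bank are assumptions, and any normalization the actual transforms apply is omitted.

```python
from typing import List

Matrix = List[List[float]]


def spectrogram_to_chroma(spec: Matrix, fb: Matrix) -> Matrix:
    """Project a (n_freqs, n_frames) spectrogram onto a (n_freqs, n_chroma) filter bank.

    Returns a (n_chroma, n_frames) chromagram: chroma[c][t] = sum_f fb[f][c] * spec[f][t].
    """
    n_freqs, n_frames, n_chroma = len(spec), len(spec[0]), len(fb[0])
    if len(fb) != n_freqs:
        raise ValueError("filter bank and spectrogram disagree on n_freqs")
    return [
        [sum(fb[f][c] * spec[f][t] for f in range(n_freqs)) for t in range(n_frames)]
        for c in range(n_chroma)
    ]


# Toy example: 3 frequency bins, 2 frames, 2 pitch classes; bin 0 (DC) maps nowhere.
spec = [[1.0, 0.0], [2.0, 1.0], [3.0, 5.0]]
fb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(spectrogram_to_chroma(spec, fb))  # [[2.0, 1.0], [3.0, 5.0]]
```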

Reviewed By: mthrok

Differential Revision: D46547418

fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960

Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998

16 Jun, 2023 1 commit

Add LRS3 data preparation (#3421) · 77cdd160

Pingchuan Ma authored Jun 16, 2023

Summary:
This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.

This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.

Pull Request resolved: https://github.com/pytorch/audio/pull/3421

Reviewed By: mpc001

Differential Revision: D46799748

Pulled By: mthrok

fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9

15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix the backtrack visualization (the coordinate was off by one)
* Add a note about the simplification and the new align API
* Explicitly handle SOS and EOS
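The trellis-and-backtrack idea that the tutorial visualizes can be sketched in pure Python. This is a simplified version under stated assumptions: `emission` is a `(num_frames, num_classes)` list of log-probabilities, each frame either stays on the current token (scored with the blank emission) or advances to the next token, and SOS/EOS handling and repeated-token rules are left out. `build_trellis` and `backtrack` are hypothetical names, not the tutorial's code.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def build_trellis(emission: Matrix, tokens: List[int], blank: int = 0) -> Matrix:
    """trellis[t][j]: best log-score after t frames having emitted the first j tokens."""
    num_frames, num_tokens = len(emission), len(tokens)
    trellis = [[-math.inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        for j in range(num_tokens + 1):
            stay = trellis[t][j] + emission[t][blank]
            advance = trellis[t][j - 1] + emission[t][tokens[j - 1]] if j > 0 else -math.inf
            trellis[t + 1][j] = max(stay, advance)
    return trellis


def backtrack(
    trellis: Matrix, emission: Matrix, tokens: List[int], blank: int = 0
) -> List[Tuple[int, int]]:
    """Return (frame, token) pairs marking the frame at which each token is emitted."""
    t, j = len(trellis) - 1, len(tokens)
    if trellis[t][j] == -math.inf:
        raise ValueError("too few frames to align all tokens")
    path = []
    while j > 0:
        # Re-derive which transition produced trellis[t][j].
        stay = trellis[t - 1][j] + emission[t - 1][blank]
        advance = trellis[t - 1][j - 1] + emission[t - 1][tokens[j - 1]]
        if advance >= stay:
            path.append((t - 1, tokens[j - 1]))
            j -= 1
        t -= 1
    return path[::-1]


emission = [[-5.0, -0.1, -5.0], [-0.1, -5.0, -5.0], [-5.0, -5.0, -0.1], [-0.1, -5.0, -5.0]]
tokens = [1, 2]
trellis = build_trellis(emission, tokens)
print(backtrack(trellis, emission, tokens))  # [(0, 1), (2, 2)]
```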

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8

14 Jun, 2023 1 commit

Add resample option to AudioEffector (#3374) · 406e9c8d

moto authored Jun 14, 2023

Summary:
Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with any sample rate.
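To illustrate the option, here is a minimal pure-Python sketch in which the output rate defaults to the input rate and can otherwise be overridden. Linear interpolation stands in for FFmpeg's resampler, and `resample_linear` and its `output_sample_rate` parameter are hypothetical names used only for illustration.

```python
from typing import List, Optional


def resample_linear(
    samples: List[float], sample_rate: int, output_sample_rate: Optional[int] = None
) -> List[float]:
    """Resample by linear interpolation; keep the input rate when no output rate is given."""
    out_rate = sample_rate if output_sample_rate is None else output_sample_rate
    if out_rate == sample_rate or not samples:
        return list(samples)
    last = len(samples) - 1
    num_out = round(len(samples) * out_rate / sample_rate)
    out = []
    for i in range(num_out):
        pos = min(i * sample_rate / out_rate, last)  # input position, clamped to the last sample
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


print(resample_linear([0.0, 2.0, 4.0], sample_rate=2, output_sample_rate=4))
# [0.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```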

Pull Request resolved: https://github.com/pytorch/audio/pull/3374

Differential Revision: D46235358

Pulled By: mthrok

fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4

13 Jun, 2023 2 commits

[SoX/Flac] disable xmms_plugin dependency (#3436) · 58a51b5b

Kyle Finn authored Jun 13, 2023

Summary:
This plugin pulls in glib and gtk, which breaks the build on some headless systems.

Since the plugin is not actually used, it seems right to disable it.

This change fixed the build on my system.

Pull Request resolved: https://github.com/pytorch/audio/pull/3436

Differential Revision: D46683297

Pulled By: mthrok

fbshipit-source-id: 5b1c1eee1929f4a69a1cc6c7d7bb3ed998ec5872

Fix build doc (#3435) · 0f682c77

moto authored Jun 13, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3435

Reviewed By: nateanl

Differential Revision: D46659362

Pulled By: mthrok

fbshipit-source-id: ffa033ad6759de6fd958b63ac51a4a1153ffb45d

12 Jun, 2023 1 commit

feat: add guard in `lfilter` for a non-default cuda device (#3432) · c76d952e

Chin-Yun Yu authored Jun 12, 2023

Summary:
Should resolve https://github.com/pytorch/audio/issues/3425

cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/3432

Differential Revision: D46656180

Pulled By: mthrok

fbshipit-source-id: 5c534bee2f143ef5cb5e50ec74828012dbcab7e9

09 Jun, 2023 3 commits

Use torch/types.h where possible (#3422) · c5877157

moto authored Jun 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3422

Differential Revision: D46558184

Pulled By: mthrok

fbshipit-source-id: a775c4fb193496d9b2bf9db7bee186ee23512b99

Disable HF integration test (#3431) · f5d7635e

moto authored Jun 09, 2023

Summary:
The new version of transformers changed the format of the pre-trained weights. Fixing this is low priority for the maintenance team, so we disable the test.

See https://github.com/pytorch/audio/issues/3430

Pull Request resolved: https://github.com/pytorch/audio/pull/3431

Differential Revision: D46592883

Pulled By: mthrok

fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6

Fix the input pixel format when using GPU video encoder (#3426) · 30afaa9b

moto authored Jun 09, 2023

Summary:
StreamWriter's encoding pipeline looks like the following:

1. convert tensor to AVFrame
2. pass AVFrame to AVFilter
3. pass the resulting AVFrame to AVCodecContext (encoder) and AVFormatContext (muxer)

When dealing with CUDA tensors, the AVFilter becomes a no-op, as we have not added support for CUDA-compatible filters.

When a CUDA frame is passed, the existing implementation passes the software pixel format to AVFilter, which later issues a warning because the frames AVFilter actually sees are AV_PIX_FMT_CUDA.

Since the filter itself is a no-op, the pipeline still functions as expected, but this commit fixes the mismatch.
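The mismatch can be modeled with a toy pipeline. In this sketch all names are hypothetical and strings stand in for FFmpeg pixel formats, with `"cuda"` playing the role of AV_PIX_FMT_CUDA; the pass-through filter warns when the format it was configured with differs from the format of the frames it receives, and declaring `"cuda"` for CUDA input avoids the warning.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Frame:
    pix_fmt: str  # software format such as "yuv444p", or "cuda" for a hardware frame


def tensor_to_frame(sw_pix_fmt: str, on_cuda: bool) -> Frame:
    """Step 1: a CUDA tensor becomes a hardware frame whose format is "cuda"."""
    return Frame("cuda" if on_cuda else sw_pix_fmt)


def filter_input_format(sw_pix_fmt: str, on_cuda: bool, fixed: bool = True) -> str:
    """Format declared to the filter; the unfixed behavior always used the software format."""
    if fixed and on_cuda:
        return "cuda"
    return sw_pix_fmt


def passthrough_filter(frame: Frame, declared_fmt: str) -> Frame:
    """Step 2: a no-op filter that warns on a format mismatch, then passes the frame on."""
    if frame.pix_fmt != declared_fmt:
        warnings.warn(f"filter configured for {declared_fmt!r} but received {frame.pix_fmt!r}")
    return frame


frame = tensor_to_frame("yuv444p", on_cuda=True)
passthrough_filter(frame, filter_input_format("yuv444p", on_cuda=True, fixed=False))  # warns
passthrough_filter(frame, filter_input_format("yuv444p", on_cuda=True))  # no warning
```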

See https://github.com/pytorch/audio/issues/3317

Pull Request resolved: https://github.com/pytorch/audio/pull/3426

Differential Revision: D46562370

Pulled By: mthrok

fbshipit-source-id: ce0131f1e50bcc826ee036fc0f35db2a5162b660

08 Jun, 2023 8 commits

Introduce chroma filter bank function (#3395) · dfd0c5fd

Jeff Hwang authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3395

Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.
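A chroma filter bank maps each linear-frequency bin onto one of 12 pitch classes. The sketch below uses the simplest possible assignment, a hard 0/1 mapping of each bin to its nearest equal-tempered pitch class; this is an assumption for illustration, and `chroma_filterbank` in `torchaudio.prototype.functional` may use a different (for example, smoothed) weighting. `hard_chroma_filterbank` is a hypothetical name.

```python
import math
from typing import List


def hard_chroma_filterbank(
    sample_rate: int, n_fft: int, tuning_hz: float = 440.0
) -> List[List[float]]:
    """(n_fft // 2 + 1, 12) matrix assigning each STFT bin to its nearest pitch class (0 = C)."""
    n_freqs = n_fft // 2 + 1
    fb = [[0.0] * 12 for _ in range(n_freqs)]
    for k in range(1, n_freqs):  # skip the DC bin, which has no pitch
        freq = k * sample_rate / n_fft
        midi = 69 + 12 * math.log2(freq / tuning_hz)  # MIDI note 69 is A4 at tuning_hz
        fb[k][round(midi) % 12] = 1.0
    return fb


fb = hard_chroma_filterbank(sample_rate=8800, n_fft=20)  # bins are 440 Hz apart
print([row.index(1.0) for row in fb[1:4]])  # [9, 9, 4], i.e. A, A, E
```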

Reviewed By: mthrok

Differential Revision: D46307672

fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5

[Nova] Add cache ffmpeg before building #2 (#3423) · 25e96f42

atalman authored Jun 08, 2023

Summary:
[Nova] Add cache ffmpeg before building - 2
Follow-up to https://github.com/pytorch/audio/pull/3417: new arguments need to be passed to the test-infra workflows.

Pull Request resolved: https://github.com/pytorch/audio/pull/3423

Reviewed By: mthrok

Differential Revision: D46559344

Pulled By: atalman

fbshipit-source-id: fa5cccc3bfb052688de4a05cc3b4f37fcbe3a6f5

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
The StreamReader decoding process is composed of three steps:

1. Decode the incoming AVPacket into AVFrame
2. Pass AVFrame through AVFilter to perform post process
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, allowing the output stream shape to be retrieved.

For CPU decoders, this works fine because resizing happens in step 2, so the resulting shape can be retrieved.
However, this is problematic for GPU decoders, as resizing is currently done through a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit makes the conversion process behave similarly: it waits until the first frame arrives before finalizing the frame shape.

Fix https://github.com/pytorch/audio/issues/3405
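The "wait for the first frame" strategy can be sketched as a converter that leaves its output shape undefined until it sees real data. This is a hypothetical pure-Python model, not StreamReader's C++ code; rejecting later size changes is an assumption made to keep the example small.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # a frame as rows of pixel values


class LazyFrameConverter:
    """Defers fixing the output shape until the first decoded frame arrives."""

    def __init__(self) -> None:
        self.shape: Optional[Tuple[int, int]] = None  # unknown when the stream is defined

    def convert(self, frame: Image) -> Image:
        height = len(frame)
        width = len(frame[0]) if height else 0
        if self.shape is None:
            # First frame: the real size (e.g. after decoder-side resizing) is now known.
            self.shape = (height, width)
        elif (height, width) != self.shape:
            raise ValueError(f"frame size changed from {self.shape} to {(height, width)}")
        return [list(row) for row in frame]  # stand-in for the actual tensor conversion


converter = LazyFrameConverter()
print(converter.shape)  # None
converter.convert([[0, 1, 2], [3, 4, 5]])
print(converter.shape)  # (2, 3)
```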

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6

Remove CCI badge from README (#3420) · a7fea8a6

moto authored Jun 08, 2023

Summary:
CI jobs have been migrated from CircleCI (CCI) to GitHub Actions (GHA).

Pull Request resolved: https://github.com/pytorch/audio/pull/3420

Differential Revision: D46548562

Pulled By: mthrok

fbshipit-source-id: d7e17201e8b256efaa54543e445a0f139aa549b2

Clean up CI scripts (#3407) · f0803152

moto authored Jun 08, 2023

Summary:
- Move the unit test scripts from .circleci to .github
- Remove the Dockerfile for the unit test base
- Use the Conda from the Docker image in Linux jobs.

Remaining follow-up items

- Reuse the unit test script in the Linux GPU job, as is done in the Linux CPU job.

The unit test script needs to be fixed before it can be used for the Linux GPU job
in the new GHA workflow. This is kept as a separate follow-up work item.

Pull Request resolved: https://github.com/pytorch/audio/pull/3407

Differential Revision: D46498263

Pulled By: mthrok

fbshipit-source-id: d8256717a55bb4257151d819d3b2ebd453601eac

Optimize Torchaudio Vad (#3382) · 1e117f57

Kuba Rad authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3382

The voice activity detector function was unoptimized, confusingly written, and buggy.

The optimizations made here let the function run roughly 17x faster.
The main optimizations were looping over windows of audio rather than individual audio samples and reducing the number of copies.

There was also an off-by-one error: the array slice referenced was [1:16001] (for the default settings) instead of [0:16000].
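The window-at-a-time structure and the corrected slice bounds can be sketched as follows. This is a generic framing-and-energy example with a hypothetical `window_energies` helper, not the actual torchaudio `vad` implementation: each loop iteration handles a whole window instead of a single sample, and the first window is sliced as `[0:frame_len]`, so sample 0 is included.

```python
from typing import List


def window_energies(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Mean-square energy of each full window, computed per window rather than per sample."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        window = samples[start:start + frame_len]  # first window is [0:frame_len], not [1:frame_len + 1]
        energies.append(sum(x * x for x in window) / frame_len)
    return energies


print(window_energies([1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0], frame_len=4, hop=4))  # [0.25, 4.0]
```

With the off-by-one slice, the first window would have been `[0.0, 0.0, 0.0, 2.0]`, giving an energy of 1.0 instead of 0.25.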

Reviewed By: hwangjeff

Differential Revision: D44749359

fbshipit-source-id: c76c9412e70cdc6fcd527d113603c88f78480558

Merge all the lint/style checks to pre-commit hook (#3414) · c3ca2562

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3414

Differential Revision: D46536717

Pulled By: mthrok

fbshipit-source-id: 505bdcdd1b59ca9fe5afc2c8516a0a821e2b8d7e

[Nova] Add cache ffmpeg before building (#3417) · 5ca03f42

atalman authored Jun 07, 2023

Summary:
[Nova] Add cache ffmpeg before building

Pull Request resolved: https://github.com/pytorch/audio/pull/3417

Reviewed By: mthrok

Differential Revision: D46537892

Pulled By: atalman

fbshipit-source-id: 9f8dc0ecfc305c3b378557d46f89a5d7de67a165

07 Jun, 2023 2 commits

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

Make dlopen ffmpeg default off (#3418) · 91db978b

moto authored Jun 07, 2023

Summary:
To investigate https://github.com/pytorch/audio/issues/3411

Pull Request resolved: https://github.com/pytorch/audio/pull/3418

Differential Revision: D46535891

Pulled By: mthrok

fbshipit-source-id: b90bba399eb54f9f0ae073bd590cd8a46054ed7e

06 Jun, 2023 4 commits

Fix style issue (#3410) · 27aa52fb

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059

[Nova] Remove unused files (#3409) · 23e756af

atalman authored Jun 06, 2023

Summary:
We are using Project Nova workflows now, so these files are no longer required.

Same as: https://github.com/pytorch/vision/pull/7656

Pull Request resolved: https://github.com/pytorch/audio/pull/3409

Reviewed By: mthrok

Differential Revision: D46494331

Pulled By: atalman

fbshipit-source-id: a642ae55b75482918e0afb7c55dc876bc8356e70

Revert D46126226: Update forced_align method to only support batch Tensors · bbc13b9a

Moto Hira authored Jun 06, 2023

Differential Revision:
D46126226

Original commit changeset: 42cb52b19d91

Original Phabricator Diff: D46126226

fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6

Update forced_align method to only support batch Tensors (#3365) · 5f17d81c

Zhaoheng Ni authored Jun 06, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3365

Reviewed By: vineelpratap

Differential Revision: D46126226

fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2

05 Jun, 2023 1 commit

Clean-up ComputeKaldiPitch residue (#3403) · c076d1a8

moto authored Jun 05, 2023

Summary:
Follow-up to https://github.com/pytorch/audio/pull/3368

Remove files and lines that are no longer used.

Pull Request resolved: https://github.com/pytorch/audio/pull/3403

Differential Revision: D46441462

Pulled By: mthrok

fbshipit-source-id: 11b881ec4b24fa0d625c6aee9f4bd91f637f9923

04 Jun, 2023 1 commit

Update HuBERT/SSL training recipes to support Lightning 2.x (#3396) · e9083571

Zhaoheng Ni authored Jun 04, 2023

Summary:
There are some BC-breaking changes from the pytorch_lightning library to the lightning library. This PR adapts the recipes to those changes to support the latest lightning library.

Pull Request resolved: https://github.com/pytorch/audio/pull/3396

Reviewed By: mthrok

Differential Revision: D46345206

Pulled By: nateanl

fbshipit-source-id: 59469c15dc5fe5466a99a5b5380eb4f98c2c633f

03 Jun, 2023 1 commit

[audio][PR] Add option to dlopen FFmpeg libraries (#3402) · b7d3e89a

Moto Hira authored Jun 02, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3402

This is a second attempt at https://github.com/pytorch/audio/pull/3353.

The basic logic for dlopen-ing the FFmpeg libraries is the same.
It uses `at::DynamicLibrary`, which allows torchaudio to be compiled without
linking against the FFmpeg libraries.

This time, an option to enable this feature, DLOPEN_FFMPEG, has been added,
so that users have a way to disable it and keep using build-time
linking.

Please refer to stub.h for more technical details.

Differential Revision: D46403783

fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16

02 Jun, 2023 3 commits

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack that replaced Kaldi's base vector/matrix implementation with PyTorch Tensor so that there would be only one BLAS library within torchaudio.

Recently, we have been making torchaudio leaner, and since we don't see wide adoption of the kaldi_pitch feature, we decided to remove it.

See some of the discussion in https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

Update data augmentation tutorial (#3375) · 2ba36b47

moto authored Jun 02, 2023

Summary:
Replace sox_effects with `torchaudio.io.AudioEffector`

1. To showcase the new, improved feature
2. To prepare for the upcoming removal of file-like object support

Pull Request resolved: https://github.com/pytorch/audio/pull/3375

Reviewed By: nateanl

Differential Revision: D46379016

Pulled By: mthrok

fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315

Revert D46059199: [audio][PR] Use dlopen for FFmpeg · ab7a39f7

Moto Hira authored Jun 02, 2023

Differential Revision:
D46059199

Original commit changeset: 4493a5fd8a4c

Original Phabricator Diff: D46059199

fbshipit-source-id: 71cde3f8cd870d1ad9114e3e87cdd1ba564441c0

01 Jun, 2023 1 commit

Use dlopen for FFmpeg (#3353) · b14ced1a

moto authored Jun 01, 2023

Summary:
This commit changes the way the FFmpeg extension is built and used.
Instead of linking the (LGPL) FFmpeg libraries to torchaudio at build time,
it uses dlopen to search for and load them at run time.

For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides
a portable wrapper.
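A Python analogue of the same idea uses `ctypes` instead of `at::DynamicLibrary`: the library is searched for and loaded only at run time, so nothing links against FFmpeg at build time, and a missing library becomes a recoverable condition rather than a load failure. This is an illustrative sketch (`load_avutil_version` is hypothetical and is not how torchaudio loads FFmpeg); it relies only on libavutil's `avutil_version()` function.

```python
import ctypes
import ctypes.util
from typing import Optional


def load_avutil_version() -> Optional[int]:
    """Locate and load libavutil at run time; return its packed version, or None if unavailable."""
    path = ctypes.util.find_library("avutil")  # search the system at run time
    if path is None:
        return None  # FFmpeg not installed: the feature is simply unavailable
    try:
        lib = ctypes.CDLL(path)  # uses dlopen() on Linux/macOS
    except OSError:
        return None
    lib.avutil_version.restype = ctypes.c_uint  # declared as `unsigned avutil_version(void)`
    return int(lib.avutil_version())


version = load_avutil_version()
if version is None:
    print("libavutil not found")
else:
    # FFmpeg packs versions as (major << 16) | (minor << 8) | micro.
    print("libavutil", version >> 16, (version >> 8) & 0xFF, version & 0xFF)
```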

Pull Request resolved: https://github.com/pytorch/audio/pull/3353

Differential Revision: D46059199

Pulled By: mthrok

fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d