Commits · 627c37a9e7f98ae9f6d9ff981dcae0f83f77d731 · OpenDAS / Torchaudio

21 Jun, 2023 1 commit

Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998

627c37a9

16 Jun, 2023 1 commit

Add LRS3 data preparation (#3421) · 77cdd160

Pingchuan Ma authored Jun 16, 2023

Summary:
This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.

This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.

Pull Request resolved: https://github.com/pytorch/audio/pull/3421

Reviewed By: mpc001

Differential Revision: D46799748

Pulled By: mthrok

fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9

77cdd160

15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix backtrack visualization (the coordinate was off by one)
* Add note about the simplification and the new align API
* Explicitly handle SOS and EOS

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8

18601691

14 Jun, 2023 1 commit

Add resample option to AudioEffector (#3374) · 406e9c8d

moto authored Jun 14, 2023

Summary:
Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with any sample rate.

Pull Request resolved: https://github.com/pytorch/audio/pull/3374

Differential Revision: D46235358

Pulled By: mthrok

fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4

406e9c8d

13 Jun, 2023 2 commits

[SoX/Flac] disable xmms_plugin dependency (#3436) · 58a51b5b

Kyle Finn authored Jun 13, 2023

Summary:
This plugin pulls in glib and gtk, which breaks the build on some headless systems.

Since the plugin is not actually used, it seems right to disable it.

This change fixed the build on my system.

Pull Request resolved: https://github.com/pytorch/audio/pull/3436

Differential Revision: D46683297

Pulled By: mthrok

fbshipit-source-id: 5b1c1eee1929f4a69a1cc6c7d7bb3ed998ec5872

58a51b5b

Fix build doc (#3435) · 0f682c77

moto authored Jun 13, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3435

Reviewed By: nateanl

Differential Revision: D46659362

Pulled By: mthrok

fbshipit-source-id: ffa033ad6759de6fd958b63ac51a4a1153ffb45d

0f682c77

12 Jun, 2023 1 commit

feat: add guard in `lfilter` for a non-default cuda device (#3432) · c76d952e

Chin-Yun Yu authored Jun 12, 2023

Summary:
Should resolve https://github.com/pytorch/audio/issues/3425

cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/3432

Differential Revision: D46656180

Pulled By: mthrok

fbshipit-source-id: 5c534bee2f143ef5cb5e50ec74828012dbcab7e9

c76d952e

09 Jun, 2023 3 commits

Use torch/types.h where possible (#3422) · c5877157

moto authored Jun 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3422

Differential Revision: D46558184

Pulled By: mthrok

fbshipit-source-id: a775c4fb193496d9b2bf9db7bee186ee23512b99

c5877157

Disable HF integration test (#3431) · f5d7635e

moto authored Jun 09, 2023

Summary:
The new version of transformers changed the format of the pre-trained weights. Fixing it is low priority for the maintenance team, so we disable the test.

See https://github.com/pytorch/audio/issues/3430

Pull Request resolved: https://github.com/pytorch/audio/pull/3431

Differential Revision: D46592883

Pulled By: mthrok

fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6

f5d7635e

Fix the input pixel format when using GPU video encoder (#3426) · 30afaa9b

moto authored Jun 09, 2023

Summary:
StreamWriter's encoding pipeline looks like the following

1. convert tensor to AVFrame
2. pass AVFrame to AVFilter
3. pass the resulting AVFrame to AVCodecContext (encoder) and AVFormatContext (muxer)

When dealing with a CUDA tensor, the AVFilter becomes a no-op, as we have not added support for CUDA-compatible filters.

When a CUDA frame is passed, the existing solution passes the software pixel format to AVFilter, which later issues a warning because what AVFilter actually sees is AV_PIX_FMT_CUDA.

Since the filter itself is a no-op, encoding still works as expected, but this commit fixes the pixel format passed to AVFilter.

See https://github.com/pytorch/audio/issues/3317

Pull Request resolved: https://github.com/pytorch/audio/pull/3426

Differential Revision: D46562370

Pulled By: mthrok

fbshipit-source-id: ce0131f1e50bcc826ee036fc0f35db2a5162b660

30afaa9b

08 Jun, 2023 8 commits

Introduce chroma filter bank function (#3395) · dfd0c5fd

Jeff Hwang authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3395

Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.

Reviewed By: mthrok

Differential Revision: D46307672

fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5

dfd0c5fd

[Nova] Add cache ffmpeg before building #2 (#3423) · 25e96f42

atalman authored Jun 08, 2023

Summary:
[Nova] Add cache ffmpeg before building - 2
Follow up after https://github.com/pytorch/audio/pull/3417, need to pass new arguments to test-infra workflows

Pull Request resolved: https://github.com/pytorch/audio/pull/3423

Reviewed By: mthrok

Differential Revision: D46559344

Pulled By: atalman

fbshipit-source-id: fa5cccc3bfb052688de4a05cc3b4f37fcbe3a6f5

25e96f42

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
The StreamReader decoding process is composed of three steps:

1. Decode the incoming AVPacket into AVFrame
2. Pass AVFrame through AVFilter to perform post-processing
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time the output stream is defined, and the output stream shape can be retrieved.

For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved.
However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in to finalize the frame shape.
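
The idea of waiting for the first frame before fixing the shape can be sketched in plain Python. This is only an illustration under assumptions: `LazyFrameConverter` and its nested-list "frames" are hypothetical stand-ins, not the StreamReader implementation.

```python
from typing import List, Optional, Tuple


class LazyFrameConverter:
    """Hypothetical converter that learns the frame shape from the first frame."""

    def __init__(self) -> None:
        # The shape is unknown at construction time.
        self.shape: Optional[Tuple[int, int]] = None

    def convert(self, frame: List[List[int]]) -> List[int]:
        height, width = len(frame), len(frame[0])
        if self.shape is None:
            # First frame: finalize the output shape now.
            self.shape = (height, width)
        elif self.shape != (height, width):
            raise ValueError(
                f"expected frame of shape {self.shape}, got {(height, width)}"
            )
        # Flatten row-major; stands in for the real frame-to-tensor copy.
        return [pixel for row in frame for pixel in row]


converter = LazyFrameConverter()
flat = converter.convert([[1, 2, 3], [4, 5, 6]])
```

In this sketch the converter can be constructed before anything is known about the decoder output, which is the property the GPU path needs.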

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6

7dff24ca

Remove CCI badge from README (#3420) · a7fea8a6

moto authored Jun 08, 2023

Summary:
CI jobs are migrated from CCI to GHA

Pull Request resolved: https://github.com/pytorch/audio/pull/3420

Differential Revision: D46548562

Pulled By: mthrok

fbshipit-source-id: d7e17201e8b256efaa54543e445a0f139aa549b2

a7fea8a6

Clean up CI scripts (#3407) · f0803152

moto authored Jun 08, 2023

Summary:
- Move the unit test scripts from .circleci to .github
- Remove the Docker file for the unit test base
- Use the Conda from the Docker image in Linux jobs

Remaining follow-up items

- Reuse the unittest script in the Linux GPU job, as is done in the Linux CPU job.

The unit test script needs to be fixed to be used for Linux GPU job
in new GHA workflow. Keeping it as a separate follow-up work item.

Pull Request resolved: https://github.com/pytorch/audio/pull/3407

Differential Revision: D46498263

Pulled By: mthrok

fbshipit-source-id: d8256717a55bb4257151d819d3b2ebd453601eac

f0803152

Optimize Torchaudio Vad (#3382) · 1e117f57

Kuba Rad authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3382

The voice activity detector function was unoptimized, confusingly written, and buggy.

The optimizations created here allow for the function to run roughly 17x faster.
The main optimizations were to loop over windows of audio rather than individual audio samples. Reducing the number of copies also helped.

There was an off-by-one error where the array slice referenced was [1: 16001] (for the default settings) instead of [0: 16000].
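
The slice bug and the per-window iteration can be illustrated with a small list. This is a sketch with assumed values: the window length of 5 stands in for the 16000-sample default, and the non-overlapping window stepping is an assumption rather than the actual function's logic.

```python
samples = list(range(20))  # stand-in for an audio buffer
window = 5                 # stand-in for the 16000-sample default

# Off-by-one: skips sample 0 and reads one sample past the window.
buggy = samples[1 : window + 1]
# Corrected: covers exactly the first `window` samples.
fixed = samples[0:window]

# Step through the buffer one window at a time instead of one sample at a time.
windows = [samples[i : i + window] for i in range(0, len(samples), window)]
```

Iterating per window means the Python-level loop runs once per window rather than once per sample.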

Reviewed By: hwangjeff

Differential Revision: D44749359

fbshipit-source-id: c76c9412e70cdc6fcd527d113603c88f78480558

1e117f57

Merge all the lint/style checks to pre-commit hook (#3414) · c3ca2562

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3414

Differential Revision: D46536717

Pulled By: mthrok

fbshipit-source-id: 505bdcdd1b59ca9fe5afc2c8516a0a821e2b8d7e

c3ca2562

[Nova] Add cache ffmpeg before building (#3417) · 5ca03f42

atalman authored Jun 07, 2023

Summary:
[Nova] Add cache ffmpeg before building

Pull Request resolved: https://github.com/pytorch/audio/pull/3417

Reviewed By: mthrok

Differential Revision: D46537892

Pulled By: atalman

fbshipit-source-id: 9f8dc0ecfc305c3b378557d46f89a5d7de67a165

5ca03f42

07 Jun, 2023 2 commits

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

47716772

Make dlopen ffmpeg default off (#3418) · 91db978b

moto authored Jun 07, 2023

Summary:
To investigate https://github.com/pytorch/audio/issues/3411

Pull Request resolved: https://github.com/pytorch/audio/pull/3418

Differential Revision: D46535891

Pulled By: mthrok

fbshipit-source-id: b90bba399eb54f9f0ae073bd590cd8a46054ed7e

91db978b

06 Jun, 2023 4 commits

Fix style issue (#3410) · 27aa52fb

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059

27aa52fb

[Nova] Remove unused files (#3409) · 23e756af

atalman authored Jun 06, 2023

Summary:
We are using Project Nova workflows now. These are not required.

Same as: https://github.com/pytorch/vision/pull/7656

Pull Request resolved: https://github.com/pytorch/audio/pull/3409

Reviewed By: mthrok

Differential Revision: D46494331

Pulled By: atalman

fbshipit-source-id: a642ae55b75482918e0afb7c55dc876bc8356e70

23e756af

Revert D46126226: Update forced_align method to only support batch Tensors · bbc13b9a

Moto Hira authored Jun 06, 2023

Differential Revision:
D46126226

Original commit changeset: 42cb52b19d91

Original Phabricator Diff: D46126226

fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6

bbc13b9a

Update forced_align method to only support batch Tensors (#3365) · 5f17d81c

Zhaoheng Ni authored Jun 06, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3365

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).

Reviewed By: vineelpratap

Differential Revision: D46126226

fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2

5f17d81c

05 Jun, 2023 1 commit

Clean-up ComputeKaldiPitch residue (#3403) · c076d1a8

moto authored Jun 05, 2023

Summary:
Follow up of: https://github.com/pytorch/audio/pull/3368

Remove files and lines no longer used.

Pull Request resolved: https://github.com/pytorch/audio/pull/3403

Differential Revision: D46441462

Pulled By: mthrok

fbshipit-source-id: 11b881ec4b24fa0d625c6aee9f4bd91f637f9923

c076d1a8

04 Jun, 2023 1 commit

Update HuBERT/SSL training recipes to support Lightning 2.x (#3396) · e9083571

Zhaoheng Ni authored Jun 04, 2023

Summary:
There are some BC-breaking changes between the pytorch_lightning library and the lightning library. This PR adapts the recipes to those changes to support the latest lightning library.

Pull Request resolved: https://github.com/pytorch/audio/pull/3396

Reviewed By: mthrok

Differential Revision: D46345206

Pulled By: nateanl

fbshipit-source-id: 59469c15dc5fe5466a99a5b5380eb4f98c2c633f

e9083571

03 Jun, 2023 1 commit

[audio][PR] Add option to dlopen FFmpeg libraries (#3402) · b7d3e89a

Moto Hira authored Jun 02, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3402

This is a second attempt at https://github.com/pytorch/audio/pull/3353.

The basic logic to enable dlopen for FFmpeg libraries is the same.
It uses `at::DynamicLibrary`, which allows compiling torchaudio without
linking FFmpeg libraries.

This time, the option to enable this feature DLOPEN_FFMPEG has been added,
so that users have a way to disable this feature and keep using build-time
linking.

Please refer to stub.h for more technical detail.

Differential Revision: D46403783

fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16

b7d3e89a

02 Jun, 2023 3 commits

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio.

Recently, we have been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.

See some of the discussion in https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

5bbbb1d5

Update data augmentation tutorial (#3375) · 2ba36b47

moto authored Jun 02, 2023

Summary:
Replace sox_effects with `torchaudio.io.AudioEffector`

1. To showcase the new and better feature
2. To prepare for the upcoming removal of file-like object support

Pull Request resolved: https://github.com/pytorch/audio/pull/3375

Reviewed By: nateanl

Differential Revision: D46379016

Pulled By: mthrok

fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315

2ba36b47

Revert D46059199: [audio][PR] Use dlopen for FFmpeg · ab7a39f7

Moto Hira authored Jun 02, 2023

Differential Revision:
D46059199

Original commit changeset: 4493a5fd8a4c

Original Phabricator Diff: D46059199

fbshipit-source-id: 71cde3f8cd870d1ad9114e3e87cdd1ba564441c0

ab7a39f7

01 Jun, 2023 8 commits

Use dlopen for FFmpeg (#3353) · b14ced1a

moto authored Jun 01, 2023

Summary:
This commit changes the way the FFmpeg extension is built and used.
Instead of linking (LGPL) FFmpeg libraries to torchaudio at build time,
it uses dlopen to search for and link them at run time.

For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides
a portable wrapper.
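
The difference between build-time linking and run-time loading can be sketched from Python with the standard `ctypes` module. This is only an analogy under assumptions: the helper name and the library file names are illustrative, and the commit itself works in C++ through `at::DynamicLibrary`.

```python
import ctypes
from typing import Iterable, Optional


def load_first_available(names: Iterable[str]) -> Optional[ctypes.CDLL]:
    """Return a handle to the first library that loads, or None if none do."""
    for name in names:
        try:
            # Run-time load: dlopen on POSIX, LoadLibrary on Windows.
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


# Illustrative names only; nothing has to be present when this module is imported.
handle = load_first_available(
    ["libavcodec.so.60", "libavcodec.so.59", "libavcodec.so.58"]
)
ffmpeg_available = handle is not None
```

Because the lookup happens at run time, a missing library becomes a condition the program can check instead of a failure to build or import.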

Pull Request resolved: https://github.com/pytorch/audio/pull/3353

Differential Revision: D46059199

Pulled By: mthrok

fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d

b14ced1a

[BC-breaking] Remove file-like object support from sox_io backend (#3035) · bc54ac8a

moto authored Jun 01, 2023

Summary:
This commit removes file-like object support so that we can remove the custom patch.

The motivation and plan is outlined in https://github.com/pytorch/audio/issues/2950.

Pull Request resolved: https://github.com/pytorch/audio/pull/3035

Reviewed By: hwangjeff

Differential Revision: D44695647

Pulled By: mthrok

fbshipit-source-id: 13af0234e288c041bc7b490e1f967f85ce7eb8ec

bc54ac8a

[Nova] Deleting Remaining CircleCI jobs (#3399) · cc89f743

Omkar Salpekar authored Jun 01, 2023

Summary:
This job completely deletes the CircleCI `config.yml`. Here is what was remaining in the config at the point of deletion:

Used Jobs:
* **Lint** - Now running on Nova - see https://github.com/pytorch/audio/actions/runs/5144082942 for an example run on the latest PR in trunk
* **CircleCI Consistency** - No longer needed now that there is no CCI config.

Unused Jobs:
* **build-ffmpeg-$OS** - For the build jobs, we are already building FFMPEG from source as part of the Nova workflows.
* **download-third-parties** - This is caching. We currently do not have caching in Nova jobs, but atalman is working on adding support for this as a future optimization.

Pull Request resolved: https://github.com/pytorch/audio/pull/3399

Reviewed By: mthrok

Differential Revision: D46363921

Pulled By: osalpekar

fbshipit-source-id: 8abf5b0c1612c3492908fb2f5797e6b0a3c70766

cc89f743

Fix style issue (#3398) · c7ac1aff

moto authored Jun 01, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3398

Reviewed By: nateanl

Differential Revision: D46354862

Pulled By: mthrok

fbshipit-source-id: b86dcdfeff8ed9db87b0b78eca20f6f18117e97e

c7ac1aff

Fix apply_codec to use named file (#3397) · 1dfac469

moto authored Jun 01, 2023

Summary:
Follow-up to https://github.com/pytorch/audio/issues/3386. The intended change was to use the path of a temporary file instead of a file-like object.

Pull Request resolved: https://github.com/pytorch/audio/pull/3397

Reviewed By: hwangjeff

Differential Revision: D46346189

Pulled By: mthrok

fbshipit-source-id: 44da799c6587bcb63a118a6313b7299bad742a40

1dfac469

Refactor arg mapping in ffmpeg save function (#3387) · b99e5f46

moto authored May 31, 2023

Summary:
The arguments of TorchAudio's save function ("format", "bits_per_sample" and "encoding")
do not map one-to-one to the arguments of FFmpeg encoding.

For example, to use the vorbis codec, FFmpeg expects the "ogg" container/extension with the "vorbis"
encoder. It does not recognize the "vorbis" extension like TorchAudio (libsox) does.

This commit refactors the logic to parse/map the arguments.

As a result, it now works properly with the vorbis and mp3 extensions.
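
A mapping of this kind can be sketched as a small lookup. The table and the `resolve_format` helper below are hypothetical and only encode the vorbis case described above; they are not the refactored torchaudio code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: TorchAudio-style format name -> (FFmpeg container, encoder).
# An encoder of None means "leave the choice to the container default".
_FORMAT_MAP: Dict[str, Tuple[str, Optional[str]]] = {
    "vorbis": ("ogg", "vorbis"),
}


def resolve_format(fmt: str) -> Tuple[str, Optional[str]]:
    """Translate a user-facing format name into a (container, encoder) pair."""
    key = fmt.lower()
    # Formats without a special case pass through unchanged.
    return _FORMAT_MAP.get(key, (key, None))
```

For instance, `resolve_format("vorbis")` yields the "ogg" container with the "vorbis" encoder, while a format with no entry is returned as its own container name.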

Pull Request resolved: https://github.com/pytorch/audio/pull/3387

Reviewed By: hwangjeff

Differential Revision: D46328787

Pulled By: mthrok

fbshipit-source-id: 36f993952a062bfec58a8b51be6aa86297571f90

b99e5f46

Update and deprecate apply_codec function (#3386) · d6dd497c

moto authored May 31, 2023

Summary:
To prepare for the upcoming removal of file-like object support from the sox_io backend,
this commit changes the apply_codec function to use tempfile.

The `apply_codec` function is now deprecated and users are encouraged to use `torchaudio.io.AudioEffector`.
We will not remove the function itself, but will remove the entry from the doc.
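
The tempfile approach can be sketched with the standard library alone. This is a minimal illustration under assumptions: `round_trip_via_named_file` is a hypothetical helper, and the plain `open` calls stand in for path-based save and load calls.

```python
import os
import tempfile


def round_trip_via_named_file(payload: bytes, suffix: str = ".bin") -> bytes:
    """Write payload to a named temporary file, then read it back by path."""
    # delete=False so the file can be reopened by path on every platform.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        path = f.name
    try:
        with open(path, "wb") as out:  # stands in for a path-based save call
            out.write(payload)
        with open(path, "rb") as src:  # stands in for a path-based load call
            return src.read()
    finally:
        # Clean up the temporary file even if saving or loading fails.
        os.remove(path)


result = round_trip_via_named_file(b"\x00\x01\x02")
```

Passing a path rather than a file object keeps the backend free of any dependency on Python file-like objects.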

Pull Request resolved: https://github.com/pytorch/audio/pull/3386

Reviewed By: hwangjeff

Differential Revision: D46330610

Pulled By: mthrok

fbshipit-source-id: 3071bdefa05b4cbb9f00629bef50f0981eae89b4

d6dd497c

Delete CCI Linux and MacOS Unittest Jobs (#3391) · d5d94b7e

Omkar Salpekar authored May 31, 2023

Summary:
Deprecates the Linux and MacOS Unittest jobs now that they've been running on Nova for over a week.

Aside: There was also a stylecheck job that was dependent on the Linux Unittest job. I also put up https://github.com/pytorch/audio/pull/3390 to move that stylecheck job to Nova. I'm happy to reintroduce the CCI stylecheck job standalone in CCI if we want the Nova version to run on main for a week.

Pull Request resolved: https://github.com/pytorch/audio/pull/3391

Reviewed By: mthrok

Differential Revision: D46324198

Pulled By: osalpekar

fbshipit-source-id: 2115748e153c5dee1a38db2b6230acebc4f56927

d5d94b7e

31 May, 2023 2 commits

[Nova] Stylechecks on Nova (#3390) · f7cb6c68

Omkar Salpekar authored May 31, 2023

Summary:
Introducing the stylecheck job on Nova. It seems like it is failing on trunk, but the functionality of this job itself is working and it fails with the same error as it does on trunk with CCI.

Pull Request resolved: https://github.com/pytorch/audio/pull/3390

Reviewed By: mthrok

Differential Revision: D46324223

Pulled By: osalpekar

fbshipit-source-id: 1324202e53569d610559ef6f1b90cb5c364e6909

f7cb6c68

[Nova] Lint on GHA (#3341) · 5d0697bc

Omkar Salpekar authored May 31, 2023

Summary:
See title. If all is well, we can deprecate the CCI job in a few days.

Pull Request resolved: https://github.com/pytorch/audio/pull/3341

Reviewed By: mthrok

Differential Revision: D46324265

Pulled By: osalpekar

fbshipit-source-id: bc706c6ae4285d4085dc5f0223ea41d8fc290f1c

5d0697bc