Commits · 2e0dfafa4242d05aea0a3fd38dbca896c1cab119 · OpenDAS / Torchaudio

10 Aug, 2023 1 commit

Misc tutorial updates (#3546) · bc264256

moto authored Aug 10, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3546

Reviewed By: huangruizhe

Differential Revision: D48219274

Pulled By: mthrok

fbshipit-source-id: 6881f039bf70cf7240fbcfeb48443471ef457bd4


08 Aug, 2023 4 commits

Updating CTC FA tutorial (#3542) · eab8aa74

moto authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3542

Reviewed By: huangruizhe

Differential Revision: D48166025

Pulled By: mthrok

fbshipit-source-id: 29fee7dbf08394993972ec2967f94ce9fcb1c853


Add tutorial link to AVSR recipe (#3532) · f7ab406a

Pingchuan Ma authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3532

Reviewed By: mthrok

Differential Revision: D48165499

Pulled By: mpc001

fbshipit-source-id: c87b3361f0e6282684f218b32888df883d56682b


Adopt MMS_FA bundle in multilingual FA tutorials (#3534) · 19e9046a

moto authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3534

Reviewed By: huangruizhe

Differential Revision: D48155817

Pulled By: mthrok

fbshipit-source-id: a3d45fdfd360f9668063a3ecb3b00364290134c9


Librispeech RNNT recipe updates for PyTorch Lightning 2.0 (#3336) · e6c89731

Ruizhe (Ray) Huang authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3336

Reviewed By: mthrok

Differential Revision: D47846814

Pulled By: huangruizhe

fbshipit-source-id: dc12362bf243c52222dccadec3176e25e43dd652


04 Aug, 2023 1 commit

Update ctc forced alignment tutorial (#3529) · b645c07b

moto authored Aug 04, 2023

Summary:
- Simplify the step to generate token-level alignment

Pull Request resolved: https://github.com/pytorch/audio/pull/3529

Reviewed By: huangruizhe

Differential Revision: D48066787

Pulled By: mthrok

fbshipit-source-id: 452c243d278e508926a59894928e280fea76dcc6


01 Aug, 2023 1 commit

Add cuctc tutorial, change blank skip threshold into prob (#3297) · 732c94a3

Yuekai Zhang authored Aug 01, 2023

Summary:
Add a separate tutorial for cuctc.
Resolves https://github.com/pytorch/audio/issues/3096

Pull Request resolved: https://github.com/pytorch/audio/pull/3297

Reviewed By: huangruizhe

Differential Revision: D47928400

Pulled By: mthrok

fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec

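As a rough illustration of the "blank skip threshold into prob" change in #3297, the sketch below treats the threshold as a probability in (0, 1] and skips frames whose blank probability exceeds it. The helper name `frames_to_keep`, the threshold value 0.95, and the exact comparison are assumptions made for this example; the actual decoder logic is not shown in the commit message.

```python
import math

def frames_to_keep(blank_log_probs, blank_skip_threshold=0.95):
    """Return indices of frames whose blank probability does not exceed the threshold.

    Hypothetical helper: the threshold is given as a probability and converted
    to log space once, so it can be compared against log-probabilities.
    """
    log_threshold = math.log(blank_skip_threshold)
    return [t for t, lp in enumerate(blank_log_probs) if lp <= log_threshold]

# Blank probabilities for four frames (made-up values for illustration).
blank_probs = [0.99, 0.50, 0.97, 0.10]
blank_log_probs = [math.log(p) for p in blank_probs]

kept = frames_to_keep(blank_log_probs)
print(kept)  # [1, 3]
```

Expressing the threshold as a probability keeps the user-facing value easy to read, while the comparison itself can still be done in log space.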

31 Jul, 2023 2 commits

Migrate torch.norm to torch.linalg.vector_norm (#3522) · 8a2e12d3

moto authored Jul 31, 2023

Summary:
torch.norm is now deprecated.
The usages in torchaudio seem to be vector norms, so this commit replaces them with torch.linalg.vector_norm.

Resolves https://github.com/pytorch/audio/issues/3484

Pull Request resolved: https://github.com/pytorch/audio/pull/3522

Reviewed By: huangruizhe

Differential Revision: D47926659

Pulled By: mthrok

fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084

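A minimal sketch of the kind of replacement described in #3522, assuming a plain 2-norm along the last dimension. The tensor values are made up for illustration, and the call sites in torchaudio itself are not reproduced here.

```python
import torch

x = torch.tensor([[3.0, 4.0], [6.0, 8.0]])

# Deprecated spelling, shown only for comparison:
#   torch.norm(x, p=2, dim=-1)

# Replacement: torch.linalg.vector_norm uses `ord` where torch.norm used `p`.
new = torch.linalg.vector_norm(x, ord=2, dim=-1)
print(new)  # tensor([ 5., 10.])

# Without `dim`, vector_norm flattens the input and returns one scalar norm.
total = torch.linalg.vector_norm(x)
```

Unlike `torch.norm`, `torch.linalg.vector_norm` always computes a vector norm, so a 2D input is never interpreted as a matrix.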

Set and tweak global matplotlib configuration in tutorials (#3515) · 84b12306

moto authored Jul 31, 2023

Summary:
- Set global matplotlib rc params
- Fix style check
- Fix and update FA tutorial plots
- Add av-asr index cards

Pull Request resolved: https://github.com/pytorch/audio/pull/3515

Reviewed By: huangruizhe

Differential Revision: D47894156

Pulled By: mthrok

fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e


29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the `_compat` module were placed there so that
everything related to FFmpeg lived in torchaudio.io and the FFmpeg library
initialization could be carried out in `torchaudio.io.__init__`.

Now that this constraint is removed (all the initialization happens
at `torchaudio._extension.__init__`) and `_compat` is only used by the
FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f


28 Jul, 2023 2 commits

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models and the corresponding factory functions and pre-trained pipelines out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8


Add real-time av-asr tutorial (#3511) · d6aeaa74

Pingchuan Ma authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511

Reviewed By: mthrok

Differential Revision: D47852108

Pulled By: mpc001

fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa


26 Jul, 2023 1 commit

av-asr: move video loading outside detector (#3498) · c977afe0

Pingchuan Ma authored Jul 26, 2023

Summary:
This PR moves video loading outside the detector during pre-processing.

Pull Request resolved: https://github.com/pytorch/audio/pull/3498

Reviewed By: mthrok

Differential Revision: D47811044

Pulled By: mpc001

fbshipit-source-id: f17839b695b13d3cf2d9db343d7e9a0202eea7d5


25 Jul, 2023 2 commits

Update avsr recipe (#3493) · d4644793

Pingchuan Ma authored Jul 25, 2023

Summary:
This PR includes a few changes to the AV-ASR recipe: better results, a faster face detector (MediaPipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/3493

Reviewed By: mthrok

Differential Revision: D47758072

Pulled By: mpc001

fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc


Update nvdec/nvenc tutorials (#3483) · 56e22664

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483

Differential Revision: D47725664

Pulled By: mthrok

fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b


24 Jul, 2023 1 commit

Move examples/asr/avsr_rnnt to examples/avsr folder (#3489) · 66f661df

Pingchuan Ma authored Jul 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489

Reviewed By: mthrok

Differential Revision: D47726448

Pulled By: mpc001

fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841


18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2


15 Jul, 2023 1 commit

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg versions 4, 5, and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9


05 Jul, 2023 1 commit

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

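For callers that still hold unbatched inputs, the shape change in #3433 amounts to adding a leading batch dimension. The sketch below shows one way to do that with plain `torch` calls. The helper `to_batch`, the tensor sizes, and the token indices are hypothetical and chosen only for illustration; the call to `forced_align` itself is left out because its full signature is not given in the commit message.

```python
import torch

def to_batch(log_probs: torch.Tensor, targets: torch.Tensor):
    """Add a leading batch dimension to unbatched inputs (hypothetical helper)."""
    if log_probs.dim() == 2:
        # (time, num_classes) -> (1, time, num_classes)
        log_probs = log_probs.unsqueeze(0)
    if targets.dim() == 1:
        # (target_length,) -> (1, target_length)
        targets = targets.unsqueeze(0)
    return log_probs, targets

# Made-up emission: 50 frames over 29 classes, normalized to log-probabilities.
log_probs = torch.randn(50, 29).log_softmax(dim=-1)
# Made-up target token indices.
targets = torch.tensor([7, 4, 11, 11, 14])

batched_log_probs, batched_targets = to_batch(log_probs, targets)
print(batched_log_probs.shape, batched_targets.shape)
# torch.Size([1, 50, 29]) torch.Size([1, 5])
```

The batched tensors are the form that would then be passed to `forced_align` after this change.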

28 Jun, 2023 1 commit

Follow up on tutorial update (#3449) · 4a121aa5

moto authored Jun 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449

Differential Revision: D47094402

Pulled By: mthrok

fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622


26 Jun, 2023 1 commit

Add more explanation about `n_fft` (#3442) · 105b77fe

moto authored Jun 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442

Differential Revision: D46797481

Pulled By: mthrok

fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe


21 Jun, 2023 1 commit

Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998


16 Jun, 2023 1 commit

Add LRS3 data preparation (#3421) · 77cdd160

Pingchuan Ma authored Jun 16, 2023

Summary:
This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.

This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.

Pull Request resolved: https://github.com/pytorch/audio/pull/3421

Reviewed By: mpc001

Differential Revision: D46799748

Pulled By: mthrok

fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9


15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix backtrack visualization (the coordinate was off by one)
* Add note about the simplification and the new align API
* Explicitly handle SOS and EOS

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8


07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b


06 Jun, 2023 1 commit

Fix style issue (#3410) · 27aa52fb

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059


04 Jun, 2023 1 commit

Update HuBERT/SSL training recipes to support Lightning 2.x (#3396) · e9083571

Zhaoheng Ni authored Jun 04, 2023

Summary:
There are some BC-breaking changes between the pytorch_lightning and lightning libraries. This PR adapts the recipes to those changes to support the latest lightning library.

Pull Request resolved: https://github.com/pytorch/audio/pull/3396

Reviewed By: mthrok

Differential Revision: D46345206

Pulled By: nateanl

fbshipit-source-id: 59469c15dc5fe5466a99a5b5380eb4f98c2c633f


02 Jun, 2023 2 commits

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack that replaces Kaldi's base vector/matrix implementation with PyTorch Tensor so that there is only one BLAS library within torchaudio.

Recently we have been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.

See some of the discussion at https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e


Update data augmentation tutorial (#3375) · 2ba36b47

moto authored Jun 02, 2023

Summary:
Replace sox_effects with `torchaudio.io.AudioEffector`

1. To showcase the new and better feature
2. To prepare for the upcoming removal of file-like object support

Pull Request resolved: https://github.com/pytorch/audio/pull/3375

Reviewed By: nateanl

Differential Revision: D46379016

Pulled By: mthrok

fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315


31 May, 2023 1 commit

Fixes to #3295 Improve RNN-T streaming decoding (#3379) · b8016e44

Jeff Hwang authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3379

Fixes `RNNTBeamSearch.infer`'s docstring and removes unused import from tutorial.

Reviewed By: mthrok

Differential Revision: D46227174

fbshipit-source-id: 7c1c3f05a6476cb0437622dea6f3ae6cb3ea9468


26 May, 2023 2 commits

Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9

atalman authored May 26, 2023

Summary:
This reverts commit d38a7854.

This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.

Pull Request resolved: https://github.com/pytorch/audio/pull/3377

Reviewed By: mthrok

Differential Revision: D46230498

Pulled By: atalman

fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d


Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality:
1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
2. The decoder can now receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
3. Some minor errors in the shape checking for length are fixed.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

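The state flow described in #3295 can be pictured with a toy beam search. This is a simplified sketch, not torchaudio's `RNNTBeamSearch`: the function `decode_chunk`, the per-chunk score dictionaries, and the token names are all hypothetical. It only shows two points from the summary: an empty starting hypothesis is created only when none is provided, and the top-K hypotheses (not just the best one) are carried into the next chunk, so the best hypothesis always holds the whole transcript so far.

```python
from typing import Dict, List, Tuple

# (tokens so far, cumulative log score)
Hypothesis = Tuple[Tuple[str, ...], float]

def decode_chunk(
    chunk_scores: Dict[str, float],
    hypotheses: List[Hypothesis],
    beam_width: int,
) -> List[Hypothesis]:
    """Extend each incoming hypothesis with each candidate token; keep the best `beam_width`."""
    if not hypotheses:
        # Start from an empty hypothesis only when no initial hypotheses are given.
        hypotheses = [((), 0.0)]
    extended = [
        (tokens + (tok,), score + tok_score)
        for tokens, score in hypotheses
        for tok, tok_score in chunk_scores.items()
    ]
    extended.sort(key=lambda h: h[1], reverse=True)
    return extended[:beam_width]

# Made-up log scores for two consecutive audio chunks.
chunks = [{"a": -0.1, "b": -2.0}, {"c": -0.3, "d": -1.0}]

state: List[Hypothesis] = []
for scores in chunks:
    # All surviving hypotheses are fed back in, not only state[0].
    state = decode_chunk(scores, state, beam_width=2)
    print(" ".join(state[0][0]))
# prints "a", then "a c"
```

Because each hypothesis accumulates its tokens across chunks, the printed result at every step is the full transcript up to that point rather than only the newly decoded part.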

25 May, 2023 1 commit

Add LRS3 AV-ASR recipe (#3278) · c6624fa6

Pingchuan Ma authored May 25, 2023

Summary:
This PR adds an AV-ASR recipe that contains sample implementations of training and evaluation pipelines for RNNT-based automatic, visual, and audio-visual (ASR, VSR, AV-ASR) speech recognition models on LRS3. It includes both streaming and non-streaming modes.

CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff

Pull Request resolved: https://github.com/pytorch/audio/pull/3278

Reviewed By: nateanl

Differential Revision: D46121550

Pulled By: mpc001

fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6


23 May, 2023 1 commit

[audio] add CTC forced alignment API tutorial to torchaudio (#3356) · f046f7e3

Xiaohui Zhang authored May 22, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3356

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: mthrok

Differential Revision: D46060238

fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec


21 May, 2023 2 commits

Revert D45960556: add CTC forced alignment API tutorial to torchaudio · f9b4f74f

Moto Hira authored May 20, 2023

Differential Revision: D45960556

Original commit changeset: 93f2271f7130

Original Phabricator Diff: D45960556

fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a


add CTC forced alignment API tutorial to torchaudio (#3351) · 93adc3e4

Xiaohui Zhang authored May 20, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3351

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: vineelpratap, nateanl

Differential Revision: D45960556

fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa


16 May, 2023 1 commit

Upgrade to FFmpeg5 (#3298) · d38a7854

moto authored May 16, 2023

Summary:
This commit upgrades the version of FFmpeg that the TorchAudio binary distribution is compiled against to 5.0.4.

FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5.
Conda-forge lists 5.1 for all the platforms TorchAudio supports. https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298

Reviewed By: hwangjeff

Differential Revision: D45865599

Pulled By: mthrok

fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b


10 May, 2023 2 commits

Add AudioEffector tutorial (#3226) · 2ab49e5b

moto authored May 09, 2023

Summary:
https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3226

Reviewed By: nateanl

Differential Revision: D45402724

Pulled By: mthrok

fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262


Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit is preparation for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 is landed to accommodate the change.

Since it is necessary to mention the changes related to migration in the IO tutorial,
I also updated the IO documentation to include migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133


05 May, 2023 1 commit

Update squim tutorial (#3313) · 05ef7dc6

Zhaoheng Ni authored May 05, 2023

Summary:
Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.

Pull Request resolved: https://github.com/pytorch/audio/pull/3313

Reviewed By: hwangjeff

Differential Revision: D45620311

Pulled By: nateanl

fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
