Commits · ac63c4545e8f55fb7c79aa0e1668f204dc0f27e4 · OpenDAS / Torchaudio

19 Sep, 2023 1 commit

Fix doc nightly doc CI (#3611) · ac63c454

moto authored Sep 19, 2023

Some changes in matplotlib 3.8.0 reject torch.Tensor objects passed to the `plot` function.

ac63c454
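A minimal sketch of one way to work around this kind of rejection is to convert tensor-like data before handing it to matplotlib. The helper name `to_plottable` is hypothetical and not taken from the commit; it relies only on the `detach`, `cpu`, and `numpy` methods that `torch.Tensor` exposes.

```python
def to_plottable(data):
    """Convert a tensor-like object into something matplotlib accepts.

    Hypothetical helper, not part of torchaudio. Objects exposing
    ``detach``/``cpu``/``numpy`` (as ``torch.Tensor`` does) are converted
    to a NumPy array; anything else is returned unchanged.
    """
    if all(hasattr(data, name) for name in ("detach", "cpu", "numpy")):
        # Drop autograd tracking, move to host memory, then convert.
        return data.detach().cpu().numpy()
    return data
```

In a tutorial this would be used as `axis.plot(to_plottable(waveform[0]))`, where `axis` and `waveform` stand for whatever the tutorial already defines.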

13 Sep, 2023 1 commit

Update README.md (#3609) · 4bbf65e4

moto authored Sep 13, 2023

4bbf65e4

08 Sep, 2023 1 commit

Simplify training step in av-asr recipe (#3598) · 5e893d6f

Pingchuan Ma authored Sep 08, 2023

* Simplify training step in av-asr recipe
* Run pre-commit

5e893d6f

04 Sep, 2023 2 commits

Fix decoder call in Device ASR/AVSR tutorials (#3572) · 7d37f69c

hwangjeff authored Sep 04, 2023

Summary:
Fixes decoder calls and related code in Device ASR/AVSR tutorials to account for changes to RNN-T decoder introduced in https://github.com/pytorch/audio/issues/3295.

Pull Request resolved: https://github.com/pytorch/audio/pull/3572

Reviewed By: mthrok

Differential Revision: D48629428

Pulled By: hwangjeff

fbshipit-source-id: 63ede307fb4412aa28f88972d56dca8405607b7a

7d37f69c

Add incremental decoding support to CTC decoder (#3594) · 6fbc1e68

moto authored Sep 04, 2023

Summary:
Add incremental decoding support to CTC decoder.

Resolves https://github.com/pytorch/audio/issues/3574

Pull Request resolved: https://github.com/pytorch/audio/pull/3594

Reviewed By: nateanl

Differential Revision: D48940584

Pulled By: mthrok

fbshipit-source-id: 31871614008cf197cf3900f7183ec6cff34d2905

6fbc1e68

21 Aug, 2023 1 commit

Fix style (#3569) · 3318bcec

moto authored Aug 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3569

Reviewed By: huangruizhe

Differential Revision: D48508244

Pulled By: mthrok

fbshipit-source-id: 6e14267e2dbdf08ea3c25a1dab480cb0e908e0c3

3318bcec

20 Aug, 2023 1 commit

Add detail about CTC peaky behavior (#3566) · a25bcb6b

moto authored Aug 20, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3566

Reviewed By: huangruizhe

Differential Revision: D48499338

Pulled By: mthrok

fbshipit-source-id: 7f837e1a1f8116d7d82411607c91628b729077d8

a25bcb6b

15 Aug, 2023 1 commit

[BC-breaking] Update pre-built ffmpeg4 to 4.4.4 (#3561) · bf07ea6b

moto authored Aug 15, 2023

Summary:
In https://github.com/pytorch/audio/pull/3460, we switched the build process for the FFmpeg extension.
Since installing FFmpeg is complicated in some environments, pre-built binaries and their headers
are downloaded at build time and used as scaffolding for the torchaudio build.

Even though we did not change any code or the FFmpeg version, it turned out that this causes a segmentation
fault on Ubuntu when using the system Python and FFmpeg 4.4 installed via aptitude.
While investigating the issue, I swapped the pre-built FFmpeg scaffolding with FFmpeg 4.4 from aptitude,
and the segmentation fault did not happen. This indicates that it is a binary compatibility issue.

Before https://github.com/pytorch/audio/issues/3460, each binary build job built FFmpeg 4.1.8 using the same compiler used to build torchaudio,
but after https://github.com/pytorch/audio/issues/3460 the environments used to build FFmpeg 4.1.8 and torchaudio differ. My hypothesis is that
this difference causes some ABI incompatibility when linking against FFmpeg 4.4. (Also, I don't remember well,
but I read somewhere that 4.4 has a different ABI.)

Experiments showed that upgrading the pre-built FFmpeg scaffolding to 4.4 resolves this,
so this commit upgrades the pre-built FFmpeg 4 to 4.4.
The potential (yet unconfirmed) downside is that torchaudio will no longer work with 4.1, 4.2, and 4.3.
Since FFmpeg 4.4 is what Ubuntu 20.04 and 22.04 support by default, and Google Colab is also on 20.04,
I think it is more important to support 4.4.

Therefore we drop support for 4.1-4.3 from the normal build (and official distributions). Those who wish to
use 4.1-4.3 can build torchaudio from source, linking against the specific FFmpeg version.

Pull Request resolved: https://github.com/pytorch/audio/pull/3561

Reviewed By: hwangjeff

Differential Revision: D48340201

Pulled By: mthrok

fbshipit-source-id: 7ece82910f290c7cf83f58311c4cf6a384e8795e

bf07ea6b
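As a rough illustration of the support change described in the commit above, the sketch below parses the first line of `ffmpeg -version` output and flags FFmpeg 4.1-4.3 as versions that would need a source build. Both function names are hypothetical, the banner format is an assumption based on typical release builds, and the commit itself calls the 4.1-4.3 breakage unconfirmed.

```python
import re
from typing import Optional, Tuple


def parse_ffmpeg_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from an `ffmpeg -version` banner line.

    Returns None when the banner carries no numeric release version
    (for example, some development builds).
    """
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def ffmpeg4_needs_source_build(version: Tuple[int, int]) -> bool:
    """True for FFmpeg 4.1-4.3, which the commit drops from the normal build."""
    major, minor = version
    return major == 4 and minor < 4
```

The banner string can be obtained with `subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True).stdout`, provided `ffmpeg` is on the `PATH`.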

10 Aug, 2023 1 commit

Misc tutorial updates (#3546) · bc264256

moto authored Aug 10, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3546

Reviewed By: huangruizhe

Differential Revision: D48219274

Pulled By: mthrok

fbshipit-source-id: 6881f039bf70cf7240fbcfeb48443471ef457bd4

bc264256

08 Aug, 2023 4 commits

Updating CTC FA tutorial (#3542) · eab8aa74

moto authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3542

Reviewed By: huangruizhe

Differential Revision: D48166025

Pulled By: mthrok

fbshipit-source-id: 29fee7dbf08394993972ec2967f94ce9fcb1c853

eab8aa74

Add tutorial link to AVSR recipe (#3532) · f7ab406a

Pingchuan Ma authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3532

Reviewed By: mthrok

Differential Revision: D48165499

Pulled By: mpc001

fbshipit-source-id: c87b3361f0e6282684f218b32888df883d56682b

f7ab406a

Adopt MMS_FA bundle in multilingual FA tutorials (#3534) · 19e9046a

moto authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3534

Reviewed By: huangruizhe

Differential Revision: D48155817

Pulled By: mthrok

fbshipit-source-id: a3d45fdfd360f9668063a3ecb3b00364290134c9

19e9046a

Librispeech RNNT recipe updates for PyTorch Lightning 2.0 (#3336) · e6c89731

Ruizhe (Ray) Huang authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3336

Reviewed By: mthrok

Differential Revision: D47846814

Pulled By: huangruizhe

fbshipit-source-id: dc12362bf243c52222dccadec3176e25e43dd652

e6c89731

04 Aug, 2023 1 commit

Update ctc forced alignment tutorial (#3529) · b645c07b

moto authored Aug 04, 2023

Summary:
- Simplify the step to generate token-level alignment

Pull Request resolved: https://github.com/pytorch/audio/pull/3529

Reviewed By: huangruizhe

Differential Revision: D48066787

Pulled By: mthrok

fbshipit-source-id: 452c243d278e508926a59894928e280fea76dcc6

b645c07b

01 Aug, 2023 1 commit

Add cuctc tutorial, change blank skip threshold into prob (#3297) · 732c94a3

Yuekai Zhang authored Aug 01, 2023

Summary:
Add a separate tutorial for cuctc.
Resolve https://github.com/pytorch/audio/issues/3096

Pull Request resolved: https://github.com/pytorch/audio/pull/3297

Reviewed By: huangruizhe

Differential Revision: D47928400

Pulled By: mthrok

fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec

732c94a3

31 Jul, 2023 2 commits

Migrate torch.norm to torch.linalg.vector_norm (#3522) · 8a2e12d3

moto authored Jul 31, 2023

Summary:
torch.norm is now deprecated.
The usages in torchaudio appear to be vector norms, so this replaces them with torch.linalg.vector_norm.

Resolves https://github.com/pytorch/audio/issues/3484

Pull Request resolved: https://github.com/pytorch/audio/pull/3522

Reviewed By: huangruizhe

Differential Revision: D47926659

Pulled By: mthrok

fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084

8a2e12d3
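For reference, a vector norm in the sense used in the commit above treats all selected elements as one flat vector. The definitions below are the standard ones. How each former `torch.norm` call maps onto the `ord` and `dim` arguments of `torch.linalg.vector_norm` is not stated in the commit and would need to be checked per call site.

```latex
\[
\lVert x \rVert_p = \Bigl( \sum_{i} \lvert x_i \rvert^{p} \Bigr)^{1/p},
\qquad
\lVert A \rVert_F = \sqrt{\sum_{i,j} \lvert A_{ij} \rvert^{2}}
                  = \lVert \operatorname{vec}(A) \rVert_2 .
\]
```

The second identity shows why a Frobenius norm over a whole matrix can be expressed as a 2-norm of its flattened entries.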

Set and tweak global matplotlib configuration in tutorials (#3515) · 84b12306

moto authored Jul 31, 2023

Summary:
- Set global matplotlib rc params
- Fix style check
- Fix and update FA tutorial plots
- Add av-asr index cards

Pull Request resolved: https://github.com/pytorch/audio/pull/3515

Reviewed By: huangruizhe

Differential Revision: D47894156

Pulled By: mthrok

fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e

84b12306

29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the `_compat` module were introduced there so that
everything related to FFmpeg is in torchaudio.io and FFmpeg library
initialization can be carried out in `torchaudio.io.__init__`.

Now that this constraint is removed (all the initialization happens
at `torchaudio._extension.__init__`) and `_compat` is only used by
the FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f

8497ee91

28 Jul, 2023 2 commits

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models and the corresponding factory functions and pre-trained pipelines out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

b7d2d928

Add real-time av-asr tutorial (#3511) · d6aeaa74

Pingchuan Ma authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511

Reviewed By: mthrok

Differential Revision: D47852108

Pulled By: mpc001

fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa

d6aeaa74

26 Jul, 2023 1 commit

av-asr: move video loading outside detector (#3498) · c977afe0

Pingchuan Ma authored Jul 26, 2023

Summary:
This PR moves video loading outside detector during pre-processing.

Pull Request resolved: https://github.com/pytorch/audio/pull/3498

Reviewed By: mthrok

Differential Revision: D47811044

Pulled By: mpc001

fbshipit-source-id: f17839b695b13d3cf2d9db343d7e9a0202eea7d5

c977afe0

25 Jul, 2023 2 commits

Update avsr recipe (#3493) · d4644793

Pingchuan Ma authored Jul 25, 2023

Summary:
This PR includes a few changes to the AV-ASR recipe: better results, a faster face detector (Mediapipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/3493

Reviewed By: mthrok

Differential Revision: D47758072

Pulled By: mpc001

fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc

d4644793

Update nvdec/nvenc tutorials (#3483) · 56e22664

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483

Differential Revision: D47725664

Pulled By: mthrok

fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b

56e22664

24 Jul, 2023 1 commit

Move examples/asr/avsr_rnnt to examples/avsr folder (#3489) · 66f661df

Pingchuan Ma authored Jul 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489

Reviewed By: mthrok

Differential Revision: D47726448

Pulled By: mpc001

fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841

66f661df

18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2

63244623

15 Jul, 2023 1 commit

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg version 4, 5 and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9

5a809aa0

05 Jul, 2023 1 commit

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to support only batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

cc164478

28 Jun, 2023 1 commit

Follow up on tutorial update (#3449) · 4a121aa5

moto authored Jun 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449

Differential Revision: D47094402

Pulled By: mthrok

fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622

4a121aa5

26 Jun, 2023 1 commit

Add more explanation about `n_fft` (#3442) · 105b77fe

moto authored Jun 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442

Differential Revision: D46797481

Pulled By: mthrok

fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe

105b77fe

21 Jun, 2023 1 commit

Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998

627c37a9

16 Jun, 2023 1 commit

Add LRS3 data preparation (#3421) · 77cdd160

Pingchuan Ma authored Jun 16, 2023

Summary:
This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.

This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.

Pull Request resolved: https://github.com/pytorch/audio/pull/3421

Reviewed By: mpc001

Differential Revision: D46799748

Pulled By: mthrok

fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9

77cdd160

15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix backtrack visualization (the coordinate was off by one)
* Add note about the simplification and the new align API
* Explicitly handle SOS and EOS

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8

18601691

07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

47716772

06 Jun, 2023 1 commit

Fix style issue (#3410) · 27aa52fb

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059

27aa52fb

04 Jun, 2023 1 commit

Update HuBERT/SSL training recipes to support Lightning 2.x (#3396) · e9083571

Zhaoheng Ni authored Jun 04, 2023

Summary:
There are some BC-breaking changes between the pytorch_lightning and lightning libraries. This PR adapts the recipes to those changes to support the latest lightning library.

Pull Request resolved: https://github.com/pytorch/audio/pull/3396

Reviewed By: mthrok

Differential Revision: D46345206

Pulled By: nateanl

fbshipit-source-id: 59469c15dc5fe5466a99a5b5380eb4f98c2c633f

e9083571

02 Jun, 2023 2 commits

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack that replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio.

Recently, we have been making torchaudio leaner, and we don't see wide adoption of the kaldi_pitch feature, so we decided to remove it.

See some of the discussion at https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

5bbbb1d5

Update data augmentation tutorial (#3375) · 2ba36b47

moto authored Jun 02, 2023

Summary:
Replace sox_effects with `torchaudio.io.AudioEffector`

1. To showcase the new and better feature
2. To prepare for the upcoming removal of file-like object support

Pull Request resolved: https://github.com/pytorch/audio/pull/3375

Reviewed By: nateanl

Differential Revision: D46379016

Pulled By: mthrok

fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315

2ba36b47

31 May, 2023 1 commit

Fixes to #3295 Improve RNN-T streaming decoding (#3379) · b8016e44

Jeff Hwang authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3379

Fixes `RNNTBeamSearch.infer`'s docstring and removes unused import from tutorial.

Reviewed By: mthrok

Differential Revision: D46227174

fbshipit-source-id: 7c1c3f05a6476cb0437622dea6f3ae6cb3ea9468

b8016e44

26 May, 2023 2 commits

Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9

atalman authored May 26, 2023

Summary:
This reverts commit d38a7854.

This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.

Pull Request resolved: https://github.com/pytorch/audio/pull/3377

Reviewed By: mthrok

Differential Revision: D46230498

Pulled By: atalman

fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d

37779ef9

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality:
1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
2. Allows the decoder to receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa
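Because the decoder output described in the commit above is now the entire transcript up to the current time step, a caller that wants only the newly emitted text has to derive it. The sketch below is one hedged way to do that on plain strings. The helper name `transcript_delta` is hypothetical and not part of torchaudio, and it assumes the hypotheses have already been converted to text.

```python
import os
from typing import Tuple


def transcript_delta(previous: str, current: str) -> Tuple[int, str]:
    """Compare two cumulative transcripts.

    Returns (number of trailing characters of `previous` to erase,
    text to append afterwards). Erasure is needed because a later
    hypothesis may revise the tail of an earlier one.
    """
    common = os.path.commonprefix([previous, current])
    return len(previous) - len(common), current[len(common):]
```

A streaming display would erase the reported number of characters and then write the appended text after each decoding step.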