Commits · bf07ea6baf7a61ca0979715a5e27b26b6272c7d2 · OpenDAS / Torchaudio

15 Aug, 2023 1 commit

[BC-breaking] Update pre-built ffmpeg4 to 4.4.4 (#3561) · bf07ea6b

moto authored Aug 15, 2023

Summary:
In https://github.com/pytorch/audio/pull/3460, we switched the build process for the FFmpeg extension.
Since it is complicated to install FFmpeg in some environments, pre-built binaries and their headers
are downloaded at build time and used as scaffolding for the torchaudio build.

Even though we did not change any code or the FFmpeg version, it turned out that this causes a segmentation
fault on Ubuntu when using system Python and FFmpeg 4.4 installed via aptitude.
While investigating the issue, I swapped the pre-built FFmpeg scaffolding with FFmpeg 4.4 from aptitude,
and the segmentation fault did not happen. This indicates that it is a binary compatibility issue.

Before https://github.com/pytorch/audio/issues/3460, each binary build job was building FFmpeg 4.1.8 using the same compiler used to build torchaudio,
but after https://github.com/pytorch/audio/issues/3460 the environments to build FFmpeg 4.1.8 and torchaudio are different. My hypothesis is that
this difference is causing some ABI incompatibility when linking against FFmpeg 4.4. (Also, I don't remember well,
but I read somewhere that 4.4 has a different ABI)

Through experiments, it turned out that upgrading the pre-built FFmpeg scaffolding to 4.4 resolves this.
So this commit upgrades the pre-built FFmpeg 4 to 4.4.
The potential (yet unconfirmed) downside is that torchaudio will no longer work with 4.1, 4.2, and 4.3.
Since FFmpeg 4.4 is what Ubuntu 20.04 and 22.04 support by default, and Google Colab is also on 20.04,
I think it is more important to support 4.4.

Therefore we drop support for 4.1-4.3 from the normal build (and official distributions). Those who wish to
use 4.1-4.3 can build torchaudio from source by linking against a specific FFmpeg version.
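
To check which side of this change an installation falls on, the version can be read from the first line of `ffmpeg -version`. The following is a minimal sketch, not part of torchaudio: the helper names `parse_ffmpeg_version` and `in_dropped_range` are hypothetical, the banner string is an illustrative sample, and the 4.1-4.3 range is taken from the message above, which itself describes the incompatibility as potential and unconfirmed.

```python
import re

# Hypothetical helpers for illustration; they are not torchaudio APIs.
_VERSION_RE = re.compile(r"ffmpeg version n?(\d+)\.(\d+)")


def parse_ffmpeg_version(banner):
    """Extract (major, minor) from the first line of `ffmpeg -version`."""
    match = _VERSION_RE.search(banner)
    if match is None:
        raise ValueError(f"cannot find an FFmpeg version in: {banner!r}")
    return int(match.group(1)), int(match.group(2))


def in_dropped_range(version):
    """True for FFmpeg 4.1-4.3, which the normal build no longer targets."""
    major, minor = version
    return major == 4 and 1 <= minor <= 3


# Sample banner (assumed format, as printed by a distribution package).
banner = "ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers"
version = parse_ffmpeg_version(banner)
print(version, "needs a source build:", in_dropped_range(version))
```

A version in the dropped range only suggests that a source build may be needed; it does not prove that the official binaries fail.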

Pull Request resolved: https://github.com/pytorch/audio/pull/3561

Reviewed By: hwangjeff

Differential Revision: D48340201

Pulled By: mthrok

fbshipit-source-id: 7ece82910f290c7cf83f58311c4cf6a384e8795e

bf07ea6b

10 Aug, 2023 1 commit

Misc tutorial updates (#3546) · bc264256

moto authored Aug 10, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3546

Reviewed By: huangruizhe

Differential Revision: D48219274

Pulled By: mthrok

fbshipit-source-id: 6881f039bf70cf7240fbcfeb48443471ef457bd4

bc264256

08 Aug, 2023 2 commits

Updating CTC FA tutorial (#3542) · eab8aa74

moto authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3542

Reviewed By: huangruizhe

Differential Revision: D48166025

Pulled By: mthrok

fbshipit-source-id: 29fee7dbf08394993972ec2967f94ce9fcb1c853

eab8aa74

Adopt MMS_FA bundle in multilingual FA tutorials (#3534) · 19e9046a

moto authored Aug 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3534

Reviewed By: huangruizhe

Differential Revision: D48155817

Pulled By: mthrok

fbshipit-source-id: a3d45fdfd360f9668063a3ecb3b00364290134c9

19e9046a

04 Aug, 2023 1 commit

Update ctc forced alignment tutorial (#3529) · b645c07b

moto authored Aug 04, 2023

Summary:
- Simplify the step to generate token-level alignment

Pull Request resolved: https://github.com/pytorch/audio/pull/3529

Reviewed By: huangruizhe

Differential Revision: D48066787

Pulled By: mthrok

fbshipit-source-id: 452c243d278e508926a59894928e280fea76dcc6

b645c07b

01 Aug, 2023 1 commit

Add cuctc tutorial, change blank skip threshold into prob (#3297) · 732c94a3

Yuekai Zhang authored Aug 01, 2023

Summary:
Add a separate tutorial for cuctc.
Resolve https://github.com/pytorch/audio/issues/3096

Pull Request resolved: https://github.com/pytorch/audio/pull/3297

Reviewed By: huangruizhe

Differential Revision: D47928400

Pulled By: mthrok

fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec

732c94a3

31 Jul, 2023 2 commits

Migrate torch.norm to torch.linalg.vector_norm (#3522) · 8a2e12d3

moto authored Jul 31, 2023

Summary:
torch.norm is now deprecated.
The usages in torchaudio seem to be vector norms, so this commit replaces them with torch.linalg.vector_norm.

Resolves https://github.com/pytorch/audio/issues/3484
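
As an illustration of this kind of migration (a sketch with made-up data, not the actual diff of this commit), a 2-norm taken along the last dimension can be written with `torch.linalg.vector_norm` as follows:

```python
import torch

x = torch.tensor([[3.0, 4.0], [6.0, 8.0]])

# Deprecated spelling: torch.norm(x, p=2, dim=-1)
# Replacement: an explicit vector norm along the last dimension.
norms = torch.linalg.vector_norm(x, ord=2, dim=-1)
print(norms)

# Without `dim`, the input is flattened and treated as a single vector.
total = torch.linalg.vector_norm(x)
print(total)
```

Note that the keyword changes from `p` to `ord` in the new function.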

Pull Request resolved: https://github.com/pytorch/audio/pull/3522

Reviewed By: huangruizhe

Differential Revision: D47926659

Pulled By: mthrok

fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084

8a2e12d3

Set and tweak global matplotlib configuration in tutorials (#3515) · 84b12306

moto authored Jul 31, 2023

Summary:
- Set global matplotlib rc params
- Fix style check
- Fix and update FA tutorial plots
- Add av-asr index cards

Pull Request resolved: https://github.com/pytorch/audio/pull/3515

Reviewed By: huangruizhe

Differential Revision: D47894156

Pulled By: mthrok

fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e

84b12306

29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the _compat module were introduced there so that
everything related to FFmpeg is in torchaudio.io and FFmpeg library
initialization can be carried out in `torchaudio.io.__init__`.

Now that this constraint is removed (all the initialization happens
at `torchaudio._extension.__init__`) and `_compat` is only used by
the FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f

8497ee91

28 Jul, 2023 2 commits

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
The PR moves the `SquimObjective` and `SquimSubjective` models and the corresponding factory functions and pre-trained pipelines out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

b7d2d928

Add real-time av-asr tutorial (#3511) · d6aeaa74

Pingchuan Ma authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511

Reviewed By: mthrok

Differential Revision: D47852108

Pulled By: mpc001

fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa

d6aeaa74

25 Jul, 2023 1 commit

Update nvdec/nvenc tutorials (#3483) · 56e22664

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483

Differential Revision: D47725664

Pulled By: mthrok

fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b

56e22664

18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2

63244623

15 Jul, 2023 1 commit

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg versions 4, 5 and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9

5a809aa0

05 Jul, 2023 1 commit

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, the PR changes it to only support batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).
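
For callers that hold single-utterance inputs, the shape change amounts to adding a batch dimension. The sketch below assumes the batch dimension comes first, which the message above does not state explicitly. The sizes and variable names are illustrative, and the call to forced_align itself is left out.

```python
import torch

num_frames, num_classes = 6, 4

# Old-style inputs: 2D log-probabilities and 1D targets for one utterance.
log_probs_2d = torch.randn(num_frames, num_classes).log_softmax(dim=-1)
targets_1d = torch.tensor([1, 2, 3])

# Batched inputs: add a leading batch dimension of size 1 (assumed layout).
log_probs = log_probs_2d.unsqueeze(0)  # (batch, frames, classes)
targets = targets_1d.unsqueeze(0)      # (batch, tokens)

print(log_probs.shape, targets.shape)
```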

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

cc164478

28 Jun, 2023 1 commit

Follow up on tutorial update (#3449) · 4a121aa5

moto authored Jun 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449

Differential Revision: D47094402

Pulled By: mthrok

fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622

4a121aa5

26 Jun, 2023 1 commit

Add more explanation about `n_fft` (#3442) · 105b77fe

moto authored Jun 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442

Differential Revision: D46797481

Pulled By: mthrok

fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe

105b77fe

21 Jun, 2023 1 commit

Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998

627c37a9

15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix backtrack visualization (the coordinate was off by one).
* Add note about the simplification and the new align API
* Explicitly handle SOS and EOS

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8

18601691

07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

47716772

02 Jun, 2023 2 commits

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio.

Recently, we have been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.

See some of the discussion https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

5bbbb1d5

Update data augmentation tutorial (#3375) · 2ba36b47

moto authored Jun 02, 2023

Summary:
Replace sox_effects with `torchaudio.io.AudioEffector`

1. To showcase the new and better feature
2. To prepare for the upcoming removal of file-like object support

Pull Request resolved: https://github.com/pytorch/audio/pull/3375

Reviewed By: nateanl

Differential Revision: D46379016

Pulled By: mthrok

fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315

2ba36b47

31 May, 2023 1 commit

Fixes to #3295 Improve RNN-T streaming decoding (#3379) · b8016e44

Jeff Hwang authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3379

Fixes `RNNTBeamSearch.infer`'s docstring and removes unused import from tutorial.

Reviewed By: mthrok

Differential Revision: D46227174

fbshipit-source-id: 7c1c3f05a6476cb0437622dea6f3ae6cb3ea9468

b8016e44

26 May, 2023 2 commits

Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9

atalman authored May 26, 2023

Summary:
This reverts commit d38a7854.

This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.

Pull Request resolved: https://github.com/pytorch/audio/pull/3377

Reviewed By: mthrok

Differential Revision: D46230498

Pulled By: atalman

fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d

37779ef9

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality:
1. The `init_b` hypothesis is only regenerated from blank token if no initial hypotheses are provided.
2. Allows the decoder to receive top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.
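
Callers that relied on incremental output can recover it by comparing consecutive cumulative transcripts. The following is a minimal sketch under that assumption. The helper `split_update` and the sample strings are hypothetical and are not part of `RNNTBeamSearch`.

```python
def split_update(previous, current):
    """Return (kept, added): the shared prefix length and the new suffix."""
    kept = 0
    for old_char, new_char in zip(previous, current):
        if old_char != new_char:
            break
        kept += 1
    return kept, current[kept:]


# Cumulative transcripts as a streaming decoder might emit them (made-up data).
outputs = ["hello", "hello wor", "hello world"]

previous = ""
deltas = []
for current in outputs:
    kept, added = split_update(previous, current)
    deltas.append(added)
    previous = current

print(deltas)
```

Because the top hypothesis can change between steps, `kept` may be shorter than the previous transcript, in which case the caller has to discard the text after position `kept` before appending `added`.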

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa

23 May, 2023 1 commit

[audio] add CTC forced alignment API tutorial to torchaudio (#3356) · f046f7e3

Xiaohui Zhang authored May 22, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3356

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: mthrok

Differential Revision: D46060238

fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec

f046f7e3

21 May, 2023 2 commits

Revert D45960556: add CTC forced alignment API tutorial to torchaudio · f9b4f74f

Moto Hira authored May 20, 2023

Differential Revision:
D45960556

Original commit changeset: 93f2271f7130

Original Phabricator Diff: D45960556

fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a

f9b4f74f

add CTC forced alignment API tutorial to torchaudio (#3351) · 93adc3e4

Xiaohui Zhang authored May 20, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3351

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: vineelpratap, nateanl

Differential Revision: D45960556

fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa

93adc3e4

16 May, 2023 1 commit

Upgrade to FFmpeg5 (#3298) · d38a7854

moto authored May 16, 2023

Summary:
This commit upgrades the version of FFmpeg that the TorchAudio binary distribution is compiled against to 5.0.4.

FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5.
Conda-forge lists 5.1 for all the platforms TorchAudio supports. https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298

Reviewed By: hwangjeff

Differential Revision: D45865599

Pulled By: mthrok

fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b

d38a7854

10 May, 2023 2 commits

Add AudioEffector tutorial (#3226) · 2ab49e5b

moto authored May 09, 2023

Summary:
https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3226

Reviewed By: nateanl

Differential Revision: D45402724

Pulled By: mthrok

fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262

2ab49e5b

Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit is preparation for landing dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 is landed to accommodate the change.

Since it is necessary to mention the changes related to migration in the IO tutorial,
I also update the IO documentation to include migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133

667c6a9e

05 May, 2023 1 commit

Update squim tutorial (#3313) · 05ef7dc6

Zhaoheng Ni authored May 05, 2023

Summary:
Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.

Pull Request resolved: https://github.com/pytorch/audio/pull/3313

Reviewed By: hwangjeff

Differential Revision: D45620311

Pulled By: nateanl

fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557

05ef7dc6

29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903

9b93e7df

31 Mar, 2023 1 commit

Fix typo in forced alignment tutorial (#3222) · fda41bbf

Nouran Ali authored Mar 31, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3222

Reviewed By: nateanl

Differential Revision: D44539424

Pulled By: mthrok

fbshipit-source-id: 8fbcb5f9918c9930c939bcd448493fa5cf604545

fda41bbf

29 Mar, 2023 1 commit

Remove the note about AAC (#3214) · c07a96ab

moto authored Mar 29, 2023

Summary:
There is a part of the StreamWriter tutorial that warns about corrupted AAC audio output, but this is no longer relevant, so this commit deletes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/3214

Reviewed By: nateanl

Differential Revision: D44504030

Pulled By: mthrok

fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef

c07a96ab

28 Mar, 2023 1 commit

Fix typo in audio resampling tutorial (#3212) · 0cd4e391

nateanl authored Mar 28, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3211

Pull Request resolved: https://github.com/pytorch/audio/pull/3212

Reviewed By: mthrok

Differential Revision: D44472523

Pulled By: nateanl

fbshipit-source-id: eb519b0045e7518ad13863a53271745a80d89a21

0cd4e391

16 Mar, 2023 1 commit

Fix initialization of `get_trellis`. (#3172) · a6b34a5d

jiyuntu-eero authored Mar 16, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3166. In the `get_trellis` method, the index of the blank symbol is assumed to be 0. It should be `blank_id` instead.
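
To show why the blank index matters, here is a simplified, pure-Python trellis sketch. It is not the tutorial's implementation: the recurrence, the list-based layout and the sample numbers are assumptions made for illustration. The point is only that every blank lookup goes through `blank_id` rather than a hard-coded 0.

```python
import math


def get_trellis(emission, tokens, blank_id=0):
    """emission[t][c] is the log-probability of class c at frame t."""
    num_frames, num_tokens = len(emission), len(tokens)
    # trellis[t][j]: best score after t frames with the first j tokens emitted.
    trellis = [[-math.inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        for j in range(num_tokens + 1):
            # Stay at the same token count by emitting the blank symbol.
            stay = trellis[t][j] + emission[t][blank_id]
            # Advance by emitting the next token of the transcript.
            if j > 0:
                advance = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            else:
                advance = -math.inf
            trellis[t + 1][j] = max(stay, advance)
    return trellis


# Two frames, three classes; the blank symbol is class 2 in this example.
emission = [[-1.0, -2.0, -0.5], [-0.3, -2.0, -1.5]]
correct = get_trellis(emission, tokens=[0], blank_id=2)
wrong = get_trellis(emission, tokens=[0], blank_id=0)
print(correct[-1][-1], wrong[-1][-1])
```

With a vocabulary whose blank is not class 0, the two calls give different final scores, which is the kind of discrepancy the fix addresses.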

Pull Request resolved: https://github.com/pytorch/audio/pull/3172

Reviewed By: mthrok

Differential Revision: D44090889

Pulled By: nateanl

fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264

a6b34a5d

02 Mar, 2023 1 commit

Fix doc build (#3125) · 1ed38095

moto authored Mar 01, 2023

Summary:
Fix build_doc job

https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8

- build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL.
- Fix bash cell syntax in HW tutorial
- Fix C++ doc
- Fix duplicated target name in streamwriter tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/3125

Reviewed By: xiaohui-zhang

Differential Revision: D43724078

Pulled By: mthrok

fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c

1ed38095

15 Feb, 2023 1 commit

Update data augmentation tutorial to use new operators (#3062) · b9ef69d1

hwangjeff authored Feb 15, 2023

Summary:
Updates tutorial "Audio Data Augmentation" to use two of the newly introduced data augmentation operators in beta: `torchaudio.functional.fftconvolve` and `torchaudio.functional.add_noise`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3062

Reviewed By: mthrok

Differential Revision: D43298120

Pulled By: hwangjeff

fbshipit-source-id: 09ca736a5c67242568515d600b7d31eab32c2df1

b9ef69d1

30 Jan, 2023 1 commit

Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627

Yan Li authored Jan 30, 2023

Summary:
Currently there will be a few errors when this tutorial is run with a CUDA device.

The reasons are:
- The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
- When further analyzing and displaying the output audio, we need to move it back from the GPU to the CPU. This is because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).
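
The two points above can be sketched as follows. This is an illustrative snippet rather than the tutorial code: the tensor contents and variable names are made up, and it falls back to the CPU when no CUDA device is present.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
waveform = torch.zeros(2, 16000)

waveform.to(device)             # returns a tensor; the result is discarded here
waveform = waveform.to(device)  # rebind the name to the returned tensor

# Move results back to the CPU before CPU-only analysis or plotting.
result = waveform.cpu()
print(waveform.device.type, result.device.type)
```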

Pull Request resolved: https://github.com/pytorch/audio/pull/3017

Reviewed By: mthrok

Differential Revision: D42828526

Pulled By: nateanl

fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762

da9d1627