Commits · 2e0dfafa4242d05aea0a3fd38dbca896c1cab119 · OpenDAS / Torchaudio

14 Aug, 2023 1 commit

Update I/O and backend docs (#3555) · c0f25f21

moto authored Aug 14, 2023

Summary:
* Merge backend doc into torchaudio toplevel doc
* Update backend, dispatcher, installation doc

Pull Request resolved: https://github.com/pytorch/audio/pull/3555

Reviewed By: huangruizhe

Differential Revision: D48326812

Pulled By: mthrok

fbshipit-source-id: cc0d7326eacfebd341323b5d613ca1777255748b


11 Aug, 2023 1 commit

Expose AudioMetadata (#3556) · 9467fc44

moto authored Aug 11, 2023

Summary:
`torchaudio.info` returns `AudioMetaData`. It should be exposed as a public API, without referring to the `backend` submodule.

Pull Request resolved: https://github.com/pytorch/audio/pull/3556

Reviewed By: huangruizhe

Differential Revision: D48267349

Pulled By: mthrok

fbshipit-source-id: 6ccc0c32bf62fbdcb71495fc7d8d4cc29891538a


10 Aug, 2023 1 commit

Add Frechet distance function (#3545) · 06301c0a

Jeff Hwang authored Aug 10, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3545

Adds function for computing the Fréchet distance between two multivariate normal distributions.
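
For context, the squared Fréchet (2-Wasserstein) distance between N(mu_x, Sigma_x) and N(mu_y, Sigma_y) is commonly written as ||mu_x - mu_y||^2 + Tr(Sigma_x + Sigma_y - 2 (Sigma_x Sigma_y)^(1/2)). The sketch below is not the function added by this commit. It assumes diagonal covariances, where the matrix square root reduces to an element-wise one, and the name `frechet_distance_diag` is hypothetical.

```python
import math
from typing import Sequence


def frechet_distance_diag(
    mu_x: Sequence[float],
    var_x: Sequence[float],
    mu_y: Sequence[float],
    var_y: Sequence[float],
) -> float:
    """Squared Frechet distance between two Gaussians with DIAGONAL covariance.

    Assumption: `var_x` and `var_y` hold the diagonal entries (variances) of
    the covariance matrices. The general case needs a matrix square root.
    """
    if not (len(mu_x) == len(var_x) == len(mu_y) == len(var_y)):
        raise ValueError("all inputs must have the same length")
    # ||mu_x - mu_y||^2
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))
    # Tr(Sigma_x + Sigma_y - 2 (Sigma_x Sigma_y)^(1/2)) for diagonal matrices
    cov_term = sum(vx + vy - 2.0 * math.sqrt(vx * vy) for vx, vy in zip(var_x, var_y))
    return mean_term + cov_term
```

Two identical distributions give zero, and shifting only the mean gives the squared Euclidean distance between the means.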

Reviewed By: mthrok

Differential Revision: D48126102

fbshipit-source-id: e4e122b831e1e752037c03f5baa9451e81ef1697


07 Aug, 2023 2 commits

Add MMS FA Bundle (#3521) · 5e211d66

moto authored Aug 07, 2023

Summary:
Port the MMS FA model from the tutorial to the library, together with a post-processing module.

Pull Request resolved: https://github.com/pytorch/audio/pull/3521

Reviewed By: huangruizhe

Differential Revision: D48038285

Pulled By: mthrok

fbshipit-source-id: 571cf0fceaaab4790983be2719f1a85805b814f5


Add merge_tokens / TokenSpan (#3535) · 30668afb

moto authored Aug 07, 2023

Summary:
This commit adds the `merge_tokens` function, which removes repeated tokens from the CTC token sequences returned by `forced_align`.

Resolving repeated tokens is a necessary and almost universal step, so it makes sense to have such a helper function in torchaudio.
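
The idea of resolving repeated tokens can be sketched in plain Python as follows. This is a simplified illustration and not the torchaudio implementation: the `Span` class and `merge_repeated` function are hypothetical names, scores are omitted, and the blank token is assumed to have index 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    token: int  # token index
    start: int  # first frame of the run (inclusive)
    end: int    # one past the last frame of the run (exclusive)


def merge_repeated(frames: List[int], blank: int = 0) -> List[Span]:
    """Collapse runs of identical frame-level tokens and drop blank runs."""
    spans: List[Span] = []
    start = 0
    for i in range(1, len(frames) + 1):
        # A run ends at the end of the sequence or when the token changes.
        if i == len(frames) or frames[i] != frames[start]:
            if frames[start] != blank:
                spans.append(Span(frames[start], start, i))
            start = i
    return spans
```

For example, the frame sequence `[0, 1, 1, 0, 2, 2, 2, 0, 0, 2]` yields three spans: token 1 over frames 1 to 3, token 2 over frames 4 to 7, and token 2 again over frames 9 to 10 (end exclusive).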

Pull Request resolved: https://github.com/pytorch/audio/pull/3535

Reviewed By: huangruizhe

Differential Revision: D48111202

Pulled By: mthrok

fbshipit-source-id: 25354bfa210aa5c03f8c1d3e201f253ca3761b24


03 Aug, 2023 1 commit

Refactor wav2vec2 pipeline misc helper functions (#3527) · 09aabcc1

moto authored Aug 02, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3527

Reviewed By: huangruizhe

Differential Revision: D48008822

Pulled By: mthrok

fbshipit-source-id: 4beae2956dfd1f00534832b70a1bf0897cba7812


01 Aug, 2023 2 commits

Add cuctc tutorial, change blank skip threshold into prob (#3297) · 732c94a3

Yuekai Zhang authored Aug 01, 2023

Summary:
Add a separate tutorial for cuctc.
Resolve https://github.com/pytorch/audio/issues/3096

Pull Request resolved: https://github.com/pytorch/audio/pull/3297

Reviewed By: huangruizhe

Differential Revision: D47928400

Pulled By: mthrok

fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec


Add pretrained VGGish inference pipeline (#3491) · cbfde17b

hwangjeff authored Jul 31, 2023

Summary:
Adds pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3491

Reviewed By: mthrok

Differential Revision: D47738130

Pulled By: hwangjeff

fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67


31 Jul, 2023 1 commit

Set and tweak global matplotlib configuration in tutorials (#3515) · 84b12306

moto authored Jul 31, 2023

Summary:
- Set global matplotlib rc params
- Fix style check
- Fix and update FA tutorial plots
- Add av-asr index cards

Pull Request resolved: https://github.com/pytorch/audio/pull/3515

Reviewed By: huangruizhe

Differential Revision: D47894156

Pulled By: mthrok

fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e


28 Jul, 2023 3 commits

Update documentation about dependencies (#3517) · a051985f

moto authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3517

Reviewed By: huangruizhe

Differential Revision: D47858452

Pulled By: mthrok

fbshipit-source-id: 62ee6c8bb2669dd70f8ca25703a04dc8a9d19aec


Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models, and the corresponding factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8


Add real-time av-asr tutorial (#3511) · d6aeaa74

Pingchuan Ma authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511

Reviewed By: mthrok

Differential Revision: D47852108

Pulled By: mpc001

fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa


27 Jul, 2023 1 commit

Replace libsox with stub library (#3497) · 8588fba1

moto authored Jul 27, 2023

Summary:
This commit updates the way libsox is integrated into torchaudio.

1. We stop statically linking libsox, so torchaudio will not ship libsox.
2. We link libsox dynamically. Users are expected to install libsox by themselves.
3. We use a stub library to build torchaudio.
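
Since libsox is no longer shipped, a user may want to check whether a shared libsox is discoverable on their system. The snippet below is a generic sketch using only the Python standard library. It does not reproduce how torchaudio itself locates libsox, and the helper name `shared_library_available` is hypothetical.

```python
import ctypes.util


def shared_library_available(name: str = "sox") -> bool:
    """Return True if the platform's library search can locate `name`.

    `ctypes.util.find_library` returns None when nothing is found.
    """
    return ctypes.util.find_library(name) is not None


if __name__ == "__main__":
    # The result depends on whether libsox is installed on this machine.
    print("libsox found:", shared_library_available("sox"))
```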

Pull Request resolved: https://github.com/pytorch/audio/pull/3497

Differential Revision: D47803706

Pulled By: mthrok

fbshipit-source-id: 31b05495d81069186fa52d67beea360cc7e817a8


25 Jul, 2023 2 commits

Update nvdec/nvenc tutorials (#3483) · 56e22664

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483

Differential Revision: D47725664

Pulled By: mthrok

fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b


Update AV-ASR recipe link to index.rst. (#3492) · ae8c131e

Pingchuan Ma authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3492

Reviewed By: mthrok

Differential Revision: D47755638

Pulled By: mpc001

fbshipit-source-id: 729efdb2a69b5656dbc0b70dd623c1509123d3aa


18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2


15 Jul, 2023 1 commit

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg versions 4, 5 and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9


12 Jul, 2023 1 commit

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4, which is inconvenient at every stage from installation to runtime linking.
This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of looking only for v4.

The way it works is that we compile the FFmpeg extension three times, each against a different FFmpeg version, and ship all of them.
At runtime, we look for libavutil of a specific version and, when one is found, load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.
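
The selection order described above can be sketched as follows. This is an illustration only, not the actual loader: `pick_ffmpeg_version` and the `has_libavutil` predicate are hypothetical, and the probing of shared libraries is left to the caller. The mapping relies on FFmpeg 4, 5 and 6 shipping libavutil major versions 56, 57 and 58 respectively.

```python
from typing import Callable, Optional

# FFmpeg major version -> libavutil major version.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(has_libavutil: Callable[[int], bool]) -> Optional[int]:
    """Return the preferred FFmpeg major version whose libavutil is present.

    `has_libavutil(major)` is a caller-supplied check (hypothetical) that
    reports whether libavutil with the given major version can be loaded.
    """
    for ffmpeg_major in (6, 5, 4):  # order of preference
        if has_libavutil(LIBAVUTIL_MAJOR[ffmpeg_major]):
            return ffmpeg_major
    return None  # no supported FFmpeg found
```

With libavutil 56 and 57 both available, the function returns 5, because version 6 is tried first and is missing while 5 is preferred over 4.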

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL and are downloaded from S3 at build time, instead of being built every time.

The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
a single FFmpeg version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
so that they support only one specific version of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04


11 Jul, 2023 1 commit

Update doc analytics (#3469) · 216146ab

moto authored Jul 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3469

Differential Revision: D47368140

Pulled By: mthrok

fbshipit-source-id: d82ddb91ae1f6612298486fb8401f95c48db5620


28 Jun, 2023 1 commit

include a link to index.rst (#3441) · a8ce4a87

Pingchuan Ma authored Jun 28, 2023

Summary:
Include Conformer/Emformer RNN-T ASR/VSR/AV-ASR link to index.rst

Pull Request resolved: https://github.com/pytorch/audio/pull/3441

Differential Revision: D47094158

Pulled By: mthrok

fbshipit-source-id: 9ab42ac2bf52a5ce488003897ffba2f10a6ca941


21 Jun, 2023 2 commits

Introduce chroma spectrogram transform (#3427) · 70968293

Jeff Hwang authored Jun 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3427

Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms.
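
As background, a chromagram folds frequencies into pitch classes, so that tones an octave apart land in the same bin. The sketch below shows only that folding step with a hard assignment. It is not the transform added by this commit: real chroma filter banks typically use soft weighting, and the function name, the A440 reference and the convention that bin 0 is the reference pitch class are all assumptions.

```python
import math


def pitch_class(freq_hz: float, n_chroma: int = 12, ref_hz: float = 440.0) -> int:
    """Map a frequency to a chroma bin, with bin 0 at the reference pitch class."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from the reference in units of 1/n_chroma octave.
    steps = n_chroma * math.log2(freq_hz / ref_hz)
    # Folding modulo n_chroma discards the octave.
    return round(steps) % n_chroma
```

Under these assumptions 220 Hz, 440 Hz and 880 Hz all map to bin 0, and 261.63 Hz (three semitones above the reference pitch class) maps to bin 3.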

Reviewed By: mthrok

Differential Revision: D46547418

fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960


Split the CTC forced alignment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998


08 Jun, 2023 1 commit

Introduce chroma filter bank function (#3395) · dfd0c5fd

Jeff Hwang authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3395

Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.

Reviewed By: mthrok

Differential Revision: D46307672

fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5


07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b


05 Jun, 2023 1 commit

Clean-up ComputeKaldiPitch residue (#3403) · c076d1a8

moto authored Jun 05, 2023

Summary:
Follow up of: https://github.com/pytorch/audio/pull/3368

Remove files and lines no longer used.

Pull Request resolved: https://github.com/pytorch/audio/pull/3403

Differential Revision: D46441462

Pulled By: mthrok

fbshipit-source-id: 11b881ec4b24fa0d625c6aee9f4bd91f637f9923


26 May, 2023 1 commit

Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9

atalman authored May 26, 2023

Summary:
This reverts commit d38a7854.

This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.

Pull Request resolved: https://github.com/pytorch/audio/pull/3377

Reviewed By: mthrok

Differential Revision: D46230498

Pulled By: atalman

fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d


24 May, 2023 2 commits

Add StreamReader/Writer custom IO to doc (#3367) · f41ba26d

moto authored May 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3367

Reviewed By: nateanl

Differential Revision: D46148139

Pulled By: mthrok

fbshipit-source-id: 50f297ac69bb95562976eb452e4e382b8c064c3c


Fix build doc (#3349) · 8b85ca5d

moto authored May 24, 2023

Summary:
Follow-up https://github.com/pytorch/audio/issues/3045
- Revert the removal of HW acceleration doc
- comment out FFmpeg CLI test run

Pull Request resolved: https://github.com/pytorch/audio/pull/3349

Reviewed By: nateanl

Differential Revision: D46121899

Pulled By: mthrok

fbshipit-source-id: dfc030a69f05addec73637cfb6a720c184e37323


23 May, 2023 1 commit

[audio] add CTC forced alignment API tutorial to torchaudio (#3356) · f046f7e3

Xiaohui Zhang authored May 22, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3356

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: mthrok

Differential Revision: D46060238

fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec


22 May, 2023 1 commit

Add doc for forced_align (#3355) · 011f7f3d

Zhaoheng Ni authored May 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3355

Reviewed By: xiaohui-zhang

Differential Revision: D46060254

Pulled By: nateanl

fbshipit-source-id: c2e44f994739755daf049fe350dd24a987a9cc29


19 May, 2023 1 commit

Build and use GPU-enabled FFmpeg in doc CI (#3045) · 0db5ab25

moto authored May 19, 2023

Summary:
This commit adds a step to build FFmpeg with the GPU decoder in the build_doc job so that we can use the GPU decoder/encoder in the documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/3045

Reviewed By: nateanl

Differential Revision: D45965739

Pulled By: mthrok

fbshipit-source-id: c167eb3ef347860a51efa906068fa2daa556f017


17 May, 2023 1 commit

Fix for breadcrumbs displaying "Old version (stable)" on Nightly build (#3333) · 3ffd76c8

Carl Parker authored May 16, 2023

Summary:
Previously, `breadcrumbs.html` identified a nightly build version by the prefix "Nightly" which would normally be prepended to the version in `conf.py`. However, the version string is coming through without the "Nightly" prefix, so this change causes `breadcrumbs.html` to key on the substring "dev" instead.

The reason we aren't getting "Nightly" is apparently because the environment variable BUILD_VERSION is available, so `conf.py` is using the value of that env var instead of the version string imported from the `torchaudio` module itself, which actually appears to be incorrect; see below.

If I install torchaudio using

conda install torchaudio -c pytorch-nightly

then `torchaudio.__version__` returns the incorrect version string:

2.0.0.dev20230309
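
The logic described in this summary can be sketched in Python as follows. The real check lives in the `breadcrumbs.html` template and in `conf.py`, so this is only an illustration, and the function names `resolve_doc_version` and `is_nightly` are hypothetical.

```python
from typing import Mapping


def resolve_doc_version(env: Mapping[str, str], package_version: str) -> str:
    """Prefer BUILD_VERSION from the environment, else the package version."""
    return env.get("BUILD_VERSION") or package_version


def is_nightly(version: str) -> bool:
    """Treat any version string containing 'dev' as a nightly build."""
    return "dev" in version
```

With this check a version such as `2.0.0.dev20230309` is treated as nightly even though it carries no "Nightly" prefix, whereas `2.0.1` is not.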

Pull Request resolved: https://github.com/pytorch/audio/pull/3333

Reviewed By: mthrok

Differential Revision: D45926466

Pulled By: carljparker

fbshipit-source-id: d5516f2d9f1716c2400d3e9b285bd5d32b4b3a77


16 May, 2023 2 commits

Upgrade to FFmpeg5 (#3298) · d38a7854

moto authored May 16, 2023

Summary:
This commit upgrades the version of FFmpeg that the TorchAudio binary distribution is compiled against to 5.0.4.

FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5.
Conda-forge lists 5.1 for all the platforms TorchAudio supports: https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298

Reviewed By: hwangjeff

Differential Revision: D45865599

Pulled By: mthrok

fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b


Remove obsolete third party dependencies of CTC decoder (#3339) · e4c1d70b

moto authored May 16, 2023

Summary:
TorchAudio has migrated the CTC decoder to flashlight-text, and the code related to the CTC decoder was removed in https://github.com/pytorch/audio/issues/3236.

This commit cleans up the residue: it removes the third-party libraries used for the CTC decoder and the mention of the environment variable for the CTC decoder.

Pull Request resolved: https://github.com/pytorch/audio/pull/3339

Reviewed By: nateanl

Differential Revision: D45920878

Pulled By: mthrok

fbshipit-source-id: 8d93e64138697781570e5b0b1c9f86e1a7923a89


11 May, 2023 1 commit

Add 2.0.1 to the version compatibility matrix (#3325) · 608775bf

moto authored May 11, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3325

Reviewed By: hwangjeff

Differential Revision: D45759434

Pulled By: mthrok

fbshipit-source-id: f3b1127fcf3b23beeab61fb7ff18f1b89b11ddc6


10 May, 2023 2 commits

Add AudioEffector tutorial (#3226) · 2ab49e5b

moto authored May 09, 2023

Summary:
https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3226

Reviewed By: nateanl

Differential Revision: D45402724

Pulled By: mthrok

fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262


Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit prepares for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 lands to accommodate the change.

Since it is necessary to mention the changes related to migration in the IO tutorial,
I also update the IO documentation to include migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133


29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903


28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA-based CTC prefix beam search decoder.

Several benchmark results using a V100 are attached below:

| Decoder type | Model | Dataset | Decoding time (s) | Beam size | Batch size | Model unit | Subsampling times | Vocab size |
|---|---|---|---|---|---|---|---|---|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Notes:
1. The design parallelizes computation along the batch and vocab axes and loops sequentially over the frame axis, so compared with CPU implementations it favors shorter sequences and larger vocab sizes.
2. WER is the same as with CPU implementations. However, it cannot yet decode with an LM.
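
To make the loop structure in note 1 concrete, here is a plain-Python reference sketch of CTC prefix beam search for a single utterance. It is not the CUDA implementation from this PR: it works on probabilities rather than log-probabilities, has no LM and no batching, and the function name is hypothetical. The outer loop over frames is inherently sequential, while the per-prefix and per-token work inside it is the part a GPU implementation could spread across threads.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Prefix = Tuple[int, ...]


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]], beam_size: int = 8, blank: int = 0
) -> Tuple[List[int], float]:
    """Return the most probable label sequence and its probability.

    `probs[t][v]` is the probability of token `v` at frame `t`.
    Each prefix tracks (p_blank, p_non_blank): the probability of all
    alignments of that prefix ending in blank / in a non-blank token.
    """
    beams: Dict[Prefix, Tuple[float, float]] = {(): (1.0, 0.0)}
    for frame in probs:  # sequential over the frame axis
        nxt: Dict[Prefix, List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for token, p in enumerate(frame):  # vocab axis
                if token == blank:
                    # Blank keeps the prefix unchanged.
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and token == prefix[-1]:
                    # Repeat without a blank in between collapses.
                    nxt[prefix][1] += p_nb * p
                    # Repeat after a blank extends the prefix.
                    nxt[prefix + (token,)][1] += p_b * p
                else:
                    nxt[prefix + (token,)][1] += (p_b + p_nb) * p
        # Keep only the `beam_size` most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {pre: (pb, pnb) for pre, (pb, pnb) in ranked[:beam_size]}
    best_prefix, (pb, pnb) = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])
    return list(best_prefix), pb + pnb
```

For two frames with probabilities `[0.4, 0.6]` each (blank first), the alignments "blank a", "a blank" and "a a" all collapse to the single label `[1]`, which therefore receives probability 0.24 + 0.24 + 0.36 = 0.84.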

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155


11 Apr, 2023 1 commit

Update windows build doc (#3257) · 623e33d9

moto authored Apr 11, 2023

Summary:
GCC should not be used when building FFmpeg for torchaudio, because torchaudio is built with MSVC (cl.exe).

Pull Request resolved: https://github.com/pytorch/audio/pull/3257

Reviewed By: nateanl

Differential Revision: D44835169

Pulled By: mthrok

fbshipit-source-id: 038c70caae58cec47dd2d6d08b8244c193104eda
