Commits · 30668afb1a7d015276706ed6ae641fa24f87ac29 · OpenDAS / Torchaudio

07 Aug, 2023 2 commits

Add merge_tokens / TokenSpan (#3535) · 30668afb

moto authored Aug 07, 2023

Summary:
This commit adds the `merge_tokens` function, which removes repeated tokens from the CTC token sequences returned by `forced_align`.

Resolving repeated tokens is a necessary and almost universal step, so it makes sense to have such a helper function in torchaudio.
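
For illustration only, the idea of collapsing repeated frame-level tokens into spans could be sketched in plain Python as below. This is not the torchaudio implementation: the names `Span` and `merge_repeats` are hypothetical, scores are ignored, and the blank token is assumed to have index 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    token: int  # token index
    start: int  # first frame of the span (inclusive)
    end: int    # frame after the last frame of the span (exclusive)


def merge_repeats(frame_tokens: List[int], blank: int = 0) -> List[Span]:
    """Collapse runs of identical frame-level tokens and drop blank runs.

    Hypothetical helper, assuming one token index per frame.
    """
    spans: List[Span] = []
    start = 0
    for i in range(1, len(frame_tokens) + 1):
        # A run ends at the end of the sequence or when the token changes.
        if i == len(frame_tokens) or frame_tokens[i] != frame_tokens[start]:
            if frame_tokens[start] != blank:
                spans.append(Span(frame_tokens[start], start, i))
            start = i
    return spans
```

Two runs of the same token separated by a blank stay separate spans, which is what distinguishes a repeated label from a label held over several frames.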

Pull Request resolved: https://github.com/pytorch/audio/pull/3535

Reviewed By: huangruizhe

Differential Revision: D48111202

Pulled By: mthrok

fbshipit-source-id: 25354bfa210aa5c03f8c1d3e201f253ca3761b24

Make target_lengths/input_lengths in forced_align optional (#3533) · cd80976e

moto authored Aug 07, 2023

Summary:
Currently, the `torchaudio.functional.forced_align` function requires full information on input/target lengths.
When performing non-batched alignment, these can be inferred from the size of the Tensor.
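
A minimal sketch of that inference, working on shapes only. It is not the torchaudio code: `resolve_lengths` is a hypothetical name, and the shapes are assumed to be `(batch, frames, classes)` for the emission and `(batch, tokens)` for the targets.

```python
from typing import Optional, Sequence, Tuple


def resolve_lengths(
    log_probs_shape: Tuple[int, int, int],
    targets_shape: Tuple[int, int],
    input_lengths: Optional[Sequence[int]] = None,
    target_lengths: Optional[Sequence[int]] = None,
) -> Tuple[Sequence[int], Sequence[int]]:
    """Fill in missing lengths from the shapes when the batch size is 1."""
    batch, num_frames, _num_classes = log_probs_shape
    target_batch, num_tokens = targets_shape
    if batch != target_batch:
        raise ValueError("batch sizes of log_probs and targets differ")
    if input_lengths is None:
        # Without padding information, lengths are only unambiguous
        # for a single, unbatched sequence.
        if batch != 1:
            raise ValueError("input_lengths is required for batched input")
        input_lengths = [num_frames]
    if target_lengths is None:
        if batch != 1:
            raise ValueError("target_lengths is required for batched input")
        target_lengths = [num_tokens]
    return input_lengths, target_lengths
```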

Pull Request resolved: https://github.com/pytorch/audio/pull/3533

Reviewed By: nateanl

Differential Revision: D48111041

Pulled By: mthrok

fbshipit-source-id: fbf07124d3959c5cc5533dcd86296851587082fb

04 Aug, 2023 2 commits

Revise VGGish pipeline to accept arbitrary state dict function (#3531) · b976c8f1

Jeff Hwang authored Aug 04, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3531

Revises the VGGish pipeline to accept an arbitrary state dict function, so that weights can be loaded from any source.

Reviewed By: mthrok

Differential Revision: D48056390

fbshipit-source-id: 2767699b58442ad132b518b4a6435f2772a637c3

Update ctc forced alignment tutorial (#3529) · b645c07b

moto authored Aug 04, 2023

Summary:
- Simplify the step to generate token-level alignment

Pull Request resolved: https://github.com/pytorch/audio/pull/3529

Reviewed By: huangruizhe

Differential Revision: D48066787

Pulled By: mthrok

fbshipit-source-id: 452c243d278e508926a59894928e280fea76dcc6

03 Aug, 2023 2 commits

Refactor wav2vec2 pipeline misc helper functions (#3527) · 09aabcc1

moto authored Aug 02, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3527

Reviewed By: huangruizhe

Differential Revision: D48008822

Pulled By: mthrok

fbshipit-source-id: 4beae2956dfd1f00534832b70a1bf0897cba7812

Relax Conformer RNN-T numerical parity tests (#3525) · 72b0917d

hwangjeff authored Aug 02, 2023

Summary:
Increases numerical tolerance on Conformer RNN-T TorchScript consistency tests to resolve CI test failures.

Pull Request resolved: https://github.com/pytorch/audio/pull/3525

Reviewed By: mthrok

Differential Revision: D48000613

Pulled By: hwangjeff

fbshipit-source-id: 1d35ba58055a8346dc40e2b67f37ccfd2e015894

02 Aug, 2023 1 commit

Fix save INT16 sox backend (#3524) · 3f9b5171

moto authored Aug 02, 2023

Summary:
When an int16 tensor is passed to `save(backend="sox")`, the resulting file should be 16-bit signed PCM, but it is instead 32-bit signed PCM.
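
One way to check the bit depth of a saved WAV file without torchaudio is the standard-library `wave` module. The sketch below is only an illustration of such a check: the helper names `write_pcm16` and `bits_per_sample` are hypothetical, and the data is written in memory rather than through `save`.

```python
import io
import struct
import wave
from typing import Sequence


def write_pcm16(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Encode mono int16 samples as a 16-bit signed PCM WAV file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        # "<h" is little-endian signed 16-bit, the layout WAV PCM uses.
        writer.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def bits_per_sample(wav_bytes: bytes) -> int:
    """Read the sample width back from the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getsampwidth() * 8
```

Running `bits_per_sample` on the bytes of a file written from int16 data is a quick way to tell a 16-bit result from a 32-bit one.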

Resolves https://github.com/pytorch/audio/issues/3304

Pull Request resolved: https://github.com/pytorch/audio/pull/3524

Reviewed By: huangruizhe

Differential Revision: D47941090

Pulled By: mthrok

fbshipit-source-id: 2622b31eb1cbf03969f67ab2b2adec6e2ba677c4

01 Aug, 2023 3 commits

Add cuctc tutorial, change blank skip threshold into prob (#3297) · 732c94a3

Yuekai Zhang authored Aug 01, 2023

Summary:
Add a separate tutorial for cuctc.
Resolve https://github.com/pytorch/audio/issues/3096

Pull Request resolved: https://github.com/pytorch/audio/pull/3297

Reviewed By: huangruizhe

Differential Revision: D47928400

Pulled By: mthrok

fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec

Migrate weight_norm (#3523) · 144cfcfc

moto authored Aug 01, 2023

Summary:
`torch.nn.utils.weight_norm` is deprecated.
This commit replaces it with the new API.

Pull Request resolved: https://github.com/pytorch/audio/pull/3523

Reviewed By: huangruizhe

Differential Revision: D47932384

Pulled By: mthrok

fbshipit-source-id: 344abfa12bd11da779f7fd13b74a1e009a582b52

Add pretrained VGGish inference pipeline (#3491) · cbfde17b

hwangjeff authored Jul 31, 2023

Summary:
Adds a pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3491

Reviewed By: mthrok

Differential Revision: D47738130

Pulled By: hwangjeff

fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67

31 Jul, 2023 2 commits

Migrate torch.norm to torch.linalg.vector_norm (#3522) · 8a2e12d3

moto authored Jul 31, 2023

Summary:
`torch.norm` is now deprecated.
The usages in torchaudio seem to be vector norms, so this commit replaces them with `torch.linalg.vector_norm`.

Resolves https://github.com/pytorch/audio/issues/3484

Pull Request resolved: https://github.com/pytorch/audio/pull/3522

Reviewed By: huangruizhe

Differential Revision: D47926659

Pulled By: mthrok

fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084

Set and tweak global matplotlib configuration in tutorials (#3515) · 84b12306

moto authored Jul 31, 2023

Summary:
- Set global matplotlib rc params
- Fix style check
- Fix and update FA tutorial plots
- Add av-asr index cards

Pull Request resolved: https://github.com/pytorch/audio/pull/3515

Reviewed By: huangruizhe

Differential Revision: D47894156

Pulled By: mthrok

fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e

29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the `_compat` module were introduced there so that
everything related to FFmpeg is in `torchaudio.io` and the FFmpeg library
initialization can be carried out in `torchaudio.io.__init__`.

Now that this constraint has been removed (all the initialization happens
in `torchaudio._extension.__init__`) and `_compat` is only used by the
FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f

28 Jul, 2023 5 commits

Amend amp_to_db docstring (#3519) · 61cbf791

moto authored Jul 28, 2023

Summary:
Context: https://github.com/pytorch/audio/issues/3448

The documentation of `amplitude_to_DB` is ambiguous about how cut-off values are computed when the input tensor is 3D.

This commit clarifies that.

Closes: https://github.com/pytorch/audio/issues/3448

Pull Request resolved: https://github.com/pytorch/audio/pull/3519

Reviewed By: huangruizhe

Differential Revision: D47875505

Pulled By: mthrok

fbshipit-source-id: e06bb997e7a27e2abe35c8e2ac91ddfbded4e641

Remove ffmpeg fallback from sox_io backend (#3516) · 2c8665de

moto authored Jul 28, 2023

Summary:
In https://github.com/pytorch/audio/issues/2419, we added ffmpeg as a fallback for the sox_io backend. This was a workaround for the issue caused by the removal of libmad.

Now that we have introduced the `backend` argument to I/O functions, and libsox integration has moved to dynamic binding, where users can use libsox with libmad integration, we do not need the workaround.

This commit is based on reverting https://github.com/pytorch/audio/issues/2416 (fd7ace17).

Pull Request resolved: https://github.com/pytorch/audio/pull/3516

Reviewed By: huangruizhe

Differential Revision: D47855272

Pulled By: mthrok

fbshipit-source-id: 5af73af7865f6e545ccb052d478e86588ff2a014

Update documentation about dependencies (#3517) · a051985f

moto authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3517

Reviewed By: huangruizhe

Differential Revision: D47858452

Pulled By: mthrok

fbshipit-source-id: 62ee6c8bb2669dd70f8ca25703a04dc8a9d19aec

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models, along with the corresponding factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

Add real-time av-asr tutorial (#3511) · d6aeaa74

Pingchuan Ma authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511

Reviewed By: mthrok

Differential Revision: D47852108

Pulled By: mpc001

fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa

27 Jul, 2023 3 commits

Remove unused files (#3514) · 7368e336

moto authored Jul 27, 2023

Summary:
Removes leftover files from https://github.com/pytorch/audio/issues/3497

Pull Request resolved: https://github.com/pytorch/audio/pull/3514

Differential Revision: D47838049

Pulled By: mthrok

fbshipit-source-id: c4b00aba9f4cc887ec595f04d7a2dd673c63b975

Replace libsox with stub library (#3497) · 8588fba1

moto authored Jul 27, 2023

Summary:
This commit updates the way libsox is integrated into torchaudio:

1. We stop statically linking libsox, so torchaudio will not ship libsox.
2. We link libsox dynamically. Users are expected to install libsox by themselves.
3. We use a stub library to build torchaudio.

Pull Request resolved: https://github.com/pytorch/audio/pull/3497

Differential Revision: D47803706

Pulled By: mthrok

fbshipit-source-id: 31b05495d81069186fa52d67beea360cc7e817a8

Add switch to disable sox integration and ffmpeg integration at runtime (#3500) · 29903c5c

moto authored Jul 26, 2023

Summary:
Since the libsox and ffmpeg extensions now depend on external libraries, their initialization processes might cause an unrecoverable issue, such as a segfault.

This commit adds an environment-variable switch to disable them, so that importing torchaudio won't attempt to load these libraries.
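
The general pattern of gating an optional extension behind an environment variable might look like the sketch below. It is an illustration rather than the torchaudio code: the `MYPKG_USE_*` variable names and both helper functions are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

_TRUE = {"1", "on", "true", "yes"}
_FALSE = {"0", "off", "false", "no"}


def is_enabled(
    var_name: str,
    default: bool = True,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Parse a boolean switch from the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(var_name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"Unexpected value for {var_name}: {raw!r}")


def load_optional_extension(
    name: str,
    loader: Callable[[], T],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[T]:
    """Run `loader` only if the switch for `name` is not turned off."""
    # MYPKG_USE_<NAME> is a made-up naming scheme for this sketch.
    if not is_enabled(f"MYPKG_USE_{name.upper()}", environ=environ):
        return None
    return loader()
```

Because the check runs before the loader is called, a library that would crash during initialization is never touched when the switch is off.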

Pull Request resolved: https://github.com/pytorch/audio/pull/3500

Differential Revision: D47808178

Pulled By: mthrok

fbshipit-source-id: 80c1c6b5f4bc608d4e209473702680db093c95ee

26 Jul, 2023 3 commits

av-asr: move video loading outside detector (#3498) · c977afe0

Pingchuan Ma authored Jul 26, 2023

Summary:
This PR moves video loading outside detector during pre-processing.

Pull Request resolved: https://github.com/pytorch/audio/pull/3498

Reviewed By: mthrok

Differential Revision: D47811044

Pulled By: mpc001

fbshipit-source-id: f17839b695b13d3cf2d9db343d7e9a0202eea7d5

Move env util (#3499) · da212020

moto authored Jul 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3499

Differential Revision: D47803654

Pulled By: mthrok

fbshipit-source-id: 2b916fa66d84c91c01b4dfe6dd5ee3501159f451

Add nightly doc update (#3496) · f082e6c1

moto authored Jul 26, 2023

Summary:
Add scheduled doc update job so that docs are updated at least once a day.

Pull Request resolved: https://github.com/pytorch/audio/pull/3496

Differential Revision: D47795577

Pulled By: mthrok

fbshipit-source-id: aba5376ec51f07560014d250a16fef8b8a11b43e

25 Jul, 2023 7 commits

Disable some tests that need libsox (#3494) · 49e9ed94

moto authored Jul 25, 2023

Summary:
In preparation for https://github.com/pytorch/audio/pull/3082

Disable the FFmpeg tests that depend on the sox CLI. These tests need to be updated or removed so that they do not use the sox CLI.

Auto-skip some sox tests if the decoder/encoder is not available.

Pull Request resolved: https://github.com/pytorch/audio/pull/3494

Differential Revision: D47761948

Pulled By: mthrok

fbshipit-source-id: 3a48d7f280f8376a48d223947dd41a7cdc8cbc30

Fix and update doc deployment (#3495) · e483a67a

moto authored Jul 25, 2023

Summary:
- Fix the condition for adding a new commit to gh-pages
- Allow deploying docs via workflow dispatch

Pull Request resolved: https://github.com/pytorch/audio/pull/3495

Differential Revision: D47767443

Pulled By: mthrok

fbshipit-source-id: 9ca858868f3e822e532c21cde9d7499af9891a51

Update avsr recipe (#3493) · d4644793

Pingchuan Ma authored Jul 25, 2023

Summary:
This PR makes a few changes to the AV-ASR recipe: better results, a faster face detector (Mediapipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/3493

Reviewed By: mthrok

Differential Revision: D47758072

Pulled By: mpc001

fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc

Update nvdec/nvenc tutorials (#3483) · 56e22664

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483

Differential Revision: D47725664

Pulled By: mthrok

fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b

Run GPU video decoder/encoder tests in CI (#3490) · df655604

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3490

Differential Revision: D47757316

Pulled By: mthrok

fbshipit-source-id: cfb376be29980f9e452f291c4fa25780e9f85a97

Fix typo in melscale_fbank (#3487) · 135cb7ba

moto authored Jul 25, 2023

Summary:
Resolves https://github.com/pytorch/audio/issues/3486

Pull Request resolved: https://github.com/pytorch/audio/pull/3487

Differential Revision: D47724733

Pulled By: mthrok

fbshipit-source-id: 26f5641a8271a7e50c4a33861d09b0c8274b29e4

Update AV-ASR recipe link to index.rst. (#3492) · ae8c131e

Pingchuan Ma authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3492

Reviewed By: mthrok

Differential Revision: D47755638

Pulled By: mpc001

fbshipit-source-id: 729efdb2a69b5656dbc0b70dd623c1509123d3aa

24 Jul, 2023 1 commit

Move examples/asr/avsr_rnnt to examples/avsr folder (#3489) · 66f661df

Pingchuan Ma authored Jul 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489

Reviewed By: mthrok

Differential Revision: D47726448

Pulled By: mpc001

fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841

18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2

17 Jul, 2023 1 commit

Ensure StreamReader returns tensors with requires_grad is False (#3467) · 44b92062

moto authored Jul 17, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3467

Differential Revision: D47482388

Pulled By: mthrok

fbshipit-source-id: abff36491dc28b83270673860d6457a084b1327d

15 Jul, 2023 2 commits

Use more recent FFmpeg in unit tests (#3476) · ea7a96dd

moto authored Jul 15, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3476

Differential Revision: D47494211

Pulled By: mthrok

fbshipit-source-id: 230bbf0a271b070d1dea34146d0d466e666cccdc

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg versions 4, 5, and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9

14 Jul, 2023 1 commit

Update the logic to fetch pixel format from filter graph (#3479) · cf53a486

moto authored Jul 14, 2023

Summary:
When using the GPU decoder in some environments, attempting to read the output formats from the filter graph failed because the software pixel format could not be determined.

We do not know the exact cause, but when it happens, the input link of the buffer sink does not have a HW frames context.

Since no filter can currently convert the pixel format of a CUDA frame, we fall back to the HW frames context of the output link of the buffer source.

Environments in which this was observed:

Env1
- OS: Fedora 36 (x86_64)
- GCC 12.2.1
- Python 3.10.12
- GPU: GeForce RTX 3070 Ti Laptop GPU
- FFmpeg: 5.1.3
- nv-codec-header: n11.1.5.2
- CUDA: 12.1

Env2
- Ubuntu 20.04.4 LTS (x86_64)
- GCC 9.4.0
- Python 3.11.3
- GPU: Quadro GV100
- FFmpeg: 5.1.3
- nv-codec-header: n11.1.5.2
- CUDA: 11.4

Pull Request resolved: https://github.com/pytorch/audio/pull/3479

Differential Revision: D47482407

Pulled By: mthrok

fbshipit-source-id: 1c53096b27824453b260138ab64e1948afeeefc7

13 Jul, 2023 2 commits

Linux CPU job should respect set Python version (#3477) · 86cb1e09

Omkar Salpekar authored Jul 13, 2023

Summary:
Reintroduce a conda environment in which all dependency installation, audio builds, and test runs take place. This conda environment uses the Python version set by the GHA job. Previously, the job defaulted to the system Python 3.10 that ships inside the container.

Pull Request resolved: https://github.com/pytorch/audio/pull/3477

Reviewed By: mthrok

Differential Revision: D47414572

Pulled By: osalpekar

fbshipit-source-id: 80760f82c7726205b29812d576e498db2a7a80a0

Revert D47402174: [audio][PR] Resolve some compilation warnings · 155d1bae

Moto Hira authored Jul 13, 2023

Differential Revision: D47402174

Original commit changeset: 00c0719ab184

Original Phabricator Diff: D47402174

fbshipit-source-id: b1f6ea4cc3ecef3f72a87bf2f67bf9644c847546

12 Jul, 2023 1 commit

Resolve some compilation warnings (#3471) · a6d1fec0

moto authored Jul 12, 2023

Summary:
- FFmpeg 6 deprecated attributes
- Guard CUDA specific functions not used in CPU builds

Pull Request resolved: https://github.com/pytorch/audio/pull/3471

Differential Revision: D47402174

Pulled By: mthrok

fbshipit-source-id: 00c0719ab1849b50c0b56b03d8fb38bc7aa74538
