Commits · 09aabcc16e8d5a2c0180a2ac3dc3d507b9dc65b2 · OpenDAS / Torchaudio

03 Aug, 2023 2 commits

Refactor wav2vec2 pipeline misc helper functions (#3527) · 09aabcc1

moto authored Aug 02, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3527

Reviewed By: huangruizhe

Differential Revision: D48008822

Pulled By: mthrok

fbshipit-source-id: 4beae2956dfd1f00534832b70a1bf0897cba7812

Relax Conformer RNN-T numerical parity tests (#3525) · 72b0917d

hwangjeff authored Aug 02, 2023

Summary:
Increases numerical tolerance on Conformer RNN-T TorchScript consistency tests to resolve CI test failures.

Pull Request resolved: https://github.com/pytorch/audio/pull/3525

Reviewed By: mthrok

Differential Revision: D48000613

Pulled By: hwangjeff

fbshipit-source-id: 1d35ba58055a8346dc40e2b67f37ccfd2e015894

02 Aug, 2023 1 commit

Fix save INT16 sox backend (#3524) · 3f9b5171

moto authored Aug 02, 2023

Summary:
When an int16 tensor is passed to `save(backend="sox")`, the resulting file should be 16-bit signed PCM, but it is instead written as 32-bit signed PCM.
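
One way to confirm the bit depth of a saved WAV file is to read its header. The sketch below uses only the Python standard library. The file is written with `wave` as a stand-in for the output of `save(backend="sox")`, and the helper name `wav_bit_depth` is hypothetical.

```python
import os
import struct
import tempfile
import wave


def wav_bit_depth(path):
    """Return the number of bits per sample recorded in a WAV header."""
    with wave.open(path, "rb") as f:
        return f.getsampwidth() * 8


# Stand-in for a file produced from an int16 tensor: four 16-bit samples.
samples = [0, 1000, -1000, 32767]
path = os.path.join(tempfile.mkdtemp(), "int16.wav")
with wave.open(path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    f.setframerate(16000)
    f.writeframes(struct.pack("<4h", *samples))

bit_depth = wav_bit_depth(path)
print(bit_depth)
```

A file affected by the bug described above would report 32 here instead of 16.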

Resolves https://github.com/pytorch/audio/issues/3304

Pull Request resolved: https://github.com/pytorch/audio/pull/3524

Reviewed By: huangruizhe

Differential Revision: D47941090

Pulled By: mthrok

fbshipit-source-id: 2622b31eb1cbf03969f67ab2b2adec6e2ba677c4

01 Aug, 2023 3 commits

Add cuctc tutorial, change blank skip threshold into prob (#3297) · 732c94a3

Yuekai Zhang authored Aug 01, 2023

Summary:
Add a separate tutorial for cuctc.
Resolves https://github.com/pytorch/audio/issues/3096

Pull Request resolved: https://github.com/pytorch/audio/pull/3297

Reviewed By: huangruizhe

Differential Revision: D47928400

Pulled By: mthrok

fbshipit-source-id: 8c16492fb4d007b6ea7969ba77c866a51749c0ec

Migrate weight_norm (#3523) · 144cfcfc

moto authored Aug 01, 2023

Summary:
torch.nn.utils.weight_norm is deprecated.
Replace it with the new API, torch.nn.utils.parametrizations.weight_norm.

Pull Request resolved: https://github.com/pytorch/audio/pull/3523

Reviewed By: huangruizhe

Differential Revision: D47932384

Pulled By: mthrok

fbshipit-source-id: 344abfa12bd11da779f7fd13b74a1e009a582b52

Add pretrained VGGish inference pipeline (#3491) · cbfde17b

hwangjeff authored Jul 31, 2023

Summary:
Adds pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3491

Reviewed By: mthrok

Differential Revision: D47738130

Pulled By: hwangjeff

fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67

31 Jul, 2023 2 commits

Migrate torch.norm to torch.linalg.vector_norm (#3522) · 8a2e12d3

moto authored Jul 31, 2023

Summary:
torch.norm is now deprecated.
The usages in torchaudio appear to be vector norms, so replace them with torch.linalg.vector_norm.

Resolves https://github.com/pytorch/audio/issues/3484

Pull Request resolved: https://github.com/pytorch/audio/pull/3522

Reviewed By: huangruizhe

Differential Revision: D47926659

Pulled By: mthrok

fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084

Set and tweak global matplotlib configuration in tutorials (#3515) · 84b12306

moto authored Jul 31, 2023

Summary:
- Set global matplotlib rc params
- Fix style check
- Fix and update FA tutorial plots
- Add av-asr index cards

Pull Request resolved: https://github.com/pytorch/audio/pull/3515

Reviewed By: huangruizhe

Differential Revision: D47894156

Pulled By: mthrok

fbshipit-source-id: b40d8d31f12ffc2b337e35e632afc216e9d59a6e

29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the `_compat` module were introduced there so that
everything related to FFmpeg lives in torchaudio.io and FFmpeg library
initialization can be carried out in `torchaudio.io.__init__`.

Now that this constraint is removed (all the initialization happens
in `torchaudio._extension.__init__`) and `_compat` is used only by the
FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f

28 Jul, 2023 5 commits

Amend amp_to_db docstring (#3519) · 61cbf791

moto authored Jul 28, 2023

Summary:
Context: https://github.com/pytorch/audio/issues/3448

The documentation of amplitude_to_DB is ambiguous about how cut-off values are computed when the input tensor is 3D.

This commit clarifies that.
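
To illustrate the kind of behaviour the docstring has to spell out, here is a minimal pure-Python sketch. It assumes that a 3D input is treated as (channel, freq, time) and that a single cut-off, the global maximum minus `top_db`, is shared by all channels. That assumption and the function name `amplitude_to_db_3d` are illustrative and are not taken from the commit.

```python
import math


def amplitude_to_db_3d(x, multiplier=10.0, amin=1e-10, top_db=80.0):
    """Convert a (channel, freq, time) nested list to dB with one shared cut-off."""
    db = [
        [[multiplier * math.log10(max(v, amin)) for v in row] for row in ch]
        for ch in x
    ]
    # Single cut-off: global maximum over all channels minus top_db (assumed).
    peak = max(v for ch in db for row in ch for v in row)
    cutoff = peak - top_db
    return [[[max(v, cutoff) for v in row] for row in ch] for ch in db]


# Two channels, one frequency bin, one frame each.
spec = [[[1.0]], [[1e-10]]]
result = amplitude_to_db_3d(spec)
print(result)
```

Under this assumption the quiet second channel is clamped relative to the loud first channel, which would not happen if each channel had its own cut-off.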

Closes: https://github.com/pytorch/audio/issues/3448

Pull Request resolved: https://github.com/pytorch/audio/pull/3519

Reviewed By: huangruizhe

Differential Revision: D47875505

Pulled By: mthrok

fbshipit-source-id: e06bb997e7a27e2abe35c8e2ac91ddfbded4e641

Remove ffmpeg fallback from sox_io backend (#3516) · 2c8665de

moto authored Jul 28, 2023

Summary:
In https://github.com/pytorch/audio/issues/2419, we added ffmpeg as a fallback for the sox_io backend. This was a workaround for the issue caused by the removal of libmad.

Now that we have introduced the `backend` argument to the I/O functions, and the libsox integration has moved to dynamic binding, where users can use a libsox build with libmad integration, we no longer need the workaround.

This commit is based on reverting https://github.com/pytorch/audio/issues/2416 (fd7ace17).

Pull Request resolved: https://github.com/pytorch/audio/pull/3516

Reviewed By: huangruizhe

Differential Revision: D47855272

Pulled By: mthrok

fbshipit-source-id: 5af73af7865f6e545ccb052d478e86588ff2a014

Update documentation about dependencies (#3517) · a051985f

moto authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3517

Reviewed By: huangruizhe

Differential Revision: D47858452

Pulled By: mthrok

fbshipit-source-id: 62ee6c8bb2669dd70f8ca25703a04dc8a9d19aec

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models, along with the corresponding factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

Add real-time av-asr tutorial (#3511) · d6aeaa74

Pingchuan Ma authored Jul 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3511

Reviewed By: mthrok

Differential Revision: D47852108

Pulled By: mpc001

fbshipit-source-id: c0ecb4b5bcc8670013dcbe1164e3929f5793c8aa

27 Jul, 2023 3 commits

Remove unused files (#3514) · 7368e336

moto authored Jul 27, 2023

Summary:
Removes residual from https://github.com/pytorch/audio/issues/3497

Pull Request resolved: https://github.com/pytorch/audio/pull/3514

Differential Revision: D47838049

Pulled By: mthrok

fbshipit-source-id: c4b00aba9f4cc887ec595f04d7a2dd673c63b975

Replace libsox with stub library (#3497) · 8588fba1

moto authored Jul 27, 2023

Summary:
This commit updates the way libsox is integrated into torchaudio:

1. We stop statically linking libsox, so torchaudio will not ship libsox.
2. We link libsox dynamically. Users are expected to install libsox themselves.
3. We use a stub library to build torchaudio.
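
Because libsox is no longer shipped, it can be useful to check whether the system can locate it. The following is a minimal sketch using the standard library's `ctypes.util.find_library`. The helper name `find_libsox` is hypothetical, and this is not necessarily how torchaudio itself locates the library.

```python
import ctypes.util


def find_libsox():
    """Return the name or path the system resolves for libsox, or None."""
    return ctypes.util.find_library("sox")


libsox = find_libsox()
if libsox is None:
    print("libsox not found; install it with your system package manager")
else:
    print(f"libsox found: {libsox}")
```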

Pull Request resolved: https://github.com/pytorch/audio/pull/3497

Differential Revision: D47803706

Pulled By: mthrok

fbshipit-source-id: 31b05495d81069186fa52d67beea360cc7e817a8

Add switch to disable sox integration and ffmpeg integration at runtime (#3500) · 29903c5c

moto authored Jul 26, 2023

Summary:
Since the libsox and ffmpeg extensions now depend on external libraries, their initialization may cause unrecoverable issues, such as a segfault.

This commit adds an environment-variable switch to disable them, so that importing torchaudio won't attempt to load these libraries.
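
A minimal sketch of how such a switch could be read at import time. The variable names `TORCHAUDIO_USE_SOX` and `TORCHAUDIO_USE_FFMPEG` and the helper `env_flag` are assumptions for illustration; the commit message does not name them.

```python
import os

_TRUTHY = {"1", "true", "on", "yes"}
_FALSY = {"0", "false", "off", "no"}


def env_flag(name, default=True, environ=os.environ):
    """Parse a boolean switch from the environment (hypothetical helper)."""
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    raise ValueError(f"Unexpected value for {name}: {raw!r}")


# Assumed variable names; an unset variable leaves the integration enabled.
use_sox = env_flag("TORCHAUDIO_USE_SOX", environ={"TORCHAUDIO_USE_SOX": "0"})
use_ffmpeg = env_flag("TORCHAUDIO_USE_FFMPEG", environ={})
print(use_sox, use_ffmpeg)
```

Defaulting to enabled keeps existing behaviour unchanged unless a user opts out explicitly.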

Pull Request resolved: https://github.com/pytorch/audio/pull/3500

Differential Revision: D47808178

Pulled By: mthrok

fbshipit-source-id: 80c1c6b5f4bc608d4e209473702680db093c95ee

26 Jul, 2023 3 commits

av-asr: move video loading outside detector (#3498) · c977afe0

Pingchuan Ma authored Jul 26, 2023

Summary:
This PR moves video loading outside detector during pre-processing.

Pull Request resolved: https://github.com/pytorch/audio/pull/3498

Reviewed By: mthrok

Differential Revision: D47811044

Pulled By: mpc001

fbshipit-source-id: f17839b695b13d3cf2d9db343d7e9a0202eea7d5

Move env util (#3499) · da212020

moto authored Jul 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3499

Differential Revision: D47803654

Pulled By: mthrok

fbshipit-source-id: 2b916fa66d84c91c01b4dfe6dd5ee3501159f451

Add nightly doc update (#3496) · f082e6c1

moto authored Jul 26, 2023

Summary:
Add scheduled doc update job so that docs are updated at least once a day.

Pull Request resolved: https://github.com/pytorch/audio/pull/3496

Differential Revision: D47795577

Pulled By: mthrok

fbshipit-source-id: aba5376ec51f07560014d250a16fef8b8a11b43e

25 Jul, 2023 7 commits

Disable some tests that need libsox (#3494) · 49e9ed94

moto authored Jul 25, 2023

Summary:
In preparation for https://github.com/pytorch/audio/pull/3082

Disable those FFmpeg tests that depend on sox CLI. These tests need to be updated or removed so as not to use sox CLI.

Auto-skip some sox tests if decoder/encoder are not available

Pull Request resolved: https://github.com/pytorch/audio/pull/3494

Differential Revision: D47761948

Pulled By: mthrok

fbshipit-source-id: 3a48d7f280f8376a48d223947dd41a7cdc8cbc30

Fix and update doc deployment (#3495) · e483a67a

moto authored Jul 25, 2023

Summary:
- Fix condition to add new commit to gh-pages
- Allow deploying docs from workflow dispatch

Pull Request resolved: https://github.com/pytorch/audio/pull/3495

Differential Revision: D47767443

Pulled By: mthrok

fbshipit-source-id: 9ca858868f3e822e532c21cde9d7499af9891a51

Update avsr recipe (#3493) · d4644793

Pingchuan Ma authored Jul 25, 2023

Summary:
This PR includes a few changes to the AV-ASR recipe: better results, a faster face detector (Mediapipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/3493

Reviewed By: mthrok

Differential Revision: D47758072

Pulled By: mpc001

fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc

Update nvdec/nvenc tutorials (#3483) · 56e22664

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483

Differential Revision: D47725664

Pulled By: mthrok

fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b

Run GPU video decoder/encoder tests in CI (#3490) · df655604

moto authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3490

Differential Revision: D47757316

Pulled By: mthrok

fbshipit-source-id: cfb376be29980f9e452f291c4fa25780e9f85a97

Fix typo in melscale_fbank (#3487) · 135cb7ba

moto authored Jul 25, 2023

Summary:
Resolves https://github.com/pytorch/audio/issues/3486

Pull Request resolved: https://github.com/pytorch/audio/pull/3487

Differential Revision: D47724733

Pulled By: mthrok

fbshipit-source-id: 26f5641a8271a7e50c4a33861d09b0c8274b29e4

Update AV-ASR recipe link to index.rst. (#3492) · ae8c131e

Pingchuan Ma authored Jul 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3492

Reviewed By: mthrok

Differential Revision: D47755638

Pulled By: mpc001

fbshipit-source-id: 729efdb2a69b5656dbc0b70dd623c1509123d3aa

24 Jul, 2023 1 commit

Move examples/asr/avsr_rnnt to examples/avsr folder (#3489) · 66f661df

Pingchuan Ma authored Jul 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489

Reviewed By: mthrok

Differential Revision: D47726448

Pulled By: mpc001

fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841

18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2

17 Jul, 2023 1 commit

Ensure StreamReader returns tensors with requires_grad is False (#3467) · 44b92062

moto authored Jul 17, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3467

Differential Revision: D47482388

Pulled By: mthrok

fbshipit-source-id: abff36491dc28b83270673860d6457a084b1327d

15 Jul, 2023 2 commits

Use more recent FFmpeg in unit tests (#3476) · ea7a96dd

moto authored Jul 15, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3476

Differential Revision: D47494211

Pulled By: mthrok

fbshipit-source-id: 230bbf0a271b070d1dea34146d0d466e666cccdc

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg versions 4, 5, and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9

14 Jul, 2023 1 commit

Update the logic to fetch pixel format from filter graph (#3479) · cf53a486

moto authored Jul 14, 2023

Summary:
When using GPU decoder in some environments, attempting to read the output formats from filter graph caused an issue in which the software pixel format cannot be determined.

We do not know the exact cause, but when it happens, the input link of the buffer sink does not have a HW frames context.

Since no filter can currently convert the pixel format of a CUDA frame, we resort to the HW frames context of the output link of the buffer source.
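
The fallback described above might look like the following sketch. The classes `HWFramesContext` and `Link` and the function `software_pixel_format` are hypothetical stand-ins, not FFmpeg or torchaudio APIs, and the "sink first, then source" ordering is an assumption about how the fallback is applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HWFramesContext:
    """Hypothetical stand-in for a hardware frames context."""
    sw_format: str


@dataclass
class Link:
    """Hypothetical stand-in for a filter graph link."""
    hw_frames_ctx: Optional[HWFramesContext] = None


def software_pixel_format(sink_input: Link, source_output: Link) -> Optional[str]:
    """Use the buffer sink's context if present, else the buffer source's."""
    for link in (sink_input, source_output):
        if link.hw_frames_ctx is not None:
            return link.hw_frames_ctx.sw_format
    return None


# The sink's input link has no HW frames context, so the source's is used.
fmt = software_pixel_format(Link(), Link(HWFramesContext("nv12")))
print(fmt)
```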

Environments in which this was observed:

Env1
- OS: Fedora 36 (x86_64)
- GCC 12.2.1
- Python 3.10.12
- GPU: GeForce RTX 3070 Ti Laptop GPU
- FFmpeg: 5.1.3
- nv-codec-header: n11.1.5.2
- CUDA: 12.1

Env2
- Ubuntu 20.04.4 LTS (x86_64)
- GCC 9.4.0
- Python 3.11.3
- GPU: Quadro GV100
- FFmpeg: 5.1.3
- nv-codec-header: n11.1.5.2
- CUDA: 11.4

Pull Request resolved: https://github.com/pytorch/audio/pull/3479

Differential Revision: D47482407

Pulled By: mthrok

fbshipit-source-id: 1c53096b27824453b260138ab64e1948afeeefc7

13 Jul, 2023 2 commits

Linux CPU job should respect set Python version (#3477) · 86cb1e09

Omkar Salpekar authored Jul 13, 2023

Summary:
Reintroduce a conda environment within which we will do all deps installation, audio builds, and test runs. This conda environment uses the Python version set by the GHA job. Previously, this defaulted to the system Python 3.10, which was the default inside the container.

Pull Request resolved: https://github.com/pytorch/audio/pull/3477

Reviewed By: mthrok

Differential Revision: D47414572

Pulled By: osalpekar

fbshipit-source-id: 80760f82c7726205b29812d576e498db2a7a80a0

Revert D47402174: [audio][PR] Resolve some compilation warnings · 155d1bae

Moto Hira authored Jul 13, 2023

Differential Revision: D47402174

Original commit changeset: 00c0719ab184

Original Phabricator Diff: D47402174

fbshipit-source-id: b1f6ea4cc3ecef3f72a87bf2f67bf9644c847546

12 Jul, 2023 5 commits

Resolve some compilation warnings (#3471) · a6d1fec0

moto authored Jul 12, 2023

Summary:
- FFmpeg 6 deprecated attributes
- Guard CUDA specific functions not used in CPU builds

Pull Request resolved: https://github.com/pytorch/audio/pull/3471

Differential Revision: D47402174

Pulled By: mthrok

fbshipit-source-id: 00c0719ab1849b50c0b56b03d8fb38bc7aa74538

Fix resampling to support dynamic input lengths for onnx exports. (#3473) · a3b6bfb6

Bogdan Teleaga authored Jul 12, 2023

Summary:
This is a port of https://github.com/adefossez/julius/pull/17 for torchaudio.

Not sure if it's possible/desirable to add tests to test the functionality of ONNX exports, but I did a quick test on my machine to ensure this works. The logic is a bit simpler compared to the other PR because the torchaudio version does not support the additional flags available in julius.

Pull Request resolved: https://github.com/pytorch/audio/pull/3473

Differential Revision: D47401988

Pulled By: mthrok

fbshipit-source-id: 62fa1e4388923f6a62cef2c0f902a79ea179cec4

Use FFmpeg6 in build doc (#3475) · 989702b3

moto authored Jul 12, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3475

Differential Revision: D47403772

Pulled By: mthrok

fbshipit-source-id: 5cdde521dbbbbf33856470a9dc79419b4a3a1683

Fix FFmpeg initialization logic (#3474) · 49e269ab

Moto Hira authored Jul 12, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3474

Differential Revision: D47398447

fbshipit-source-id: f77b685d54ddfc222b806475707d4a10239872f5

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4. This is inconvenient at every stage, from installation to runtime linking.
This commit allows torchaudio to pick FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4.

The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all three.
At runtime, we look for libavutil of a specific version and, when one is found, load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.
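
The selection order can be sketched in a few lines of Python. The function `pick_ffmpeg` and its input (a set of libavutil major versions found on the system) are hypothetical; only the preference order comes from the commit message. The mapping reflects the libavutil major versions that FFmpeg 4, 5 and 6 ship (56, 57 and 58).

```python
# FFmpeg major version -> libavutil major version.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg(available_libavutil, preference=(6, 5, 4)):
    """Return the first preferred FFmpeg version whose libavutil is available."""
    for ffmpeg_version in preference:
        if LIBAVUTIL_MAJOR[ffmpeg_version] in available_libavutil:
            return ffmpeg_version
    return None


# A system with libavutil 56 and 57 installed resolves to FFmpeg 5.
chosen = pick_ffmpeg({56, 57})
print(chosen)
```

Returning `None` when nothing matches leaves it to the caller to decide whether a missing FFmpeg is an error or simply disables the feature.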

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL and downloaded from S3 at build time, instead of building every time.

The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
a single-FFmpeg-version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
so that only one specific version of FFmpeg is supported.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04
