Commits · 94aafd83de3c4e6f1b0e511d188afe48f9761806 · OpenDAS / Torchaudio

19 Sep, 2023 1 commit

Add wall implementation for RIR ray tracing (#3612) · 94aafd83

moto authored Sep 19, 2023

Extracted from #3604

Add Wall helper class and C++ unit test

94aafd83

05 Sep, 2023 1 commit

Fix backward compatibility layer in backend module (#3595) · 931598c1

moto authored Sep 05, 2023

Summary:
The PR https://github.com/pytorch/audio/issues/3549 re-organized the backend implementations and deprecated the direct access to torchaudio.backend.

The change was supposed to be backward compatible while issuing a warning to users, but the implementation of module-level `__getattr__` was not quite right.

See the issue at https://github.com/pyannote/pyannote-audio/pull/1456.

This commit fixes it so that the following imports work:

```python
from torchaudio.backend.common import AudioMetaData

from torchaudio.backend import sox_io_backend
from torchaudio.backend.sox_io_backend import save, load, info

from torchaudio.backend import no_backend
from torchaudio.backend.no_backend import save, load, info

from torchaudio.backend import soundfile_backend
from torchaudio.backend.soundfile_backend import save, load, info
```
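
For readers unfamiliar with module-level `__getattr__` (PEP 562), the following is a minimal sketch of a deprecation shim of this kind. It is not the torchaudio implementation: the module names `legacy_backend` and `new_backend` and the `load` stub are hypothetical stand-ins.

```python
import types
import warnings

# Hypothetical stand-in for the module that holds the real implementation.
_impl = types.ModuleType("new_backend")
_impl.load = lambda path: f"loaded {path}"

# Hypothetical legacy module kept only for backward compatibility.
legacy = types.ModuleType("legacy_backend")


def _legacy_getattr(name):
    """Forward known names to the new module and warn about the old path."""
    if name in {"load"}:
        warnings.warn(
            f"legacy_backend.{name} is deprecated; use new_backend.{name}.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(_impl, name)
    # Raising AttributeError keeps hasattr() and `from ... import` working as usual.
    raise AttributeError(f"module 'legacy_backend' has no attribute {name!r}")


# PEP 562: a `__getattr__` in the module namespace handles missing attributes.
legacy.__getattr__ = _legacy_getattr
```

In a real package the function would simply be defined at the top level of the legacy module file; the `types.ModuleType` objects are used here only to keep the sketch self-contained.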

Pull Request resolved: https://github.com/pytorch/audio/pull/3595

Reviewed By: nateanl

Differential Revision: D48957446

Pulled By: mthrok

fbshipit-source-id: ebb256461dd3032025fd27d0455ce980888f7778

931598c1

04 Sep, 2023 1 commit

[BC-Breaking] Remove legacy global backend switch (#3559) · 454418d2

moto authored Sep 04, 2023

Summary:
This PR removes the legacy backend switch mechanism.
The implementation itself is still available.

Merge after v2.1 release

Pull Request resolved: https://github.com/pytorch/audio/pull/3559

Reviewed By: nateanl

Differential Revision: D48353764

Pulled By: mthrok

fbshipit-source-id: 4d3924dbe6f334ecebe2b12fcd4591c61c4aa656

454418d2

20 Aug, 2023 1 commit

Fix I/O test (#3568) · 0688863c

moto authored Aug 20, 2023

Summary:
It turned out that FFmpeg 5 installed via conda reports a video frame rate of -1. FFmpeg 4 and 6 are fine. This is either a regression in FFmpeg or in the underlying decoding library.

Make the reference value adaptive.

Pull Request resolved: https://github.com/pytorch/audio/pull/3568

Reviewed By: huangruizhe

Differential Revision: D48499621

Pulled By: mthrok

fbshipit-source-id: fb64187bcf0dc57b753cb6c05f04d436238f5c51

0688863c

14 Aug, 2023 1 commit

Add default use_tmp_hub_dir value for integration tests (#3558) · d1d41fd3

Jeff Hwang authored Aug 14, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3558

In the event that `use_tmp_hub_dir` isn't specified as an option, pytest shouldn't fail. To resolve such failures, this PR modifies function `temp_hub_dir` to fall back on a default value of `False` for `use_tmp_hub_dir`.
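
The fallback described above can be sketched as follows. This is an illustration only: `argparse.Namespace` stands in for pytest's parsed options, and the helper name `resolve_use_tmp_hub_dir` is hypothetical.

```python
from argparse import Namespace


def resolve_use_tmp_hub_dir(options):
    """Return the option value, or False when the option was never registered."""
    # getattr with a default avoids AttributeError for a missing option.
    return getattr(options, "use_tmp_hub_dir", False)


# Option registered and set by the caller.
with_option = resolve_use_tmp_hub_dir(Namespace(use_tmp_hub_dir=True))
# Option never registered: fall back to the default.
without_option = resolve_use_tmp_hub_dir(Namespace())
```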

Reviewed By: mthrok

Differential Revision: D48318947

fbshipit-source-id: 5dd692f9202ef37ec3e2c9ea39896156f928d693

d1d41fd3

11 Aug, 2023 1 commit

Revise VGGish pipeline test again (#3551) · f2b2f05a

Jeff Hwang authored Aug 10, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3551

Restores VGGish pipeline test to be a function rather than class.

Reviewed By: mthrok

Differential Revision: D48236197

fbshipit-source-id: 25ac19d87a7a0964a9c3f7552037cd6c21dc38a9

f2b2f05a

10 Aug, 2023 2 commits

Add Frechet distance function (#3545) · 06301c0a

Jeff Hwang authored Aug 10, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3545

Adds function for computing the Fréchet distance between two multivariate normal distributions.
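
For reference, the commonly used closed form for two Gaussians N(mu_x, S_x) and N(mu_y, S_y) is ||mu_x - mu_y||^2 + Tr(S_x + S_y - 2 (S_x S_y)^(1/2)). The sketch below covers only the special case of diagonal covariances, where the matrix square root reduces to element-wise square roots; it is not the function added by this commit, and the name `frechet_distance_diag` is hypothetical.

```python
import math


def frechet_distance_diag(mu_x, var_x, mu_y, var_y):
    """Frechet distance (squared form) between Gaussians with diagonal covariance.

    mu_* are lists of means and var_* are lists of per-dimension variances.
    """
    # Squared Euclidean distance between the mean vectors.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))
    # Tr(S_x + S_y - 2 (S_x S_y)^(1/2)) for diagonal S_x and S_y.
    cov_term = sum(p + q - 2.0 * math.sqrt(p * q) for p, q in zip(var_x, var_y))
    return mean_term + cov_term
```

Identical distributions give a distance of zero, and with equal covariances the result reduces to the squared distance between the means.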

Reviewed By: mthrok

Differential Revision: D48126102

fbshipit-source-id: e4e122b831e1e752037c03f5baa9451e81ef1697

06301c0a

Move backend initialization to toplevel (#3548) · 6fb21ab1

moto authored Aug 10, 2023

Summary:
The backend dispatcher is implemented in `torchaudio._backend`, while the legacy backend is implemented in `torchaudio.backend`.

The initialization happens in `torchaudio._backend`.
This commit moves it to `torchaudio.__init__`, so that `backend` and `_backend` are more independent.

Pull Request resolved: https://github.com/pytorch/audio/pull/3548

Reviewed By: huangruizhe

Differential Revision: D48219244

Pulled By: mthrok

fbshipit-source-id: e694cb232794f90902a60ee51c7bf11b7f0548a0

6fb21ab1

09 Aug, 2023 1 commit

Revise VGGish inference pipeline test (#3544) · 9f5fa84b

Jeff Hwang authored Aug 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3544

Revises VGGish inference pipeline test to support internal testing.

Reviewed By: mthrok

Differential Revision: D48058409

fbshipit-source-id: 045140a0e9d50128d32ef6510bdb2f642a365c83

9f5fa84b

07 Aug, 2023 1 commit

Add merge_tokens / TokenSpan (#3535) · 30668afb

moto authored Aug 07, 2023

Summary:
This commit adds `merge_tokens` function which removes repeated tokens from CTC token sequences returned from `forced_align`.

Resolving repeated tokens is a necessary and almost universal step, so it makes sense to have such a helper function in torchaudio.
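
The idea of collapsing repeated tokens can be sketched in plain Python as below. This is a simplified illustration and not the torchaudio API: the `Span` dataclass and `merge_repeats` function are hypothetical names, and scores are omitted.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Span:
    token: int
    start: int  # index of the first frame (inclusive)
    end: int    # index one past the last frame (exclusive)


def merge_repeats(tokens, blank=0):
    """Collapse runs of identical frame-level tokens and drop blank runs."""
    spans = []
    position = 0
    for token, run in groupby(tokens):
        length = len(list(run))
        if token != blank:
            spans.append(Span(token, position, position + length))
        position += length
    return spans
```

For the frame-level sequence `[0, 1, 1, 0, 2, 2, 2, 0]` with blank `0`, this yields one span for token 1 covering frames 1 to 3 and one span for token 2 covering frames 4 to 7 (end exclusive).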

Pull Request resolved: https://github.com/pytorch/audio/pull/3535

Reviewed By: huangruizhe

Differential Revision: D48111202

Pulled By: mthrok

fbshipit-source-id: 25354bfa210aa5c03f8c1d3e201f253ca3761b24

30668afb

03 Aug, 2023 1 commit

Relax Conformer RNN-T numerical parity tests (#3525) · 72b0917d

hwangjeff authored Aug 02, 2023

Summary:
Increases numerical tolerance on Conformer RNN-T TorchScript consistency tests to resolve CI test failures.

Pull Request resolved: https://github.com/pytorch/audio/pull/3525

Reviewed By: mthrok

Differential Revision: D48000613

Pulled By: hwangjeff

fbshipit-source-id: 1d35ba58055a8346dc40e2b67f37ccfd2e015894

72b0917d

01 Aug, 2023 1 commit

Add pretrained VGGish inference pipeline (#3491) · cbfde17b

hwangjeff authored Jul 31, 2023

Summary:
Adds pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3491

Reviewed By: mthrok

Differential Revision: D47738130

Pulled By: hwangjeff

fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67

cbfde17b

31 Jul, 2023 1 commit

Migrate torch.norm to torch.linalg.vector_norm (#3522) · 8a2e12d3

moto authored Jul 31, 2023

Summary:
torch.norm is now deprecated.
The usages in torchaudio seem to be vector norms, so this commit replaces them with torch.linalg.vector_norm.

Resolves https://github.com/pytorch/audio/issues/3484

Pull Request resolved: https://github.com/pytorch/audio/pull/3522

Reviewed By: huangruizhe

Differential Revision: D47926659

Pulled By: mthrok

fbshipit-source-id: f7428cf0168109a3d340b8784adc99bb5f781084

8a2e12d3

29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the `_compat` module were introduced there so that
everything related to FFmpeg is in torchaudio.io and FFmpeg library
initialization can be carried out in `torchaudio.io.__init__`.

Now that this constraint is removed (all the initialization happens
at `torchaudio._extension.__init__`) and `_compat` is only used by
the FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f

8497ee91

28 Jul, 2023 2 commits

Remove ffmpeg fallback from sox_io backend (#3516) · 2c8665de

moto authored Jul 28, 2023

Summary:
In https://github.com/pytorch/audio/issues/2419, we added ffmpeg as a fallback for the sox_io backend. This was a workaround for the issue caused by the removal of libmad.

Now that we introduced `backend` argument to I/O functions, and libsox integration is moved to dynamic binding where users can use libsox with libmad integration, we do not need the workaround.

This commit is based on reverting https://github.com/pytorch/audio/issues/2416 (fd7ace17).

Pull Request resolved: https://github.com/pytorch/audio/pull/3516

Reviewed By: huangruizhe

Differential Revision: D47855272

Pulled By: mthrok

fbshipit-source-id: 5af73af7865f6e545ccb052d478e86588ff2a014

2c8665de

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models, along with the corresponding factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

b7d2d928

26 Jul, 2023 1 commit

Move env util (#3499) · da212020

moto authored Jul 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3499

Differential Revision: D47803654

Pulled By: mthrok

fbshipit-source-id: 2b916fa66d84c91c01b4dfe6dd5ee3501159f451

da212020

25 Jul, 2023 1 commit

Disable some tests that need libsox (#3494) · 49e9ed94

moto authored Jul 25, 2023

Summary:
In preparation for https://github.com/pytorch/audio/pull/3082

Disable those FFmpeg tests that depend on sox CLI. These tests need to be updated or removed so as not to use sox CLI.

Auto-skip some sox tests if decoder/encoder are not available

Pull Request resolved: https://github.com/pytorch/audio/pull/3494

Differential Revision: D47761948

Pulled By: mthrok

fbshipit-source-id: 3a48d7f280f8376a48d223947dd41a7cdc8cbc30

49e9ed94

17 Jul, 2023 1 commit

Ensure StreamReader returns tensors with requires_grad is False (#3467) · 44b92062

moto authored Jul 17, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3467

Differential Revision: D47482388

Pulled By: mthrok

fbshipit-source-id: abff36491dc28b83270673860d6457a084b1327d

44b92062

12 Jul, 2023 1 commit

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4. This is inconvenient at every stage, from installation to runtime linking.
This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4.

The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all of them.
At runtime, we look for libavutil of specific version and when one is found, load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.
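
The selection logic can be sketched as below. This is an illustration under stated assumptions, not the torchaudio loader: the function `pick_ffmpeg_version` and the injected `can_load` predicate are hypothetical, and the table relies on libavutil major versions 56, 57 and 58 shipping with FFmpeg 4, 5 and 6 respectively.

```python
# FFmpeg major version -> libavutil major version.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(can_load):
    """Return the preferred FFmpeg major version whose libavutil can be loaded.

    `can_load` is a caller-supplied predicate taking a libavutil major version;
    a real loader would probe the dynamic linker instead.
    """
    for ffmpeg_major in (6, 5, 4):  # order of preference
        if can_load(LIBAVUTIL_MAJOR[ffmpeg_major]):
            return ffmpeg_major
    return None  # no supported FFmpeg found


# Example: a system that provides libavutil 57 and 56 but not 58.
available = {57, 56}
chosen = pick_ffmpeg_version(lambda major: major in available)
```

Injecting the predicate keeps the preference order testable without any FFmpeg libraries installed.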

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL and downloaded from S3 at build time, instead of building every time.

The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
a single FFmpeg version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
so that the result only supports one specific version of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04

786066b4

10 Jul, 2023 1 commit

Update package smoke test (#3465) · 589de109

moto authored Jul 10, 2023

Summary:
1. Update smoke test script to change directory so that there is no `torchaudio` directory in CWD when smoke test is being executed.
2. Disable the part of the smoke test which requires FFmpeg for wheel. This is in preparation for https://github.com/pytorch/test-infra/pull/4358

Pull Request resolved: https://github.com/pytorch/audio/pull/3465

Reviewed By: nateanl

Differential Revision: D47345117

Pulled By: mthrok

fbshipit-source-id: 95aad0a22922d44ee9a24a05d9ece85166b8c17e

589de109

05 Jul, 2023 1 commit

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

The current design of forced_align accepts a 2D Tensor for `log_probs` and a 1D Tensor for `targets`. To keep the API simple, this PR changes it to only support batch Tensors (a 3D Tensor for `log_probs` and a 2D Tensor for `targets`).

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

cc164478

21 Jun, 2023 1 commit

Introduce chroma spectrogram transform (#3427) · 70968293

Jeff Hwang authored Jun 21, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3427

Adds transform `ChromaSpectrogram` for generating chromagrams from waveforms as well as transform `ChromaScale` for generating chromagrams from linear-frequency spectrograms.
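
As background on what a chromagram groups together, the sketch below maps a frequency to one of 12 pitch classes using equal temperament with A4 = 440 Hz. It illustrates the concept only and is not how the transforms in this commit are implemented; the function name `pitch_class` is hypothetical.

```python
import math


def pitch_class(frequency_hz, a4_hz=440.0):
    """Return the pitch class (0 = C, ..., 9 = A, 11 = B) nearest to a frequency."""
    # MIDI note number: 69 is A4, with 12 semitones per octave.
    midi_note = 69 + 12 * math.log2(frequency_hz / a4_hz)
    return round(midi_note) % 12
```

Frequencies an octave apart, such as 440 Hz and 880 Hz, fall into the same pitch class, which is the property a chromagram exploits.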

Reviewed By: mthrok

Differential Revision: D46547418

fbshipit-source-id: 250f298b8e11d8cf82f05536c29d51cf8d77a960

70968293

14 Jun, 2023 1 commit

Add resample option to AudioEffector (#3374) · 406e9c8d

moto authored Jun 14, 2023

Summary:
Currently, AudioEffector always resamples to the original sample rate. It is more flexible to allow overriding this with any sample rate.

Pull Request resolved: https://github.com/pytorch/audio/pull/3374

Differential Revision: D46235358

Pulled By: mthrok

fbshipit-source-id: 39a5d4e38d9b90380da31d0ce9ee8090668b54e4

406e9c8d

09 Jun, 2023 1 commit

Disable HF integration test (#3431) · f5d7635e

moto authored Jun 09, 2023

Summary:
The new version of transformers changed the format of pre-trained weights. Fixing it is low priority for the maintenance team, so we disable the test.

See https://github.com/pytorch/audio/issues/3430

Pull Request resolved: https://github.com/pytorch/audio/pull/3431

Differential Revision: D46592883

Pulled By: mthrok

fbshipit-source-id: d8f54a281a92cac60c469c48f95345bcf0e959d6

f5d7635e

08 Jun, 2023 2 commits

Introduce chroma filter bank function (#3395) · dfd0c5fd

Jeff Hwang authored Jun 08, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3395

Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.

Reviewed By: mthrok

Differential Revision: D46307672

fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5

dfd0c5fd

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
The StreamReader decoding process is composed of three steps:

1. Decode the incoming AVPacket into AVFrame
2. Pass AVFrame through AVFilter to perform post process
3. Convert the resulting AVFrame

The internal of StreamReader was refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized at the time output stream is defined and output stream shape can be retrieved.

For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved.
However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. The refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in to finalize the frame shape.
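
The deferred-initialization idea can be sketched in Python as follows. The actual change lives in torchaudio's C++ code; the class `LazyFrameConverter` here is a hypothetical illustration that uses nested lists in place of frames.

```python
class LazyFrameConverter:
    """Hypothetical converter that learns the frame shape from the first frame."""

    def __init__(self):
        self.shape = None  # unknown until the first frame arrives

    def convert(self, frame):
        height, width = len(frame), len(frame[0])
        if self.shape is None:
            # Finalize the output shape lazily, on the first decoded frame.
            self.shape = (height, width)
        elif self.shape != (height, width):
            raise ValueError(f"expected frame shape {self.shape}, got {(height, width)}")
        # Stand-in for the real conversion: copy the pixel rows.
        return [list(row) for row in frame]
```

The trade-off is that the output shape cannot be queried before the first frame has been processed.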

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6

7dff24ca

07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

47716772

06 Jun, 2023 3 commits

Fix style issue (#3410) · 27aa52fb

moto authored Jun 06, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410

Differential Revision: D46496786

Pulled By: mthrok

fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059

27aa52fb

Revert D46126226: Update forced_align method to only support batch Tensors · bbc13b9a

Moto Hira authored Jun 06, 2023

Differential Revision:
D46126226

Original commit changeset: 42cb52b19d91

Original Phabricator Diff: D46126226

fbshipit-source-id: 372b2526d9e196e37e014f1556bf117d29bb1ac6

bbc13b9a

Update forced_align method to only support batch Tensors (#3365) · 5f17d81c

Zhaoheng Ni authored Jun 06, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3365

Reviewed By: vineelpratap

Differential Revision: D46126226

fbshipit-source-id: 42cb52b19d91bbff7dc040ccf60350545d75b3a2

5f17d81c

02 Jun, 2023 1 commit

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio.

Recently, we have been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.

See some of the discussion https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

5bbbb1d5

01 Jun, 2023 3 commits

[BC-breaking] Remove file-like object support from sox_io backend (#3035) · bc54ac8a

moto authored Jun 01, 2023

Summary:
This commit removes file-like object support so that we can remove the custom patch.

The motivation and plan is outlined in https://github.com/pytorch/audio/issues/2950.

Pull Request resolved: https://github.com/pytorch/audio/pull/3035

Reviewed By: hwangjeff

Differential Revision: D44695647

Pulled By: mthrok

fbshipit-source-id: 13af0234e288c041bc7b490e1f967f85ce7eb8ec

bc54ac8a

Fix style issue (#3398) · c7ac1aff

moto authored Jun 01, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3398

Reviewed By: nateanl

Differential Revision: D46354862

Pulled By: mthrok

fbshipit-source-id: b86dcdfeff8ed9db87b0b78eca20f6f18117e97e

c7ac1aff

Refactor arg mapping in ffmpeg save function (#3387) · b99e5f46

moto authored May 31, 2023

Summary:
The arguments of TorchAudio's save function ("format", "bits_per_sample" and "encoding")
do not map one-to-one to the arguments of FFmpeg encoding.

For example, to use vorbis codec, FFmpeg expects "ogg" container/extension with "vorbis"
encoder. It does not recognize "vorbis" extension like TorchAudio (libsox) does.

This commit refactors the logic to parse/map the arguments.

As a result it now properly works with vorbis and mp3 extension.
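
A minimal sketch of this kind of argument mapping is shown below. It only encodes the vorbis example from the text; the function `map_save_format` and its return convention are hypothetical and do not reflect the actual torchaudio code.

```python
# User-facing format name -> (FFmpeg container, FFmpeg encoder).
# Only the vorbis case described above is listed; others pass through.
_FORMAT_OVERRIDES = {
    "vorbis": ("ogg", "vorbis"),
}


def map_save_format(fmt):
    """Return (container, encoder); encoder None means 'let FFmpeg choose'."""
    return _FORMAT_OVERRIDES.get(fmt, (fmt, None))
```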

Pull Request resolved: https://github.com/pytorch/audio/pull/3387

Reviewed By: hwangjeff

Differential Revision: D46328787

Pulled By: mthrok

fbshipit-source-id: 36f993952a062bfec58a8b51be6aa86297571f90

b99e5f46

30 May, 2023 1 commit

Disable failing GPU unit test (#3384) · caf3ac07

atalman authored May 30, 2023

Summary:
Disable failing GPU unit test.
See associated issue: https://github.com/pytorch/audio/issues/3376

Pull Request resolved: https://github.com/pytorch/audio/pull/3384

Reviewed By: mthrok

Differential Revision: D46279324

Pulled By: atalman

fbshipit-source-id: 3a606bb992e0261451f48d1fb458e054f7fd5583

caf3ac07

27 May, 2023 1 commit

Fix AudioEffector for mulaw (#3372) · af932cc7

moto authored May 26, 2023

Summary:
When encoding audio with mulaw, the resulting data does not have a header, and the StreamReader defaults to 16 kHz, which can stretch or shrink the resulting waveform.
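
The effect of assuming the wrong sample rate is simple arithmetic, sketched below with illustrative numbers (the 8 kHz source rate is an assumption chosen for the example).

```python
def playback_duration(num_samples, assumed_rate_hz):
    """Duration in seconds when samples are interpreted at the given rate."""
    return num_samples / assumed_rate_hz


# Two seconds of audio recorded at 8 kHz.
num_samples = 8000 * 2
true_duration = playback_duration(num_samples, 8000)
# The same samples read back with a 16 kHz default play twice as fast.
misread_duration = playback_duration(num_samples, 16000)
```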

Pull Request resolved: https://github.com/pytorch/audio/pull/3372

Reviewed By: hwangjeff

Differential Revision: D46234772

Pulled By: mthrok

fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552

af932cc7

26 May, 2023 3 commits

Fix encoding g722 format (#3373) · 1b05ca7e

moto authored May 26, 2023

Summary:
g722 format only supports 16k Hz, but AVCodec does not list this. The implementation does not insert resampling and the resulting audio can be slowed down or sped up.

Pull Request resolved: https://github.com/pytorch/audio/pull/3373

Reviewed By: hwangjeff

Differential Revision: D46233181

Pulled By: mthrok

fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c

1b05ca7e

Temporarily remove test for extract_features (#3378) · 05649ca3

Zhaoheng Ni authored May 26, 2023

Summary:
The tests failed for several bundles. This commit removes them; they will be re-added once the root cause is figured out.

Pull Request resolved: https://github.com/pytorch/audio/pull/3378

Reviewed By: atalman

Differential Revision: D46230884

Pulled By: nateanl

fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92

05649ca3

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality:
1. The `init_b` hypothesis is only regenerated from blank token if no initial hypotheses are provided.
2. Allows the decoder to receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa