Commits · 096396802de3b1b304c58eb331dc59c8d27fe5de · OpenDAS / Torchaudio

12 May, 2022 2 commits

Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680

Zhaoheng Ni authored May 12, 2022

Summary:
- When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label. After zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform).
This PR fixes the bug by checking whether `label_start` is negative and setting it to zero if so.
- If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training.
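
A minimal sketch of the two fixes above, in plain Python rather than the recipe's tensor code. The constants (a 25 ms window, a 20 ms hop, 16 samples per millisecond) and the names `align_label_start` and `pad_batch` are assumptions for illustration; integer floor division stands in for `torch.div(..., rounding_mode="floor")`.

```python
from typing import List, Tuple

# Assumed values: 25 ms window, 20 ms hop, 16 samples per millisecond.
KERNEL_SIZE = 25
STRIDE = 20
SAMPLE_RATE = 16


def align_label_start(audio_start: int) -> int:
    """Map a waveform crop offset (in samples) to a label frame index."""
    # Python's // floors toward negative infinity, like rounding_mode="floor".
    label_start = (audio_start - KERNEL_SIZE * SAMPLE_RATE) // (STRIDE * SAMPLE_RATE)
    # Crops that begin inside the first window give a negative index; clamp to zero.
    return max(label_start, 0)


def pad_batch(waveforms: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Zero-pad to the longest waveform and report each original length."""
    lengths = [len(w) for w in waveforms]  # per-waveform lengths, not the max
    max_len = max(lengths)
    padded = [w + [0.0] * (max_len - len(w)) for w in waveforms]
    return padded, lengths
```

Returning the per-waveform lengths alongside the padded batch is what would let a model mask out the padded region.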

Pull Request resolved: https://github.com/pytorch/audio/pull/2296

Reviewed By: mthrok

Differential Revision: D36323217

Pulled By: nateanl

fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423


[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc


11 May, 2022 6 commits

Move FFmpeg integrity test from conda smoke test to custom smoke test (#2381) · 9877f544

moto authored May 11, 2022

Summary:
The Conda package build performs a simple smoke test, which is different
from the smoke_test jobs we define in our CI.

Currently the Conda packaging smoke test verifies the importability of
`torchaudio.prototype.io`, which requires FFmpeg 4.

1. We list FFmpeg 4 as a runtime requirement, but this means that
conda's dependency resolver takes FFmpeg 4 into consideration.
FFmpeg 5 was released this year, and we can expect that the user base
will gradually move to it. If the user environment has some constraint
on FFmpeg, torchaudio will conflict with it, which prevents users
from installing torchaudio.

2. In #2377 the way optional dependencies are checked/initialized was changed,
so this Conda smoke test will no longer check the integrity with FFmpeg libraries.

To solve the issues above, this commit moves the part that tests integrity with
FFmpeg libraries to the smoke test we define on CircleCI.

Pull Request resolved: https://github.com/pytorch/audio/pull/2381

Reviewed By: carolineechen

Differential Revision: D36323706

Pulled By: mthrok

fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3


Move multi-channel modules to a separate file (#2382) · 448f53e1

Zhaoheng Ni authored May 11, 2022

Summary:
The modules include:
- PSD
- MVDR
- RTFMVDR
- SoudenMVDR

Pull Request resolved: https://github.com/pytorch/audio/pull/2382

Reviewed By: carolineechen

Differential Revision: D36314096

Pulled By: nateanl

fbshipit-source-id: 9d7d962b1c70cdc435a579191ad88838dd6fc0ba


Remove CodeQL (#2380) · 961a3ae9

moto authored May 11, 2022

Summary:
For a while now, CodeQL has been constantly emitting a red signal, but the team
does not know what it is or how to fix it. At this point, it is
purely noise and provides no valuable signal.

Ref https://github.com/pytorch/audio/issues/2314

Pull Request resolved: https://github.com/pytorch/audio/pull/2380

Reviewed By: carolineechen

Differential Revision: D36305599

Pulled By: mthrok

fbshipit-source-id: 27ece58730066543600f3873397b9a239e54beb0


Ignore TempDir clean up error (#2379) · f35ad461

moto authored May 11, 2022

Summary:
On CircleCI, Windows unit tests are failing for Python 3.7 with
`PermissionError` at the end of the test, when it cleans up the temporary
directory.

According to the discussion https://github.com/python/cpython/issues/74168,
this is caused by a known issue with `shutil.rmtree`.

In the above thread it is advised to simply ignore the error as it
is not guaranteed that temp directories are cleaned up.

This commit follows the same path and simply ignores the error
so that our CI gets back to green.
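
The idea can be sketched as follows. This is an illustration, not the code from the commit: the helper name `cleanup_quietly` is hypothetical, and it only shows the pattern of swallowing `PermissionError` during temporary-directory cleanup.

```python
import os
import tempfile


def cleanup_quietly(temp_dir: tempfile.TemporaryDirectory) -> bool:
    """Try to remove a temporary directory; report whether it succeeded."""
    try:
        temp_dir.cleanup()
        return True
    except PermissionError:
        # The directory may be left behind; that is tolerated here
        # instead of failing the test run.
        return False


temp_dir = tempfile.TemporaryDirectory()
with open(os.path.join(temp_dir.name, "sample.txt"), "w") as f:
    f.write("data")
removed = cleanup_quietly(temp_dir)
```

On Python 3.10 and later, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` offers similar behavior without a wrapper.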

Pull Request resolved: https://github.com/pytorch/audio/pull/2379

Reviewed By: carolineechen

Differential Revision: D36305595

Pulled By: mthrok

fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5


Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5

hwangjeff authored May 10, 2022

Summary:
Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
- Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
- Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
- Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
- Modifies training script to allow for specifying the path to a model checkpoint to restart training from.
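
To illustrate the pickling point in the list above, here is a small standard-library sketch. It is not taken from the recipe; `is_picklable`, `scale_lambda`, and `scale_partial` are hypothetical names, and `operator.mul` stands in for a real transform.

```python
import operator
import pickle
from functools import partial


def is_picklable(obj) -> bool:
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A lambda has no importable name, so pickle cannot serialize it by reference.
scale_lambda = lambda x: operator.mul(2, x)
# partial wraps an importable function plus its bound arguments, both picklable.
scale_partial = partial(operator.mul, 2)

lambda_ok = is_picklable(scale_lambda)
partial_ok = is_picklable(scale_partial)
restored = pickle.loads(pickle.dumps(scale_partial))
```

Objects handed to dataloader worker processes may need to be pickled, depending on how the workers are started, which is where a `partial` over a named function helps.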

Pull Request resolved: https://github.com/pytorch/audio/pull/2366

Reviewed By: mthrok

Differential Revision: D36305028

Pulled By: hwangjeff

fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba


Refactor the constructors of pointer wrappers (#2373) · 93c26d63

moto authored May 10, 2022

Summary:
This commit refactors the constructors of wrapper classes so that
wrapper classes are only responsible for deallocation of underlying
FFmpeg custom structures.

The responsibility of custom initialization is moved to helper functions.

Context:

The FFmpeg API uses a bunch of raw pointers, which require dedicated allocators
and deallocators. In torchaudio we wrap these pointers with
`std::unique_ptr<>` to adopt RAII semantics.

Currently all of the customization logic required for `Streamer` is
handled by the constructors of the wrapper classes, like the following:

```
AVFormatContextPtr(
      const std::string& src,
      const std::string& device,
      const std::map<std::string, std::string>& option);
```

This constructor allocates the raw `AVFormatContext*` pointer,
while initializing it with the given option, then it parses the
input media.

As we consider the write/encode features, which require a different way
of initializing the `AVFormatContext*`, making it the responsibility
of the constructors of `AVFormatContextPtr` reduces flexibility.

Thus this commit moves the customization to helper factory functions.

- `AVFormatContextPtr(...)` -> `get_input_format_context(...)`
- `AVCodecContextPtr(...)` -> `get_decode_context(...)`

Pull Request resolved: https://github.com/pytorch/audio/pull/2373

Reviewed By: hwangjeff

Differential Revision: D36230148

Pulled By: mthrok

fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a


10 May, 2022 8 commits

Add ConvEmformer module (#2358) · 2c79b55a

hwangjeff authored May 10, 2022

Summary:
Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241.

Continuation of https://github.com/pytorch/audio/issues/2324.

Pull Request resolved: https://github.com/pytorch/audio/pull/2358

Reviewed By: nateanl, xiaohui-zhang

Differential Revision: D36137992

Pulled By: hwangjeff

fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063


Fix return dtype in MVDR module (#2376) · 2f4eb4ac

Zhaoheng Ni authored May 10, 2022

Summary:
Address https://github.com/pytorch/audio/issues/2375
The MVDR module internally transforms the dtype of complex tensors to `torch.complex128` for computation and transforms it back to the original dtype before returning the Tensor. However, it didn't convert back successfully due to `specgram_enhanced.to(dtype)`, which should be `specgram_enhanced = specgram_enhanced.to(dtype)`. Fix it to make the output dtype consistent with original input.

Pull Request resolved: https://github.com/pytorch/audio/pull/2376

Reviewed By: hwangjeff

Differential Revision: D36280851

Pulled By: nateanl

fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112


[ROCm] Update to rocm5.1.1 (#2362) · eab2f39d

Kyle Chen authored May 10, 2022

Summary:
previous update for rocm: https://github.com/pytorch/audio/pull/2186

Pull Request resolved: https://github.com/pytorch/audio/pull/2362

Reviewed By: seemethere

Differential Revision: D36283672

Pulled By: mthrok

fbshipit-source-id: bfd38940d027c8ccd72ab48991e5ab7f84b0e9c0


Add RTFMVDR module (#2368) · 4b021ae3

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
The input arguments are:
- multi-channel spectrum.
- RTF vector of the target speech
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.
The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
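
For reference, the RTF-based MVDR solution is commonly written as below. The notation is an assumption and does not come from the commit: $\mathbf{v}(f)$ is the RTF vector of the target speech, $\mathbf{\Phi}_{NN}(f)$ is the PSD matrix of noise, and $\mathbf{Y}(f)$ is the multi-channel spectrum.

$$
\mathbf{w}_{\mathrm{MVDR}}(f) = \frac{\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{v}(f)}{\mathbf{v}^{\mathsf{H}}(f)\,\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{v}(f)},
\qquad
\hat{S}(f) = \mathbf{w}_{\mathrm{MVDR}}^{\mathsf{H}}(f)\,\mathbf{Y}(f)
$$

Under this reading, `diag_eps` would apply when loading the diagonal of $\mathbf{\Phi}_{NN}(f)$ before inversion, and `eps` would guard the denominator against division by zero.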

Pull Request resolved: https://github.com/pytorch/audio/pull/2368

Reviewed By: carolineechen

Differential Revision: D36214940

Pulled By: nateanl

fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937


Add diagonal_loading optional to rtf_power (#2369) · da1e83cc

Zhaoheng Ni authored May 10, 2022

Summary:
When computing the MVDR beamforming weights using the power iteration method, the PSD matrix of noise can be applied with diagonal loading to improve the robustness. This is also applicable to computing the RTF matrix (See https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example). This also aligns with current `torchaudio.transforms.MVDR` module to keep the consistency.

This PR adds the `diagonal_loading` argument with `True` as default value to `torchaudio.functional.rtf_power`.
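
As a rough illustration of what diagonal loading does, the sketch below adds a small trace-scaled value to the diagonal of a PSD matrix. It is plain Python, not torchaudio's implementation; the function name `apply_diagonal_loading` and the default values of `reg` and `eps` are assumptions.

```python
from typing import List

Matrix = List[List[complex]]


def apply_diagonal_loading(psd: Matrix, reg: float = 1e-7, eps: float = 1e-8) -> Matrix:
    """Add a small trace-scaled value to the diagonal of a square PSD matrix."""
    size = len(psd)
    trace = sum(psd[i][i] for i in range(size)).real
    # The loading scales with the matrix energy; eps keeps it above zero.
    loading = trace * reg + eps
    return [
        [psd[i][j] + (loading if i == j else 0.0) for j in range(size)]
        for i in range(size)
    ]
```

A singular matrix such as `[[1, 1], [1, 1]]` becomes invertible after loading, which is why the technique tends to make a later matrix inverse more robust.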

Pull Request resolved: https://github.com/pytorch/audio/pull/2369

Reviewed By: carolineechen

Differential Revision: D36204130

Pulled By: nateanl

fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420


Add SoudenMVDR module (#2367) · aed5eb88

Zhaoheng Ni authored May 10, 2022

Summary:
Add a new design of MVDR module.
The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
The input arguments are:
- multi-channel spectrum.
- PSD matrix of target speech.
- PSD matrix of noise.
- reference channel in the microphone array.
- diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
- diag_eps for computing the inverse of the matrix.
- eps for computing the beamforming weight.

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
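
For reference, the Souden formulation of the MVDR weights is commonly written as below. The notation is an assumption and does not come from the commit: $\mathbf{\Phi}_{SS}(f)$ and $\mathbf{\Phi}_{NN}(f)$ are the PSD matrices of target speech and noise, and $\mathbf{u}$ is a one-hot vector selecting the reference channel.

$$
\mathbf{w}_{\mathrm{MVDR}}(f) = \frac{\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{\Phi}_{SS}(f)}{\operatorname{trace}\!\left(\mathbf{\Phi}_{NN}^{-1}(f)\,\mathbf{\Phi}_{SS}(f)\right)}\,\mathbf{u},
\qquad
\hat{S}(f) = \mathbf{w}_{\mathrm{MVDR}}^{\mathsf{H}}(f)\,\mathbf{Y}(f)
$$

Here $\mathbf{Y}(f)$ is the multi-channel spectrum and $\hat{S}(f)$ is the single-channel enhanced spectrum.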

Pull Request resolved: https://github.com/pytorch/audio/pull/2367

Reviewed By: hwangjeff

Differential Revision: D36198015

Pulled By: nateanl

fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527


Add HW acceleration support on Streamer (#2331) · 54d2d04f

moto authored May 09, 2022

Summary:
This commit adds the `hw_accel` option to the `Streamer::add_video_stream` method.
Specifying `hw_accel="cuda"` allows creating the chunk Tensor directly on a CUDA device
when the following conditions are met.
1. the video format is H264,
2. underlying ffmpeg is compiled with NVENC, and
3. the client code specifies `decoder="h264_cuvid"`.

A simple benchmark yields a roughly 7x improvement in decoding speed.

<details>

```python
import time

from torchaudio.prototype.io import Streamer

srcs = [
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
]

patterns = [
    ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
    ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
    ("h264_cuvid", None, None),  # NVDEC -> CPU
    (None, None, None),  # CPU
]

for src in srcs:
    print(src, flush=True)
    for (decoder, decoder_options, hw_accel) in patterns:
        s = Streamer(src)
        s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)

        t0 = time.monotonic()
        num_frames = 0
        for i, (chunk, ) in enumerate(s.stream()):
            num_frames += chunk.shape[0]
        t1 = time.monotonic()
        print(chunk.dtype, chunk.shape, chunk.device)
        print(time.monotonic() - t0, num_frames, flush=True)
```
</details>

```
https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
10.781158386962488 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
10.771313901990652 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
27.88662809302332 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
83.22728440898936 6175
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
12.945253834011964 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
12.870224556012545 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
28.03406483103754 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
82.6120332319988 6175
```

With HW resizing

<details>

```python
import time

from torchaudio.prototype.io import Streamer

srcs = [
    "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
]

patterns = [
    # Decode with NVDEC, CUDA HW scaling -> CUDA:0
    ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
    # Decoded with NVDEC, CUDA HW scaling -> CPU
    ("h264_cuvid", {"resize": "960x540"}, "", None),
    # CPU decoding, CPU scaling
    (None, None, "scale=width=960:height=540", None),
]

for src in srcs:
    print(src, flush=True)
    for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
        s = Streamer(src)
        s.add_video_stream(
            5,
            decoder=decoder,
            decoder_options=decoder_options,
            filter_desc=filter_desc,
            hw_accel=hw_accel,
        )

        t0 = time.monotonic()
        num_frames = 0
        for i, (chunk, ) in enumerate(s.stream()):
            num_frames += chunk.shape[0]
        t1 = time.monotonic()
        print(chunk.dtype, chunk.shape, chunk.device)
        print(time.monotonic() - t0, num_frames, flush=True)
```

</details>

```
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
12.890056837990414 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
10.697489063022658 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
85.19899423001334 6175

https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
10.712715593050234 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
11.030170071986504 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
84.8515750519582 6175
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2331

Reviewed By: hwangjeff

Differential Revision: D36217169

Pulled By: mthrok

fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b


Add citations for datasets (#2371) · 638120ca

Caroline Chen authored May 09, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371

Reviewed By: xiaohui-zhang

Differential Revision: D36246167

Pulled By: carolineechen

fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4


09 May, 2022 1 commit

Cleanup cuda115 unused code (#2374) · fe3d5d10

Andrey Talman authored May 09, 2022

Summary:
Clean up the old cuda115 version and other legacy versions.

Pull Request resolved: https://github.com/pytorch/audio/pull/2374

Reviewed By: nateanl, mthrok

Differential Revision: D36250955

Pulled By: atalman

fbshipit-source-id: 6b7f0e2926eeb688991c939901c980428cf8e7ef


06 May, 2022 2 commits

Use custom FFmpeg libraries for torchaudio binary distributions (#2355) · b7624c60

moto authored May 06, 2022

Summary:
This commit changes the way torchaudio binary distributions are built.

* For all the binary distributions (conda/pip on Linux/macOS/Windows), build custom FFmpeg libraries.
* The custom FFmpeg libraries are not configured with `--enable-gpl` or `--enable-nonfree`, so that they stay LGPL.
* The custom FFmpeg libraries employ rpath so that the torchaudio binary distributions look for the corresponding FFmpeg libraries installed in the runtime environment.
* The torchaudio binary build process will use them to bootstrap its build process.
* The custom FFmpeg libraries are NOT shipped.

This commit also adds a disclaimer about FFmpeg to the README.

Pull Request resolved: https://github.com/pytorch/audio/pull/2355

Reviewed By: nateanl

Differential Revision: D36202087

Pulled By: mthrok

fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef


Refactor smoke test executions (#2365) · 6a8a28bb

moto authored May 06, 2022

Summary:
The smoke test jobs simply perform `import torchaudio` to check
if the package artifacts are sane.

Originally, the CI was executing it in the root directory.
This was fine unless the source code was checked out.
When the source code is checked out, performing `import torchaudio` in
the root directory would import the source torchaudio directory instead of the
installed package.

This error is difficult to notice, so this commit introduces a common script to
perform the smoke test while moving out of the root directory.
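
The approach can be sketched in Python as follows. This is an illustration under assumptions, not the script added by the commit: the helper name `import_from_clean_dir` is hypothetical, and `json` stands in for `torchaudio` so that the example runs anywhere.

```python
import subprocess
import sys
import tempfile


def import_from_clean_dir(package: str) -> str:
    """Import `package` from an empty working directory; return where it was loaded from."""
    code = f"import {package}; print({package}.__file__)"
    with tempfile.TemporaryDirectory() as clean_dir:
        # With `python -c`, the current directory is searched first, so running
        # from an empty directory keeps a source checkout from shadowing the
        # installed package.
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=clean_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()


loaded_from = import_from_clean_dir("json")
```

Printing the module's `__file__` makes it visible which copy of the package was actually imported.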

Pull Request resolved: https://github.com/pytorch/audio/pull/2365

Reviewed By: carolineechen

Differential Revision: D36202069

Pulled By: mthrok

fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf


05 May, 2022 2 commits

Run smoke tests on regular PRs (#2364) · 6beb4875

moto authored May 05, 2022

Summary:
Currently smoke tests are only executed on nightly jobs.
This is inconvenient, as PRs that change the build process do not get
the signal naturally.

This commit changes that by always executing smoke tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2364

Reviewed By: atalman

Differential Revision: D36171267

Pulled By: mthrok

fbshipit-source-id: e549965ba139b5992177b7a094d87c9ef4432a7f


Fix windows smoke test (#2361) · 70d7d696

Andrey Talman authored May 05, 2022

Summary:
This PR fixes the Windows smoke tests.

Tested via CircleCI:
https://app.circleci.com/pipelines/github/pytorch/audio/10572/workflows/970fd791-25cc-4af4-8183-a7835e1891bf/jobs/637607

Pull Request resolved: https://github.com/pytorch/audio/pull/2361

Reviewed By: nateanl, mthrok

Differential Revision: D36167317

Pulled By: atalman

fbshipit-source-id: 1418ebffd74614cc1110dc032d16ee9502a7d571


28 Apr, 2022 2 commits

Add BUILD_MAD option and default to OFF (#2354) · a71e3a40

moto authored Apr 28, 2022

Summary:
libmad integration should be enabled only from source-build

Pull Request resolved: https://github.com/pytorch/audio/pull/2354

Reviewed By: nateanl

Differential Revision: D36012035

Pulled By: mthrok

fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7


Fix audio win smoke test to use GPU hosts for CUDA builds (#2353) · 3cf7f264

Andrey Talman authored Apr 28, 2022

Summary:
Fix audio win smoke test to use GPU hosts for CUDA builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2353

Reviewed By: mthrok

Differential Revision: D36006928

Pulled By: atalman

fbshipit-source-id: a27c4cc34093810c8cc08e01188e09b474478001


27 Apr, 2022 1 commit

Fix bug with unsqueezing length tensor in RNNTBeamSearch (#2344) · 90e4959d

Guo Liyong authored Apr 27, 2022

Summary:
This PR amends `RNNTBeamSearch`'s streaming decoding method to correctly unsqueeze `length` when its dimension is 0.

Original comment: Is "input.dim() == 0" unreachable as it could only be 2 or 3 in assertion of Line 329?

Pull Request resolved: https://github.com/pytorch/audio/pull/2344

Reviewed By: carolineechen, nateanl

Differential Revision: D35899740

Pulled By: hwangjeff

fbshipit-source-id: 84c1692b8cc9e5d35798d87f4a1bd052d94af9fb


26 Apr, 2022 5 commits

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7


Add extra arguments to hubert pretrain factory functions (#2345) · 7c249d17

Zhaoheng Ni authored Apr 26, 2022

Summary:
In different pre-training and fine-tuning settings, the `mask_prob`, `mask_channel_prob`, and `mask_channel_length` are different. For example, the settings in [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) are different. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tune on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)).
This PR adds the required arguments in the factory functions to make them tunable for pre-training and fine-tuning. `mask_length` is set to `10` by default for all cases, hence it's not included in the factory function.

Pull Request resolved: https://github.com/pytorch/audio/pull/2345

Reviewed By: carolineechen, xiaohui-zhang

Differential Revision: D35845117

Pulled By: nateanl

fbshipit-source-id: 0cbb74d09535d189b8258aa8ee0f88779bdb77e7


Update wavernn.py (#2347) · 0986eebf

Bingcheng Hu authored Apr 26, 2022

Summary:
fix false shape

Pull Request resolved: https://github.com/pytorch/audio/pull/2347

Reviewed By: carolineechen

Differential Revision: D35921047

Pulled By: nateanl

fbshipit-source-id: 5b58820ee777920c68f13a15d80cd2bcc931af87


Fix LibriMix documentation (#2351) · 892d6d34

Zhaoheng Ni authored Apr 26, 2022

Summary:
The `LibriMix` dataset is missing on the [documentation webpage](https://pytorch.org/audio/stable/datasets.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2351

Reviewed By: carolineechen

Differential Revision: D35926695

Pulled By: nateanl

fbshipit-source-id: 168aed3bb15510d1b1ec57d77727932e481aca48


Fix for torchaudio windows tests (#2350) · 867cff5f

Andrey Talman authored Apr 25, 2022

Summary:
Fix for torchaudio windows tests
Following is an example of such test failing:
https://app.circleci.com/pipelines/github/pytorch/audio/9408/workflows/e6e5a05c-7080-4fdc-b478-2182aed5f234/jobs/531612

The following code is failing:
`conda install -v -y $(ls ~/workspace/torchaudio*.tar.bz2)`

This is because the install package is generated in the following directory:
`/workspace/conda-bld/win-64/`

Pull Request resolved: https://github.com/pytorch/audio/pull/2350

Reviewed By: mthrok

Differential Revision: D35912424

Pulled By: atalman

fbshipit-source-id: fc4f66ffca24061cc768a5f1010b448f065b9410


25 Apr, 2022 1 commit

Fix python 3.10 smoke tests (#2348) · d1f747fb

Andrey Talman authored Apr 25, 2022

Summary:
Fix python 3.10 smoke tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2348

Reviewed By: mthrok

Differential Revision: D35906343

Pulled By: atalman

fbshipit-source-id: 6dbb39e69c9751da4b86d5da38a6d11816d527c5


22 Apr, 2022 3 commits

Cuda 11.5 remove since we introduced cuda 11.6 (#2346) · 48facbd4

Andrey Talman authored Apr 22, 2022

Summary:
Remove CUDA 11.5 since we introduced CUDA 11.6.

Pull Request resolved: https://github.com/pytorch/audio/pull/2346

Reviewed By: mthrok

Differential Revision: D35856758

Pulled By: atalman

fbshipit-source-id: d3c0cf7639fd20f9ccc52c0738f247b8598f1ed7


[CircleCI] Update base images to ubuntu-2004 (#2343) · bf89e570

Andrey Talman authored Apr 22, 2022

Summary:
Same change as done in this vision [PR](https://github.com/pytorch/vision/pull/5802)

Ubuntu-1604 runners will no longer be available in early May, so this
updates ubuntu-1604-cuda-10.1:201909-23 to ubuntu-2004-cuda-11.4:202110-01
per the [CircleCI Configuration reference](https://circleci.com/docs/2.0/configuration-reference/).

Resolves https://github.com/pytorch/audio/issues/2279

Pull Request resolved: https://github.com/pytorch/audio/pull/2343

Reviewed By: mthrok

Differential Revision: D35844880

Pulled By: atalman

fbshipit-source-id: 318a9fa42455e55664f3da6ab67625cb969f72e6


Introduce DistributedBatchSampler (#2299) · 6411c9ad

Zhaoheng Ni authored Apr 22, 2022

Summary:
When using customized `batch_sampler`, pytorch_lightning can't wrap the distributed sampler onto it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode.

The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` will shuffle the lists based on `seed` and current `epoch`.

The shuffling only happens at initialization and won't change unless the user resets it. The reason is that a shuffled `BucketizeBatchSampler` may have a different length than before, so shuffling in `__iter__` may result in a mismatch between `__len__` and the real length.
Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer, so that the value of `__len__` and the real length are the same.
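
A minimal sketch of the sharding idea, using only the standard library. It is not the `DistributedBatchSampler` implementation: the function name `shard_batches`, the use of `random.Random(seed + epoch)`, and the padding strategy are all assumptions made for illustration.

```python
import random
from typing import List


def shard_batches(
    batches: List[List[int]],
    rank: int,
    num_replicas: int,
    seed: int = 0,
    epoch: int = 0,
    shuffle: bool = True,
) -> List[List[int]]:
    """Return the batches assigned to one process out of `num_replicas`."""
    order = list(range(len(batches)))
    if shuffle:
        # Every rank seeds identically, so all ranks agree on the same order.
        random.Random(seed + epoch).shuffle(order)
    # Assumption: repeat leading batches so the count divides evenly across ranks.
    padding = (-len(order)) % num_replicas
    order += (order * num_replicas)[:padding]
    # Each rank takes every num_replicas-th batch, starting at its own rank.
    return [batches[i] for i in order[rank::num_replicas]]
```

Because the order depends only on `seed` and `epoch`, every rank computes the same permutation and the shards do not overlap beyond the padded entries.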

Pull Request resolved: https://github.com/pytorch/audio/pull/2299

Reviewed By: hwangjeff

Differential Revision: D35781538

Pulled By: nateanl

fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a


21 Apr, 2022 2 commits

CUDA 11.6 for TorchAudio (#2328) · 2acafdaf

Andrey Talman authored Apr 21, 2022

Summary:
CUDA 11.6 for TorchAudio

Pull Request resolved: https://github.com/pytorch/audio/pull/2328

Reviewed By: mthrok

Differential Revision: D35826414

Pulled By: atalman

fbshipit-source-id: 0a471f0566286d69c0c73191aea7fd5ac0647e5f


Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
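
The pattern can be sketched as a plain tuple alias with accessor functions in place of named fields. The two-field layout and the helper names below are assumptions for illustration; the fields in TorchAudio's decoder may differ.

```python
from typing import List, Tuple

# Simplified, assumed layout: (tokens, score).
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


def best_hypothesis(hypos: List[Hypothesis]) -> Hypothesis:
    """Pick the highest-scoring hypothesis using only tuple indexing."""
    return max(hypos, key=get_score)


hypos: List[Hypothesis] = [([1, 2], -3.5), ([1, 4, 7], -1.2)]
best = best_hypothesis(hypos)
```

Accessor functions keep call sites readable after the named fields are gone, while the container itself holds only built-in types.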

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5


19 Apr, 2022 1 commit

Introduce convolution-augmented Emformer layer prototype (#2324) · 9465b6bf

hwangjeff authored Apr 18, 2022

Summary:
Introduces prototype of convolution-augmented Emformer layer. At a high level, it incorporates Conformer's macaron feedforward network structure and convolution module with Emformer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2324

Reviewed By: mthrok

Differential Revision: D35734252

Pulled By: hwangjeff

fbshipit-source-id: c7ea0bdcfe53a948b00881a74f1f1e1928f5ac57


18 Apr, 2022 1 commit

Add QUESST14 dataset (#2290) · aebcf6af

Caroline Chen authored Apr 18, 2022

Summary:
implementation adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py)

modifying the s3prl downstream expert to [this](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5) using this dataset implementation produces the same results as using the original s3prl pipeline

Pull Request resolved: https://github.com/pytorch/audio/pull/2290

Reviewed By: nateanl

Differential Revision: D35692551

Pulled By: carolineechen

fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c


15 Apr, 2022 1 commit

Disable clang-tidy modernize-use-trailing-return-type (#2337) · 86100e38

Moto Hira authored Apr 14, 2022

Summary:
Disable clang-tidy's `modernize-use-trailing-return-type` suggestion.

Trailing return type has no impact on performance.
The lint warning shows up everywhere, and it's nothing but noise.

Pull Request resolved: https://github.com/pytorch/audio/pull/2337

Reviewed By: hwangjeff

Differential Revision: D35635718

Pulled By: mthrok

fbshipit-source-id: beb2d3ec657f829493e08b2c159f215053b0e784


14 Apr, 2022 2 commits

Support specifying decoder and its options (#2327) · be243c59

moto authored Apr 14, 2022

Summary:
This commit adds support for specifying the decoder in Streamer's add stream methods.
This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.

This makes it possible to override the decoder codec and/or specify the options of
the decoder.

It also makes it possible to specify the Nvidia NVDEC codec for supported formats,
which uses dedicated hardware for decoding the video.

 ---

Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them all the way down to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).

Pull Request resolved: https://github.com/pytorch/audio/pull/2327

Reviewed By: carolineechen

Differential Revision: D35626904

Pulled By: mthrok

fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede


Support NV12 format in video decoding (#2330) · 7972be99

moto authored Apr 13, 2022

Summary:
Support NV12 format in Streamer API.

NV12 is a biplanar format with a full sized Y plane followed by a single chroma plane with weaved U and V values.
https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21

The original UV plane is smaller than the Y plane, so in this implementation,
the UV plane is upsampled to match the size of the Y plane.
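
A small pure-Python sketch of the idea, assuming the usual NV12 layout in which the chroma plane has half the height of the Y plane and stores interleaved U and V bytes. Nearest-neighbour repetition is an assumption here; the commit does not say which upsampling method it uses, and `split_and_upsample_uv` is a hypothetical name.

```python
from typing import List, Tuple

Plane = List[List[int]]


def split_and_upsample_uv(uv_plane: Plane) -> Tuple[Plane, Plane]:
    """Deinterleave an NV12 chroma plane and upsample U and V by 2 in each dimension."""
    u_plane: Plane = []
    v_plane: Plane = []
    for row in uv_plane:
        # Even positions hold U samples, odd positions hold V samples.
        u_row = [value for value in row[0::2] for _ in range(2)]
        v_row = [value for value in row[1::2] for _ in range(2)]
        # Repeat each chroma row twice to match the height of the Y plane.
        u_plane += [u_row, list(u_row)]
        v_plane += [v_row, list(v_row)]
    return u_plane, v_plane
```

For a 2x4 image, the single chroma row `[10, 20, 30, 40]` expands into 2x4 U and V planes that line up element by element with the Y plane.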

Pull Request resolved: https://github.com/pytorch/audio/pull/2330

Reviewed By: hwangjeff

Differential Revision: D35632351

Pulled By: mthrok

fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27
