1. 10 May, 2022 6 commits
    • [ROCm] Update to rocm5.1.1 (#2362) · eab2f39d
      Kyle Chen authored
      Summary:
      Previous ROCm update: https://github.com/pytorch/audio/pull/2186
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2362
      
      Reviewed By: seemethere
      
      Differential Revision: D36283672
      
      Pulled By: mthrok
      
      fbshipit-source-id: bfd38940d027c8ccd72ab48991e5ab7f84b0e9c0
    • Add RTFMVDR module (#2368) · 4b021ae3
      Zhaoheng Ni authored
      Summary:
      Adds a new design of the MVDR module.
      The `RTFMVDR` module supports the beamforming method based on the relative transfer function (RTF) and the power spectral density (PSD) matrix of noise.
      The input arguments are:
      - the multi-channel spectrum,
      - the RTF vector of the target speech,
      - the PSD matrix of noise,
      - the reference channel in the microphone array,
      - the `diagonal_loading` option to enable or disable diagonal loading in the matrix inverse computation,
      - `diag_eps` for computing the inverse of the matrix, and
      - `eps` for computing the beamforming weight.
      The output of the module is the single-channel complex-valued spectrum of the enhanced speech; a usage sketch follows.
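
      A minimal usage sketch based on the argument list above; the tensor shapes and the random inputs are illustrative assumptions, not taken from this commit:

      ```python
      import torch
      from torchaudio.transforms import RTFMVDR

      batch, channel, freq, time = 2, 4, 257, 100
      specgram = torch.randn(batch, channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
      rtf = torch.randn(batch, freq, channel, dtype=torch.cfloat)             # RTF vector of the target speech (assumed shape)
      psd_n = torch.randn(batch, freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise (assumed shape)

      transform = RTFMVDR()
      # reference_channel picks the channel of the microphone array to enhance
      enhanced = transform(specgram, rtf, psd_n, reference_channel=0)
      print(enhanced.shape)  # expected: (batch, freq, time) -- single-channel complex spectrum
      ```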
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2368
      
      Reviewed By: carolineechen
      
      Differential Revision: D36214940
      
      Pulled By: nateanl
      
      fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
    • Add diagonal_loading optional to rtf_power (#2369) · da1e83cc
      Zhaoheng Ni authored
      Summary:
      When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. This is also applicable to computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example) and aligns with the current `torchaudio.transforms.MVDR` module for consistency.

      This PR adds the `diagonal_loading` argument, with `True` as the default value, to `torchaudio.functional.rtf_power`.
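
      A minimal sketch of the function call with the new argument; the PSD shapes and random inputs are assumptions used only for illustration:

      ```python
      import torch
      import torchaudio.functional as F

      freq, channel = 257, 4
      psd_s = torch.randn(freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of the target speech (assumed shape)
      psd_n = torch.randn(freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise (assumed shape)

      # With diagonal_loading=True (the new default), psd_n is regularized before
      # it is inverted inside the power-iteration update.
      rtf = F.rtf_power(psd_s, psd_n, reference_channel=0, diagonal_loading=True)
      print(rtf.shape)  # expected: (freq, channel)
      ```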
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2369
      
      Reviewed By: carolineechen
      
      Differential Revision: D36204130
      
      Pulled By: nateanl
      
      fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
    • Add SoudenMVDR module (#2367) · aed5eb88
      Zhaoheng Ni authored
      Summary:
      Adds a new design of the MVDR module.
      The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
      The input arguments are:
      - the multi-channel spectrum,
      - the PSD matrix of target speech,
      - the PSD matrix of noise,
      - the reference channel in the microphone array,
      - the `diagonal_loading` option to enable or disable diagonal loading in the matrix inverse computation,
      - `diag_eps` for computing the inverse of the matrix, and
      - `eps` for computing the beamforming weight.

      The output of the module is the single-channel complex-valued spectrum of the enhanced speech; a usage sketch follows.
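
      A minimal usage sketch based on the argument list above; the tensor shapes and the random inputs are illustrative assumptions, not taken from this commit:

      ```python
      import torch
      from torchaudio.transforms import SoudenMVDR

      batch, channel, freq, time = 2, 4, 257, 100
      specgram = torch.randn(batch, channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
      psd_s = torch.randn(batch, freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of target speech (assumed shape)
      psd_n = torch.randn(batch, freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise (assumed shape)

      transform = SoudenMVDR()
      enhanced = transform(specgram, psd_s, psd_n, reference_channel=0)
      print(enhanced.shape)  # expected: (batch, freq, time) -- single-channel complex spectrum
      ```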
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2367
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36198015
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
    • Add HW acceleration support on Streamer (#2331) · 54d2d04f
      moto authored
      Summary:
      This commit adds an `hw_accel` option to the `Streamer::add_video_stream` method.
      Specifying `hw_accel="cuda"` allows the chunk Tensor to be created directly on a CUDA device
      when the following conditions are met:
      1. the video format is H264,
      2. the underlying FFmpeg is compiled with NVDEC support, and
      3. the client code specifies `decoder="h264_cuvid"`.
      
      A simple benchmark shows roughly a 7x improvement in decoding speed.
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
      ]
      
      patterns = [
          ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
          ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
          ("h264_cuvid", None, None),  # NVDEC -> CPU
          (None, None, None),  # CPU
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(t1 - t0, num_frames, flush=True)
      ```
      </details>
      
      ```
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      10.781158386962488 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      10.771313901990652 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      27.88662809302332 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      83.22728440898936 6175
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      12.945253834011964 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      12.870224556012545 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      28.03406483103754 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      82.6120332319988 6175
      ```
      
      With HW resizing
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
      ]
      
      patterns = [
          # Decode with NVDEC, CUDA HW scaling -> CUDA:0
          ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
          # Decoded with NVDEC, CUDA HW scaling -> CPU
          ("h264_cuvid", {"resize": "960x540"}, "", None),
          # CPU decoding, CPU scaling
          (None, None, "scale=width=960:height=540", None),
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(
                  5,
                  decoder=decoder,
                  decoder_options=decoder_options,
                  filter_desc=filter_desc,
                  hw_accel=hw_accel,
              )
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(t1 - t0, num_frames, flush=True)
      ```
      
      </details>
      
      ```
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      12.890056837990414 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      10.697489063022658 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      85.19899423001334 6175
      
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      10.712715593050234 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      11.030170071986504 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      84.8515750519582 6175
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2331
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36217169
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b
    • Add citations for datasets (#2371) · 638120ca
      Caroline Chen authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D36246167
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
  2. 09 May, 2022 1 commit
  3. 06 May, 2022 2 commits
    • Use custom FFmpeg libraries for torchaudio binary distributions (#2355) · b7624c60
      moto authored
      Summary:
      This commit changes the way torchaudio binary distributions are built.
      
      * For all binary distributions (conda/pip on Linux/macOS/Windows), custom FFmpeg libraries are built.
      * The custom FFmpeg libraries are configured without `--enable-gpl` or `--enable-nonfree`, so they stay LGPL.
      * The custom FFmpeg libraries use rpath so that the torchaudio binary distributions look up the corresponding FFmpeg libraries installed in the runtime environment.
      * The torchaudio build process uses these custom libraries to bootstrap the binary build.
      * The custom FFmpeg libraries are NOT shipped.
      
      This commit also adds a disclaimer about FFmpeg to the README.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2355
      
      Reviewed By: nateanl
      
      Differential Revision: D36202087
      
      Pulled By: mthrok
      
      fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef
    • Refactor smoke test executions (#2365) · 6a8a28bb
      moto authored
      Summary:
      The smoke test jobs simply perform `import torchaudio` to check
      that the package artifacts are sane.

      Originally, the CI executed this in the repository root directory.
      This was fine unless the source code was checked out.
      When the source code is checked out, performing `import torchaudio` in the
      root directory imports the source torchaudio directory instead of the
      installed package.

      This error is difficult to notice, so this commit introduces a common script
      that performs the smoke test from outside the root directory.
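
      A minimal sketch of such a smoke-test script; it illustrates the idea (run the import from outside the source tree), not the exact script added in this commit:

      ```python
      import os
      import tempfile

      # Move out of the repository root so that `import torchaudio` resolves to the
      # installed package rather than the checked-out source directory.
      os.chdir(tempfile.gettempdir())

      import torchaudio

      print("version:", torchaudio.__version__)
      print("location:", torchaudio.__file__)
      ```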
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2365
      
      Reviewed By: carolineechen
      
      Differential Revision: D36202069
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
  4. 05 May, 2022 2 commits
  5. 28 Apr, 2022 2 commits
  6. 27 Apr, 2022 1 commit
  7. 26 Apr, 2022 5 commits
  8. 25 Apr, 2022 1 commit
  9. 22 Apr, 2022 3 commits
  10. 21 Apr, 2022 2 commits
    • CUDA 11.6 for TorchAudio (#2328) · 2acafdaf
      Andrey Talman authored
      Summary:
      CUDA 11.6 for TorchAudio
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2328
      
      Reviewed By: mthrok
      
      Differential Revision: D35826414
      
      Pulled By: atalman
      
      fbshipit-source-id: 0a471f0566286d69c0c73191aea7fd5ac0647e5f
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
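
      A minimal sketch of the pattern this change enables; the field layout of `Hypothesis` below is hypothetical and used only for illustration, not the actual decoder code:

      ```python
      from typing import List, Tuple

      import torch

      # Hypothetical field layout (tokens, score); the real Hypothesis carries more fields.
      Hypothesis = Tuple[List[int], float]

      @torch.jit.script
      def best_tokens(hypos: List[Hypothesis]) -> List[int]:
          # With plain tuples, elements are accessed by index, so lists of hypotheses
          # can be passed through scripted interfaces without custom classes.
          best = hypos[0]
          for hypo in hypos:
              if hypo[1] > best[1]:
                  best = hypo
          return best[0]

      print(best_tokens([([1, 2, 3], -0.5), ([4, 5], -0.1)]))
      ```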
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  11. 19 Apr, 2022 1 commit
  12. 18 Apr, 2022 1 commit
  13. 15 Apr, 2022 1 commit
  14. 14 Apr, 2022 3 commits
    • Support specifying decoder and its options (#2327) · be243c59
      moto authored
      Summary:
      This commit adds support for specifying the decoder in Streamer's add-stream methods.
      This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.

      This makes it possible to override the decoder codec and/or to specify
      decoder options.

      This change also makes it possible to select the NVIDIA NVDEC decoder for
      supported formats, which uses dedicated hardware to decode the video.
      
       ---
      
      Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them down all the way to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).
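
      A minimal sketch of the new interface, mirroring the benchmark scripts in the HW-acceleration commit above; the file path and the decoder option value are assumptions, not taken from this commit:

      ```python
      from torchaudio.prototype.io import Streamer

      # "input.mp4" is a placeholder path used for illustration.
      s = Streamer("input.mp4")

      # Roughly `ffmpeg -c:v h264_cuvid`; decoder_options is forwarded to the decoder.
      s.add_video_stream(
          5,
          decoder="h264_cuvid",
          decoder_options={"gpu": "0"},
      )

      for (chunk,) in s.stream():
          print(chunk.dtype, chunk.shape)
          break
      ```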
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2327
      
      Reviewed By: carolineechen
      
      Differential Revision: D35626904
      
      Pulled By: mthrok
      
      fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
    • Support NV12 format in video decoding (#2330) · 7972be99
      moto authored
      Summary:
      Support the NV12 format in the Streamer API.

      NV12 is a biplanar format with a full-sized Y plane followed by a single chroma plane with interleaved U and V values.
      https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21

      The UV plane is smaller than the Y plane, so in this implementation
      the UV plane is upsampled to match the size of the Y plane.
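
      A rough Python illustration of the upsampling described above (not the actual C++ implementation); the toy frame sizes are assumptions:

      ```python
      import torch
      import torch.nn.functional as F

      height, width = 4, 6
      y = torch.arange(height * width, dtype=torch.float32).reshape(1, 1, height, width)
      # NV12 stores one interleaved UV plane at half resolution in each dimension.
      uv = torch.randn(1, 2, height // 2, width // 2)

      # Upsample UV to the size of Y so the three channels can be stacked into one (3, H, W) frame.
      uv_up = F.interpolate(uv, size=(height, width), mode="nearest")
      frame = torch.cat([y, uv_up], dim=1)  # (1, 3, H, W): Y, U, V
      print(frame.shape)
      ```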
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2330
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632351
      
      Pulled By: mthrok
      
      fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27
    • Add YUV420P format support to Streamer API (#2334) · 2f70e2f9
      moto authored
      Summary:
      This commit adds YUV420P format support to the Streamer API.
      When the native format of a video is YUV420P, the Streamer
      outputs a Tensor with YUV color channels.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2334
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632916
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7a0078788433060266b8bd3e7cad023f41389f5
  15. 13 Apr, 2022 2 commits
    • Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b
      hwangjeff authored
      Summary:
      Adds a Conformer RNN-T LibriSpeech training recipe to the examples directory.
      
      Produces a 30M-parameter model that achieves the following WER:
      
      |                     |          WER |
      |:-------------------:|-------------:|
      | test-clean          |       0.0310 |
      | test-other          |       0.0805 |
      | dev-clean           |       0.0314 |
      | dev-other           |       0.0827 |
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2329
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35578727
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
    • Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc
      hwangjeff authored
      Summary:
      Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime does not have nightly builds pre-installed. To make it easier for users to run these notebooks in Colab, this PR adds a code block, included as a comment, that installs nightly PyTorch and TorchAudio builds and that users can copy and run.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2325
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35597753
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
  16. 12 Apr, 2022 1 commit
    • Add Conformer RNN-T model prototype (#2322) · b0c8e239
      hwangjeff authored
      Summary:
      Adds the Conformer RNN-T model as a prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, the latter of which instantiates a baseline version of the model (see the sketch after this list). Also includes the following:
      - Modifies `Conformer` to accept the arguments `use_group_norm` and `convolution_first` and pass them to each of its `ConformerLayer` instances.
      - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
      - Introduces tests for `conformer_rnnt_model`.
      - Adds docs.
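
      A minimal sketch of instantiating the baseline factory; the prototype import path and the dummy input shape are assumptions, not taken from this commit:

      ```python
      import torch
      from torchaudio.prototype.models import conformer_rnnt_base

      model = conformer_rnnt_base()  # baseline configuration

      # Dummy batch of 80-dim features and their valid lengths (shapes are assumptions).
      features = torch.randn(2, 100, 80)
      lengths = torch.tensor([100, 80])

      encoder_out, out_lengths = model.transcribe(features, lengths)
      print(encoder_out.shape, out_lengths)
      ```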
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2322
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35565987
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
  17. 11 Apr, 2022 1 commit
    • Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959
      moto authored
      Summary:
      This commit makes the FFmpeg integration support FFmpeg 5.0.

      In FFmpeg 5, functions such as `av_find_input_format` and `avformat_open_input` were changed
      so that they deal with a const-qualified `AVInputFormat`.
      
      > 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
      >  Constified the pointers to AVInputFormats and AVOutputFormats
      >  in AVFormatContext, avformat_alloc_output_context2(),
      >  av_find_input_format(), av_probe_input_format(),
      >  av_probe_input_format2(), av_probe_input_format3(),
      >  av_probe_input_buffer2(), av_probe_input_buffer(),
      >  avformat_open_input(), av_guess_format() and av_guess_codec().
      >  Furthermore, constified the AVProbeData in av_probe_input_format(),
      >  av_probe_input_format2() and av_probe_input_format3().
      
      https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2326
      
      Reviewed By: carolineechen
      
      Differential Revision: D35551380
      
      Pulled By: mthrok
      
      fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666
  18. 08 Apr, 2022 1 commit
    • Add devices/properties badges (#2321) · 72ae755a
      moto authored
      Summary:
      Add badges for supported properties and devices to functionals and transforms.

      This commit adds `.. devices::` and `.. properties::` directives to Sphinx.

      APIs with these directives get badges (based on shields.io) that link to the
      page describing these features.

      Continuation of https://github.com/pytorch/audio/issues/2316
      Dtypes are excluded pending further improvement, and badges are added to most of the functionals and transforms.
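
      A hypothetical sketch of how such directives might appear in an API docstring; the badge arguments shown are assumptions, not taken from this commit:

      ```python
      def my_functional(waveform):
          """Apply a hypothetical operation to the waveform.

          .. devices:: CPU CUDA

          .. properties:: Autograd TorchScript
          """
          return waveform
      ```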
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2321
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35489063
      
      Pulled By: mthrok
      
      fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
  19. 06 Apr, 2022 2 commits
  20. 05 Apr, 2022 2 commits