Commits · 6a8a28bb589135ef3245d08f88a6ee87efd973ab · OpenDAS / Torchaudio

06 May, 2022 1 commit

Refactor smoke test executions (#2365) · 6a8a28bb

moto authored May 06, 2022

Summary:
The smoke test jobs simply perform `import torchaudio` to check
if the package artifacts are sane.

Originally, the CI executed it in the root directory.
This was fine as long as the source code was not checked out.
When the source code is checked out, performing `import torchaudio` in the
root directory imports the source torchaudio directory instead of the
installed package.

This error is difficult to notice, so this commit introduces a common script that
performs the smoke test after moving out of the root directory.
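
A minimal sketch of the idea, not the actual CI script: run the import in a child interpreter whose working directory is an empty temporary directory, so a source checkout in the current directory cannot shadow the installed package. The helper name `smoke_import` is hypothetical, and `json` stands in for `torchaudio` so the sketch runs anywhere.

```python
import subprocess
import sys
import tempfile


def smoke_import(module_name: str) -> str:
    """Import `module_name` in a child interpreter started from an empty
    temporary directory and return the path the module was loaded from."""
    code = f"import {module_name} as m; print(m.__file__)"
    with tempfile.TemporaryDirectory() as workdir:
        # cwd=workdir keeps a source checkout in the original directory
        # from shadowing the installed package.
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,
            check=True,
            capture_output=True,
            text=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    # `json` is a stand-in for `torchaudio` (assumption for portability).
    print(smoke_import("json"))
```

Printing the resolved `__file__` makes it easy to see whether the installed package or a local directory was picked up.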

Pull Request resolved: https://github.com/pytorch/audio/pull/2365

Reviewed By: carolineechen

Differential Revision: D36202069

Pulled By: mthrok

fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf

05 May, 2022 2 commits

Run smoke tests on regular PRs (#2364) · 6beb4875

moto authored May 05, 2022

Summary:
Currently, smoke tests are only executed on nightly jobs.
This is inconvenient because PRs that change the build process do not get
the signal naturally.

This commit changes it by always executing smoke tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2364

Reviewed By: atalman

Differential Revision: D36171267

Pulled By: mthrok

fbshipit-source-id: e549965ba139b5992177b7a094d87c9ef4432a7f

Fix windows smoke test (#2361) · 70d7d696

Andrey Talman authored May 05, 2022

Summary:
This PR fixes the Windows smoke tests.

Tested via CircleCI:
https://app.circleci.com/pipelines/github/pytorch/audio/10572/workflows/970fd791-25cc-4af4-8183-a7835e1891bf/jobs/637607

Pull Request resolved: https://github.com/pytorch/audio/pull/2361

Reviewed By: nateanl, mthrok

Differential Revision: D36167317

Pulled By: atalman

fbshipit-source-id: 1418ebffd74614cc1110dc032d16ee9502a7d571

28 Apr, 2022 2 commits

Add BUILD_MAD option and default to OFF (#2354) · a71e3a40

moto authored Apr 28, 2022

Summary:
libmad integration should be enabled only when building from source.

Pull Request resolved: https://github.com/pytorch/audio/pull/2354

Reviewed By: nateanl

Differential Revision: D36012035

Pulled By: mthrok

fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7

Fix audio win smoke test to use GPU hosts for CUDA builds (#2353) · 3cf7f264

Andrey Talman authored Apr 28, 2022

Summary:
Fix audio win smoke test to use GPU hosts for CUDA builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2353

Reviewed By: mthrok

Differential Revision: D36006928

Pulled By: atalman

fbshipit-source-id: a27c4cc34093810c8cc08e01188e09b474478001

27 Apr, 2022 1 commit

Fix bug with unsqueezing length tensor in RNNTBeamSearch (#2344) · 90e4959d

Guo Liyong authored Apr 27, 2022

Summary:
This PR amends `RNNTBeamSearch`'s streaming decoding method to correctly unsqueeze `length` when its dimension is 0.

Original comment: Is "input.dim() == 0" unreachable as it could only be 2 or 3 in assertion of Line 329?

Pull Request resolved: https://github.com/pytorch/audio/pull/2344

Reviewed By: carolineechen, nateanl

Differential Revision: D35899740

Pulled By: hwangjeff

fbshipit-source-id: 84c1692b8cc9e5d35798d87f4a1bd052d94af9fb

26 Apr, 2022 5 commits

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7

Add extra arguments to hubert pretrain factory functions (#2345) · 7c249d17

Zhaoheng Ni authored Apr 26, 2022

Summary:
In different pre-training and fine-tuning settings, the `mask_prob`, `mask_channel_prob`, and `mask_channel_length` are different. For example, the settings in [pre-training](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L70) and [fine-tuning](https://github.com/pytorch/fairseq/blob/main/examples/hubert/config/finetune/base_10h.yaml#L69-L73) are different. The motivation is to avoid overfitting when fine-tuning on a small dataset (example: [fine-tune on 10 minutes of audio](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/finetuning/vox_10m.yaml#L57-L59)).
This PR adds the required arguments in the factory functions to make them tunable for pre-training and fine-tuning. `mask_length` is set to `10` by default for all cases, hence it's not included in the factory function.

Pull Request resolved: https://github.com/pytorch/audio/pull/2345

Reviewed By: carolineechen, xiaohui-zhang

Differential Revision: D35845117

Pulled By: nateanl

fbshipit-source-id: 0cbb74d09535d189b8258aa8ee0f88779bdb77e7

Update wavernn.py (#2347) · 0986eebf

Bingcheng Hu authored Apr 26, 2022

Summary:
Fix incorrect shape.

Pull Request resolved: https://github.com/pytorch/audio/pull/2347

Reviewed By: carolineechen

Differential Revision: D35921047

Pulled By: nateanl

fbshipit-source-id: 5b58820ee777920c68f13a15d80cd2bcc931af87

Fix LibriMix documentation (#2351) · 892d6d34

Zhaoheng Ni authored Apr 26, 2022

Summary:
The `LibriMix` dataset is missing on the [documentation webpage](https://pytorch.org/audio/stable/datasets.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2351

Reviewed By: carolineechen

Differential Revision: D35926695

Pulled By: nateanl

fbshipit-source-id: 168aed3bb15510d1b1ec57d77727932e481aca48

Fix for torchaudio windows tests (#2350) · 867cff5f

Andrey Talman authored Apr 25, 2022

Summary:
Fix for torchaudio windows tests
Following is an example of such test failing:
https://app.circleci.com/pipelines/github/pytorch/audio/9408/workflows/e6e5a05c-7080-4fdc-b478-2182aed5f234/jobs/531612

The following code is failing:
`conda install -v -y $(ls ~/workspace/torchaudio*.tar.bz2)`

This is because the install package is generated in the following directory:
`/workspace/conda-bld/win-64/`

Pull Request resolved: https://github.com/pytorch/audio/pull/2350

Reviewed By: mthrok

Differential Revision: D35912424

Pulled By: atalman

fbshipit-source-id: fc4f66ffca24061cc768a5f1010b448f065b9410

25 Apr, 2022 1 commit

Fix python 3.10 smoke tests (#2348) · d1f747fb

Andrey Talman authored Apr 25, 2022

Summary:
Fix python 3.10 smoke tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2348

Reviewed By: mthrok

Differential Revision: D35906343

Pulled By: atalman

fbshipit-source-id: 6dbb39e69c9751da4b86d5da38a6d11816d527c5

22 Apr, 2022 3 commits

Cuda 11.5 remove since we introduced cuda 11.6 (#2346) · 48facbd4

Andrey Talman authored Apr 22, 2022

Summary:
Remove CUDA 11.5 since CUDA 11.6 has been introduced.

Pull Request resolved: https://github.com/pytorch/audio/pull/2346

Reviewed By: mthrok

Differential Revision: D35856758

Pulled By: atalman

fbshipit-source-id: d3c0cf7639fd20f9ccc52c0738f247b8598f1ed7

[CircleCI] Update base images to ubuntu-2004 (#2343) · bf89e570

Andrey Talman authored Apr 22, 2022

Summary:
Same change as done in this vision [PR](https://github.com/pytorch/vision/pull/5802)

Ubuntu-1604 runners will no longer be available from early May, so this PR
updates `ubuntu-1604-cuda-10.1:201909-23` to `ubuntu-2004-cuda-11.4:202110-01`,
per the [CircleCI Configuration reference](https://circleci.com/docs/2.0/configuration-reference/).

Resolves https://github.com/pytorch/audio/issues/2279

Pull Request resolved: https://github.com/pytorch/audio/pull/2343

Reviewed By: mthrok

Differential Revision: D35844880

Pulled By: atalman

fbshipit-source-id: 318a9fa42455e55664f3da6ab67625cb969f72e6

Introduce DistributedBatchSampler (#2299) · 6411c9ad

Zhaoheng Ni authored Apr 22, 2022

Summary:
When using customized `batch_sampler`, pytorch_lightning can't wrap the distributed sampler onto it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode.

The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` will shuffle the lists based on `seed` and current `epoch`.

The shuffle happens only at initialization and does not change unless the user resets it. The reason is that a shuffled `BucketizeBatchSampler` may have a different length than before, so shuffling in `__iter__` could cause a mismatch between `__len__` and the actual length.
Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer. Then the value of `__len__` and the actual length are the same.
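
The sharding idea can be sketched in plain Python. This is an illustration under stated assumptions, not the `DistributedBatchSampler` implementation: the function `shard_batches` is hypothetical, and dropping the tail so that every rank gets the same number of batches is an assumption the summary does not state.

```python
import random
from typing import List


def shard_batches(
    batches: List[List[int]],
    num_replicas: int,
    rank: int,
    shuffle: bool = True,
    seed: int = 0,
    epoch: int = 0,
) -> List[List[int]]:
    """Return the batches assigned to `rank`.

    The order is fixed once here (not on every iteration), so the number of
    batches reported for a rank always matches what it actually yields.
    """
    order = list(range(len(batches)))
    if shuffle:
        # Every rank derives the same permutation from seed + epoch.
        random.Random(seed + epoch).shuffle(order)
    # Assumption: drop the tail so all ranks receive the same number of batches.
    usable = len(order) - len(order) % num_replicas
    return [batches[i] for i in order[rank:usable:num_replicas]]


# Each sub-list is one batch of dataset indices.
batches = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
shards = [shard_batches(batches, num_replicas=2, rank=r, seed=1) for r in range(2)]
```

Because the permutation depends only on `seed` and `epoch`, rebuilding the sampler each epoch (as `reload_dataloaders_every_n_epochs=1` does) is what produces a new order.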

Pull Request resolved: https://github.com/pytorch/audio/pull/2299

Reviewed By: hwangjeff

Differential Revision: D35781538

Pulled By: nateanl

fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a

21 Apr, 2022 2 commits

CUDA 11.6 for TorchAudio (#2328) · 2acafdaf

Andrey Talman authored Apr 21, 2022

Summary:
CUDA 11.6 for TorchAudio

Pull Request resolved: https://github.com/pytorch/audio/pull/2328

Reviewed By: mthrok

Differential Revision: D35826414

Pulled By: atalman

fbshipit-source-id: 0a471f0566286d69c0c73191aea7fd5ac0647e5f

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
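
As a rough illustration of the pattern (the field layout and helper names below are hypothetical and simplified, not TorchAudio's actual definitions), a hypothesis can be a plain tuple alias with accessor functions in place of named fields:

```python
from typing import List, Tuple

# Hypothetical, simplified layout: (token ids, score).
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    """Accessor replacing a named `tokens` field."""
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    """Accessor replacing a named `score` field."""
    return hypo[1]


def best(hypos: List[Hypothesis]) -> Hypothesis:
    """Pick the highest-scoring hypothesis."""
    return max(hypos, key=get_score)


hypos: List[Hypothesis] = [([1, 2], -1.5), ([1, 3], -0.7)]
```

The containers then hold only built-in types, while the accessors keep call sites readable.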

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5

19 Apr, 2022 1 commit

Introduce convolution-augmented Emformer layer prototype (#2324) · 9465b6bf

hwangjeff authored Apr 18, 2022

Summary:
Introduces prototype of convolution-augmented Emformer layer. At a high level, it incorporates Conformer's macaron feedforward network structure and convolution module with Emformer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2324

Reviewed By: mthrok

Differential Revision: D35734252

Pulled By: hwangjeff

fbshipit-source-id: c7ea0bdcfe53a948b00881a74f1f1e1928f5ac57

18 Apr, 2022 1 commit

Add QUESST14 dataset (#2290) · aebcf6af

Caroline Chen authored Apr 18, 2022

Summary:
Implementation adapted from [s3prl](https://github.com/s3prl/s3prl/blob/master/s3prl/downstream/quesst14_dtw/dataset.py).

Modifying the s3prl downstream expert to [this](https://github.com/carolineechen/s3prl/commit/adc91a53d581a604f495f3795a865d84aa17f1a5), which uses this dataset implementation, produces the same results as the original s3prl pipeline.

Pull Request resolved: https://github.com/pytorch/audio/pull/2290

Reviewed By: nateanl

Differential Revision: D35692551

Pulled By: carolineechen

fbshipit-source-id: 035ad161d4cbbd2072411cfdf89984b73a89868c

15 Apr, 2022 1 commit

Disable clang-tidy modernize-use-trailing-return-type (#2337) · 86100e38

Moto Hira authored Apr 14, 2022

Summary:
Disable clang-tidy's `modernize-use-trailing-return-type` suggestion.

Trailing return type has no impact on performance.
The lint warning shows up everywhere, and it's nothing but noise.

Pull Request resolved: https://github.com/pytorch/audio/pull/2337

Reviewed By: hwangjeff

Differential Revision: D35635718

Pulled By: mthrok

fbshipit-source-id: beb2d3ec657f829493e08b2c159f215053b0e784

14 Apr, 2022 3 commits

Support specifying decoder and its options (#2327) · be243c59

moto authored Apr 14, 2022

Summary:
This commit adds support for specifying a decoder in Streamer's add stream method.
This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.

This allows overriding the decoder codec and/or specifying decoder options.

This change allows specifying the Nvidia NVDEC codec for supported formats,
which uses dedicated hardware for decoding the video.

 ---

Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them all the way down to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).

Pull Request resolved: https://github.com/pytorch/audio/pull/2327

Reviewed By: carolineechen

Differential Revision: D35626904

Pulled By: mthrok

fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede

Support NV12 format in video decoding (#2330) · 7972be99

moto authored Apr 13, 2022

Summary:
Support NV12 format in Streamer API.

NV12 is a biplanar format with a full-sized Y plane followed by a single chroma plane with interleaved U and V values.
https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21

The original UV plane is smaller than the Y plane, so in this implementation
the UV plane is upsampled to match the size of the Y plane.
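
A small pure-Python sketch of the layout described above. It is an illustration only: the function `nv12_to_yuv444` is hypothetical, and nearest-neighbour upsampling is an assumption, since the summary does not say which upsampling method is used.

```python
from typing import List, Tuple

Plane = List[List[int]]


def nv12_to_yuv444(y: Plane, uv: Plane) -> Tuple[Plane, Plane, Plane]:
    """Split the interleaved UV plane and upsample it to the size of Y.

    `y` has H rows of W samples; `uv` has H/2 rows of W samples laid out as
    U0 V0 U1 V1 ... (one U/V pair per 2x2 block of Y samples).
    """

    def upsample(plane: Plane) -> Plane:
        # Nearest neighbour (assumption): repeat every sample twice
        # horizontally and every row twice vertically.
        out = []
        for row in plane:
            wide = [s for s in row for _ in range(2)]
            out.append(wide)
            out.append(list(wide))
        return out

    u = [row[0::2] for row in uv]  # even positions hold U
    v = [row[1::2] for row in uv]  # odd positions hold V
    return y, upsample(u), upsample(v)


y = [[10, 11, 12, 13], [14, 15, 16, 17]]  # 2x4 luma plane
uv = [[100, 200, 101, 201]]               # 1x4 interleaved chroma plane
_, u, v = nv12_to_yuv444(y, uv)
```

After the conversion all three planes have the same height and width, so they can be stacked into a single channel dimension.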

Pull Request resolved: https://github.com/pytorch/audio/pull/2330

Reviewed By: hwangjeff

Differential Revision: D35632351

Pulled By: mthrok

fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27

Add YUV420P format support to Streamer API (#2334) · 2f70e2f9

moto authored Apr 13, 2022

Summary:
This commit adds YUV420P format support to Streamer API.
When the native format of a video is YUV420P, the Streamer will
output a Tensor of YUV color channels.

Pull Request resolved: https://github.com/pytorch/audio/pull/2334

Reviewed By: hwangjeff

Differential Revision: D35632916

Pulled By: mthrok

fbshipit-source-id: a7a0078788433060266b8bd3e7cad023f41389f5

13 Apr, 2022 2 commits

Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b

hwangjeff authored Apr 13, 2022

Summary:
Adds Conformer RNN-T LibriSpeech training recipe to examples directory.

Produces 30M-parameter model that achieves the following WER:

|                     |          WER |
|:-------------------:|-------------:|
| test-clean          |       0.0310 |
| test-other          |       0.0805 |
| dev-clean           |       0.0314 |
| dev-other           |       0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329

Reviewed By: xiaohui-zhang

Differential Revision: D35578727

Pulled By: hwangjeff

fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d

Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc

hwangjeff authored Apr 12, 2022

Summary:
Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime doesn't have nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly PyTorch and TorchAudio builds as a comment that users can copy and run locally.

Pull Request resolved: https://github.com/pytorch/audio/pull/2325

Reviewed By: xiaohui-zhang

Differential Revision: D35597753

Pulled By: hwangjeff

fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c

12 Apr, 2022 1 commit

Add Conformer RNN-T model prototype (#2322) · b0c8e239

hwangjeff authored Apr 11, 2022

Summary:
Adds Conformer RNN-T model as prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, which instantiates a baseline version of the model. Also includes the following:
- Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
- Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
- Introduces tests for `conformer_rnnt_model`.
- Adds docs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2322

Reviewed By: xiaohui-zhang

Differential Revision: D35565987

Pulled By: hwangjeff

fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789

11 Apr, 2022 1 commit

Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959

moto authored Apr 11, 2022

Summary:
This commit makes the FFmpeg integration support FFmpeg 5.0.

In FFmpeg 5, functions like `av_find_input_format` and `avformat_open_input` were changed
so that they deal with const pointers to `AVInputFormat`.

> 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
>  Constified the pointers to AVInputFormats and AVOutputFormats
>  in AVFormatContext, avformat_alloc_output_context2(),
>  av_find_input_format(), av_probe_input_format(),
>  av_probe_input_format2(), av_probe_input_format3(),
>  av_probe_input_buffer2(), av_probe_input_buffer(),
>  avformat_open_input(), av_guess_format() and av_guess_codec().
>  Furthermore, constified the AVProbeData in av_probe_input_format(),
>  av_probe_input_format2() and av_probe_input_format3().

https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260

Pull Request resolved: https://github.com/pytorch/audio/pull/2326

Reviewed By: carolineechen

Differential Revision: D35551380

Pulled By: mthrok

fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666

08 Apr, 2022 1 commit

Add devices/properties badges (#2321) · 72ae755a

moto authored Apr 07, 2022

Summary:
Add badges of supported properties and devices to functionals and transforms.

This commit adds `.. devices::` and `.. properties::` directives to sphinx.

APIs with these directives will have badges (based off of shields.io) which link to the
page with description of these features.

Continuation of https://github.com/pytorch/audio/issues/2316
Dtypes are excluded for now, pending further improvement, and badges are added to most of the functionals and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2321

Reviewed By: hwangjeff

Differential Revision: D35489063

Pulled By: mthrok

fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762

06 Apr, 2022 2 commits

Support GroupNorm and re-ordering Convolution/MHA in Conformer (#2320) · eb23a242

Xiaohui Zhang authored Apr 06, 2022

Summary:
Add an option to use GroupNorm rather than BatchNorm1d, and another option to re-order Convolution/MHA modules in Conformer model.

Pull Request resolved: https://github.com/pytorch/audio/pull/2320

Reviewed By: hwangjeff

Differential Revision: D35422112

Pulled By: xiaohui-zhang

fbshipit-source-id: 360a8aaa37b883b0f656da2e4f654e86688ac270

Add an option to use Tanh instead of ReLU in RNNT joiner (#2319) · 16958d5b

Xiaohui Zhang authored Apr 06, 2022

Summary:
Add an option to use Tanh instead of ReLU in RNNT joiner, which sometimes improves training performance.

 ---

Pull Request resolved: https://github.com/pytorch/audio/pull/2319

Reviewed By: hwangjeff

Differential Revision: D35422122

Pulled By: xiaohui-zhang

fbshipit-source-id: c6a0f8b25936e47081110af046b57d0e8751f9a2

05 Apr, 2022 2 commits

Disable multiprocessing when dumping features in hubert preprocessing (#2311) · f7afe29e

Zhaoheng Ni authored Apr 05, 2022

Summary:
Multiprocessing works well for MFCC features. However, it sometimes makes the script hang when dumping HuBERT features. Changing it to a for-loop resolves the issue.

Pull Request resolved: https://github.com/pytorch/audio/pull/2311

Reviewed By: mthrok

Differential Revision: D35393813

Pulled By: nateanl

fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11

Raise error for resampling int waveform (#2318) · 11328d23

Caroline Chen authored Apr 05, 2022

Summary:
Resolves https://github.com/pytorch/audio/issues/2294

Raise an error if the waveform to be resampled is not of floating point type. The `conv1d` operation used in resampling and `nn.Module` used for the transforms don't support integer type.

Pull Request resolved: https://github.com/pytorch/audio/pull/2318

Reviewed By: mthrok

Differential Revision: D35379276

Pulled By: carolineechen

fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0

04 Apr, 2022 2 commits

Use pretrained LM API for decoder example (#2317) · 66185e00

Caroline Chen authored Apr 04, 2022

Summary:
Update the example ASR pipeline to use the recently added pretrained LM API for decoding.

Pull Request resolved: https://github.com/pytorch/audio/pull/2317

Reviewed By: mthrok

Differential Revision: D35361354

Pulled By: carolineechen

fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46

Fix arguments in CTC decoding script (#2315) · 4a749e2d

Zhaoheng Ni authored Apr 04, 2022

Summary:
Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser.

Pull Request resolved: https://github.com/pytorch/audio/pull/2315

Reviewed By: carolineechen

Differential Revision: D35357678

Pulled By: nateanl

fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92

01 Apr, 2022 5 commits

Fix loading checkpoint in hubert preprocessing (#2310) · 87f0d198

Zhaoheng Ni authored Apr 01, 2022

Summary:
When the checkpoint is on a GPU device and preprocessing runs on CPU, the script raises an exception. Fix it by loading the model state dictionary onto CPU by default.

Pull Request resolved: https://github.com/pytorch/audio/pull/2310

Reviewed By: mthrok

Differential Revision: D35316903

Pulled By: nateanl

fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed

Update GNU config files to support `arm64-apple` system (#2307) · 3ed39e15

moto authored Apr 01, 2022

Summary:
This commit
1. Updates the config.guess and config.sub files and
2. applies them to all the third party libraries that use them.

This resolves the following build failure on M1 Macs with a newer SDK.

On a MacBook Pro with the M1 chip, after the recent OS update, something
about the development environment changed (probably a newer
version of Xcode) and the build stopped working with the following
error from third party dependencies.

```
checking build system type... Invalid configuration ‘arm64-apple-darwin20.0.0': machine ‘arm64-apple' not recognized
```

note: config files are taken from https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2307

Reviewed By: nateanl

Differential Revision: D35318273

Pulled By: mthrok

fbshipit-source-id: 746ac51dd1816767aa78b88445f76a29acfd29e8

Put CONDA_PREFIX second priority of ffmpeg search path (#2312) · 6a418a89

moto authored Apr 01, 2022

Summary:
Change the cmake logic to search CONDA_PREFIX before falling back
to the other default paths and system paths.

1. FFMPEG_ROOT
2. CONDA_PREFIX
3. Other locations (Package managers and system paths)

For users with a regular conda installation, ffmpeg from conda should
be picked up automatically.
Anyone who wants to use a specific ffmpeg can set the FFMPEG_ROOT
variable to the location of the desired installation.
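
The priority order above can be illustrated in Python. This is only a sketch of the ordering, not the CMake logic itself: the function `ffmpeg_search_roots` is hypothetical and the fallback paths are placeholders.

```python
import os
from typing import List, Mapping, Optional


def ffmpeg_search_roots(
    env: Optional[Mapping[str, str]] = None,
    fallbacks: Optional[List[str]] = None,
) -> List[str]:
    """Return candidate ffmpeg roots, highest priority first."""
    env = os.environ if env is None else env
    # Placeholder fallback locations, not the real CMake defaults.
    fallbacks = ["/usr/local", "/usr"] if fallbacks is None else fallbacks
    roots = []
    # 1. FFMPEG_ROOT, 2. CONDA_PREFIX, then 3. everything else.
    for name in ("FFMPEG_ROOT", "CONDA_PREFIX"):
        value = env.get(name)
        if value:
            roots.append(value)
    return roots + [p for p in fallbacks if p not in roots]
```

With only `CONDA_PREFIX` set, the conda location comes first; setting `FFMPEG_ROOT` as well moves the explicit choice ahead of it.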

Pull Request resolved: https://github.com/pytorch/audio/pull/2312

Reviewed By: hwangjeff

Differential Revision: D35317383

Pulled By: mthrok

fbshipit-source-id: 52aef8f3f7f0f8f1eaf7a89a2d1ccfb6265e2c50

Refactor the internal of transforms module (#2309) · 72f9a4e3

Moto Hira authored Apr 01, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2309

For upcoming improved Kaldi features, which comprise multiple classes / functions, put all the transforms implementations in a dedicated directory.

Reviewed By: nateanl

Differential Revision: D35303682

fbshipit-source-id: 5bc8c07ef639683008c0f76ffe56e3941f772659

Loosen atol for melscale batch test for Windows (#2305) · d65a0f3e

moto authored Mar 31, 2022

Summary:
The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows.

https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272

```
>       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 28 / 196608 (0.0%)
E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
```

The value of atol==1e-08 seems very strict but all the other batch
consistency tests are passing.

The violation affects a very small number of samples, which looks
suspicious, but I think it is okay to loosen it to `1e-06` for Windows.

`1e-06` is still more strict than the majority of the comparison tests we have.
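
To see why the absolute tolerance matters here, the sketch below applies the usual combined-tolerance criterion, `|actual - expected| <= atol + rtol * |expected|`. The values are hypothetical, chosen only so that their difference matches the size of the largest absolute difference in the log above.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Combined tolerance check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical pair of values whose difference is about 2e-07.
expected = 1.0e-03
actual = expected + 2.0e-07

# With atol=1e-08 the allowed difference is about 2e-08, so the check fails.
strict = is_close(actual, expected, rtol=1e-05, atol=1e-08)
# With atol=1e-06 the allowed difference is about 1.01e-06, so it passes.
loose = is_close(actual, expected, rtol=1e-05, atol=1e-06)
```

For values this close to zero the relative term contributes almost nothing, so the absolute tolerance decides the outcome.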

Pull Request resolved: https://github.com/pytorch/audio/pull/2305

Reviewed By: hwangjeff

Differential Revision: D35298056

Pulled By: mthrok

fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7

31 Mar, 2022 1 commit

Randomize initial phase of sinusoid data in test (#2301) · c6c6b689

moto authored Mar 31, 2022

Summary:
This commit updates the `get_sinusoid` function in the test utility so that
when multiple channels are requested, the non-primary channels have a randomized
initial phase.

This adds some variety to the test data, which should not break the tests.
Currently `get_sinusoid` returns identical waveforms for all the channels.
This multi-channel support was added just to mock the input data so that
it is easy to test features with multi-channel inputs, so tests should not
expect all the channels to be identical.

When working on numerical parity, it is more useful if the raw waveforms
are somewhat different.

Image: waveforms generated by `get_sinusoid` after the change (left: 1st channel, right: 2nd channel):
https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png
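
A standalone sketch of the behavior described above, written with the standard library only. The function `sinusoid` and its parameters are hypothetical and do not reproduce the signature of the test utility's `get_sinusoid`.

```python
import math
import random
from typing import List


def sinusoid(
    frequency: float = 300.0,
    sample_rate: int = 16000,
    num_frames: int = 16000,
    num_channels: int = 1,
    seed: int = 0,
) -> List[List[float]]:
    """Return `num_channels` sine waves as lists of `num_frames` samples."""
    rng = random.Random(seed)
    step = 2.0 * math.pi * frequency / sample_rate
    waveform = []
    for channel in range(num_channels):
        # The first channel keeps zero phase; the others get a random offset.
        phase = 0.0 if channel == 0 else rng.uniform(0.0, 2.0 * math.pi)
        waveform.append([math.sin(step * n + phase) for n in range(num_frames)])
    return waveform


wave = sinusoid(num_frames=8, num_channels=2)
```

Seeding the generator keeps the test data reproducible while still making the channels differ from each other.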

Pull Request resolved: https://github.com/pytorch/audio/pull/2301

Reviewed By: hwangjeff

Differential Revision: D35291689

Pulled By: mthrok

fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd
