Commits · 3430fd6849ce9a80a2dd5b72fbaf38357a4a7060 · OpenDAS / Torchaudio

06 Sep, 2022 3 commits

Fix random Gaussian generation (#2639) · 3430fd68

Ravi Makhija authored Sep 06, 2022

Summary:
This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634.

In particular, the Box-Muller transform was previously used to generate Gaussian variates for dithering from `torch.rand` uniform variates, but it was implemented incorrectly (e.g. the same uniform variate was used for both inputs of the transform, rather than two different uniform variates), which produced a non-Gaussian distribution. This PR instead uses `torch.randn` to generate the Gaussian variates.
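
To make this class of mistake concrete, here is a minimal standard-library sketch. It is not the torchaudio code, and the function names are hypothetical. It contrasts a Box-Muller transform fed two independent uniform variates with a variant that reuses one variate for both inputs.

```python
import math
import random


def box_muller(u1: float, u2: float) -> float:
    """Map two uniforms on (0, 1] to one variate; standard normal if they are independent."""
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
n = 100_000

# Correct usage: two independent uniforms per output.
# 1.0 - rng.random() lies in (0, 1], which keeps log() finite.
correct = [box_muller(1.0 - rng.random(), rng.random()) for _ in range(n)]

# Faulty usage (illustrative): the same uniform feeds both inputs,
# so the radius and the angle are no longer independent.
faulty = []
for _ in range(n):
    u = 1.0 - rng.random()
    faulty.append(box_muller(u, u))

var_correct = sample_variance(correct)  # close to 1 for a standard normal
var_faulty = sample_variance(faulty)    # drifts away from 1
```

Comparing the two sample variances is only a quick sanity check; a histogram or a normality test would show the distortion more directly.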

Pull Request resolved: https://github.com/pytorch/audio/pull/2639

Reviewed By: mthrok

Differential Revision: D39101144

Pulled By: carolineechen

fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd


Add metadata function for LibriSpeech (#2653) · 08d3bb17

Caroline Chen authored Sep 06, 2022

Summary:
Adds support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, via a public `get_metadata()` function in the dataset. Users can call this function directly to fetch metadata for individual dataset indices, or subclass the dataset and override `__getitem__` to call `get_metadata`, which creates a dataset class that handles metadata mode directly.
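
The subclassing pattern can be sketched with a toy stand-in. `ToyDataset`, its entries, and `_load_audio` are hypothetical and exist only for illustration; with the real dataset the subclass would derive from the LibriSpeech dataset class instead.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that exposes get_metadata()."""

    def __init__(self, entries):
        # Each entry: (relative file path, sample rate, transcript).
        self._entries = list(entries)

    def get_metadata(self, n):
        # Cheap: returns the path and labels without decoding any audio.
        return self._entries[n]

    def __getitem__(self, n):
        path, sample_rate, transcript = self.get_metadata(n)
        waveform = self._load_audio(path)  # the expensive step in a real dataset
        return waveform, sample_rate, transcript

    def _load_audio(self, path):
        raise RuntimeError("audio decoding is not available in this sketch")

    def __len__(self):
        return len(self._entries)


class MetadataOnlyDataset(ToyDataset):
    """Metadata mode: indexing returns metadata instead of decoded audio."""

    def __getitem__(self, n):
        return self.get_metadata(n)


dataset = MetadataOnlyDataset([("a.flac", 16000, "HELLO"), ("b.flac", 16000, "WORLD")])
first = dataset[0]  # no audio is loaded here
```

Because only `__getitem__` changes, code that iterates over the dataset (a data loader, for example) picks up metadata mode without further changes.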

Pull Request resolved: https://github.com/pytorch/audio/pull/2653

Reviewed By: nateanl, mthrok

Differential Revision: D39105114

Pulled By: carolineechen

fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348


Remove obsolete examples (#2655) · 4a20c412

Peter Albert authored Sep 06, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2655

Removed obsolete example and the corresponding test

Reviewed By: mthrok

Differential Revision: D39260253

fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7


02 Sep, 2022 1 commit

Add CUDA HW encoding support to StreamWriter (#2505) · 95eada24

moto authored Sep 01, 2022

Summary:
This commit adds CUDA hardware encoding to StreamWriter.
For certain video formats, it can encode video directly from
a CUDA Tensor, without needing to move the data to the host CPU.

Pull Request resolved: https://github.com/pytorch/audio/pull/2505

Reviewed By: hwangjeff

Differential Revision: D37446830

Pulled By: mthrok

fbshipit-source-id: eee6424f01a99a3b611dcad45ed58f86cba4672a


01 Sep, 2022 1 commit

Add file-like object support to StreamWriter (#2648) · 28da8b84

moto authored Aug 31, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864


26 Aug, 2022 3 commits

add CUDA 11.7 builds (#2623) · 76fca37a

pbialecki authored Aug 26, 2022

Summary:
CC atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2623

Reviewed By: hwangjeff, nateanl

Differential Revision: D39036432

Pulled By: atalman

fbshipit-source-id: cd74a1bf8f74e31bd2c32c80d32c617f4b1766e8


[Nova] Use pkg-helpers to modularize GHA Linux Conda Builds (#2650) · 32389c63

Omkar Salpekar authored Aug 25, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2650

Reviewed By: mehtanirav

Differential Revision: D39040559

Pulled By: osalpekar

fbshipit-source-id: df39e23d7c246728793aab969b8dc1070af88d75


Replace bg_iterator in examples (#2645) · 8682b644

Caroline Chen authored Aug 25, 2022

Summary:
`bg_iterator` was deprecated in 0.11 because it was known to cause issues (deadlocks) without providing a speedup. This PR removes the remaining uses of `bg_iterator` from the torchaudio examples.

Resolves https://github.com/pytorch/audio/issues/2642

Pull Request resolved: https://github.com/pytorch/audio/pull/2645

Reviewed By: nateanl

Differential Revision: D38954292

Pulled By: carolineechen

fbshipit-source-id: 2333ab5228c2b8511ff532057543aaf9d02b2789


25 Aug, 2022 1 commit

[Nova] Build Linux Conda Binaries using reusable workflow (#2626) · c7e0595b

Omkar Salpekar authored Aug 25, 2022

Summary:
Calling the reusable workflow introduced in https://github.com/pytorch/test-infra/pull/546 to build conda binaries on linux.

Pull Request resolved: https://github.com/pytorch/audio/pull/2626

Reviewed By: mehtanirav

Differential Revision: D39028057

Pulled By: osalpekar

fbshipit-source-id: d74ea3771967d0ee2b0ad28a8f811a95145b2183


24 Aug, 2022 1 commit

Add StreamWriter (#2628) · 72404de9

moto authored Aug 24, 2022

Summary:
This commit adds an FFmpeg-based encoder, the StreamWriter class.
StreamWriter is pretty much the opposite of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc.
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8


23 Aug, 2022 2 commits

Added example for LFCC transform (#2640) · 068fc29c

Ravi Makhija authored Aug 23, 2022

Summary:
Added example for LFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2640

Reviewed By: carolineechen

Differential Revision: D38908975

Pulled By: nateanl

fbshipit-source-id: ffdd994390db7f27556b011a8050a65eef9cd09d


[Nova] Added draft calling GHA workflow for building linux wheels (#2548) · c0815850

Omkar Salpekar authored Aug 23, 2022

Summary:
As part of Project Nova, we are consolidating CI/CD workflows and infra, making them reusable across PyTorch ecosystem libraries. https://github.com/pytorch/test-infra/pull/460 introduces a general-purpose reusable workflow to build linux wheels for python libraries. This PR introduces a caller workflow that triggers the reusable workflow. Details around modular env setup, passing input args across workflows, etc. are still being worked out.

Using reusable workflow defined in https://github.com/pytorch/test-infra/pull/506

Pull Request resolved: https://github.com/pytorch/audio/pull/2548

Reviewed By: osalpekar

Differential Revision: D38947733

Pulled By: mehtanirav

fbshipit-source-id: 03ab88cef973a092f5c5d1ff8c74ec7ae7e46d01


22 Aug, 2022 2 commits

Update Sphinx-gallery to 0.11.1 (#2638) · 2a8108eb

moto authored Aug 22, 2022

Summary:
The minor release fixes some gallery issues, which allows us to remove
some of the customization we had in https://github.com/pytorch/audio/issues/2629

https://output.circle-artifacts.com/output/job/553a9b98-8260-4cb4-a681-20ef97d2c33e/artifacts/0/docs/pipelines.html#torchaudio.pipelines.Wav2Vec2ASRBundle

Pull Request resolved: https://github.com/pytorch/audio/pull/2638

Reviewed By: carolineechen, nateanl

Differential Revision: D38909097

Pulled By: mthrok

fbshipit-source-id: 78346d93b54fca2a19b28991c224324ef53221c9


Added example for Loudness transform (#2641) · 0c94d6ef

Ravi Makhija authored Aug 22, 2022

Summary:
Added example for Loudness transform (implemented in PR https://github.com/pytorch/audio/issues/2472) as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2641

Reviewed By: nateanl

Differential Revision: D38907782

Pulled By: carolineechen

fbshipit-source-id: fd2bcc4bac3095a626ea9cf36cb70cb2bf003d63


20 Aug, 2022 1 commit

Added example for MFCC transform (#2637) · 533f8748

Ravi Makhija authored Aug 19, 2022

Summary:
Added example for MFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Note: Python formatter package `black` uses double quotes for the string dict keys (e.g. in `melkwargs` for this example). Please let me know if there is a different linter/format/convention that is preferred!

Pull Request resolved: https://github.com/pytorch/audio/pull/2637

Reviewed By: carolineechen

Differential Revision: D38873729

Pulled By: nateanl

fbshipit-source-id: 2e8fe2930671e7c5d02c0c37cf1ca5cc8c5079e3


19 Aug, 2022 2 commits

Refactor sox pybind source code (#2636) · 789adf07

Moto Hira authored Aug 19, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2636

At the early stage of the torchaudio extension module, the
`torchaudio/csrc/pybind` directory was created so that
all the code defining the Python interface would be placed
there and there would be only one extension module, called
`torchaudio._torchaudio`.

However, the codebase has evolved so that separate
extensions are defined for each feature (third-party
dependency) for the sake of a more modular file organization.

What is left in `csrc/pybind` is libsox Python bindings.
This commit moves it under `csrc/sox`.

Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.

Reviewed By: carolineechen

Differential Revision: D38829253

fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d


Update README.md (#2633) · 0b7f2fba

moto authored Aug 18, 2022

Summary:
Update compatibility matrix

Pull Request resolved: https://github.com/pytorch/audio/pull/2633

Reviewed By: nateanl

Differential Revision: D38827670

Pulled By: mthrok

fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100


18 Aug, 2022 6 commits

Update ASR inference tutorial (#2631) · 189edb1b

moto authored Aug 18, 2022

Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import

Pull Request resolved: https://github.com/pytorch/audio/pull/2631

Reviewed By: carolineechen

Differential Revision: D38830395

Pulled By: mthrok

fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6


Added example for InverseMelScale transform (#2635) · 129a7c1b

Ravi Makhija authored Aug 18, 2022

Summary:
Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2635

Reviewed By: carolineechen

Differential Revision: D38830318

Pulled By: nateanl

fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83


Update notes around nightly build and third parties (#2632) · 55ce80b1

moto authored Aug 18, 2022

Summary:
Google Colab now has torchaudio 0.12 pre-installed.
This commit removes the note about nightly build.

Pull Request resolved: https://github.com/pytorch/audio/pull/2632

Reviewed By: carolineechen

Differential Revision: D38827632

Pulled By: mthrok

fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb


Tweak tutorials (#2630) · cab2bb44

moto authored Aug 18, 2022

Summary:
Resolves the following warnings

```
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
/torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2630

Reviewed By: nateanl

Differential Revision: D38816632

Pulled By: mthrok

fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635


Fix Sphinx-gallery display and pin sphinx-related packages (#2629) · 265c09d8

moto authored Aug 17, 2022

Summary:
This commit fixes the issue with the recent Sphinx-Gallery update.
Also it pins the versions of Sphinx-related packages.

Before:

https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png

After:

https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2629

Reviewed By: carolineechen

Differential Revision: D38816417

Pulled By: mthrok

fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852


Fix doc warning (#2627) · 39d24d9d

moto authored Aug 17, 2022

Summary:
Resolves the following warning

```
/torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short.

:hidden:`Loudness`
-----------------
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2627

Reviewed By: carolineechen

Differential Revision: D38814802

Pulled By: mthrok

fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749


16 Aug, 2022 4 commits

Use double quotes for string in functional and transforms (#2618) · 7ac3e2e2

Zhaoheng Ni authored Aug 16, 2022

Summary:
To make the code consistent, we should use double quotation marks for all strings. This PR makes that change in functional and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2618

Reviewed By: carolineechen

Differential Revision: D38744137

Pulled By: nateanl

fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd


Added example for AmplitudeToDB transform (#2615) · 05545791

Ravi Makhija authored Aug 16, 2022

Summary:
Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2615

Reviewed By: carolineechen

Differential Revision: D38743117

Pulled By: nateanl

fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207


Added example for MelScale transform (#2616) · 3742cebb

Ravi Makhija authored Aug 16, 2022

Summary:
Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2616

Reviewed By: carolineechen

Differential Revision: D38743145

Pulled By: nateanl

fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486


Move xcode to 14 from 12.5 (#2622) · 9efefff1

Andrey Talman authored Aug 16, 2022

Summary:
Similar to https://github.com/pytorch/vision/pull/6218
Fixing MacOS builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85


15 Aug, 2022 3 commits

Fix anaconda upload (#2621) · 556a8dcd

Andrey Talman authored Aug 15, 2022

Summary:
Same as:
https://github.com/pytorch/vision/pull/6422

Testing:
```
export ANACONDA_PATH=$(conda info --base)/bin
echo $ANACONDA_PATH
/opt/homebrew/Caskroom/miniconda/base/bin
$ANACONDA_PATH/anaconda -V
anaconda Command line client (version 1.10.0)
```
Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/audio/pull/2621

Reviewed By: weiwangmeta, seemethere

Differential Revision: D38714324

Pulled By: atalman

fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e


Update doc version selector link (#2605) · b475dc3d

moto authored Aug 15, 2022

Summary:
The link to the version selector was an absolute link, which was
a trap when reviewing gh-pages deployments from a fork.

This commit changes it to a relative link.

Pull Request resolved: https://github.com/pytorch/audio/pull/2605

Test Plan:
- https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html
- https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html

Reviewed By: carolineechen, nateanl

Differential Revision: D38695645

Pulled By: mthrok

fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2


Remove outdated doc (#2617) · aa591c0d

Zhaoheng Ni authored Aug 15, 2022

Summary:
`ctc_decoder` has been promoted to beta, so this PR removes it from the prototype documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2617

Reviewed By: hwangjeff

Differential Revision: D38706869

Pulled By: nateanl

fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02


12 Aug, 2022 1 commit

Introducing pytorch-cuda metapackage (#2612) · 776cf099

Andrey Talman authored Aug 12, 2022

Summary:
Introducing pytorch-cuda metapackage

Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds a CUDA metapackage called `pytorch-cuda`. This way we can make sure to install the correct version of the CUDA dependencies and do not depend on conda-forge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2612

Reviewed By: hwangjeff, seemethere, nateanl

Differential Revision: D38633332

Pulled By: atalman

fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501


11 Aug, 2022 1 commit

Add additive noise function (#2608) · f3bb30b8

hwangjeff authored Aug 11, 2022

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
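
As a rough illustration of "waveform plus scaled noise", the sketch below picks the scale so that the result hits a target signal-to-noise ratio. The SNR-based scaling, the function name `add_scaled_noise`, and the plain-list inputs are assumptions made for this example; they are not taken from the PR.

```python
import math


def add_scaled_noise(waveform, noise, snr_db):
    """Return waveform + scale * noise, with scale chosen to hit snr_db."""
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(x * x for x in noise)
    if noise_energy == 0.0:
        raise ValueError("noise must not be all zeros")
    # SNR(dB) = 10 * log10(signal_energy / (scale**2 * noise_energy)),
    # solved for scale.
    scale = math.sqrt(signal_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(waveform, noise)]


waveform = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = add_scaled_noise(waveform, noise, snr_db=20.0)
```

A tensor-based version would apply the same arithmetic along the time dimension of each batch element.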

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0


10 Aug, 2022 3 commits

Fix bug in Conformer RNN-T recipe (#2611) · cd4d6607

hwangjeff authored Aug 10, 2022

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac


Fixed argument validation in TorchAudio filtering (#2609) · fb4eb981

Kunal Upadya authored Aug 10, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2609

Converted argument validation in torchaudio/functional/filtering from assert-based checks to the preferred if-then-raise checks. Added specific error messages in all cases.
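
The difference between the two validation styles can be sketched as follows. The function names and the cutoff-frequency check are hypothetical and are not taken from the torchaudio source.

```python
def check_cutoff_assert(cutoff_freq: float, sample_rate: int) -> None:
    # Before: assert-based check. It is skipped under `python -O`, and the
    # resulting AssertionError carries no explanation.
    assert 0 < cutoff_freq < sample_rate / 2


def check_cutoff_raise(cutoff_freq: float, sample_rate: int) -> None:
    # After: explicit check that always runs and says what went wrong.
    if not 0 < cutoff_freq < sample_rate / 2:
        raise ValueError(
            f"cutoff_freq must be in (0, {sample_rate / 2}); got {cutoff_freq}"
        )


try:
    check_cutoff_raise(9000.0, 16000)  # above the Nyquist frequency of 8000 Hz
except ValueError as err:
    message = str(err)
```

Raising a specific exception type also lets callers catch invalid arguments separately from genuine internal errors.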

Reviewed By: mthrok

Differential Revision: D38515029

fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9


Fix dataset docs parsing issue with extra spaces (#2607) · 733ca909

Sean Kim authored Aug 09, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607

Reviewed By: carolineechen, nateanl

Differential Revision: D38522606

Pulled By: skim0514

fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668


09 Aug, 2022 1 commit

Add NNLM support to CTC Decoder (#2528) · 03a0d68e

Caroline Chen authored Aug 09, 2022

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows:
- To decode with KenLM, pass the KenLM language model path to the `lm` variable.
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass the class to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, with one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default).
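
The word file described for the custom-LM case can be produced with a few lines of standard-library Python. The vocabulary, the file name `lm_words.txt`, and the helper `write_lm_word_file` are hypothetical; only the layout (one word per line, ordered by LM index) comes from the description above.

```python
import os
import tempfile

# Hypothetical vocabulary of a custom LM: word -> LM index.
word_to_index = {"world": 2, "<unk>": 0, "hello": 1}


def write_lm_word_file(word_to_index, path):
    """Write one word per line, ordered by LM index."""
    ordered = sorted(word_to_index, key=word_to_index.get)
    with open(path, "w", encoding="utf-8") as f:
        for word in ordered:
            f.write(word + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lm_words.txt")
    write_lm_word_file(word_to_index, path)
    # Read the file back to confirm that line number == LM index.
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
```

Keeping the file order tied to the LM index means the decoder and the language model agree on which integer refers to which word.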

Validated against the fairseq w2l decoder on a sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests that validate custom implementations of ZeroLM and KenLM, as well as decoding with a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7


08 Aug, 2022 1 commit

Fix stylecheck (#2606) · c15eee23

Caroline Chen authored Aug 08, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2606

Reviewed By: nateanl

Differential Revision: D38502666

Pulled By: carolineechen

fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a


05 Aug, 2022 3 commits

Add convolution operator (#2602) · b396157d

hwangjeff authored Aug 05, 2022

Summary:
Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT.
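
The two strategies can be compared in a small pure-Python sketch on 1-D lists. This is illustrative only: the function names are hypothetical, the "full" output length `len(x) + len(y) - 1` is an assumption, and a textbook radix-2 FFT stands in for a library FFT.

```python
import cmath


def convolve_direct(x, y):
    """Full linear convolution by direct summation."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def convolve_fft(x, y):
    """Full linear convolution via pointwise multiplication in the frequency domain."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size *= 2
    # Zero-padding to at least out_len avoids circular wrap-around.
    fx = fft(list(x) + [0.0] * (size - len(x)))
    fy = fft(list(y) + [0.0] * (size - len(y)))
    product = [a * b for a, b in zip(fx, fy)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so divide by size.
    return [v.real / size for v in result[:out_len]]


x = [1.0, 2.0, 3.0]
y = [0.0, 1.0, 0.5]
direct = convolve_direct(x, y)
via_fft = convolve_fft(x, y)  # matches `direct` up to floating-point error
```

The direct form costs on the order of `len(x) * len(y)` multiplications, while the FFT form scales as `n log n` in the padded length, so the FFT variant tends to pay off for long inputs.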

Pull Request resolved: https://github.com/pytorch/audio/pull/2602

Reviewed By: nateanl, mthrok

Differential Revision: D38450771

Pulled By: hwangjeff

fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b


Add note for lexicon free decoder output (#2603) · 33485b8c

Caroline Chen authored Aug 05, 2022

Summary:
The ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon-free usage. This PR adds a note in both the docs and the tutorial.

Follow-up: determine whether we want to modify the behavior of ``words`` in the lexicon-free case. One option is to merge the generated tokens and then split them on the input silent token to populate the ``words`` field, but this is tricky because the meaning of a "word" in the lexicon-free case can be vague, not all languages put whitespace between words, etc.
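
The merge-then-split option mentioned in the follow-up could look roughly like this. It is a sketch of the idea only, not an implemented behavior; the function name, the `"|"` silent token, and the character-level tokens are assumptions.

```python
def tokens_to_words(tokens, sil_token="|"):
    """Merge tokens into one string, then split on the silent token."""
    merged = "".join(tokens)
    # Drop empty strings caused by leading, trailing, or repeated silent tokens.
    return [word for word in merged.split(sil_token) if word]


tokens = ["|", "h", "e", "l", "l", "o", "|", "w", "o", "r", "l", "d", "|"]
words = tokens_to_words(tokens)
```

This works when the silent token doubles as a word boundary; for languages written without spaces the split yields one long "word", which is the ambiguity the follow-up points out.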

Pull Request resolved: https://github.com/pytorch/audio/pull/2603

Reviewed By: mthrok

Differential Revision: D38459709

Pulled By: carolineechen

fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934


Added example for SlidingWindowCmn transform (#2600) · 50bba1df

Ravi Makhija authored Aug 05, 2022

Summary:
Added example for `SlidingWindowCmn` transform as mentioned in issue https://github.com/pytorch/audio/issues/1564

Pull Request resolved: https://github.com/pytorch/audio/pull/2600

Reviewed By: mthrok

Differential Revision: D38395579

Pulled By: carolineechen

fbshipit-source-id: 44c5b7181789eedcaaa1d80149d5a1ab8de4c0ba
