Commits · da5c80bca55c7d45bae2fd57e7306667a347ef04 · OpenDAS / Torchaudio

31 Dec, 2021 2 commits

Drop support for python3.6 as per task T109096383. Item 2 on issue 2051. (#2119) · da5c80bc

Werner Chao authored Dec 31, 2021

Summary:
As per item 2 on [issue 2051](https://github.com/pytorch/audio/issues/2051), dropping support for Python 3.6.
Removed 3.6 from the test matrix and ran `.circleci/regenerate.py`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2119

Reviewed By: mthrok

Differential Revision: D33379542

Pulled By: wernerchao

fbshipit-source-id: 6d0fb51b18c2fa7c8cf4eeee4a7f19c4a5210fac

da5c80bc

Update CTC Hypothesis docs (#2117) · 64c7e065

Caroline Chen authored Dec 30, 2021

Summary:
Add documentation for the CTC decoder `Hypothesis` and include it in the docs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2117

Reviewed By: mthrok

Differential Revision: D33370381

Pulled By: carolineechen

fbshipit-source-id: cf6501a499e5303cda0410f733f0fab4e1c39aff

64c7e065

30 Dec, 2021 8 commits

Build ffmpeg-features in Linux/macOS unittests (#2114) · 9f14fa63

moto authored Dec 30, 2021

Summary:
Preparation to land Python front-end of ffmpeg-related features.

- Set BUILD_FFMPEG=1 in Linux/macOS unit test jobs
- Install ffmpeg and pkg-config from conda-forge
- Add note about Windows build process
- Temporarily avoid `av_err2str`

Pull Request resolved: https://github.com/pytorch/audio/pull/2114

Reviewed By: hwangjeff

Differential Revision: D33371346

Pulled By: mthrok

fbshipit-source-id: b0e16a35959a49a2166109068f3e0cbbb836e888

9f14fa63

Add note about ffmpeg code (#2115) · 524d5540

moto authored Dec 30, 2021

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2115

Reviewed By: carolineechen

Differential Revision: D33370700

Pulled By: mthrok

fbshipit-source-id: 591b67870247f69cc542f649cd62444ee3c45934

524d5540

Enforce lint checks and fix/mute lint errors (#2116) · 8ed14782

Joao Gomes authored Dec 30, 2021

Summary:
cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/2116

Reviewed By: mthrok

Differential Revision: D33368453

Pulled By: jdsgomes

fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c

8ed14782

Clean up Emformer module (#2091) · 4c8fd760

hwangjeff authored Dec 30, 2021

Summary:
* Removes redundant declaration `right_context_blocks = []`, as flagged by kobenaxie.
* Adds random seed to tests, as flagged by carolineechen in other PRs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2091

Reviewed By: mthrok

Differential Revision: D33340964

Pulled By: hwangjeff

fbshipit-source-id: a9de43e28d1bae7bd4806b280717b0d822bb42fc

4c8fd760

Add a switch to build ffmpeg binding (#2048) · ece03edc

moto authored Dec 30, 2021

Summary:
This PR adds a `BUILD_FFMPEG` switch to the torchaudio build process so that features related to ffmpeg are built.
The flag is false by default, so no CI jobs or development flow are affected.

This is because handling the dependencies around ffmpeg is a bit tricky.
Currently, the CMake file uses `pkg-config` to find an ffmpeg installation in the system.
This works fine for both conda-based installation and system-managed installation (like `apt`).

In subsequent PRs, I will find a solution that works for local development and binary distributions.
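
A build script could read such a switch from the environment roughly as follows. This is a minimal sketch, not the actual torchaudio build code: the helper name `get_build_flag` and the accepted spellings are assumptions; only the `BUILD_FFMPEG` name and its false default come from the description above.

```python
import os

# Accepted spellings are an assumption made for this sketch.
_TRUE = {"1", "true", "on", "yes"}
_FALSE = {"0", "false", "off", "no"}


def get_build_flag(name, default=False, environ=None):
    """Return the boolean value of a build switch (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if name not in environ:
        return default
    value = environ[name].strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise RuntimeError(f"Unexpected value for {name}: {environ[name]!r}")


# With the variable unset, the ffmpeg features stay disabled.
BUILD_FFMPEG = get_build_flag("BUILD_FFMPEG", default=False, environ={})
```

Keeping the default at false means that existing jobs, which never set the variable, behave exactly as before.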

Pull Request resolved: https://github.com/pytorch/audio/pull/2048

Reviewed By: hwangjeff, nateanl

Differential Revision: D33367260

Pulled By: mthrok

fbshipit-source-id: 94517acecb62bd6d4e96d4b7cbc3ab3c2a25706c

ece03edc

Update and fill the rest of ffmpeg-integration C++ code (#2113) · 9cb75e74

moto authored Dec 30, 2021

Summary:
- Introduce AudioBuffer and VideoBuffer for different ways of handling frames
- Update the way option dictionary is passed
- Remove unused AutoFrameUnref
- Add SrcStreamInfo/OutputStreamInfo classes

Pull Request resolved: https://github.com/pytorch/audio/pull/2113

Reviewed By: nateanl

Differential Revision: D33356144

Pulled By: mthrok

fbshipit-source-id: e837e84fae48baa7befd5c70599bcd2cbb61514d

9cb75e74

[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · fd3c9573
CodemodService Bot authored Dec 30, 2021
```
Reviewed By: zertosh

Differential Revision: D33361077

fbshipit-source-id: 007db010bd38c28f597ea66f68f97b13309e878c
```
fd3c9573

Add ffmpeg prototype bindings (#2047) · 3de7892e

moto authored Dec 29, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

Add `Streamer` TorchBind.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after https://github.com/pytorch/audio/issues/2046.

Pull Request resolved: https://github.com/pytorch/audio/pull/2047

Reviewed By: hwangjeff

Differential Revision: D33355190

Pulled By: mthrok

fbshipit-source-id: a3ad4c2822ed3a7ddc19b1aaca9dddabd59ce2f8

3de7892e

29 Dec, 2021 9 commits

Add parameter p to TimeMasking (#2090) · 1ec7ff73

hwangjeff authored Dec 29, 2021

Summary:
Adds parameter `p` to `TimeMasking` to allow for enforcing an upper bound on the proportion of time steps that it can mask. This behavior is consistent with the specifications provided in the SpecAugment paper (https://arxiv.org/abs/1904.08779).
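
The effect of such an upper bound can be sketched in plain Python. This is an illustration of the idea only, not the torchaudio implementation: the helper `sample_time_mask` and its sampling scheme are assumptions, with `mask_param` capping the mask length in steps and `p` capping it as a proportion of the total number of time steps.

```python
import random


def sample_time_mask(num_steps, mask_param, p=1.0, rng=random):
    """Pick a [start, end) span of time steps to mask (illustrative helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # The mask can be no longer than mask_param steps
    # and no longer than a proportion p of all steps.
    max_len = min(mask_param, int(num_steps * p))
    length = rng.randint(0, max_len)  # inclusive on both ends
    start = rng.randint(0, num_steps - length)
    return start, start + length


rng = random.Random(0)
start, end = sample_time_mask(num_steps=100, mask_param=80, p=0.2, rng=rng)
```

With `num_steps=100` and `p=0.2`, the sampled mask never covers more than 20 time steps, even though `mask_param` alone would allow up to 80.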

Pull Request resolved: https://github.com/pytorch/audio/pull/2090

Reviewed By: carolineechen

Differential Revision: D33344772

Pulled By: hwangjeff

fbshipit-source-id: 6ff65f5304e489fa1c23e15c3d96b9946229fdcf

1ec7ff73

Allow token list as CTC decoder input (#2112) · 896ade04

Caroline Chen authored Dec 29, 2021

Summary:
Additionally accept list of tokens as CTC decoder input. This makes it possible to directly pass in something like `bundles.get_labels()` into the decoder factory function instead of requiring a separate tokens file.
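
Accepting either form typically comes down to a small normalization step. The sketch below is an assumption about how that could look, not the decoder's actual code: `load_tokens` is a hypothetical helper, and the one-token-per-line file layout is assumed for illustration.

```python
import os
import tempfile
from typing import List, Union


def load_tokens(tokens: Union[str, List[str]]) -> List[str]:
    """Return tokens from a list, or from a file path (hypothetical helper)."""
    if isinstance(tokens, str):
        # Assumed file layout: one token per line.
        with open(tokens, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    return list(tokens)


labels = ["-", "|", "a", "b"]
from_list = load_tokens(labels)

# The same tokens written to a file give the same result.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokens.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(labels) + "\n")
    from_file = load_tokens(path)
```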

Pull Request resolved: https://github.com/pytorch/audio/pull/2112

Reviewed By: hwangjeff, nateanl, mthrok

Differential Revision: D33352909

Pulled By: carolineechen

fbshipit-source-id: 6d22072e34f6cd7c6f931ce4eaf294ae4cf0c5cc

896ade04

Add Streamer class (#2046) · bb528d7e

moto authored Dec 29, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

Add `Streamer` class that bundles `StreamProcessor` and handles input.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after https://github.com/pytorch/audio/issues/2045.

Pull Request resolved: https://github.com/pytorch/audio/pull/2046

Reviewed By: carolineechen

Differential Revision: D33299863

Pulled By: mthrok

fbshipit-source-id: 6470cbe061057c8cb970ce7bb5692be04efb5fe9

bb528d7e

Reorganize RNN-T components in prototype module (#2110) · 67cdf882

hwangjeff authored Dec 29, 2021

Summary:
Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`.

Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2110

Reviewed By: carolineechen, mthrok

Differential Revision: D33354116

Pulled By: hwangjeff

fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb

67cdf882

Add StreamProcessor class (#2045) · 572cd2e2

moto authored Dec 29, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

Add StreamProcessor class that bundles `Buffer`, `FilterGraph` and `Decoder`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after https://github.com/pytorch/audio/issues/2044.

Pull Request resolved: https://github.com/pytorch/audio/pull/2045

Reviewed By: carolineechen

Differential Revision: D33299858

Pulled By: mthrok

fbshipit-source-id: d85bececed475f45622743f137dd59cb1390ceed

572cd2e2

Add Sink class (#2111) · 5cc4765a

moto authored Dec 29, 2021

Summary:
Add Sink class that bundles FilterGraph and Buffer. Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.

Pull Request resolved: https://github.com/pytorch/audio/pull/2111

Reviewed By: carolineechen

Differential Revision: D33350388

Pulled By: mthrok

fbshipit-source-id: 8f42c5fe4be39ef2432c51fc0d0ac72ba3f06a26

5cc4765a

[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 697f92f1
CodemodService Bot authored Dec 29, 2021
```
Reviewed By: zertosh

Differential Revision: D33347867

fbshipit-source-id: 7672f65392e363c0359de2d86e745782a09cf9dc
```
697f92f1

Update prototype documentations (#2108) · 10cce198

moto authored Dec 28, 2021

Summary:
### Change list

* Split the documentation of prototypes
* Add a new API reference section dedicated to prototypes.
* Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of RNNT constructor. (cc hwangjeff )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak CTC tutorial
  * Replace hyperlinks to API reference with backlinks
  * Add `progress=False` to download

### Follow-up

The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the CTC decoder's `Hypothesis` to the documentation, the build process complained that the name is ambiguous.
I think the `Hypothesis` classes could be put inside each decoder (if TorchScript supports it) or be given different names, but in that case the interface of each `Hypothesis` has to be generic enough.

### Before

https://pytorch.org/audio/main/prototype.html

<img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png">

### After

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html

<img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html

<img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html

<img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2108

Reviewed By: hwangjeff, carolineechen, nateanl

Differential Revision: D33340816

Pulled By: mthrok

fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187

10cce198

Add pretrained Emformer RNN-T streaming ASR inference pipeline (#2093) · 72a98a86

hwangjeff authored Dec 28, 2021

Summary:
Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR.

Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below).

https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2093

Reviewed By: mthrok

Differential Revision: D33340776

Pulled By: hwangjeff

fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac

72a98a86

28 Dec, 2021 6 commits

Remove the MVDR tutorial in examples (#2109) · 340ec891

Zhaoheng Ni authored Dec 28, 2021

Summary:
Remove it as it's already introduced in the [gallery](https://github.com/pytorch/audio/blob/main/examples/tutorials/mvdr_tutorial.py).

Pull Request resolved: https://github.com/pytorch/audio/pull/2109

Reviewed By: carolineechen

Differential Revision: D33341574

Pulled By: nateanl

fbshipit-source-id: e5c1c8537063b9453947dc3ecafa70e9b6c74146

340ec891

Add ASR CTC inference tutorial (#2106) · 133d0065

Caroline Chen authored Dec 28, 2021

Summary:
demonstrate usage of the CTC beam search decoder w/ lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: https://github.com/pytorch/audio/pull/2106

Reviewed By: mthrok

Differential Revision: D33340946

Pulled By: carolineechen

fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7

133d0065

Add HuBERT pretrain model to enable training from scratch (#2064) · 37a2555f

Zhaoheng Ni authored Dec 28, 2021

Summary:
- Add three factory functions, `hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`, to enable training the HuBERT model from scratch.
- Add a `num_classes` argument to the `hubert_pretrain_base` factory function, because the base model is trained in two iterations: `num_cluster` is 100 in the first iteration and 500 in the second.
- The model takes `waveforms`, `labels`, and `lengths` as inputs
- The model generates the last layer of transformer embedding, `logit_m`, `logit_u` as the outputs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2064

Reviewed By: hwangjeff, mthrok

Differential Revision: D33338587

Pulled By: nateanl

fbshipit-source-id: 534bc17c576c5f344043d8ba098204b8da6e630a

37a2555f

Disable matplotlib warning in tutorial rendering (#2107) · 7bf04d1e

moto authored Dec 28, 2021

Summary:
*Before:*

https://pytorch.org/audio/main/tutorials/audio_data_augmentation_tutorial.html#effects-applied

<img width="831" alt="Screen Shot 2021-12-28 at 11 25 08 AM" src="https://user-images.githubusercontent.com/855818/147586457-55d566bf-f016-4327-a07e-5de68f80e984.png">

*After:*

https://484994-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html#effects-applied

<img width="830" alt="Screen Shot 2021-12-28 at 11 25 57 AM" src="https://user-images.githubusercontent.com/855818/147586531-90333201-b9e3-450f-a2d7-6fb987b7e9d9.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2107

Reviewed By: carolineechen

Differential Revision: D33337164

Pulled By: mthrok

fbshipit-source-id: 20e3309f0d11d46619f516dc46d967b34f22ec95

7bf04d1e

Add Sphinx gallery automatically (#2101) · eb8e8dc8

moto authored Dec 28, 2021

Summary:
This commit updates the documentation configuration so that if an API (function or class) is used in tutorials, links to those tutorials are automatically added to its documentation.

It also adds `py:func:` so that it's easy to jump from tutorials to API reference.

Note: the use of `py:func:` is not required for an API to be recognized by Sphinx-Gallery.
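
In Sphinx-Gallery, this kind of back-linking is driven by the `backreferences_dir` and `doc_module` options of `sphinx_gallery_conf`. The fragment below is a sketch of what such a `conf.py` entry might look like; the directory paths are assumptions for illustration and are not taken from this commit.

```python
# Fragment of a Sphinx conf.py (paths are assumed, not from the commit).
sphinx_gallery_conf = {
    # Where the tutorial scripts live and where rendered pages go.
    "examples_dirs": ["../../examples/tutorials"],
    "gallery_dirs": ["tutorials"],
    # Directory for the generated per-API lists of examples.
    "backreferences_dir": "gen_modules/backreferences",
    # Modules whose functions and classes get back-references.
    "doc_module": ("torchaudio",),
}
```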

* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions

<img width="776" alt="Screen Shot 2021-12-24 at 12 41 43 PM" src="https://user-images.githubusercontent.com/855818/147367407-cd86f114-7177-426a-b5ee-a25af17ae476.png">

* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr

<img width="769" alt="Screen Shot 2021-12-24 at 12 42 31 PM" src="https://user-images.githubusercontent.com/855818/147367422-01fd245f-2f25-4875-a206-910e17ae0161.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2101

Reviewed By: hwangjeff

Differential Revision: D33311283

Pulled By: mthrok

fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288

eb8e8dc8

Show lint diff with color (#2102) · a4bc8a86

moto authored Dec 28, 2021

Summary:
*Before*
<img width="1094" alt="Screen Shot 2021-12-24 at 12 34 14 PM" src="https://user-images.githubusercontent.com/855818/147367213-b1e539c1-6e06-4e9b-aaf4-0458c502379b.png">

*After*

https://app.circleci.com/pipelines/github/pytorch/audio/8870/workflows/0445f1ac-ad48-412f-8045-2400d0cef4f4/jobs/482060

<img width="1096" alt="Screen Shot 2021-12-24 at 12 33 32 PM" src="https://user-images.githubusercontent.com/855818/147367210-a9b759bb-f992-4dc1-9359-0ec3912b3070.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2102

Reviewed By: carolineechen

Differential Revision: D33311253

Pulled By: mthrok

fbshipit-source-id: 6944921a8be58a2062b66a7dfd2c7ffe8c0866c3

a4bc8a86

24 Dec, 2021 2 commits

[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 9397cd5b
CodemodService Bot authored Dec 24, 2021
```
Reviewed By: zertosh

Differential Revision: D33307283

fbshipit-source-id: 55a95689b8c20b17b7c882070bc3e24706c44444
```
9397cd5b

Add Buffer class (#2044) · c6de2a19

moto authored Dec 23, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

Add Buffer class that is responsible for converting `AVFrame` to `Tensor`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after https://github.com/pytorch/audio/issues/2043.

Pull Request resolved: https://github.com/pytorch/audio/pull/2044

Reviewed By: carolineechen

Differential Revision: D32940553

Pulled By: mthrok

fbshipit-source-id: 8b8b2222ad7b47edc17e9139420e8a71c00d726a

c6de2a19

23 Dec, 2021 6 commits

Add FilterGraph class (#2043) · cd52d008

moto authored Dec 23, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

Add FilterGraph class that is responsible for handling AVFilterGraph structure and the application of filters.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after https://github.com/pytorch/audio/issues/2042.

Pull Request resolved: https://github.com/pytorch/audio/pull/2043

Reviewed By: carolineechen

Differential Revision: D32940535

Pulled By: mthrok

fbshipit-source-id: 231e3ad17df2d67b6c7b323e5c89e718a3d48d0d

cd52d008

Add Python CTC decoder API (#2089) · a76b0066

Caroline Chen authored Dec 23, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/2072 -- splitting up PR for easier review

This PR adds Python decoder API and basic README

Pull Request resolved: https://github.com/pytorch/audio/pull/2089

Reviewed By: mthrok

Differential Revision: D33299818

Pulled By: carolineechen

fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc

a76b0066

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

Fix third party archive fetch job (#2095) · 0e5913d5

moto authored Dec 23, 2021

Summary:
Follow-up of https://github.com/pytorch/audio/issues/2086

The CI job to download the third party code and cache daily has not been properly updated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2095

Reviewed By: hwangjeff

Differential Revision: D33291738

Pulled By: mthrok

fbshipit-source-id: 6fc61f76b35c6f032085eda9d6053eefd2a1e0a9

0e5913d5

Accommodate internal dependency change (#2094) · 6d3fe991

Moto Hira authored Dec 22, 2021

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2094

Reviewed By: nateanl

Differential Revision: D33288439

fbshipit-source-id: 385e0e4257755dbaf143287f612e19bede189757

6d3fe991

Introduce Conformer (#2068) · 1b17b011

hwangjeff authored Dec 22, 2021

Summary:
Adds implementation of Conformer module.

Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770.

Pull Request resolved: https://github.com/pytorch/audio/pull/2068

Reviewed By: mthrok

Differential Revision: D33236957

Pulled By: hwangjeff

fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6

1b17b011

22 Dec, 2021 2 commits

Deprecating data utils (#2073) · b18e583e

Joao Gomes authored Dec 22, 2021

Summary:
- Deprecates data utils (with a warning that they will be removed in v0.12)
- Replaces all usages of `torchaudio.datasets.utils.download_url` with `torch.hub.download_url_to_file`
- Replaces all MD5 hashes with SHA256 hashes

Addresses https://github.com/pytorch/audio/issues/1883
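
`torch.hub.download_url_to_file` accepts a `hash_prefix` argument that is checked against the SHA256 digest of the downloaded file. The standard-library sketch below shows the same kind of check on a local file; the helper names `sha256_of_file` and `verify_checksum` are hypothetical and do not appear in the commit.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_prefix):
    """Raise if the file's SHA256 does not start with expected_prefix."""
    actual = sha256_of_file(path)
    if not actual.startswith(expected_prefix):
        raise RuntimeError(f"Checksum mismatch for {path}: got {actual}")


# Demonstrate on a small temporary file instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.bin")
    with open(path, "wb") as f:
        f.write(b"example payload")
    expected = hashlib.sha256(b"example payload").hexdigest()
    verify_checksum(path, expected[:16])
    checksum_ok = True
```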

Pull Request resolved: https://github.com/pytorch/audio/pull/2073

Reviewed By: mthrok

Differential Revision: D33241756

Pulled By: jdsgomes

fbshipit-source-id: 49388ec5965bfc91d9a1d8d0786eeafb2969f6cf

b18e583e

Revert linting exemptions introduced in #2071 (#2087) · 575d221e

Joao Gomes authored Dec 22, 2021

Summary:
After discussing with Moto Hira, we decided to revert linting exemptions
introduced previously in order to keep the entire audio project as formatted
as possible, to reduce the time we spend on formatting discussion.

Pull Request resolved: https://github.com/pytorch/audio/pull/2087

Reviewed By: mthrok

Differential Revision: D33236949

Pulled By: jdsgomes

fbshipit-source-id: e13079f532c4534d8a168059b0ded6fa375fdecf

575d221e

21 Dec, 2021 3 commits

Clean up CTC decoder binding code (#2092) · 4c2edd21

Moto Hira authored Dec 21, 2021

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2092

Reviewed By: carolineechen

Differential Revision: D33169110

fbshipit-source-id: e422ad93efe50b91f1ac5d572dc82768c1000c05

4c2edd21

Update audio augmentation tutorial (#2082) · 3a03d8c0

moto authored Dec 20, 2021

Summary:
1. Reorder the audio display so that audio clips are playable from the browser in the docs
2. Add links to function documentation

https://470342-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2082

Reviewed By: carolineechen

Differential Revision: D33227725

Pulled By: mthrok

fbshipit-source-id: c7ee360b6f9b84c8e0a9b72193b98487d03b57ab

3a03d8c0

Fix load behavior for 24-bit input (#2084) · 4554d242

moto authored Dec 20, 2021

Summary:
## bug description

When 24 bits-per-sample audio is loaded via a file-like object,
the loaded Tensor is wrong. Loading the same audio from a local
file works correctly.

## The cause of the bug

The core of sox's decoding mechanism is the `sox_read` function,
one of whose parameters is the maximum number of samples to decode
from the given buffer.

https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f

The `sox_read` function is called in the callback of what is called
the `drain` effect, and this callback receives the output buffer and
its size in bytes. The previous implementation passed this size as
the `sox_read` argument for the maximum number of samples to
read. Since the buffer size in bytes is larger than the number of
samples that fit in the buffer, `sox_read` always consumed the entire
buffer. (This behavior is not wrong except when the input is
24 bits-per-sample and comes from a file-like object.)

When the input is read from a file-like object, new data are fetched
inside the drain callback via Python's `read` method and loaded into
a fixed-size memory region. The size of this memory region
can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`,
but the default value is 8096.

If the input format is 24 bits-per-sample, the end of the memory region
does not necessarily correspond to the end of a valid sample.
When `sox_read` consumes all the data in the buffer region, the partial
sample at the end introduces unexpected values.
This causes the aforementioned bug.

## Fix

Pass a proper (better estimated) maximum number of decodable samples to
`sox_read`.
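
The arithmetic behind the bug can be illustrated in a few lines of Python. This only shows why a byte-sized region can end in the middle of a 24-bit sample; it is not the actual fix, and the helper `whole_samples` is hypothetical.

```python
BYTES_PER_SAMPLE = 3  # 24 bits per sample


def whole_samples(buffer_size_bytes, bytes_per_sample=BYTES_PER_SAMPLE):
    """Return (complete samples, leftover bytes) for a byte buffer."""
    return divmod(buffer_size_bytes, bytes_per_sample)


# Using the buffer size quoted above.
num_samples, leftover = whole_samples(8096)
# 8096 bytes hold 2698 complete 24-bit samples plus 2 bytes of a partial one.
```

Treating all 8096 bytes as decodable therefore reads a partial sample at the end of the region, whereas bounding the read by the number of complete samples leaves those trailing bytes for the next fetch.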

Pull Request resolved: https://github.com/pytorch/audio/pull/2084

Reviewed By: carolineechen

Differential Revision: D33236947

Pulled By: mthrok

fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4

4554d242

20 Dec, 2021 2 commits

Standardize the location of third-party source code (#2086) · 2476dd2d

moto authored Dec 20, 2021

Summary:
Previously, sox-related third-party source code was archived at
`third_party/sox/archives`.
Recently, KenLM-related third-party source code was added, and
it is archived at `third_party/archives`.

This PR changes the sox archive location to `third_party/archives`,
so that all the archives are cached at the same location.

Pull Request resolved: https://github.com/pytorch/audio/pull/2086

Reviewed By: carolineechen

Differential Revision: D33236927

Pulled By: mthrok

fbshipit-source-id: 2f2aa5f4b386fefb46d7c98f7179c04995219f3c

2476dd2d

Update URLs for libritts (#2074) · f3f23e42

Joao Gomes authored Dec 20, 2021

Summary:
The URLs for this dataset seem to have changed, so I am updating them to the new location.

Pull Request resolved: https://github.com/pytorch/audio/pull/2074

Reviewed By: mthrok

Differential Revision: D33234996

Pulled By: jdsgomes

fbshipit-source-id: e09c35a122e8227fcce7fa97aeeeea312cb89173

f3f23e42