Commits · 9b93e7df8c53631ef964cfff118772f4f9fa17bd · OpenDAS / Torchaudio

29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903


28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA based ctc prefix beam search decoder.

Several benchmark results on a V100 are attached below:
| decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|---------|---------------------:|----------:|-----------:|------------|------------------:|-----------:|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation along the batch and vocab axes and loops over the frames axis. Compared with CPU implementations, it is therefore better suited to shorter sequences and larger vocabularies.
2. WER is the same as with CPU implementations. However, it cannot decode with an LM yet.
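
For readers unfamiliar with the algorithm, here is a minimal pure-Python sketch of CTC prefix beam search for a single utterance. It is a CPU reference written for illustration only, not the CUDA implementation from this PR, and the names `log_add` and `ctc_prefix_beam_search` are hypothetical. The comments mark the sequential loop over frames and the loop over labels, which corresponds to the vocab axis that note 1 says is computed in parallel.

```python
import math
from collections import defaultdict


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_prefix_beam_search(log_probs, beam_size, blank=0):
    """log_probs: list of frames, each a list of per-label log-probabilities.

    Returns a list of (label_sequence, log_probability), best first.
    """
    # Each prefix keeps two scores: paths ending in blank / in a non-blank label.
    beams = {(): (0.0, -math.inf)}
    for frame in log_probs:  # frames axis: inherently sequential
        next_beams = defaultdict(lambda: (-math.inf, -math.inf))
        for prefix, (p_b, p_nb) in beams.items():
            # Vocab axis: each label is handled independently of the others.
            for label, lp in enumerate(frame):
                if label == blank:
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (log_add(n_b, log_add(p_b, p_nb) + lp), n_nb)
                elif prefix and label == prefix[-1]:
                    # Repeated label without a blank in between: prefix unchanged.
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (n_b, log_add(n_nb, p_nb + lp))
                    # Repeated label after a blank: prefix grows by one label.
                    ext = prefix + (label,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, log_add(e_nb, p_b + lp))
                else:
                    ext = prefix + (label,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, log_add(e_nb, log_add(p_b, p_nb) + lp))
        ranked = sorted(next_beams.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])  # keep only the best prefixes
    return [(list(prefix), log_add(*scores)) for prefix, scores in beams.items()]


# Two frames, two labels (0 = blank, 1 = "a"), same distribution in both frames.
frames = [[math.log(0.4), math.log(0.6)]] * 2
results = ctc_prefix_beam_search(frames, beam_size=3)
print(results[0][0])
```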

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155


18 Apr, 2023 1 commit

Add multi-channel DNN beamforming training recipe (#3036) · 94f5027e

nateanl authored Apr 18, 2023

Summary:
The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement.

Pull Request resolved: https://github.com/pytorch/audio/pull/3036

Reviewed By: hwangjeff

Differential Revision: D45061841

Pulled By: nateanl

fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4


31 Mar, 2023 1 commit

Fix typo in forced alignment tutorial (#3222) · fda41bbf

Nouran Ali authored Mar 31, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3222

Reviewed By: nateanl

Differential Revision: D44539424

Pulled By: mthrok

fbshipit-source-id: 8fbcb5f9918c9930c939bcd448493fa5cf604545


29 Mar, 2023 1 commit

Remove the note about AAC (#3214) · c07a96ab

moto authored Mar 29, 2023

Summary:
A part of the StreamWriter tutorial warns about corrupted AAC audio output, but this is no longer relevant, so this commit deletes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/3214

Reviewed By: nateanl

Differential Revision: D44504030

Pulled By: mthrok

fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef


28 Mar, 2023 1 commit

Fix typo in audio resampling tutorial (#3212) · 0cd4e391

nateanl authored Mar 28, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3211

Pull Request resolved: https://github.com/pytorch/audio/pull/3212

Reviewed By: mthrok

Differential Revision: D44472523

Pulled By: nateanl

fbshipit-source-id: eb519b0045e7518ad13863a53271745a80d89a21


16 Mar, 2023 1 commit

Fix initialization of `get_trellis`. (#3172) · a6b34a5d

jiyuntu-eero authored Mar 16, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3166. In the `get_trellis` method, the index of the blank symbol is assumed to be 0 by default. It should be `blank_id` instead.
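
To illustrate why the blank index matters, here is a simplified pure-Python sketch of a forced-alignment trellis. It is not the tutorial's tensor-based code: the list-of-lists layout and the extra initial row are assumptions made for readability. The point it shows is that the "stay" transition must read the blank score at `blank_id`, not at a hard-coded index 0.

```python
import math


def get_trellis(emission, tokens, blank_id=0):
    """emission: list of frames, each a list of per-label log-probabilities.
    tokens: label indices of the transcript.

    trellis[t][j] is the best score after t frames with j tokens emitted.
    """
    num_frames, num_tokens = len(emission), len(tokens)
    trellis = [[-math.inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        for j in range(num_tokens + 1):
            # Stay on the same token count by emitting the blank symbol.
            stay = trellis[t][j] + emission[t][blank_id]
            # Advance by emitting the next transcript token.
            advance = -math.inf
            if j > 0:
                advance = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, advance)
    return trellis


# Three labels; in this made-up vocabulary the blank is label 2, not label 0.
emission = [[-1.0, -5.0, -0.5], [-0.2, -5.0, -3.0]]
correct = get_trellis(emission, tokens=[0], blank_id=2)[-1][-1]
wrong = get_trellis(emission, tokens=[0], blank_id=0)[-1][-1]
print(correct, wrong)
```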

Pull Request resolved: https://github.com/pytorch/audio/pull/3172

Reviewed By: mthrok

Differential Revision: D44090889

Pulled By: nateanl

fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264


07 Mar, 2023 1 commit

Fix Adam and AdamW initializers in wav2letter example (#3145) · cea12eaf

Maciej Torhan authored Mar 06, 2023

Summary:
The wav2letter example passes `momentum` to the `Adam` and `AdamW` initializers, which is not a valid parameter for them. To fix that, we need to add `beta_1` and `beta_2` to the arguments and use them in place of `momentum`. I also added `eps`, similar to the `Adadelta` initializer.
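
A minimal sketch of the idea, using only `argparse`. The flag names and defaults below are assumptions for illustration and may differ from the ones in the example script. The resulting dictionary matches the keyword arguments that `torch.optim.Adam` and `torch.optim.AdamW` accept (`lr`, `betas`, `eps`), neither of which takes `momentum`.

```python
import argparse


def build_parser():
    # Hypothetical flags; the real script may name them differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--beta-1", type=float, default=0.9)
    parser.add_argument("--beta-2", type=float, default=0.999)
    parser.add_argument("--eps", type=float, default=1e-8)
    return parser


def adam_kwargs(args):
    """Collect the keyword arguments for Adam/AdamW from parsed flags."""
    return {
        "lr": args.learning_rate,
        "betas": (args.beta_1, args.beta_2),  # the two betas go in one tuple
        "eps": args.eps,
    }


args = build_parser().parse_args(["--beta-1", "0.8"])
kwargs = adam_kwargs(args)
print(kwargs)
# With PyTorch installed: torch.optim.Adam(model.parameters(), **kwargs)
```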

Pull Request resolved: https://github.com/pytorch/audio/pull/3145

Reviewed By: mthrok

Differential Revision: D43847713

Pulled By: nateanl

fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6


02 Mar, 2023 1 commit

Fix doc build (#3125) · 1ed38095

moto authored Mar 01, 2023

Summary:
Fix build_doc job

https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8

- build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL.
- Fix bash cell syntax in HW tutorial
- Fix C++ doc
- Fix duplicated target name in streamwriter tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/3125

Reviewed By: xiaohui-zhang

Differential Revision: D43724078

Pulled By: mthrok

fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c


24 Feb, 2023 2 commits

Add Wav2Vec2DataModule in self_supervised_learning training recipe (#3081) · fd778091

Vladislav Agafonov authored Feb 24, 2023

Summary:
Add `Wav2Vec2DataModule` in self_supervised_learning training recipe to support Wav2Vec2 pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/3081

Reviewed By: mthrok

Differential Revision: D43579239

Pulled By: nateanl

fbshipit-source-id: 3e935eb9a18ef0259a58940ae466cbdc3baf8494


Add wav2vec2 loss function in self_supervised_learning training recipe (#3090) · c532f35c

Vladislav Agafonov authored Feb 24, 2023

Summary:
Add wav2vec2 loss function in the self_supervised_learning training recipe to support Wav2Vec2 pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/3090

Reviewed By: mthrok

Differential Revision: D43579220

Pulled By: nateanl

fbshipit-source-id: 4b52792b518ddc5b01c9660c90ceb3c4ad1f0237


23 Feb, 2023 1 commit

Add TCPGen context-biasing Conformer RNN-T (#2890) · 1ed330b5

G. Sun authored Feb 23, 2023

Summary:
This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing.

An example for Librispeech can be found in audio/examples/asr/librispeech_biasing.

Maintainer's note (mthrok):
It seems that TrieNode should be better typed as tuple, but changing the implementation from list to tuple
could cause some issue without running the code, so the code is not changed, though the annotation uses tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2890

Reviewed By: nateanl

Differential Revision: D43171447

Pulled By: mthrok

fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e


16 Feb, 2023 2 commits

Fix DDP training in HuBERT recipes (#3068) · 2c9b3e59

Zhaoheng Ni authored Feb 16, 2023

Summary:
The `BucketizeBatchSampler` may return a different iter_list on different nodes if `shuffle` is `True`, which causes DDP training to hang forever.
`shuffle` in `DistributedSampler` only happens at initialization, which means the same subset is assigned to each replica in all training epochs. The PR fixes both issues.
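
The two requirements can be sketched in plain Python: every rank must derive the same shuffled order, and that order should change per epoch. The helpers below are hypothetical and are not the recipe's code. They seed the shuffle from the epoch alone, which is the same idea behind `DistributedSampler.set_epoch` in PyTorch.

```python
import random


def epoch_order(batches, epoch, seed=0):
    """Shuffle with a seed that depends only on the epoch, never on the rank."""
    order = list(batches)
    random.Random(seed + epoch).shuffle(order)
    return order


def rank_shard(batches, epoch, rank, world_size, seed=0):
    """Return the batches that one rank processes in the given epoch."""
    order = epoch_order(batches, epoch, seed)
    # Drop the remainder so all ranks run the same number of steps;
    # unequal step counts are one way collective operations end up hanging.
    usable = len(order) - len(order) % world_size
    return order[rank:usable:world_size]


batches = list(range(10))
shards = [rank_shard(batches, epoch=0, rank=r, world_size=4) for r in range(4)]
print(shards)
```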

cc arlofaria

Pull Request resolved: https://github.com/pytorch/audio/pull/3068

Reviewed By: mthrok

Differential Revision: D43372110

Pulled By: nateanl

fbshipit-source-id: a162728406ae995e05d2a07cfc2444fb76cf345e


Update WER results for CTC n-gram decoding (#3070) · 11bdafc3

Zhaoheng Ni authored Feb 16, 2023

Summary:
In https://github.com/pytorch/audio/issues/2873, layer normalization is applied to waveforms for SSL models trained on large scale datasets. The word error rate is significantly reduced after the change. The PR updates the results for the affected models.

Without the change in https://github.com/pytorch/audio/issues/2873, here is the WER result table:
|                                                                                            Model | dev-clean | dev-other | test-clean | test-other |
|:------------------------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) |        10.59|        15.62|        9.58|        16.33|
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) |        2.80|        6.01|        2.82|        6.34|
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) |        2.36|        4.43|        2.41|        4.96|
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) |        1.85|        3.46|        2.09|        3.89|
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) |         2.21|        3.40|        2.26|        4.05|

After applying layer normalization, here is the updated result:
|                                                                                            Model | dev-clean | dev-other | test-clean | test-other |
|:------------------------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) |        6.77|        10.03|        6.87|        10.51|
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) |        2.19|        4.55|        2.32|        4.64|
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) |        1.78|        3.51|        2.03|        3.68|
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) |        1.77|        3.32|        2.03|        3.68|
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) |         1.73|        2.72|        1.90|        3.16|
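
As a rough illustration of the waveform normalization mentioned above: layer normalization over a whole waveform, without learned scale or bias, amounts to shifting it to zero mean and scaling it to unit variance. The pure-Python helper below is a hypothetical sketch of that arithmetic; the `eps` value is an assumption and this is not the torchaudio code path.

```python
import math


def normalize_waveform(samples, eps=1e-5):
    """Zero-mean, unit-variance normalization over all samples."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


normalized = normalize_waveform([0.1, 0.4, -0.2, 0.3])
print(normalized)
```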

Pull Request resolved: https://github.com/pytorch/audio/pull/3070

Reviewed By: mthrok

Differential Revision: D43365313

Pulled By: nateanl

fbshipit-source-id: 34a60ad2e5eb1299da64ef88ff0208ec8ec76e91


15 Feb, 2023 1 commit

Update data augmentation tutorial to use new operators (#3062) · b9ef69d1

hwangjeff authored Feb 15, 2023

Summary:
Updates tutorial "Audio Data Augmentation" to use two of the newly introduced data augmentation operators in beta: `torchaudio.functional.fftconvolve` and `torchaudio.functional.add_noise`.
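
For intuition about what adding noise at a given signal-to-noise ratio involves, here is a pure-Python sketch of the arithmetic. It is a hypothetical helper, not the `torchaudio.functional.add_noise` implementation: the noise is scaled so that the ratio of signal power to scaled-noise power equals the requested SNR in decibels.

```python
import math


def add_noise_at_snr(signal, noise, snr_db):
    """Mix equal-length signal and noise at the requested SNR (in dB)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Solve p_signal / (scale**2 * p_noise) == 10 ** (snr_db / 10) for scale.
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = add_noise_at_snr(signal, noise, snr_db=10)
print(noisy)
```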

Pull Request resolved: https://github.com/pytorch/audio/pull/3062

Reviewed By: mthrok

Differential Revision: D43298120

Pulled By: hwangjeff

fbshipit-source-id: 09ca736a5c67242568515d600b7d31eab32c2df1


14 Feb, 2023 1 commit

Update ssl example (#3060) · ff01be0f

Zhaoheng Ni authored Feb 14, 2023

Summary:
- Rename the current `ssl` example to `self_supervised_learning`
- Add README to demonstrate how to run the recipe with hubert task

Pull Request resolved: https://github.com/pytorch/audio/pull/3060

Reviewed By: mthrok

Differential Revision: D43287868

Pulled By: nateanl

fbshipit-source-id: 10352682485ef147ca32f4c4c9f9cde995444aa0


30 Jan, 2023 1 commit

Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627

Yan Li authored Jan 30, 2023

Summary:
Currently there will be a few errors when this tutorial is run with a CUDA device.

The reasons are:
- The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
- When performing further analysis and displaying of the output audio, we need to move them back from the GPU to the CPU. This is because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).

Pull Request resolved: https://github.com/pytorch/audio/pull/3017

Reviewed By: mthrok

Differential Revision: D42828526

Pulled By: nateanl

fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762


19 Jan, 2023 2 commits

Add modularized SSL training recipe (#2876) · 2eaefe27

Zhaoheng Ni authored Jan 19, 2023

Summary:
TorchAudio currently has one training recipe for HuBERT + LibriSpeech pre-training. It may not be a good fit when users want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). The PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, lr scheduler, and datamodule for training an SSL model.

Pull Request resolved: https://github.com/pytorch/audio/pull/2876

Reviewed By: hwangjeff

Differential Revision: D42617414

Pulled By: nateanl

fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792


Simplify train step in Conformer RNN-T LibriSpeech recipe (#2981) · c6a52355

hwangjeff authored Jan 19, 2023

Summary:
In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead.

Pull Request resolved: https://github.com/pytorch/audio/pull/2981

Reviewed By: mthrok

Differential Revision: D42507228

Pulled By: hwangjeff

fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309


17 Jan, 2023 1 commit

Fix mel spectrogram visualization in TTS tutorial (#2989) · b983c665

Zhaoheng Ni authored Jan 16, 2023

Summary:
The mel spectrograms in the TTS tutorial are upside down. The PR fixes it by using `origin="lower"` in imshow.

Pull Request resolved: https://github.com/pytorch/audio/pull/2989

Reviewed By: mthrok

Differential Revision: D42538349

Pulled By: nateanl

fbshipit-source-id: 4388103a49bdfabf1705c1f979d44ecedd5c910a


16 Jan, 2023 1 commit

Fixes examples/source_separation for WSJ0_2mix dataset (#2987) · f9d38796

Robin Scheibler authored Jan 16, 2023

Summary:
The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following.

1. Use `wsj0mix` consistently as keyword indicating the WSJ0_2mix dataset
2. Corrects `args.data_dir` to `args.root_dir` in eval.py
3. Modify the parameters of `pytorch_lightning.Trainer` according to latest version (use `accelerator="gpu"` and `devices=args.num_devices`, instead of just `gpus=args.num_devices`)

Pull Request resolved: https://github.com/pytorch/audio/pull/2987

Reviewed By: xiaohui-zhang

Differential Revision: D42536992

Pulled By: nateanl

fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76


13 Jan, 2023 1 commit

Add mel spectrogram visualization to Streaming ASR tutorial (#2974) · 55575a53

moto authored Jan 12, 2023

Summary:
Per the suggestion by nateanl, this adds a visualization of the features fed to ASR.

<img width="688" alt="Screen Shot 2023-01-12 at 8 19 59 PM" src="https://user-images.githubusercontent.com/855818/212215190-23be7553-4c04-40d9-944e-3ee2ff69c49b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2974

Reviewed By: nateanl

Differential Revision: D42484088

Pulled By: mthrok

fbshipit-source-id: 2c839492869416554eac04aa06cd12078db21bd7


30 Dec, 2022 1 commit

Add subtractive synthesis tutorial (#2934) · 9f57951a

moto authored Dec 29, 2022

Summary:
Artifact: [subtractive_synthesis_tutorial](https://output.circle-artifacts.com/output/job/4c1ce33f-834d-48e0-ba89-2e91acdcb572/artifacts/0/docs/tutorials/subtractive_synthesis_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2934

Reviewed By: carolineechen

Differential Revision: D42284945

Pulled By: mthrok

fbshipit-source-id: d255b8e8e2a601a19bc879f9e1c38edbeebaf9b3


17 Dec, 2022 1 commit

Add filter design tutorial (#2894) · 9c4f71a6

moto authored Dec 16, 2022

Summary:
Adds filter design tutorial, which demonstrates `sinc_impulse_response` and `frequency_impulse_response`.

Example:
 - [filter_design_tutorial](https://output.circle-artifacts.com/output/job/bd22c615-9215-4b17-a52c-b171a47f646c/artifacts/0/docs/tutorials/filter_design_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2894

Reviewed By: xiaohui-zhang

Differential Revision: D42117658

Pulled By: mthrok

fbshipit-source-id: f7dd04980e8557bb6f0e0ec26ac2c7f53314ea16


16 Dec, 2022 1 commit

Rename resampling_method options (#2922) · e6bebe6a

Caroline Chen authored Dec 16, 2022

Summary:
resolves https://github.com/pytorch/audio/issues/2891

Rename `resampling_method` options to describe more accurately what is happening. Previously the methods were named `sinc_interpolation` and `kaiser_window`, which can be confusing because both options use sinc interpolation and differ only in the window function. As a result, `sinc_interpolation` is renamed to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option throws a warning, and those options will be deprecated in 2 releases. The numerical behavior is unchanged.
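
A minimal sketch of how such a rename with a deprecation warning can be handled, using only the standard library. The helper `canonical_resampling_method` is hypothetical and is not the torchaudio implementation; only the old and new option names come from the description above.

```python
import warnings

# Old option name -> new option name, as described in this commit.
_RENAMED = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def canonical_resampling_method(name):
    """Map an old option to its new name, warning the caller when it is old."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if name not in _RENAMED.values():
        raise ValueError(f"Unknown resampling method: {name!r}")
    return name


print(canonical_resampling_method("sinc_interp_hann"))
```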

Pull Request resolved: https://github.com/pytorch/audio/pull/2922

Reviewed By: mthrok

Differential Revision: D42083619

Pulled By: carolineechen

fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70


29 Nov, 2022 1 commit

Add additive synthesis tutorial (#2877) · 1a003c3f

moto authored Nov 29, 2022

Summary:
This commit adds the tutorial for additive synthesis, using torchaudio's prototype DSP ops.

[Review here](https://output.circle-artifacts.com/output/job/3dc83322-832a-4272-9c13-df752c97b660/artifacts/0/docs/tutorials/additive_synthesis_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2877

Reviewed By: carolineechen

Differential Revision: D41585425

Pulled By: mthrok

fbshipit-source-id: b81283b90e4779c8054fd030a1d8c3d39d676bbd


28 Nov, 2022 1 commit

Add oscillator tutorial (#2862) · 52e89756

moto authored Nov 28, 2022

Summary:
This commit adds a tutorial for oscillator_bank and adsr_envelope, which will be a basis for DDSP.

 - [Review here](https://output.circle-artifacts.com/output/job/cf1d3001-88e5-418b-8cf8-ae22b4445dba/artifacts/0/docs/tutorials/oscillator_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2862

Reviewed By: carolineechen

Differential Revision: D41559503

Pulled By: mthrok

fbshipit-source-id: 3f1689186db7d246de14f228fc2f91bf37db98cd


17 Nov, 2022 1 commit

fix import bug in global_stats.py (#2858) · d912dcd7

vasiliy authored Nov 17, 2022

Summary:
This code was added by
https://github.com/pytorch/audio/commit/4d0095a528412cfec2a549204fc01d9ebb15df7a

Seems that the original code had a typo?

Pull Request resolved: https://github.com/pytorch/audio/pull/2858

Test Plan:
```
// the import of `mustc` now succeeds, previously crashed
python examples/asr/emformer_rnnt/global_stats.py --model-type librispeech --dataset-path /home/vasiliy/local/librispeech/
```

Reviewed By: carolineechen

Differential Revision: D41355663

Pulled By: nateanl

fbshipit-source-id: 92507e529d41b984b9dd400ad24a55d130372b7d


16 Nov, 2022 2 commits

Enable mixed precision training for hubert_pretrain_model (#2854) · e062110b

Zhaoheng Ni authored Nov 16, 2022

Summary:
address https://github.com/pytorch/audio/issues/2847

In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR addresses the issue by changing the dtype of `mask_embedding` to `x` to enable mixed precision training.

Pull Request resolved: https://github.com/pytorch/audio/pull/2854

Reviewed By: carolineechen

Differential Revision: D41343486

Pulled By: nateanl

fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705


Fix hubert fine-tuning recipe (#2851) · 40ff642e

Zhaoheng Ni authored Nov 16, 2022

Summary:
- `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653, so absolute paths became relative paths. This PR fixes the usage in the hubert fine-tuning recipe to get the correct audio paths.
- Model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`.
- The input dimension of the CTC linear layer varies with the model architecture, so it is now updated in the lightning module.

cc simpleoier

Pull Request resolved: https://github.com/pytorch/audio/pull/2851

Reviewed By: carolineechen

Differential Revision: D41327998

Pulled By: nateanl

fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d


17 Oct, 2022 1 commit

Update resampling tutorial (#2773) · 8f187354

moto authored Oct 17, 2022

Summary:
* Refactor benchmark script
* Rename `time` variable to avoid (potential) conflicting with time module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on result at the end
* Add link to an explanation of aliasing
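
A small sketch of the `timeit`-based approach, assuming a hypothetical `benchmark` helper and a stand-in workload rather than the tutorial's resampling calls. The result is stored in `elapsed` rather than `time`, which avoids shadowing the `time` module.

```python
import timeit


def benchmark(fn, number=100, repeat=3):
    """Return the best average time per call, in seconds."""
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    # Taking the minimum reduces the influence of unrelated system noise.
    return min(timings) / number


elapsed = benchmark(lambda: sum(range(1000)))
print(f"{elapsed * 1e6:.2f} us per call")
```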

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773

Reviewed By: carolineechen

Differential Revision: D40421337

Pulled By: mthrok

fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a


14 Oct, 2022 2 commits

Fix leaking matplotlib figure (#2771) · 5239583e

moto authored Oct 14, 2022

Summary:
In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure is deliberately left unshown in the resulting tutorial by using the ``sphinx_gallery_defer_figures`` command.

It turned out that the figure is then shown in the next code block executed by Sphinx Gallery, so it ends up in a totally unrelated place: https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html

<img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png">

This commit fixes it by closing the figure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2771

Reviewed By: nateanl

Differential Revision: D40382076

Pulled By: mthrok

fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a


Fix fading in hybrid demucs tutorial (#2769) · 000d7526

nateanl authored Oct 13, 2022

Summary:
The separation is applied to chunks of audio to avoid OOM. The combination of consecutive chunks is described in the graph:

![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png)

In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded.
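
The chunk combination can be sketched in plain Python. This is a simplified illustration with hypothetical names and a linear fade, not the tutorial's code: each chunk is faded in where it overlaps the previous chunk and faded out where it overlaps the next one, except that the first chunk gets no fade-in and the last chunk gets no fade-out.

```python
def overlap_add(signal, chunk_len, overlap, process=lambda chunk: chunk):
    """Process a signal chunk by chunk and cross-fade the overlapping parts."""
    if chunk_len <= 0 or not 0 <= overlap <= chunk_len // 2:
        raise ValueError("need chunk_len > 0 and 0 <= overlap <= chunk_len // 2")
    hop = chunk_len - overlap
    # Fade-in weights; the matching fade-out weights are 1 - ramp[i].
    ramp = [(i + 1) / (overlap + 1) for i in range(overlap)]
    out = [0.0] * len(signal)
    start = 0
    while True:
        end = min(start + chunk_len, len(signal))
        chunk = list(process(signal[start:end]))
        is_first = start == 0
        is_last = end == len(signal)
        if not is_first:
            for i in range(min(overlap, len(chunk))):
                chunk[i] *= ramp[i]
        if not is_last:  # the last chunk has no future chunk to blend with
            for i in range(overlap):
                chunk[-overlap + i] *= 1.0 - ramp[i]
        for i, value in enumerate(chunk):
            out[start + i] += value
        if is_last:
            break
        start += hop
    return out


# With an identity "model", the combined output should reproduce the input.
signal = [float(i) for i in range(23)]
combined = overlap_add(signal, chunk_len=8, overlap=2)
print(combined[-3:])
```

If the last chunk were faded out as well, its final samples would be attenuated with nothing to compensate, which is the problem this commit addresses.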

Pull Request resolved: https://github.com/pytorch/audio/pull/2769

Reviewed By: carolineechen

Differential Revision: D40358382

Pulled By: nateanl

fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e


13 Oct, 2022 2 commits

Add custom lm example to decoder tutorial (#2762) · 3a5a83d9

Caroline Chen authored Oct 13, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762

Reviewed By: mthrok

Differential Revision: D40332603

Pulled By: carolineechen

fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251


Update tutorial author information (#2764) · fb82ac0b

moto authored Oct 13, 2022

Summary:
Adding and updating author information.

Pull Request resolved: https://github.com/pytorch/audio/pull/2764

Reviewed By: carolineechen

Differential Revision: D40332427

Pulled By: mthrok

fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a


12 Oct, 2022 2 commits

Fix typos in tacotron2 tutorial (#2761) · 7aabcbd4

Nikita Shulga authored Oct 12, 2022

Summary:
`publishe`->`published`

Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`

Pull Request resolved: https://github.com/pytorch/audio/pull/2761

Reviewed By: carolineechen

Differential Revision: D40313042

Pulled By: malfet

fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b


Improve hubert recipe for pre-training and fine-tuning (#2744) · 27433050

Zhaoheng Ni authored Oct 12, 2022

Summary:
Follow-up to https://github.com/pytorch/audio/issues/2716
- For preprocessing
  - The HuBERT features take a lot of memory, which may not fit on some machines. Allow using a subset of the features for training a k-means model.

- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.

- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.

- Update the WER results on LibriSpeech dev and test sets.

|                   | WER% (Viterbi)|  WER% (KenLM) |
|:-----------------:|--------------:|--------------:|
| dev-clean         |       10.9    |       4.2     |
| dev-other         |       17.5    |       9.4     |
| test-clean        |       10.9    |       4.4     |
| test-other        |       17.8    |       9.5     |
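
To show why normalizing by the total number of masked frames across all GPUs differs from averaging per-GPU means, here is a small arithmetic sketch. The function names are hypothetical, and the plain `sum` calls stand in for a cross-GPU all-reduce; this is not the recipe's code.

```python
def globally_normalized_loss(loss_sums, masked_frames):
    """Total loss divided by the total masked-frame count over all ranks."""
    # In distributed training both sums would come from an all-reduce.
    return sum(loss_sums) / sum(masked_frames)


def mean_of_rank_means(loss_sums, masked_frames):
    """Per-rank normalization followed by averaging over ranks."""
    means = [l / n for l, n in zip(loss_sums, masked_frames)]
    return sum(means) / len(means)


# Two ranks with very different numbers of masked frames.
loss_sums = [30.0, 10.0]
masked_frames = [30, 5]
print(globally_normalized_loss(loss_sums, masked_frames))
print(mean_of_rank_means(loss_sums, masked_frames))
```

With per-rank averaging, the rank holding only 5 masked frames contributes as much as the rank holding 30, so each of its frames is weighted more heavily; the global normalization weights every masked frame equally.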

Pull Request resolved: https://github.com/pytorch/audio/pull/2744

Reviewed By: carolineechen

Differential Revision: D40282322

Pulled By: nateanl

fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90


07 Oct, 2022 1 commit

Fix sphinx gallery list in io doc (#2736) · 1a18c41d

moto authored Oct 07, 2022

Summary:
Specifying multiple objects in the `:minigallery:` directive shows duplicated tutorials.

This commit fixes it by listing tutorials based on module used.

https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html

Before:
<img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png">

After:

<img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2736

Reviewed By: carolineechen

Differential Revision: D40160247

Pulled By: carolineechen

fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477


06 Oct, 2022 1 commit

Add StreamWriter tutorial (#2698) · 0c5a8bf7

moto authored Oct 06, 2022

Summary:
Add a tutorial for basic usage of torchaudio.io.StreamWriter.

https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2698

Reviewed By: carolineechen

Differential Revision: D40133007

Pulled By: carolineechen

fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623


05 Oct, 2022 1 commit

Tweak tutorials (#2733) · b076abd1

moto authored Oct 04, 2022

Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733

Reviewed By: hwangjeff

Differential Revision: D40086902

Pulled By: hwangjeff

fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8
