Commits · df6556049da332d978799a7f67d60c4f21484133 · OpenDAS / Torchaudio

18 Jul, 2023 1 commit

Extract NVDEC tutorial from the current notebook (#3478) · 63244623

moto authored Jul 17, 2023

Summary:
Now that GPU video decoders are available in doc CI, we run the tutorials with GPU decoders.

Pull Request resolved: https://github.com/pytorch/audio/pull/3478

Differential Revision: D47519672

Pulled By: mthrok

fbshipit-source-id: 2f95243100e9c75e17c2b4d306da164f0e31f8f2

63244623

15 Jul, 2023 1 commit

Update notes on FFmpeg version (#3480) · 5a809aa0

moto authored Jul 15, 2023

Summary:
The nightly builds support FFmpeg version 4, 5 and 6.

Pull Request resolved: https://github.com/pytorch/audio/pull/3480

Differential Revision: D47482841

Pulled By: mthrok

fbshipit-source-id: 88267f5e83ddc7b1e866b35e57a87b985e2c78c9

5a809aa0

05 Jul, 2023 1 commit

Update forced_align method to only support batch Tensors (#3433) · cc164478

Zhaoheng Ni authored Jul 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3433

Current design of forced_align accept 2D Tensor for `log_probs` and 1D Tensor for `targets`. To make the API simple, the PR make changes to only support batch Tensors (3D Tensor for `log_probs` and 2D Tensor for `targets`).

Reviewed By: mthrok

Differential Revision: D46657526

fbshipit-source-id: af17ec3f92f1a2c46dba91c6db2488a11de36f89

cc164478

28 Jun, 2023 1 commit

Follow up on tutorial update (#3449) · 4a121aa5

moto authored Jun 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3449

Differential Revision: D47094402

Pulled By: mthrok

fbshipit-source-id: 43e6994604f0e6c06a5f19c5e8599e2ce12ae622

4a121aa5

26 Jun, 2023 1 commit

Add more explanation about `n_fft` (#3442) · 105b77fe

moto authored Jun 26, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3442

Differential Revision: D46797481

Pulled By: mthrok

fbshipit-source-id: 3513037cbb8f2edb70fdab0fec5c7c554a697abe

105b77fe

21 Jun, 2023 1 commit

Split the CTC forced aligment API tutorial into two tutorials (#3443) · 627c37a9

Xiaohui Zhang authored Jun 20, 2023

Summary:
Splitting the multilingual example part into another tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3443

Reviewed By: mthrok

Differential Revision: D46802844

Pulled By: xiaohui-zhang

fbshipit-source-id: a7093053cac8b79d650d4f665db7fde2d8254998

627c37a9

15 Jun, 2023 1 commit

Update forced alignment tutorial (#3440) · 18601691

moto authored Jun 15, 2023

Summary:
* Fix backtrack visualization (the cooridnate was off-by-one.)
* Add note about the simplification and the new align API
* Explicitly handle SOS and EOS

Pull Request resolved: https://github.com/pytorch/audio/pull/3440

Reviewed By: xiaohui-zhang

Differential Revision: D46761282

Pulled By: mthrok

fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8

18601691

07 Jun, 2023 1 commit

Fix style to prep #3414 (#3415) · 47716772

moto authored Jun 07, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3415

Differential Revision: D46526437

Pulled By: mthrok

fbshipit-source-id: f78d19c19d7e68f67712412de35d9ed50f47263b

47716772

02 Jun, 2023 2 commits

[BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5

moto authored Jun 02, 2023

Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio.

Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them.

See some of the discussion https://github.com/pytorch/audio/issues/1269

Pull Request resolved: https://github.com/pytorch/audio/pull/3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e

5bbbb1d5

Update data augmentation tutorial (#3375) · 2ba36b47

moto authored Jun 02, 2023

Summary:
Replace sox_effects with `torchaudio.io.AudioEffector`

1. To show case the new and better feature
2. To prepare for the upcoming removal of file-like support object

Pull Request resolved: https://github.com/pytorch/audio/pull/3375

Reviewed By: nateanl

Differential Revision: D46379016

Pulled By: mthrok

fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315

2ba36b47

31 May, 2023 1 commit

Fixes to #3295 Improve RNN-T streaming decoding (#3379) · b8016e44

Jeff Hwang authored May 30, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3379

Fixes `RNNTBeamSearch.infer`'s docstring and removes unused import from tutorial.

Reviewed By: mthrok

Differential Revision: D46227174

fbshipit-source-id: 7c1c3f05a6476cb0437622dea6f3ae6cb3ea9468

b8016e44

26 May, 2023 2 commits

Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9

atalman authored May 26, 2023

Summary:
This reverts commit d38a7854.

This is temporary revert to unblock unit test migration from circleci to github

Pull Request resolved: https://github.com/pytorch/audio/pull/3377

Reviewed By: mthrok

Differential Revision: D46230498

Pulled By: atalman

fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d

37779ef9

Improve RNN-T streaming decoding (#3295) · 9fc0dcaa

Lakshmi Krishnan authored May 26, 2023

Summary:
This commit fixes the following issues affecting streaming decoding quality
1. The `init_b` hypothesis is only regenerated from blank token if no initial hypotheses are provided.
2. Allows the decoder to receive top-K hypothesis to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length.

This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295

Reviewed By: nateanl

Differential Revision: D46216113

Pulled By: hwangjeff

fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0

9fc0dcaa

23 May, 2023 1 commit

[audio] add CTC forced alignment API tutorial to torchaudio (#3356) · f046f7e3

Xiaohui Zhang authored May 22, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3356

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: mthrok

Differential Revision: D46060238

fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec

f046f7e3

21 May, 2023 2 commits

Revert D45960556: add CTC forced alignment API tutorial to torchaudio · f9b4f74f

Moto Hira authored May 20, 2023

Differential Revision:
D45960556

Original commit changeset: 93f2271f7130

Original Phabricator Diff: D45960556

fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a

f9b4f74f

add CTC forced alignment API tutorial to torchaudio (#3351) · 93adc3e4

Xiaohui Zhang authored May 20, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3351

move the forced aligner tutorial to torchaudio, with some formatting changes

Reviewed By: vineelpratap, nateanl

Differential Revision: D45960556

fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa

93adc3e4

16 May, 2023 1 commit

Upgrade to FFmpeg5 (#3298) · d38a7854

moto authored May 16, 2023

Summary:
This commit upgrade the version of FFmpeg compiled against TorchAudio binary distribution to 5.0.4.

FFmpeg 5.0 was released in Jan 2022, and many package managers provide a version of FFmpeg v5.
Conda-forge lists 5.1 for all the platforms TorchAudio supports.https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298

Reviewed By: hwangjeff

Differential Revision: D45865599

Pulled By: mthrok

fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b

d38a7854

10 May, 2023 2 commits

Add AudioEffector tutorial (#3226) · 2ab49e5b

moto authored May 09, 2023

Summary:
https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3226

Reviewed By: nateanl

Differential Revision: D45402724

Pulled By: mthrok

fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262

2ab49e5b

Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit is preparation for landing dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making FFmpeg backend default causes some issues on tutorials, so this commit disable it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 is landed to accommodate the change.

Since it is necessary to mention the changes related to migration in the IO tutorial,
I also update the IO documentation to include migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133

667c6a9e

05 May, 2023 1 commit

Update squim tutorial (#3313) · 05ef7dc6

Zhaoheng Ni authored May 05, 2023

Summary:
Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.

Pull Request resolved: https://github.com/pytorch/audio/pull/3313

Reviewed By: hwangjeff

Differential Revision: D45620311

Pulled By: nateanl

fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557

05ef7dc6

29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903

9b93e7df

31 Mar, 2023 1 commit

Fix typo in forced alignment tutorial (#3222) · fda41bbf

Nouran Ali authored Mar 31, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3222

Reviewed By: nateanl

Differential Revision: D44539424

Pulled By: mthrok

fbshipit-source-id: 8fbcb5f9918c9930c939bcd448493fa5cf604545

fda41bbf

29 Mar, 2023 1 commit

Remove the note about AAC (#3214) · c07a96ab

moto authored Mar 29, 2023

Summary:
There is a part of StreamWriter tutorial that warns about corrupted AAC audio output, but this is no longer relevant thus this commit deletes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/3214

Reviewed By: nateanl

Differential Revision: D44504030

Pulled By: mthrok

fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef

c07a96ab

28 Mar, 2023 1 commit

Fix typo in audio resampling tutorial (#3212) · 0cd4e391

nateanl authored Mar 28, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3211

Pull Request resolved: https://github.com/pytorch/audio/pull/3212

Reviewed By: mthrok

Differential Revision: D44472523

Pulled By: nateanl

fbshipit-source-id: eb519b0045e7518ad13863a53271745a80d89a21

0cd4e391

16 Mar, 2023 1 commit

Fix initialization of `get_trellis`. (#3172) · a6b34a5d

jiyuntu-eero authored Mar 16, 2023

Summary:
Fix https://github.com/pytorch/audio/issues/3166. In `get_trellis` method, the index of blank symbol is regarded as 0 by default. It should be changed to `blank_id`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3172

Reviewed By: mthrok

Differential Revision: D44090889

Pulled By: nateanl

fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264

a6b34a5d

02 Mar, 2023 1 commit

Fix doc build (#3125) · 1ed38095

moto authored Mar 01, 2023

Summary:
Fix build_doc job

https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8

- build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL.
- Fix bash cell syntax in HW tutorial
- Fix C++ doc
- Fix duplicated target name in streamwriter tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/3125

Reviewed By: xiaohui-zhang

Differential Revision: D43724078

Pulled By: mthrok

fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c

1ed38095

15 Feb, 2023 1 commit

Update data augmentation tutorial to use new operators (#3062) · b9ef69d1

hwangjeff authored Feb 15, 2023

Summary:
Updates tutorial "Audio Data Augmentation" to use two of the newly introduced data augmentation operators in beta: `torchaudio.functional.fftconvolve` and `torchaudio.functional.add_noise`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3062

Reviewed By: mthrok

Differential Revision: D43298120

Pulled By: hwangjeff

fbshipit-source-id: 09ca736a5c67242568515d600b7d31eab32c2df1

b9ef69d1

30 Jan, 2023 1 commit

Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627

Yan Li authored Jan 30, 2023

Summary:
Currently there will be a few errors when this tutorial is run with a CUDA device.

The reasons being:
- The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
- When performing further analysis and displaying of the output audio, we need to move them back from the GPU to the CPU. This is because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).

Pull Request resolved: https://github.com/pytorch/audio/pull/3017

Reviewed By: mthrok

Differential Revision: D42828526

Pulled By: nateanl

fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762

da9d1627

17 Jan, 2023 1 commit

Fix mel spectrogram visualization in TTS tutorial (#2989) · b983c665

Zhaoheng Ni authored Jan 16, 2023

Summary:
The mel spectrograms in the TTS tutorial are upside down. The PR fixes it by using `origin="lower"` in imshow.

Pull Request resolved: https://github.com/pytorch/audio/pull/2989

Reviewed By: mthrok

Differential Revision: D42538349

Pulled By: nateanl

fbshipit-source-id: 4388103a49bdfabf1705c1f979d44ecedd5c910a

b983c665

13 Jan, 2023 1 commit

Add mel spectrogram visualization to Streaming ASR tutorial (#2974) · 55575a53

moto authored Jan 12, 2023

Summary:
Per the suggestion by nateanl, adding the visualization of feature fed to ASR.

<img width="688" alt="Screen Shot 2023-01-12 at 8 19 59 PM" src="https://user-images.githubusercontent.com/855818/212215190-23be7553-4c04-40d9-944e-3ee2ff69c49b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2974

Reviewed By: nateanl

Differential Revision: D42484088

Pulled By: mthrok

fbshipit-source-id: 2c839492869416554eac04aa06cd12078db21bd7

55575a53

30 Dec, 2022 1 commit

Add subtractive synthesis tutorial (#2934) · 9f57951a

moto authored Dec 29, 2022

Summary:
Artifact: [subtractive_synthesis_tutorial](https://output.circle-artifacts.com/output/job/4c1ce33f-834d-48e0-ba89-2e91acdcb572/artifacts/0/docs/tutorials/subtractive_synthesis_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2934

Reviewed By: carolineechen

Differential Revision: D42284945

Pulled By: mthrok

fbshipit-source-id: d255b8e8e2a601a19bc879f9e1c38edbeebaf9b3

9f57951a

17 Dec, 2022 1 commit

Add filter design tutorial (#2894) · 9c4f71a6

moto authored Dec 16, 2022

Summary:
Adds filter design tutorial, which demonstrates `sinc_impulse_response` and `frequency_impulse_response`.

Example:
 - [filter_design_tutorial](https://output.circle-artifacts.com/output/job/bd22c615-9215-4b17-a52c-b171a47f646c/artifacts/0/docs/tutorials/filter_design_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2894

Reviewed By: xiaohui-zhang

Differential Revision: D42117658

Pulled By: mthrok

fbshipit-source-id: f7dd04980e8557bb6f0e0ec26ac2c7f53314ea16

9c4f71a6

16 Dec, 2022 1 commit

Rename resampling_method options (#2922) · e6bebe6a

Caroline Chen authored Dec 16, 2022

Summary:
resolves https://github.com/pytorch/audio/issues/2891

Rename `resampling_method` options to more accurately describe what is happening. Previously the methods were set to `sinc_interpolation` and `kaiser_window`, which can be confusing as both options actually use sinc interpolation methodology, but differ in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will throw a warning, and those options will be deprecated in 2 released. The numerical behavior is unchanged.

Pull Request resolved: https://github.com/pytorch/audio/pull/2922

Reviewed By: mthrok

Differential Revision: D42083619

Pulled By: carolineechen

fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70

e6bebe6a

29 Nov, 2022 1 commit

Add additive synthesis tutorial (#2877) · 1a003c3f

moto authored Nov 29, 2022

Summary:
This commit adds the tutorial for additive synthesis, using torchaudio's prototype DSP ops.

[Review here](https://output.circle-artifacts.com/output/job/3dc83322-832a-4272-9c13-df752c97b660/artifacts/0/docs/tutorials/additive_synthesis_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2877

Reviewed By: carolineechen

Differential Revision: D41585425

Pulled By: mthrok

fbshipit-source-id: b81283b90e4779c8054fd030a1d8c3d39d676bbd

1a003c3f

28 Nov, 2022 1 commit

Add oscillator tutorial (#2862) · 52e89756

moto authored Nov 28, 2022

Summary:
This commits add tutorial for oscillator_bank and adsr_envelope, which will be a basis for DDSP.

 - [Review here](https://output.circle-artifacts.com/output/job/cf1d3001-88e5-418b-8cf8-ae22b4445dba/artifacts/0/docs/tutorials/oscillator_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2862

Reviewed By: carolineechen

Differential Revision: D41559503

Pulled By: mthrok

fbshipit-source-id: 3f1689186db7d246de14f228fc2f91bf37db98cd

52e89756

17 Oct, 2022 1 commit

Update resampling tutorial (#2773) · 8f187354

moto authored Oct 17, 2022

Summary:
* Refactor benchmark script
* Rename `time` variable to avoid (potential) conflicting with time module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on result at the end
* Add link to an explanation of aliasing

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773

Reviewed By: carolineechen

Differential Revision: D40421337

Pulled By: mthrok

fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a

8f187354

14 Oct, 2022 2 commits

Fix leaking matplotlib figure (#2771) · 5239583e

moto authored Oct 14, 2022

Summary:
In StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure used is left unshown in the resulting tutorial with the use of ``sphinx_gallery_defer_figures`` command.

It turned out that this figure is shown in the next code block executed by Sphinx Gallery, and the figure is placed in totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html

<img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png">

This commit fixes it by closing the figure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2771

Reviewed By: nateanl

Differential Revision: D40382076

Pulled By: mthrok

fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a

5239583e

Fix fading in hybrid demucs tutorial (#2769) · 000d7526

nateanl authored Oct 13, 2022

Summary:
The separation applies on chunks of audios to avoid OOM. The combination of consecutive chunks is described in the graph:

![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png)

In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded.

Pull Request resolved: https://github.com/pytorch/audio/pull/2769

Reviewed By: carolineechen

Differential Revision: D40358382

Pulled By: nateanl

fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e

000d7526

13 Oct, 2022 2 commits

Add custom lm example to decoder tutorial (#2762) · 3a5a83d9

Caroline Chen authored Oct 13, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762

Reviewed By: mthrok

Differential Revision: D40332603

Pulled By: carolineechen

fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251

3a5a83d9

Update tutorial author information (#2764) · fb82ac0b

moto authored Oct 13, 2022

Summary:
Adding and updating author information.

Pull Request resolved: https://github.com/pytorch/audio/pull/2764

Reviewed By: carolineechen

Differential Revision: D40332427

Pulled By: mthrok

fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a

fb82ac0b