Commits · b0c8e239f7a00fda6370d54d1fa085f96c477a8f · OpenDAS / Torchaudio

25 Mar, 2022 1 commit

Add Pretrained LM Support for Decoder (#2275) · 34c0d115

Caroline Chen authored Mar 24, 2022

Summary:
Add a function to download pretrained files for the LibriSpeech 3-gram/4-gram KenLM, add tests, and update the tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/2275

Reviewed By: mthrok

Differential Revision: D35115418

Pulled By: carolineechen

fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0


24 Mar, 2022 2 commits

Update CTC decoder docs and add citation (#2278) · 05592dff

Caroline Chen authored Mar 24, 2022

Summary:
rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2278

Reviewed By: mthrok

Differential Revision: D35097734

Pulled By: carolineechen

fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c


Add notes about prototype features in tutorials (#2288) · 8844fbb7

moto authored Mar 23, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288

Reviewed By: hwangjeff

Differential Revision: D35099492

Pulled By: mthrok

fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f


22 Mar, 2022 1 commit

Fix calculation of SNR value in tutorial (#2285) · 8395fe65

Hagen Wierstorf authored Mar 22, 2022

Summary:
The calculation of the SNR in the data augmentation examples seems wrong to me:

![image](https://user-images.githubusercontent.com/173624/159487032-c60470c6-ef8e-48a0-ad5e-a117fcb8d606.png)

If we start from the definition of the signal-to-noise ratio using the root mean square value we get:

```
SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
```
this can be transformed to
```
scale = 10^(SNR/20) rms(noise) / rms(speech)
```
The example uses `lambda x: x.norm(p=2)` rather than `rms`, but because the speech and noise signals have the same length, we have
```
rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
```
For the tutorial's original formula to be correct, this would require:
```
10^(SNR/20) = e^(SNR / 10)
```
which is not true.

Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`.

For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.
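
As a quick numerical check of the derivation above, here is a minimal sketch in plain Python. The helper names (`rms`, `snr_scale`, `measured_snr_db`) and the synthetic signals are illustrative assumptions, not code from the tutorial, which operates on tensors.

```python
import math


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_scale(speech, noise, snr_db):
    """Scale for `speech` so that scale * speech has the target SNR against `noise`."""
    return 10 ** (snr_db / 20) * rms(noise) / rms(speech)


def measured_snr_db(speech, noise, scale):
    """SNR in dB of the scaled speech relative to the noise, from the definition."""
    return 20 * math.log10(rms([scale * v for v in speech]) / rms(noise))


# Synthetic stand-ins for the speech and noise signals (same length).
speech = [math.sin(0.1 * n) for n in range(1000)]
noise = [0.3 * math.cos(1.7 * n) for n in range(1000)]

for snr_db in (20, 10, 3):
    scale = snr_scale(speech, noise, snr_db)
    corrected = 10 ** (snr_db / 20)   # factor after this change
    previous = math.exp(snr_db / 10)  # factor before this change
    print(snr_db, round(measured_snr_db(speech, noise, scale), 6),
          round(corrected, 2), round(previous, 2))
```

With the `10^(SNR/20)` factor, the SNR measured from the definition matches the requested value, which is not the case for `e^(SNR / 10)`.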

Pull Request resolved: https://github.com/pytorch/audio/pull/2285

Reviewed By: nateanl

Differential Revision: D35047737

Pulled By: mthrok

fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3


17 Mar, 2022 1 commit

[Doc] fix typo and backlink (#2281) · 1c3403ea

moto authored Mar 17, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281

Reviewed By: carolineechen

Differential Revision: D34939494

Pulled By: mthrok

fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d


10 Mar, 2022 1 commit

Fix typos and remove comments (#2270) · 4b47412e

moto authored Mar 10, 2022

Summary:
Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202

Pull Request resolved: https://github.com/pytorch/audio/pull/2270

Reviewed By: hwangjeff

Differential Revision: D34793460

Pulled By: mthrok

fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723


26 Feb, 2022 1 commit

Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The interface changes are:
1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
2. Make the `fill_buffer` method private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
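
The retry-with-timeout idea described above can be sketched in Python as follows. This is only an illustration of the pattern: the actual retry happens in the C++ layer, and the function name `read_with_retry`, the use of seconds as the unit, and the fake device are assumptions made for this example.

```python
import errno
import time


def read_with_retry(read_packet, timeout=None, backoff=0.01):
    """Call `read_packet` until it succeeds, retrying while it reports EAGAIN.

    timeout: maximum total wait in seconds; None means retry indefinitely.
    backoff: pause in seconds between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_packet()
        except OSError as err:
            if err.errno != errno.EAGAIN:
                raise  # unrelated error: do not retry
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("device buffer was not ready in time") from err
            time.sleep(backoff)


class FakeDevice:
    """Hypothetical device whose buffer is not ready for the first few reads."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
        return b"packet"


device = FakeDevice(failures=2)
print(read_with_retry(device.read, timeout=1.0, backoff=0.001), device.calls)
```

Keeping the loop in one place means the caller sees either a packet or a single timeout error, rather than having to catch and retry on every `EAGAIN` itself.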

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59


17 Feb, 2022 1 commit

Update online ASR tutorial (#2226) · c5c4bbfd

moto authored Feb 16, 2022

Summary:
https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

1. Add figure to explain the caching
2. Fix the initialization of stream iterator

Pull Request resolved: https://github.com/pytorch/audio/pull/2226

Reviewed By: carolineechen

Differential Revision: D34265971

Pulled By: mthrok

fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4


15 Feb, 2022 1 commit

Update context building to not delay the inference (#2213) · 8e3c6144

moto authored Feb 14, 2022

Summary:
Update the context cacher so that the fetched audio chunk is used for inference immediately.

https://github.com/pytorch/audio/pull/2202#discussion_r802838174
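
For illustration, a context cacher of this kind might look like the sketch below. It uses plain Python lists instead of tensors, and the class name, parameter names, and zero-padding behaviour are assumptions for this example rather than the tutorial's exact code. Each fetched chunk is returned in the same call, with the tail of the previous chunk prepended as context.

```python
class ContextCacher:
    """Prepend the tail of the previous chunk to each new chunk (illustrative)."""

    def __init__(self, segment_length, context_length):
        self.segment_length = segment_length
        self.context_length = context_length
        # No audio has been seen yet, so the initial context is silence.
        self.context = [0.0] * context_length

    def __call__(self, chunk):
        chunk = list(chunk)
        # Pad a short final chunk with zeros up to the segment length.
        if len(chunk) < self.segment_length:
            chunk += [0.0] * (self.segment_length - len(chunk))
        chunk_with_context = self.context + chunk
        # Keep the tail of this chunk as context for the next call.
        self.context = chunk[-self.context_length:]
        return chunk_with_context


cacher = ContextCacher(segment_length=4, context_length=2)
for chunk in ([1, 2, 3, 4], [5, 6, 7, 8], [9]):
    print(cacher(chunk))
```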

Pull Request resolved: https://github.com/pytorch/audio/pull/2213

Reviewed By: hwangjeff

Differential Revision: D34235230

Pulled By: mthrok

fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e


09 Feb, 2022 1 commit

Fix librosa calls (#2208) · e5d567c9

hwangjeff authored Feb 08, 2022

Summary:
Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2208

Reviewed By: mthrok

Differential Revision: D34099793

Pulled By: hwangjeff

fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc


03 Feb, 2022 1 commit

Add tutorials with streaming API (#2193) · c00f65da

moto authored Feb 03, 2022

Summary:
* tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193

Reviewed By: hwangjeff

Differential Revision: D33971312

Pulled By: mthrok

fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f


02 Feb, 2022 1 commit

Add timesteps visualization to CTC decoder tutorial (#2188) · 94f4ef0f

Caroline Chen authored Feb 02, 2022

Summary:
resulting tutorial: https://538358-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html
- add visualization for timestep alignments
- modify section organization for decoder construction

Pull Request resolved: https://github.com/pytorch/audio/pull/2188

Reviewed By: mthrok

Differential Revision: D33954937

Pulled By: carolineechen

fbshipit-source-id: 8f397229d74c994b8793a30623e1de4c19ebd401


31 Jan, 2022 1 commit

Use download.pytorch.org for asset URL (#2182) · f654b2c9

moto authored Jan 31, 2022

Summary:
Change the URL of tutorial assets to `download.pytorch.org`, which is more appropriate for user-facing materials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2182

Reviewed By: nateanl

Differential Revision: D33887839

Pulled By: mthrok

fbshipit-source-id: 30569672e8caf30aae5476036dfdadc8ebd436bf


27 Jan, 2022 1 commit

Add no lm support for CTC decoder (#2174) · 4c3fa875

Caroline Chen authored Jan 27, 2022

Summary:
Add support for the CTC lexicon decoder without a language model by adding `ZeroLM`, a placeholder language model that returns a score of 0 for everything. Generalize the decoder class/API slightly to support this, adding it as an option for the KenLM decoder for now (it will likely be separated from the KenLM decoder when support for other kinds of LMs is added).
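
To illustrate the idea, here is a minimal Python sketch of a language model that scores everything as 0 and of how it plugs into a combined hypothesis score. The `start`/`score`/`finish` interface and the `hypothesis_score` helper are simplified assumptions for this example, not the decoder's actual API.

```python
class ZeroLM:
    """Stand-in language model: every token and every ending scores 0."""

    def start(self):
        return ()  # empty LM state

    def score(self, state, token):
        return state + (token,), 0.0

    def finish(self, state):
        return state, 0.0


def hypothesis_score(lm, tokens, emission_scores, lm_weight=1.0):
    """Sum emission scores plus weighted LM scores over one hypothesis."""
    state = lm.start()
    total = 0.0
    for token, emission in zip(tokens, emission_scores):
        state, lm_score = lm.score(state, token)
        total += emission + lm_weight * lm_score
    _, final_score = lm.finish(state)
    return total + lm_weight * final_score


# With ZeroLM, the result depends only on the emission scores,
# whatever the LM weight is.
print(hypothesis_score(ZeroLM(), ["a", "b"], [-0.5, -1.5], lm_weight=3.0))
```

Because the decoder still talks to an object with the language model interface, the same beam search code path can be reused with and without an LM.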

Pull Request resolved: https://github.com/pytorch/audio/pull/2174

Reviewed By: hwangjeff, nateanl

Differential Revision: D33798674

Pulled By: carolineechen

fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde


26 Jan, 2022 1 commit

Add beam search description to tutorial (#2173) · bcf04839

Caroline Chen authored Jan 26, 2022

Summary:
Following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465: add a brief beam search description and link to resources.

Pull Request resolved: https://github.com/pytorch/audio/pull/2173

Reviewed By: nateanl

Differential Revision: D33791731

Pulled By: carolineechen

fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb


20 Jan, 2022 1 commit

Remove multiprocessing from audio dataset tutorial (#2163) · dc4f76fd

yonMaor authored Jan 20, 2022

Summary:
Closes https://github.com/pytorch/audio/issues/2162

Pull Request resolved: https://github.com/pytorch/audio/pull/2163

Reviewed By: nateanl

Differential Revision: D33666354

Pulled By: mthrok

fbshipit-source-id: 3e7a963b9ac85046317df8d5dab91af363e5668b


07 Jan, 2022 1 commit

Add parameter usage to CTC inference tutorial (#2141) · ffbfe74a

Caroline Chen authored Jan 06, 2022

Summary:
Add explanation and demonstration of different beam search decoder parameters.
Additionally, use a better sample audio file and load the tokens from a token list instead of a tokens file.

Pull Request resolved: https://github.com/pytorch/audio/pull/2141

Reviewed By: mthrok

Differential Revision: D33463230

Pulled By: carolineechen

fbshipit-source-id: d3dd6452b03d4fc2e095d778189c66f7161e4c68


29 Dec, 2021 1 commit

Update prototype documentations (#2108) · 10cce198

moto authored Dec 28, 2021

Summary:
### Change list

* Split the documentation of prototypes
* Add a new API reference section dedicated for prototypes.
* Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of RNNT constructor. (cc hwangjeff )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak CTC tutorial
  * Replace hyperlinks to API reference with backlinks
  * Add `progress=False` to download

### Follow-up

The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the `Hypothesis` of the CTC decoder to the documentation, the build process complained that it's ambiguous.
I think the `Hypothesis` classes could be put inside each decoder (if TorchScript supports it) or given different names, but in that case the interface of each `Hypothesis` has to be generic enough.

### Before

https://pytorch.org/audio/main/prototype.html

<img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png">

### After

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html

<img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html

<img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html

<img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2108

Reviewed By: hwangjeff, carolineechen, nateanl

Differential Revision: D33340816

Pulled By: mthrok

fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187


28 Dec, 2021 2 commits

Add ASR CTC inference tutorial (#2106) · 133d0065

Caroline Chen authored Dec 28, 2021

Summary:
Demonstrate usage of the CTC beam search decoder with lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model.

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: https://github.com/pytorch/audio/pull/2106

Reviewed By: mthrok

Differential Revision: D33340946

Pulled By: carolineechen

fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7


Add Sphinx gallery automatically (#2101) · eb8e8dc8

moto authored Dec 28, 2021

Summary:
This commit updates the documentation configuration so that if an API (function or class) is used in tutorials, links to those tutorials are added to its documentation automatically.

It also adds `py:func:` so that it's easy to jump from tutorials to API reference.

Note: the use of `py:func:` is not required for the API to be recognized by Sphinx-gallery.

* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions

<img width="776" alt="Screen Shot 2021-12-24 at 12 41 43 PM" src="https://user-images.githubusercontent.com/855818/147367407-cd86f114-7177-426a-b5ee-a25af17ae476.png">

* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr

<img width="769" alt="Screen Shot 2021-12-24 at 12 42 31 PM" src="https://user-images.githubusercontent.com/855818/147367422-01fd245f-2f25-4875-a206-910e17ae0161.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2101

Reviewed By: hwangjeff

Differential Revision: D33311283

Pulled By: mthrok

fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288


23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8


21 Dec, 2021 1 commit

Update audio augmentation tutorial (#2082) · 3a03d8c0

moto authored Dec 20, 2021

Summary:
1. Reorder the audio display so that audio clips are playable from the browser in the docs
2. Add link to function documentations

https://470342-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2082

Reviewed By: carolineechen

Differential Revision: D33227725

Pulled By: mthrok

fbshipit-source-id: c7ee360b6f9b84c8e0a9b72193b98487d03b57ab


11 Nov, 2021 1 commit
- Fix style checks in examples/tutorials (#2006) · 0ac196c6
  nateanl authored Nov 11, 2021
  
10 Nov, 2021 1 commit
- [BC-Breaking] Remove deprecated create_fb_matrix (#1998) · 22379d14
  Krishna Kalyan authored Nov 10, 2021
  
05 Nov, 2021 4 commits
- Fix sections of MVDR tutorial (#1989) · 209145a4
  moto authored Nov 05, 2021
  
- Port MVDR tutorial (#1983) · b9247022
  moto authored Nov 05, 2021
  
- Port audio manipulation tutorial (#1970) · 8f061987
  moto authored Nov 05, 2021
  
- Refactor tutorial organization (#1987) · 6cf84866
  moto authored Nov 05, 2021
```
* Refactor tutorial organization

* Merge tutorial subdirectories under examples/gallery/tutorials
* Do not use index.rst generated by Sphinx-gallery
* Instead use flat structure so that all the tutorials are listed in left menu
* Use `_assets` dir for artifacts of tutorials
```