- 27 Jan, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add support for the CTC lexicon decoder without a language model by adding a no-op language model, `ZeroLM`, that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the KenLM decoder for now (it will likely be separated out from KenLM when support for other kinds of LMs is added). Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
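The idea behind `ZeroLM` in the entry above can be sketched in a few lines of Python. This is an illustrative sketch only, not the torchaudio implementation: the `start`/`score`/`finish` interface, the `LMState` class, and the `hypothesis_score` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LMState:
    """Hypothetical LM state; a zero LM only tracks how many tokens it has seen."""
    depth: int = 0


class ZeroLM:
    """Stand-in language model that contributes a score of 0 for every token."""

    def start(self) -> LMState:
        return LMState()

    def score(self, state: LMState, token_index: int) -> Tuple[LMState, float]:
        # Advance the state but add nothing to the hypothesis score.
        return LMState(state.depth + 1), 0.0

    def finish(self, state: LMState) -> Tuple[LMState, float]:
        return state, 0.0


def hypothesis_score(
    emission_scores: Sequence[float],
    tokens: Sequence[int],
    lm: ZeroLM,
    lm_weight: float = 1.0,
) -> float:
    """Combine acoustic scores with weighted LM scores (hypothetical helper)."""
    state = lm.start()
    total = 0.0
    for am_score, token in zip(emission_scores, tokens):
        state, lm_score = lm.score(state, token)
        total += am_score + lm_weight * lm_score
    _, end_score = lm.finish(state)
    return total + lm_weight * end_score
```

Because every LM score is 0, the LM weight has no effect and the hypothesis score reduces to the sum of the acoustic scores, which is what lets a decoder built around an LM interface run without one.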
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178 Reviewed By: mthrok Differential Revision: D33797649 Pulled By: nateanl fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603
-
- 26 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465, adding brief beam search description and linking to resources Pull Request resolved: https://github.com/pytorch/audio/pull/2173 Reviewed By: nateanl Differential Revision: D33791731 Pulled By: carolineechen fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb
-
- 22 Jan, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - Rename `BucketizeSampler` to `BucketizeBatchSampler` - Fix bugs in `BucketizeBatchSampler` - Adjust HuBERTDataset based on the latest `BucketizeBatchSampler`. Pull Request resolved: https://github.com/pytorch/audio/pull/2150 Reviewed By: mthrok Differential Revision: D33689963 Pulled By: nateanl fbshipit-source-id: 203764e9af5b7577ba08ebaa30ba5da3b67fb7e7
-
- 20 Jan, 2022 1 commit
-
-
yonMaor authored
Summary: Closes https://github.com/pytorch/audio/issues/2162 Pull Request resolved: https://github.com/pytorch/audio/pull/2163 Reviewed By: nateanl Differential Revision: D33666354 Pulled By: mthrok fbshipit-source-id: 3e7a963b9ac85046317df8d5dab91af363e5668b
-
- 18 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: additionally add decoding results for wav2vec2 large and also on the test-clean dataset Pull Request resolved: https://github.com/pytorch/audio/pull/2161 Reviewed By: mthrok Differential Revision: D33644670 Pulled By: carolineechen fbshipit-source-id: a219a15af46f82a6bd90169bb3001dbad8f0a96e
-
- 08 Jan, 2022 1 commit
-
-
Binh Tang authored
[PyTorchLightning/pytorch-lightning] Add deprecation path for renamed training type plugins (#11227) Summary: ### New commit log messages 4eede7c30 Add deprecation path for renamed training type plugins (#11227) Reviewed By: edward-io, daniellepintz Differential Revision: D33409991 fbshipit-source-id: 373e48767e992d67db3c85e436648481ad16c9d0
-
- 07 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add explanation and demonstration of different beam search decoder parameters. Additionally use a better sample audio file and load in with token list instead of tokens file. Pull Request resolved: https://github.com/pytorch/audio/pull/2141 Reviewed By: mthrok Differential Revision: D33463230 Pulled By: carolineechen fbshipit-source-id: d3dd6452b03d4fc2e095d778189c66f7161e4c68
-
- 06 Jan, 2022 2 commits
-
-
Elijah Rippeth authored
Summary: This PR: - Replaces `data_source` with `lengths`. - Adds a `shuffle` argument to decide whether to shuffle the samples in the buckets. - Adds `max_len` and `min_len` to filter out samples that are > max_len or < min_len. cc nateanl Pull Request resolved: https://github.com/pytorch/audio/pull/2147 Reviewed By: carolineechen Differential Revision: D33454369 Pulled By: nateanl fbshipit-source-id: 3835169ec7f808f8dd9650e7f183f79091efe886
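The behaviour listed in the entry above (length-based bucketing, optional shuffling inside buckets, and `min_len`/`max_len` filtering) can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the sampler from the PR: the function name `bucketize_batches` and the `num_buckets`, `batch_size`, and `seed` parameters are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence


def bucketize_batches(
    lengths: Sequence[int],
    num_buckets: int,
    batch_size: int,
    min_len: int = 0,
    max_len: Optional[int] = None,
    shuffle: bool = False,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield batches of sample indices grouped by similar length."""
    # Drop samples that are shorter than min_len or longer than max_len.
    keep = [
        i for i, n in enumerate(lengths)
        if n >= min_len and (max_len is None or n <= max_len)
    ]
    if not keep:
        return
    # Sort by length so that each bucket holds samples of similar size.
    keep.sort(key=lambda i: lengths[i])
    bucket_size = -(-len(keep) // num_buckets)  # ceiling division
    buckets = [keep[s:s + bucket_size] for s in range(0, len(keep), bucket_size)]
    rng = random.Random(seed)
    for bucket in buckets:
        if shuffle:
            rng.shuffle(bucket)  # shuffle only within a bucket
        for s in range(0, len(bucket), batch_size):
            yield bucket[s:s + batch_size]
```

Taking `lengths` rather than a dataset object means the sampler never has to load audio just to learn how long each sample is.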
-
Binh Tang authored
Summary: ### New commit log messages b64dea9dc Rename `DDPPlugin` to `DDPStrategy` (#11142) Reviewed By: jjenniferdai Differential Revision: D33259306 fbshipit-source-id: b4608c6b96b4a7977eaa4ed3f03c4b824882aef0
-
- 05 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: add script for running CTC beam search decoder on librispeech dataset with torchaudio pretrained wav2vec2 models Pull Request resolved: https://github.com/pytorch/audio/pull/2130 Reviewed By: mthrok Differential Revision: D33419436 Pulled By: carolineechen fbshipit-source-id: 0a0d00f4c17ecdbb497c9eda78673aa939d73c57
-
- 30 Dec, 2021 1 commit
-
-
Joao Gomes authored
Summary: cc mthrok Pull Request resolved: https://github.com/pytorch/audio/pull/2116 Reviewed By: mthrok Differential Revision: D33368453 Pulled By: jdsgomes fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c
-
- 29 Dec, 2021 4 commits
-
-
hwangjeff authored
Summary: Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`. Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html Pull Request resolved: https://github.com/pytorch/audio/pull/2110 Reviewed By: carolineechen, mthrok Differential Revision: D33354116 Pulled By: hwangjeff fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb
-
CodemodService Bot authored
Reviewed By: zertosh Differential Revision: D33347867 fbshipit-source-id: 7672f65392e363c0359de2d86e745782a09cf9dc
-
moto authored
Summary: ### Change list * Split the documentation of prototypes * Add a new API reference section dedicated to prototypes. * Hide the signature of the KenLMLexiconDecoder constructor. (cc carolineechen ) * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder * Hide the signature of the RNNT constructor. (cc hwangjeff ) * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT * Tweak CTC tutorial * Replace hyperlinks to API reference with backlinks * Add `progress=False` to download ### Follow-up The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the `Hypothesis` of the CTC decoder to the documentation, the build process complained that the name is ambiguous. I think the `Hypothesis` classes could be nested inside each decoder (if TorchScript supports it) or given different names, but in that case the interface of each `Hypothesis` has to be generic enough.
### Before https://pytorch.org/audio/main/prototype.html <img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png"> ### After https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html <img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png"> https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html <img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png"> https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html <img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2108 Reviewed By: hwangjeff, carolineechen, nateanl Differential Revision: D33340816 Pulled By: mthrok fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187
-
hwangjeff authored
Summary: Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR. Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below). https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2093 Reviewed By: mthrok Differential Revision: D33340776 Pulled By: hwangjeff fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac
-
- 28 Dec, 2021 3 commits
-
-
Zhaoheng Ni authored
Summary: Remove it as it's already introduced in the [gallery](https://github.com/pytorch/audio/blob/main/examples/tutorials/mvdr_tutorial.py). Pull Request resolved: https://github.com/pytorch/audio/pull/2109 Reviewed By: carolineechen Differential Revision: D33341574 Pulled By: nateanl fbshipit-source-id: e5c1c8537063b9453947dc3ecafa70e9b6c74146
-
Caroline Chen authored
Summary: demonstrate usage of the CTC beam search decoder w/ lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html follow-ups: - incorporate `nbest` - demonstrate customizability of different beam search parameters Pull Request resolved: https://github.com/pytorch/audio/pull/2106 Reviewed By: mthrok Differential Revision: D33340946 Pulled By: carolineechen fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7
-
moto authored
Summary: This commit updates the documentation configuration so that if an API (function or class) is used in tutorials, then it automatically adds the links to the tutorials. It also adds `py:func:` so that it's easy to jump from tutorials to the API reference. Note: the use of `py:func:` is not required for Sphinx-gallery to recognize the API. * https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions <img width="776" alt="Screen Shot 2021-12-24 at 12 41 43 PM" src="https://user-images.githubusercontent.com/855818/147367407-cd86f114-7177-426a-b5ee-a25af17ae476.png"> * https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr <img width="769" alt="Screen Shot 2021-12-24 at 12 42 31 PM" src="https://user-images.githubusercontent.com/855818/147367422-01fd245f-2f25-4875-a206-910e17ae0161.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2101 Reviewed By: hwangjeff Differential Revision: D33311283 Pulled By: mthrok fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288
-
- 23 Dec, 2021 1 commit
-
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
- 21 Dec, 2021 1 commit
-
-
moto authored
Summary: 1. Reorder Audio display so that audio clips are playable from the browser in the docs 2. Add links to function documentation https://470342-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2082 Reviewed By: carolineechen Differential Revision: D33227725 Pulled By: mthrok fbshipit-source-id: c7ee360b6f9b84c8e0a9b72193b98487d03b57ab
-
- 10 Dec, 2021 1 commit
-
-
nateanl authored
Summary: The PR adds a PyTorch Lightning-based training script for the HuBERT Base model. There are two iterations of pre-training and one iteration of ASR fine-tuning on the LibriSpeech dataset. Pull Request resolved: https://github.com/pytorch/audio/pull/2000 Reviewed By: carolineechen Differential Revision: D33021467 Pulled By: nateanl fbshipit-source-id: 77fe5a751943b56b63d5f1fb4e6ef35946e081db
-
- 03 Dec, 2021 1 commit
-
-
hwangjeff authored
Summary: Add training recipe for RNN-T Emformer ASR model to examples directory. Pull Request resolved: https://github.com/pytorch/audio/pull/2052 Reviewed By: nateanl Differential Revision: D32814096 Pulled By: hwangjeff fbshipit-source-id: a5153044efc16cb39f0e6413369a6791637af76a
-
- 11 Nov, 2021 1 commit
-
-
nateanl authored
-
- 10 Nov, 2021 1 commit
-
-
Krishna Kalyan authored
-
- 09 Nov, 2021 1 commit
-
-
nateanl authored
-
- 05 Nov, 2021 5 commits
-
-
moto authored
-
moto authored
-
moto authored
-
moto authored
* Refactor tutorial organization * Merge tutorial subdirectories under examples/gallery/tutorials * Do not use index.rst generated by Sphinx-gallery * Instead use flat structure so that all the tutorials are listed in left menu * Use `_assets` dir for artifacts of tutorials
-
moto authored
It turned out that generated tutorials can embed audio if the following conditions are met. This commit changes how audio samples are shown in tutorials so that they become playable in the docs. 1. There is only one `IPython.display.Audio` call in a cell 2. An `IPython.display.Audio` object is the last object the interpreter receives in the cell 3. The audio format is `wav` (`flac` can be embedded as well, but browsers (Chrome/Safari) won't play it) Ref: https://stackoverflow.com/a/33109647
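A cell that satisfies the three conditions above might look like the sketch below. It is only an illustration: the helper `sine_wav_bytes` and the test tone are assumptions made for the example, and the `IPython` import is guarded so the snippet also runs where IPython is not installed.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz: float = 440.0, seconds: float = 0.5,
                   sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit PCM WAV file in memory (hypothetical helper)."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * t / sample_rate)))
        for t in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(frames)
    return buffer.getvalue()


wav_bytes = sine_wav_bytes()  # condition 3: the data is in wav format

try:
    from IPython.display import Audio
except ImportError:  # IPython is optional for this sketch
    Audio = None

# Condition 1: a single Audio call in the cell.
clip = Audio(data=wav_bytes) if Audio is not None else None

# Condition 2: the Audio object is the last expression in the cell.
clip
```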
-
- 04 Nov, 2021 2 commits
- 03 Nov, 2021 1 commit
-
-
nateanl authored
-
- 01 Nov, 2021 1 commit
-
-
nateanl authored
-
- 30 Oct, 2021 1 commit
-
-
nateanl authored
-
- 13 Oct, 2021 1 commit
-
-
moto authored
-
- 11 Oct, 2021 1 commit
-
-
moto authored
-
- 10 Oct, 2021 1 commit
-
-
moto authored
Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
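The change described above amounts to deriving the bit depth once, at construction time, and storing it on the instance. A minimal sketch of that pattern follows; the class name `VocoderSketch` and the attribute names are hypothetical, and the real WaveRNN constructor takes many more arguments.

```python
class VocoderSketch:
    """Toy stand-in for a model whose output has `n_classes` quantization levels."""

    def __init__(self, n_classes: int) -> None:
        # n_classes must be a power of two for the bit count to be exact.
        if n_classes < 2 or n_classes & (n_classes - 1):
            raise ValueError("n_classes must be a power of two >= 2")
        self.n_classes = n_classes
        # Compute #classes -> #bits once; bit_length() - 1 equals log2 here.
        self.n_bits = n_classes.bit_length() - 1


model = VocoderSketch(n_classes=256)
print(model.n_bits)
```

Other code, such as waveform normalization or mu-law decoding, can then read `model.n_bits` instead of recomputing the logarithm.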
-
- 09 Oct, 2021 1 commit
-
-
moto authored
-