- 25 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are, respectively, the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of the noise, and an int or one-hot Tensor indicating the reference channel. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
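For reference, the Souden et al. solution is usually written per frequency bin $f$ as below, where $\Phi_{SS}$ and $\Phi_{NN}$ are the speech and noise PSD matrices and $\mathbf{u}$ is the one-hot vector selecting the reference channel. This is the textbook formulation; the exact numerics of the torchaudio implementation (for example, any small constant added to the denominator for stability) may differ.

$$
\mathbf{w}(f) = \frac{\Phi_{NN}^{-1}(f)\,\Phi_{SS}(f)}{\operatorname{tr}\!\left(\Phi_{NN}^{-1}(f)\,\Phi_{SS}(f)\right)}\,\mathbf{u}
$$

Passing an int reference channel $r$ is equivalent to choosing $\mathbf{u}$ with a 1 in position $r$, i.e. taking column $r$ of the normalized matrix.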
-
Zhaoheng Ni authored
Summary: This PR adds the ``psd`` method to ``torchaudio.functional``. It computes the power spectral density (PSD) matrix of a complex-valued spectrum. The method also supports normalization of the time-frequency mask. Pull Request resolved: https://github.com/pytorch/audio/pull/2227 Reviewed By: mthrok Differential Revision: D34473908 Pulled By: nateanl fbshipit-source-id: c1cfc584085d77881b35d41d76d39b26fca1dda9
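A minimal pure-Python sketch of a mask-weighted PSD matrix for a single frequency bin, assuming the common definition $\Phi_{ce} = \sum_t m_t X_c(t)\,\overline{X_e(t)}$ and assuming "normalization" means dividing the mask by its sum over time. The function name `psd_single_bin` and the `eps` default are hypothetical; this illustrates the computation and is not the torchaudio code.

```python
from typing import List, Optional


def psd_single_bin(
    spec: List[List[complex]],
    mask: Optional[List[float]] = None,
    normalize: bool = True,
    eps: float = 1e-10,  # hypothetical default, guards against an all-zero mask
) -> List[List[complex]]:
    """PSD matrix for one frequency bin.

    spec[c][t] is the STFT value of channel c at frame t;
    mask[t] is the time-frequency mask weight of frame t at this bin.
    """
    num_channels = len(spec)
    num_frames = len(spec[0])
    if mask is None:
        # No mask: every frame contributes with weight 1.
        weights = [1.0] * num_frames
    elif normalize:
        # Normalize the mask over time so that its weights sum to (about) 1.
        total = sum(mask) + eps
        weights = [m / total for m in mask]
    else:
        weights = list(mask)
    # phi[c][e] = sum_t w[t] * X_c[t] * conj(X_e[t]); the result is Hermitian.
    return [
        [
            sum(weights[t] * spec[c][t] * spec[e][t].conjugate() for t in range(num_frames))
            for e in range(num_channels)
        ]
        for c in range(num_channels)
    ]
```

With a normalized mask the result is a weighted average of the per-frame outer products, so its scale does not depend on how many frames the mask selects.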
-
- 16 Feb, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: This PR provides an RNNTBundle that is pre-trained on the MuST-C release v2.0 dataset. The casing and punctuation of the transcripts are preserved when training the SentencePiece model. Here is the model performance on the dev and test sets of MuST-C 2.0:

| Split | WER |
|:-----------------:|-------------:|
| dev | 0.190 |
| tst-COMMON | 0.213 |
| tst-HE | 0.186 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2241 Reviewed By: mthrok Differential Revision: D34267792 Pulled By: nateanl fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167
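For readers unfamiliar with the metric reported for the MuST-C splits, here is a minimal sketch of word error rate as word-level Levenshtein distance divided by the number of reference words. Whether the reported numbers apply any text normalization is not stated in this commit; the helper below applies none.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before iteration i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)
```

Under this plain definition, a model that keeps casing and punctuation is scored on them too: `"Hello,"` and `"hello"` count as different words.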
-
- 04 Feb, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177 Reviewed By: hwangjeff Differential Revision: D33893052 Pulled By: nateanl fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7
-
- 03 Feb, 2022 1 commit
-
-
moto authored
Summary:
* tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193 Reviewed By: hwangjeff Differential Revision: D33971312 Pulled By: mthrok fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f
-
- 02 Feb, 2022 1 commit
-
-
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on the ffmpeg libraries. For detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
-
- 01 Feb, 2022 3 commits
-
-
hwangjeff authored
Summary: Missed a couple of spots in https://github.com/pytorch/audio/issues/2187. Pull Request resolved: https://github.com/pytorch/audio/pull/2189 Reviewed By: carolineechen, nateanl, mthrok Differential Revision: D33926342 Pulled By: hwangjeff fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06
-
hwangjeff authored
Summary: Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2187 Reviewed By: nateanl, mthrok Differential Revision: D33918092 Pulled By: hwangjeff fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2185 Reviewed By: hwangjeff, mthrok Differential Revision: D33905767 Pulled By: carolineechen fbshipit-source-id: 964576ab3f4a12b91fa3960b2aa2337239356513
-
- 27 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: Adds support for a CTC lexicon decoder without a language model by adding a no-op language model, `ZeroLM`, that returns a score of 0 for everything. Generalizes the decoder class/API slightly to support this, and exposes it as an option of the KenLM decoder for now (it will likely be separated from KenLM when support for other kinds of LMs is added in the future). Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
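To show why a language model that always returns 0 amounts to "no LM", here is a minimal sketch. The `start`/`score`/`finish` interface and the `hypothesis_score` helper are hypothetical, loosely modeled on a stateful LM interface; they are not the torchaudio or flashlight API. The point is that with every LM score equal to 0, the weighted LM term vanishes and hypotheses are ranked by acoustic score alone, while the lexicon constraint still applies in the real decoder.

```python
from typing import Any, List, Sequence, Tuple


class ZeroLM:
    """Hypothetical LM that assigns a score of 0 to every token."""

    def start(self) -> Any:
        return None  # no state is needed

    def score(self, state: Any, token: int) -> Tuple[Any, float]:
        return state, 0.0

    def finish(self, state: Any) -> Tuple[Any, float]:
        return state, 0.0


def hypothesis_score(
    acoustic_log_probs: Sequence[float], tokens: List[int], lm: ZeroLM, lm_weight: float
) -> float:
    """Hypothetical scoring: acoustic log-probs plus weighted LM scores."""
    state = lm.start()
    total = 0.0
    for log_prob, token in zip(acoustic_log_probs, tokens):
        state, lm_score = lm.score(state, token)
        total += log_prob + lm_weight * lm_score
    _, final_score = lm.finish(state)
    return total + lm_weight * final_score
```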
-
- 14 Jan, 2022 1 commit
-
-
moto authored
Summary:
- Change the version string of the nightly build to `Nightly Build (VERSION)`.
- Use the `BUILD_VERSION` env var for releases.
- Automatically update the copyright year.
- Update the link to the nightly documentation in the README so that the main branch points to the corresponding documentation.

Because of the way the CI job is set up, the resulting documentation says 0.8.0. This is fixed by https://github.com/pytorch/audio/issues/2151.

Pull Request resolved: https://github.com/pytorch/audio/pull/2152 Reviewed By: carolineechen, nateanl Differential Revision: D33585053 Pulled By: mthrok fbshipit-source-id: 3c2bf9fc3214c89f989f5ac65b74bc1e276a7161
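A minimal sketch of the version and copyright logic described in this commit, assuming a release build sets `BUILD_VERSION` to the version string and any other build is labeled as nightly. The helpers `doc_version` and `doc_copyright` are hypothetical and not taken from the actual docs configuration.

```python
import datetime
import os
from typing import Mapping, Optional


def doc_version(installed_version: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick the version string shown in the docs."""
    env = os.environ if env is None else env
    # Release builds pass the version explicitly through BUILD_VERSION.
    release = env.get("BUILD_VERSION")
    if release:
        return release
    # Otherwise, label the docs as a nightly build of the installed version.
    return f"Nightly Build ({installed_version})"


def doc_copyright(holder: str, first_year: int) -> str:
    """Hypothetical helper: the end year follows the date of the build."""
    return f"{first_year}-{datetime.date.today().year}, {holder}"
```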
-
- 06 Jan, 2022 1 commit
-
-
moto authored
Summary:
- Unindent RNNTBundle components so that they show up in the right sidebar
- Overwrite the signature of RNNTBundle methods so that backlinks are available

Before: (screenshot)
After: (screenshot)

Pull Request resolved: https://github.com/pytorch/audio/pull/2148 Reviewed By: hwangjeff Differential Revision: D33458574 Pulled By: mthrok fbshipit-source-id: ac34ffc4070261563a1f4ea9337997f0fe7b2212
-
- 04 Jan, 2022 1 commit
-
-
moto authored
Summary:
* Before: https://pytorch.org/audio/main/models.html (screenshot)
* After: https://503135-90321822-gh.circle-artifacts.com/0/docs/models.html (screenshot)

Pull Request resolved: https://github.com/pytorch/audio/pull/2123 Reviewed By: carolineechen Differential Revision: D33409661 Pulled By: mthrok fbshipit-source-id: bb2dffea25ccc4356d257b2ab4a6e88f7f4e2bb3
-
- 31 Dec, 2021 1 commit
-
-
Caroline Chen authored
Summary: Add documentation for the CTC decoder `Hypothesis` and include it in the docs. Pull Request resolved: https://github.com/pytorch/audio/pull/2117 Reviewed By: mthrok Differential Revision: D33370381 Pulled By: carolineechen fbshipit-source-id: cf6501a499e5303cda0410f733f0fab4e1c39aff
-
- 30 Dec, 2021 1 commit
-
-
Joao Gomes authored
Summary: cc mthrok Pull Request resolved: https://github.com/pytorch/audio/pull/2116 Reviewed By: mthrok Differential Revision: D33368453 Pulled By: jdsgomes fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c
-
- 29 Dec, 2021 3 commits
-
-
hwangjeff authored
Summary: Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`. Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html Pull Request resolved: https://github.com/pytorch/audio/pull/2110 Reviewed By: carolineechen, mthrok Differential Revision: D33354116 Pulled By: hwangjeff fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb
-
moto authored
Summary:
### Change list
* Split the documentation of prototypes
  * Add a new API reference section dedicated to prototypes.
* Hide the signature of the KenLMLexiconDecoder constructor. (cc carolineechen)
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of the RNNT constructor. (cc hwangjeff)
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak the CTC tutorial
  * Replace hyperlinks to the API reference with backlinks
  * Add `progress=False` to download

### Follow-up
The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the CTC decoder's `Hypothesis` to the documentation, the build process complained that the name is ambiguous. The `Hypothesis` classes could be put inside each decoder (if TorchScript supports it), or given different names, but in that case the interface of each `Hypothesis` has to be generic enough.

### Before
https://pytorch.org/audio/main/prototype.html (screenshot)

### After
https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html (screenshot)
https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html (screenshot)
https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html (screenshot)

Pull Request resolved: https://github.com/pytorch/audio/pull/2108 Reviewed By: hwangjeff, carolineechen, nateanl Differential Revision: D33340816 Pulled By: mthrok fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187
-
hwangjeff authored
Summary: Adds pretrained Emformer RNN-T inference pipeline that's capable of performing streaming and non-streaming ASR. Includes demo script that uses pipeline to alternately perform streaming and non-streaming ASR on LibriSpeech test samples (video below). https://user-images.githubusercontent.com/8345689/147590753-d5126557-d575-4551-8dfe-5977276cb4ad.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2093 Reviewed By: mthrok Differential Revision: D33340776 Pulled By: hwangjeff fbshipit-source-id: fbb3b1d471b4e9f1b93fa9dea9c464154537a8ac
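To illustrate the difference between the two modes in general terms: non-streaming ASR consumes the whole utterance at once, while an Emformer-style streaming model consumes fixed-length segments, each extended by a short right-context lookahead. The sketch below only shows that segmentation; `iter_segments` and its parameters are hypothetical, and a real pipeline also pads the final segment and carries model state between calls.

```python
from typing import Iterator, List, Sequence


def iter_segments(
    samples: Sequence[float], segment_length: int, right_context_length: int
) -> Iterator[List[float]]:
    """Yield consecutive segments, each extended with right-context lookahead.

    Segments advance by segment_length, so consecutive chunks overlap by
    right_context_length samples. The last chunk may be shorter (no padding).
    """
    for start in range(0, len(samples), segment_length):
        end = start + segment_length + right_context_length
        yield list(samples[start:end])
```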
-
- 28 Dec, 2021 4 commits
-
-
Caroline Chen authored
Summary: Demonstrates usage of the CTC beam search decoder with lexicon constraint and KenLM support on a LibriSpeech sample, using a pretrained wav2vec2 model. Rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

Follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: https://github.com/pytorch/audio/pull/2106 Reviewed By: mthrok Differential Revision: D33340946 Pulled By: carolineechen fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7
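Every CTC decoder, greedy or beam search, maps frame-level labels to an output sequence with the same rule: merge consecutive repeats, then drop blanks. The sketch below shows that rule plus a greedy (per-frame argmax) decoder, as a simple baseline to contrast with the lexicon-constrained beam search described above. The blank index of 0 and the function names are assumptions for illustration.

```python
from typing import List, Sequence


def ctc_collapse(frame_tokens: Sequence[int], blank: int = 0) -> List[int]:
    """Merge consecutive repeated tokens, then remove blanks."""
    output: List[int] = []
    previous = None
    for token in frame_tokens:
        if token != previous and token != blank:
            output.append(token)
        previous = token
    return output


def greedy_ctc_decode(emissions: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Pick the best-scoring token per frame, then apply the CTC collapse rule."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    return ctc_collapse(best, blank)
```

A blank between two identical tokens is what lets CTC emit a genuine repeat, e.g. the double letter in "hello". Beam search differs from this baseline by keeping several prefixes alive and, with a lexicon and KenLM, rescoring them with word-level constraints.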
-
Zhaoheng Ni authored
Summary:
- Add three factory functions, `hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`, to enable training the HuBERT model from scratch.
- Add a `num_classes` argument to the `hubert_pretrain_base` factory function, because the base model is trained in two iterations: `num_cluster` is 100 in the first iteration and 500 in the second.
- The model takes `waveforms`, `labels`, and `lengths` as inputs.
- The model outputs the last-layer transformer embedding, `logit_m`, and `logit_u`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2064 Reviewed By: hwangjeff, mthrok Differential Revision: D33338587 Pulled By: nateanl fbshipit-source-id: 534bc17c576c5f344043d8ba098204b8da6e630a
-
moto authored
Summary:
*Before:* https://pytorch.org/audio/main/tutorials/audio_data_augmentation_tutorial.html#effects-applied (screenshot)
*After:* https://484994-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html#effects-applied (screenshot)

Pull Request resolved: https://github.com/pytorch/audio/pull/2107 Reviewed By: carolineechen Differential Revision: D33337164 Pulled By: mthrok fbshipit-source-id: 20e3309f0d11d46619f516dc46d967b34f22ec95
-
moto authored
Summary: This commit updates the documentation configuration so that, if an API (function or class) is used in tutorials, links to those tutorials are added automatically. It also adds `py:func:` references so that it is easy to jump from tutorials to the API reference. Note: Sphinx-gallery recognizes API usage without requiring `py:func:`.
* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions (screenshot)
* https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr (screenshot)

Pull Request resolved: https://github.com/pytorch/audio/pull/2101 Reviewed By: hwangjeff Differential Revision: D33311283 Pulled By: mthrok fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288
-
- 23 Dec, 2021 3 commits
-
-
Caroline Chen authored
Summary: Part of https://github.com/pytorch/audio/issues/2072 (split into smaller PRs for easier review). This PR adds the Python decoder API and a basic README. Pull Request resolved: https://github.com/pytorch/audio/pull/2089 Reviewed By: mthrok Differential Revision: D33299818 Pulled By: carolineechen fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
hwangjeff authored
Summary: Adds implementation of Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
-
- 24 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference. Pull Request resolved: https://github.com/pytorch/audio/pull/2028 Reviewed By: mthrok Differential Revision: D32627919 Pulled By: hwangjeff fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
-
- 23 Nov, 2021 1 commit
-
-
moto authored
Summary:
- Remove unnecessary content list
- Remove legacy description

Pull Request resolved: https://github.com/pytorch/audio/pull/2029 Reviewed By: carolineechen Differential Revision: D32629917 Pulled By: mthrok fbshipit-source-id: bc9a9366c681bcf8b74907c2a6459c73fb6a7424
-
- 19 Nov, 2021 1 commit
-
-
moto authored
Summary: With the introduction of tutorials, the turnaround time for doc builds has become longer. By default, the tutorials are not built, but `SPHINXOPT=-W` treats the resulting warnings as errors. This commit disables the option for local builds while keeping it for CI. Pull Request resolved: https://github.com/pytorch/audio/pull/2013 Reviewed By: carolineechen Differential Revision: D32538952 Pulled By: mthrok fbshipit-source-id: eae4ffd87100dff466f91abfe26a82aa702d605a
-
- 18 Nov, 2021 1 commit
-
-
hwangjeff authored
Summary: Adds streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions — one that allows for building a custom model, and one that builds a preconfigured base model. Pull Request resolved: https://github.com/pytorch/audio/pull/2003 Reviewed By: nateanl Differential Revision: D32440879 Pulled By: hwangjeff fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360
-
- 10 Nov, 2021 1 commit
-
-
Krishna Kalyan authored
-
- 05 Nov, 2021 4 commits
-
-
moto authored
- Add a link to the index page on the left
- Rename "Package Reference" to "API Reference"
- Update description.
-
moto authored
-
moto authored
-
moto authored
* Refactor tutorial organization
* Merge tutorial subdirectories into `examples/gallery/tutorials`
* Do not use the index.rst generated by Sphinx-gallery
* Instead, use a flat structure so that all the tutorials are listed in the left menu
* Use the `_assets` dir for tutorial artifacts
-
- 04 Nov, 2021 6 commits
-
-
moto authored
-
moto authored
The hack introduced in #1969 could break if the upstream theme changes its HTML or JavaScript. To prevent our documentation from breaking unexpectedly, this commit pins the theme to a specific commit.
-
moto authored
-
moto authored
This commit adds Colab/download/source links to tutorials, as in the `pytorch/tutorials` repo. Since the upstream `pytorch-sphinx-theme` does not provide an interface for this, a hack that overwrites the URL is added. This hack might stop working if `pytorch-sphinx-theme` is updated.
-
moto authored
With the introduction of the TTS tutorial (#1973), building the documentation takes more than a couple of minutes. This commit makes the doc build skip tutorials by default. To build tutorials, set the environment variable `BUILD_GALLERY=1`, and set `GALLERY_PATTERN=...` to filter which tutorials are built. `GALLERY_PATTERN` follows the same approach as in the `tutorials` repo. https://github.com/pytorch/tutorials/blob/cbf2238df0e78d84c15bd94288966d2f4b2e83ae/conf.py#L75-L83 This commit also dynamically parses the subdirectories of `examples/gallery`, so that when a new category of examples is added, it is picked up automatically.
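A minimal sketch of the environment-variable gating described above, assuming `BUILD_GALLERY=1` enables running tutorials and `GALLERY_PATTERN` is a regular expression searched against tutorial file paths (the way Sphinx-Gallery applies its `filename_pattern` option). The helper names and the example paths are hypothetical; this is not the actual `conf.py`.

```python
import os
import re
from typing import Dict, List, Mapping, Optional, Sequence


def gallery_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Hypothetical helper: decide whether to build tutorials, and which ones."""
    env = os.environ if env is None else env
    # Tutorials are skipped unless BUILD_GALLERY is set to "1".
    build = env.get("BUILD_GALLERY", "") == "1"
    # GALLERY_PATTERN is a regex; when absent, every tutorial file matches.
    pattern = env.get("GALLERY_PATTERN", r".*")
    return {"build": build, "filename_pattern": pattern}


def selected(paths: Sequence[str], settings: Dict[str, object]) -> List[str]:
    """Return the tutorial paths that would be executed under these settings."""
    if not settings["build"]:
        return []
    regex = re.compile(str(settings["filename_pattern"]))
    return [p for p in paths if regex.search(p)]
```

Defaulting to "build nothing" keeps local doc builds fast, while CI can opt in with a single variable.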
-
moto authored
-