- 27 Jun, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR adds two dataset classes for the VoxCeleb1 corpus.
- `VoxCeleb1Identification`: each data sample contains the waveform, sample rate, speaker ID, and file ID.
- `VoxCeleb1Verification`: each data sample contains a pair of waveforms, the sample rate, a label indicating whether they are from the same speaker, and the file IDs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2349 Reviewed By: carolineechen Differential Revision: D35927921 Pulled By: nateanl fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449
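A minimal usage sketch of the two classes, assuming the VoxCeleb1 data has been prepared under `./data`; the sample tuple layouts follow the description above, and exact argument names are assumptions:

```python
# Minimal usage sketch; assumes VoxCeleb1 has been downloaded/prepared under ./data.
# Argument names follow the torchaudio.datasets API described above and may differ slightly.
import torchaudio

ident = torchaudio.datasets.VoxCeleb1Identification("./data", subset="train")
waveform, sample_rate, speaker_id, file_id = ident[0]

verif = torchaudio.datasets.VoxCeleb1Verification("./data")
wave1, wave2, sample_rate, label, file_id1, file_id2 = verif[0]
print(label)  # 1 if both utterances come from the same speaker, 0 otherwise
```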
-
- 21 Jun, 2022 1 commit
Sean Kim authored
Summary: Create a dataset handler and tests for the new dataset. Manually tested and unit tested to verify validity. Pre-commit was run for style checks. Pull Request resolved: https://github.com/pytorch/audio/pull/2484 Reviewed By: carolineechen, nateanl Differential Revision: D37250556 Pulled By: skim0514 fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51
-
- 20 Jun, 2022 1 commit
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2480 Reviewed By: nateanl Differential Revision: D37249571 Pulled By: carolineechen fbshipit-source-id: caefeec4253c91f2579655a0c1735edaeed51be9
-
- 10 May, 2022 2 commits
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with a convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
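For orientation, the convolution block in this family of models typically follows the Conformer recipe (pointwise conv with GLU gating, depthwise conv, normalization, activation, pointwise conv). A rough illustrative sketch of that pattern, not the torchaudio ConvEmformer code:

```python
# Illustrative Conformer-style convolution block; kernel size, norm choice, and
# symmetric padding are assumptions, not the actual torchaudio implementation.
import torch
from torch import nn


class ConvBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 31):
        super().__init__()
        self.layer_norm = nn.LayerNorm(dim)
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, kernel_size=1)  # expand for GLU
        self.depthwise = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.norm = nn.BatchNorm1d(dim)
        self.activation = nn.SiLU()
        self.pointwise2 = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); residual connection around the whole block
        y = self.layer_norm(x).transpose(1, 2)            # (batch, dim, time)
        y = nn.functional.glu(self.pointwise1(y), dim=1)  # gated linear unit
        y = self.pointwise2(self.activation(self.norm(self.depthwise(y))))
        return x + y.transpose(1, 2)
```

In a streaming model the depthwise convolution would use causal (left-only) padding so no future frames are consumed; symmetric padding is used here only for brevity.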
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371 Reviewed By: xiaohui-zhang Differential Revision: D36246167 Pulled By: carolineechen fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
-
- 08 Apr, 2022 1 commit
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to Sphinx. APIs with these directives get badges (based on shields.io) that link to the page describing those features. Continuation of https://github.com/pytorch/audio/issues/2316. Dtypes are excluded for a future improvement, and badges were added to most functionals/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
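As a hypothetical illustration of what a badge-emitting directive can look like, here is a minimal Sphinx extension that turns `.. devices:: CPU CUDA` into shields.io image nodes; the class name, badge URL format, and option handling are assumptions, not the actual torchaudio docs code:

```python
# Hypothetical sketch of a badge-emitting Sphinx directive (not the torchaudio code).
from docutils import nodes
from docutils.parsers.rst import Directive


class DevicesDirective(Directive):
    """Render one shields.io badge per device name, e.g. ``.. devices:: CPU CUDA``."""

    required_arguments = 1
    final_argument_whitespace = True  # allow multiple space-separated names

    def run(self):
        badges = []
        for name in self.arguments[0].split():
            uri = f"https://img.shields.io/badge/device-{name}-blue"
            badges.append(nodes.image(uri=uri, alt=f"Supported device: {name}"))
        return badges


def setup(app):
    app.add_directive("devices", DevicesDirective)
    return {"parallel_read_safe": True}
```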
-
- 24 Mar, 2022 1 commit
Caroline Chen authored
Summary: rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
-
- 25 Feb, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of the noise, and an int or one-hot Tensor indicating the reference channel. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
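For reference, the Souden solution is w = (Φ_n⁻¹ Φ_s / trace(Φ_n⁻¹ Φ_s)) u, where u is the one-hot reference-channel vector. A rough illustrative sketch of that computation (not the torchaudio implementation; the (..., freq, channel, channel) layout is an assumption):

```python
# Illustrative sketch of the Souden MVDR weights (not the torchaudio code).
# Assumed layout: psd_s, psd_n are complex tensors of shape (..., freq, channel, channel).
import torch


def mvdr_weights_souden_sketch(psd_s, psd_n, reference_channel, eps=1e-8):
    numerator = torch.linalg.solve(psd_n, psd_s)           # Phi_n^{-1} Phi_s
    trace = numerator.diagonal(dim1=-2, dim2=-1).sum(-1)   # per-frequency trace
    scaled = numerator / (trace[..., None, None] + eps)
    if isinstance(reference_channel, int):
        return scaled[..., reference_channel]              # pick the reference column
    # one-hot reference channel: weighted sum over columns
    return torch.einsum("...cr,r->...c", scaled, reference_channel.to(scaled.dtype))
```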
-
- 23 Dec, 2021 1 commit
hwangjeff authored
Summary: Adds an implementation of the Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
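A short usage sketch of the module as exposed in `torchaudio.models`; hyperparameter values are arbitrary examples, and exact argument names are assumptions:

```python
# Usage sketch of torchaudio.models.Conformer; values are arbitrary examples.
import torch
from torchaudio.models import Conformer

conformer = Conformer(
    input_dim=80,                   # e.g. 80-dim mel filterbank features
    num_heads=4,
    ffn_dim=128,
    num_layers=4,
    depthwise_conv_kernel_size=31,
)
features = torch.rand(10, 300, 80)  # (batch, frames, input_dim)
lengths = torch.randint(1, 300, (10,))
output, output_lengths = conformer(features, lengths)
```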
-
- 25 Oct, 2021 1 commit
moto authored
-
- 16 Oct, 2021 1 commit
moto authored
-
- 15 Oct, 2021 1 commit
moto authored
Future work items:
- length computation of GriffinLim
- better way to make InverseMelScale work in inference_mode
-
- 06 Oct, 2021 1 commit
hwangjeff authored
Adds an implementation of Emformer, a memory-efficient transformer architecture introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency streaming speech recognition applications.
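A short usage sketch of the batched (full-utterance) path as exposed in `torchaudio.models`; shapes and hyperparameters are arbitrary examples and exact argument names are assumptions. Streaming decoding instead feeds the model segment by segment through its `infer` method.

```python
# Usage sketch of torchaudio.models.Emformer; values are arbitrary examples.
import torch
from torchaudio.models import Emformer

emformer = Emformer(
    input_dim=512,
    num_heads=8,
    ffn_dim=2048,
    num_layers=20,
    segment_length=4,                # frames processed per streaming segment
    right_context_length=1,          # look-ahead frames appended to each segment
)
inputs = torch.rand(128, 400, 512)   # (batch, frames, input_dim)
lengths = torch.randint(1, 200, (128,))
output, output_lengths = emformer(inputs, lengths)
```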
-
- 05 Oct, 2021 1 commit
moto authored
-
- 28 Sep, 2021 1 commit
moto authored
This commit adds the following HuBERT model architectures:
- `base` (pre-training)
- `large` (pre-training / fine-tuning)
- `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as `Wav2Vec2Model`'s, it reuses the existing modules. With these models, it is possible to:
- import the pre-trained model published by `fairseq` and TorchScript it.
- fine-tune the existing model for a downstream task.
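A minimal sketch, assuming the factory functions are exposed as `torchaudio.models.hubert_base` / `hubert_large` / `hubert_xlarge` and share the `Wav2Vec2Model` feature-extraction interface:

```python
# Minimal sketch; factory names and the extract_features interface follow the
# Wav2Vec2Model family described above, exact signatures may differ.
import torch
import torchaudio

model = torchaudio.models.hubert_base()    # pre-training architecture
waveforms = torch.randn(1, 16000)          # one second of 16 kHz audio
features, lengths = model.extract_features(waveforms)
print(len(features), features[-1].shape)   # per-layer transformer outputs
```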
-
- 20 Sep, 2021 1 commit
nateanl authored
-
- 12 Aug, 2021 1 commit
yangarbiter authored
-
- 20 Jul, 2021 1 commit
yangarbiter authored
Porting Tacotron2 from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/tacotron2/model.py
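A hedged usage sketch of the ported model via `torchaudio.models.Tacotron2`; the default constructor configuration, the symbol count of 148, and the `infer` signature are assumptions for illustration:

```python
# Hypothetical usage sketch of the ported Tacotron2; defaults and shapes are assumptions.
import torch
from torchaudio.models import Tacotron2

tacotron2 = Tacotron2()                   # default configuration
tokens = torch.randint(0, 148, (1, 50))   # encoded text symbols, (batch, max_length)
token_lengths = torch.tensor([50])
with torch.no_grad():
    mel, mel_lengths, alignments = tacotron2.infer(tokens, token_lengths)
```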
-
- 03 Jun, 2021 1 commit
moto authored
* Use `bibtex` for paper citations.
* Add `override.css` for fixing back reference.
* wav2vec2
* wav2letter
* convtasnet
* deepspeech
* rnnt-loss
* griffinlim
* Fix broken references in `filtering`.
* Fix note in soundfile backends.
* Tweak wav2vec2 example.
* Remove unused `pytorch_theme.css`.
-