- 27 Jun, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR adds two dataset classes for the VoxCeleb1 corpus.
- `VoxCeleb1Identification`: each data sample contains the waveform, sample rate, speaker ID, and file ID.
- `VoxCeleb1Verification`: each data sample contains a pair of waveforms, the sample rate, a label indicating whether they are from the same speaker, and the file IDs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2349 Reviewed By: carolineechen Differential Revision: D35927921 Pulled By: nateanl fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449
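A minimal usage sketch of the two classes, assuming the VoxCeleb1 data has been prepared under `./data`; the sample tuple layouts follow the description above, and exact argument names are assumptions:

```python
# Minimal usage sketch; assumes VoxCeleb1 has been downloaded/prepared under ./data.
# Argument names follow the torchaudio.datasets API described above and may differ slightly.
import torchaudio

ident = torchaudio.datasets.VoxCeleb1Identification("./data", subset="train")
waveform, sample_rate, speaker_id, file_id = ident[0]

verif = torchaudio.datasets.VoxCeleb1Verification("./data")
wave1, wave2, sample_rate, label, file_id1, file_id2 = verif[0]
print(label)  # 1 if both utterances come from the same speaker, 0 otherwise
```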
-
- 21 Jun, 2022 1 commit
Sean Kim authored
Summary: Create a dataset handler and tests for the new dataset. Manually tested and unit tested to verify validity. Pre-commit was run for style checks. Pull Request resolved: https://github.com/pytorch/audio/pull/2484 Reviewed By: carolineechen, nateanl Differential Revision: D37250556 Pulled By: skim0514 fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51
-
- 20 Jun, 2022 1 commit
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2480 Reviewed By: nateanl Differential Revision: D37249571 Pulled By: carolineechen fbshipit-source-id: caefeec4253c91f2579655a0c1735edaeed51be9
-
- 10 May, 2022 2 commits
hwangjeff authored
Summary: Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with a convolution block) described in https://arxiv.org/abs/2110.05241. Continuation of https://github.com/pytorch/audio/issues/2324. Pull Request resolved: https://github.com/pytorch/audio/pull/2358 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D36137992 Pulled By: hwangjeff fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
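For orientation, the convolution block in this family of models typically follows the Conformer recipe (pointwise conv with GLU gating, depthwise conv, normalization, activation, pointwise conv). A rough illustrative sketch of that pattern, not the torchaudio ConvEmformer code:

```python
# Illustrative Conformer-style convolution block; kernel size, norm choice, and
# symmetric padding are assumptions, not the actual torchaudio implementation.
import torch
from torch import nn


class ConvBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 31):
        super().__init__()
        self.layer_norm = nn.LayerNorm(dim)
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, kernel_size=1)  # expand for GLU
        self.depthwise = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.norm = nn.BatchNorm1d(dim)
        self.activation = nn.SiLU()
        self.pointwise2 = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); residual connection around the whole block
        y = self.layer_norm(x).transpose(1, 2)            # (batch, dim, time)
        y = nn.functional.glu(self.pointwise1(y), dim=1)  # gated linear unit
        y = self.pointwise2(self.activation(self.norm(self.depthwise(y))))
        return x + y.transpose(1, 2)
```

In a streaming model the depthwise convolution would use causal (left-only) padding so no future frames are consumed; symmetric padding is used here only for brevity.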
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371 Reviewed By: xiaohui-zhang Differential Revision: D36246167 Pulled By: carolineechen fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
-
- 08 Apr, 2022 1 commit
moto authored
Summary: Add badges of supported properties and devices to functionals and transforms. This commit adds `.. devices::` and `.. properties::` directives to Sphinx. APIs with these directives get badges (based on shields.io) that link to the page describing those features. Continuation of https://github.com/pytorch/audio/issues/2316. Dtypes are excluded for a future improvement, and badges were added to most functionals/transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2321 Reviewed By: hwangjeff Differential Revision: D35489063 Pulled By: mthrok fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
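As a hypothetical illustration of what a badge-emitting directive can look like, here is a minimal Sphinx extension that turns `.. devices:: CPU CUDA` into shields.io image nodes; the class name, badge URL format, and option handling are assumptions, not the actual torchaudio docs code:

```python
# Hypothetical sketch of a badge-emitting Sphinx directive (not the torchaudio code).
from docutils import nodes
from docutils.parsers.rst import Directive


class DevicesDirective(Directive):
    """Render one shields.io badge per device name, e.g. ``.. devices:: CPU CUDA``."""

    required_arguments = 1
    final_argument_whitespace = True  # allow multiple space-separated names

    def run(self):
        badges = []
        for name in self.arguments[0].split():
            uri = f"https://img.shields.io/badge/device-{name}-blue"
            badges.append(nodes.image(uri=uri, alt=f"Supported device: {name}"))
        return badges


def setup(app):
    app.add_directive("devices", DevicesDirective)
    return {"parallel_read_safe": True}
```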
-
- 24 Mar, 2022 1 commit
Caroline Chen authored
Summary: rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
-
- 25 Feb, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR adds the ``mvdr_weights_souden`` method to ``torchaudio.functional``. It computes the MVDR weight matrix based on the solution proposed by [``Souden et al.``](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf). The input arguments are the complex-valued power spectral density (PSD) matrix of the target speech, the PSD matrix of the noise, and an int or one-hot Tensor indicating the reference channel. Pull Request resolved: https://github.com/pytorch/audio/pull/2228 Reviewed By: mthrok Differential Revision: D34474018 Pulled By: nateanl fbshipit-source-id: 725df812f8f6e6cc81cc37e8c3cb0da2ab3b74fb
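For reference, the Souden solution is w = (Φ_n⁻¹ Φ_s / trace(Φ_n⁻¹ Φ_s)) u, where u is the one-hot reference-channel vector. A rough illustrative sketch of that computation (not the torchaudio implementation; the (..., freq, channel, channel) layout is an assumption):

```python
# Illustrative sketch of the Souden MVDR weights (not the torchaudio code).
# Assumed layout: psd_s, psd_n are complex tensors of shape (..., freq, channel, channel).
import torch


def mvdr_weights_souden_sketch(psd_s, psd_n, reference_channel, eps=1e-8):
    numerator = torch.linalg.solve(psd_n, psd_s)           # Phi_n^{-1} Phi_s
    trace = numerator.diagonal(dim1=-2, dim2=-1).sum(-1)   # per-frequency trace
    scaled = numerator / (trace[..., None, None] + eps)
    if isinstance(reference_channel, int):
        return scaled[..., reference_channel]              # pick the reference column
    # one-hot reference channel: weighted sum over columns
    return torch.einsum("...cr,r->...c", scaled, reference_channel.to(scaled.dtype))
```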
-
- 23 Dec, 2021 1 commit
hwangjeff authored
Summary: Adds an implementation of the Conformer module. Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770. Pull Request resolved: https://github.com/pytorch/audio/pull/2068 Reviewed By: mthrok Differential Revision: D33236957 Pulled By: hwangjeff fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6
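A short usage sketch of the module as exposed in `torchaudio.models`; hyperparameter values are arbitrary examples, and exact argument names are assumptions:

```python
# Usage sketch of torchaudio.models.Conformer; values are arbitrary examples.
import torch
from torchaudio.models import Conformer

conformer = Conformer(
    input_dim=80,                   # e.g. 80-dim mel filterbank features
    num_heads=4,
    ffn_dim=128,
    num_layers=4,
    depthwise_conv_kernel_size=31,
)
features = torch.rand(10, 300, 80)  # (batch, frames, input_dim)
lengths = torch.randint(1, 300, (10,))
output, output_lengths = conformer(features, lengths)
```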
-
- 25 Oct, 2021 1 commit
moto authored
-
- 16 Oct, 2021 1 commit
moto authored
-
- 15 Oct, 2021 1 commit
moto authored
Future work items:
- length computation of GriffinLim
- better way to make InverseMelScale work in inference_mode
-
- 06 Oct, 2021 1 commit
hwangjeff authored
Adds an implementation of Emformer, a memory-efficient transformer architecture introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency streaming speech recognition applications.
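A short usage sketch of the batched (full-utterance) path as exposed in `torchaudio.models`; shapes and hyperparameters are arbitrary examples and exact argument names are assumptions. Streaming decoding instead feeds the model segment by segment through its `infer` method.

```python
# Usage sketch of torchaudio.models.Emformer; values are arbitrary examples.
import torch
from torchaudio.models import Emformer

emformer = Emformer(
    input_dim=512,
    num_heads=8,
    ffn_dim=2048,
    num_layers=20,
    segment_length=4,                # frames processed per streaming segment
    right_context_length=1,          # look-ahead frames appended to each segment
)
inputs = torch.rand(128, 400, 512)   # (batch, frames, input_dim)
lengths = torch.randint(1, 200, (128,))
output, output_lengths = emformer(inputs, lengths)
```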
-
- 05 Oct, 2021 1 commit
moto authored
-
- 28 Sep, 2021 1 commit
moto authored
This commit adds the following HuBERT model architectures:
- `base` (pre-training)
- `large` (pre-training / fine-tuning)
- `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as `Wav2Vec2Model`'s, it reuses the existing modules. With these models, it is possible to:
- import the pre-trained model published by `fairseq` and TorchScript it.
- fine-tune the existing model for a downstream task.
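A minimal sketch, assuming the factory functions are exposed as `torchaudio.models.hubert_base` / `hubert_large` / `hubert_xlarge` and share the `Wav2Vec2Model` feature-extraction interface:

```python
# Minimal sketch; factory names and the extract_features interface follow the
# Wav2Vec2Model family described above, exact signatures may differ.
import torch
import torchaudio

model = torchaudio.models.hubert_base()    # pre-training architecture
waveforms = torch.randn(1, 16000)          # one second of 16 kHz audio
features, lengths = model.extract_features(waveforms)
print(len(features), features[-1].shape)   # per-layer transformer outputs
```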
-
- 20 Sep, 2021 1 commit
nateanl authored
-
- 12 Aug, 2021 1 commit
yangarbiter authored
-
- 20 Jul, 2021 1 commit
yangarbiter authored
Porting Tacotron2 from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/tacotron2/model.py
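A hedged usage sketch of the ported model via `torchaudio.models.Tacotron2`; the default constructor configuration, the symbol count of 148, and the `infer` signature are assumptions for illustration:

```python
# Hypothetical usage sketch of the ported Tacotron2; defaults and shapes are assumptions.
import torch
from torchaudio.models import Tacotron2

tacotron2 = Tacotron2()                   # default configuration
tokens = torch.randint(0, 148, (1, 50))   # encoded text symbols, (batch, max_length)
token_lengths = torch.tensor([50])
with torch.no_grad():
    mel, mel_lengths, alignments = tacotron2.infer(tokens, token_lengths)
```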
-
- 03 Jun, 2021 1 commit
moto authored
* Use `bibtex` for paper citations.
* Add `override.css` for fixing back reference.
* wav2vec2
* wav2letter
* convtasnet
* deepspeech
* rnnt-loss
* griffinlim
* Fix broken references in `filtering`.
* Fix note in soundfile backends.
* Tweak wav2vec2 example.
* Remove unused `pytorch_theme.css`.
-