- 06 Oct, 2021 4 commits
-
-
kingyiusuen authored
-
moto authored
Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53
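A minimal sketch of loading one of these weights, assuming they are exposed through the `torchaudio.pipelines` bundle API (the bundle name `WAV2VEC2_BASE` is one plausible entry point, not confirmed by this message):

```python
import torch
import torchaudio

# Assumption: the fairseq-pretrained Base weights are exposed as a pipeline bundle.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model()

# One second of dummy audio at the bundle's expected sample rate.
waveform = torch.randn(1, bundle.sample_rate)
with torch.inference_mode():
    features, _ = model.extract_features(waveform)  # list of per-layer features
```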
-
hwangjeff authored
Adds an implementation of Emformer, a memory-efficient transformer architecture introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency streaming speech recognition applications.
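A minimal usage sketch, assuming the module is exposed as `torchaudio.models.Emformer`; the hyperparameters and tensor shapes below are illustrative, not the paper's configuration:

```python
import torch
from torchaudio.models import Emformer

# Illustrative hyperparameters for a small Emformer stack.
emformer = Emformer(
    input_dim=512,
    num_heads=8,
    ffn_dim=2048,
    num_layers=20,
    segment_length=4,
    right_context_length=1,
)

# Batch of 10 padded feature sequences: (batch, num_frames, feature_dim).
inputs = torch.rand(10, 400, 512)
lengths = torch.randint(1, 200, (10,))

# Non-streaming (full-utterance) forward pass.
output, output_lengths = emformer(inputs, lengths)
```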
-
moto authored
This commit adds:
- HUBERT_LARGE
- HUBERT_XLARGE
- HUBERT_ASR_XLARGE
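A sketch of loading the fine-tuned ASR weights, assuming they are exposed as `torchaudio.pipelines` bundles matching the names above:

```python
import torch
import torchaudio

# Assumption: HUBERT_ASR_XLARGE is a pipeline bundle wrapping the fine-tuned weights.
bundle = torchaudio.pipelines.HUBERT_ASR_XLARGE
model = bundle.get_model()

waveform = torch.randn(1, bundle.sample_rate)
with torch.inference_mode():
    emissions, _ = model(waveform)  # frame-level logits over bundle.get_labels()
```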
-
- 05 Oct, 2021 2 commits
- 29 Sep, 2021 1 commit
-
-
moto authored
* Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`

In #1783, we split the wav2vec2 factory functions into ones for pre-training models and ones for fine-tuning models (pre-training model + extra Linear module). I picked the name scheme `wav2vec2_asr_ARCH` for the fine-tuning factory functions, but it did not feel right, because the architecture code is more generic: even though the resulting architecture was used for ASR fine-tuning in the paper, it does not have to be used for ASR. This became more evident as we added support for pre-trained parameters, such as #1799. The task and dataset a model was trained on matter for the weight files, not for the factory functions, so the ASR task is not relevant there. Therefore the functions are renamed by replacing `_asr_` with `_ft_` (fine-tuning). A sketch of the rename follows below. Note: since the renamed functions have not been released yet, this PR itself is not BC-breaking.
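The function names below come from this message and #1783; the output-dimension argument (`num_out`) is an assumption for illustration and may not match the released API:

```python
from torchaudio import models

# Before this change (from #1783): fine-tuning architecture (encoder + Linear head).
# model = models.wav2vec2_asr_base(num_out=32)   # old, task-specific name

# After this change: the same architecture under a task-neutral name.
model = models.wav2vec2_ft_base(num_out=32)      # num_out is illustrative
```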
-
- 28 Sep, 2021 1 commit
-
-
moto authored
This commit adds the following HuBERT model architectures:
- `base` (pre-training)
- `large` (pre-training / fine-tuning)
- `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as `Wav2Vec2Model`, it reuses the existing modules. With these models, it is possible to
- import a pre-trained model published by `fairseq` and TorchScript it.
- fine-tune the existing model for a downstream task.
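A sketch of constructing and scripting one of these architectures, assuming factory functions such as `hubert_base` under `torchaudio.models`:

```python
import torch
import torchaudio

# Assumption: hubert_base() returns an untrained Wav2Vec2Model-compatible module.
model = torchaudio.models.hubert_base()

# The message notes the model can be TorchScript-ed (e.g. after importing fairseq weights).
scripted = torch.jit.script(model)

waveform = torch.randn(1, 16000)
with torch.inference_mode():
    features, _ = scripted(waveform)  # encoder output, no task-specific head
```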
-
- 24 Sep, 2021 1 commit
-
-
moto authored
* [BC-Breaking] Split pretraining and finetuning factory functions

Previously, the wav2vec2 factory functions only generated the fine-tuning architecture used in the wav2vec2 paper for the ASR task, i.e. the pre-training architecture plus a Linear module, and there was no straightforward way to generate architectures for pre-training. The goal of the original implementation was to allow inference of wav2vec2 in non-Python environments via TorchScript. Now we would like to expand it to pre-training/fine-tuning and to the HuBERT model as well, so we need factory functions for both pre-training and fine-tuning. This commit introduces new factory functions and separates pre-training from fine-tuning (see the sketch after this list).

1. New functions for ASR fine-tuning. We introduce `wav2vec2_asr_XXX` functions, which generate the architecture used for the fine-tuning task in the wav2vec2 paper. *1
2. Re-purpose the old functions. The existing functions, `wav2vec2_XXX`, now generate the architecture with the pre-training module only (no Linear module).

Note *1: This architecture is just one way to define an architecture for fine-tuning; it is not a universal definition. The new `wav2vec2_asr_XXX` functions are designed to provide this specific fine-tuning configuration and are not meant to support generic architectures for downstream tasks.
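A sketch of the resulting split, using the function names from this message; the exact constructor arguments (e.g. the output-dimension parameter of the ASR variants) are assumptions for illustration:

```python
import torchaudio.models as models

# Pre-training architecture only: no Linear module on top of the encoder.
pretrain_model = models.wav2vec2_base()

# ASR fine-tuning architecture from the wav2vec2 paper:
# the pre-training architecture plus a Linear projection to the label set.
asr_model = models.wav2vec2_asr_base(num_out=32)  # num_out is illustrative
```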
-
- 20 Sep, 2021 1 commit
-
-
nateanl authored
-
- 17 Sep, 2021 1 commit
-
-
moto authored
-
- 01 Sep, 2021 1 commit
-
-
yangarbiter authored
-
- 26 Aug, 2021 1 commit
-
-
nateanl authored
-
- 23 Aug, 2021 1 commit
-
-
yangarbiter authored
-
- 20 Aug, 2021 2 commits
-
-
Caroline Chen authored
-
hwangjeff authored
* Add basic filtfilt implementation
* Add filtfilt to functional package; add tests

Co-authored-by: V G <vladislav.goncharenko@phystech.edu>
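A minimal sketch of zero-phase filtering with the new functional, assuming it mirrors `lfilter`'s `(waveform, a_coeffs, b_coeffs)` argument order; the coefficient values are illustrative:

```python
import torch
import torchaudio.functional as F

waveform = torch.randn(1, 16000)

# Illustrative first-order IIR coefficients.
b_coeffs = torch.tensor([0.3, 0.3])   # numerator
a_coeffs = torch.tensor([1.0, -0.4])  # denominator

# filtfilt applies the filter forward and then backward, cancelling phase distortion.
filtered = F.filtfilt(waveform, a_coeffs, b_coeffs)
```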
-
- 19 Aug, 2021 1 commit
-
-
Caroline Chen authored
-
- 18 Aug, 2021 1 commit
-
-
yangarbiter authored
-
- 14 Aug, 2021 1 commit
-
-
nateanl authored
-
- 12 Aug, 2021 1 commit
-
-
yangarbiter authored
-
- 02 Aug, 2021 2 commits
-
-
yangarbiter authored
-
Joel Frank authored
- Renamed torchaudio.functional.create_fb_matrix to torchaudio.functional.melscale_fbanks.
- Added an interface for create_fb_matrix that emits a warning.
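A sketch of the renamed helper; the parameter values below are illustrative:

```python
import torch
import torchaudio.functional as F

# melscale_fbanks returns a (n_freqs, n_mels) filter bank matrix.
fbanks = F.melscale_fbanks(
    n_freqs=201,        # e.g. n_fft // 2 + 1 for n_fft = 400
    f_min=0.0,
    f_max=8000.0,
    n_mels=128,
    sample_rate=16000,
)

# Project a magnitude spectrogram (..., freq, time) onto the mel scale.
spectrogram = torch.rand(1, 201, 100)
mel_spectrogram = spectrogram.transpose(-1, -2) @ fbanks  # (..., time, n_mels)
```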
-
- 31 Jul, 2021 1 commit
-
-
Nikita Shulga authored
-
- 29 Jul, 2021 1 commit
-
-
Joel Frank authored
Summary:
- Add linear_fbank method
- Add LFCC in transforms
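A minimal sketch of the new transform, assuming it is exposed as `torchaudio.transforms.LFCC`; the parameter values are illustrative:

```python
import torch
import torchaudio.transforms as T

# LFCC uses a linear-frequency filter bank (cf. the new linear_fbank helper)
# instead of the mel-scale one used by MFCC.
lfcc = T.LFCC(sample_rate=16000, n_lfcc=20)

waveform = torch.randn(1, 16000)
lfcc_features = lfcc(waveform)  # (channel, n_lfcc, time)
```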
-
- 20 Jul, 2021 2 commits
-
-
yangarbiter authored
-
yangarbiter authored
Porting Tacotron2 from https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/tacotron2/model.py
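A sketch of running inference with the ported model, assuming it is exposed as `torchaudio.models.Tacotron2` with default hyperparameters; the token values are arbitrary placeholders for encoded text:

```python
import torch
from torchaudio.models import Tacotron2

model = Tacotron2()
model.eval()

# Arbitrary placeholder for a batch of one encoded text sequence.
tokens = torch.randint(0, 148, (1, 50))
lengths = torch.tensor([50], dtype=torch.int32)

with torch.inference_mode():
    mel_specgram, mel_lengths, alignments = model.infer(tokens, lengths)
```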
-
- 16 Jul, 2021 1 commit
-
-
nateanl authored
-
- 04 Jun, 2021 2 commits
- 03 Jun, 2021 1 commit
-
-
moto authored
* Use `bibtex` for paper citations.
* Add `override.css` for fixing back references.
* wav2vec2
* wav2letter
* convtasnet
* deepspeech
* rnnt-loss
* griffinlim
* Fix broken references in `filtering`.
* Fix note in soundfile backends.
* Tweak wav2vec2 example.
* Remove unused `pytorch_theme.css`.
-
- 02 Jun, 2021 1 commit
-
-
Caroline Chen authored
-
- 01 Jun, 2021 1 commit
-
-
moto authored
-
- 27 May, 2021 2 commits
- 11 May, 2021 1 commit
-
-
discort authored
Co-authored-by: Vincent Quenneville-Belair <vincentqb@gmail.com>
-
- 30 Apr, 2021 1 commit
-
-
Caroline Chen authored
Replace the prototype RNNT implementation (using warp-transducer) with one without external library dependencies
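A sketch of using the dependency-free loss, assuming it is exposed as `torchaudio.transforms.RNNTLoss`; shapes follow the usual RNN-T convention and the values are illustrative:

```python
import torch
import torchaudio.transforms as T

rnnt_loss = T.RNNTLoss(blank=0)

# (batch, max_logit_length, max_target_length + 1, num_classes)
logits = torch.randn(2, 10, 6, 5, requires_grad=True)
targets = torch.randint(1, 5, (2, 5), dtype=torch.int32)       # (batch, max_target_length)
logit_lengths = torch.tensor([10, 10], dtype=torch.int32)
target_lengths = torch.tensor([5, 5], dtype=torch.int32)

loss = rnnt_loss(logits, targets, logit_lengths, target_lengths)
loss.backward()
```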
-
- 21 Apr, 2021 1 commit
-
-
Nicolas Hug authored
-
- 22 Mar, 2021 1 commit
-
-
Caroline Chen authored
This PR additionally adds batching to the Kaldi-compliance resample interface.
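A sketch of the batched call, assuming the interface in question is `torchaudio.compliance.kaldi.resample_waveform` and that the leading dimension is treated as a batch:

```python
import torch
import torchaudio.compliance.kaldi as kaldi

# Four one-second clips at 16 kHz, resampled to 8 kHz in a single call.
batch = torch.randn(4, 16000)
resampled = kaldi.resample_waveform(batch, orig_freq=16000.0, new_freq=8000.0)
```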
-
- 12 Mar, 2021 1 commit
-
-
Rahul Amaram authored
-
- 05 Mar, 2021 1 commit
-
-
Isaac Seessel authored
-