Commits · 0582e73ce782f5003f520d69ba286c31ab5aab90 · OpenDAS / Torchaudio

08 Oct, 2021 1 commit

Make the core wav2vec2 factory function public (#1829) · 0582e73c

moto authored Oct 06, 2021

This commit makes the following changes:
1. Make the factory function with full customizability public.
    i.e. `_get_model(...) -> wav2vec2_model(...)`.
2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout).
    i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
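
The two changes can be sketched with a toy, standard-library-only example. This is a minimal illustration of the pattern, not the torchaudio implementation: `ModelConfig`, `build_model`, and `build_base` are hypothetical names (standing in for the roles of `wav2vec2_model` and `wav2vec2_base`), and the numeric values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical stand-in for a constructed model; it only records its settings."""
    num_layers: int
    embed_dim: int
    projection_dropout: float
    attention_dropout: float


def build_model(num_layers: int, embed_dim: int,
                projection_dropout: float, attention_dropout: float) -> ModelConfig:
    """General factory: every option is exposed to the caller."""
    return ModelConfig(num_layers, embed_dim, projection_dropout, attention_dropout)


def build_base(projection_dropout: float = 0.1,
               attention_dropout: float = 0.1) -> ModelConfig:
    """Architecture-specific factory: the architecture is fixed (illustrative
    values), while options unrelated to it, such as dropout, stay adjustable."""
    return build_model(
        num_layers=12,
        embed_dim=768,
        projection_dropout=projection_dropout,
        attention_dropout=attention_dropout,
    )


# Override one dropout value; the architecture itself is unchanged.
model = build_base(attention_dropout=0.0)
print(model)
```

The architecture-specific function only forwards to the general one, so anything it can build is also reachable through the general factory.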

### Why?

While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights keeps the code organization simple, thanks to a clean separation of concerns. As mentioned in #1821, in this framework,
  1. Model implementation is responsible for computation logic,
  2. factory functions are responsible for customizability and model construction,
  3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).

(note: for simple models, combining 1 and 2 is also okay.)
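
The three responsibilities above can be sketched as follows. This is a toy example under stated assumptions, not torchaudio code: `TinyModel`, `tiny_model`, and `PretrainedBundle` are hypothetical names, and the "weights" are a plain dictionary.

```python
from typing import Dict, List, Tuple


class TinyModel:
    """Layer 1: computation logic only (hypothetical toy model)."""

    def __init__(self, scale: float, dropout: float) -> None:
        self.scale = scale
        self.dropout = dropout  # recorded for illustration; not used in forward

    def load_state(self, state: Dict[str, float]) -> None:
        self.scale = state["scale"]

    def forward(self, xs: List[float]) -> List[float]:
        return [self.scale * x for x in xs]


def tiny_model(scale: float = 1.0, dropout: float = 0.1) -> TinyModel:
    """Layer 2: public factory that owns construction and customization."""
    return TinyModel(scale=scale, dropout=dropout)


class PretrainedBundle:
    """Layer 3: pairs weights with the information needed to use them."""

    def __init__(self, state: Dict[str, float], labels: Tuple[str, ...]) -> None:
        self._state = state
        self.labels = labels

    def get_model(self, dropout: float = 0.0) -> TinyModel:
        # Only the public factory is used; no private helper is imported.
        model = tiny_model(dropout=dropout)
        model.load_state(self._state)
        return model


bundle = PretrainedBundle(state={"scale": 2.0}, labels=("a", "b"))
model = bundle.get_model()
print(model.forward([1.0, 2.5]), bundle.labels)
```

Because the bundle goes through the public factory, the factory has to expose every option the bundle needs, which is the constraint discussed in this commit message.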

This means that factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.

This PR rectifies it by making the mother factory function public.
This also clarifies the purpose of having the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. So this commit also adds supplemental parameters to them.

0582e73c

06 Oct, 2021 3 commits
- Add pretrained weights from wav2vec2.0 and XLSR papers (#1827) · 5b1cd9a6
  moto authored Oct 06, 2021
```
Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53
```
  5b1cd9a6
- Add the rest of HuBERT pretrained models (#1824) · 384e4471
  moto authored Oct 05, 2021
```
This commit adds
- HUBERT_LARGE
- HUBERT_XLARGE
- HUBERT_ASR_XLARGE
```
  384e4471
- Add HUBERT_BASE and HUBERT_ASR_LARGE pretrained models (#1821) · 38c5b10f
  moto authored Oct 05, 2021
  
  38c5b10f
05 Oct, 2021 2 commits

Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH` (#1804) · dacd3fd4

moto authored Sep 29, 2021

* Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`

In #1783, we split the factory functions of wav2vec2 into ones for pretraining models
and ones for fine-tuning models (pretraining model + extra Linear module).

I picked the naming scheme `wav2vec2_asr_ARCH` for factory functions of fine-tuning models,
but it did not feel right, because the architecture code is more generic.
Even though the resulting model architecture was used for ASR fine-tuning in the paper,
it does not have to be ASR.
This became more evident as we added pre-trained parameter support, such as #1799.
The task and the dataset used for training matter for the weight files.
For the factory functions, the ASR task is not relevant.

Therefore, this commit renames the functions by replacing `_asr_` with `_ft_` (fine-tuning).

Note: Since the new functions are not released yet, this PR itself is not BC-breaking.

dacd3fd4

Add HuBERT model architectures (#1769) · 7438f325

moto authored Sep 28, 2021

This commit adds the following HuBERT model architectures

 - `base` (pre-training)
 - `large` (pre-training / fine-tuning)
 - `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as those of `Wav2Vec2Model`, it reuses the existing modules.
With these models, it is possible to
- import the pre-trained model published by `fairseq` and TorchScript it.
- fine-tune the existing model for a downstream task.

7438f325

24 Sep, 2021 1 commit

[BC-Breaking] Split pretraining and finetuning factory functions (#1783) · b2e9f1e4

moto authored Sep 24, 2021

* [BC-Breaking] Split pretraining and finetuning factory functions

Previously, the factory functions of wav2vec2 only generated the fine-tuning
architecture used in the wav2vec2 paper for the ASR task.
That is, pre-training architecture + Linear module, and they did not
provide a straightforward way to generate architectures for pre-training.

The goal of the original implementation was to allow the inference of
wav2vec2 in non-Python environments via TorchScript. Now we would like to
expand it to pre-training/fine-tuning and the HuBERT model as well.

Therefore, we need to have factory functions for both pre-training and
fine-tuning. This commit introduces new factory functions and separates
the functions for pre-training and fine-tuning.

1. New functions for ASR fine-tuning.

We introduce `wav2vec2_asr_XXX` functions, which generate the architecture
used for the fine-tuning task in the wav2vec2 paper. *1

2. Re-purpose the old functions

The existing functions, `wav2vec2_XXX`, now generate the architecture with
the pre-training module only (no Linear module).

Note
*1 This architecture is just one way to define an architecture for fine-tuning,
and it is not a universal definition. The new `wav2vec2_asr_XXX` functions are
designed to provide this specific fine-tuning configuration, and they are not
meant to support generic architectures for downstream tasks.
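
The split between the two kinds of factory functions can be sketched with a toy example. This is only an illustration of "pre-training architecture" versus "pre-training architecture + Linear module": `ToyModel`, `toy_encoder`, `toy_pretrain`, and `toy_finetune` are hypothetical names, and the linear head uses fixed all-ones weights purely to keep the arithmetic easy to follow.

```python
from typing import Callable, List, Optional

Vector = List[float]


class ToyModel:
    """Hypothetical model: an encoder followed by an optional linear head."""

    def __init__(self, encoder: Callable[[Vector], Vector],
                 head: Optional[List[Vector]] = None) -> None:
        self.encoder = encoder
        self.head = head  # weight matrix, one row per output class

    def forward(self, x: Vector) -> Vector:
        features = self.encoder(x)
        if self.head is None:
            return features  # pre-training architecture: features only
        # Fine-tuning architecture: project features onto the output classes.
        return [sum(w * f for w, f in zip(row, features)) for row in self.head]


def toy_encoder(x: Vector) -> Vector:
    """Stand-in for the shared pre-training module."""
    return [2.0 * v for v in x]


def toy_pretrain() -> ToyModel:
    """Pre-training factory: no Linear module."""
    return ToyModel(toy_encoder)


def toy_finetune(num_out: int, feature_dim: int = 2) -> ToyModel:
    """Fine-tuning factory: same encoder plus a Linear module with `num_out` outputs."""
    head = [[1.0] * feature_dim for _ in range(num_out)]
    return ToyModel(toy_encoder, head)


print(toy_pretrain().forward([1.0, 3.0]))           # encoder features
print(toy_finetune(num_out=3).forward([1.0, 3.0]))  # one value per output class
```

Both factories share the same encoder; only the presence of the extra head differs, which is why the two families of functions can live side by side.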

b2e9f1e4

17 Sep, 2021 1 commit
- [DOC] Fix model subsections (#1775) · 88ca1e05
  moto authored Sep 17, 2021
  
  88ca1e05
23 Aug, 2021 1 commit
- Refactor WaveRNN infer and move it to the codebase (#1704) · 3bb5feb5
  yangarbiter authored Aug 23, 2021
  
  3bb5feb5
18 Aug, 2021 1 commit
- Move Tacotron2 out of prototype (#1714) · 352d63c5
  yangarbiter authored Aug 17, 2021
  
  352d63c5
20 Jul, 2021 1 commit
- Add pretrained weights for wavernn (#1612) · 8ec6b873
  yangarbiter authored Jul 20, 2021
  
  8ec6b873
03 Jun, 2021 1 commit

Update docs (#1550) · 0166a851

moto authored Jun 03, 2021

* Use `bibtex` for paper citations.
  * add `override.css` for fixing back reference.
  * wav2vec2
  * wav2letter
  * convtasnet
  * deepspeech
  * rnnt-loss
  * griffinlim
* Fix broken references in `filtering`.
* Fix note in soundfile backends.
* Tweak wav2vec2 example.
* Remove unused `pytorch_theme.css`.

0166a851

01 Jun, 2021 1 commit
- Add wav2vec2 fairseq importer (#1531) · f1a0b605
  moto authored Jun 01, 2021
  
  f1a0b605
27 May, 2021 2 commits

Add wav2vec2 HuggingFace importer (#1530) · c8239c64
moto authored May 27, 2021

c8239c64

Add wav2vec2.0 model (#1529) · e6886a4d

moto authored May 27, 2021

- TorchScript-able `Wav2Vec2Model` class
- Factory functions for three configurations presented in the paper 
  - `wav2vec2_base`
  - `wav2vec2_large`
  - `wav2vec2_large_lv60k`

e6886a4d

11 May, 2021 1 commit
- Add vanilla DeepSpeech model (#1399) · 1f136671
  discort authored May 12, 2021
```
Co-authored-by: Vincent Quenneville-Belair <vincentqb@gmail.com>
```
  1f136671
01 Oct, 2020 1 commit
- Update model documentation (#933) · 1df9e201
  moto authored Oct 01, 2020
  
  1df9e201
29 Jul, 2020 1 commit
- Add model name in docs (#836) · de1cb83d
  jimchen90 authored Jul 29, 2020
```
Co-authored-by: Ji Chen <jimchen90@devfair0160.h2.fair>
```
  de1cb83d
28 Apr, 2020 1 commit

Add model Wav2Letter (#462) · d678357f

Tomás Osório authored Apr 28, 2020

* add wav2letter model

* add unit_test to model

* add docstrings

* add documentation

* fix minor error, change logic on forward

* update padding same with ceil

* add inline typing and minor fixes to docstrings

* remove python2

* add formula to docstrings, change param name

* add test with mfcc, add pytest

* fix bug, update docstrings

* change parameter name

d678357f