1. 28 Sep, 2021 1 commit
    • Add HuBERT model architectures (#1769) · a7854f33
      moto authored
      This commit adds the following HuBERT model architectures
      
       - `base` (pre-training)
       - `large` (pre-training / fine-tuning)
       - `xlarge` (pre-training / fine-tuning)
      
      Since the internal components are the same as in `Wav2Vec2Model`, it reuses the existing modules.
      With these models, it is possible to
      - import the pre-trained models published by `fairseq` and compile them with TorchScript (see the sketch below).
      - fine-tune the existing models for downstream tasks.
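      A minimal sketch of building one of these architectures and scripting it, assuming
      the factory functions are exposed as `torchaudio.models.hubert_base` / `hubert_large`
      / `hubert_xlarge` and can be called without arguments:

          import torch
          import torchaudio

          # Build a randomly initialized HuBERT "base" pre-training architecture.
          # (Pre-trained weights, e.g. converted from fairseq, would be loaded separately.)
          model = torchaudio.models.hubert_base()
          model.eval()

          # The model is TorchScript-able, so it can be exported for non-Python inference.
          scripted = torch.jit.script(model)
          scripted.save("hubert_base.pt")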
  2. 25 Sep, 2021 1 commit
  3. 24 Sep, 2021 1 commit
    • [BC-Breaking] Split pretraining and finetuning factory functions (#1783) · b2e9f1e4
      moto authored
      * [BC-Breaking] Split pretraining and finetuning factory functions
      
      Previously, the wav2vec2 factory functions only generated the fine-tuning
      architecture used in the wav2vec2 paper for the ASR task, that is, the
      pre-training architecture plus a Linear module. They did not provide a
      straightforward way to generate architectures for pre-training.
      
      The goal of the original implementation was to allow inference of wav2vec2
      in non-Python environments via TorchScript. Now we would like to expand it
      to pre-training/fine-tuning and to the HuBERT model as well.
      
      Therefore, we need to have factory functions for both pre-training and
      fine-tuning. This commit introduces new factory functions and separate
      functions for pre-training and fine-tuning.
      
      1. New functions for ASR fine-tuning.
      
      We introduce `wav2vec2_asr_XXX` functions, which generate the architecture
      used for the fine-tuning task in the wav2vec2 paper. *1
      
      2. Re-purpose the old functions
      
      The existing functions, `wav2vec2_XXX`, now generate the architecture with
      the pre-training modules only (no Linear module).
      
      Note
      *1 This architecture is just one way to define an architecture for fine-tuning;
      it is not a universal definition. The new `wav2vec2_asr_XXX` functions are
      designed to provide these specific fine-tuning configurations, and they are not
      meant to support generic architectures for downstream tasks.
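      A minimal sketch of the resulting split, assuming the factory functions are
      exposed under `torchaudio.models` and that the ASR variants take the size of
      the Linear readout as an `aux_num_out` keyword (both names are assumptions here):

          import torchaudio

          # Pre-training architecture only: no Linear readout attached.
          pretrain_model = torchaudio.models.wav2vec2_base()

          # Fine-tuning (ASR) architecture: the pre-training architecture plus a
          # Linear readout producing, e.g., 29 output classes (characters + blank).
          asr_model = torchaudio.models.wav2vec2_asr_base(aux_num_out=29)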
  4. 22 Sep, 2021 1 commit
    • [BC-Breaking] Move fine-tune specific module out of wav2vec2 encoder (#1782) · 40f2a085
      moto authored
      Previously, the Linear module (called `readout`, which is used only for the ASR
      fine-tuning task) was placed in the encoder module. Conceptually, the encoder has
      nothing to do with a module specific to a fine-tuning / downstream task.
      
      The problems here are that:
      1. The encoder can also be used in the pre-training phase, in which such a module
      should not be present.
      2. The choice of the Linear module is arbitrary, and it is inconvenient for users
      to have a hard-coded module structure in the encoder.
      
      Therefore, this commit moves the Linear module out of the encoder and places it
      as the `aux` attribute of `Wav2Vec2Model`. (As a result, `Wav2Vec2Model` has
      `feature_extractor`, `encoder` and `aux` attributes.)
      
      An alternative approach is to define another module that holds `Wav2Vec2Model`
      and the aux module side by side, but that would introduce a new class we need
      to maintain.
      The expected use of `aux` is only for 1. loading the pre-trained parameters
      published by `fairseq` (and its variations from HF) and 2. creating the same model
      architectures for comparison experiments.
      The newly introduced class would not be general enough for downstream adaptations,
      where there will be a variety of more complicated models (e.g. s3prl).

      Therefore, following a minimalistic approach, we put them inside `Wav2Vec2Model`.
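      A small sketch of the resulting structure, assuming the `wav2vec2_asr_base` /
      `wav2vec2_base` factories and the `aux_num_out` keyword described above:

          import torchaudio

          # ASR fine-tuning architecture with a hypothetical 29-class readout.
          model = torchaudio.models.wav2vec2_asr_base(aux_num_out=29)

          # The three top-level attributes described above.
          print(type(model.feature_extractor).__name__)  # convolutional feature extractor
          print(type(model.encoder).__name__)            # transformer encoder
          print(model.aux)                               # Linear readout used for ASR fine-tuning

          # A pre-training model has no fine-tuning head, so `aux` is simply None.
          print(torchaudio.models.wav2vec2_base().aux)   # None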
  5. 20 Sep, 2021 1 commit
    • [BC-Breaking] Update `extract_features` of Wav2Vec2Model (#1776) · 78b08c26
      moto authored
      * [BC-Breaking] Update `extract_features` of Wav2Vec2Model
      
      Originally, the `extract_features` method returned the result from
      the convolutional feature extractor module.

      The features commonly used in downstream tasks, however, are the outputs of
      intermediate layers of the transformer block in the encoder.

      This commit updates the behavior of `extract_features` so that such features
      can be retrieved selectively.
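      A minimal usage sketch, assuming `extract_features` now takes an optional
      `num_layers` argument and returns the list of intermediate transformer outputs
      together with the (optional) valid lengths:

          import torch
          import torchaudio

          model = torchaudio.models.wav2vec2_base()
          model.eval()

          waveform = torch.randn(1, 16000)  # one second of dummy 16 kHz audio

          with torch.inference_mode():
              # Request the outputs of the first two transformer layers only.
              features, lengths = model.extract_features(waveform, num_layers=2)

          print(len(features))      # 2 (one tensor per requested layer)
          print(features[0].shape)  # (batch, frames, feature dim), e.g. (1, 49, 768)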
  6. 02 Sep, 2021 1 commit
  7. 14 Jun, 2021 1 commit
  8. 03 Jun, 2021 1 commit
    • Update docs (#1550) · 0166a851
      moto authored
      * Use `bibtex` for paper citations.
        * add `override.css` for fixing back reference.
        * wav2vec2
        * wav2letter
        * convtasnet
        * deepspeech
        * rnnt-loss
        * griffinlim
      * Fix broken references in `filtering`.
      * Fix note in soundfile backends.
      * Tweak wav2vec2 example.
      * Remove unused `pytorch_theme.css`.
  9. 27 May, 2021 1 commit
    • Add wav2vec2.0 model (#1529) · e6886a4d
      moto authored
      - TorchScript-able `Wav2Vec2Model` class
      - Factory functions for the three configurations presented in the paper (see the sketch below)
        - `wav2vec2_base`
        - `wav2vec2_large`
        - `wav2vec2_large_lv60k`
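      A small sketch instantiating the three configurations and comparing their sizes,
      assuming the factory functions above are exposed under `torchaudio.models` and
      can be called without arguments:

          import torchaudio

          for factory in (
              torchaudio.models.wav2vec2_base,
              torchaudio.models.wav2vec2_large,
              torchaudio.models.wav2vec2_large_lv60k,
          ):
              model = factory()
              n_params = sum(p.numel() for p in model.parameters())
              print(f"{factory.__name__}: {n_params / 1e6:.1f}M parameters")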