Commits · 49c48f931a1e349d354158bd58e223fde776beee · OpenDAS / Torchaudio

11 Oct, 2021 5 commits
- Fix the main loop of tacotron2 decoder inference (#1849) · 49c48f93
  moto authored Oct 11, 2021
```
To handle batched input properly.
```
  49c48f93
- Clean up constructor of CMUDict (#1852) · ab97afa0
  moto authored Oct 11, 2021
  
  ab97afa0
- Avoid concatenation in loop (#1850) · f18d01a0
  moto authored Oct 11, 2021
  
  f18d01a0
- Replace custom padding with torch's native impl (#1846) · 6321adcf
  moto authored Oct 10, 2021
  
  6321adcf
- Store n_bits in WaveRNN (#1847) · 498722b5
  moto authored Oct 10, 2021
```
Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
```
  498722b5
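
The `#classes -> #bits` bookkeeping can be sketched roughly as follows. This is a minimal illustration, assuming the number of classes is a power of two; the class name `TinyWaveRNN` is hypothetical and is not the torchaudio implementation.

```python
class TinyWaveRNN:
    """Hypothetical stand-in that only models the n_classes -> n_bits bookkeeping."""

    def __init__(self, n_classes: int) -> None:
        # Assumption: n_classes is a power of two, so the bit depth is exact.
        if n_classes < 2 or n_classes & (n_classes - 1):
            raise ValueError("n_classes must be a power of two >= 2")
        self.n_classes = n_classes
        # Computed once in the constructor so that other code can reuse it.
        # For a power of two, bit_length() - 1 equals log2(n_classes).
        self.n_bits = n_classes.bit_length() - 1


model = TinyWaveRNN(n_classes=256)
print(model.n_bits)
```

Storing the value on the instance means callers such as an inference wrapper can read `model.n_bits` instead of recomputing it from `n_classes`.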
09 Oct, 2021 1 commit
- Refactor WaveRNNInferenceWrapper (#1845) · 202bc4f2
  moto authored Oct 08, 2021
  
  202bc4f2
08 Oct, 2021 10 commits

Replace `text` with `token` in Tacotron2 API (#1844) · 9f9b6537
moto authored Oct 08, 2021

9f9b6537
Default pretrained weights to eval mode (#1843) · cb77a86c
moto authored Oct 08, 2021

cb77a86c
Update Tacotron2 docs (#1840) · 486022e9
hwangjeff authored Oct 08, 2021

486022e9
Rename utterance to transcript in datasets (#1841) · 9bbd4600
hwangjeff authored Oct 08, 2021

9bbd4600
Make `text_length` optional in `Tacotron2.infer` (#1839) · 94027791
moto authored Oct 08, 2021

94027791
Add customization support to wav2vec2 labels (#1834) · b1838cfc
moto authored Oct 07, 2021

b1838cfc
Add license to pre-trained model doc (#1836) · 01764dee
moto authored Oct 07, 2021

01764dee
[doc] List all the pre-trained models on right bar (#1828) · a43cee71
moto authored Oct 07, 2021

a43cee71

Merge factory functions of pre-training model and fine-tuned model (#1830) · 3e5cbc0a

moto authored Oct 07, 2021

This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters for customizing aspects of the models that are not part of the architecture. `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update in release 0.10.

Summary of BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
1. `Wav2Vec2Model.extract_features`
In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)`
    - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
    - Added dropout parameters.
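
The change to `extract_features` in item 1 can be illustrated with a toy model. This is only a sketch: `ToyWav2Vec2`, `halve` and `add_one` are hypothetical names, plain lists stand in for tensors, and the two methods are split by version purely for comparison.

```python
from typing import Callable, List

Vector = List[float]
Module = Callable[[Vector], Vector]


class ToyWav2Vec2:
    """Hypothetical stand-in; it is not torchaudio's Wav2Vec2Model."""

    def __init__(self, feature_extractor: Module, encoder_layers: List[Module]) -> None:
        self.feature_extractor = feature_extractor
        self.encoder_layers = encoder_layers  # stands in for the TransformerEncoder layers

    def extract_features_v09(self, waveform: Vector) -> Vector:
        # 0.9 behaviour as described above: output of the feature extractor only.
        return self.feature_extractor(waveform)

    def extract_features_v010(self, waveform: Vector) -> List[Vector]:
        # 0.10 behaviour as described above: one output per intermediate encoder layer.
        x = self.feature_extractor(waveform)
        outputs = []
        for layer in self.encoder_layers:
            x = layer(x)
            outputs.append(x)
        return outputs


def halve(v: Vector) -> Vector:
    return [e / 2 for e in v]


def add_one(v: Vector) -> Vector:
    return [e + 1.0 for e in v]


model = ToyWav2Vec2(feature_extractor=halve, encoder_layers=[add_one, add_one, add_one])
old = model.extract_features_v09([2.0, 4.0])   # a single feature vector
new = model.extract_features_v010([2.0, 4.0])  # a list with one entry per layer
print(old, len(new))
```

Code written against 0.9 that treats the return value as a single tensor therefore needs to index into the list (for example, take the last element) after upgrading.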

3e5cbc0a

Make the core wav2vec2 factory function public (#1829) · 0582e73c

moto authored Oct 06, 2021

This commit makes the following changes:
1. Make the factory function with full customizability public.
    i.e. `_get_model(...) -> wav2vec2_model(...)`.
2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout).
    i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`

### Why?

While adding the pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights keeps the code organization simple, thanks to a clean separation of concerns. As mentioned in #1821, in this framework,
  1. Model implementation is responsible for computation logic,
  2. factory functions are responsible for customizability and model construction,
  3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).

(note: for simple models, combining 1 and 2 is also okay.)

This means that factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.

This PR rectifies it by making the mother factory function public.
This also clarifies the purpose of having the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. So this commit also adds supplemental parameters to them.
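
The three responsibilities listed above can be sketched as follows. Everything here is hypothetical (`ToyModel`, `toy_model`, `toy_base`, `ToyBundle`, and the concrete parameter values); it only mirrors the layering and is not the torchaudio code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class ToyModel:
    """1. Model implementation: computation logic only (omitted in this sketch)."""

    def __init__(self, hidden_dim: int, dropout: float, aux_num_out: Optional[int]) -> None:
        self.hidden_dim = hidden_dim
        self.dropout = dropout
        self.aux_num_out = aux_num_out
        self.state: Dict[str, float] = {}
        self.training = True

    def load_state_dict(self, state: Dict[str, float]) -> None:
        self.state = dict(state)

    def eval(self) -> "ToyModel":
        self.training = False
        return self


def toy_model(hidden_dim: int, dropout: float, aux_num_out: Optional[int] = None) -> ToyModel:
    """2a. Public factory with full customizability (the "mother" factory)."""
    return ToyModel(hidden_dim, dropout, aux_num_out)


def toy_base(dropout: float = 0.1, aux_num_out: Optional[int] = None) -> ToyModel:
    """2b. Architecture-specific sugar: fixes the architecture, exposes the rest."""
    return toy_model(hidden_dim=768, dropout=dropout, aux_num_out=aux_num_out)


@dataclass
class ToyBundle:
    """3. Pre-trained weight API: model parameters, weights and complementary info."""

    params: Dict[str, object]
    state: Dict[str, float]
    labels: Optional[Tuple[str, ...]] = None

    def get_model(self) -> ToyModel:
        # Uses only the public factory, so no internal import is needed.
        model = toy_model(**self.params)
        model.load_state_dict(self.state)
        return model.eval()


bundle = ToyBundle(
    params={"hidden_dim": 768, "dropout": 0.0, "aux_num_out": 3},
    state={"w": 0.5},  # made-up weights, for illustration only
    labels=("a", "b", "c"),
)
pretrained = bundle.get_model()
untrained = toy_base()
print(pretrained.aux_num_out, untrained.aux_num_out)
```

The point of the layering is that the bundle depends only on the public, fully customizable factory, while the architecture-specific function remains a thin convenience wrapper.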

0582e73c

07 Oct, 2021 4 commits
- Standardize tensor shapes format in docs (#1838) · 8f270d09
  Caroline Chen authored Oct 07, 2021
  
  8f270d09
- [Cherry-picked 0.10] Move LibriMix dataset to datasets directory (#1833) · dc0990c7
  nateanl authored Oct 07, 2021
  
  dc0990c7
- Update RNNT Loss docs and add example (#1835) · e6fccfda
  Caroline Chen authored Oct 07, 2021
  
  e6fccfda
- Training recipe for ConvTasNet on Libri2Mix dataset. (#1757) · 5656d5d4
  nateanl authored Oct 05, 2021
  
  5656d5d4
06 Oct, 2021 3 commits
- Add pretrained weights from wav2vec2.0 and XLSR papers (#1827) · 5b1cd9a6
  moto authored Oct 06, 2021
```
Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53
```
  5b1cd9a6
- Add the rest of HuBERT pretrained models (#1824) · 384e4471
  moto authored Oct 05, 2021
```
This commit adds
- HUBERT_LARGE
- HUBERT_XLARGE
- HUBERT_ASR_XLARGE
```
  384e4471
- Add HUBERT_BASE and HUBERT_ASR_LARGE pretrained models (#1821) · 38c5b10f
  moto authored Oct 05, 2021
  
  38c5b10f
05 Oct, 2021 7 commits

Deprecate data utils (#1809) · 407df37d

moto authored Sep 30, 2021

* Deprecate data utils

- The design criteria of diskcache_iterator and bg_iterator are not well-specified
- The implementation does not improve the performance due to GIL and threading

407df37d

Deprecate VCTK (#1810) · 93e7f02f
moto authored Sep 30, 2021

93e7f02f

Fix HuBERT xlarge configuration and test (#1811) · 5b07c33e

moto authored Oct 01, 2021

1. Fix the HuBERT xlarge model config
2. In the 48 transformer layers of the HuBERT xlarge model, a very small number of elements deviate from the equivalent fairseq model by more than the default atol of 1e-5. This commit relaxes the tolerance to 3e-5 for that specific test.
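
The effect of relaxing the absolute tolerance can be sketched with plain Python. The helper `all_within_atol` and the numbers are made up for illustration; the actual test helpers in torchaudio may also apply a relative tolerance, which this sketch ignores.

```python
import math
from typing import Sequence


def all_within_atol(actual: Sequence[float], expected: Sequence[float], atol: float) -> bool:
    """Hypothetical helper: element-wise check using the absolute tolerance only."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=atol)
        for a, e in zip(actual, expected)
    )


reference = [0.5, -1.25, 2.0]
ours = [0.5, -1.25 + 2e-5, 2.0]  # one element deviates by roughly 2e-5

strict = all_within_atol(ours, reference, atol=1e-5)   # the deviation exceeds 1e-5
relaxed = all_within_atol(ours, reference, atol=3e-5)  # the deviation is within 3e-5
print(strict, relaxed)
```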

5b07c33e

Skip hubert_xlarge TS test on Windows (#1807) · 3b292ce3
moto authored Sep 30, 2021
```
Writing scripted HuBERT XLarge models fails on Windows CI.
```
3b292ce3

Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH` (#1804) · dacd3fd4

moto authored Sep 29, 2021

* Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`

In #1783, we split the factory functions of wav2vec2 into ones for pretraining models
and ones for fine-tuning models (pretraining model + extra Linear module).

I picked the naming scheme `wav2vec2_asr_ARCH` for the factory functions of fine-tuning models,
but it did not feel right, because the architecture code is more generic.
Even though the resulting model architecture was used for ASR fine-tuning in the paper,
it does not have to be used for ASR.
This became more evident as we added support for pre-trained parameters, such as in #1799.
The task and the dataset used for training matter for the weight files;
for the factory functions, the ASR task is not relevant.

Therefore, this commit renames the functions by replacing `_asr_` with `_ft_` (for fine-tuning).

Note: Since the new functions are not released yet, this PR itself is not BC-breaking.

dacd3fd4

Skip hubert_asr_xlarge TS test on Windows (#1800) · a4974c4c
moto authored Sep 29, 2021

a4974c4c

Add HuBERT model architectures (#1769) · 7438f325

moto authored Sep 28, 2021

This commit adds the following HuBERT model architectures

 - `base` (pre-training)
 - `large` (pre-training / fine-tuning)
 - `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as those of `Wav2Vec2Model`, they reuse the existing modules.
With these models, it is possible to
- import the pre-trained model published by `fairseq` and TorchScript it.
- fine-tune the existing model for downstream task.

7438f325

04 Oct, 2021 1 commit
- Set release and base PyTorch version (#1816) · 86aeec10
  Caroline Chen authored Oct 04, 2021
  
  86aeec10
27 Sep, 2021 1 commit

Enable audio windows cuda tests (#1777) · d98c8847

Yi Zhang authored Sep 28, 2021

* enable windows cudatests

* add this dir

* minor change

* vs integration

* Update cuda_install.bat

* add logs

* minor change

* minor change

* cp vision conda activate

* mv vc_env_helper.bat

* minor change

* exit if cuda not available

* install numpy

* import CMakeLists

* check cuda

* minor change

* change windows GPU image from previous to stable

* set libtorch audio suffix as pyd on Windows

* reduce changes

* check env settings

d98c8847

26 Sep, 2021 1 commit
- Add equations to MVDR docstring (#1789) · b6a0434a
  nateanl authored Sep 26, 2021
  
  b6a0434a
25 Sep, 2021 1 commit
- [doc] Fix return type of wav2vec2 model (#1790) · 78d41d57
  moto authored Sep 25, 2021
  
  78d41d57
24 Sep, 2021 4 commits

[BC-Breaking] Split pretraining and finetuning factory functions (#1783) · b2e9f1e4

moto authored Sep 24, 2021

* [BC-Breaking] Split pretraining and finetuning factory functions

Previously, the factory functions of wav2vec2 only generated the fine-tuning
architecture used in the wav2vec2 paper for the ASR task.
That is, the pre-training architecture + a Linear module; they did not
provide a straightforward way to generate architectures for pre-training.

The goal of the original implementation was to allow the inference of
wav2vec2 in non-Python environments via TorchScript. Now we would like to
expand it to pre-training/fine-tuning and the HuBERT model as well.

Therefore, we need to have factory functions for both pre-training and
fine-tuning. This commit introduces new factory functions and separates
the functions for pre-training and fine-tuning.

1. New functions for ASR fine-tuning.

We introduce `wav2vec2_asr_XXX` functions, which generate the architecture
used for the fine-tuning task in the wav2vec2 paper. *1

2. Re-purpose the old functions.

The existing functions, `wav2vec2_XXX`, now generate the architecture with
the pre-training modules only (no Linear module).

Note
*1 This architecture is just one way to define an architecture for fine-tuning;
it is not a universal definition. The new `wav2vec2_asr_XXX` functions are
designed to provide this specific fine-tuning configuration, and they are not
meant to support generic architectures for downstream tasks.

b2e9f1e4

Fix build on Windows with CUDA (#1787) · cf0adb28
Yi Zhang authored Sep 24, 2021
```
This commit fixes the local build on Windows with CUDA.
```
cf0adb28
Add MVDR beamforming tutorial to example directory (#1768) · 8d83a2f4
nateanl authored Sep 24, 2021

8d83a2f4
set libtorch audio suffix as pyd on Windows (#1788) · 56a010b0
Yi Zhang authored Sep 24, 2021

56a010b0

23 Sep, 2021 1 commit
- update win gpu image from previous to stable (#1786) · c69955c6
  Yi Zhang authored Sep 23, 2021
  
  c69955c6
22 Sep, 2021 1 commit

[BC-Breaking] Move fine-tune specific module out of wav2vec2 encoder (#1782) · 40f2a085

moto authored Sep 22, 2021

Previously, the Linear module (called `readout`, which is used only for an ASR fine-tuning
task) was placed in the encoder module. Conceptually, the encoder has nothing to
do with a module specific to a fine-tuning / downstream task.

The problems here are that:
1. The encoder can also be used in the pre-training phase, in which such a module should
not be present.
2. The choice of the Linear module is arbitrary, and it is inconvenient for users
to have a hard-coded module structure in the encoder.

Therefore, this commit moves the Linear module out of the encoder, and places it
as the `aux` attribute of `Wav2Vec2Model` (as a result, `Wav2Vec2Model` has
`feature_extractor`, `encoder` and `aux` attributes).

An alternative approach is to define another module and place `Wav2Vec2Model`
and the aux module alongside each other. But that would introduce a new class we need
to maintain.
The expected use of `aux` is only for 1. loading the pre-trained parameters
published by `fairseq` (and its variations from HF) and 2. creating the same model
architectures for comparison experiments.
The newly introduced class would not be general enough for downstream adaptations,
where there will be a bunch of different, more complicated models (e.g. s3prl).

Therefore, following a minimalistic approach, we put the `aux` module inside `Wav2Vec2Model`.
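
The resulting layout can be sketched as a container with an optional head. This is a toy illustration under stated assumptions: `ToyModel`, `downsample`, `encode` and `readout` are hypothetical, and plain lists stand in for tensors.

```python
from typing import Callable, List, Optional

Vector = List[float]
Module = Callable[[Vector], Vector]


class ToyModel:
    """Hypothetical container mirroring the three attributes named above."""

    def __init__(self, feature_extractor: Module, encoder: Module,
                 aux: Optional[Module] = None) -> None:
        self.feature_extractor = feature_extractor
        self.encoder = encoder  # knows nothing about fine-tuning
        self.aux = aux          # optional head, lives outside the encoder

    def __call__(self, waveform: Vector) -> Vector:
        x = self.encoder(self.feature_extractor(waveform))
        # The head is applied only when the model was built with one.
        return x if self.aux is None else self.aux(x)


def downsample(v: Vector) -> Vector:
    return v[::2]


def encode(v: Vector) -> Vector:
    return [2.0 * e for e in v]


def readout(v: Vector) -> Vector:
    return [sum(v)]


# The same encoder serves both phases; only the presence of `aux` differs.
pretraining = ToyModel(downsample, encode)
finetuning = ToyModel(downsample, encode, aux=readout)
print(pretraining([1.0, 2.0, 3.0, 4.0]), finetuning([1.0, 2.0, 3.0, 4.0]))
```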

40f2a085