1. 15 Oct, 2021 7 commits
  2. 13 Oct, 2021 3 commits
  3. 12 Oct, 2021 2 commits
  4. 11 Oct, 2021 5 commits
  5. 09 Oct, 2021 1 commit
  6. 08 Oct, 2021 10 commits
    • 9f9b6537
    • Default pretrained weights to eval mode (#1843) · cb77a86c
      moto authored
    • Update Tacotron2 docs (#1840) · 486022e9
      hwangjeff authored
    • 9bbd4600
    • 94027791
    • b1838cfc
    • Add license to pre-trained model doc (#1836) · 01764dee
      moto authored
    • a43cee71
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 3e5cbc0a
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models which are not part of the architecture; `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update for release 0.10.
      
      A summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated); a usage sketch follows the list:
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params: float>, aux_num_out: Optional[int] = None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
          - Added dropout parameters.
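      A minimal sketch of the new usage, assuming the 0.10 API described above (the input shape and the class count of 32 are illustrative):

      ```python
      import torch
      from torchaudio.models import wav2vec2_base

      # Without `aux_num_out`, the model has no fine-tuning linear layer.
      model = wav2vec2_base()
      # With `aux_num_out`, a linear layer projecting to 32 classes is appended.
      model_ft = wav2vec2_base(aux_num_out=32)

      # In 0.10, extract_features returns the per-layer outputs of the
      # TransformerEncoder (plus valid lengths), not the FeatureExtractor output.
      waveform = torch.randn(1, 16000)  # dummy 1-second, 16 kHz input
      features, lengths = model.extract_features(waveform)
      print(len(features), features[0].shape)  # one Tensor per encoder layer
      ```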
    • Make the core wav2vec2 factory function public (#1829) · 0582e73c
      moto authored
      This commit makes the following changes (a sketch follows the list):
      1. Make the factory function with full customizability public,
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they also accept parameters not related to the model architecture (such as dropout),
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
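      A minimal sketch of both changes, assuming the signatures described above (the parameter values are illustrative of the "base" architecture, not authoritative defaults):

      ```python
      from torchaudio.models import wav2vec2_model, wav2vec2_base

      # 1. The fully customizable factory function, now public (formerly `_get_model`).
      model = wav2vec2_model(
          extractor_mode="group_norm",
          extractor_conv_layer_config=None,  # None selects the standard extractor config
          extractor_conv_bias=False,
          encoder_embed_dim=768,
          encoder_projection_dropout=0.1,
          encoder_pos_conv_kernel=128,
          encoder_pos_conv_groups=16,
          encoder_num_layers=12,
          encoder_num_heads=12,
          encoder_attention_dropout=0.1,
          encoder_ff_interm_features=3072,
          encoder_ff_interm_dropout=0.1,
          encoder_dropout=0.1,
          encoder_layer_norm_first=False,
          encoder_layer_drop=0.1,
          aux_num_out=None,
      )

      # 2. Architecture-specific sugar, now exposing non-architecture knobs.
      model_base = wav2vec2_base(
          encoder_projection_dropout=0.0,
          encoder_attention_dropout=0.0,
      )
      ```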
      
      ### Why?
      
      While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights yields a simple code organization, thanks to the clean separation of concerns. As mentioned in #1821, in this framework,
        1. the model implementation is responsible for the computation logic,
        2. the factory functions are responsible for customizability and model construction,
        3. and the pre-trained weight API is responsible for constructing a model and loading pre-trained weights, along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
      
      This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation reaches into an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies that by making the mother factory function public.
      It also clarifies the purpose of having the other factory functions in the public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. Accordingly, this commit also adds the supplemental parameters to them. A sketch of the resulting layering follows.
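      A minimal sketch of the three-layer framework above; the pre-trained weight API shown here (`my_pretrained_wav2vec2` and its URL) is hypothetical, and only the factory call reflects the public API described in this commit:

      ```python
      import torch
      from torchaudio.models import Wav2Vec2Model, wav2vec2_base

      def my_pretrained_wav2vec2() -> Wav2Vec2Model:
          # Factory function: handles model construction and customizability.
          model = wav2vec2_base(aux_num_out=32)
          # Pre-trained weight API (hypothetical): fetches and loads the weights.
          state_dict = torch.hub.load_state_dict_from_url(
              "https://example.com/wav2vec2_base_ft.pth"  # placeholder URL
          )
          model.load_state_dict(state_dict)
          return model
      ```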
  7. 07 Oct, 2021 4 commits
  8. 06 Oct, 2021 3 commits
  9. 05 Oct, 2021 5 commits
    • Deprecate data utils (#1809) · 407df37d
      moto authored
      * Deprecate data utils
      
      - The design criteria of `diskcache_iterator` and `bg_iterator` were never well-specified
      - The implementation does not improve performance, due to the GIL and the use of threading
    • Deprecate VCTK (#1810) · 93e7f02f
      moto authored
    • Fix HuBERT xlarge configuration and test (#1811) · 5b07c33e
      moto authored
      1. Fix the HuBERT xlarge model config.
      2. Across the 48 Transformer layers of the HuBERT xlarge model, a very small number of elements deviate from the equivalent fairseq model by more than the default atol of 1e-5. This commit relaxes the tolerance to 3e-5 for that specific test, as sketched below.
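      A minimal sketch of the relaxation, using `torch.testing.assert_close` for illustration (the actual test harness and values may differ):

      ```python
      import torch

      output = torch.tensor([0.10002])    # torchaudio model output (illustrative)
      expected = torch.tensor([0.10000])  # equivalent fairseq output (illustrative)

      # With the default atol of 1e-5 this comparison fails:
      #   torch.testing.assert_close(output, expected)  # raises AssertionError
      # Relaxing atol to 3e-5 for this specific test lets it pass:
      torch.testing.assert_close(output, expected, atol=3e-5, rtol=1.3e-6)
      ```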
    • Skip hubert_xlarge TS test on Windows (#1807) · 3b292ce3
      moto authored
      Writing scripted HuBERT XLarge models fails on Windows CI.
    • Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH` (#1804) · dacd3fd4
      moto authored
      * Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`
      
      In #1783, we split the factory functions of wav2vec2 into ones for pretraining models
      and ones for fine-tuning models (pretraining model + extra Linear module).
      
      I picked the naming scheme `wav2vec2_asr_ARCH` for the factory functions of fine-tuning models,
      but it did not feel right, because the architecture code is more generic.
      Even though the resulting model architecture was used for ASR fine-tuning in the paper,
      it does not have to be used for ASR.
      This became more evident as we added pre-trained parameter support, such as #1799.
      For the weight files, it matters which task and which dataset they were trained on;
      for the factory functions, the ASR task is not relevant.
      
      Therefore, this commit renames the functions by replacing `_asr_` with `_ft_` (for fine-tuning).
      
      Note: Since the new functions have not been released yet, this PR itself is not BC-breaking.