Commits · 57c8b97e0d7d2d98a65c3960e8df794fd19834f0 · OpenDAS / Torchaudio

18 Oct, 2021 3 commits
- Fix vocoder interface (#1895) · 57c8b97e
  moto authored Oct 18, 2021
  
  57c8b97e
- Update models/pipelines doc (#1894) · c4fc8f90
  moto authored Oct 18, 2021
```
1. Override the return type so that Sphinx shows the exported symbols.
   (output model types and input torch.nn.Module)
2. Tweak docs for Tacotron2TTSBundle interfaces
3. Fix for HUBERT_ASR_XLARGE
```
  c4fc8f90
- [DOC] Standardization and minor fixes (#1892) · 70987b01
  Caroline Chen authored Oct 18, 2021
  
  70987b01
16 Oct, 2021 4 commits
- Update descriptions of `lengths` parameters (#1890) · 481d1ecf
  moto authored Oct 16, 2021
  
  481d1ecf
- Add filter bank figures (#1891) · 0a63458e
  moto authored Oct 16, 2021
  
  0a63458e
- Update docs version for release (#1888) · 149213e1
  Caroline Chen authored Oct 16, 2021
  
  149213e1
- Add SpecAugment figure/citation (#1887) · 036c4ae3
  moto authored Oct 16, 2021
  
  036c4ae3
15 Oct, 2021 9 commits
- Update prototype exclusion (#1885) · fa390e1e
  moto authored Oct 15, 2021
  
  fa390e1e
- Add TTS bundle/pipelines (#1872) · d5e44f87
  moto authored Oct 15, 2021
```
Future work items:
- length computation of GriffinLim
- better way to make InverseMelScale work in inference_mode
```
  d5e44f87
- Remove factory functions of tacotron2 and wavernn (#1874) · 217fb684
  moto authored Oct 15, 2021
  
  217fb684
- Add sample rate to Wav2Vec2 bundle (#1878) · 7260ad2e
  moto authored Oct 15, 2021
  
  7260ad2e
- Put pretrained weights to subsection (#1879) · 61594507
  moto authored Oct 15, 2021
  
  61594507
- Move wav2vec2 pretrained models to pipelines module (#1876) · ec23f635
  moto authored Oct 15, 2021
```
- Move wav2vec2 pretrained weights to `torchaudio.pipelines` namespace to align with #1872.
- Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-training models) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
- Update base URL
```
  ec23f635
- Add `lengths` param to WaveRNN.infer (#1851) · 137600d0
  moto authored Oct 13, 2021
  
  137600d0
- Log prototype exclusion (#1882) · ddc49548
  moto authored Oct 15, 2021
  
  ddc49548
- Exclude prototype if it is in release (#1870) · 6bae1e9e
  moto authored Oct 15, 2021
  
  6bae1e9e
13 Oct, 2021 3 commits
- Refactor transforms.Fade on GPU computation (#1871) · 515ad59c
  nateanl authored Oct 13, 2021
  
  515ad59c
- [BC-Breaking] Ensure integer input frequencies for resample (#1857) · 6858db77
  Caroline Chen authored Oct 13, 2021
  
  6858db77
- Fix PitchShift docstring (#1866) · 38b90e76
  nateanl authored Oct 13, 2021
  
  38b90e76
12 Oct, 2021 2 commits
- Use integer rates in pitch shift resample (#1861) · 4b5db340
  Caroline Chen authored Oct 12, 2021
  
  4b5db340
- [BC-Breaking] Replace waveform with specgram in SlidingWindowCmn (#1859) · 04e0e2ff
  nateanl authored Oct 12, 2021
  
  04e0e2ff
11 Oct, 2021 5 commits
- Fix the main loop of tacotron2 decoder inference (#1849) · 49c48f93
  moto authored Oct 11, 2021
```
To handle batched input properly.
```
  49c48f93
- Clean up constructor of CMUDict (#1852) · ab97afa0
  moto authored Oct 11, 2021
  
  ab97afa0
- Avoid concatenation in loop (#1850) · f18d01a0
  moto authored Oct 11, 2021
  
  f18d01a0
- Replace custom padding with torch's native impl (#1846) · 6321adcf
  moto authored Oct 10, 2021
  
  6321adcf
- Store n_bits in WaveRNN (#1847) · 498722b5
  moto authored Oct 10, 2021
```
Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
```
  498722b5
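The `#classes -> #bits` conversion described in #1847 can be sketched as below. This is a minimal illustration, not the torchaudio implementation: `TinyVocoder` and `decode_label` are hypothetical names, and the power-of-two check is an added assumption about what a valid class count looks like.

```python
import math


class TinyVocoder:
    """Hypothetical stand-in for a WaveRNN-like model (not torchaudio code)."""

    def __init__(self, n_classes: int) -> None:
        # Compute the bit depth once, in the constructor, and keep it on the
        # instance so that other methods can reuse it.
        n_bits = int(math.log2(n_classes))
        if 2 ** n_bits != n_classes:
            # Assumption for this sketch: class counts must be powers of two.
            raise ValueError(f"n_classes must be a power of two, got {n_classes}")
        self.n_classes = n_classes
        self.n_bits = n_bits

    def decode_label(self, label: int) -> float:
        """Map a class index in [0, 2**n_bits - 1] to a value in [-1, 1]."""
        return 2.0 * label / (2 ** self.n_bits - 1) - 1.0
```

Because `n_bits` lives on the instance, helpers such as `decode_label` do not have to recompute it from `n_classes`.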
09 Oct, 2021 1 commit
- Refactor WaveRNNInferenceWrapper (#1845) · 202bc4f2
  moto authored Oct 08, 2021
  
  202bc4f2
08 Oct, 2021 10 commits

- Replace `text` with `token` in Tacotron2 API (#1844) · 9f9b6537
  moto authored Oct 08, 2021
  
  9f9b6537
- Default pretrained weights to eval mode (#1843) · cb77a86c
  moto authored Oct 08, 2021
  
  cb77a86c
- Update Tacotron2 docs (#1840) · 486022e9
  hwangjeff authored Oct 08, 2021
  
  486022e9
- Rename utterance to transcript in datasets (#1841) · 9bbd4600
  hwangjeff authored Oct 08, 2021
  
  9bbd4600
- Make `text_length` optional in `Tacotron2.infer` (#1839) · 94027791
  moto authored Oct 08, 2021
  
  94027791
- Add customization support to wav2vec2 labels (#1834) · b1838cfc
  moto authored Oct 07, 2021
  
  b1838cfc
- Add license to pre-trained model doc (#1836) · 01764dee
  moto authored Oct 07, 2021
  
  01764dee
- [doc] List all the pre-trained models on right bar (#1828) · a43cee71
  moto authored Oct 07, 2021
  
  a43cee71

- Merge factory functions of pre-training model and fine-tuned model (#1830) · 3e5cbc0a
  moto authored Oct 07, 2021

  This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters for customizing aspects of the models that are not part of the architecture. `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update in release 0.10.

  Summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
  1. `Wav2Vec2Model.extract_features`
     In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
  2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)`
     - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
     - Dropout parameters were added.

  3e5cbc0a
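The effect of making `aux_num_out` optional can be sketched with a toy stand-in. This is a plain-Python illustration under stated assumptions, not torchaudio code: `ToyLinear`, `ToyModel`, and `toy_base` are hypothetical names, the encoder is replaced by an identity function, and the single `dropout` argument stands in for the several dropout parameters mentioned above.

```python
from typing import List, Optional


class ToyLinear:
    """Hypothetical stand-in for a linear layer with zero-initialized weights."""

    def __init__(self, in_features: int, out_features: int) -> None:
        self.weight = [[0.0] * in_features for _ in range(out_features)]

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class ToyModel:
    """Hypothetical model: an identity 'encoder' plus an optional aux head."""

    def __init__(self, embed_dim: int, dropout: float,
                 aux: Optional[ToyLinear] = None) -> None:
        self.embed_dim = embed_dim
        self.dropout = dropout
        self.aux = aux

    def __call__(self, x: List[float]) -> List[float]:
        # Without the aux head, the model returns encoder features as-is.
        return x if self.aux is None else self.aux(x)


def toy_base(dropout: float = 0.1,
             aux_num_out: Optional[int] = None) -> ToyModel:
    """One factory covers both the pre-training and the fine-tuning case."""
    embed_dim = 4  # fixed by the (toy) architecture
    aux = ToyLinear(embed_dim, aux_num_out) if aux_num_out is not None else None
    return ToyModel(embed_dim, dropout, aux)
```

Calling `toy_base()` yields a model without the extra linear layer, while `toy_base(aux_num_out=3)` attaches one, so a separate fine-tuning factory is not needed.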

- Make the core wav2vec2 factory function public (#1829) · 0582e73c
  moto authored Oct 06, 2021

  This commit makes the following changes:
  1. Make the factory function with full customizability public,
     i.e. `_get_model(...) -> wav2vec2_model(...)`.
  2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout),
     i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`

  ### Why?

  While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights keeps the code organization simple, thanks to a clean separation of concerns. As mentioned in #1821, in this framework:
  1. the model implementation is responsible for computation logic,
  2. factory functions are responsible for customizability and model construction,
  3. and the pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).

  (Note: for simple models, combining 1 and 2 is also okay.)

  This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.

  This PR rectifies that by making the mother factory function public.
  This also clarifies the purpose of keeping the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. So this commit also adds supplemental parameters to them.

  0582e73c
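The three responsibilities listed in #1829 can be sketched as follows. This is a plain-Python illustration of the layering only, not torchaudio code: `ToyModel`, `toy_model`, `toy_base`, and `ToyBundle` are hypothetical names, and an in-memory dict stands in for downloaded pre-trained weights.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class ToyModel:
    """Layer 1: computation logic only."""

    def __init__(self, scale: float, dropout: float) -> None:
        self.scale = scale
        self.dropout = dropout
        self.training = True

    def load_state_dict(self, state: Dict[str, float]) -> None:
        self.scale = state["scale"]

    def eval(self) -> "ToyModel":
        self.training = False
        return self

    def __call__(self, x: float) -> float:
        return self.scale * x


def toy_model(scale: float, dropout: float) -> ToyModel:
    """Layer 2: the public factory with full customizability."""
    return ToyModel(scale=scale, dropout=dropout)


def toy_base(dropout: float = 0.1) -> ToyModel:
    """Layer 2 sugar: fixes the architecture, exposes non-architecture params."""
    return toy_model(scale=1.0, dropout=dropout)


@dataclass
class ToyBundle:
    """Layer 3: builds a model, loads weights, carries complementary info."""

    params: Dict[str, float]
    weights: Dict[str, float]  # stand-in for a downloaded checkpoint
    labels: Tuple[str, ...]
    sample_rate: int

    def get_model(self) -> ToyModel:
        # The bundle only needs the public factory, not an internal helper.
        model = toy_model(**self.params)
        model.load_state_dict(self.weights)
        return model.eval()
```

In this arrangement the bundle depends only on the public `toy_model` factory, which mirrors the motivation for promoting `_get_model` to `wav2vec2_model`.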

07 Oct, 2021 3 commits
- Standardize tensor shapes format in docs (#1838) · 8f270d09
  Caroline Chen authored Oct 07, 2021
  
  8f270d09
- [Cherry-picked 0.10] Move LibriMix dataset to datasets directory (#1833) · dc0990c7
  nateanl authored Oct 07, 2021
  
  dc0990c7
- Update RNNT Loss docs and add example (#1835) · e6fccfda
  Caroline Chen authored Oct 07, 2021
  
  e6fccfda