- 15 Oct, 2021 5 commits
  - moto authored
    Future work items:
    - length computation of GriffinLim
    - better way to make InverseMelScale work in inference_mode
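    On the second item: a minimal sketch of the limitation, assuming `InverseMelScale` still recovers the linear spectrogram through an internal gradient-based (SGD) optimization loop, which cannot operate on tensors created under `torch.inference_mode()`:

    ```python
    import torch
    import torchaudio

    inverse = torchaudio.transforms.InverseMelScale(n_stft=201, n_mels=128)

    mel = torch.rand(1, 128, 50)
    spec = inverse(mel)  # works: the internal optimization loop can use autograd

    with torch.inference_mode():
        try:
            inverse(torch.rand(1, 128, 50))
        except RuntimeError as err:
            # inference-mode tensors cannot participate in autograd,
            # so the internal optimization fails
            print("InverseMelScale under inference_mode:", err)
    ```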
  - moto authored
  - moto authored
  - moto authored
  - moto authored
    - Move wav2vec2 pretrained weights to the `torchaudio.pipelines` namespace to align with #1872.
    - Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-trained models) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
    - Update the base URL.
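    A hedged sketch of the resulting split, assuming bundle instances such as `WAV2VEC2_BASE` and `WAV2VEC2_ASR_BASE_960H` under the new namespace:

    ```python
    import torch
    import torchaudio

    # Wav2Vec2Bundle: pre-trained only, no ASR head; used for feature extraction.
    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    model = bundle.get_model()
    waveform = torch.randn(1, int(bundle.sample_rate))  # 1 second of dummy audio
    features, _ = model.extract_features(waveform)

    # Wav2Vec2ASRBundle: fine-tuned for ASR; additionally carries the label set.
    asr_bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
    asr_model = asr_bundle.get_model()
    emissions, _ = asr_model(waveform)
    labels = asr_bundle.get_labels()
    ```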
- 14 Oct, 2021 1 commit
  - Yi Zhang authored
    * Check CUDA installation
    * Check in build.sh
    * Use USE_CUDA
    * Update pkg_helpers.bash
    * Fix typo
- 13 Oct, 2021 4 commits
  - nateanl authored
  - Caroline Chen authored
  - moto authored
  - nateanl authored
- 12 Oct, 2021 3 commits
  - Caroline Chen authored
  - Yi Zhang authored
  - nateanl authored
- 11 Oct, 2021 6 commits
  - moto authored
    To handle batched input properly.
  - Yi Zhang authored
    * Set cu113 for unittest_windows_gpu
    * Fix old logic
    * Update .circleci/regenerate.py
    Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
  - Nikita Shulga authored
    * Limit Windows GPU testing to CUDA 11.3 only, which is the only CUDA version planned to be supported on Windows for the upcoming release
    * Move unit tests to 11.3 as well
  - moto authored
  - moto authored
  - moto authored
- 10 Oct, 2021 1 commit
  - moto authored
    Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
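    A minimal sketch of the relation being cached (illustrative, not the actual torchaudio source): WaveRNN quantizes audio into `n_classes` buckets with `n_classes == 2 ** n_bits`, so the bit depth can be derived once in the constructor:

    ```python
    import math

    class WaveRNNLike:  # hypothetical stand-in for torchaudio.models.WaveRNN
        def __init__(self, n_classes: int = 256):
            self.n_classes = n_classes
            # computed once and attached to the instance for reuse elsewhere
            self.n_bits = int(math.log2(n_classes))  # e.g. 256 classes -> 8 bits

    assert WaveRNNLike().n_bits == 8
    ```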
- 09 Oct, 2021 1 commit
  - moto authored
- 08 Oct, 2021 6 commits
- 07 Oct, 2021 7 commits
  - Caroline Chen authored
  - nateanl authored
  - moto authored
  - Caroline Chen authored
  - moto authored
    This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models that are not part of the architecture; `aux_num_out` falls into this category, so separate functions are no longer necessary. This concludes the wav2vec2/HuBERT API update for release 0.10.

    Summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
    1. `Wav2Vec2Model.extract_features`
       In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
    2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params: float>, aux_num_out: Optional[int] = None)`
       - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
       - Added dropout parameters.
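    A hedged before/after sketch of change 2, using the parameter names from the message above:

    ```python
    import torchaudio

    # 0.9 (BC-broken): wav2vec2_base(num_out=32)

    # 0.10: aux_num_out is optional; omitting it yields a model
    # without the final linear layer used for fine-tuning.
    pretrain_model = torchaudio.models.wav2vec2_base()
    asr_model = torchaudio.models.wav2vec2_base(aux_num_out=32)

    # The newly exposed dropout parameters can be set explicitly.
    custom = torchaudio.models.wav2vec2_base(
        encoder_projection_dropout=0.0,
        encoder_attention_dropout=0.0,
        aux_num_out=32,
    )
    ```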
  - moto authored
  - moto authored
    This commit makes the following changes:
    1. Make the factory function with full customizability public, i.e. `_get_model(...)` -> `wav2vec2_model(...)`.
    2. Change the other architecture-specific factory functions so that they accept parameters that are not related to the model architecture (such as dropout), i.e. `wav2vec2_base()` -> `wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`.

    ### Why?
    While adding the pre-trained weight support, I realized that separating the APIs for model construction and for pre-trained weights gives a simple code organization, thanks to the good separation of concerns. As mentioned in #1821, in this framework:
    1. the model implementation is responsible for the computation logic,
    2. factory functions are responsible for customizability and model construction,
    3. and the pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with complementary information (such as pre-processing and class labels).
    (Note: for simple models, combining 1 and 2 is also okay.)

    This means that the factory functions have to support all the customizability required by the pre-trained weight API. The previous implementation used an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which was a bit strange. This commit rectifies that by making the mother factory function public. It also clarifies the purpose of the other public factory functions: they are just syntactic sugar for constructing an untrained model with a specific architecture. Accordingly, this commit also adds the supplemental parameters to them.
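    A sketch of calling the now-public mother factory function; the values below follow the standard wav2vec2 "base" configuration and are illustrative rather than copied from the source:

    ```python
    import torchaudio

    model = torchaudio.models.wav2vec2_model(
        extractor_mode="group_norm",
        extractor_conv_layer_config=None,  # None -> the default conv feature extractor
        extractor_conv_bias=False,
        encoder_embed_dim=768,
        encoder_projection_dropout=0.1,
        encoder_pos_conv_kernel=128,
        encoder_pos_conv_groups=16,
        encoder_num_layers=12,
        encoder_num_heads=12,
        encoder_attention_dropout=0.1,
        encoder_ff_interm_features=3072,
        encoder_ff_interm_dropout=0.1,
        encoder_dropout=0.1,
        encoder_layer_norm_first=False,
        encoder_layer_drop=0.1,
        aux_num_out=None,
    )

    # The architecture-specific functions remain as syntactic sugar:
    base = torchaudio.models.wav2vec2_base()
    ```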
- 06 Oct, 2021 6 commits
  - kingyiusuen authored
  - moto authored
    Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
    - Wav2Vec 2.0 Base / Large / Large (LV-60)
    - XLSR-53
  - hwangjeff authored
    Adds an implementation of Emformer, a memory-efficient transformer architecture introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency streaming speech recognition applications.
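    A hedged usage sketch; the hyperparameters are illustrative, and the module may initially live under a prototype namespace rather than `torchaudio.models`:

    ```python
    import torch
    from torchaudio.models import Emformer

    emformer = Emformer(
        input_dim=80,          # e.g. log-mel filterbank features
        num_heads=4,
        ffn_dim=1024,
        num_layers=4,
        segment_length=16,     # frames processed per streaming segment
        left_context_length=30,
        right_context_length=4,
    )

    batch, frames = 2, 128
    # forward() consumes utterance frames right-padded with right-context frames
    inputs = torch.rand(batch, frames + 4, 80)
    lengths = torch.full((batch,), frames)
    output, output_lengths = emformer(inputs, lengths)
    ```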
  - moto authored
  - moto authored
    This commit adds:
    - HUBERT_LARGE
    - HUBERT_XLARGE
    - HUBERT_ASR_XLARGE
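    A sketch of loading one of the new bundles and running naive greedy decoding over dummy audio, assuming access via the `torchaudio.pipelines` namespace (see the 15 Oct move above; the path at the time of this commit may have differed) and that `-` is the blank token in the returned label set:

    ```python
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.HUBERT_ASR_XLARGE
    model = bundle.get_model().eval()

    waveform = torch.randn(1, int(bundle.sample_rate))  # dummy 1-second input
    with torch.no_grad():
        emission, _ = model(waveform)

    labels = bundle.get_labels()
    # collapse repeated frames, then drop the CTC blank token
    indices = torch.unique_consecutive(emission[0].argmax(dim=-1))
    transcript = "".join(labels[i] for i in indices if labels[i] != "-")
    ```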
  - moto authored