1. 18 Oct, 2021 3 commits
  2. 16 Oct, 2021 3 commits
  3. 15 Oct, 2021 5 commits
  4. 14 Oct, 2021 1 commit
  5. 13 Oct, 2021 4 commits
  6. 12 Oct, 2021 3 commits
  7. 11 Oct, 2021 6 commits
  8. 10 Oct, 2021 1 commit
    • Store n_bits in WaveRNN (#1847) · 9637c6bf
      moto authored
      Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
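The `#classes -> #bits` move described above can be sketched as follows. This is a minimal stand-in, not torchaudio's actual WaveRNN; the class name and default are assumptions for illustration:

```python
import math

class WaveRNNSketch:
    """Toy stand-in illustrating the change described above (not torchaudio's WaveRNN)."""

    def __init__(self, n_classes: int = 256):
        self.n_classes = n_classes
        # The `#classes -> #bits` computation now lives in the constructor and is
        # attached to the instance, so other methods can reuse self.n_bits.
        self.n_bits = int(math.log2(self.n_classes))
```

For example, with 256 output classes, `n_bits` is 8.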
  9. 09 Oct, 2021 1 commit
  10. 08 Oct, 2021 6 commits
  11. 07 Oct, 2021 7 commits
    • Caroline Chen · 21a0d29e
    • nateanl
    • Add license to pre-trained model doc (#1836) · f9663a7b
      moto authored
    • Caroline Chen · 33a655fd
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize the models but are not part of the architecture; `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update for release 0.10.
      
      Summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params: float>, aux_num_out: Optional[int] = None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer used for fine-tuning.
          - Dropout parameters were added.
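The second change above can be illustrated with a toy factory. This is a hedged, dict-based stand-in for the described 0.10 `wav2vec2_base` signature, not the real function (which builds a `Wav2Vec2Model`):

```python
from typing import Optional

def wav2vec2_base_sketch(encoder_projection_dropout: float = 0.1,
                         aux_num_out: Optional[int] = None) -> dict:
    """Toy stand-in for the 0.10 `wav2vec2_base` signature described above."""
    model = {"encoder_projection_dropout": encoder_projection_dropout}
    if aux_num_out is not None:
        # Fine-tuning use: attach the auxiliary linear layer.
        model["aux"] = {"out_features": aux_num_out}
    else:
        # Pre-training use: no linear layer is attached.
        model["aux"] = None
    return model
```

Omitting `aux_num_out` yields a model without the fine-tuning head, matching the behavior described in the summary.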
    • moto · 60aeb78a
    • Make the core wav2vec2 factory function public (#1829) · 31a69c36
      moto authored
      This commit makes the following changes:
      1. Make the factory function with full customizability public,
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they also accept parameters unrelated to the model architecture (such as dropout),
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
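The delegation pattern the two changes imply can be sketched as follows. These are dict-based stand-ins, and the `encoder_embed_dim=768` constant is an illustrative assumption, not taken from this commit:

```python
from typing import Optional

def wav2vec2_model(**config) -> dict:
    """Stand-in for the now-public factory with full customizability."""
    return dict(config)

def wav2vec2_base(encoder_projection_dropout: float = 0.1,
                  encoder_attention_dropout: float = 0.1,
                  encoder_ff_interm_dropout: float = 0.1,
                  aux_num_out: Optional[int] = None) -> dict:
    # Architecture-specific wrapper: pins the "base" architecture constants
    # and forwards only the non-architecture parameters (dropouts, aux head).
    return wav2vec2_model(
        encoder_embed_dim=768,  # illustrative "base" constant (assumption)
        encoder_projection_dropout=encoder_projection_dropout,
        encoder_attention_dropout=encoder_attention_dropout,
        encoder_ff_interm_dropout=encoder_ff_interm_dropout,
        aux_num_out=aux_num_out,
    )
```

The architecture-specific function stays a thin wrapper; anything it cannot express is available through the fully customizable `wav2vec2_model(...)`.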
      
      ### Why?
      
      While adding pre-trained weight support, I realized that separating the model-construction API from the pre-trained-weight API keeps the code organization simple, thanks to a clean separation of concerns. As mentioned in #1821, in this framework,
        1. Model implementation is responsible for computation logic,
        2. factory functions are responsible for customizability and model construction,
        3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
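The three roles above can be sketched as separate layers. All names and data here are illustrative stand-ins, not torchaudio code:

```python
# Role 2: factory function -- customizability and model construction.
def build_model(num_layers: int = 2) -> dict:
    # Role 1 (computation logic) would live inside the returned model.
    return {"num_layers": num_layers, "weights": None}

# Stand-in for an asset shipped alongside pre-trained weights.
_BUNDLE = {"weights": [0.1, 0.2], "labels": ["a", "b"]}

# Role 3: pre-trained weight API -- construct the model via the factory,
# load weights, and return complementary information (here, class labels).
def pretrained_model():
    model = build_model(num_layers=2)
    model["weights"] = _BUNDLE["weights"]
    return model, _BUNDLE["labels"]
```

Because role 3 only goes through role 2's public interface, the factory must expose every knob the pre-trained weights need, which motivates the change below.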
      
      This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies that by making the mother factory function public.
      It also clarifies the purpose of the other public factory functions: they are just syntactic sugar for constructing an untrained model with a specific architecture. Accordingly, this commit also adds the supplemental parameters to them.