- 08 Oct, 2021 1 commit
-
-
moto authored
-
- 07 Oct, 2021 7 commits
-
-
Caroline Chen authored
-
nateanl authored
-
moto authored
-
Caroline Chen authored
-
moto authored
This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models which are not part of the architecture; `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update for release 0.10.

Summary of BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):

1. `Wav2Vec2Model.extract_features`
   In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params: float>, aux_num_out: Optional[int] = None)`
   - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the Linear layer for fine-tuning.
   - Added dropout parameters.
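A minimal usage sketch of the 0.10 API described above (parameter values are only illustrative):

```python
import torch
import torchaudio

# Pre-training form: omitting aux_num_out means no final Linear layer.
model = torchaudio.models.wav2vec2_base()

# Fine-tuning form: aux_num_out adds the Linear layer, e.g. for CTC over 32 output tokens.
asr_model = torchaudio.models.wav2vec2_base(aux_num_out=32)

waveform = torch.randn(1, 16000)  # dummy 1-second input at 16 kHz
with torch.inference_mode():
    # In 0.10, extract_features returns the outputs of the intermediate
    # TransformerEncoder layers (a list of Tensors) plus optional valid lengths.
    layer_outputs, lengths = model.extract_features(waveform)
    emissions, _ = asr_model(waveform)  # shape: (batch, frames, aux_num_out)
```
-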
moto authored
-
moto authored
This commit makes the following changes:

1. Make the fully customizable factory function public, i.e. `_get_model(...) -> wav2vec2_model(...)`.
2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout), i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`.

### Why?

While adding the pre-trained weight support, I realized that separating the API for model construction from the pre-trained weight support keeps the code organization simple because of the good separation of concerns. As mentioned in #1821, in this framework:

1. the model implementation is responsible for the computation logic,
2. the factory functions are responsible for customizability and model construction,
3. and the pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).

(Note: for simple models, combining 1 and 2 is also okay.)

This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, like `from .model import Wav2Vec2Model, _get_model`, which is a bit strange. This PR rectifies that by making the base factory function public. It also clarifies the purpose of keeping the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. So this commit also adds the supplemental parameters to them.
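A minimal sketch of the architecture-specific entry point after this change (values are illustrative; the dropout parameter names are the ones listed in the commit message above):

```python
import torchaudio

# Architecture-specific factory function, now accepting non-architecture
# parameters such as the various dropout probabilities.
model = torchaudio.models.wav2vec2_base(
    encoder_projection_dropout=0.0,
    encoder_attention_dropout=0.0,
    encoder_ff_interm_dropout=0.0,
)

# The fully customizable factory function, wav2vec2_model(...), is now public
# as well; it exposes the complete set of architecture parameters and is what
# functions like wav2vec2_base() construct the model through.
```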
-
- 06 Oct, 2021 7 commits
-
-
kingyiusuen authored
-
moto authored
Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53
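These weights are exposed through the bundle API that shipped as `torchaudio.pipelines` in the 0.10 release; a minimal loading sketch, assuming those released bundle names (e.g. `WAV2VEC2_XLSR53`) rather than anything specific to this commit:

```python
import torch
import torchaudio

# Pre-trained (not fine-tuned) wav2vec2 bundle; other bundles cover Base, Large and Large (LV-60).
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
model = bundle.get_model()  # downloads the weights on first use

waveform = torch.randn(1, int(bundle.sample_rate))  # dummy 1-second input
with torch.inference_mode():
    # List of intermediate transformer-layer outputs, plus optional valid lengths.
    features, _ = model.extract_features(waveform)
```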
-
hwangjeff authored
Adds an implementation of Emformer, a memory-efficient transformer architecture introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency streaming speech recognition applications.
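A minimal construction sketch, assuming the module is exposed as `torchaudio.models.Emformer` (it initially landed under a prototype namespace) with the constructor arguments shown here; the configuration values are made up for illustration:

```python
import torch
from torchaudio.models import Emformer

# Hypothetical low-latency streaming configuration: 512-dim inputs, 8 heads,
# 4-frame segments with 1 frame of right context.
emformer = Emformer(
    input_dim=512,
    num_heads=8,
    ffn_dim=2048,
    num_layers=4,
    segment_length=4,
    right_context_length=1,
)

batch, valid_frames, dim = 2, 16, 512
# The input carries the right-context frames appended at the end.
x = torch.randn(batch, valid_frames + 1, dim)
lengths = torch.full((batch,), valid_frames)
output, output_lengths = emformer(x, lengths)
```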
-
moto authored
-
moto authored
This commit adds
- HUBERT_LARGE
- HUBERT_XLARGE
- HUBERT_ASR_XLARGE
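A minimal sketch of using these bundles, assuming the `torchaudio.pipelines` interface of the 0.10 release (`get_model()`, `get_labels()`, `sample_rate`):

```python
import torch
import torchaudio

# Pre-trained feature-extraction bundle (no ASR head).
bundle = torchaudio.pipelines.HUBERT_LARGE
model = bundle.get_model()

# ASR fine-tuned bundle: also carries the output labels used for decoding.
asr_bundle = torchaudio.pipelines.HUBERT_ASR_XLARGE
asr_model = asr_bundle.get_model()
labels = asr_bundle.get_labels()

waveform = torch.randn(1, int(bundle.sample_rate))  # dummy 1-second input
with torch.inference_mode():
    features, _ = model.extract_features(waveform)
```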
-
moto authored
-
moto authored
-
- 05 Oct, 2021 5 commits
-
-
moto authored
-
moto authored
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/1817

This changes the imports in `torchaudio` to use the new import locations.

```
codemod -d pytorch/audio --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Reviewed By: mthrok
Differential Revision: D31302450
fbshipit-source-id: f31a0d4f453f840ea690edb688555a9d585787b5
Co-authored-by: Zafar Takhirov <zaf@fb.com>
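The change is a pure import-path migration; a small sketch of the before/after that the codemod performs (the specific symbol imported here is only an illustration, not necessarily one torchaudio uses):

```python
# Before (deprecated location):
# from torch.quantization import get_default_qconfig

# After (new location under the torch.ao namespace):
from torch.ao.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")
```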
-
moto authored
-
nateanl authored
-
- 01 Oct, 2021 1 commit
-
-
moto authored
1. Fix the HuBERT xlarge model config.
2. Across the 48 transformer layers of the HuBERT xlarge model, a very small number of elements deviate from the equivalent fairseq model by more than the default atol of 1e-5. This commit relaxes it to 3e-5 for the specific test.
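For context, the relaxation amounts to a comparison like the following (a hedged sketch using `torch.testing.assert_close`; the actual parity test uses torchaudio's own test utilities):

```python
import torch

expected = torch.zeros(3)
observed = torch.full((3,), 2e-5)  # deviation slightly above the default atol of 1e-5

# With the default float32 tolerances this comparison fails:
# torch.testing.assert_close(observed, expected)  # raises AssertionError

# With the relaxed absolute tolerance used for the HuBERT xlarge test, it passes.
torch.testing.assert_close(observed, expected, atol=3e-5, rtol=1.3e-6)
```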
-
- 30 Sep, 2021 5 commits
-
-
moto authored
* Deprecate data utils
  - The design criteria of `diskcache_iterator` and `bg_iterator` are not well-specified.
  - The implementation does not improve performance because of the GIL and threading.
-
moto authored
Writing scripted HuBERT XLarge models fails on Windows CI.
-
moto authored
-
moto authored
-
Nicolas Hug authored
* Add new issue forms
* Remove a leftover torchvision reference
* Make the bug template audio-specific
* Apply suggestions from code review

Co-authored-by: moto <855818+mthrok@users.noreply.github.com>
-
- 29 Sep, 2021 4 commits
-
-
moto authored
* Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`

In #1783, we split the wav2vec2 factory functions into ones for pre-training models and ones for fine-tuning models (pre-training model + extra Linear module). I picked the naming scheme `wav2vec2_asr_ARCH` for the fine-tuning factory functions, but it did not feel right, because the architecture code is more generic. Even though the resulting model architecture was used for ASR fine-tuning in the paper, it does not have to be used for ASR. This became more evident as we added pre-trained parameter support, such as #1799. For the weight files, what matters is which task and which dataset they were trained on; for the factory functions, the ASR task is not relevant. Therefore this commit renames the functions, replacing `_asr_` with `_ft_` (fine-tuning).

Note: since the new functions are not released yet, this PR itself is not BC-breaking.
-
Yi Zhang authored
* 11.1.0 to 11.1.1
* Fix typo
-
Caroline Chen authored
-
moto authored
-
- 28 Sep, 2021 1 commit
-
-
moto authored
This commit adds the following HuBERT model architectures:
- `base` (pre-training)
- `large` (pre-training / fine-tuning)
- `xlarge` (pre-training / fine-tuning)

Since the internal components are the same as `Wav2Vec2Model`, it reuses the existing modules. With these models, it is possible to
- import the pre-trained models published by `fairseq` and TorchScript them,
- fine-tune the existing models for a downstream task.
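A minimal sketch of constructing and scripting one of these architectures, assuming the factory functions are exposed as `torchaudio.models.hubert_base` / `hubert_large` / `hubert_xlarge` (these are the untrained-model entry points; importing actual fairseq checkpoints goes through separate import utilities):

```python
import torch
import torchaudio

# Untrained HuBERT base architecture; internals reuse the Wav2Vec2Model components.
# hubert_large() / hubert_xlarge() construct the larger variants analogously.
model = torchaudio.models.hubert_base()

# The model is TorchScript-able, e.g. for inference in non-Python environments.
scripted = torch.jit.script(model)

waveform = torch.randn(1, 16000)  # dummy 1-second input at 16 kHz
with torch.inference_mode():
    hidden, _ = scripted(waveform)                       # encoder output
    layer_outputs, _ = model.extract_features(waveform)  # per-layer transformer outputs
```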
-
- 27 Sep, 2021 2 commits
-
-
Nikita Shulga authored
Also update README.md to mention 1.9.1
-
Yi Zhang authored
* enable windows cuda tests
* add this dir
* minor change
* vs integration
* Update cuda_install.bat
* add logs
* minor change
* minor change
* cp vision conda activate
* mv vc_env_helper.bat
* minor change
* exit if cuda not available
* install numpy
* import CMakeLists
* check cuda
* minor change
* change windows GPU image from previous to stable
* set libtorchaudio suffix as pyd on Windows
* reduce changes
* check env settings
-
- 26 Sep, 2021 1 commit
-
-
nateanl authored
-
- 25 Sep, 2021 1 commit
-
-
moto authored
-
- 24 Sep, 2021 4 commits
-
-
moto authored
* [BC-Breaking] Split pre-training and fine-tuning factory functions

Previously, the wav2vec2 factory functions only generated the fine-tuning architecture used in the wav2vec2 paper for the ASR task, that is, the pre-training architecture plus a Linear module, and they did not provide a straightforward way to generate architectures for pre-training. The goal of the original implementation was to allow inference of wav2vec2 in non-Python environments via TorchScript. Now we would like to expand it to pre-training/fine-tuning and to the HuBERT model as well, so we need factory functions for both pre-training and fine-tuning. This commit introduces new factory functions and separates the functions for pre-training and fine-tuning.

1. New functions for ASR fine-tuning: we introduce `wav2vec2_asr_XXX` functions, which generate the architecture used for the fine-tuning task in the wav2vec2 paper. *1
2. Re-purpose the old functions: the existing functions, `wav2vec2_XXX`, now generate the architecture with the pre-training modules only (no Linear module).

Note *1: This architecture is just one way to define an architecture for fine-tuning; it is not a universal definition. The new `wav2vec2_asr_XXX` functions are designed to provide these specific fine-tuning configurations and are not meant to support generic architectures for downstream tasks.
-
Yi Zhang authored
This commit fixes the local build on Windows with CUDA.
-
nateanl authored
-
Yi Zhang authored
-
- 23 Sep, 2021 1 commit
-
-
Yi Zhang authored
-