1. 23 Dec, 2021 2 commits
  2. 21 Dec, 2021 1 commit
    • Fix load behavior for 24-bit input (#2084) · 4554d242
      moto authored
      Summary:
      ## Bug description
      
      When 24 bits-per-sample audio is loaded via a file-like object,
      the loaded Tensor is wrong. The same audio loads fine from a
      local file.
      
      ## The cause of the bug
      
      The core of sox's decoding mechanism is the `sox_read` function,
      one of whose parameters is the maximum number of samples to decode
      from the given buffer.
      
      https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f
      
      The `sox_read` function is called from the so-called `drain` effect
      callback, which receives an output buffer and its size in bytes.
      The previous implementation passed this byte size to `sox_read` as
      the maximum number of samples to read. Since the byte size is larger
      than the number of samples that fit in the buffer, `sox_read` always
      consumed the entire buffer. (This behavior is harmless except when
      the input is 24 bits-per-sample and comes from a file-like object.)
      
      When the input is read from a file-like object, the drain callback
      fetches new data via Python's `read` method into a fixed-size memory
      region. The size of this region can be adjusted via
      `torchaudio.utils.sox_utils.set_buffer_size`, but the default value is 8096.
      
      If the input format is 24 bits-per-sample, the end of the memory region
      does not necessarily coincide with the end of a valid sample.
      When `sox_read` consumes all the data in the buffer region, the partial
      sample at the end introduces unexpected values.
      This causes the aforementioned bug.
      
      ## Fix
      
      Pass a proper (better-estimated) maximum number of decodable samples to
      `sox_read`.
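
      The arithmetic below is a minimal Python sketch (not the actual C++ fix)
      of why a byte count must be converted to a whole-sample count before
      being passed to `sox_read`; the constants mirror the defaults described above.

      ```
      BUFFER_SIZE_BYTES = 8096  # default, adjustable via sox_utils.set_buffer_size
      BYTES_PER_SAMPLE = 3      # 24 bits per sample

      # Wrong: passing the byte size as the sample count over-reads the buffer.
      max_samples_wrong = BUFFER_SIZE_BYTES                # 8096 "samples"

      # Fix: only as many whole samples as actually fit in the buffer.
      max_samples = BUFFER_SIZE_BYTES // BYTES_PER_SAMPLE  # 2698 whole samples
      leftover = BUFFER_SIZE_BYTES % BYTES_PER_SAMPLE      # 2 trailing bytes of a
                                                           # partial sample; not
                                                           # valid audio data
      print(max_samples, leftover)  # 2698 2
      ```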
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2084
      
      Reviewed By: carolineechen
      
      Differential Revision: D33236947
      
      Pulled By: mthrok
      
      fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
  3. 30 Nov, 2021 1 commit
    • Revise Griffin-Lim transform test to reduce execution time (#2037) · 96b1fa72
      hwangjeff authored
      Summary:
      Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time.
      
      For one of the four tests:
      Before:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 517.35s (0:08:37) =========================
      ```
      
      After:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 104.59s (0:01:44) =========================
      ```
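
      A hedged sketch of the kind of parameter reduction involved (the values here are illustrative, not the PR's exact changes):

      ```
      import torch
      import torchaudio.transforms as T

      # Smaller n_fft and fewer iterations shrink the graph autograd must traverse.
      n_fft = 256
      transform = T.GriffinLim(n_fft=n_fft, n_iter=8)  # default n_iter is 32
      spec = torch.rand(1, n_fft // 2 + 1, 20)         # short power spectrogram
      waveform = transform(spec)
      ```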
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2037
      
      Reviewed By: mthrok
      
      Differential Revision: D32726213
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
  4. 24 Nov, 2021 1 commit
    • Add RNN-T beam search decoder (#2028) · 60a85b50
      hwangjeff authored
      Summary:
      Adds a beam search decoder for the RNN-T implementation ``torchaudio.prototype.RNNT``. The decoder is TorchScript-able and supports both streaming and non-streaming inference.
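
      A rough usage sketch follows; the decoder class name, builder function, blank-token id, and tensor shapes are all illustrative assumptions, not the confirmed prototype API.

      ```
      import torch
      from torchaudio.prototype import RNNTBeamSearch, emformer_rnnt_base

      model = emformer_rnnt_base()                 # an RNN-T model (hypothetical builder)
      decoder = RNNTBeamSearch(model, blank=4096)  # blank token id (illustrative)

      features = torch.rand(1, 128, 80)            # (batch, time, feature)
      length = torch.tensor([128])
      hypotheses = decoder(features, length, beam_width=10)  # non-streaming inference
      ```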
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2028
      
      Reviewed By: mthrok
      
      Differential Revision: D32627919
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
  5. 23 Nov, 2021 1 commit
    • Temporarily skip threadpool test (#2025) · 05ae795a
      moto authored
      Summary:
      The sox_effects test that runs under `concurrent.futures.ThreadPoolExecutor` started failing a few days ago. This commit skips the test while we investigate.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2025
      
      Reviewed By: nateanl
      
      Differential Revision: D32615933
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
  6. 22 Nov, 2021 2 commits
    • Relax dtype for MVDR (#2024) · 392a03c8
      Zhaoheng Ni authored
      Summary:
      Allow users to pass `torch.cfloat` spectrograms to the MVDR module. It internally converts the spectrogram to `torch.cdouble` and outputs a tensor with the original dtype of the spectrogram.
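
      A minimal sketch of the upcast-compute-downcast pattern described above (not the actual MVDR code; the covariance computation is a stand-in):

      ```
      import torch

      def stable_step(specgram: torch.Tensor) -> torch.Tensor:
          orig_dtype = specgram.dtype
          spec = specgram.to(torch.cdouble)  # compute in double precision internally
          psd = torch.einsum("...ct,...et->...ce", spec, spec.conj())  # toy covariance
          return psd.to(orig_dtype)          # restore the caller's dtype

      out = stable_step(torch.randn(4, 6, 100, dtype=torch.cfloat))
      assert out.dtype == torch.cfloat
      ```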
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2024
      
      Reviewed By: carolineechen
      
      Differential Revision: D32594051
      
      Pulled By: nateanl
      
      fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
    • Improve MVDR stability (#2004) · fb2f9538
      Zhaoheng Ni authored
      Summary:
      Divide first, multiply second. This helps avoid value-overflow issues. It also helps the ``stv_evd`` solution pass the gradient check.
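
      A toy float32 illustration of why the order of operations matters (values chosen to overflow; these are not the actual MVDR quantities):

      ```
      import torch

      a = torch.tensor(1e20)  # float32 by default; max finite value is ~3.4e38
      b = torch.tensor(1e20)
      c = torch.tensor(1e20)

      print(a * c / b)  # inf: the intermediate product 1e40 overflows float32
      print(a / b * c)  # 1e20: dividing first keeps intermediates in range
      ```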
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2004
      
      Reviewed By: mthrok
      
      Differential Revision: D32539827
      
      Pulled By: nateanl
      
      fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
  7. 18 Nov, 2021 2 commits
  8. 17 Nov, 2021 1 commit
  9. 04 Nov, 2021 1 commit
  10. 03 Nov, 2021 2 commits
  11. 28 Oct, 2021 1 commit
  12. 22 Oct, 2021 1 commit
  13. 13 Oct, 2021 2 commits
  14. 10 Oct, 2021 1 commit
    • Store n_bits in WaveRNN (#1847) · 9637c6bf
      moto authored
      Move the computation of `#classes -> #bits` to the constructor of WaveRNN and store the result on the instance, so that it can be reused elsewhere.
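
      A minimal sketch of the change (hypothetical class and attribute names, not the actual torchaudio implementation):

      ```
      import math

      class WaveRNNSketch:
          def __init__(self, n_classes: int = 256):
              self.n_classes = n_classes
              # Derive and store the bit depth once; e.g. 256 classes -> 8 bits.
              self.n_bits = int(math.log2(n_classes))
      ```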
  15. 08 Oct, 2021 1 commit
  16. 07 Oct, 2021 2 commits
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models which are not part of the architecture; `aux_num_out` falls into this category, so separate functions are no longer necessary. This concludes the wav2vec2/HuBERT API update for release 0.10.
      
      A summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params: float>, aux_num_out: Optional[int] = None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning (see the sketch below).
          - Added dropout parameters.
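
      A hedged usage sketch of the merged factory function (assuming the 0.10 module path `torchaudio.models`; the output size 29 is illustrative):

      ```
      import torchaudio

      # Without aux_num_out: a pre-training-style model, no Linear head.
      model = torchaudio.models.wav2vec2_base()

      # With aux_num_out: the same architecture plus a Linear head for ASR.
      asr_model = torchaudio.models.wav2vec2_base(aux_num_out=29)
      ```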
    • Make the core wav2vec2 factory function public (#1829) · 31a69c36
      moto authored
      This commit makes the following changes:
      1. Make the factory function with full customizability public,
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout); see the sketch at the end of this summary.
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
      
      ### Why?
      
      While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights yields a simple code organization, thanks to the clean separation of concerns. As mentioned in #1821, in this framework,
        1. Model implementation is responsible for computation logic,
        2. factory functions are responsible for customizability and model construction,
        3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
      
      This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies that by making the mother factory function public.
      It also clarifies the purpose of keeping the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. Accordingly, this commit also adds supplemental parameters to them.
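
      A hedged sketch of item 2 above (parameter names are taken from this summary; the values are illustrative):

      ```
      import torchaudio

      # Architecture-specific factory with non-architectural knobs exposed.
      model = torchaudio.models.wav2vec2_base(
          encoder_projection_dropout=0.0,
          encoder_attention_dropout=0.0,
          encoder_ff_interm_dropout=0.0,
      )
      ```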
  17. 06 Oct, 2021 3 commits
  18. 05 Oct, 2021 3 commits
  19. 01 Oct, 2021 1 commit
    • Fix HuBERT xlarge configuration and test (#1811) · 13b2349a
      moto authored
      1. Fix the HuBERT xlarge model config.
      2. In the 48 transformer layers of the HuBERT xlarge model, very few elements deviate from the equivalent fairseq model by more than the default atol of 1e-5. This commit relaxes it to 3e-5 for that specific test.
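
      A minimal sketch of the relaxed comparison (the tensors below stand in for the two models' outputs):

      ```
      import torch

      ours = torch.zeros(2, 10)
      theirs = ours + 2e-5  # deviation below the relaxed tolerance
      torch.testing.assert_allclose(ours, theirs, rtol=0.0, atol=3e-5)  # passes
      ```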
  20. 30 Sep, 2021 1 commit
  21. 29 Sep, 2021 2 commits
    • Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH` (#1804) · 5c01c25f
      moto authored
      * Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`
      
      In #1783, we split the factory functions of wav2vec2 into ones for pretraining models
      and ones for fine-tuning models (pretraining model + extra Linear module).
      
      I picked the naming scheme `wav2vec2_asr_ARCH` for the factory functions of
      fine-tuning models, but it did not feel right, because the architecture code
      is more generic. Even though the resulting model architecture was used for
      ASR fine-tuning in the paper, it does not have to be used for ASR.
      This became more evident as we added pre-trained parameter support, such as #1799.
      For the weight files, what matters is which task and which dataset they were
      trained on; for the factory functions, the ASR task is not relevant.
      
      Therefore, this commit renames the functions, replacing `_asr_` with `_ft_`
      (for fine-tuning).
      
      Note: Since the new functions are not released yet, this PR itself is not BC-breaking.
    • Skip hubert_asr_xlarge TS test on Windows (#1800) · a7bdedae
      moto authored
  22. 28 Sep, 2021 1 commit
    • Add HuBERT model architectures (#1769) · a7854f33
      moto authored
      This commit adds the following HuBERT model architectures
      
       - `base` (pre-training)
       - `large` (pre-training / fine-tuning)
       - `xlarge` (pre-training / fine-tuning)
      
      Since the internal components are the same as those of `Wav2Vec2Model`, it reuses the existing modules.
      With these models, it is possible to
      - import the pre-trained models published by `fairseq` and TorchScript them.
      - fine-tune an existing model for a downstream task.
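
      A hedged construction sketch (assuming the factory names follow the architectures above, e.g. `hubert_base` under `torchaudio.models`):

      ```
      import torch
      import torchaudio

      # Build an untrained HuBERT base architecture and script it.
      model = torchaudio.models.hubert_base()
      scripted = torch.jit.script(model)  # TorchScript-able, as noted above
      ```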
  23. 24 Sep, 2021 1 commit
    • [BC-Breaking] Split pretraining and finetuning factory functions (#1783) · b2e9f1e4
      moto authored
      * [BC-Breaking] Split pretraining and finetuning factory functions
      
      Previously, the wav2vec2 factory functions only generated the fine-tuning
      architecture used in the wav2vec2 paper for the ASR task, that is,
      the pre-training architecture + a Linear module. They did not
      provide a straightforward way to generate architectures for pre-training.
      
      The goal of the original implementation was to allow inference of
      wav2vec2 in non-Python environments via TorchScript. Now we would like to
      expand it to pre-training/fine-tuning and the HuBERT model as well.
      
      Therefore, we need factory functions for both pre-training and
      fine-tuning. This commit introduces new factory functions, with separate
      functions for pre-training and fine-tuning.
      
      1. New functions for ASR fine-tuning.
      
      We introduce `wav2vec2_asr_XXX` functions, which generate the architecture
      used for the fine-tuning task in the wav2vec2 paper (see the sketch at the end
      of this summary). *1
      
      2. Re-purpose the old functions
      
      The existing functions, `wav2vec2_XXX`, now generate the architecture with
      the pre-training modules only (no Linear module).
      
      Note
      *1 This architecture is just one way to define an architecture for fine-tuning;
      it is not a universal definition. The new `wav2vec2_asr_XXX` functions are
      designed to provide this specific fine-tuning configuration; they are not
      meant to support generic architectures for downstream tasks.
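
      A hedged sketch of the resulting split (function and parameter names as of this commit; they were later renamed, and 29 is an illustrative output size):

      ```
      import torchaudio

      # Pre-training architecture only: no Linear head.
      pretrain_model = torchaudio.models.wav2vec2_base()

      # Fine-tuning architecture: pre-training modules + a Linear head for ASR.
      asr_model = torchaudio.models.wav2vec2_asr_base(num_out=29)
      ```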
  24. 22 Sep, 2021 3 commits
    • [BC-Breaking] Move fine-tune specific module out of wav2vec2 encoder (#1782) · 40f2a085
      moto authored
      Previously, the Linear module (called `readout`, used only for the ASR fine-tuning
      task) was placed in the encoder module. Conceptually, the encoder has nothing to
      do with a module specific to fine-tuning / a downstream task.
      
      The problems here are that:
      1. The encoder can also be used in the pre-training phase, in which such a module
      should not be present.
      2. The choice of a Linear module is arbitrary, and a hard-coded module structure
      in the encoder is inconvenient for users.
      
      Therefore, this commit moves the Linear module out of the encoder and places it
      as the `aux` attribute of `Wav2Vec2Model`. (As a result, `Wav2Vec2Model` has
      `feature_extractor`, `encoder` and `aux` attributes; see the sketch at the end of this summary.)
      
      An alternative approach is to define another module that holds `Wav2Vec2Model`
      and the aux module side by side, but that would introduce a new class we need
      to maintain.
      The expected use of `aux` is only for 1. loading the pre-trained parameters
      published by `fairseq` (and their variations from HF) and 2. creating the same model
      architectures for comparison experiments.
      The newly introduced class would not be general enough for downstream adaptations,
      where there will be a bunch of different, more complicated models (e.g. s3prl).
      
      Therefore, following the minimalistic approach, we put the aux module inside `Wav2Vec2Model`.
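
      A hedged sketch of the resulting attribute layout (assuming the 0.10-era factory signature; the output size 29 is illustrative):

      ```
      import torchaudio

      model = torchaudio.models.wav2vec2_base(aux_num_out=29)
      # The three top-level attributes described above:
      print(model.feature_extractor)  # convolutional feature extractor
      print(model.encoder)            # transformer encoder, no task-specific head
      print(model.aux)                # Linear head used for ASR fine-tuning
      ```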
    • Fix HF model integration (#1781) · e9cab8f8
      moto authored
      * Fix HF model integration
      
      Previously, when testing wav2vec2 models from HF transformers, all the models were
      instantiated as the `Wav2Vec2ForCTC` class, while some of them were supposed to be
      `Wav2Vec2Model`.
      
      Fixing this revealed that the model importer could not correctly handle importing `Wav2Vec2Model`.
      
      This PR fixes these issues.
    • Update reference from master to main elsewhere (#1784) · 1b4b82e0
      moto authored
      Summary: Update fairseq reference from master to main elsewhere
      
      Reviewed By: alexeib
      
      Differential Revision: D30938472
      
      fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8
      Co-authored-by: Diana Liskovich <dianaml@fb.com>
  25. 21 Sep, 2021 1 commit
  26. 20 Sep, 2021 2 commits