Commits · 5859923adf577410f672f72b208f1e4367cef1ca · OpenDAS / Torchaudio

23 Dec, 2021 2 commits

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

Introduce Conformer (#2068) · 1b17b011

hwangjeff authored Dec 22, 2021

Summary:
Adds implementation of Conformer module.

Adapted from sravyapopuri388's implementation for fairseq at https://github.com/fairinternal/fairseq-py/pull/2770.

Pull Request resolved: https://github.com/pytorch/audio/pull/2068

Reviewed By: mthrok

Differential Revision: D33236957

Pulled By: hwangjeff

fbshipit-source-id: 382d99394996ff5249522b5899e1a4b4a95de9e6

1b17b011

21 Dec, 2021 1 commit

Fix load behavior for 24-bit input (#2084) · 4554d242

moto authored Dec 20, 2021

Summary:
## bug description

When 24 bits-per-sample audio is loaded via a file-like object,
the loaded Tensor is wrong. Loading the same audio from a local
file works fine.

## The cause of the bug

The core of sox's decoding mechanism is the `sox_read` function,
one of whose parameters is the maximum number of samples to decode
from the given buffer.

https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f

The `sox_read` function is called in the callback of the so-called
`drain` effect, and this callback receives the output buffer and its
size in bytes. The previous implementation passed this size value as
the argument of `sox_read` for the maximum number of samples to
read. Since the buffer size is larger than the number of samples that
fit in the buffer, the `sox_read` function always consumed the entire
buffer. (This behavior is not wrong except when the input is
24 bits-per-sample and comes from a file-like object.)

When the input is read from a file-like object, new data are fetched
inside the drain callback via Python's `read` method and loaded into a
fixed-size memory region. The size of this memory region
can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`,
but the default value is 8096.

If the input format is 24 bits-per-sample, the end of the memory region
does not necessarily correspond to the end of a valid sample.
When `sox_read` consumes all the data in the buffer region, the data
at the end introduces some unexpected values.
This causes the aforementioned bug.

## Fix

Pass a proper (better-estimated) maximum number of decodable samples to
`sox_read`.
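
The alignment problem described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the actual fix in the C++ extension: the helper names are hypothetical, and it ignores channels and container headers. It only shows why a byte count is a poor stand-in for a sample count when samples are 3 bytes wide.

```python
def max_whole_samples(buffer_bytes: int, bits_per_sample: int) -> int:
    """Number of complete samples that fit in `buffer_bytes` bytes (hypothetical helper)."""
    bytes_per_sample = bits_per_sample // 8
    return buffer_bytes // bytes_per_sample


def trailing_bytes(buffer_bytes: int, bits_per_sample: int) -> int:
    """Bytes left at the end of the buffer that do not form a complete sample."""
    return buffer_bytes % (bits_per_sample // 8)


BUFFER_SIZE = 8096  # the default buffer size quoted in the commit message

for bits in (8, 16, 24, 32):
    print(
        f"{bits:>2} bits: {max_whole_samples(BUFFER_SIZE, bits)} whole samples, "
        f"{trailing_bytes(BUFFER_SIZE, bits)} trailing bytes"
    )
```

With this buffer size only the 24-bit case leaves a partial sample at the end of the region, which matches the observation that other bit depths were unaffected.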

Pull Request resolved: https://github.com/pytorch/audio/pull/2084

Reviewed By: carolineechen

Differential Revision: D33236947

Pulled By: mthrok

fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4

4554d242

30 Nov, 2021 1 commit

Revise Griffin-Lim transform test to reduce execution time (#2037) · 96b1fa72

hwangjeff authored Nov 30, 2021

Summary:
Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time.

For one of the four tests:
Before:
```
test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]

======================== 1 passed in 517.35s (0:08:37) =========================
```

After:
```
test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]

======================== 1 passed in 104.59s (0:01:44) =========================
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2037

Reviewed By: mthrok

Differential Revision: D32726213

Pulled By: hwangjeff

fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e

96b1fa72

24 Nov, 2021 1 commit

Add RNN-T beam search decoder (#2028) · 60a85b50

hwangjeff authored Nov 23, 2021

Summary:
Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference.

Pull Request resolved: https://github.com/pytorch/audio/pull/2028

Reviewed By: mthrok

Differential Revision: D32627919

Pulled By: hwangjeff

fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd

60a85b50

23 Nov, 2021 1 commit

Temporarily skip threadpool test (#2025) · 05ae795a

moto authored Nov 23, 2021

Summary:
The sox_effects test that runs in `concurrent.futures.ThreadPoolExecutor` started failing a couple of days ago. Skipping the test while this is investigated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2025

Reviewed By: nateanl

Differential Revision: D32615933

Pulled By: mthrok

fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f

05ae795a

22 Nov, 2021 2 commits

Relax dtype for MVDR (#2024) · 392a03c8

Zhaoheng Ni authored Nov 22, 2021

Summary:
Allow users to use `torch.cfloat` dtype input for the MVDR module. It internally converts the spectrogram into `torch.cdouble` and outputs the tensor with the original dtype of the spectrogram.

Pull Request resolved: https://github.com/pytorch/audio/pull/2024

Reviewed By: carolineechen

Differential Revision: D32594051

Pulled By: nateanl

fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46

392a03c8

Improve MVDR stability (#2004) · fb2f9538

Zhaoheng Ni authored Nov 22, 2021

Summary:
Division first, multiplication second. This helps avoid the value overflow issue. It also helps the ``stv_evd`` solution pass the gradient check.
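
The effect of the operation order can be seen with plain floating-point numbers. This is a generic sketch of the principle, not the MVDR code itself; the function names and the input values are made up for illustration.

```python
import math


def multiply_then_divide(a: float, b: float, c: float) -> float:
    # The intermediate product a * b may exceed the float range.
    return (a * b) / c


def divide_then_multiply(a: float, b: float, c: float) -> float:
    # Dividing first keeps the intermediate value small.
    return (a / c) * b


a = b = c = 1e200  # illustrative values near the upper end of the double range

overflowed = multiply_then_divide(a, b, c)
stable = divide_then_multiply(a, b, c)

print(math.isinf(overflowed))  # True: the product overflowed before the division
print(stable == 1e200)         # True: the result stays finite
```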

Pull Request resolved: https://github.com/pytorch/audio/pull/2004

Reviewed By: mthrok

Differential Revision: D32539827

Pulled By: nateanl

fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22

fb2f9538

18 Nov, 2021 2 commits

Add Emformer RNN-T model (#2003) · 78ce7010

hwangjeff authored Nov 18, 2021

Summary:
Adds streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions — one that allows for building a custom model, and one that builds a preconfigured base model.

Pull Request resolved: https://github.com/pytorch/audio/pull/2003

Reviewed By: nateanl

Differential Revision: D32440879

Pulled By: hwangjeff

fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360

78ce7010

Re-sync with internal repository (#2017) · b4184dc6
Facebook Community Bot authored Nov 18, 2021
```
Co-authored-by: Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
```
b4184dc6

17 Nov, 2021 1 commit

Remove facebook folder in wav2vec unittests (#2015) · 2a5fe5ff

Zhaoheng Ni authored Nov 17, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2015

as titled

Reviewed By: hwangjeff, mthrok

Differential Revision: D32495691

fbshipit-source-id: 60d8a2337585e3147f24ca9f0b6518e30cd9134a

2a5fe5ff

04 Nov, 2021 2 commits

Doc fixes (#1982) · c670898c
Caroline Chen authored Nov 04, 2021

c670898c

Consolidate network utils (#1974) · 536e8ac0

moto authored Nov 04, 2021

This commit changes all the `torch.hub` network utility functions to
be imported from `torchaudio._internal`, so that later we can replace
the function within fbcode.

536e8ac0

03 Nov, 2021 3 commits
- [BC-Breaking] Drop pseudo complex support from phase_vocoder / TimeStretch (#1957) · d3e146fd
  moto authored Nov 03, 2021
```
Following the plan #1337, this commit drops the support for pseudo complex type from `F.phase_vocoder` and `T.TimeStretch`.
```
  d3e146fd
- [BC-Breaking] Drop pseudo complex support from spectrogram (#1958) · 5ec6ada6
  moto authored Nov 03, 2021
```
Following the plan #1337, this commit drops the support for pseudo complex type from 
`F.spectrogram` and `T.Spectrogram`.

It also deprecates the use of `return_complex` argument.
```
  5ec6ada6
- Add wav2vec2 ASR English pretrained model from voxpopuli (#1956) · f2eec77b
  moto authored Nov 03, 2021
  
  f2eec77b
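
For readers unfamiliar with the term used in #1957 and #1958 above: "pseudo complex" refers to storing a complex value as a trailing pair of real numbers (real part, imaginary part) rather than as a native complex number. The sketch below uses plain Python lists to show the two layouts; the helper names are hypothetical, and in PyTorch the corresponding conversion is available as `torch.view_as_complex` and `torch.view_as_real`.

```python
from typing import List


def pseudo_to_complex(pairs: List[List[float]]) -> List[complex]:
    """Convert [[re, im], ...] (pseudo complex layout) into native complex values."""
    return [complex(re, im) for re, im in pairs]


def complex_to_pseudo(values: List[complex]) -> List[List[float]]:
    """Convert native complex values back into [[re, im], ...] pairs."""
    return [[z.real, z.imag] for z in values]


pseudo = [[3.0, 4.0], [0.0, -1.0]]   # last dimension of size 2: (real, imag)
native = pseudo_to_complex(pseudo)   # [(3+4j), -1j]

# With native complex values, operations such as the magnitude need no
# manual handling of the real/imaginary pair.
magnitudes = [abs(z) for z in native]
print(magnitudes)
```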
02 Nov, 2021 3 commits
- Run integration tests on CI (#1939) · 5594eae6
  moto authored Nov 02, 2021
  
  5594eae6
- Add wav2vec2 ASR Italian pretrained model from voxpopuli (#1954) · 5c8541b7
  moto authored Nov 02, 2021
  
  5c8541b7
- Add wav2vec2 ASR German pretrained model from voxpopuli (#1953) · e15431b7
  moto authored Nov 01, 2021
```
* Add wav2vec2 ASR German pretrained model from voxpopuli
```
  e15431b7
28 Oct, 2021 1 commit
- Remove F.complex_norm and T.ComplexNorm (#1942) · ab50909d
  S Harish authored Oct 28, 2021
  
  ab50909d
27 Oct, 2021 1 commit
- Add wav2vec2 ASR Spanish pretrained model from voxpopuli (#1924) · 3a599315
  moto authored Oct 26, 2021
  
  3a599315
25 Oct, 2021 1 commit
- Add pretrained French ASR from voxpopuli (#1919) · cbf267c3
  moto authored Oct 25, 2021
  
  cbf267c3
22 Oct, 2021 1 commit
- Refactor integration test (#1922) · 19d8f1c2
  moto authored Oct 22, 2021
```
- Make the test support other languages
- Fetch test asset on-the-fly
```
  19d8f1c2
21 Oct, 2021 1 commit

[BC-breaking] Remove unused dimension from pretrained Wav2Vec2 ASR (#1914) · ec4837dc

moto authored Oct 21, 2021

* [BC-breaking] Remove unused dimension from pretrained Wav2Vec2 ASR

The Wav2Vec2 ASR pretrained weights that originate from fairseq have
an extra dimension that has nothing to do with the ASR task.

https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/data/dictionary.py#L18-L37

which is masked during the loss computation as

https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/criterions/ctc.py#L126-L128

This change removes it.

* Use '-' for blank token representation.

ec4837dc

15 Oct, 2021 2 commits

Add TTS bundle/pipelines (#1872) · e885204e

moto authored Oct 15, 2021

Future work items:
- length computation of GriffinLim
- better way to make InverseMelScale work in inference_mode

e885204e

Move wav2vec2 pretrained models to pipelines module (#1876) · fad855cd

moto authored Oct 15, 2021

- Move wav2vec2 pretrained weights to `torchaudio.pipelines` namespace to align with #1872.
- Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-training model) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
- Update base URL

fad855cd

13 Oct, 2021 2 commits
- [BC-Breaking] Ensure integer input frequencies for resample (#1857) · 25a8adf6
  Caroline Chen authored Oct 13, 2021
  
  25a8adf6
- Add `lengths` param to WaveRNN.infer (#1851) · 483d8fae
  moto authored Oct 13, 2021
  
  483d8fae
10 Oct, 2021 1 commit

Store n_bits in WaveRNN (#1847) · 9637c6bf

moto authored Oct 10, 2021

Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
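
A minimal sketch of the `#classes -> #bits` conversion mentioned above, assuming the number of classes is a power of two (for example, 256 classes for 8-bit output). The function name and the power-of-two check are illustrative additions, not taken from the WaveRNN source.

```python
import math


def n_bits_from_classes(n_classes: int) -> int:
    """Return the bit depth whose value range has exactly `n_classes` levels."""
    n_bits = int(math.log2(n_classes))
    # Illustrative safeguard: reject class counts that are not a power of two.
    if 2 ** n_bits != n_classes:
        raise ValueError(f"n_classes must be a power of two, got {n_classes}")
    return n_bits


print(n_bits_from_classes(256))  # 8
print(n_bits_from_classes(512))  # 9
```

Computing the value once in the constructor and storing it on the instance avoids repeating the logarithm wherever the bit depth is needed.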

9637c6bf

08 Oct, 2021 2 commits
- Rename utterance to transcript in datasets (#1841) · c38ecd2e
  hwangjeff authored Oct 08, 2021
  
  c38ecd2e
- Add customization support to wav2vec2 labels (#1834) · fd7fcf93
  moto authored Oct 07, 2021
  
  fd7fcf93
07 Oct, 2021 2 commits

Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80

moto authored Oct 07, 2021

This commit merges wav2vec2/hubert factory functions for pre-training and fine-tuning. In #1829, we added parameters to customize the models that are not part of architecture, and `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update in release 0.10.

Summary of BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
1. `Wav2Vec2Model.extract_features`
In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)`
    - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
    - Added dropout parameters.
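
The effect of making `aux_num_out` optional can be sketched with a stand-in class. Everything below is hypothetical (`SketchModel`, `sketch_wav2vec2_base`, and the dimension value are placeholders); it does not use or reproduce the torchaudio implementation and only mirrors the behavior described in item 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchModel:
    """Stand-in for a model: an encoder plus an optional fine-tuning head."""

    encoder_dim: int
    aux_out_dim: Optional[int]  # None means "no linear layer for fine-tuning"

    def output_dim(self) -> int:
        # Without the auxiliary head, the model emits encoder features directly.
        return self.encoder_dim if self.aux_out_dim is None else self.aux_out_dim


def sketch_wav2vec2_base(aux_num_out: Optional[int] = None) -> SketchModel:
    """One factory covers both the pre-training and the fine-tuned variant."""
    return SketchModel(encoder_dim=768, aux_out_dim=aux_num_out)  # 768 is illustrative


pretraining_model = sketch_wav2vec2_base()
finetuned_model = sketch_wav2vec2_base(aux_num_out=32)
print(pretraining_model.output_dim(), finetuned_model.output_dim())
```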

274ada80

Make the core wav2vec2 factory function public (#1829) · 31a69c36

moto authored Oct 06, 2021

This commit makes the following changes:
1. Make the factory function with full customizability public.
    i.e. `_get_model(...) -> wav2vec2_model(...)`.
2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout).
    i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`

### Why?

While adding the pre-trained weight support, I realized that separating the API for model construction from the pre-trained weight support keeps the code organization simple because the concerns are well separated. As mentioned in #1821, in this framework,
  1. Model implementation is responsible for computation logic,
  2. factory functions are responsible for customizability and model construction,
  3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).

(note: for simple models, combining 1 and 2 is also okay.)

This means that factory functions have to support all the customizability required by the pre-trained weight API. The current implementation uses an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.

This PR rectifies it by making the mother factory function public.
This also clarifies the purpose of having the other factory functions as public API, which is just syntactic sugar for constructing an untrained model with a specific architecture. So this commit also adds supplemental parameters to them.
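
The layering described in this message can be sketched as three small functions. All names, parameters, and values below are hypothetical placeholders chosen for illustration; the "model" is a plain dictionary, and weight loading is omitted.

```python
from typing import Any, Dict, List


def sketch_model(
    *,
    num_layers: int,
    embed_dim: int,
    projection_dropout: float,
    attention_dropout: float,
) -> Dict[str, Any]:
    """General factory: every setting is an explicit argument."""
    return {
        "num_layers": num_layers,
        "embed_dim": embed_dim,
        "projection_dropout": projection_dropout,
        "attention_dropout": attention_dropout,
    }


def sketch_base(
    projection_dropout: float = 0.1,
    attention_dropout: float = 0.1,
) -> Dict[str, Any]:
    """Architecture-specific shortcut: fixes the architecture, forwards the rest."""
    return sketch_model(
        num_layers=12,   # illustrative architecture constants
        embed_dim=768,
        projection_dropout=projection_dropout,
        attention_dropout=attention_dropout,
    )


def sketch_pretrained(labels: List[str]) -> Dict[str, Any]:
    """Pre-trained layer: builds via the public factory and attaches extra info."""
    model = sketch_base(projection_dropout=0.0, attention_dropout=0.0)
    return {"model": model, "labels": list(labels)}


bundle = sketch_pretrained(["-", "a", "b"])
print(bundle["model"]["num_layers"], bundle["labels"])
```

Because the pre-trained layer only calls public factories, it needs no access to internal helpers, which is the point the message makes about `_get_model`.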

31a69c36

06 Oct, 2021 5 commits

Add DR-VCTK dataset (#1819) · 9a34e7c0
kingyiusuen authored Oct 06, 2021

9a34e7c0

Add pretrained weights from wav2vec2.0 and XLSR papers (#1827) · e40c9c3c

moto authored Oct 06, 2021

Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53

e40c9c3c

Introduce Emformer (#1801) · 48cfbf2b

hwangjeff authored Oct 06, 2021

Adds an implementation of Emformer, a memory-efficient transformer architecture
introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency
streaming speech recognition applications.

48cfbf2b

Add the rest of HuBERT pretrained models (#1824) · c9e4c75d
moto authored Oct 05, 2021
```
This commit adds
- HUBERT_LARGE
- HUBERT_XLARGE
- HUBERT_ASR_XLARGE
```
c9e4c75d
Remove deprecated dataset utils (#1826) · 1efba850
moto authored Oct 05, 2021

1efba850

05 Oct, 2021 2 commits

[BC-Breaking] Remove deprecated VCTK (#1825) · fc4f481b
moto authored Oct 05, 2021

fc4f481b

[fbsync] torchaudio: torch.quantization -> torch.ao.quantization (#1823) · 02def7c4

moto authored Oct 05, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/1817

This changes the imports in `torchaudio` to use the new import locations.

```
codemod -d pytorch/audio --extensions py 'torch.quantization' 'torch.ao.quantization'
```
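
As a rough stand-in for what the command above does to each file, the following sketch applies the same textual substitution with Python's `re` module. It is not the `codemod` tool and makes no claim about that tool's exact matching rules; the function name and the word-boundary anchors are choices made for this illustration.

```python
import re

# Match the old module path as a whole dotted name, so that identifiers such as
# `torch.quantization_utils` or `mytorch.quantization` are left untouched.
_OLD_PATH = re.compile(r"\btorch\.quantization\b")


def rewrite_quantization_imports(source: str) -> str:
    """Replace `torch.quantization` with `torch.ao.quantization` in `source`."""
    return _OLD_PATH.sub("torch.ao.quantization", source)


before = "from torch.quantization import QuantStub\nimport torch.quantization\n"
after = rewrite_quantization_imports(before)
print(after)
```

The new path does not contain the old one as a substring, so running the rewrite a second time leaves the text unchanged.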

Reviewed By: mthrok

Differential Revision: D31302450

fbshipit-source-id: f31a0d4f453f840ea690edb688555a9d585787b5
Co-authored-by: Zafar Takhirov <zaf@fb.com>

02def7c4