Commits · b7d2d92895c20fb8f24e09ef6dbf0709a83d99fc · OpenDAS / Torchaudio

28 Jul, 2023 1 commit

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
The PR moves the `SquimObjective` and `SquimSubjective` models, along with their factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

b7d2d928

23 Mar, 2023 1 commit

Add SquimSubjective pre-trained pipeline (#3197) · 68fa1d3f

Zhaoheng Ni authored Mar 23, 2023

Summary:
The PR adds a pre-trained pipeline for the `SquimSubjective` model, which predicts the MOS score for the speech enhancement task.

Pull Request resolved: https://github.com/pytorch/audio/pull/3197

Reviewed By: mthrok

Differential Revision: D44313244

Pulled By: nateanl

fbshipit-source-id: 905095ff77006e9f441faa826fc25d9d8681e8aa

68fa1d3f

27 Feb, 2023 1 commit

Add SquimObjectiveBundle to prototype (#3103) · 46fae2fe

Zhaoheng Ni authored Feb 27, 2023

Summary:
Add pre-trained pipeline support for the `SquimObjective` model. The pre-trained model is trained on the DNS 2020 challenge dataset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3103

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D43611794

Pulled By: nateanl

fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d

46fae2fe

15 Jan, 2023 1 commit

Add pre-trained pipelines for XLS-R models (#2978) · 9b7b64e4

Zhaoheng Ni authored Jan 15, 2023

Summary:
The PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models:
- WAV2VEC2_XLSR_300M
- WAV2VEC2_XLSR_1B
- WAV2VEC2_XLSR_2B

All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`.
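
As a rough illustration of what waveform normalization means here, the sketch below rescales a waveform to zero mean and unit variance, which is the effect of layer normalization applied over the whole waveform without learned parameters. It is a plain-Python sketch under that assumption, not torchaudio's code; `normalize_waveform` and `eps` are hypothetical names.

```python
import math


def normalize_waveform(samples, eps=1e-5):
    """Rescale samples to zero mean and (approximately) unit variance."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    # eps keeps the division stable for near-silent input.
    return [(x - mean) / math.sqrt(variance + eps) for x in samples]


normalized = normalize_waveform([0.5, -0.25, 0.75, 0.0])
```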

Pull Request resolved: https://github.com/pytorch/audio/pull/2978

Reviewed By: hwangjeff

Differential Revision: D42501491

Pulled By: nateanl

fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3

9b7b64e4

05 Jan, 2023 1 commit

Add HiFiGAN bundle (#2921) · 54e5c859

Grigory Sizov authored Jan 05, 2023

Summary:
Closes [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314)
## Description
- Add bundle `HIFIGAN_GENERATOR_V3_LJSPEECH` to prototypes. The bundle contains pre-trained HiFiGAN generator weights from the [original HiFiGAN publication](https://github.com/jik876/hifi-gan#pretrained-model), converted slightly to fit our model
- Add tests
  - unit tests checking that vocoder and mel-transform implementations in the bundle give the same results as the original ones. Part of the original HiFiGAN code is ported to this repo to enable these tests
  - integration test checking that waveform reconstructed from mel spectrogram by the bundle is close enough to the original
- Add docs

Pull Request resolved: https://github.com/pytorch/audio/pull/2921

Reviewed By: nateanl, mthrok

Differential Revision: D42034761

Pulled By: sgrigory

fbshipit-source-id: 8b0dadeed510b3c9371d6aa2c46ec7d8378f6048

54e5c859

09 Dec, 2022 1 commit

Fix integration test for WAV2VEC2_ASR_LARGE_LV60K_10M (#2910) · 90162812

Zhaoheng Ni authored Dec 09, 2022

Summary:
After https://github.com/pytorch/audio/issues/2873, the pre-trained Wav2Vec2 models with larger datasets can achieve better performance. The PR fixes the integration test of the bundle `WAV2VEC2_ASR_LARGE_LV60K_10M`, which previously transcribed the word `CURIOUSITY` as `CURIOUSSITY` and now transcribes it correctly as `CURIOUSITY`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2910

Reviewed By: mthrok

Differential Revision: D41881919

Pulled By: nateanl

fbshipit-source-id: 236fd00b983a5205c731f3efa31033a6b8257cab

90162812

15 Nov, 2022 1 commit

Add WavLM bundles (#2833) · 26f62dc5

Grigory Sizov authored Nov 15, 2022

Summary:
Closes T136364380, follow-up to https://github.com/pytorch/audio/issues/2822

- Added "base", "base+", and "large" bundles for WavLM
- Expanded `wav2vec2_pipeline_test.py` to include the new bundles
- Added the new bundles to docs in `pipelines.rst`

Pull Request resolved: https://github.com/pytorch/audio/pull/2833

Reviewed By: nateanl

Differential Revision: D41194796

Pulled By: sgrigory

fbshipit-source-id: bf8e96c05b6a81ac5c5a014c46adeeac12685328

26f62dc5

14 Sep, 2022 1 commit

Move Hybrid Demucs pipeline to beta (#2673) · 60868748

Caroline Chen authored Sep 14, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

60868748

13 Sep, 2022 1 commit

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669) · 4d535e88

Zhaoheng Ni authored Sep 13, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb

4d535e88

03 Aug, 2022 2 commits

Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2

Sean Kim authored Aug 03, 2022

Summary:
Add new model pretrained weights and tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2601

Reviewed By: carolineechen, nateanl

Differential Revision: D38396673

Pulled By: skim0514

fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1

6ecc11c2

An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a

bshall authored Aug 03, 2022

Summary:
I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
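
The gating stage that these constants feed into can be sketched as follows. This is a minimal illustration, not the code from the PR: it assumes the signal has already been K-weighted and split into overlapping blocks, so each block is given as a list of per-channel mean-square values. The function names are hypothetical; the -0.691 offset, the -70 LKFS absolute gate, the -10 LU relative gate, and the channel weights are taken from the recommendation.

```python
import math

# Per-channel weights for L, R, C, Ls, Rs.
CHANNEL_WEIGHTS = [1.0, 1.0, 1.0, 1.41, 1.41]
ABSOLUTE_GATE = -70.0  # LKFS
RELATIVE_GATE_OFFSET = -10.0  # LU below the absolute-gated loudness


def block_loudness(weighted_power):
    """Loudness of one block from its channel-weighted mean square."""
    return -0.691 + 10.0 * math.log10(weighted_power)


def gated_loudness(block_powers):
    """Integrated loudness from per-block, per-channel mean squares."""
    weighted = [
        sum(g * z for g, z in zip(CHANNEL_WEIGHTS, block))
        for block in block_powers
    ]
    # Absolute gate: drop blocks quieter than -70 LKFS (and digital silence).
    kept = [p for p in weighted if p > 0 and block_loudness(p) > ABSOLUTE_GATE]
    if not kept:
        return -math.inf
    # Relative gate: drop blocks more than 10 LU below the remaining average.
    relative_gate = block_loudness(sum(kept) / len(kept)) + RELATIVE_GATE_OFFSET
    kept = [p for p in kept if block_loudness(p) > relative_gate]
    if not kept:
        return -math.inf
    return block_loudness(sum(kept) / len(kept))


# Mono example: three loud blocks, one silent block, one very quiet block.
blocks = [[0.5], [0.5], [0.0], [0.001], [0.5]]
loudness = gated_loudness(blocks)
```

In this example the silent block is removed by the absolute gate and the quiet block by the relative gate, so only the three loud blocks contribute to the result.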

I hope this is helpful! Looking forward to hearing from you.

Pull Request resolved: https://github.com/pytorch/audio/pull/2472

Reviewed By: hwangjeff

Differential Revision: D38389155

Pulled By: carolineechen

fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904

946b180a

26 Jul, 2022 1 commit

New Pipeline edits for HDemucs (#2565) · 4c4da32c

Sean Kim authored Jul 25, 2022

Summary:
Created a new branch and brought in commits due to rebasing issues, resolved conflicts on the new branch, and closed the old branch.

Pull Request resolved: https://github.com/pytorch/audio/pull/2565

Reviewed By: nateanl, mthrok

Differential Revision: D38131189

Pulled By: skim0514

fbshipit-source-id: 96531480cf50562944abb28d70879f21b4609f15

4c4da32c

25 Jul, 2022 1 commit

Integration test fix deleting temporary directory (#2569) · 8dcf06ac

Sean Kim authored Jul 25, 2022

Summary:
Previous issue: `--use-tmp-hub-dir` expected the temporary directories used to store large files to be deleted after each test case, but pytest only erases such directories after 3 full test sessions. This commit fixes that by manually deleting a new subdirectory created in each test case. https://github.com/pytorch/audio/pull/2565#discussion_r929007101
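
The per-test cleanup described above can be sketched with the standard library alone. This is an illustration of the idea, not the code from the commit; `per_test_hub_dir` is a hypothetical name, and in a real suite the same logic would more likely live in a pytest fixture.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def per_test_hub_dir(root):
    """Create a fresh subdirectory under root and delete it on exit."""
    subdir = tempfile.mkdtemp(dir=root)
    try:
        yield subdir
    finally:
        # Remove the large downloaded files right away instead of
        # waiting for the test runner's own cleanup.
        shutil.rmtree(subdir, ignore_errors=True)


root = tempfile.mkdtemp()
with per_test_hub_dir(root) as hub_dir:
    # A test would point its download cache at hub_dir here.
    with open(os.path.join(hub_dir, "model.pt"), "wb") as f:
        f.write(b"\0" * 1024)
    created = hub_dir
leftover = os.listdir(root)
shutil.rmtree(root)
```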

Pull Request resolved: https://github.com/pytorch/audio/pull/2569

Reviewed By: nateanl

Differential Revision: D38117848

Pulled By: skim0514

fbshipit-source-id: 3767cb8df1238fd6218f6aaa58d5d583cea72699

8dcf06ac

22 Jul, 2022 1 commit

Add documents for SourceSeparationBundle (#2559) · 6cee56ab

Zhaoheng Ni authored Jul 22, 2022

Summary:
- Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
- Add citation of Libri2Mix dataset in the bundle documentation.
- The URL in the integration test should be built with slashes instead of `os.path.join`, which fails on Windows. Change it to an f-string.
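
The Windows problem from the last item can be reproduced on any platform with `ntpath`, the module behind `os.path` on Windows. The URL and file name below are hypothetical placeholders.

```python
import ntpath

base_url = "https://example.com/test-assets"  # hypothetical base URL
filename = "sample.wav"  # hypothetical asset name

# ntpath.join applies Windows rules and inserts a backslash,
# which is not a valid URL separator.
windows_joined = ntpath.join(base_url, filename)

# An f-string always produces forward slashes.
url = f"{base_url}/{filename}"
```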

Pull Request resolved: https://github.com/pytorch/audio/pull/2559

Reviewed By: carolineechen

Differential Revision: D38036116

Pulled By: nateanl

fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836

6cee56ab

21 Jul, 2022 1 commit

Add SourceSeparationBundle to prototype (#2440) · 83362580

Zhaoheng Ni authored Jul 20, 2022

Summary:
- Add SourceSeparationBundle class for source separation pipeline
- Add `CONVTASNET_BASE_LIBRI2MIX` that is trained on Libri2Mix dataset.
- Add integration test with example mixture audio and expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with the permutation-invariant training (PIT) criterion for all permutations of sources and uses the highest value as the final output. The test verifies that the score is equal to or larger than the expected value.
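
The scoring step in the last item can be sketched in plain Python. This is an illustrative sketch rather than the test's actual code: signals are plain lists of floats, `si_sdr` and `best_permutation_si_sdr` are hypothetical names, and `eps` is an assumed stabilizer.

```python
import itertools
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference) + eps
    # Project the estimate onto the reference to get the target component.
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def best_permutation_si_sdr(estimates, references):
    """Try every source ordering and keep the highest mean Si-SDR."""
    best_score, best_perm = -math.inf, None
    for perm in itertools.permutations(range(len(estimates))):
        score = sum(
            si_sdr(estimates[p], ref) for p, ref in zip(perm, references)
        ) / len(references)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


references = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
# The estimates come out in swapped order, with a little noise.
estimates = [[0.1, 0.9, 0.0, -1.1], [2.0, 0.1, -2.1, 0.0]]
score, perm = best_permutation_si_sdr(estimates, references)
```

A test in this style would then assert that `score` is at least the expected threshold, independent of the order in which the model emits the sources.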

Pull Request resolved: https://github.com/pytorch/audio/pull/2440

Reviewed By: mthrok

Differential Revision: D37997646

Pulled By: nateanl

fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe

83362580

27 Jun, 2022 1 commit

Fix download links of RNNT pipelines in prototype (#2444) · 9b4ee17c

Zhaoheng Ni authored Jun 27, 2022

Summary:
In https://github.com/pytorch/audio/issues/2283, torchaudio's downloading function is updated to reduce code duplication. The links in `EMFORMER_RNNT_BASE_LIBRISPEECH` are updated, but the ones in prototype pipelines are not. This PR addresses it by updating the download links of `EMFORMER_RNNT_BASE_MUSTC` and `EMFORMER_RNNT_BASE_TEDLIUM3` in prototype. Corresponding integration tests are added as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/2444

Reviewed By: mthrok

Differential Revision: D37389178

Pulled By: nateanl

fbshipit-source-id: 46598dd71c95be47d1e1b54cef89ea51d280e17a

9b4ee17c

01 Jun, 2022 1 commit

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

15 May, 2022 1 commit

[codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the dynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting
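
The difference between case-sensitive and case-insensitive ordering can be seen with Python's built-in `sorted`. This only illustrates the ordering idea described above; it is not µsort's implementation, and the names are arbitrary.

```python
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Default ordering compares code points, so uppercase sorts before lowercase
# and the three spellings of "frog" end up scattered.
case_sensitive = sorted(names)

# Folding case before comparing keeps the spellings adjacent.
case_insensitive = sorted(names, key=str.casefold)
```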

Reviewed By: lisroach

Differential Revision: D36402214

fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c

d62875cc

26 Apr, 2022 1 commit

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7

97ed428d

21 Apr, 2022 1 commit

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
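
The shape of such a change can be sketched as follows: the custom class is replaced by a plain tuple alias, and small accessor functions stand in for attribute access. The two-field layout and all names below are hypothetical and chosen only for illustration; they are not the fields of torchaudio's `Hypothesis`.

```python
from typing import List, NamedTuple, Tuple


class HypothesisNT(NamedTuple):
    """Before: a custom class, which a list of hypotheses would contain."""
    tokens: List[int]
    score: float


# After: a plain tuple alias, so containers hold only built-in types.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


hypos: List[Hypothesis] = [([1, 5, 2], -0.7), ([1, 4], -1.3)]
best = max(hypos, key=get_score)
```

Accessor functions keep call sites readable while the underlying value stays a built-in tuple.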

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5

6b242c29

25 Mar, 2022 1 commit

Add Pretrained LM Support for Decoder (#2275) · 34c0d115

Caroline Chen authored Mar 24, 2022

Summary:
Add a function to download pretrained files for LibriSpeech 3-gram/4-gram KenLM, add tests, and update the tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/2275

Reviewed By: mthrok

Differential Revision: D35115418

Pulled By: carolineechen

fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0

34c0d115

22 Mar, 2022 1 commit

Add download utility specialized for torchaudio (#2283) · 64b98521

moto authored Mar 22, 2022

Summary:
In recent updates, torchaudio added features that download assets/models from
download.pytorch.org/torchaudio.

To reduce the code duplication, the implementations use utilities from
``torch.hub``, but still, there are patterns repeated in implementing
the fetch mechanism, notably cache and local file path handling.

This commit introduces the utility function that handles
download/cache/local path management that can be used for
fetching pre-trained model data.
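
A utility of this kind might look roughly like the sketch below. It is not torchaudio's implementation: `fetch_asset`, its parameters, and the example URL are hypothetical, and the downloader is injectable so that the example runs without network access.

```python
import os
import tempfile
import urllib.request


def fetch_asset(key, base_url, cache_dir, download=urllib.request.urlretrieve):
    """Return a local path for key, downloading only on a cache miss."""
    local_path = os.path.join(cache_dir, key.replace("/", os.sep))
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # URLs always use forward slashes, regardless of the local OS.
        download(f"{base_url}/{key}", local_path)
    return local_path


# Offline demonstration with a fake downloader that records its calls.
calls = []


def fake_download(url, path):
    calls.append(url)
    with open(path, "wb") as f:
        f.write(b"weights")


cache = tempfile.mkdtemp()
first = fetch_asset("models/demo.pt", "https://example.com/assets", cache, download=fake_download)
second = fetch_asset("models/demo.pt", "https://example.com/assets", cache, download=fake_download)
```

The second call finds the file in the cache and returns the same path without invoking the downloader again.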

Pull Request resolved: https://github.com/pytorch/audio/pull/2283

Reviewed By: carolineechen

Differential Revision: D35050469

Pulled By: mthrok

fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b

64b98521

01 Feb, 2022 1 commit

Move ASR features out of prototype (#2187) · aca5591c

hwangjeff authored Feb 01, 2022

Summary:
Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2187

Reviewed By: nateanl, mthrok

Differential Revision: D33918092

Pulled By: hwangjeff

fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c

aca5591c

26 Jan, 2022 1 commit

Add integration test for Emformer RNN-T LibriSpeech pipeline (#2172) · 0d6d0669

hwangjeff authored Jan 26, 2022

Summary:
Adds integration test for pretrained ASR pipeline `EMFORMER_RNNT_BASE_LIBRISPEECH`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2172

Reviewed By: carolineechen, nateanl

Differential Revision: D33793324

Pulled By: hwangjeff

fbshipit-source-id: d0613e2ab98fe5afa7b16ca39b67f0a0304d13fc

0d6d0669

30 Dec, 2021 1 commit

Enforce lint checks and fix/mute lint errors (#2116) · 8ed14782

Joao Gomes authored Dec 30, 2021

Summary:
cc mthrok

Pull Request resolved: https://github.com/pytorch/audio/pull/2116

Reviewed By: mthrok

Differential Revision: D33368453

Pulled By: jdsgomes

fbshipit-source-id: 09cf3fe5ed6f771c2f16505633c0e59b0c27453c

8ed14782

23 Dec, 2021 1 commit

Apply arc lint to pytorch audio (#2096) · 5859923a

Joao Gomes authored Dec 23, 2021

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok

Differential Revision: D33297351

fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8

5859923a

04 Nov, 2021 1 commit

Consolidate network utils (#1974) · 536e8ac0

moto authored Nov 04, 2021

This commit changes all the `torch.hub` network utility functions to
be imported from `torchaudio._internal`, so that later we can replace
the function within fbcode.

536e8ac0

03 Nov, 2021 1 commit

Add wav2vec2 ASR English pretrained model from voxpopuli (#1956) · f2eec77b

moto authored Nov 03, 2021

f2eec77b

02 Nov, 2021 3 commits

Run integration tests on CI (#1939) · 5594eae6

moto authored Nov 02, 2021

5594eae6

Add wav2vec2 ASR Italian pretrained model from voxpopuli (#1954) · 5c8541b7

moto authored Nov 02, 2021

5c8541b7

Add wav2vec2 ASR German pretrained model from voxpopuli (#1953) · e15431b7

moto authored Nov 01, 2021

* Add wav2vec2 ASR German pretrained model from voxpopuli

e15431b7

27 Oct, 2021 1 commit

Add wav2vec2 ASR Spanish pretrained model from voxpopuli (#1924) · 3a599315

moto authored Oct 26, 2021

3a599315

25 Oct, 2021 1 commit

Add pretrained French ASR from voxpopuli (#1919) · cbf267c3

moto authored Oct 25, 2021

cbf267c3

22 Oct, 2021 1 commit

Refactor integration test (#1922) · 19d8f1c2

moto authored Oct 22, 2021

- Make the test support other languages
- Fetch test asset on-the-fly

19d8f1c2

21 Oct, 2021 1 commit

[BC-breaking] Remove unused dimension from pretrained Wav2Vec2 ASR (#1914) · ec4837dc

moto authored Oct 21, 2021

* [BC-breaking] Remove unused dimension from pretrained Wav2Vec2 ASR

The Wav2Vec2 ASR pretrained weights originating from fairseq have
an extra dimension that has nothing to do with the ASR task.

https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/data/dictionary.py#L18-L37

which is masked during the loss computation as

https://github.com/pytorch/fairseq/blob/c5ff181125c7e6126b49a85e5ebdd5f5b6a07914/fairseq/criterions/ctc.py#L126-L128

This change removes it.

* Use '-' for blank token representation.

ec4837dc

15 Oct, 2021 2 commits

Add TTS bundle/pipelines (#1872) · e885204e

moto authored Oct 15, 2021

Future work items:
- length computation of GriffinLim
- better way to make InverseMelScale work in inference_mode

e885204e

Move wav2vec2 pretrained models to pipelines module (#1876) · fad855cd

moto authored Oct 15, 2021

- Move wav2vec2 pretrained weights to `torchaudio.pipelines` namespace to align with #1872.
- Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-training model) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
- Update base URL

fad855cd

08 Oct, 2021 1 commit

Add customization support to wav2vec2 labels (#1834) · fd7fcf93

moto authored Oct 07, 2021

fd7fcf93

06 Oct, 2021 2 commits

Add pretrained weights from wav2vec2.0 and XLSR papers (#1827) · e40c9c3c

moto authored Oct 06, 2021

Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53

e40c9c3c

Add the rest of HuBERT pretrained models (#1824) · c9e4c75d

moto authored Oct 05, 2021

This commit adds
- HUBERT_LARGE
- HUBERT_XLARGE
- HUBERT_ASR_XLARGE

c9e4c75d