Commits · 4463fbdfbbc29fbc78d5dcd4f61cd9d0a806432c · OpenDAS / Torchaudio

10 May, 2023 2 commits

Add AudioEffector tutorial (#3226) · 2ab49e5b

moto authored May 09, 2023

Summary:
https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3226

Reviewed By: nateanl

Differential Revision: D45402724

Pulled By: mthrok

fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262

2ab49e5b

Update `torchaudio` doc and tutorial (#3285) · 667c6a9e

moto authored May 09, 2023

Summary:
This commit is in preparation for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 is landed to accommodate the change.

Since it is necessary to mention the changes related to migration in the IO tutorial,
I also update the IO documentation to include migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133

667c6a9e

29 Apr, 2023 1 commit

Add tutorial for TorchAudio-SQUIM pipelines (#3279) · 9b93e7df

Zhaoheng Ni authored Apr 29, 2023

Summary:
The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS).
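
Of the metrics listed above, Si-SDR has a compact closed-form definition when a clean reference signal is available. The sketch below computes that reference-based definition in plain Python. It is not the SQUIM model, which estimates such scores with a pre-trained network; the function name `si_sdr` and the test signals are illustrative assumptions.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB (reference-based)."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy  # projection of the estimate onto the reference
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


# Illustrative signals: a sine reference and a lightly distorted estimate.
reference = [math.sin(0.1 * n) for n in range(400)]
estimate = [r + 0.05 * math.cos(0.37 * n) for n, r in enumerate(reference)]

score = si_sdr(reference, estimate)
# Rescaling the estimate should leave the score unchanged.
rescaled_score = si_sdr(reference, [2.0 * e for e in estimate])
```

The second call illustrates the "scale-invariant" part of the name: multiplying the estimate by a constant does not change the result.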

Pull Request resolved: https://github.com/pytorch/audio/pull/3279

Reviewed By: hwangjeff

Differential Revision: D45415404

Pulled By: nateanl

fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903

9b93e7df

28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA based ctc prefix beam search decoder.

Several benchmark results on a V100 are attached below:

| decoder type | model | dataset | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
|--------------|-------|---------|-------------------|-----------|------------|------------|--------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation over the batch and vocab axes and loops over the frames axis. Compared with CPU implementations, it therefore favors shorter sequences and larger vocabularies.
2. WER is the same as with CPU implementations. However, it cannot decode with an LM yet.
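
For readers unfamiliar with the algorithm, here is a minimal CPU sketch of CTC prefix beam search in plain Python. It is not the CUDA implementation from this PR: the function name, the use of raw probabilities instead of log-probabilities, and the toy input are all assumptions made for illustration. The outer loop over frames is sequential, as in the note above, while the inner loop over the vocabulary is the kind of work a GPU version can spread across threads.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size, blank=0):
    """Return the most probable label sequence as a tuple of token ids.

    probs: list of frames, each a list of probabilities over the vocabulary.
    """
    # prefix -> (p_blank, p_non_blank): total probability of alignments that
    # collapse to `prefix` and end in blank / in a non-blank token.
    beams = {(): (1.0, 0.0)}
    for frame in probs:  # sequential over the frames axis
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for token, p in enumerate(frame):  # parallelizable over vocab
                if token == blank:
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and token == prefix[-1]:
                    # A repeated token merges into the prefix unless a blank
                    # separated the two occurrences.
                    next_beams[prefix][1] += p_nb * p
                    next_beams[prefix + (token,)][1] += p_b * p
                else:
                    next_beams[prefix + (token,)][1] += (p_b + p_nb) * p
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_size]}
    return max(beams.items(), key=lambda kv: sum(kv[1]))[0]


# Toy input: 4 frames over a vocabulary of {0: blank, 1, 2}.
toy_probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
decoded = ctc_prefix_beam_search(toy_probs, beam_size=4)
```

A production decoder would work in the log domain to avoid underflow on long utterances.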

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155

0a1801ed

11 Apr, 2023 1 commit

Update windows build doc (#3257) · 623e33d9

moto authored Apr 11, 2023

Summary:
GCC should not be used when building FFmpeg for torchaudio, as torchaudio uses MSVC (cl.exe)

Pull Request resolved: https://github.com/pytorch/audio/pull/3257

Reviewed By: nateanl

Differential Revision: D44835169

Pulled By: mthrok

fbshipit-source-id: 038c70caae58cec47dd2d6d08b8244c193104eda

623e33d9

10 Apr, 2023 1 commit

Update description of Squim pipelines (#3254) · 5a5b0fc3

Zhaoheng Ni authored Apr 10, 2023

Summary:
- Add citations of [`TorchAudio-Squim`](https://arxiv.org/abs/2304.01448) publication.
- Update descriptions in the `SQUIM_OBJECTIVE` and `SQUIM_SUBJECTIVE` pipelines.

Pull Request resolved: https://github.com/pytorch/audio/pull/3254

Reviewed By: hwangjeff

Differential Revision: D44802015

Pulled By: nateanl

fbshipit-source-id: ca08298ec1eafefdd671ff2e010ef18f7372f9f8

5a5b0fc3

01 Apr, 2023 1 commit

Add AudioEffector (#3163) · a4036248

moto authored Mar 31, 2023

Summary:
This commit adds a new feature AudioEffector, which can be used to
apply various effects and codecs to waveforms in Tensor.

Under the hood it uses StreamWriter and StreamReader to apply
filters and encode/decode.

This is going to replace the deprecated `apply_codec` and
`apply_sox_effect_tensor` functions.

It can also perform online, chunk-by-chunk filtering.
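
The idea of online, chunk-by-chunk filtering can be illustrated without torchaudio. The sketch below is plain Python and does not call AudioEffector or FFmpeg: the `OnePoleLowpass` class and the `stream` helper are hypothetical names. It only shows that a filter which carries its state across chunks produces the same output as filtering the whole signal at once.

```python
class OnePoleLowpass:
    """Hypothetical stateful filter: y[n] = a * x[n] + (1 - a) * y[n - 1]."""

    def __init__(self, a):
        self.a = a
        self.prev = 0.0  # filter state, carried across chunks

    def process(self, chunk):
        out = []
        for x in chunk:
            self.prev = self.a * x + (1.0 - self.a) * self.prev
            out.append(self.prev)
        return out


def stream(samples, frames_per_chunk):
    """Yield consecutive chunks of at most `frames_per_chunk` samples."""
    for start in range(0, len(samples), frames_per_chunk):
        yield samples[start:start + frames_per_chunk]


samples = [float(n % 7) for n in range(50)]

# Offline: filter the whole signal in one call.
offline = OnePoleLowpass(0.2).process(samples)

# Online: feed the same filter one chunk at a time.
online_filter = OnePoleLowpass(0.2)
online = [y for chunk in stream(samples, 16) for y in online_filter.process(chunk)]
```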

Tutorial to follow.

closes https://github.com/pytorch/audio/issues/3161

Pull Request resolved: https://github.com/pytorch/audio/pull/3163

Reviewed By: hwangjeff

Differential Revision: D44576660

Pulled By: mthrok

fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb

a4036248

27 Mar, 2023 1 commit

Revise encoder config arg and docstrings (#3203) · b1de9f1a

hwangjeff authored Mar 27, 2023

Summary:
For `StreamWriter`,
* Renames arg `config` to `codec_config`.
* Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
* Adds docstrings for arg `codec_config`.
* Updates `chunk` to `frames` in `write_*_chunk` methods.

Pull Request resolved: https://github.com/pytorch/audio/pull/3203

Reviewed By: mthrok

Differential Revision: D44350153

Pulled By: hwangjeff

fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343

b1de9f1a

23 Mar, 2023 2 commits

Add SquimSubjective pre-trained pipeline (#3197) · 68fa1d3f

Zhaoheng Ni authored Mar 23, 2023

Summary:
The PR adds the pre-trained pipeline for the `SquimSubjective` model, which predicts MOS scores for speech enhancement tasks.

Pull Request resolved: https://github.com/pytorch/audio/pull/3197

Reviewed By: mthrok

Differential Revision: D44313244

Pulled By: nateanl

fbshipit-source-id: 905095ff77006e9f441faa826fc25d9d8681e8aa

68fa1d3f

Fix prototype.models documentation (#3202) · e584fc46

Zhaoheng Ni authored Mar 23, 2023

Summary:
In the nightly documentation, "Prototype Factory Functions of Beta Models" is listed as an individual section, which is not correct.
<img width="310" alt="image" src="https://user-images.githubusercontent.com/8653221/227262349-604b99e8-1b20-4b19-9711-81e7b6cfa62e.png">

After this PR, the section layout is fixed:
<img width="285" alt="image" src="https://user-images.githubusercontent.com/8653221/227262893-b938d81e-6c4b-432a-833c-95981bca5e65.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/3202

Reviewed By: mthrok

Differential Revision: D44338663

Pulled By: nateanl

fbshipit-source-id: 09f591b9e4af66ebf34fb423bd5c30d4630f0b88

e584fc46

21 Mar, 2023 2 commits

Add SquimSubjective Model (#3189) · a8a16238

Zhaoheng Ni authored Mar 21, 2023

Summary:
Add the model architecture and factory functions for `SquimSubjective`, which predicts subjective evaluation metric scores (e.g. MOS) for speech enhancement tasks.

Pull Request resolved: https://github.com/pytorch/audio/pull/3189

Reviewed By: mthrok

Differential Revision: D44267255

Pulled By: nateanl

fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db

a8a16238

Update prototype Conformer doc & docstring (#3191) · f8d8ffb5

moto authored Mar 21, 2023

Summary:
To suppress local warning of flake8 <120

Pull Request resolved: https://github.com/pytorch/audio/pull/3191

Reviewed By: nateanl

Differential Revision: D44263027

Pulled By: mthrok

fbshipit-source-id: b3e48dba21fc5c9813f07e624a93f38a68956c6e

f8d8ffb5

17 Mar, 2023 2 commits

Update compatibility matrix (#3182) · b2e07b58

moto authored Mar 17, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3182

Reviewed By: nateanl

Differential Revision: D44167810

Pulled By: mthrok

fbshipit-source-id: 6ecbae54224ef7ba32835e4006aa5f2dc16b9acb

b2e07b58

Add EncodingConfig (#3179) · 9bb35070

moto authored Mar 16, 2023

Summary:
Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level.

Pull Request resolved: https://github.com/pytorch/audio/pull/3179

Pull Request resolved: https://github.com/pytorch/audio/pull/3164

Reviewed By: mthrok

Differential Revision: D43861413

Pulled By: hwangjeff

fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161

9bb35070

15 Mar, 2023 1 commit

Enhance UX on TorchAudio pages to improve awareness of doc versioning (#3167) · 92f2ea89

Carl Parker authored Mar 15, 2023

Summary:
- Boldface the version-selection UX and increase size by three percent.
- Add text to breadcrumbs to indicate version and stability.
- New `breadcrumbs.html` in `_templates` overrides Sphinx version.

I create a new variable in `conf.py`, **version_stable**, which has the version number for the most-recent stable release. I define this variable in the **html_context** dictionary so that it is visible to the templates.

I use this approach because I was not able to find any other way of discerning the current stable release during the build. Note that the `versions.html` file--which identifies the current stable release--appears to be available only in the **gh-pages** branch and so it is not available at build time.

However, this means that someone will need to update `conf.py` whenever the current stable release changes.
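
A minimal sketch of the `conf.py` fragment described above might look as follows. The version string is a placeholder assumption, not a value taken from this commit; only the names `version_stable` and `html_context` come from the description.

```python
# conf.py (fragment)

# Most recent stable release. Placeholder value: this has to be updated
# by hand whenever the current stable release changes.
version_stable = "2.0.0"

# Entries in html_context are visible to the HTML templates,
# e.g. the overriding breadcrumbs.html in _templates.
html_context = {
    "version_stable": version_stable,
}
```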

Pull Request resolved: https://github.com/pytorch/audio/pull/3167

Reviewed By: mthrok

Differential Revision: D44112224

Pulled By: carljparker

fbshipit-source-id: e76f5cb6734a784d161342964459577aa9b64cac

92f2ea89

14 Mar, 2023 2 commits

Add documentation introducing I/O backend revision (#3147) · 6a8ed4a2

hwangjeff authored Mar 14, 2023

Summary:
Adds documentation that introduces forthcoming I/O backend revision and provides enablement directions for the current release.

Doc pages:
https://output.circle-artifacts.com/output/job/9c0e5a49-eaf4-404c-b910-ca1b18bb289b/artifacts/0/docs/torchaudio.html

Pull Request resolved: https://github.com/pytorch/audio/pull/3147

Reviewed By: mthrok

Differential Revision: D43824019

Pulled By: hwangjeff

fbshipit-source-id: ad21d60c7e8f69f64859c56a8ca75735ddc22e40

6a8ed4a2

Update compatibility matrix (#3168) · 10aec5bd

Zhaoheng Ni authored Mar 14, 2023

Summary:
Add `2.0.0` release to the compatibility matrix

Pull Request resolved: https://github.com/pytorch/audio/pull/3168

Reviewed By: mthrok

Differential Revision: D44059197

Pulled By: nateanl

fbshipit-source-id: a2830d059be90eddeab72b30e85cdfc393369bf8

10aec5bd

08 Mar, 2023 1 commit

Include format information after filter (#3155) · 146195d8

moto authored Mar 08, 2023

Summary:
This commit adds fields to OutputStream, which show the result
of filters, such as width and height after filtering.

Before

```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```

After

```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```

Pull Request resolved: https://github.com/pytorch/audio/pull/3155

Reviewed By: nateanl

Differential Revision: D43882399

Pulled By: mthrok

fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d

146195d8

02 Mar, 2023 1 commit

Fix doc build (#3125) · 1ed38095

moto authored Mar 01, 2023

Summary:
Fix build_doc job

https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8

- build.ffmpeg.html might not exist when IPython notebook is processed. Changing to main doc URL.
- Fix bash cell syntax in HW tutorial
- Fix C++ doc
- Fix duplicated target name in streamwriter tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/3125

Reviewed By: xiaohui-zhang

Differential Revision: D43724078

Pulled By: mthrok

fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c

1ed38095

27 Feb, 2023 1 commit

Add SquimObjectiveBundle to prototype (#3103) · 46fae2fe

Zhaoheng Ni authored Feb 27, 2023

Summary:
Add pre-trained pipeline support for `SquimObjective` model. The pre-trained model is trained on DNS 2020 challenge dataset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3103

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D43611794

Pulled By: nateanl

fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d

46fae2fe

24 Feb, 2023 2 commits

Bind StreamReader/Writer with PyBind11 (#3091) · b012b452

moto authored Feb 24, 2023

Summary:
This commit is a cleanup in preparation for future
development.

We plan to pass around more complicated objects among
StreamReader and StreamWriter, and TorchBind is not expressive enough
for defining intermediate objects, so we use PyBind11 for binding
StreamWriter.

Pull Request resolved: https://github.com/pytorch/audio/pull/3091

Reviewed By: xiaohui-zhang

Differential Revision: D43515714

Pulled By: mthrok

fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2

b012b452

Use autosummary for torchaudio.prototype.models documentation (#3084) · f6d1bc96

Zhaoheng Ni authored Feb 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3084

Reviewed By: mthrok

Differential Revision: D43550150

Pulled By: nateanl

fbshipit-source-id: 5c5e3d9461e375be202493e3399ff38ce5cd7690

f6d1bc96

22 Feb, 2023 1 commit

Add objective metric estimation model for speech enhancement (#3042) · 3267c7ed

Zhaoheng Ni authored Feb 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3042

Reviewed By: mthrok

Differential Revision: D43405932

Pulled By: nateanl

fbshipit-source-id: 88f6dabae35565b699230e9909b8f68f4a57f5c7

3267c7ed

15 Feb, 2023 1 commit

Tweak docs around IO (#3064) · 12e8cb97

moto authored Feb 15, 2023

Summary:
* Mention context manager in StreamWriter
* Add FFmpeg as optional dependency

Pull Request resolved: https://github.com/pytorch/audio/pull/3064

Reviewed By: hwangjeff

Differential Revision: D43307818

Pulled By: mthrok

fbshipit-source-id: 86339d973aba85e090f520e08af65b5d736e3d18

12e8cb97

14 Feb, 2023 2 commits

Redirect build instruction to official doc (#3053) · 73b29fc9

moto authored Feb 14, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3053

Reviewed By: nateanl

Differential Revision: D43238766

Pulled By: mthrok

fbshipit-source-id: 4f82878b1c97b0e6a35af75855849b86200e6061

73b29fc9

Add simulate_rir_ism method for room impulse response simulation (#2880) · 8c5c9a9b

Zhaoheng Ni authored Feb 14, 2023

Summary:
replicate of https://github.com/pytorch/audio/issues/2644

Pull Request resolved: https://github.com/pytorch/audio/pull/2880

Reviewed By: mthrok

Differential Revision: D41633911

Pulled By: nateanl

fbshipit-source-id: 73cf145d75c389e996aafe96571ab86dc21f86e5

8c5c9a9b

11 Feb, 2023 1 commit

Update hardware accelerated video processing tutorial (#3050) · 3f02b898

moto authored Feb 10, 2023

Summary:
Per https://github.com/pytorch/audio/issues/3040 and https://github.com/pytorch/audio/issues/3041, it turns out that Google Colab now has FFmpeg with GPU decoder/encoder preinstalled, and installing FFmpeg manually corrupts the environment.

This commit updates the tutorial by extracting and moving the how-to-install part to installation/build section.

closes https://github.com/pytorch/audio/issues/3041
closes https://github.com/pytorch/audio/issues/3040

Pull Request resolved: https://github.com/pytorch/audio/pull/3050

Reviewed By: nateanl

Differential Revision: D43166054

Pulled By: mthrok

fbshipit-source-id: 32667f292a796344d5fcde86e8231e15ad904e58

3f02b898

09 Feb, 2023 1 commit

Follow-up on audio playback function (#3051) · 91b05e2e

moto authored Feb 09, 2023

Summary:
- Add documentation
- Tweak docstring
- Fix import

Pull Request resolved: https://github.com/pytorch/audio/pull/3051

Reviewed By: weiwangmeta, atalman, nateanl

Differential Revision: D43166081

Pulled By: mthrok

fbshipit-source-id: 7d77aa34a6318a64824626cff8372f8b9aebf6f9

91b05e2e

07 Feb, 2023 1 commit

Add installation / build instruction to doc (#3038) · 3c121a59

moto authored Feb 07, 2023

Summary:
Add a section about installation/build

https://output.circle-artifacts.com/output/job/f121cd38-68f3-47a3-ac29-c7b0cfe94c77/artifacts/0/docs/installation.html
<img width="1102" alt="Screenshot 2023-02-06 at 6 13 50 PM" src="https://user-images.githubusercontent.com/855818/217108551-622b117b-209e-4776-b5d6-d6934c8126a4.png">

https://output.circle-artifacts.com/output/job/f121cd38-68f3-47a3-ac29-c7b0cfe94c77/artifacts/0/docs/build.html
<img width="1072" alt="Screenshot 2023-02-06 at 6 13 57 PM" src="https://user-images.githubusercontent.com/855818/217108568-c125cdc2-9d6a-4c1d-a155-2cee40c9dac6.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/3038

Reviewed By: hwangjeff, nateanl

Differential Revision: D43083469

Pulled By: mthrok

fbshipit-source-id: e0b5b76dbf706552dd60ae26ea40ebc98627e3b0

3c121a59

01 Feb, 2023 1 commit

Add C++ documentation (#2994) · f663cb28

moto authored Jan 31, 2023

Summary:
Adding C++ documentation. (C++ APIs are categorized as prototype, though they are used by Python beta APIs.)

https://output.circle-artifacts.com/output/job/69654229-a99e-4b15-9ce0-7bc6bcf01101/artifacts/0/docs/libtorchaudio.html

<img width="1202" alt="Screenshot 2023-01-31 at 11 48 47 AM" src="https://user-images.githubusercontent.com/855818/215828167-d23032f8-9e40-4413-b5b1-5cbd12d705e9.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2994

Reviewed By: hwangjeff

Differential Revision: D42876621

Pulled By: mthrok

fbshipit-source-id: d8b8d610b87ec766501baa88b7506368a9905a6a

f663cb28

27 Jan, 2023 1 commit

Move data augmentation transforms out of prototype (#3009) · b4cc0f33

hwangjeff authored Jan 26, 2023

Summary:
Moves `AddNoise`, `Convolve`, `FFTConvolve`, `Speed`, `SpeedPerturbation`, `Deemphasis`, and `Preemphasis` out of `torchaudio.prototype.transforms` and into `torchaudio.transforms`.
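
As an illustration of what a transform such as `AddNoise` is for, the sketch below mixes noise into a signal at a requested signal-to-noise ratio. It is plain Python written for this note, not torchaudio's implementation, and the function signature and test signals are assumptions.

```python
import math


def add_noise(waveform, noise, snr_db):
    """Mix `noise` into `waveform` so that the mixture has the given SNR in dB."""
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(n * n for n in noise)
    # Choose the scale so that
    # signal_energy / (scale**2 * noise_energy) == 10 ** (snr_db / 10).
    scale = math.sqrt(signal_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))
    return [x + scale * n for x, n in zip(waveform, noise)]


# Illustrative signals of equal length.
waveform = [math.sin(0.05 * n) for n in range(800)]
noise = [math.cos(1.3 * n) for n in range(800)]
noisy = add_noise(waveform, noise, snr_db=10.0)

# Measure the SNR actually obtained from the added component.
added = [y - x for y, x in zip(noisy, waveform)]
measured_snr_db = 10.0 * math.log10(
    sum(x * x for x in waveform) / sum(a * a for a in added)
)
```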

Pull Request resolved: https://github.com/pytorch/audio/pull/3009

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D42730322

Pulled By: hwangjeff

fbshipit-source-id: 43739ac31437150d3127e51eddc0f0bba5facb15

b4cc0f33

26 Jan, 2023 1 commit

Deprecate sox initialization/shutdown public API functions (#3010) · aa760caf

moto authored Jan 25, 2023

Summary:
These functions are called as part of sox initialization, so calling them explicitly is no longer needed.

Pull Request resolved: https://github.com/pytorch/audio/pull/3010

Reviewed By: hwangjeff

Differential Revision: D42744478

Pulled By: mthrok

fbshipit-source-id: 17d715b328392397ec47d81a533a307aac22862d

aa760caf

24 Jan, 2023 1 commit

Move data augmentation functions out of prototype (#3001) · 41b88314

hwangjeff authored Jan 23, 2023

Summary:
Moves `add_noise`, `fftconvolve`, `convolve`, `speed`, `preemphasis`, and `deemphasis` out of `torchaudio.prototype.functional` and into `torchaudio.functional`.
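
Two of these functions, `preemphasis` and `deemphasis`, form an inverse pair. The sketch below shows the underlying first-order filters in plain Python. It is not torchaudio's code: it works on lists rather than Tensors, and the default coefficient of 0.97 is a commonly used value assumed here for illustration.

```python
def preemphasis(waveform, coeff=0.97):
    """y[0] = x[0]; y[n] = x[n] - coeff * x[n - 1]."""
    delayed = [0.0] + waveform[:-1]
    return [x - coeff * prev for x, prev in zip(waveform, delayed)]


def deemphasis(waveform, coeff=0.97):
    """Inverse filter: y[n] = x[n] + coeff * y[n - 1]."""
    out, prev = [], 0.0
    for x in waveform:
        prev = x + coeff * prev
        out.append(prev)
    return out


signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
emphasized = preemphasis(signal)
restored = deemphasis(emphasized)  # should match `signal` up to rounding
```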

Pull Request resolved: https://github.com/pytorch/audio/pull/3001

Reviewed By: mthrok

Differential Revision: D42688971

Pulled By: hwangjeff

fbshipit-source-id: 43280bd3ffeccddae57f1092ac45afb64dd426cc

41b88314

23 Jan, 2023 1 commit

Update highlighting in doc (#3000) · 1f9b9104

moto authored Jan 23, 2023

Summary:
This change fixes the issue where syntax highlighting is broken up per word.

## Plain
Before
<img width="243" alt="Screenshot 2023-01-20 at 1 28 48 PM" src="https://user-images.githubusercontent.com/855818/213778202-27ec8030-3f2f-4ef9-8210-bce7cfc3cb38.png">
After
<img width="244" alt="Screenshot 2023-01-20 at 1 29 01 PM" src="https://user-images.githubusercontent.com/855818/213778231-61c52825-d63a-4913-b10d-a65f3b2cfbbb.png">

## In articles
Before
<img width="786" alt="Screenshot 2023-01-20 at 1 34 12 PM" src="https://user-images.githubusercontent.com/855818/213779050-c21ba5e2-84b3-4935-bbab-6edcb7bc89ce.png">
After
<img width="783" alt="Screenshot 2023-01-20 at 1 34 17 PM" src="https://user-images.githubusercontent.com/855818/213779069-f1406422-27a4-41cf-8ccd-5058f80860bd.png">

## In tables
Before
<img width="813" alt="Screenshot 2023-01-20 at 1 27 35 PM" src="https://user-images.githubusercontent.com/855818/213778039-fede6f18-5a35-47f2-9e0b-a9be5716dc73.png">
After
<img width="813" alt="Screenshot 2023-01-20 at 1 27 51 PM" src="https://user-images.githubusercontent.com/855818/213778073-e26275a9-d380-4601-aa92-84af7aeab00f.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/3000

Reviewed By: xiaohui-zhang

Differential Revision: D42642522

Pulled By: mthrok

fbshipit-source-id: 6831bb90da005aff8d7f178ef768e967bc6d2640

1f9b9104

22 Jan, 2023 1 commit

Make StreamReader return PTS (#2975) · 0dd59e0d

moto authored Jan 22, 2023

Summary:
This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.

Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is Torch tensor type but has extra attribute of PTS
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```

For backward compatibility, we introduce `_ChunkTensor`, a composition
of Tensor and metadata that works like a normal tensor in PyTorch operations.

The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).

It was also suggested to attach the metadata directly to the Tensor object,
but the possibility of a collision between torchaudio's metadata and new attributes introduced in
PyTorch cannot be ignored, so we use a Tensor subclass implementation.

If any unexpected issue arises from a metadata attribute name collision, client code can
fetch the bare Tensor and continue.
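
The composition idea can be illustrated without PyTorch. The sketch below is a plain-Python analogy, not the `_ChunkTensor` implementation: the class name `ChunkWithPTS` is hypothetical, and a list stands in for the Tensor.

```python
class ChunkWithPTS:
    """Hypothetical wrapper: a payload plus a `pts` metadata attribute."""

    def __init__(self, data, pts):
        self._data = data  # the bare payload (a list here, a Tensor in torchaudio)
        self.pts = pts     # presentation time stamp of the first frame

    def __getattr__(self, name):
        # Called only when normal lookup fails, so `pts` and `_data` are
        # served directly and everything else is forwarded to the payload.
        return getattr(self._data, name)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


chunk = ChunkWithPTS([0.1, 0.2, 0.3], pts=1.5)
bare = chunk._data  # client code can fall back to the bare payload
```

Because the metadata lives on the wrapper rather than on the payload, new attributes added to the payload type cannot silently overwrite it.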

Pull Request resolved: https://github.com/pytorch/audio/pull/2975

Reviewed By: hwangjeff

Differential Revision: D42526945

Pulled By: mthrok

fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35

0dd59e0d

15 Jan, 2023 1 commit

Add pre-trained pipelines for XLS-R models (#2978) · 9b7b64e4

Zhaoheng Ni authored Jan 15, 2023

Summary:
The PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models:
- WAV2VEC2_XLSR_300M
- WAV2VEC2_XLSR_1B
- WAV2VEC2_XLSR_2B

All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`.
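
Waveform normalization of this kind typically means rescaling the input to zero mean and unit variance before it reaches the model. The sketch below shows that operation in plain Python. It is an assumption about what the flag implies, not code from the pipeline, and the `eps` value is illustrative.

```python
import math


def normalize_waveform(waveform, eps=1e-5):
    """Rescale a waveform to zero mean and (approximately) unit variance."""
    mean = sum(waveform) / len(waveform)
    var = sum((x - mean) ** 2 for x in waveform) / len(waveform)
    return [(x - mean) / math.sqrt(var + eps) for x in waveform]


# Illustrative input with a DC offset.
signal = [math.sin(0.3 * n) + 0.5 for n in range(200)]
normalized = normalize_waveform(signal)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((x - norm_mean) ** 2 for x in normalized) / len(normalized)
```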

Pull Request resolved: https://github.com/pytorch/audio/pull/2978

Reviewed By: hwangjeff

Differential Revision: D42501491

Pulled By: nateanl

fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3

9b7b64e4

13 Jan, 2023 1 commit

Add XLS-R models (#2959) · a5664ca9

Zhaoheng Ni authored Jan 12, 2023

Summary:
XLSR (cross-lingual speech representation) is a family of self-supervised models for learning cross-lingual speech representations. It was first proposed in https://arxiv.org/pdf/2006.13979.pdf, with a model trained on 53 languages (the so-called XLSR-53). This PR adds support for the XLS-R models from https://arxiv.org/pdf/2111.09296.pdf, which have more parameters (300M, 1B, 2B) and are trained on 128 languages.

Pull Request resolved: https://github.com/pytorch/audio/pull/2959

Reviewed By: mthrok

Differential Revision: D42397643

Pulled By: nateanl

fbshipit-source-id: 23e8e51a7cde0a226db4f4028db7df8f02b986ce

a5664ca9

06 Jan, 2023 1 commit

Fix document for MelScale and InverseMelScale (#2967) · 4a037b03

Zhaoheng Ni authored Jan 06, 2023

Summary:
`InverseMelScale` is missing from the nightly documentation webpage, and `MelScale` fits better in the Feature Extractions section. This PR moves the documentation of both into the Feature Extractions section.

Pull Request resolved: https://github.com/pytorch/audio/pull/2967

Reviewed By: mthrok

Differential Revision: D42387886

Pulled By: nateanl

fbshipit-source-id: cdac020887817ea2530bfb26e8ed414ae4761420

4a037b03

05 Jan, 2023 2 commits

Rename generator to vocoder in HiFiGAN model and factory functions (#2955) · 5e75c8e8

Zhaoheng Ni authored Jan 05, 2023

Summary:
The generator part of HiFiGAN model is a vocoder which converts mel spectrogram to waveform. It makes more sense to name it as vocoder for better understanding.

Pull Request resolved: https://github.com/pytorch/audio/pull/2955

Reviewed By: carolineechen

Differential Revision: D42348864

Pulled By: nateanl

fbshipit-source-id: c45a2f8d8d205ee381178ae5d37e9790a257e1aa

5e75c8e8

Add HiFiGAN bundle (#2921) · 54e5c859

Grigory Sizov authored Jan 05, 2023

Summary:
Closes [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314)
## Description
- Add bundle `HIFIGAN_GENERATOR_V3_LJSPEECH` to prototypes. The bundle contains pre-trained HiFiGAN generator weights from the [original HiFiGAN publication](https://github.com/jik876/hifi-gan#pretrained-model), converted slightly to fit our model
- Add tests
  - unit tests checking that vocoder and mel-transform implementations in the bundle give the same results as the original ones. Part of the original HiFiGAN code is ported to this repo to enable these tests
  - integration test checking that waveform reconstructed from mel spectrogram by the bundle is close enough to the original
- Add docs

Pull Request resolved: https://github.com/pytorch/audio/pull/2921

Reviewed By: nateanl, mthrok

Differential Revision: D42034761

Pulled By: sgrigory

fbshipit-source-id: 8b0dadeed510b3c9371d6aa2c46ec7d8378f6048

54e5c859