Commits · c5b965589e187b75d7fc51fa544c27111b201244 · OpenDAS / Torchaudio

20 Mar, 2023 1 commit

Support CUDA frame in FilterGraph (#3183) · c5b96558

moto authored Mar 20, 2023

Summary:
This commit adds CUDA frame support to FilterGraph.

It initializes a CUDA frames context and attaches it to FilterGraph,
so that CUDA frames can be processed in FilterGraph.

As a result, it enables
1. CUDA filters such as `scale_cuda`.
2. Correct retrieval of the pixel format coming out of FilterGraph when
   CUDA HW acceleration is enabled (before this commit, it was reported as "cuda").

Resolves https://github.com/pytorch/audio/issues/3159

Pull Request resolved: https://github.com/pytorch/audio/pull/3183

Reviewed By: hwangjeff

Differential Revision: D44183722

Pulled By: mthrok

fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f

17 Mar, 2023 1 commit

Add EncodingConfig (#3179) · 9bb35070

moto authored Mar 16, 2023

Summary:
Adds a config object, `EncodingConfig`, and modifies `StreamWriter` to allow passing in additional encoder configuration parameters, e.g. bit rate and compression level.

Pull Request resolved: https://github.com/pytorch/audio/pull/3179

Pull Request resolved: https://github.com/pytorch/audio/pull/3164

Reviewed By: mthrok

Differential Revision: D43861413

Pulled By: hwangjeff

fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161

16 Mar, 2023 1 commit

Refactor Tensor conversion in StreamReader (#3170) · 014d7140

moto authored Mar 15, 2023

Summary:
Currently, when the Buffer converts AVFrame* to torch::Tensor,
it checks the format every time a frame is passed and then
performs the conversion.

This commit changes it so that the conversion operation is
pre-instantiated when the output stream is configured.

It introduces Converter implementations for various formats
and uses templates to embed them in the Buffer class.
This way, branching such as if/switch statements is eliminated
from the decoding path.
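
The pattern of selecting the conversion once, instead of branching on every frame, can be sketched in Python. This is an illustration only: the actual change is in the C++ `Buffer` class and uses templates, and the names `Buffer`, `convert_gray`, `convert_rgb24`, and the list-based frames below are hypothetical.

```python
from typing import Callable, Dict, List

Frame = List[int]  # hypothetical: a decoded frame as a flat list of byte values


def convert_gray(frame: Frame) -> List[int]:
    # Single plane: copy as-is.
    return list(frame)


def convert_rgb24(frame: Frame) -> List[int]:
    # Interleaved RGBRGB... -> planar RR..GG..BB.. (channel-first layout).
    return frame[0::3] + frame[1::3] + frame[2::3]


_CONVERTERS: Dict[str, Callable[[Frame], List[int]]] = {
    "gray": convert_gray,
    "rgb24": convert_rgb24,
}


class Buffer:
    """Picks its converter once, when the output stream is configured."""

    def __init__(self, pix_fmt: str) -> None:
        if pix_fmt not in _CONVERTERS:
            raise ValueError(f"Unsupported format: {pix_fmt}")
        self._convert = _CONVERTERS[pix_fmt]
        self.chunks: List[List[int]] = []

    def push_frame(self, frame: Frame) -> None:
        # Decoding path: no if/switch on the format, just the stored converter.
        self.chunks.append(self._convert(frame))
```

A dictionary lookup at construction time stands in for template instantiation here; either way, the per-frame path contains no format checks.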

Pull Request resolved: https://github.com/pytorch/audio/pull/3170

Reviewed By: xiaohui-zhang

Differential Revision: D44048293

Pulled By: mthrok

fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f

15 Mar, 2023 1 commit

Fix MFCC autograd test (#3169) · ee0b97f2

Zhaoheng Ni authored Mar 14, 2023

Summary:
The autograd test for the MFCC transform fails intermittently. Fix it by increasing `nondet_tol` to `1e-10`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3169

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D44069673

Pulled By: nateanl

fbshipit-source-id: addafefe381104e778b09bfbaafb322df1d9054c

08 Mar, 2023 2 commits

Include format information after filter (#3155) · 146195d8

moto authored Mar 08, 2023

Summary:
This commit adds fields to OutputStream that show the result
of filters, such as width and height after filtering.

Before

```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```

After

```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```

Pull Request resolved: https://github.com/pytorch/audio/pull/3155

Reviewed By: nateanl

Differential Revision: D43882399

Pulled By: mthrok

fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d

Support overwriting PTS in StreamWriter (#3135) · 8d2f6f8d

moto authored Mar 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3135

Reviewed By: xiaohui-zhang

Differential Revision: D43724273

Pulled By: mthrok

fbshipit-source-id: 9b52823618948945a26e57d5b3deccbf5f9268c1

07 Mar, 2023 3 commits

Use deterministic algorithms for filtfilt autograd tests (#3150) · 1923be04

Zhaoheng Ni authored Mar 07, 2023

Summary:
The `filtfilt` function uses `lfilter`, which calls the `conv_1d` operation internally. `conv_1d` can be nondeterministic, so its autograd tests are expected to fail intermittently unless deterministic algorithms are enforced (see https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html). This PR enables deterministic algorithms in the autograd tests so that the `filtfilt`-related tests pass.

Pull Request resolved: https://github.com/pytorch/audio/pull/3150

Reviewed By: mthrok

Differential Revision: D43872977

Pulled By: nateanl

fbshipit-source-id: c3d6ec281f34db8a7092526ccb245797bf2338da

Fix LFCC autograd test (#3154) · 67a49f3c

Zhaoheng Ni authored Mar 07, 2023

Summary:
The autograd test failed intermittently on a GPU Linux machine. Increase `nondet_tol` to make it pass.

Pull Request resolved: https://github.com/pytorch/audio/pull/3154

Reviewed By: mthrok

Differential Revision: D43873028

Pulled By: nateanl

fbshipit-source-id: a6668c47967a085e5eafb00e2dd4e61b2b46412e

Raise an error if StreamWriter is not opened (#3152) · 502d5811

Moto Hira authored Mar 07, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3152

In StreamWriter, if the destination is not opened when attempting to write data, a segmentation fault occurs.
This commit adds a guard so that, instead of segfaulting, it raises an error.
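
A minimal sketch of such a guard, using a hypothetical `Writer` class whose method names mirror StreamWriter's `open` and `write_audio_chunk`. The real check lives in native code, and its error type and message may differ from the ones assumed here.

```python
class Writer:
    """Hypothetical stand-in for a writer whose destination must be opened first."""

    def __init__(self) -> None:
        self._is_open = False
        self.written = []

    def open(self) -> None:
        self._is_open = True

    def close(self) -> None:
        self._is_open = False

    def _check_open(self) -> None:
        # Guard: fail with a Python exception instead of touching an
        # uninitialized output context (which crashes in native code).
        if not self._is_open:
            raise RuntimeError("Output is not opened. Call open() before writing.")

    def write_audio_chunk(self, i: int, chunk) -> None:
        self._check_open()
        self.written.append((i, chunk))
```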

Reviewed By: nateanl

Differential Revision: D43852649

fbshipit-source-id: aef5db7c1508f8a7db5834c2ab6de3cad09f9d60

02 Mar, 2023 1 commit

Fix PTS regression (#3131) · fbf05f28

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3131

In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable
was removed.

PTS can still be incremented the same way, but in #3122 the timing of the increment was wrong.
This commit fixes it.
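
A timing error of this kind can be illustrated with a hypothetical helper, assuming each audio frame's PTS counts samples: a frame is stamped with the current PTS, and only then does the counter advance by that frame's sample count. This is a sketch of the general pattern, not the actual fix.

```python
from typing import List


def stamp_frames(frame_sizes: List[int], start_pts: int = 0) -> List[int]:
    """Return the PTS (in samples) assigned to each frame, in order."""
    pts = start_pts
    stamped = []
    for num_samples in frame_sizes:
        stamped.append(pts)   # stamp the frame with the current PTS first...
        pts += num_samples    # ...then advance past this frame's samples
    return stamped
```

For frame sizes `[1024, 1024, 512]` this yields `[0, 1024, 2048]`. Advancing the counter before stamping would instead yield `[1024, 2048, 2560]`, shifting every frame late by one frame's duration.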

Reviewed By: xiaohui-zhang

Differential Revision: D43712046

fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9

01 Mar, 2023 1 commit

Fix windows tests (#3119) · 6a4a8200

Zhaoheng Ni authored Mar 01, 2023

Summary:
`sox` is not available on Windows machines. Add skip decorators to the sox-related tests so that they are skipped on Windows.
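
The actual tests use torchaudio's own test utilities; with plain `unittest`, the same skip pattern looks like the sketch below (the test class is hypothetical).

```python
import sys
import unittest


@unittest.skipIf(sys.platform == "win32", "sox is not available on Windows")
class SoxRelatedTest(unittest.TestCase):  # hypothetical test case
    def test_sox_effect(self):
        # A real test would call into the sox-backed code here.
        self.assertTrue(True)
```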

Pull Request resolved: https://github.com/pytorch/audio/pull/3119

Reviewed By: mthrok

Differential Revision: D43682754

Pulled By: nateanl

fbshipit-source-id: f69987dac8232a3569be83f096b32389bd8bda81

27 Feb, 2023 1 commit

Add SquimObjectiveBundle to prototype (#3103) · 46fae2fe

Zhaoheng Ni authored Feb 27, 2023

Summary:
Add pre-trained pipeline support for the `SquimObjective` model. The pre-trained model is trained on the DNS 2020 challenge dataset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3103

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D43611794

Pulled By: nateanl

fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d

25 Feb, 2023 1 commit

Fix unit tests for griffinlim and Spectrogram (#3099) · 75fc9a46

Zhaoheng Ni authored Feb 25, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3099

Reviewed By: mthrok

Differential Revision: D43596866

Pulled By: nateanl

fbshipit-source-id: 43a139bf8ebdf3261414e2855aefc3b53df298ac

23 Feb, 2023 1 commit

Remove Tensor binding from StreamReader (#3093) · d3c9295c

mthrok authored Feb 23, 2023

Summary:
Remove the Tensor input support from StreamReader

Follow up of https://github.com/pytorch/audio/pull/3086

Pull Request resolved: https://github.com/pytorch/audio/pull/3093

Reviewed By: xiaohui-zhang

Differential Revision: D43526066

Pulled By: mthrok

fbshipit-source-id: 57ba4866c413649173e1c2c3b23ba7de3231b7bc

22 Feb, 2023 1 commit

Add objective metric estimation model for speech enhancement (#3042) · 3267c7ed

Zhaoheng Ni authored Feb 21, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3042

Reviewed By: mthrok

Differential Revision: D43405932

Pulled By: nateanl

fbshipit-source-id: 88f6dabae35565b699230e9909b8f68f4a57f5c7

17 Feb, 2023 1 commit

Make lengths optional for speed functions and modules (#3072) · 5af309d3

hwangjeff authored Feb 16, 2023

Summary:
Makes lengths input optional for `torchaudio.functional.speed`, `torchaudio.transforms.Speed`, and `torchaudio.transforms.SpeedPerturbation`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3072

Reviewed By: nateanl, mthrok

Differential Revision: D43371406

Pulled By: hwangjeff

fbshipit-source-id: ecb38bcc2bfff5c5a396a37eff238b22238e795a

16 Feb, 2023 1 commit

Introduce I/O backend dispatcher (#3015) · b799fcd6

hwangjeff authored Feb 16, 2023

Summary:
Adds an I/O backend dispatcher that routes I/O requests to the FFmpeg, SoX, or Soundfile backend, depending on library availability. It allows users to specify the backend, i.e. one of `["ffmpeg", "sox", "soundfile"]`, via a keyword argument, with FFmpeg as the default. The environment variable `TORCHAUDIO_USE_BACKEND_DISPATCHER` gates the dispatcher: only if `TORCHAUDIO_USE_BACKEND_DISPATCHER` is explicitly set to `1` does importing TorchAudio make it accessible via `torchaudio.info`, `torchaudio.load`, and `torchaudio.save`.
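
The selection logic can be sketched as follows. The helper names and the fallback order after FFmpeg (SoX, then Soundfile) are assumptions for illustration; only the backend names, the FFmpeg default, and the environment variable come from this commit.

```python
import os
from typing import Optional, Sequence

# Assumed priority: FFmpeg first (the default), then SoX, then Soundfile.
_BACKENDS = ("ffmpeg", "sox", "soundfile")


def dispatcher_enabled() -> bool:
    # The dispatcher is used only when the variable is explicitly set to "1".
    return os.environ.get("TORCHAUDIO_USE_BACKEND_DISPATCHER") == "1"


def select_backend(available: Sequence[str], backend: Optional[str] = None) -> str:
    """Return the backend to use, given which media libraries are available."""
    if backend is not None:
        if backend not in _BACKENDS:
            raise ValueError(f"Unknown backend {backend!r}; expected one of {_BACKENDS}")
        if backend not in available:
            raise RuntimeError(f"Backend {backend!r} is not available")
        return backend
    for name in _BACKENDS:
        if name in available:
            return name
    raise RuntimeError("No I/O backend is available")
```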

Pull Request resolved: https://github.com/pytorch/audio/pull/3015

Reviewed By: mthrok

Differential Revision: D43258649

Pulled By: hwangjeff

fbshipit-source-id: 8f12e4e56b9fa3f0814dd3fed3e1783ab23a53a1

15 Feb, 2023 2 commits

Implement exp sigmoid (#3056) · 9db4bdf1

Cole Li authored Feb 15, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3056

Task #2 from https://github.com/pytorch/audio/issues/2835
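
The commit does not spell out the formula. Assuming it follows the exponentiated sigmoid from DDSP, the function computes `max_value * sigmoid(x) ** log(exponent) + threshold`, which keeps outputs strictly between `threshold` and `max_value + threshold`. The scalar sketch below uses DDSP's defaults (`exponent=10.0`, `max_value=2.0`, `threshold=1e-7`); these values are assumptions, and torchaudio's version operates on tensors.

```python
import math


def exp_sigmoid(x: float, exponent: float = 10.0, max_value: float = 2.0,
                threshold: float = 1e-7) -> float:
    # Numerically stable logistic sigmoid (avoids overflow in math.exp).
    if x >= 0:
        sig = 1.0 / (1.0 + math.exp(-x))
    else:
        z = math.exp(x)
        sig = z / (1.0 + z)
    # log(exponent) reshapes the curve, max_value scales it,
    # and threshold keeps the output strictly positive.
    return max_value * sig ** math.log(exponent) + threshold
```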

Reviewed By: mthrok

Differential Revision: D42854156

fbshipit-source-id: e1b3bd992c91fedc55f30a814e16efd7c51e0c80

Enable broadcasting for inputs to convolve (#3061) · a49edea5

hwangjeff authored Feb 15, 2023

Summary:
Relaxes input dimension matching constraint on `convolve` to enable broadcasting for inputs.
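
Assuming `convolve` treats the last dimension as the signal axis and broadcasts the leading dimensions under the usual PyTorch/NumPy rules, the output shape of a full convolution of `(..., N)` and `(..., M)` inputs can be computed as below. The helper name is hypothetical.

```python
from itertools import zip_longest
from typing import List, Tuple


def convolve_output_shape(x_shape: Tuple[int, ...], y_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Shape of a full convolution of (..., N) and (..., M) inputs."""
    *x_lead, n = x_shape
    *y_lead, m = y_shape
    lead: List[int] = []
    # Align leading dimensions from the right; a missing dimension acts as size 1.
    for a, b in zip_longest(reversed(x_lead), reversed(y_lead), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"leading dimensions of {x_shape} and {y_shape} are not broadcastable")
        lead.append(b if a == 1 else a)
    # Full convolution yields N + M - 1 samples along the last dimension.
    return tuple(reversed(lead)) + (n + m - 1,)
```

For example, inputs of shape `(2, 1, 100)` and `(3, 10)` broadcast to a `(2, 3, 109)` output.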

Pull Request resolved: https://github.com/pytorch/audio/pull/3061

Reviewed By: mthrok

Differential Revision: D43298078

Pulled By: hwangjeff

fbshipit-source-id: a6cc36674754523b88390fac0a05f06562921319

14 Feb, 2023 1 commit

Add simulate_rir_ism method for room impulse response simulation (#2880) · 8c5c9a9b

Zhaoheng Ni authored Feb 14, 2023

Summary:
Replicate of https://github.com/pytorch/audio/issues/2644

Pull Request resolved: https://github.com/pytorch/audio/pull/2880

Reviewed By: mthrok

Differential Revision: D41633911

Pulled By: nateanl

fbshipit-source-id: 73cf145d75c389e996aafe96571ab86dc21f86e5

07 Feb, 2023 1 commit

Add playback function (#3026) · 2ead941e

juan.azcarreta.ortiz authored Feb 07, 2023

Summary:
Allows users to play audio through the device speaker.

Pull Request resolved: https://github.com/pytorch/audio/pull/3026

Test Plan:
Created a new test that mocks a call to the write audio chunk method from StreamWriter. To run the test:

`pytest test/torchaudio_unittest/io/_playback_test.py`

Reviewed By: mthrok

Differential Revision: D43082062

Pulled By: jazcarretao

fbshipit-source-id: 01a85b32ce925687a633d1208d15d54556e89dd8

04 Feb, 2023 1 commit

Add rgb48le and CUDA p010 support (HDR/10bit) to StreamReader (#3023) · b7e173fa

Tristan Rice authored Feb 04, 2023

Summary:
This adds two pixel formats for 10-bit video, one for CPU (`rgb48le`) and one for CUDA (`p010`). This allows training on HDR/10-bit video datasets.
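
For context, in FFmpeg's pixel format descriptions `rgb48le` is packed RGB with a 16-bit little-endian value per component, and `p010le` is an NV12-like layout with 10 bits per component stored in the high bits of 16-bit little-endian words, with zeros in the low bits. The stdlib sketch below recovers 10-bit sample values from such words; the helper is hypothetical and is not how StreamReader converts frames.

```python
import struct
from typing import List


def unpack_p010_samples(buf: bytes) -> List[int]:
    """Decode little-endian 16-bit words holding 10-bit samples in their high bits."""
    count = len(buf) // 2
    words = struct.unpack(f"<{count}H", buf[: count * 2])
    # Drop the 6 zero padding bits to recover the 10-bit value (0..1023).
    return [w >> 6 for w in words]
```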

Pull Request resolved: https://github.com/pytorch/audio/pull/3023

Test Plan:
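```py
from torchaudio.io import StreamReader

# `reader`: the HEVC input (path or file-like object), defined elsewhere.
r = StreamReader(
    reader, format='hevc',
)
r.add_video_stream(
    frames_per_chunk=-1,
    decoder="hevc_cuvid",  # NVIDIA hardware decoder
    hw_accel="cuda",  # keep decoded frames on the GPU
)
frame = next(r.stream())
```

```py
from torchaudio.io import StreamReader

r = StreamReader(
    reader, format='hevc',
)
r.add_video_stream(
    frames_per_chunk=-1,
    filter_desc="format=rgb48le",  # 16 bits per channel RGB on the CPU
)
frame = next(r.stream())
```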

![audio-example](https://user-images.githubusercontent.com/909104/215696543-ed3dc5a3-3013-4a57-8b98-05aa4a5a9a7c.png)

Reviewed By: xiaohui-zhang

Differential Revision: D43019191

Pulled By: mthrok

fbshipit-source-id: fe4359e525b24c8b856dfdf3d2f8596871566350

03 Feb, 2023 1 commit

Add Linux GPU unit tests on GHA (#3029) · 6bdd3830

moto authored Feb 02, 2023

Summary:
Add GitHub Action-based GPU test jobs.
- There appears to be a 2-hour time limit, so only CUDA/GPU tests are run.
- Since Kaldi-related features are not available, they are disabled.

Pull Request resolved: https://github.com/pytorch/audio/pull/3029

Reviewed By: hwangjeff

Differential Revision: D42983800

Pulled By: mthrok

fbshipit-source-id: 47fefe39c635d1c73ad6799ddacefd2666fe5403

01 Feb, 2023 2 commits

Update prototype functional tests. (#3027) · 01ba0ac8

Moto Hira authored Feb 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3027

To support older versions of NumPy, remove the use of `numpy.typing`.

Reviewed By: nateanl

Differential Revision: D42924428

fbshipit-source-id: af1a370b5baf00c63a088f172dbc2190d414bdf1

Drop python 3.7 support (#3020) · 60af60a8

Wei Wang authored Jan 31, 2023

Summary:
PyTorch core has dropped Python 3.7 support (https://github.com/pytorch/pytorch/pull/93155).

Pull Request resolved: https://github.com/pytorch/audio/pull/3020

Reviewed By: mthrok

Differential Revision: D42902346

Pulled By: weiwangmeta

fbshipit-source-id: 07ab1aff0e128c5960d87e5fa29e341310dea388

27 Jan, 2023 1 commit

Move data augmentation transforms out of prototype (#3009) · b4cc0f33

hwangjeff authored Jan 26, 2023

Summary:
Moves `AddNoise`, `Convolve`, `FFTConvolve`, `Speed`, `SpeedPerturbation`, `Deemphasis`, and `Preemphasis` out of `torchaudio.prototype.transforms` and into `torchaudio.transforms`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3009

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D42730322

Pulled By: hwangjeff

fbshipit-source-id: 43739ac31437150d3127e51eddc0f0bba5facb15

26 Jan, 2023 1 commit

Remove function input parameters from data aug functional tests (#3011) · 2f5fcf4f

hwangjeff authored Jan 25, 2023

Summary:
Passing functions as test parameters causes issues on some platforms. This PR updates the functional tests to pass functions by name instead.
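
The same idea in plain `unittest`, with `math` functions standing in for the data augmentation functions and `subTest` standing in for the parameterization helper the actual tests use: the parameters are names, and the test resolves each function with `getattr`. The test class is hypothetical.

```python
import math
import unittest


class FunctionByNameTest(unittest.TestCase):  # hypothetical test case
    # Parameters are plain strings; the function object is looked up
    # inside the test body rather than passed in as a parameter.
    FUNCTION_NAMES = ["sqrt", "exp", "log1p"]

    def test_nonnegative_at_one(self):
        for name in self.FUNCTION_NAMES:
            with self.subTest(fn=name):
                fn = getattr(math, name)
                self.assertGreaterEqual(fn(1.0), 0.0)
```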

Pull Request resolved: https://github.com/pytorch/audio/pull/3011

Reviewed By: mthrok

Differential Revision: D42748106

Pulled By: hwangjeff

fbshipit-source-id: 4d81dabe4aff2293bc344a457a034a2d9af024e2

24 Jan, 2023 1 commit

Move data augmentation functions out of prototype (#3001) · 41b88314

hwangjeff authored Jan 23, 2023

Summary:
Moves `add_noise`, `fftconvolve`, `convolve`, `speed`, `preemphasis`, and `deemphasis` out of `torchaudio.prototype.functional` and into `torchaudio.functional`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3001

Reviewed By: mthrok

Differential Revision: D42688971

Pulled By: hwangjeff

fbshipit-source-id: 43280bd3ffeccddae57f1092ac45afb64dd426cc

22 Jan, 2023 1 commit

Make StreamReader return PTS (#2975) · 0dd59e0d

moto authored Jan 22, 2023

Summary:
This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.

Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is a Torch tensor with an extra `pts` attribute.
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```

For backward compatibility, we introduce `_ChunkTensor`, a composition
of a Tensor and metadata that behaves like a normal tensor in PyTorch operations.

The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).

Attaching the metadata directly to the Tensor object was also suggested,
but the possibility of collisions between torchaudio's metadata and new attributes
introduced in PyTorch cannot be ignored, so we use a Tensor subclass implementation.

If any unexpected issue arises from a metadata attribute name collision, client code can
fetch the bare Tensor and continue.

Pull Request resolved: https://github.com/pytorch/audio/pull/2975

Reviewed By: hwangjeff

Differential Revision: D42526945

Pulled By: mthrok

fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35

19 Jan, 2023 1 commit

Make lengths optional for additive noise operators (#2977) · bb077284

hwangjeff authored Jan 19, 2023

Summary:
For greater flexibility, this PR makes argument `lengths` optional for `add_noise` and `AddNoise`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2977

Reviewed By: nateanl

Differential Revision: D42484211

Pulled By: hwangjeff

fbshipit-source-id: 54757dcc73df194bb98c1d9d42a2f43f3027b190

16 Jan, 2023 1 commit

Refactor chunked buffer implementation (#2984) · 52b6bc3b

moto authored Jan 16, 2023

Summary:
Refactor the chunked buffer so that the number of Tensor frames stored in buffers is always a multiple of `frames_per_chunk`.

This makes it easy to store PTS values in an aligned manner.
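
A minimal sketch of the idea, with a hypothetical `ChunkedBuffer` holding integer stand-ins for frames: frames are grouped into chunks of exactly `frames_per_chunk` as they arrive, so each completed chunk can carry the PTS of its first frame. It is not torchaudio's implementation.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ChunkedBuffer:
    """Hypothetical buffer of whole chunks plus at most one partial chunk."""

    def __init__(self, frames_per_chunk: int) -> None:
        self.frames_per_chunk = frames_per_chunk
        # Each entry: (PTS of the chunk's first frame, frames in the chunk).
        self.chunks: Deque[Tuple[float, List[int]]] = deque()
        self._partial: List[int] = []
        self._partial_pts: Optional[float] = None

    def push_frame(self, frame: int, pts: float) -> None:
        if not self._partial:
            self._partial_pts = pts  # a new chunk starts with this frame
        self._partial.append(frame)
        if len(self._partial) == self.frames_per_chunk:
            self.chunks.append((self._partial_pts, self._partial))
            self._partial, self._partial_pts = [], None

    def pop_chunk(self) -> Optional[Tuple[float, List[int]]]:
        return self.chunks.popleft() if self.chunks else None
```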

Pull Request resolved: https://github.com/pytorch/audio/pull/2984

Reviewed By: nateanl

Differential Revision: D42526670

Pulled By: mthrok

fbshipit-source-id: d83ee914b7e50de3b51758069b0e0b6b3ebe2e54

14 Jan, 2023 1 commit

Fix CI tests on gpu machines (#2982) · 82ded7e7

Zhaoheng Ni authored Jan 14, 2023

Summary:
XLS-R tests are supposed to be skipped on GPU machines, but the [_skipIf](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/common_utils/case_utils.py#L143-L145) decorator forces them to run. This PR skips the XLS-R tests when running on CI with CUDA available.

Pull Request resolved: https://github.com/pytorch/audio/pull/2982

Reviewed By: xiaohui-zhang

Differential Revision: D42520292

Pulled By: nateanl

fbshipit-source-id: c6ee4d4a801245226c26d9cd13e039e8d910add2

13 Jan, 2023 1 commit

Add XLS-R models (#2959) · a5664ca9

Zhaoheng Ni authored Jan 12, 2023

Summary:
XLSR (cross-lingual speech representation) is a family of self-supervised models for learning cross-lingual speech representations. It was first proposed in https://arxiv.org/pdf/2006.13979.pdf, with a model trained on 53 languages (the so-called XLSR-53). This PR adds support for the XLS-R models from https://arxiv.org/pdf/2111.09296.pdf, which have more parameters (300M, 1B, 2B) and are trained on 128 languages.

Pull Request resolved: https://github.com/pytorch/audio/pull/2959

Reviewed By: mthrok

Differential Revision: D42397643

Pulled By: nateanl

fbshipit-source-id: 23e8e51a7cde0a226db4f4028db7df8f02b986ce

12 Jan, 2023 2 commits

Refactor extension modules initialization (#2968) · 5dfe0b22

mthrok authored Jan 12, 2023

Summary:
* Refactor _extension module so that
  * the implementation of initialization logic and its execution are separated.
    * logic goes to `_extension.utils`
    * the execution is at `_extension.__init__`
    * global variables are defined and modified in `__init__`.
* Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`
* Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`
* Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
* Merge the sox-related initialization logic into the `_extension.utils` module.

Pull Request resolved: https://github.com/pytorch/audio/pull/2968

Reviewed By: hwangjeff

Differential Revision: D42387251

Pulled By: mthrok

fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f

Add `buffer_chunk_size=-1` option (#2969) · 22788a8f

moto authored Jan 11, 2023

Summary:
This commit adds the `buffer_chunk_size=-1` option, with which buffered frames are never dropped.
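
Per the StreamReader documentation, `buffer_chunk_size` bounds the number of buffered chunks, and old frames are dropped once that bound is exceeded. The two behaviours can be sketched with `collections.deque`; the class name is hypothetical, and the sketch drops whole chunks for simplicity.

```python
from collections import deque
from typing import Deque, List


class BoundedChunkBuffer:  # hypothetical
    def __init__(self, buffer_chunk_size: int) -> None:
        if buffer_chunk_size == 0 or buffer_chunk_size < -1:
            raise ValueError("buffer_chunk_size must be positive or -1")
        # maxlen=None makes the deque unbounded (-1: never drop).
        # Otherwise, appending to a full deque discards the oldest chunk.
        maxlen = None if buffer_chunk_size == -1 else buffer_chunk_size
        self.chunks: Deque[List[int]] = deque(maxlen=maxlen)

    def push_chunk(self, chunk: List[int]) -> None:
        self.chunks.append(chunk)
```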

Pull Request resolved: https://github.com/pytorch/audio/pull/2969

Reviewed By: xiaohui-zhang

Differential Revision: D42403467

Pulled By: mthrok

fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992

10 Jan, 2023 1 commit

Update the handling of videos without PTS values (#2970) · 1717edaa

moto authored Jan 10, 2023

Summary:
FilterGraph does not fall back to `best_effort_timestamp`, so applying filters (such as changing the frame rate) to videos without PTS values failed.

This commit changes the behavior by overwriting the PTS values with `best_effort_timestamp`.
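
A minimal sketch of the fallback, assuming the overwrite applies only when a frame has no PTS. `Frame` and `fill_missing_pts` are hypothetical stand-ins for AVFrame and the native code; `AV_NOPTS_VALUE` mirrors FFmpeg's "no timestamp" sentinel (INT64_MIN).

```python
from dataclasses import dataclass

AV_NOPTS_VALUE = -(2 ** 63)  # FFmpeg's "no timestamp" sentinel (INT64_MIN)


@dataclass
class Frame:  # hypothetical stand-in for AVFrame
    pts: int
    best_effort_timestamp: int


def fill_missing_pts(frame: Frame) -> Frame:
    # FilterGraph relies on pts and does not look at best_effort_timestamp,
    # so copy the decoder's estimate into pts when pts is missing.
    if frame.pts == AV_NOPTS_VALUE:
        frame.pts = frame.best_effort_timestamp
    return frame
```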

Pull Request resolved: https://github.com/pytorch/audio/pull/2970

Reviewed By: YosuaMichael

Differential Revision: D42425771

Pulled By: mthrok

fbshipit-source-id: 7b7a033ea2ad89bb49d6e1663d35d377dab2aae9

06 Jan, 2023 2 commits

Add utility functions to fetch available formats/devices/codecs/protocols. (#2958) · b6d147ad

moto authored Jan 06, 2023

Summary:
This commit adds utility functions that fetch the available/supported formats/devices/codecs.

These functions are mostly equivalent to commands such as `ffmpeg -decoders`. However, the `ffmpeg` CLI can report different results if there are multiple FFmpeg installations, and the CLI might not be available at all.

Pull Request resolved: https://github.com/pytorch/audio/pull/2958

Reviewed By: hwangjeff

Differential Revision: D42371640

Pulled By: mthrok

fbshipit-source-id: 96a96183815a126cb1adc97ab7754aef216fff6f

Reduce the sample rate of some tests (#2963) · d6dbe03f

Moto Hira authored Jan 05, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2963

The phaser batch consistency test takes longer than the rest.
Change the sample rate from 44100 to 8000.

Reviewed By: hwangjeff

Differential Revision: D42379064

fbshipit-source-id: 2005b833c696bb3c2bb1d21c38c39e6163d81d53

05 Jan, 2023 2 commits

Rename generator to vocoder in HiFiGAN model and factory functions (#2955) · 5e75c8e8

Zhaoheng Ni authored Jan 05, 2023

Summary:
The generator part of the HiFiGAN model is a vocoder that converts a mel spectrogram to a waveform. Naming it vocoder makes its role clearer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2955

Reviewed By: carolineechen

Differential Revision: D42348864

Pulled By: nateanl

fbshipit-source-id: c45a2f8d8d205ee381178ae5d37e9790a257e1aa

Add HiFiGAN bundle (#2921) · 54e5c859

Grigory Sizov authored Jan 05, 2023

Summary:
Closes [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314)
## Description
- Add bundle `HIFIGAN_GENERATOR_V3_LJSPEECH` to prototypes. The bundle contains pre-trained HiFiGAN generator weights from the [original HiFiGAN publication](https://github.com/jik876/hifi-gan#pretrained-model), slightly converted to fit our model
- Add tests
  - unit tests checking that vocoder and mel-transform implementations in the bundle give the same results as the original ones. Part of the original HiFiGAN code is ported to this repo to enable these tests
  - integration test checking that waveform reconstructed from mel spectrogram by the bundle is close enough to the original
- Add docs

Pull Request resolved: https://github.com/pytorch/audio/pull/2921

Reviewed By: nateanl, mthrok

Differential Revision: D42034761

Pulled By: sgrigory

fbshipit-source-id: 8b0dadeed510b3c9371d6aa2c46ec7d8378f6048
