Commits · 901628126abf6af4f61f6a08d0db7ed22c86fbb1 · OpenDAS / Torchaudio

09 Dec, 2022 2 commits

Fix integration test for WAV2VEC2_ASR_LARGE_LV60K_10M (#2910) · 90162812

Zhaoheng Ni authored Dec 09, 2022

Summary:
After https://github.com/pytorch/audio/issues/2873, the pre-trained Wav2Vec2 models with larger datasets achieve better performance. This PR fixes the integration test of the `WAV2VEC2_ASR_LARGE_LV60K_10M` bundle, which previously transcribed the word `CURIOUSITY` as `CURIOUSSITY` but now transcribes it correctly as `CURIOUSITY`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2910

Reviewed By: mthrok

Differential Revision: D41881919

Pulled By: nateanl

fbshipit-source-id: 236fd00b983a5205c731f3efa31033a6b8257cab

Toggle on/off ffmpeg test if needed (#2901) · ccda545c

atalman authored Dec 09, 2022

Summary:
Toggle the ffmpeg test on or off as needed.
It is ON by default, so it should not affect any current tests.
To toggle it ON, no change is required.
To toggle it OFF, use:
```
smoke_test.py --no-ffmpeg
```

This is currently meant to be used when calling from builder, since we do not install ffmpeg at the moment.
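A minimal sketch of how such an opt-out flag can be wired up with `argparse`. Only the `--no-ffmpeg` flag name comes from the summary; the `ffmpeg` destination and the `parse_smoke_test_args` helper are hypothetical and not necessarily what `smoke_test.py` does.

```python
import argparse


def parse_smoke_test_args(argv=None):
    """Hypothetical parser: ffmpeg checks run unless --no-ffmpeg is passed."""
    parser = argparse.ArgumentParser(description="smoke test (sketch)")
    # store_false: args.ffmpeg defaults to True (tests ON); the flag turns it OFF.
    parser.add_argument(
        "--no-ffmpeg",
        dest="ffmpeg",
        action="store_false",
        help="skip the ffmpeg-dependent checks",
    )
    return parser.parse_args(argv)
```

Using `store_false` keeps the default behavior unchanged for existing callers, which matches the "ON by default" requirement above.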

Pull Request resolved: https://github.com/pytorch/audio/pull/2901

Reviewed By: carolineechen, mthrok

Differential Revision: D41874976

Pulled By: atalman

fbshipit-source-id: c57b19f37c63a1f476f93a5211550e980e67d9c7

08 Dec, 2022 1 commit

Add HiFi GAN Generator to prototypes (#2860) · b5e4663a

Grigory Sizov authored Dec 08, 2022

Summary:
Part 1 of [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314)

This PR ports the generator part of [HiFi GAN](https://arxiv.org/abs/2010.05646v2) from [the original implementation](https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/models.py#L75)

Adds tests:
- Smoke tests for architectures V1, V2, V3
- Check that output shapes are correct
- Check that the model is torchscriptable and scripting doesn't change the output
- Check that our code's output matches the original implementation. Here I clone the original repo inside `/tmp` and import necessary objects from inside the test function.  On test teardown I restore `PATH`, but don't remove the cloned code, so that it can be reused on subsequent runs - let me know if removing it would be a better practice

There are no quantization tests because the model consists mainly of `Conv1d` and `ConvTranspose1d` layers, which are [not supported by dynamic quantization](https://pytorch.org/docs/stable/quantization.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2860

Reviewed By: nateanl

Differential Revision: D41433416

Pulled By: sgrigory

fbshipit-source-id: f135c560df20f5138f01e3efdd182621edabb4f5

07 Dec, 2022 2 commits

Add additive noise transform (#2889) · 29ecf7e8

hwangjeff authored Dec 07, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2889
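The commit message does not describe the transform itself. The usual definition of additive noise at a target signal-to-noise ratio rescales the noise so that 10·log10(P_signal / P_noise) equals the requested SNR before adding it. The NumPy sketch below illustrates that definition; the `add_noise_at_snr` name and its parameters are hypothetical and are not torchaudio's API.

```python
import numpy as np


def add_noise_at_snr(waveform: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `waveform` so that the result has the requested SNR in dB.

    Sketch only: assumes both signals have the same length along the last axis.
    """
    signal_power = np.mean(waveform ** 2, axis=-1, keepdims=True)
    noise_power = np.mean(noise ** 2, axis=-1, keepdims=True)
    # SNR_dB = 10 * log10(P_signal / (scale**2 * P_noise)); solve for scale.
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return waveform + scale * noise
```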

Reviewed By: xiaohui-zhang

Differential Revision: D41760084

Pulled By: hwangjeff

fbshipit-source-id: d2f5253e1fae7e7aafa9fa6043c6a7045c5b33a0

Introduce MUSAN dataset (#2888) · 45c7d05a

hwangjeff authored Dec 06, 2022

Summary:
Introduces the MUSAN dataset (https://www.openslr.org/17/), which contains music, speech, and noise recordings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2888

Reviewed By: xiaohui-zhang

Differential Revision: D41762164

Pulled By: hwangjeff

fbshipit-source-id: 14d5baaa4d40f065dd5d99bf7f2e0a73aa6c31a9

06 Dec, 2022 1 commit

Add frequency_impulse_response (#2879) · d234498c

moto authored Dec 06, 2022

Summary:
This commit adds the `frequency_impulse_response` function, which generates a filter from a desired frequency response.

[Example](https://output.circle-artifacts.com/output/job/5233fda9-dadb-4710-9389-7e8ac20a062f/artifacts/0/docs/tutorials/filter_design_tutorial.html#frequency-sampling)
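The general frequency-sampling idea can be sketched in NumPy: take the inverse real FFT of the sampled magnitudes, shift the result so that it is causal and linear-phase, then taper it with a window. The function name and the choice of a periodic Hann window are assumptions for illustration. This is not torchaudio's implementation, and its exact windowing may differ.

```python
import numpy as np


def frequency_sampling_fir(magnitudes: np.ndarray) -> np.ndarray:
    """Design an FIR filter from gains sampled evenly from 0 Hz to Nyquist.

    `magnitudes` has n_fft // 2 + 1 entries along the last axis (zero phase).
    """
    n = 2 * (magnitudes.shape[-1] - 1)
    # A real, zero-phase spectrum gives a real, even impulse response centred at 0.
    impulse = np.fft.irfft(magnitudes, n=n)
    # Rotate so the peak sits at index n // 2, giving a causal linear-phase filter.
    impulse = np.fft.fftshift(impulse, axes=-1)
    # Periodic Hann window, symmetric about n // 2, to soften truncation ripple.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    return impulse * window
```

A flat response of ones, for example, yields a single unit impulse at the centre tap.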

Pull Request resolved: https://github.com/pytorch/audio/pull/2879

Reviewed By: hwangjeff

Differential Revision: D41767787

Pulled By: mthrok

fbshipit-source-id: 6d5e44c6390e8cf3028994a1b1de590ff3aaf6c2

04 Dec, 2022 1 commit

Fix _init_hubert_pretrain_model (#2886) · d8a5a11d

Zhaoheng Ni authored Dec 03, 2022

Summary:
Addresses https://github.com/pytorch/audio/issues/2885

In the `_init_hubert_pretrain_model` method, which initializes the HuBERT pretrain models, `kaiming_normal_` should be applied to `ConvLayerBlock` instead of the `LayerNorm` layer. This PR fixes it and adds more unit tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2886

Reviewed By: hwangjeff

Differential Revision: D41713801

Pulled By: nateanl

fbshipit-source-id: ed199baf7504d06bbf2d31c522ae708a75426a2d

02 Dec, 2022 1 commit

Add pre-emphasis and de-emphasis functions (#2871) · 55e9978a

hwangjeff authored Dec 01, 2022

Summary:
Adds pre-emphasis and de-emphasis functions.
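In their textbook form, pre-emphasis is the first-order FIR filter y[n] = x[n] - c·x[n-1], and de-emphasis is its IIR inverse y[n] = x[n] + c·y[n-1]. The plain-Python sketch below uses hypothetical function names. It assumes a zero initial state and a default coefficient of 0.97, a common choice. torchaudio's defaults are not stated in this log.

```python
from typing import List, Sequence


def preemphasis(samples: Sequence[float], coeff: float = 0.97) -> List[float]:
    """First-order high-pass: y[n] = x[n] - coeff * x[n - 1], with x[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def deemphasis(samples: Sequence[float], coeff: float = 0.97) -> List[float]:
    """Inverse IIR filter: y[n] = x[n] + coeff * y[n - 1], with y[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        prev = x + coeff * prev
        out.append(prev)
    return out
```

With matching coefficients and initial states, `deemphasis(preemphasis(x))` recovers `x` up to floating-point error.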

Pull Request resolved: https://github.com/pytorch/audio/pull/2871

Reviewed By: carolineechen

Differential Revision: D41651097

Pulled By: hwangjeff

fbshipit-source-id: 7a3cf6ce68b6ce1b9ae315ddd8bd8ed71acccdf1

30 Nov, 2022 1 commit

Add speed and speed perturbation functions and transforms (#2829) · c28073cc

hwangjeff authored Nov 30, 2022

Summary:
Adds functions and transforms for speed and speed perturbation (https://www.isca-speech.org/archive/interspeech_2015/ko15_interspeech.html).
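Changing speed by a factor f resamples the waveform so that it plays f times faster, which scales its length by roughly 1/f. Speed perturbation then picks f at random for each example; 0.9, 1.0 and 1.1 is a commonly used set of factors. The NumPy sketch below uses linear interpolation as a simple stand-in for band-limited resampling. The names are hypothetical, and torchaudio's implementation resamples differently.

```python
import random

import numpy as np


def change_speed(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Play a 1-D `waveform` `factor` times faster (pitch shifts with tempo).

    Linear interpolation is a crude stand-in for proper band-limited resampling.
    """
    num_in = waveform.shape[-1]
    num_out = int(round(num_in / factor))
    # Input position that each output sample reads from.
    positions = np.arange(num_out) * factor
    return np.interp(positions, np.arange(num_in), waveform)


def speed_perturb(waveform: np.ndarray, factors=(0.9, 1.0, 1.1), rng=None) -> np.ndarray:
    """Apply a randomly chosen speed factor (hypothetical helper)."""
    rng = rng if rng is not None else random.Random()
    return change_speed(waveform, rng.choice(factors))
```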

Pull Request resolved: https://github.com/pytorch/audio/pull/2829

Reviewed By: xiaohui-zhang

Differential Revision: D41285114

Pulled By: hwangjeff

fbshipit-source-id: 114740507698e01f35d4beb2c568a2479e847506

29 Nov, 2022 3 commits

Add sinc_impulse_response op (#2875) · fc0720b4

moto authored Nov 29, 2022

Summary:
This commit adds `sinc_impulse_response`, which generates windowed-sinc low-pass filters for given cutoff frequencies.

Example usage:
 - [Filter Design Tutorial](https://output.circle-artifacts.com/output/job/c0085baa-5345-4aeb-bd44-448034caa9e1/artifacts/0/docs/tutorials/filter_design_tutorial.html)
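The windowed-sinc method multiplies the ideal low-pass impulse response 2·fc·sinc(2·fc·n) by a finite window. The NumPy sketch below illustrates it. The cutoff convention (a fraction of the sample rate), the Hamming window and the function name are assumptions for illustration. Consult torchaudio's documentation for the actual signature and defaults of `sinc_impulse_response`.

```python
import numpy as np


def windowed_sinc_lowpass(cutoff: float, num_taps: int = 101) -> np.ndarray:
    """Low-pass FIR kernel; `cutoff` is a fraction of the sample rate in (0, 0.5)."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric kernel")
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # Ideal low-pass response; np.sinc is the normalised sinc sin(pi x) / (pi x).
    kernel = 2 * cutoff * np.sinc(2 * cutoff * n)
    # A Hamming window truncates the infinitely long sinc smoothly.
    kernel = kernel * np.hamming(num_taps)
    # Normalise so that the gain at DC is exactly 1.
    return kernel / kernel.sum()
```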

Pull Request resolved: https://github.com/pytorch/audio/pull/2875

Reviewed By: carolineechen

Differential Revision: D41586631

Pulled By: mthrok

fbshipit-source-id: a9991dbe5b137b0b4679228ec37072a1da7e50bb

Extend fftconvolve to support broadcast-able shapes (#2874) · 7a05622e

moto authored Nov 29, 2022

Summary:
Currently, `fftconvolve` only accepts tensors with exactly the same leading dimensions.
This commit loosens the restriction to allow shapes that are broadcastable.

This makes `fftconvolve` more efficient for cases like signal filtering, where one operand (the waveform) is larger than the other (the filter kernel) and the same filter kernel is applied across channels and batches.
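The following NumPy sketch shows why broadcasting helps. The kernel's spectrum is computed once and multiplied against every waveform's spectrum through broadcasting, so the kernel never has to be tiled to the waveform's full batch and channel shape. This is not torchaudio's code, and the function name is hypothetical.

```python
import numpy as np


def fftconvolve_broadcast(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Full linear convolution along the last axis; leading axes broadcast."""
    n = x.shape[-1] + y.shape[-1] - 1
    # Zero-pad both operands to the full output length to avoid circular wrap-around.
    spec = np.fft.rfft(x, n=n) * np.fft.rfft(y, n=n)  # broadcasting happens here
    return np.fft.irfft(spec, n=n)
```

For example, waveforms of shape `(batch=2, channels=3, time=50)` convolved with a single kernel of shape `(1, 1, 5)` produce an output of shape `(2, 3, 54)`.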

Pull Request resolved: https://github.com/pytorch/audio/pull/2874

Reviewed By: carolineechen

Differential Revision: D41581588

Pulled By: mthrok

fbshipit-source-id: c0117e11b979fb53236cc307a970a461b0e50134

Add conformer wav2vec2 pretrain model (#2827) · 8bde6a54

Caroline Chen authored Nov 29, 2022

Summary:
Modeled after the [paper](https://arxiv.org/pdf/2110.07313.pdf) and internal flow f288347302.

internal comparison tests: D40080919

Pull Request resolved: https://github.com/pytorch/audio/pull/2827

Reviewed By: nateanl

Differential Revision: D41569046

Pulled By: carolineechen

fbshipit-source-id: 43c5313074af05972d93da55b2029c746b75c380

28 Nov, 2022 2 commits

Add aux_num_out to emformer_hubert_model (#2868) · b0795ebe

Zhaoheng Ni authored Nov 28, 2022

Summary:
- `layer_norm` in `EmformerEncoder` is always set by default in `emformer_hubert_model`, so change its type to non-optional.
- Add `aux_num_out` to `emformer_hubert_model` to support fine-tuning the model.
- Update unit tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2868

Reviewed By: carolineechen

Differential Revision: D41451311

Pulled By: nateanl

fbshipit-source-id: 5fa0f19255e4f01e001d62f8689e36f134030083

Add extend_pitch (#2863) · 3882c395

moto authored Nov 27, 2022

Summary:
Add an `extend_pitch` function that can be used to augment fundamental frequencies with their harmonic overtones or inharmonic partials. It can be used for amplitudes as well.
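The core idea can be illustrated with NumPy broadcasting: each base frequency is multiplied by a list of ratios along a new last axis. Integer ratios give harmonic overtones; non-integer ratios give inharmonic partials. The sketch and its name are hypothetical, and the actual `extend_pitch` signature (how it accepts the pattern) is not shown in this log.

```python
import numpy as np


def extend_with_multipliers(base: np.ndarray, multipliers) -> np.ndarray:
    """Stack scaled copies of `base` along a new last axis.

    Ratios 1, 2, 3, ... give harmonic overtones; non-integer ratios give
    inharmonic partials.
    """
    return base[..., np.newaxis] * np.asarray(multipliers, dtype=base.dtype)
```

The same call applies to amplitudes. For example, multiplying per-frame amplitudes by `[1.0, 0.5, 0.25]` makes each higher partial quieter than the one below it.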

For example usages, see https://output.circle-artifacts.com/output/job/4ad0c29a-d75a-4244-baad-f5499f11d94b/artifacts/0/docs/tutorials/synthesis_tutorial.html

Part of https://github.com/pytorch/audio/issues/2835
Extracted from https://github.com/pytorch/audio/issues/2808

Pull Request resolved: https://github.com/pytorch/audio/pull/2863

Reviewed By: carolineechen

Differential Revision: D41543880

Pulled By: mthrok

fbshipit-source-id: 4f20e55770b0b3bee825ec07c73f9ec7cb181109

19 Nov, 2022 1 commit

Add torchscript test to oscillator_bank (#2864) · 8ba323bb

moto authored Nov 18, 2022

Summary:
Missing from https://github.com/pytorch/audio/issues/2848

Pull Request resolved: https://github.com/pytorch/audio/pull/2864

Reviewed By: carolineechen

Differential Revision: D41413381

Pulled By: mthrok

fbshipit-source-id: 4377ed4a59504c6ade9ee6f42938a2bc3f04fb73

18 Nov, 2022 1 commit

Add emformer hubert model architecture (#2836) · 92b6847e

Zhaoheng Ni authored Nov 18, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2836

Reviewed By: carolineechen

Differential Revision: D41208630

Pulled By: nateanl

fbshipit-source-id: 625e1651f0b8a6e20876409739cf7084cb7c748b

17 Nov, 2022 2 commits

Add adsr_envelope (#2859) · 793ff00b

moto authored Nov 17, 2022

Summary:
Add the `adsr_envelope` op, which generates an ADSR envelope.

* Supports generation of the envelope on GPU
* Supports optional Hold
* Supports polynomial decay

![ADSR envelope examples](https://download.pytorch.org/torchaudio/doc-assets/adsr_examples.png)
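An ADSR envelope with an optional hold is piecewise: a rise from 0 to the peak, an optional hold at the peak, a decay to the sustain level, a flat sustain, and a release back to 0. The NumPy sketch below mirrors the features listed above under assumed conventions. Stage lengths are fractions of the total, attack and release are linear, and the decay is polynomial with exponent `n_decay`. It is not torchaudio's `adsr_envelope`, whose parameters may differ, and it runs on CPU only.

```python
import numpy as np


def adsr_sketch(
    num_frames: int,
    attack: float,
    hold: float,
    decay: float,
    sustain: float,
    release: float,
    n_decay: int = 2,
) -> np.ndarray:
    """Piecewise envelope; attack/hold/decay/release are fractions of num_frames."""
    a = int(num_frames * attack)
    h = int(num_frames * hold)
    d = int(num_frames * decay)
    r = int(num_frames * release)
    if a + h + d + r > num_frames:
        raise ValueError("attack + hold + decay + release must fit in num_frames")
    # Frames not covered by another stage stay at the sustain level.
    env = np.full(num_frames, float(sustain))
    # Attack: linear ramp from 0 towards the peak of 1.
    env[:a] = np.linspace(0.0, 1.0, a, endpoint=False)
    # Hold: stay at the peak (a zero-length hold is allowed).
    env[a:a + h] = 1.0
    # Decay: polynomial curve from 1 down towards the sustain level.
    t = np.linspace(1.0, 0.0, d, endpoint=False)
    env[a + h:a + h + d] = sustain + (1.0 - sustain) * t ** n_decay
    # Release: linear ramp from the sustain level to exactly 0 at the end.
    env[num_frames - r:] = np.linspace(sustain, 0.0, r)
    return env
```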

Pull Request resolved: https://github.com/pytorch/audio/pull/2859

Reviewed By: nateanl

Differential Revision: D41379601

Pulled By: mthrok

fbshipit-source-id: 3717a6e0360d2a24913c2a836c57c5edec1d7b31

Add oscillator_bank (#2848) · e502b10c

moto authored Nov 16, 2022

Summary:
This commit adds the `oscillator_bank` op, which is the core of differentiable digital signal processing (DDSP) ops.
The implementation itself is simple: cumulatively sum the instantaneous frequencies to obtain phases, take the sine, and multiply by the amplitudes.

Following the Magenta implementation, amplitudes for frequencies outside [-Nyquist, Nyquist] are suppressed.

Differentiability is tested within the frequency range [-Nyquist, Nyquist] and the amplitude range [-5, 5], which should be sufficient.
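The steps described above can be sketched in NumPy as follows. The shapes, the function name, and the choice to start the phase at the first increment rather than at 0 are assumptions for illustration, not torchaudio's exact signature or behavior.

```python
import numpy as np


def oscillator_bank_sketch(frequencies, amplitudes, sample_rate: float) -> np.ndarray:
    """Sum of sinusoids with time-varying frequency and amplitude.

    frequencies, amplitudes: shape (num_frames, num_oscillators), frequencies in Hz.
    Returns a waveform of shape (num_frames,).
    """
    frequencies = np.asarray(frequencies, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    nyquist = sample_rate / 2
    # Silence partials whose instantaneous frequency lies outside [-Nyquist, Nyquist].
    amplitudes = np.where(np.abs(frequencies) > nyquist, 0.0, amplitudes)
    # Integrate frequency into phase: cumulative sum of 2*pi*f/sr over time.
    phases = 2 * np.pi * np.cumsum(frequencies / sample_rate, axis=0)
    # Take the sine, scale by the amplitudes, and sum over oscillators.
    return np.sum(amplitudes * np.sin(phases), axis=-1)
```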

Example usages:
 - https://output.circle-artifacts.com/output/job/129f3e21-41ce-406b-bc6b-833efb3c3141/artifacts/0/docs/tutorials/oscillator_tutorial.html
 - https://output.circle-artifacts.com/output/job/129f3e21-41ce-406b-bc6b-833efb3c3141/artifacts/0/docs/tutorials/synthesis_tutorial.html

Part of https://github.com/pytorch/audio/issues/2835
Extracted from https://github.com/pytorch/audio/issues/2808

Pull Request resolved: https://github.com/pytorch/audio/pull/2848

Reviewed By: carolineechen

Differential Revision: D41353075

Pulled By: mthrok

fbshipit-source-id: 80e60772fb555760f2396f7df40458803c280225

15 Nov, 2022 1 commit

Add WavLM bundles (#2833) · 26f62dc5

Grigory Sizov authored Nov 15, 2022

Summary:
Closes T136364380, follow-up to https://github.com/pytorch/audio/issues/2822

- Added "base", "base+", and "large" bundles for WavLM
- Expanded `wav2vec2_pipeline_test.py` to include the new bundles
- Added the new bundles to docs in `pipelines.rst`

Pull Request resolved: https://github.com/pytorch/audio/pull/2833

Reviewed By: nateanl

Differential Revision: D41194796

Pulled By: sgrigory

fbshipit-source-id: bf8e96c05b6a81ac5c5a014c46adeeac12685328

14 Nov, 2022 1 commit

Move bark spectrogram to prototype (#2843) · 7819f3f6

Caroline Chen authored Nov 14, 2022

Summary:
follow up to https://github.com/pytorch/audio/issues/2823
- move bark spectrogram to prototype
- decrease autograd test tolerance (passing on CircleCI)
- add diagram for bark fbanks

cc jdariasl

Pull Request resolved: https://github.com/pytorch/audio/pull/2843

Reviewed By: nateanl

Differential Revision: D41199522

Pulled By: carolineechen

fbshipit-source-id: 8e6c2e20fb7b14f39477683b3c6ed8356359a213

10 Nov, 2022 2 commits

BarkSpectrogram (#2823) · b326bc49

Julián D. Arias-Londoño authored Nov 10, 2022

Summary:
I have added the `BarkScale` transform, which converts a regular spectrogram into a Bark spectrogram, similar to `MelScale`. ahmed-fau opened this request in December 2021 (https://github.com/pytorch/audio/issues/2103). The new functionality includes three well-known approximations of the Bark scale.
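For reference, three widely used Hz-to-Bark approximations are shown below in plain Python. It is an assumption that these are exactly the three variants the PR implements, and the function names are hypothetical.

```python
import math


def hz_to_bark_traunmuller(freq_hz: float) -> float:
    """Traunmüller's approximation (without its low/high-frequency corrections)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def hz_to_bark_schroeder(freq_hz: float) -> float:
    """Schroeder's approximation."""
    return 7.0 * math.asinh(freq_hz / 650.0)


def hz_to_bark_wang(freq_hz: float) -> float:
    """Wang's approximation."""
    return 6.0 * math.asinh(freq_hz / 600.0)
```

All three are monotonically increasing in frequency. At 1 kHz they give roughly 8.5, 8.5 and 7.7 Bark respectively, so the choice of approximation shifts filter-bank edges noticeably.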

Pull Request resolved: https://github.com/pytorch/audio/pull/2823

Reviewed By: nateanl

Differential Revision: D41162100

Pulled By: carolineechen

fbshipit-source-id: b2670c4972e49c9ef424da5d5982576f7a4df831

Add conformer w2v2 model architecture (#2826) · 74f9a894

Caroline Chen authored Nov 09, 2022

Summary:
internal comparison tests: D40080919

Follow-up PR for pretrained models: https://github.com/pytorch/audio/issues/2827

Pull Request resolved: https://github.com/pytorch/audio/pull/2826

Reviewed By: nateanl

Differential Revision: D41160061

Pulled By: carolineechen

fbshipit-source-id: f3c478b28c235af53d1d8e21b573c53684a63ac4

09 Nov, 2022 1 commit

Add WavLM model (#2822) · bd76d3d7

Grigory Sizov authored Nov 09, 2022

Summary:
Closes T136364380

Added [WavLM Model](https://github.com/microsoft/UniSpeech/tree/main/WavLM):
- Added `WavLMSelfAttention` class (from [original implementation](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py)) and adjusted existing Encoder and Transformer classes to be compatible with it
- Added factory functions `wavlm_model`, `wavlm_base`, `wavlm_large` to `models/wav2vec2/model.py`
- Added bundles for base and large models to pipelines. **TODO**: pre-trained model weights are not yet uploaded to `download.pytorch.org`, permissions not granted yet.

## Tests
- Expanded HuggingFace integration tests to cover WavLM. For these tests, added JSON configs for base and large models from HF ([base](https://huggingface.co/microsoft/wavlm-base/blob/main/config.json), [large](https://huggingface.co/microsoft/wavlm-large/blob/main/config.json)) into test assets
- Expanded TorchScript and quantization tests to cover WavLM

## Comments
There are a few workarounds I had to introduce:
- Quantization tests for WavLM were breaking down at [`torch.cat`](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR466) ~~until I excluded the arguments of `torch.cat` from quantization [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR368-R369). I haven't found a better way to fix it, let me know if there is one~~ The reason for this seems to be that quantization replaces the `.bias` and `.weight` attributes of a `Linear` module with methods. Since we use the weights and biases directly, the code broke. The final solution, suggested by nateanl, was to define the attention weights and biases directly in `WavLMSelfAttention`, skipping the `Linear` layers.
- ~~WavLM uses position embedding in the first layer of encoder, but not in the subsequent ones.  So [UniSpeech](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py#L342) and [HF](https://github.com/huggingface/transformers/blob/b047472650cba259621549ac27b18fd2066ce18e/src/transformers/models/wavlm/modeling_wavlm.py#L441-L442) implementations only create this embedding module in the layers where it's used. However, we can't do this here because it breaks TorchScript. So as a solution I add a dummy `Identity` module to `WavLMSelfAttention` when the actual embedding is not needed: [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR361-R368).~~ Thanks nateanl for resolving this!
- I had to add dummy `position_bias` and `key_padding_mask` arguments to `SelfAttention.forward` to make TorchScript tests pass. Since both `SelfAttention` and `WavLMSelfAttention` are called from `EncoderLayer`, they need to have compatible signatures. Having a variable number of arguments with `**kwargs` or checking object class doesn't seem to work with TorchScript, so I instead made both types of attention accept `position_bias` and `key_padding_mask` arguments.

Nit: do we still need to specify `__all__` if there are no wildcard imports in `__init__.py`, e.g. in `torchaudio/models/__init__.py`?

Pull Request resolved: https://github.com/pytorch/audio/pull/2822

Reviewed By: nateanl

Differential Revision: D41121855

Pulled By: sgrigory

fbshipit-source-id: 9f4f787e5810010de4e74cb704063a26c66767d7

08 Nov, 2022 2 commits

Enable log probs input for rnnt loss (#2798) · ca478823

Caroline Chen authored Nov 08, 2022

Summary:
Add `fused_log_softmax` argument (default/current behavior = True) to rnnt loss.

If it is set to `False`, call `log_softmax` on the logits before passing them to the RNN-T loss function.

The following should produce the same output:
```
rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
```

```
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
```

Testing: unit tests, plus the same results on the Conformer RNN-T recipe.

Pull Request resolved: https://github.com/pytorch/audio/pull/2798

Reviewed By: xiaohui-zhang

Differential Revision: D41083523

Pulled By: carolineechen

fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61

Add convolution transforms (#2811) · 2d99fee2

hwangjeff authored Nov 07, 2022

Summary:
Adds `torch.nn.Module`-based implementations for convolution and FFT convolution.

Pull Request resolved: https://github.com/pytorch/audio/pull/2811

Reviewed By: carolineechen

Differential Revision: D40881937

Pulled By: hwangjeff

fbshipit-source-id: bfe8969e6178ad4f58981efd4b2720ac006be8de

04 Nov, 2022 1 commit

Fix decimal FPS handling StreamWriter (#2831) · 6bd38512

moto authored Nov 04, 2022

Summary:
StreamWriter assumed that the frame rate can always be expressed as 1/something, which is reasonable for integer frame rates but does not hold for decimal ones.

This commit fixes it by properly computing time_base from frame rate.
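How a decimal frame rate maps to an exact rational time base can be illustrated with Python's `fractions` module. This is a sketch of the arithmetic only, not the StreamWriter code, which is C++ on top of FFmpeg. The helper name and the denominator bound of 1001 are arbitrary assumptions.

```python
from fractions import Fraction
from typing import Union


def time_base_from_frame_rate(
    frame_rate: Union[int, float, Fraction], max_denominator: int = 1001
) -> Fraction:
    """Return the time base (seconds per frame) as an exact rational number."""
    # Approximate the possibly decimal frame rate by a fraction with a bounded
    # denominator, then invert it: time_base = 1 / frame_rate.
    rate = Fraction(frame_rate).limit_denominator(max_denominator)
    return 1 / rate
```

An integer rate such as 30 gives the familiar `1/30`, but 29.97 gives `100/2997` and NTSC's exact `30000/1001` gives `1001/30000`. Neither of those has the form 1/N.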

Addresses https://github.com/pytorch/audio/issues/2830

Pull Request resolved: https://github.com/pytorch/audio/pull/2831

Reviewed By: carolineechen

Differential Revision: D41036084

Pulled By: mthrok

fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609

31 Oct, 2022 1 commit

Add precise seek (#2737) · 60f29ca0

Joao Gomes authored Oct 31, 2022

Summary:
cc mthrok

Implements precise seek and seek to any frame in torchaudio

Pull Request resolved: https://github.com/pytorch/audio/pull/2737

Reviewed By: mthrok

Differential Revision: D40546716

Pulled By: jdsgomes

fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e

28 Oct, 2022 1 commit

Introduce argument 'mode' for convolution functions (#2801) · 86d596d3

hwangjeff authored Oct 28, 2022

Summary:
Introduces argument 'mode' for convolution functions, following SciPy's convention.
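Under SciPy's convention, the three modes differ only in which slice of the full linear convolution is returned. `"full"` returns everything, `"same"` returns the centred part with the length of the first input, and `"valid"` returns only the positions where the operands overlap completely. The NumPy sketch below uses a hypothetical helper with slicing chosen to match SciPy's centring. torchaudio's functions operate on tensors and may differ in edge cases.

```python
import numpy as np


def convolve_with_mode(x: np.ndarray, y: np.ndarray, mode: str = "full") -> np.ndarray:
    """1-D convolution whose output size follows SciPy's `mode` convention."""
    full = np.convolve(x, y, mode="full")  # length len(x) + len(y) - 1
    if mode == "full":
        return full
    if mode == "same":
        # Same length as `x`, centred within the full result.
        start = (full.size - x.size) // 2
        return full[start:start + x.size]
    if mode == "valid":
        # Only positions where the shorter operand lies fully inside the longer one.
        length = max(x.size, y.size) - min(x.size, y.size) + 1
        start = (full.size - length) // 2
        return full[start:start + length]
    raise ValueError(f"unknown mode: {mode!r}")
```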

Pull Request resolved: https://github.com/pytorch/audio/pull/2801

Reviewed By: nateanl

Differential Revision: D40805405

Pulled By: hwangjeff

fbshipit-source-id: 8f0006ffe9e3945b4b17f44c4cfa1adb265c20ef

26 Oct, 2022 1 commit

Deprecate 'onesided' init param for MelSpectrogram (#2797) · 546e699a

hwangjeff authored Oct 26, 2022

Summary:
Initializer parameter `onesided` isn't relevant to `MelSpectrogram` — it should always be `True`. In fact, the module already assumes `onesided == True` in the filterbank it generates and fails in its forward pass when `onesided == False`. Accordingly, this PR makes param `onesided` optional and adds a deprecation warning that's fired when the param is provided.
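The deprecation pattern described here (an optional parameter that warns only when explicitly provided) can be sketched with Python's `warnings` module. The class below is a hypothetical stand-in that shows only this pattern. Its warning message is invented for illustration and is not torchaudio's.

```python
import warnings
from typing import Optional


class MelSpectrogramSketch:
    """Hypothetical stand-in illustrating only the deprecation pattern."""

    def __init__(self, n_fft: int = 400, onesided: Optional[bool] = None) -> None:
        # None means "not provided"; any explicit value triggers the warning.
        if onesided is not None:
            warnings.warn(
                "Argument 'onesided' is deprecated and has no effect.",
                DeprecationWarning,
                stacklevel=2,
            )
        self.n_fft = n_fft
        # The mel filterbank only covers the one-sided spectrum.
        self.onesided = True
```

Using `None` as the sentinel lets existing callers who never passed `onesided` see no warning, while anyone still passing it is told to stop.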

Pull Request resolved: https://github.com/pytorch/audio/pull/2797

Reviewed By: carolineechen, xiaohui-zhang

Differential Revision: D40731238

Pulled By: hwangjeff

fbshipit-source-id: 6eea8eb9d4a85a805162e03ad91682a1946f92cd

25 Oct, 2022 1 commit

Fix issue with the missing video frame in StreamWriter (#2789) · 17a2b93b

moto authored Oct 24, 2022

Summary:
Addresses https://github.com/pytorch/audio/issues/2790.

Previously AVPacket objects had duration==0.

The `av_interleaved_write_frame` function inferred the duration of each packet by
comparing it against the next one, but it could not infer the duration of
the last packet, as there is no subsequent frame, and thus omitted it from the final data.

This commit fixes it by explicitly setting the packet duration to 1 (one frame),
only for video. (An audio AVPacket contains multiple samples, so audio is different.
To ensure correctness for audio, tests were added.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2789

Reviewed By: xiaohui-zhang

Differential Revision: D40627439

Pulled By: mthrok

fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320

19 Oct, 2022 2 commits

Add iemocap variants (#2778) · 34255386

Caroline Chen authored Oct 19, 2022

Summary:
Add the ability to load only improvised or only scripted utterances.

Pull Request resolved: https://github.com/pytorch/audio/pull/2778

Reviewed By: nateanl

Differential Revision: D40511865

Pulled By: carolineechen

fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539

Add file_name to the returned item in Snips dataset (#2775) · e8ae0ad2

Zhaoheng Ni authored Oct 18, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775

Reviewed By: carolineechen

Differential Revision: D40481144

Pulled By: nateanl

fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e

12 Oct, 2022 1 commit

Skip hubert xlarge torchscript test (#2758) · c2ea6898

Caroline Chen authored Oct 11, 2022

Summary:
A couple of CircleCI unit tests are failing during the HuBERT XLarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI.

cc atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2758

Reviewed By: mthrok

Differential Revision: D40290535

Pulled By: carolineechen

fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57

11 Oct, 2022 1 commit

Add Snips Dataset (#2738) · 84187909

Zhaoheng Ni authored Oct 10, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738

Reviewed By: carolineechen

Differential Revision: D40238099

Pulled By: nateanl

fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e

10 Oct, 2022 1 commit

Add unit test for LibriMix dataset (#2659) · c5b8e585

Zhaoheng Ni authored Oct 10, 2022

Summary:
Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. This is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users choose which variant of the dataset to use.
- If the task is ``"enh_both"``, the target is the audio in ``mix_clean`` instead of the separate clean sources. The PR fixes it to use ``mix_clean`` as the target.
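The NumPy sketch below illustrates what "min" and "max" mean for the lengths of a set of 1-D sources: "min" trims every source to the shortest one, and "max" zero-pads every source to the longest one. The helper is hypothetical and is not part of the `LibriMix` dataset API.

```python
import numpy as np


def align_sources(sources, mode: str = "min") -> np.ndarray:
    """Trim ("min") or zero-pad ("max") 1-D sources to a common length and stack them."""
    lengths = [s.shape[-1] for s in sources]
    if mode == "min":
        target = min(lengths)
        aligned = [s[:target] for s in sources]
    elif mode == "max":
        target = max(lengths)
        # Pad only at the end so that all sources stay time-aligned at the start.
        aligned = [np.pad(s, (0, target - s.shape[-1])) for s in sources]
    else:
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    return np.stack(aligned)
```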

Pull Request resolved: https://github.com/pytorch/audio/pull/2659

Reviewed By: carolineechen

Differential Revision: D40229227

Pulled By: nateanl

fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235

09 Oct, 2022 1 commit

Add IEMOCAP dataset (#2732) · 0b4b1fd4

Caroline Chen authored Oct 09, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

07 Oct, 2022 1 commit

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740) · 7729723b

hwangjeff authored Oct 07, 2022

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

21 Sep, 2022 1 commit

Support in-memory decoding via Tensor wrapper in StreamReader (#2694) · c5a43372

Moto Hira authored Sep 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds Tensor type as input to `StreamReader`.
The Tensor is interpreted as a byte-string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

14 Sep, 2022 1 commit

Move Hybrid Demucs pipeline to beta (#2673) · 60868748

Caroline Chen authored Sep 14, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

13 Sep, 2022 1 commit

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669) · 4d535e88

Zhaoheng Ni authored Sep 13, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb
