Commits · ca478823c1d40c2d9b5ebf1908ed2f87ddf8a894 · OpenDAS / Torchaudio

08 Nov, 2022 2 commits

Enable log probs input for rnnt loss (#2798) · ca478823

Caroline Chen authored Nov 08, 2022

Summary:
Add `fused_log_softmax` argument (default/current behavior = True) to rnnt loss.

When setting it to `False`, the caller must apply `log_softmax` to the logits before passing them to the rnnt loss function.

The following should produce the same output:
```
rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
```

```
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
```

Testing: unit tests pass, and the conformer rnnt recipe produces the same results.

Pull Request resolved: https://github.com/pytorch/audio/pull/2798

Reviewed By: xiaohui-zhang

Differential Revision: D41083523

Pulled By: carolineechen

fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61

Add convolution transforms (#2811) · 2d99fee2

hwangjeff authored Nov 07, 2022

Summary:
Adds `torch.nn.Module`-based implementations for convolution and FFT convolution.

Pull Request resolved: https://github.com/pytorch/audio/pull/2811

Reviewed By: carolineechen

Differential Revision: D40881937

Pulled By: hwangjeff

fbshipit-source-id: bfe8969e6178ad4f58981efd4b2720ac006be8de

04 Nov, 2022 1 commit

Fix decimal FPS handling StreamWriter (#2831) · 6bd38512

moto authored Nov 04, 2022

Summary:
StreamWriter assumed that the time base (the reciprocal of the frame rate) can always be expressed as 1/something, i.e. that the frame rate is an integer. That assumption breaks for decimal frame rates.

This commit fixes it by properly computing time_base from frame rate.
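
To illustrate the idea of deriving a time base from a possibly fractional frame rate, here is a minimal Python sketch. It is not the StreamWriter implementation, which lives in the FFmpeg binding code; the function name `time_base_from_frame_rate` and the `max_denominator` bound are assumptions made for this example.

```python
from fractions import Fraction


def time_base_from_frame_rate(frame_rate: float, max_denominator: int = 100_000) -> Fraction:
    """Return the duration of one frame, in seconds, as an exact rational.

    Hypothetical helper for illustration only.
    """
    # Approximate the (possibly decimal) frame rate by a rational number.
    rate = Fraction(frame_rate).limit_denominator(max_denominator)
    # The time base is the reciprocal of the frame rate.
    return 1 / rate


print(time_base_from_frame_rate(30))     # 1/30
print(time_base_from_frame_rate(29.97))  # 100/2997
```

An integer frame rate yields the familiar 1/something form, while a decimal one needs a numerator other than 1.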

Addresses https://github.com/pytorch/audio/issues/2830

Pull Request resolved: https://github.com/pytorch/audio/pull/2831

Reviewed By: carolineechen

Differential Revision: D41036084

Pulled By: mthrok

fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609

31 Oct, 2022 1 commit

Add precise seek (#2737) · 60f29ca0

Joao Gomes authored Oct 31, 2022

Summary:
cc mthrok

Implements precise seek and seek to any frame in torchaudio

Pull Request resolved: https://github.com/pytorch/audio/pull/2737

Reviewed By: mthrok

Differential Revision: D40546716

Pulled By: jdsgomes

fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e

28 Oct, 2022 1 commit

Introduce argument 'mode' for convolution functions (#2801) · 86d596d3

hwangjeff authored Oct 28, 2022

Summary:
Introduces argument 'mode' for convolution functions, following SciPy's convention.
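
For readers unfamiliar with SciPy's convention, the following pure-Python sketch shows how the three usual modes ("full", "valid", "same") relate to each other for 1-D inputs. It is an illustration under the assumption that the new argument accepts these three values; the helper name `convolve_1d` is hypothetical and this is not the torchaudio code.

```python
def convolve_1d(x, y, mode="full"):
    """Direct 1-D convolution of two sequences with SciPy-style output modes."""
    n, m = len(x), len(y)
    # "full": every point of overlap, length n + m - 1.
    full = [0.0] * (n + m - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            full[i + j] += xi * yj
    if mode == "full":
        return full
    if mode == "valid":
        # Only points where the shorter input lies entirely inside the longer one.
        target = max(n, m) - min(n, m) + 1
    elif mode == "same":
        # Same length as the first input, centred on the full result.
        target = n
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    start = (len(full) - target) // 2
    return full[start:start + target]


x, y = [1.0, 2.0, 3.0], [0.0, 1.0, 0.5]
print(convolve_1d(x, y, "full"))   # [0.0, 1.0, 2.5, 4.0, 1.5]
print(convolve_1d(x, y, "same"))   # [1.0, 2.5, 4.0]
print(convolve_1d(x, y, "valid"))  # [2.5]
```

Both "same" and "valid" are slices of the "full" result, so only the output length and offset differ between modes.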

Pull Request resolved: https://github.com/pytorch/audio/pull/2801

Reviewed By: nateanl

Differential Revision: D40805405

Pulled By: hwangjeff

fbshipit-source-id: 8f0006ffe9e3945b4b17f44c4cfa1adb265c20ef

26 Oct, 2022 1 commit

Deprecate 'onesided' init param for MelSpectrogram (#2797) · 546e699a

hwangjeff authored Oct 26, 2022

Summary:
Initializer parameter `onesided` isn't relevant to `MelSpectrogram` — it should always be `True`. In fact, the module already assumes `onesided == True` in the filterbank it generates and fails in its forward pass when `onesided == False`. Accordingly, this PR makes param `onesided` optional and adds a deprecation warning that's fired when the param is provided.
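
The pattern described here (an optional parameter that only triggers a warning when supplied) can be sketched as follows. The class `MelSpectrogramSketch` is a hypothetical stand-in, and the warning text and category are assumptions for illustration rather than the wording used in torchaudio.

```python
import warnings
from typing import Optional


class MelSpectrogramSketch:
    """Hypothetical stand-in; only the handling of `onesided` is shown."""

    def __init__(self, n_fft: int = 400, onesided: Optional[bool] = None) -> None:
        # `None` means "not provided"; any explicit value triggers the warning.
        if onesided is not None:
            warnings.warn(
                "Argument 'onesided' is deprecated and has no effect.",
                DeprecationWarning,
                stacklevel=2,
            )
        self.n_fft = n_fft


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    MelSpectrogramSketch()               # no warning
    MelSpectrogramSketch(onesided=True)  # one warning

print(len(caught))  # 1
```

Defaulting to `None` rather than `True` is what lets the initializer distinguish "not provided" from "explicitly set".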

Pull Request resolved: https://github.com/pytorch/audio/pull/2797

Reviewed By: carolineechen, xiaohui-zhang

Differential Revision: D40731238

Pulled By: hwangjeff

fbshipit-source-id: 6eea8eb9d4a85a805162e03ad91682a1946f92cd

25 Oct, 2022 1 commit

Fix issue with the missing video frame in StreamWriter (#2789) · 17a2b93b

moto authored Oct 24, 2022

Summary:
Addresses https://github.com/pytorch/audio/issues/2790.

Previously AVPacket objects had duration==0.

The `av_interleaved_write_frame` function inferred the duration of each packet by
comparing it against the next one, but it could not infer the duration of
the last packet, as there is no subsequent frame, and thus omitted it from the final data.

This commit fixes it by explicitly setting packet duration = 1 (one frame)
only for video. (An audio AVPacket contains multiple samples, so it is handled differently.
Tests were added to ensure correctness for audio.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2789

Reviewed By: xiaohui-zhang

Differential Revision: D40627439

Pulled By: mthrok

fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320

19 Oct, 2022 2 commits

Add iemocap variants (#2778) · 34255386

Caroline Chen authored Oct 19, 2022

Summary:
Adds the ability to load only improvised or only scripted utterances.

Pull Request resolved: https://github.com/pytorch/audio/pull/2778

Reviewed By: nateanl

Differential Revision: D40511865

Pulled By: carolineechen

fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539

Add file_name to the returned item in Snips dataset (#2775) · e8ae0ad2

Zhaoheng Ni authored Oct 18, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775

Reviewed By: carolineechen

Differential Revision: D40481144

Pulled By: nateanl

fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e

12 Oct, 2022 1 commit

Skip hubert xlarge torchscript test (#2758) · c2ea6898

Caroline Chen authored Oct 11, 2022

Summary:
A couple of CircleCI unit tests are failing during the HuBERT XLarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables the test on CircleCI.

cc atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2758

Reviewed By: mthrok

Differential Revision: D40290535

Pulled By: carolineechen

fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57

11 Oct, 2022 1 commit

Add Snips Dataset (#2738) · 84187909

Zhaoheng Ni authored Oct 10, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738

Reviewed By: carolineechen

Differential Revision: D40238099

Pulled By: nateanl

fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e

10 Oct, 2022 1 commit

Add unit test for LibriMix dataset (#2659) · c5b8e585

Zhaoheng Ni authored Oct 10, 2022

Summary:
Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, in which the audio length is the minimum over all clean sources. This is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users decide which dataset they want to use.
- If the task is ``"enh_both"``, the target is the audio in ``mix_clean`` instead of the separate clean sources. The PR fixes it to use ``mix_clean`` as the target.
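
The difference between the two length conventions can be sketched in a few lines of Python. This only illustrates the convention; it is not how the dataset class loads data. The helper name `align_sources` is hypothetical, and zero-padding the shorter sources in "max" mode is an assumption of this sketch.

```python
def align_sources(sources, mode="min"):
    """Bring all sources to a common length (illustrative helper)."""
    lengths = [len(s) for s in sources]
    if mode == "min":
        # Truncate every source to the shortest one.
        n = min(lengths)
        return [s[:n] for s in sources]
    if mode == "max":
        # Assumed behaviour: pad shorter sources with silence up to the longest one.
        n = max(lengths)
        return [s + [0.0] * (n - len(s)) for s in sources]
    raise ValueError(f"unknown mode: {mode!r}")


src1 = [0.1, 0.2, 0.3, 0.4]
src2 = [0.5, 0.6]
print([len(s) for s in align_sources([src1, src2], "min")])  # [2, 2]
print([len(s) for s in align_sources([src1, src2], "max")])  # [4, 4]
```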

Pull Request resolved: https://github.com/pytorch/audio/pull/2659

Reviewed By: carolineechen

Differential Revision: D40229227

Pulled By: nateanl

fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235

09 Oct, 2022 1 commit

Add IEMOCAP dataset (#2732) · 0b4b1fd4

Caroline Chen authored Oct 09, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

07 Oct, 2022 1 commit

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740) · 7729723b

hwangjeff authored Oct 07, 2022

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

21 Sep, 2022 1 commit

Support in-memory decoding via Tensor wrapper in StreamReader (#2694) · c5a43372

Moto Hira authored Sep 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds Tensor type as input to `StreamReader`.
The Tensor is interpreted as byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

14 Sep, 2022 1 commit

Move Hybrid Demucs pipeline to beta (#2673) · 60868748

Caroline Chen authored Sep 14, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

13 Sep, 2022 1 commit

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669) · 4d535e88

Zhaoheng Ni authored Sep 13, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb

12 Sep, 2022 1 commit

Move hybrid demucs model out of prototype (#2668) · ec0e3a80

Caroline Chen authored Sep 12, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668

Reviewed By: nateanl, mthrok

Differential Revision: D39433671

Pulled By: carolineechen

fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c

01 Sep, 2022 1 commit

Add file-like object support to StreamWriter (#2648) · 28da8b84

moto authored Aug 31, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

24 Aug, 2022 1 commit

Add StreamWriter (#2628) · 72404de9

moto authored Aug 24, 2022

Summary:
This commit adds the FFmpeg-based encoder class StreamWriter.
StreamWriter is essentially the counterpart of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

11 Aug, 2022 1 commit

Add additive noise function (#2608) · f3bb30b8

hwangjeff authored Aug 11, 2022

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
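
One common way to choose the noise scale is from a target signal-to-noise ratio in decibels. The sketch below works under that assumption. The commit message does not state how the scale is determined, so the `snr_db` parameter and the helper name `add_scaled_noise` are illustrative and not the signature of `add_noise`.

```python
import math


def add_scaled_noise(waveform, noise, snr_db):
    """Return waveform + scale * noise, with scale chosen to hit the target SNR (in dB)."""
    signal_energy = sum(s * s for s in waveform)
    noise_energy = sum(n * n for n in noise)
    # Solve 10 * log10(signal_energy / (scale**2 * noise_energy)) == snr_db for scale.
    scale = math.sqrt(signal_energy / (noise_energy * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(waveform, noise)]


waveform = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = add_scaled_noise(waveform, noise, snr_db=10.0)
print(noisy)
```

A higher `snr_db` gives a smaller scale, so the output stays closer to the clean waveform.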

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

09 Aug, 2022 1 commit

Add NNLM support to CTC Decoder (#2528) · 03a0d68e

Caroline Chen authored Aug 09, 2022

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows:
- To decode with KenLM, pass the KenLM language model path to the `lm` variable.
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass it to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, with one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default).

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

05 Aug, 2022 1 commit

Add convolution operator (#2602) · b396157d

hwangjeff authored Aug 05, 2022

Summary:
Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT.
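
The equivalence of the two approaches follows from the convolution theorem. The pure-Python sketch below demonstrates it on plain lists. To stay dependency-free it substitutes a naive O(n²) DFT for a real FFT, so it shows the principle but not the performance benefit. The names `direct_convolve`, `dft` and `fourier_convolve` are hypothetical and this is not the torchaudio implementation.

```python
import cmath


def direct_convolve(x, y):
    """Convolve by summing all pairwise products (the direct method)."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def dft(values, inverse=False):
    """Naive discrete Fourier transform; stands in for an FFT in this sketch."""
    n = len(values)
    sign = 1 if inverse else -1
    result = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]
    return [r / n for r in result] if inverse else result


def fourier_convolve(x, y):
    """Convolve by multiplying spectra and transforming back."""
    n = len(x) + len(y) - 1
    # Zero-pad to the full output length so circular convolution equals linear convolution.
    xp = list(x) + [0.0] * (n - len(x))
    yp = list(y) + [0.0] * (n - len(y))
    spectrum = [a * b for a, b in zip(dft(xp), dft(yp))]
    return [c.real for c in dft(spectrum, inverse=True)]


x, y = [1.0, 2.0, 3.0], [0.0, 1.0, 0.5]
print(direct_convolve(x, y))
print([round(v, 6) for v in fourier_convolve(x, y)])
```

Both functions return the same values up to floating-point rounding.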

Pull Request resolved: https://github.com/pytorch/audio/pull/2602

Reviewed By: nateanl, mthrok

Differential Revision: D38450771

Pulled By: hwangjeff

fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b

03 Aug, 2022 2 commits

Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2

Sean Kim authored Aug 03, 2022

Summary:
Add new model pretrained weights and tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2601

Reviewed By: carolineechen, nateanl

Differential Revision: D38396673

Pulled By: skim0514

fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1

An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a

bshall authored Aug 03, 2022

Summary:
I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?

I hope this is helpful! Looking forward to hearing from you.

Pull Request resolved: https://github.com/pytorch/audio/pull/2472

Reviewed By: hwangjeff

Differential Revision: D38389155

Pulled By: carolineechen

fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904

28 Jul, 2022 2 commits

Add Union normalization parameter on spectrogram and inverse spectrogram (#2554) · 0fde7c57

Sean Kim authored Jul 28, 2022

Summary:
Allows the `normalized` parameter to take a str, enabling `frame_length`-based normalization that aligns with the torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104

Pull Request resolved: https://github.com/pytorch/audio/pull/2554

Reviewed By: carolineechen, mthrok

Differential Revision: D38247554

Pulled By: skim0514

fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602

Change docstring for easier understanding (#2570) · 338e3104

Sean Kim authored Jul 28, 2022

Summary:
Edit factory function's docstrings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2570

Reviewed By: carolineechen

Differential Revision: D38250369

Pulled By: skim0514

fbshipit-source-id: fa777e37d7cc517cf4ff1842d5585bf36558f50a

26 Jul, 2022 1 commit

New Pipeline edits for HDemucs (#2565) · 4c4da32c

Sean Kim authored Jul 25, 2022

Summary:
Created a new branch and brought in the commits due to rebasing issues, resolved conflicts on the new branch, and closed the old branch.

Pull Request resolved: https://github.com/pytorch/audio/pull/2565

Reviewed By: nateanl, mthrok

Differential Revision: D38131189

Pulled By: skim0514

fbshipit-source-id: 96531480cf50562944abb28d70879f21b4609f15

25 Jul, 2022 1 commit

Integration test fix deleting temporary directory (#2569) · 8dcf06ac

Sean Kim authored Jul 25, 2022

Summary:
Previous issue: `--use-tmp-hub-dir` expected the temporary directories used to store large files to be deleted after each test case, but pytest only erases them after 3 full test sessions. This commit fixes the issue by manually deleting a new subdirectory created in each test case. See https://github.com/pytorch/audio/pull/2565#discussion_r929007101
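
The per-test cleanup pattern can be sketched with the standard library alone. The context manager `per_test_subdir` and the file name used below are hypothetical; in a pytest suite the same logic would typically live in a fixture, and the actual change in the test suite may differ.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def per_test_subdir(parent: Path):
    """Create a fresh subdirectory and delete it as soon as the block exits."""
    subdir = Path(tempfile.mkdtemp(dir=parent))
    try:
        yield subdir
    finally:
        # Delete eagerly instead of waiting for the test runner's own cleanup.
        shutil.rmtree(subdir, ignore_errors=True)


with tempfile.TemporaryDirectory() as root:
    with per_test_subdir(Path(root)) as hub_dir:
        # Stand-in for a large downloaded checkpoint.
        (hub_dir / "checkpoint.bin").write_bytes(b"\0" * 1024)
        print(hub_dir.exists())  # True
    print(hub_dir.exists())      # False
```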

Pull Request resolved: https://github.com/pytorch/audio/pull/2569

Reviewed By: nateanl

Differential Revision: D38117848

Pulled By: skim0514

fbshipit-source-id: 3767cb8df1238fd6218f6aaa58d5d583cea72699

22 Jul, 2022 1 commit

Add documents for SourceSeparationBundle (#2559) · 6cee56ab

Zhaoheng Ni authored Jul 22, 2022

Summary:
- Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
- Add citation of Libri2Mix dataset in the bundle documentation.
- The URL in the integration test should be built with a slash instead of `os.path.join`, as the latter fails on Windows. It is changed to an f-string.
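
The Windows problem with `os.path.join` can be reproduced on any platform, because `ntpath` is the Windows flavour of `os.path`. The base URL and file name below are hypothetical placeholders, not the ones used in the integration test.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path

base = "https://example.com/models"  # hypothetical base URL
name = "model.pt"                     # hypothetical file name

print(posixpath.join(base, name))  # https://example.com/models/model.pt
print(ntpath.join(base, name))     # https://example.com/models\model.pt
print(f"{base}/{name}")            # https://example.com/models/model.pt
```

The f-string always uses a forward slash, so the resulting URL does not depend on the operating system.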

Pull Request resolved: https://github.com/pytorch/audio/pull/2559

Reviewed By: carolineechen

Differential Revision: D38036116

Pulled By: nateanl

fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836

21 Jul, 2022 1 commit

Add SourceSeparationBundle to prototype (#2440) · 83362580

Zhaoheng Ni authored Jul 20, 2022

Summary:
- Add SourceSeparationBundle class for source separation pipeline
- Add `CONVTASNET_BASE_LIBRI2MIX` that is trained on Libri2Mix dataset.
- Add integration test with example mixture audio and expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with the permutation-invariant training (PIT) criterion for all permutations of sources and uses the highest value as the final output. The test verifies that the score is equal to or larger than the expected value.
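
The scoring procedure in the last bullet can be sketched in pure Python. This uses the standard Si-SDR definition and averages the score over sources for each permutation. The function names, the `eps` stabilizer, and the averaging are assumptions of this sketch rather than details taken from the integration test.

```python
import math
from itertools import permutations


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference to remove any scaling difference.
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def best_permutation_si_sdr(estimates, references):
    """Try every assignment of estimates to references and keep the best mean score."""
    best_score, best_perm = -math.inf, None
    for perm in permutations(range(len(references))):
        score = sum(si_sdr(estimates[i], references[p]) for i, p in enumerate(perm)) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


references = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
# Estimates come out in swapped order and with a different gain.
estimates = [[0.1, 0.9, 0.0, -1.1], [2.0, 0.1, -1.9, 0.0]]
score, perm = best_permutation_si_sdr(estimates, references)
print(perm)  # (1, 0)
```

Because the number of permutations grows factorially, exhaustive search like this is only practical for a small number of sources.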

Pull Request resolved: https://github.com/pytorch/audio/pull/2440

Reviewed By: mthrok

Differential Revision: D37997646

Pulled By: nateanl

fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe

19 Jul, 2022 1 commit

Adding pipeline changes, factory functions to HDemucs (#2547) · 62854588

Sean Kim authored Jul 19, 2022

Summary:
Factory functions have been added to the HDemucs class, and the implementation is tested within the testing files.

Pull Request resolved: https://github.com/pytorch/audio/pull/2547

Reviewed By: carolineechen

Differential Revision: D37948600

Pulled By: skim0514

fbshipit-source-id: 7ac4e4a71519450cfbbc24ff7d7e70521f676040

12 Jul, 2022 1 commit

Hybrid Demucs model implementation (#2506) · 608b8ea6

Sean Kim authored Jul 12, 2022

Summary:
Draft PR with initial model implementation with minor changes from previous implementation

Pull Request resolved: https://github.com/pytorch/audio/pull/2506

Reviewed By: nateanl

Differential Revision: D37762671

Pulled By: skim0514

fbshipit-source-id: b7dc0a6ef725d6ae6d76c23c882623f7d339977c

07 Jul, 2022 1 commit

Add YUV444P support to StreamReader (#2516) · b2a90f91

moto authored Jul 06, 2022

Summary:
This commit adds support for the `"yuv444p"` type as an output format of StreamReader.

Pull Request resolved: https://github.com/pytorch/audio/pull/2516

Reviewed By: hwangjeff

Differential Revision: D37659715

Pulled By: mthrok

fbshipit-source-id: eae9b5590d8f138a6ebf3808c08adfe068f11a2b

06 Jul, 2022 1 commit

Fix fluent test for windows (#2510) · 09daa438

Caroline Chen authored Jul 05, 2022

Summary:
The fluent dataset test currently fails on Windows due to newline generation in the CSV writer in testing and incorrect path parsing in the dataset implementation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2510

Reviewed By: carolineechen

Differential Revision: D37573203

Pulled By: mthrok

fbshipit-source-id: 4868bc649690c7e596b002686c6128ce735d3564

28 Jun, 2022 1 commit

Refactor AVDictionary clean up (#2507) · 0ad03adf

moto authored Jun 27, 2022

Summary:
Small clean up in ffmpeg binding code.

1. Make `get_option_dict` and `clean_up_dict` public utility
2. Merge the exception into `clean_up_dict`
3. Get rid of custom string join function and use `c10::Join`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2507

Reviewed By: hwangjeff

Differential Revision: D37466022

Pulled By: mthrok

fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969

27 Jun, 2022 4 commits

Add missing __init__ in io test directory (#2511) · d50ed521

moto authored Jun 27, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2511

Reviewed By: nateanl

Differential Revision: D37461021

Pulled By: mthrok

fbshipit-source-id: 6f894c02bbefc5afda0f9584d26ad785f7c71ee4

Fix download links of RNNT pipelines in prototype (#2444) · 9b4ee17c

Zhaoheng Ni authored Jun 27, 2022

Summary:
In https://github.com/pytorch/audio/issues/2283, torchaudio's downloading function is updated to reduce code duplication. The links in `EMFORMER_RNNT_BASE_LIBRISPEECH` are updated, but the ones in prototype pipelines are not. This PR addresses it by updating the download links of `EMFORMER_RNNT_BASE_MUSTC` and `EMFORMER_RNNT_BASE_TEDLIUM3` in prototype. Corresponding integration tests are added as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/2444

Reviewed By: mthrok

Differential Revision: D37389178

Pulled By: nateanl

fbshipit-source-id: 46598dd71c95be47d1e1b54cef89ea51d280e17a

Add utility function to fetch FFmpeg library versions (#2467) · 4ba7dc38

moto authored Jun 27, 2022

Summary:
Follow-up of https://github.com/pytorch/audio/issues/2464. Add utility function to fetch the versions of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/2467

Reviewed By: carolineechen

Differential Revision: D37028006

Pulled By: mthrok

fbshipit-source-id: 72adce1e6b43985760ce55b715b0e59af5244fdb

Add VoxCeleb1 dataset (#2349) · 21b2d139

Zhaoheng Ni authored Jun 27, 2022

Summary:
This PR adds two dataset classes for the VoxCeleb1 corpus.
- `VoxCeleb1Identification`: each data sample contains the waveform, sample rate, speaker id, and the file id.
- `VoxCeleb1Verification`: each data sample contains a pair of waveforms, sample rate, the label indicating if they are from the same speaker, and the file ids.

Pull Request resolved: https://github.com/pytorch/audio/pull/2349

Reviewed By: carolineechen

Differential Revision: D35927921

Pulled By: nateanl

fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449
