Commits · 0b4b1fd4296fd7d54f0059a23cf9c9e168ad9376 · OpenDAS / Torchaudio

09 Oct, 2022 1 commit

Caroline Chen authored Oct 09, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

0b4b1fd4

07 Oct, 2022 1 commit

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740) · 7729723b

hwangjeff authored Oct 07, 2022

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.
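
The fallback described above can be pictured with a small sketch. It assumes the missing count is obtained by walking the decoded audio, which this log does not confirm. The names `resolve_num_frames` and `chunks` are hypothetical and are not part of the FFmpeg binding.

```python
from typing import Iterable, Sequence


def resolve_num_frames(reported: int, chunks: Iterable[Sequence[float]]) -> int:
    """Return the frame count, counting decoded frames only when needed.

    `reported` stands in for the value found in the stream info (0 when the
    container does not provide it, as with MP3). `chunks` is a hypothetical
    iterable of decoded single-channel chunks.
    """
    if reported > 0:
        # The stream info already carries the count, so trust it.
        return reported
    # Otherwise count the frames of every decoded chunk.
    return sum(len(chunk) for chunk in chunks)


# Three decoded chunks holding 4, 4 and 2 frames.
decoded = [[0.0] * 4, [0.0] * 4, [0.0] * 2]
print(resolve_num_frames(0, decoded))
print(resolve_num_frames(1234, decoded))
```

Counting costs a full decode, so a design like this only pays that price when the metadata is missing.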

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

7729723b

21 Sep, 2022 1 commit

Support in-memory decoding via Tensor wrapper in StreamReader (#2694) · c5a43372

Moto Hira authored Sep 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds Tensor type as input to `StreamReader`.
The Tensor is interpreted as byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

c5a43372

12 Sep, 2022 1 commit

Move hybrid demucs model out of prototype (#2668) · ec0e3a80

Caroline Chen authored Sep 12, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668

Reviewed By: nateanl, mthrok

Differential Revision: D39433671

Pulled By: carolineechen

fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c

ec0e3a80

01 Sep, 2022 1 commit

Add file-like object support to StreamWriter (#2648) · 28da8b84

moto authored Aug 31, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

28da8b84

24 Aug, 2022 1 commit

Add StreamWriter (#2628) · 72404de9

moto authored Aug 24, 2022

Summary:
This commit adds the FFmpeg-based encoder class StreamWriter.
StreamWriter is pretty much the opposite of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

72404de9

11 Aug, 2022 1 commit

Add additive noise function (#2608) · f3bb30b8

hwangjeff authored Aug 11, 2022

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
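
As a rough illustration of "the sum of a waveform and scaled noise", the sketch below works on plain Python lists. Choosing the scale from a target signal-to-noise ratio is an assumption made for this example, and `add_scaled_noise` and `snr_db` are illustrative names, not the signature added by this commit.

```python
import math
from typing import List, Sequence


def add_scaled_noise(
    waveform: Sequence[float], noise: Sequence[float], snr_db: float
) -> List[float]:
    """Return waveform + scale * noise, with scale chosen to hit `snr_db`."""
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0.0:
        raise ValueError("noise must not be all zeros")
    # Solve signal_energy / (scale**2 * noise_energy) == 10 ** (snr_db / 10).
    scale = math.sqrt(signal_energy / (noise_energy * 10.0 ** (snr_db / 10.0)))
    return [x + scale * n for x, n in zip(waveform, noise)]


clean = [1.0, -1.0, 1.0, -1.0]
hiss = [0.5, 0.5, -0.5, -0.5]
noisy = add_scaled_noise(clean, hiss, snr_db=0.0)
print(noisy)
```

At 0 dB the scaled noise carries the same energy as the waveform, which is an easy case to check by hand.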

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

f3bb30b8

09 Aug, 2022 1 commit

Add NNLM support to CTC Decoder (#2528) · 03a0d68e

Caroline Chen authored Aug 09, 2022

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows
- To decode with KenLM, pass in KenLM language model path to `lm` variable
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass the class to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default)

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

03a0d68e

05 Aug, 2022 1 commit

Add convolution operator (#2602) · b396157d

hwangjeff authored Aug 05, 2022

Summary:
Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT.
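
The difference between the two approaches can be shown on plain 1-D lists. This is a dependency-free sketch, not the code added by the commit: a naive O(n²) DFT stands in for a real FFT, and the function names are illustrative.

```python
import cmath
from typing import List, Sequence


def convolve_direct(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Full convolution computed straight from the definition."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def _dft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (a stand-in for an FFT)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        result.append(acc / n if inverse else acc)
    return result


def convolve_fourier(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Full convolution via pointwise multiplication in the frequency domain."""
    n = len(x) + len(y) - 1
    # Zero-pad both inputs to the output length to avoid circular wrap-around.
    fx = _dft(list(x) + [0.0] * (n - len(x)))
    fy = _dft(list(y) + [0.0] * (n - len(y)))
    product = [a * b for a, b in zip(fx, fy)]
    return [value.real for value in _dft(product, inverse=True)]


signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
direct = convolve_direct(signal, kernel)
via_dft = convolve_fourier(signal, kernel)
print(direct)
```

Both routes give the same values up to floating-point error. With a true FFT the frequency-domain route becomes cheaper for long inputs.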

Pull Request resolved: https://github.com/pytorch/audio/pull/2602

Reviewed By: nateanl, mthrok

Differential Revision: D38450771

Pulled By: hwangjeff

fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b

b396157d

03 Aug, 2022 1 commit

An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a

bshall authored Aug 03, 2022

Summary:
I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
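
To make the role of the channel weights and the gating thresholds concrete, here is a minimal sketch of the gated loudness computation from the recommendation. It assumes the input is already K-weighted and already split into overlapping blocks, each reduced to one mean-square value per channel. The function and constant names are illustrative and are not taken from the PR.

```python
import math
from typing import List, Sequence

CHANNEL_WEIGHTS = (1.0, 1.0, 1.0, 1.41, 1.41)  # L, R, C, Ls, Rs
ABSOLUTE_THRESHOLD = -70.0  # LKFS, the absolute gate
RELATIVE_OFFSET = -10.0  # LU below the absolute-gated loudness


def _weighted_power(block: Sequence[float]) -> float:
    """Weighted sum of the per-channel mean squares of one block."""
    if len(block) > len(CHANNEL_WEIGHTS):
        raise ValueError("this sketch supports at most 5 channels")
    return sum(g * z for g, z in zip(CHANNEL_WEIGHTS, block))


def _to_lkfs(power: float) -> float:
    return -0.691 + 10.0 * math.log10(power) if power > 0.0 else float("-inf")


def _mean_power(blocks: List[Sequence[float]]) -> float:
    return sum(_weighted_power(b) for b in blocks) / len(blocks)


def gated_loudness(blocks: List[Sequence[float]]) -> float:
    """Integrated loudness from per-block, per-channel mean squares."""
    # Stage 1: drop blocks at or below the absolute threshold.
    kept = [b for b in blocks if _to_lkfs(_weighted_power(b)) > ABSOLUTE_THRESHOLD]
    if not kept:
        return float("-inf")
    # Stage 2: drop blocks more than 10 LU below the stage-1 loudness.
    relative_threshold = _to_lkfs(_mean_power(kept)) + RELATIVE_OFFSET
    kept = [b for b in kept if _to_lkfs(_weighted_power(b)) > relative_threshold]
    return _to_lkfs(_mean_power(kept))


# Mono material: three loud blocks and one very quiet block.
blocks = [(0.5,), (0.5,), (0.5,), (0.0005,)]
print(round(gated_loudness(blocks), 2))
```

The quiet block passes the absolute gate but falls below the relative one, so only the three loud blocks contribute to the result.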

I hope this is helpful! Looking forward to hearing from you.

Pull Request resolved: https://github.com/pytorch/audio/pull/2472

Reviewed By: hwangjeff

Differential Revision: D38389155

Pulled By: carolineechen

fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904

946b180a

28 Jul, 2022 2 commits

Add Union normalization parameter on spectrogram and inverse spectrogram (#2554) · 0fde7c57

Sean Kim authored Jul 28, 2022

Summary:
Allow the `normalized` parameter to accept a str, enabling `frame_length`-based normalization that aligns with the torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104
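
A sketch of how a bool-or-str option like this can map to a scale factor is shown below. It assumes that `True` and `"window"` divide by the window's L2 norm and that `"frame_length"` divides by the square root of the frame length, mirroring the scaling `torch.stft` applies when normalized. The helper `spectrogram_scale` is hypothetical.

```python
import math
from typing import Sequence, Union


def spectrogram_scale(
    normalized: Union[bool, str], window: Sequence[float], frame_length: int
) -> float:
    """Return the factor the STFT output would be multiplied by."""
    if normalized is False:
        return 1.0
    if normalized is True or normalized == "window":
        # Assumed behaviour: normalize by the L2 norm of the window.
        return 1.0 / math.sqrt(sum(w * w for w in window))
    if normalized == "frame_length":
        # Assumed behaviour: scale by frame_length ** -0.5.
        return 1.0 / math.sqrt(frame_length)
    raise ValueError(f"unsupported normalization: {normalized!r}")


flat_window = [0.5, 0.5, 0.5, 0.5]
print(spectrogram_scale("window", flat_window, frame_length=4))
print(spectrogram_scale("frame_length", flat_window, frame_length=4))
```

Keeping `True` as an alias of one string option lets existing boolean callers work unchanged.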

Pull Request resolved: https://github.com/pytorch/audio/pull/2554

Reviewed By: carolineechen, mthrok

Differential Revision: D38247554

Pulled By: skim0514

fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602

0fde7c57

Change docstring for easier understanding (#2570) · 338e3104

Sean Kim authored Jul 28, 2022

Summary:
Edit factory function's docstrings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2570

Reviewed By: carolineechen

Differential Revision: D38250369

Pulled By: skim0514

fbshipit-source-id: fa777e37d7cc517cf4ff1842d5585bf36558f50a

338e3104

19 Jul, 2022 1 commit

Adding pipeline changes, factory functions to HDemucs (#2547) · 62854588

Sean Kim authored Jul 19, 2022

Summary:
Adds factory functions to the HDemucs class and tests them in the test files.

Pull Request resolved: https://github.com/pytorch/audio/pull/2547

Reviewed By: carolineechen

Differential Revision: D37948600

Pulled By: skim0514

fbshipit-source-id: 7ac4e4a71519450cfbbc24ff7d7e70521f676040

62854588

12 Jul, 2022 1 commit

Hybrid Demucs model implementation (#2506) · 608b8ea6

Sean Kim authored Jul 12, 2022

Summary:
Draft PR with initial model implementation with minor changes from previous implementation

Pull Request resolved: https://github.com/pytorch/audio/pull/2506

Reviewed By: nateanl

Differential Revision: D37762671

Pulled By: skim0514

fbshipit-source-id: b7dc0a6ef725d6ae6d76c23c882623f7d339977c

608b8ea6

07 Jul, 2022 1 commit

Add YUV444P support to StreamReader (#2516) · b2a90f91

moto authored Jul 06, 2022

Summary:
This commit adds support for the `"yuv444p"` type as an output format of StreamReader.

Pull Request resolved: https://github.com/pytorch/audio/pull/2516

Reviewed By: hwangjeff

Differential Revision: D37659715

Pulled By: mthrok

fbshipit-source-id: eae9b5590d8f138a6ebf3808c08adfe068f11a2b

b2a90f91

06 Jul, 2022 1 commit

Fix fluent test for windows (#2510) · 09daa438

Caroline Chen authored Jul 05, 2022

Summary:
The fluent dataset test currently fails on Windows, due to newline generation by the CSV writer in the test and incorrect path parsing in the dataset implementation.
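
The newline problem is a common one with Python's `csv` module. The sketch below shows the usual remedy, opening the file with `newline=""`, and builds paths with `os.path.join` rather than hard-coded separators. The log does not show the actual change made in this commit, and the file names are made up for the example.

```python
import csv
import os
import tempfile
from typing import List


def write_rows(path: str, rows: List[List[str]]) -> None:
    # newline="" lets the csv module control line endings. Without it,
    # Windows text mode turns the writer's "\r\n" into "\r\r\n", which
    # shows up as blank rows when the file is read back.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


with tempfile.TemporaryDirectory() as root:
    # os.path.join picks the right separator for the current platform.
    csv_path = os.path.join(root, "data", "train_data.csv")
    os.makedirs(os.path.dirname(csv_path))
    write_rows(csv_path, [["path", "speaker"], ["wavs/a.wav", "spk1"]])
    with open(csv_path, "rb") as handle:
        raw = handle.read()
    with open(csv_path, newline="") as handle:
        parsed = list(csv.reader(handle))

print(parsed)
```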

Pull Request resolved: https://github.com/pytorch/audio/pull/2510

Reviewed By: carolineechen

Differential Revision: D37573203

Pulled By: mthrok

fbshipit-source-id: 4868bc649690c7e596b002686c6128ce735d3564

09daa438

28 Jun, 2022 1 commit

Refactor AVDictionary clean up (#2507) · 0ad03adf

moto authored Jun 27, 2022

Summary:
Small clean up in ffmpeg binding code.

1. Make `get_option_dict` and `clean_up_dict` public utility
2. Merge the exception into `clean_up_dict`
3. Get rid of custom string join function and use `c10::Join`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2507

Reviewed By: hwangjeff

Differential Revision: D37466022

Pulled By: mthrok

fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969

0ad03adf

27 Jun, 2022 3 commits

Add missing __init__ in io test directory (#2511) · d50ed521

moto authored Jun 27, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2511

Reviewed By: nateanl

Differential Revision: D37461021

Pulled By: mthrok

fbshipit-source-id: 6f894c02bbefc5afda0f9584d26ad785f7c71ee4

d50ed521

Add utility function to fetch FFmpeg library versions (#2467) · 4ba7dc38

moto authored Jun 27, 2022

Summary:
Follow-up of https://github.com/pytorch/audio/issues/2464. Add utility function to fetch the versions of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/2467

Reviewed By: carolineechen

Differential Revision: D37028006

Pulled By: mthrok

fbshipit-source-id: 72adce1e6b43985760ce55b715b0e59af5244fdb

4ba7dc38

Add VoxCeleb1 dataset (#2349) · 21b2d139

Zhaoheng Ni authored Jun 27, 2022

Summary:
This PR adds two dataset classes of VoxCeleb1 corpus.
- `VoxCeleb1Identification`
Each data sample contains the waveform, sample rate, speaker id, and the file id.
- `VoxCeleb1Verification`
Each data sample contains a pair of waveforms, sample rate, the label indicating if they are from the same speaker, and the file ids.

Pull Request resolved: https://github.com/pytorch/audio/pull/2349

Reviewed By: carolineechen

Differential Revision: D35927921

Pulled By: nateanl

fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449

21b2d139

23 Jun, 2022 1 commit

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · fee994ce

CodemodService FBSourceBlackLinterBot authored Jun 23, 2022

Summary:
Produced by `tools/arcanist/lint/codemods/black-fbsource`.

Reviewed By: adamjernst

Differential Revision: D37375235

fbshipit-source-id: 3d7eb39e5c0539a78d1412f37562dec90b0fc759

fee994ce

21 Jun, 2022 1 commit

Create musdb handler and tests (#2484) · b92a8a09

Sean Kim authored Jun 21, 2022

Summary:
Create dataset handler and tests for new dataset. Manually tested and unit tested to test validity. Pre-commit ran for style checks.

Pull Request resolved: https://github.com/pytorch/audio/pull/2484

Reviewed By: carolineechen, nateanl

Differential Revision: D37250556

Pulled By: skim0514

fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51

b92a8a09

20 Jun, 2022 1 commit

Add fluent speech commands (#2480) · 66a67d2e

Caroline Chen authored Jun 20, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2480

Reviewed By: nateanl

Differential Revision: D37249571

Pulled By: carolineechen

fbshipit-source-id: caefeec4253c91f2579655a0c1735edaeed51be9

66a67d2e

13 Jun, 2022 1 commit

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 71ed457e

CodemodService FBSourceBlackLinterBot authored Jun 13, 2022

Reviewed By: ivanmurashko

Differential Revision: D37103342

fbshipit-source-id: adc908c790a413384bd88a75d3c2b4b0974c6674

71ed457e

10 Jun, 2022 1 commit

Modifying Pitchshift for faster resampling (#2441) · df2262b5

Sean Kim authored Jun 10, 2022

Summary:
Split existing Pitchshift into multiple helper functions in order to cache kernel and speed up overall process addressing https://github.com/pytorch/audio/issues/2359.
Existing unit tests pass.

edit: functional and transforms unit test pass. Adopted lazy initialization to avoid BC-breaking.
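
The caching idea can be sketched as follows: the kernel is built on first use and reused afterwards, so the constructor signature stays the same. The class is a toy stand-in with hypothetical names, and its windowless sinc kernel is only a placeholder for whatever kernel the real transform computes.

```python
import math
from typing import List, Optional


class LazyKernelResampler:
    """Toy stand-in for a resampling transform that caches its kernel."""

    def __init__(self, orig_freq: int, new_freq: int, half_width: int = 3) -> None:
        self.orig_freq = orig_freq
        self.new_freq = new_freq
        self.half_width = half_width
        self._kernel: Optional[List[float]] = None  # built on first use
        self.build_count = 0  # present only to make the caching observable

    def _build_kernel(self) -> List[float]:
        self.build_count += 1
        low, high = sorted((self.orig_freq, self.new_freq))
        cutoff = low / high
        taps = []
        for t in range(-self.half_width, self.half_width + 1):
            x = cutoff * t
            # Sampled sinc low-pass; the centre tap is the limit value.
            taps.append(cutoff if x == 0 else cutoff * math.sin(math.pi * x) / (math.pi * x))
        return taps

    @property
    def kernel(self) -> List[float]:
        if self._kernel is None:
            self._kernel = self._build_kernel()
        return self._kernel


resampler = LazyKernelResampler(orig_freq=16000, new_freq=8000)
first = resampler.kernel
second = resampler.kernel
print(resampler.build_count)
```

Repeated calls then skip the expensive kernel construction.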

Pull Request resolved: https://github.com/pytorch/audio/pull/2441

Reviewed By: carolineechen

Differential Revision: D36905582

Pulled By: skim0514

fbshipit-source-id: 6780db3ac8a29d59017a6abe7e82ce1fd17aaac2

df2262b5

08 Jun, 2022 2 commits

Fix metadata fetch (#2464) · 4d2fa190

moto authored Jun 08, 2022

Summary:
In https://github.com/pytorch/audio/issues/2461, `metadata` field was added to StreamInfo.
However, the value attached to this new field was source-level metadata,
while each stream can have different metadata.

* source level metadata
[AVFormatContext->metadata](https://ffmpeg.org/doxygen/4.1/structAVFormatContext.html#a3019a56080ed2e3297ff25bc2ff88adf)
* stream level metadata
[AVFormatContext->streams[]->metadata](https://ffmpeg.org/doxygen/4.1/structAVStream.html#a50d250a128a3da9ce3d135e84213fb82)

This commit moves source-level metadata to a dedicated method, `get_metadata`, and
fixes the stream-level field to report stream metadata.

Pull Request resolved: https://github.com/pytorch/audio/pull/2464

Reviewed By: hwangjeff, xiaohui-zhang

Differential Revision: D36995452

Pulled By: mthrok

fbshipit-source-id: 534be1f7feb07790a0ce8624c336cdb7b65a8697

4d2fa190

Add metadata to source stream info (#2461) · 10d1bd89

moto authored Jun 07, 2022

Summary:
Add metadata, such as ID3 (https://github.com/pytorch/audio/commit/7d98db0567cb60fabcc173949b8c08e3a3487ac2) tag, to `StreamReaderSourceAudioStream`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2461

Reviewed By: hwangjeff

Differential Revision: D36985656

Pulled By: mthrok

fbshipit-source-id: e66f9e6e980eb57c378cc643a8979b6b7813dae7

10d1bd89

04 Jun, 2022 1 commit

Make FFmpeg log level configurable (#2439) · 877a88c5

moto authored Jun 03, 2022

Summary:
Undesired logs are one of the loudest UX complaints we get.
Yet, loading media files involves uncertainty which is
difficult to debug without debug logs.

This commit introduces utility functions to configure logging level
so that we can ask users to enable it when they encounter an issue,
while defaulting to non-verbose option.

Pull Request resolved: https://github.com/pytorch/audio/pull/2439

Reviewed By: hwangjeff, xiaohui-zhang

Differential Revision: D36903763

Pulled By: mthrok

fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987

877a88c5

03 Jun, 2022 1 commit

Remove possible manual seeds from test files. (#2436) · f0bc00c9

Sean Kim authored Jun 03, 2022

Summary:
Removed manual seeds from test files where applicable. Refactoring https://github.com/pytorch/audio/issues/2267

Pull Request resolved: https://github.com/pytorch/audio/pull/2436

Reviewed By: carolineechen

Differential Revision: D36896854

Pulled By: skim0514

fbshipit-source-id: 7b4dd8a8dbfbef271f5cc56564dc83a760407e6c

f0bc00c9

02 Jun, 2022 3 commits

Update QUESST14 getitem (#2435) · ceee6912

Caroline Chen authored Jun 02, 2022

Summary:
Update QUESST14 getitem to include docstrings and additionally return the sample rate.

Pull Request resolved: https://github.com/pytorch/audio/pull/2435

Reviewed By: nateanl

Differential Revision: D36864254

Pulled By: carolineechen

fbshipit-source-id: 9e68bbc5de27ad2f32f6b298414103c4f6784801

ceee6912

Remove mad (#2428) · d2ecba98

moto authored Jun 02, 2022

Summary:
Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354

In https://github.com/pytorch/audio/issues/2419, we switched mp3 decoding to ffmpeg. But CI tests were still using libmad.
This commit completely removes libmad from torchaudio.

This is a BC-breaking change, as the `apply_sox_effects_file` function can no longer handle MP3 and cannot fall back to ffmpeg.
The workaround for this is to use `torchaudio.load` and then `apply_sox_effects_tensor`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2428

Reviewed By: carolineechen

Differential Revision: D36851805

Pulled By: mthrok

fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55

d2ecba98

Use FFmpeg-based I/O as fallback in sox_io backend (#2419) · 19c60a08

moto authored Jun 01, 2022

Summary:
This commit adds a fallback mechanism to the `info` and `load` functions of the sox_io backend.
If torchaudio is compiled to use FFmpeg, and runtime dependencies are properly loaded,
then when `info` or `load` fails, it falls back to the FFmpeg-based implementation.
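
The fallback pattern itself is simple to sketch. In the example below `sox_load` and `ffmpeg_load` are hypothetical stand-ins that simulate a failing and a succeeding backend. The real functions live in the sox_io backend and the FFmpeg binding.

```python
from typing import Callable, List, Sequence, Tuple

Loader = Callable[[str], Tuple[List[float], int]]


def load_with_fallback(path: str, loaders: Sequence[Loader]) -> Tuple[List[float], int]:
    """Try each loader in order and return the first successful result."""
    errors = []
    for loader in loaders:
        try:
            return loader(path)
        except RuntimeError as err:
            # Remember why this backend failed, then try the next one.
            errors.append(f"{loader.__name__}: {err}")
    raise RuntimeError(f"Failed to load {path}: " + "; ".join(errors))


def sox_load(path: str) -> Tuple[List[float], int]:
    # Hypothetical primary backend that cannot handle this format.
    raise RuntimeError("unsupported format")


def ffmpeg_load(path: str) -> Tuple[List[float], int]:
    # Hypothetical secondary backend returning (waveform, sample_rate).
    return [0.0, 0.0], 44100


waveform, sample_rate = load_with_fallback("clip.mp3", [sox_load, ffmpeg_load])
print(sample_rate)
```

Collecting the per-backend errors keeps the final message useful when every backend fails.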

BC-breaking changes:
 - FFmpeg does not report the number of frames for MP3, because MP3 does not store that information. It can be estimated from the audio duration and sample rate, but the estimate might be inaccurate, so we keep it 0.

Depends on
- https://github.com/pytorch/audio/issues/2416
- https://github.com/pytorch/audio/issues/2417
- https://github.com/pytorch/audio/issues/2418
- https://github.com/pytorch/audio/issues/2423
- https://github.com/pytorch/audio/issues/2427

Pull Request resolved: https://github.com/pytorch/audio/pull/2419

Reviewed By: carolineechen

Differential Revision: D36740306

Pulled By: mthrok

fbshipit-source-id: 9e2ad095b8b39e41404970de0d8d9b5aaa856c97

19c60a08

01 Jun, 2022 3 commits

Tweak StreamReader error messages and tests (#2429) · 5d86054a

moto authored Jun 01, 2022

Summary:
* Update error messages
* Update audio stream tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2429

Reviewed By: carolineechen, nateanl

Differential Revision: D36812769

Pulled By: mthrok

fbshipit-source-id: 7a51d0c4dbae558010d2e59412333e4a7f00d318

5d86054a

Move Seed to Setup (#2425) · ac82bdc4

Sean Kim authored Jun 01, 2022

Summary:
Brings in the move-seed commit from the previously opened https://github.com/pytorch/audio/issues/2267. Organizes seed to utils.

Pull Request resolved: https://github.com/pytorch/audio/pull/2425

Reviewed By: carolineechen, nateanl

Differential Revision: D36787599

Pulled By: skim0514

fbshipit-source-id: 37a0d632d13d4336a830c4b98bdb04828ed88c20

ac82bdc4

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

31 May, 2022 1 commit

Fail on Python if sox_io info/load does not succeed (#2423) · b56f60bf

moto authored May 31, 2022

Summary:
Extracted from https://github.com/pytorch/audio/issues/2419. Move the failure of sox_io from C++ to Python layer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2423

Reviewed By: carolineechen

Differential Revision: D36766152

Pulled By: mthrok

fbshipit-source-id: 53f897a608e97b81ebe5df29577374d88ce178f3

b56f60bf

29 May, 2022 1 commit

Update source info (#2418) · bb77cbeb

moto authored May 28, 2022

Summary:
Add num_frames and bits_per_sample to match with the current
`torchaudio.info` capability.

Pull Request resolved: https://github.com/pytorch/audio/pull/2418

Reviewed By: carolineechen

Differential Revision: D36749077

Pulled By: mthrok

fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713

bb77cbeb

23 May, 2022 2 commits

Add assertion checks to multi-channel functions (#2401) · 38e530d7

Zhaoheng Ni authored May 23, 2022

Summary:
- The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
- The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
- The shapes of the input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
- The autograd unittest of `apply_beamforming` has wrong dimensions for beamform_weights, detected by the assertion check. Fix it in this PR.
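
The kind of shape check described in the list above can be sketched on plain shape tuples. The helper names are hypothetical, and the spectrogram layout `(..., channel, freq, time)` is an assumption made for the example. Only the PSD and mask layouts come from the text.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def check_psd_shape(psd: Shape) -> None:
    """A PSD matrix must look like (..., freq, channel, channel)."""
    if len(psd) < 3:
        raise ValueError(f"expected at least 3 dimensions, got {len(psd)}")
    if psd[-1] != psd[-2]:
        raise ValueError(f"last two dimensions must match, got {psd}")


def check_mask_shape(specgram: Shape, mask: Shape) -> None:
    """A mask of shape (..., freq, time) must match the spectrogram."""
    if len(specgram) < 3 or len(mask) < 2:
        raise ValueError("specgram needs >= 3 dims and mask needs >= 2 dims")
    # Assumed layout: specgram is (..., channel, freq, time).
    if specgram[-2:] != mask[-2:]:
        raise ValueError(f"mask {mask} does not match specgram {specgram}")


check_psd_shape((2, 257, 4, 4))
check_mask_shape((2, 4, 257, 100), (2, 257, 100))
print("shapes ok")
```

Failing early with the offending shape in the message is what exposed the wrong `beamform_weights` dimensions in the unit test.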

Pull Request resolved: https://github.com/pytorch/audio/pull/2401

Reviewed By: carolineechen

Differential Revision: D36597689

Pulled By: nateanl

fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6

38e530d7

Add LibriLightLimited dataset (#2302) · af9cab3b

Zhaoheng Ni authored May 23, 2022

Summary:
The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the unsupervised subset and the supervised one, it's clearer to put it in a separate dataset class for fine-tuning purpose.
It contains "10 min", "1 hour", "10 hour" splits.

Pull Request resolved: https://github.com/pytorch/audio/pull/2302

Reviewed By: mthrok

Differential Revision: D36388188

Pulled By: nateanl

fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249

af9cab3b

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments is changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
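
To illustrate the protocol described in this list, here are two minimal file-like objects, one exposing only `read(self, n)` and one that also offers `seek(self, offset, whence)`. The class names and the `read_all` helper are made up for the example. How the Streaming API consumes such objects internally is not shown here.

```python
import io


class ReadOnlySource:
    """Exposes only read(n), like a non-seekable stream."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Adds seek(offset, whence) so a consumer can jump around."""

    def seek(self, offset: int, whence: int = 0) -> int:
        return self._buffer.seek(offset, whence)


def read_all(src, chunk_size: int = 4) -> bytes:
    """Drain a source in fixed-size chunks until read() returns b''."""
    chunks = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


payload = b"RIFF....WAVEfmt "
plain = ReadOnlySource(payload)
seekable = SeekableSource(payload)

header = seekable.read(4)  # peek at the first bytes
seekable.seek(0, 0)        # rewind, which the plain source cannot do
print(header, hasattr(plain, "seek"))
```

A consumer that needs to revisit earlier bytes can do so only with the seekable variant, which is why some formats need the `decoder` hint when `seek` is missing.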

## Code structure

The approach is very similar to how file-like object is supported in sox-based I/O.
In the Streaming API, if the input `src` is a string, it is passed to the implementation bound with TorchBind;
if `src` has a `read` attribute, it is passed to the same implementation bound via PyBind11.
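
The dispatch rule in this paragraph amounts to a type check followed by an attribute check. The sketch below returns a label instead of calling a real binding, and `select_binding` is a hypothetical name.

```python
import io


def select_binding(src) -> str:
    """Pick the binding path based on the kind of source."""
    if isinstance(src, str):
        # Paths and URLs go to the TorchBind-bound implementation.
        return "torchbind"
    if hasattr(src, "read"):
        # File-like objects go to the PyBind11-bound implementation.
        return "pybind11"
    raise TypeError(f"unsupported source type: {type(src).__name__}")


print(select_binding("audio.wav"))
print(select_binding(io.BytesIO(b"\x00\x01")))
```

Checking for a `read` attribute instead of a concrete class keeps the API open to any object that follows the file protocol.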

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to the equivalents of `std` at the binding layer. But since they work fine with PyBind11, Streamer core methods now deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d