Commits · 6ecc11c2eb5c6fc60bddf43d59d06842fb2ab015 · OpenDAS / Torchaudio

03 Aug, 2022 2 commits

Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2

Sean Kim authored Aug 03, 2022

Summary:
Add new model pretrained weights and tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2601

Reviewed By: carolineechen, nateanl

Differential Revision: D38396673

Pulled By: skim0514

fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1

6ecc11c2

An implemenation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a

bshall authored Aug 03, 2022

Summary:
I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
- I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
- I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
- I've kept many of the constant internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
- I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?

I hope this is helpful! looking forward to hearing from you.

Pull Request resolved: https://github.com/pytorch/audio/pull/2472

Reviewed By: hwangjeff

Differential Revision: D38389155

Pulled By: carolineechen

fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904

946b180a

28 Jul, 2022 2 commits

Add Union normalization parameter on spectrogram and inverse spectrogram (#2554) · 0fde7c57

Sean Kim authored Jul 28, 2022

Summary:
Add str to normalized parameter to enable frame_length based normalization to align with torch implementation of stft. Addresses issue https://github.com/pytorch/audio/issues/2104

Pull Request resolved: https://github.com/pytorch/audio/pull/2554

Reviewed By: carolineechen, mthrok

Differential Revision: D38247554

Pulled By: skim0514

fbshipit-source-id: c243c7a6b8fda2a1e565cef4600f7c5a06baf602

0fde7c57

Change docstring for easier understanding (#2570) · 338e3104

Sean Kim authored Jul 28, 2022

Summary:
Edit factory function's docstrings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2570

Reviewed By: carolineechen

Differential Revision: D38250369

Pulled By: skim0514

fbshipit-source-id: fa777e37d7cc517cf4ff1842d5585bf36558f50a

338e3104

26 Jul, 2022 1 commit

New Pipeline edits for HDemucs (#2565) · 4c4da32c

Sean Kim authored Jul 25, 2022

Summary:
Created new branch and brought in commits due to rebasing issues, resolved conflicts on new branch, close old branch.

Pull Request resolved: https://github.com/pytorch/audio/pull/2565

Reviewed By: nateanl, mthrok

Differential Revision: D38131189

Pulled By: skim0514

fbshipit-source-id: 96531480cf50562944abb28d70879f21b4609f15

4c4da32c

25 Jul, 2022 1 commit

Integration test fix deleting temporary directory (#2569) · 8dcf06ac

Sean Kim authored Jul 25, 2022

Summary:
Previous Issue: --use-tmp-hub-dir expected the temp directories used to store large file to be deleted after each test case, but pytest erases directories after 3 full test sessions. This commit fixes by manually deleting a new subdirectory created in each test case. https://github.com/pytorch/audio/pull/2565#discussion_r929007101

Pull Request resolved: https://github.com/pytorch/audio/pull/2569

Reviewed By: nateanl

Differential Revision: D38117848

Pulled By: skim0514

fbshipit-source-id: 3767cb8df1238fd6218f6aaa58d5d583cea72699

8dcf06ac

22 Jul, 2022 1 commit

Add documents for SourceSeparationBundle (#2559) · 6cee56ab

Zhaoheng Ni authored Jul 22, 2022

Summary:
- Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
- Add citation of Libri2Mix dataset in the bundle documentation.
- url in integration test should use slash instead of `os.path.join` as it will fail on Windows. Change it to f-string.

Pull Request resolved: https://github.com/pytorch/audio/pull/2559

Reviewed By: carolineechen

Differential Revision: D38036116

Pulled By: nateanl

fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836

6cee56ab

21 Jul, 2022 1 commit

Add SourceSeparationBundle to prototype (#2440) · 83362580

Zhaoheng Ni authored Jul 20, 2022

Summary:
- Add SourceSeparationBundle class for source separation pipeline
- Add `CONVTASNET_BASE_LIBRI2MIX` that is trained on Libri2Mix dataset.
- Add integration test with example mixture audio and expected scale-invariant signal-to-distortion ratio (Si-SDR) score. The test computes the Si-SDR score with permutation-invariant training (PIT) criterion for all permutations of sources and use the highest value as the final output. The test verifies if the score is equal to or larger than the expected value.

Pull Request resolved: https://github.com/pytorch/audio/pull/2440

Reviewed By: mthrok

Differential Revision: D37997646

Pulled By: nateanl

fbshipit-source-id: c951bcbbe8b7ed9553cb8793d6dc1ef90d5a29fe

83362580

19 Jul, 2022 1 commit

Adding pipeline changes, factory functions to HDemucs (#2547) · 62854588

Sean Kim authored Jul 19, 2022

Summary:
Factory functions have been added to HDemucs class and test the implementation within the testing files.

Pull Request resolved: https://github.com/pytorch/audio/pull/2547

Reviewed By: carolineechen

Differential Revision: D37948600

Pulled By: skim0514

fbshipit-source-id: 7ac4e4a71519450cfbbc24ff7d7e70521f676040

62854588

12 Jul, 2022 1 commit

Hybrid Demucs model implementation (#2506) · 608b8ea6

Sean Kim authored Jul 12, 2022

Summary:
Draft PR with initial model implementation with minor changes from previous implementation

Pull Request resolved: https://github.com/pytorch/audio/pull/2506

Reviewed By: nateanl

Differential Revision: D37762671

Pulled By: skim0514

fbshipit-source-id: b7dc0a6ef725d6ae6d76c23c882623f7d339977c

608b8ea6

07 Jul, 2022 1 commit

Add YUV444P support to StreamReader (#2516) · b2a90f91

moto authored Jul 06, 2022

Summary:
This commit add support for `"yuv444p"` type as output format of StreamReader.

Pull Request resolved: https://github.com/pytorch/audio/pull/2516

Reviewed By: hwangjeff

Differential Revision: D37659715

Pulled By: mthrok

fbshipit-source-id: eae9b5590d8f138a6ebf3808c08adfe068f11a2b

b2a90f91

06 Jul, 2022 1 commit

Fix fluent test for windows (#2510) · 09daa438

Caroline Chen authored Jul 05, 2022

Summary:
fluent dataset test currently fails on windows, due to new line generation in csv writer in testing and incorrect path parsing in dataset impl.

Pull Request resolved: https://github.com/pytorch/audio/pull/2510

Reviewed By: carolineechen

Differential Revision: D37573203

Pulled By: mthrok

fbshipit-source-id: 4868bc649690c7e596b002686c6128ce735d3564

09daa438

28 Jun, 2022 1 commit

Refactor AVDictionary clean up (#2507) · 0ad03adf

moto authored Jun 27, 2022

Summary:
Small clean up in ffmpeg binding code.

1. Make `get_option_dict` and `clean_up_dict` public utility
2. Merge the exception into `clean_up_dict`
3. Get rid of custom string join function and use `c10::Join`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2507

Reviewed By: hwangjeff

Differential Revision: D37466022

Pulled By: mthrok

fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969

0ad03adf

27 Jun, 2022 4 commits

Add missing __init__ in io test directory (#2511) · d50ed521

moto authored Jun 27, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2511

Reviewed By: nateanl

Differential Revision: D37461021

Pulled By: mthrok

fbshipit-source-id: 6f894c02bbefc5afda0f9584d26ad785f7c71ee4

d50ed521

Fix download links of RNNT pipelines in prototype (#2444) · 9b4ee17c

Zhaoheng Ni authored Jun 27, 2022

Summary:
In https://github.com/pytorch/audio/issues/2283, torchaudio's downloading function is updated to reduce code duplication. The links in `EMFORMER_RNNT_BASE_LIBRISPEECH` are updated, but the ones in prototype pipelines are not. This PR addresses it by updating the download links of `EMFORMER_RNNT_BASE_MUSTC` and `EMFORMER_RNNT_BASE_TEDLIUM3` in prototype. Corresponding integration tests are added as well.

Pull Request resolved: https://github.com/pytorch/audio/pull/2444

Reviewed By: mthrok

Differential Revision: D37389178

Pulled By: nateanl

fbshipit-source-id: 46598dd71c95be47d1e1b54cef89ea51d280e17a

9b4ee17c

Add utility function to fetch FFmpeg library versions (#2467) · 4ba7dc38

moto authored Jun 27, 2022

Summary:
Follow-up of https://github.com/pytorch/audio/issues/2464. Add utility function to fetch the versions of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/2467

Reviewed By: carolineechen

Differential Revision: D37028006

Pulled By: mthrok

fbshipit-source-id: 72adce1e6b43985760ce55b715b0e59af5244fdb

4ba7dc38

Add VoxCeleb1 dataset (#2349) · 21b2d139

Zhaoheng Ni authored Jun 27, 2022

Summary:
This PR adds two dataset classes of VoxCeleb1 corpus.
- `VoxCeleb1Identification`
Each data sample contains the waveform, sample rate, speaker id, and the file id.
- `VoxCeleb1Verification`
Each data sample contains a pair of waveforms, sample rate, the label indicating if they are from the same speaker, and the file ids.

Pull Request resolved: https://github.com/pytorch/audio/pull/2349

Reviewed By: carolineechen

Differential Revision: D35927921

Pulled By: nateanl

fbshipit-source-id: 3e07ddd329178777698841565053eb59befe6449

21b2d139

23 Jun, 2022 1 commit

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · fee994ce

CodemodService FBSourceBlackLinterBot authored Jun 23, 2022

Summary:
Meta:
**If you take no action, this diff will be automatically accepted on 2022-06-23.**
(To remove yourself from auto-accept diffs and just let them all land, add yourself to [this Butterfly rule](https://www.internalfb.com/butterfly/rule/904302247110220))

Produced by `tools/arcanist/lint/codemods/black-fbsource`.

#nocancel

Rules run:
- CodemodTransformerSimpleShell

Config Oncall: [lint](https://our.intern.facebook.com/intern/oncall3/?shortname=lint)
CodemodConfig: [CodemodConfigFBSourceBlackLinter](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/fbsource_arc_f/CodemodConfigFBSourceBlackLinter.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/13510799586951394/
This diff was automatically created with CodemodService.
To learn more about CodemodService, check out the [CodemodService wiki](https://fburl.com/CodemodService).

_____

## Questions / Comments / Feedback?

**[Click here to give feedback about this diff](https://www.internalfb.com/codemod_service/feedback?sandcastle_job_id=13510799586951394).**

* Returning back to author or abandoning this diff will only cause the diff to be regenerated in the future.
* Do **NOT** post in the CodemodService Feedback group about this specific diff.

drop-conflicts

Reviewed By: adamjernst

Differential Revision: D37375235

fbshipit-source-id: 3d7eb39e5c0539a78d1412f37562dec90b0fc759

fee994ce

21 Jun, 2022 1 commit

Create musdb handler and tests (#2484) · b92a8a09

Sean Kim authored Jun 21, 2022

Summary:
Create dataset handler and tests for new dataset. Manually tested and unit tested to test validity. Pre-commit ran for style checks.

Pull Request resolved: https://github.com/pytorch/audio/pull/2484

Reviewed By: carolineechen, nateanl

Differential Revision: D37250556

Pulled By: skim0514

fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51

b92a8a09

20 Jun, 2022 1 commit

Add fluent speech commands (#2480) · 66a67d2e

Caroline Chen authored Jun 20, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2480

Reviewed By: nateanl

Differential Revision: D37249571

Pulled By: carolineechen

fbshipit-source-id: caefeec4253c91f2579655a0c1735edaeed51be9

66a67d2e

13 Jun, 2022 1 commit
- [AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · 71ed457e
  CodemodService FBSourceBlackLinterBot authored Jun 13, 2022
```
Reviewed By: ivanmurashko

Differential Revision: D37103342

fbshipit-source-id: adc908c790a413384bd88a75d3c2b4b0974c6674
```
  71ed457e
10 Jun, 2022 1 commit

Modifying Pitchshift for faster resampling (#2441) · df2262b5

Sean Kim authored Jun 10, 2022

Summary:
Split existing Pitchshift into multiple helper functions in order to cache kernel and speed up overall process addressing https://github.com/pytorch/audio/issues/2359.
Existing unit tests pass.

edit: functional and transforms unit test pass. Adopted lazy initialization to avoid BC-breaking.

Pull Request resolved: https://github.com/pytorch/audio/pull/2441

Reviewed By: carolineechen

Differential Revision: D36905582

Pulled By: skim0514

fbshipit-source-id: 6780db3ac8a29d59017a6abe7e82ce1fd17aaac2

df2262b5

08 Jun, 2022 2 commits

Fix metadata fetch (#2464) · 4d2fa190

moto authored Jun 08, 2022

Summary:
In https://github.com/pytorch/audio/issues/2461, `metadata` field was added to StreamInfo.
However, the value attached to this new field was source-level metadata,
while each stream can have different metadata.

* source level metadata
[AVFormatContext->metadata](https://ffmpeg.org/doxygen/4.1/structAVFormatContext.html#a3019a56080ed2e3297ff25bc2ff88adf)
* stream level metadata
[AVFormatContext->streams[]->metadata](https://ffmpeg.org/doxygen/4.1/structAVStream.html#a50d250a128a3da9ce3d135e84213fb82)

This commit moves source level metadata to dedicated method, `get_metadata`, and
fix the stream-level metadata to report stream metadata.

Pull Request resolved: https://github.com/pytorch/audio/pull/2464

Reviewed By: hwangjeff, xiaohui-zhang

Differential Revision: D36995452

Pulled By: mthrok

fbshipit-source-id: 534be1f7feb07790a0ce8624c336cdb7b65a8697

4d2fa190

Add metadata to source stream info (#2461) · 10d1bd89

moto authored Jun 07, 2022

Summary:
Add metadata, such as ID3 (https://github.com/pytorch/audio/commit/7d98db0567cb60fabcc173949b8c08e3a3487ac2)tag to `StreamReaderSourceAudioStream`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2461

Reviewed By: hwangjeff

Differential Revision: D36985656

Pulled By: mthrok

fbshipit-source-id: e66f9e6e980eb57c378cc643a8979b6b7813dae7

10d1bd89

07 Jun, 2022 1 commit

Update smoke test (#2455) · d2d8b670

moto authored Jun 06, 2022

Summary:
Import StreamReader from the new location

Pull Request resolved: https://github.com/pytorch/audio/pull/2455

Reviewed By: nateanl

Differential Revision: D36959668

Pulled By: mthrok

fbshipit-source-id: c2b8c9f9dff1ec306ea39c495294faa9208b3c4e

d2d8b670

04 Jun, 2022 1 commit

Make FFmpeg log level configurable (#2439) · 877a88c5

moto authored Jun 03, 2022

Summary:
Undesired logs are one of the loudest UX complains we get.
Yet, loading media files involves uncertainty which is
difficult to debug without debug log.

This commit introduces utility functions to configure logging level
so that we can ask users to enable it when they encounter an issue,
while defaulting to non-verbose option.

Pull Request resolved: https://github.com/pytorch/audio/pull/2439

Reviewed By: hwangjeff, xiaohui-zhang

Differential Revision: D36903763

Pulled By: mthrok

fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987

877a88c5

03 Jun, 2022 1 commit

Remove possible manual seeds from test files. (#2436) · f0bc00c9

Sean Kim authored Jun 03, 2022

Summary:
For test files where applicable, removed manual seeds where applicable. Refactoring https://github.com/pytorch/audio/issues/2267

Pull Request resolved: https://github.com/pytorch/audio/pull/2436

Reviewed By: carolineechen

Differential Revision: D36896854

Pulled By: skim0514

fbshipit-source-id: 7b4dd8a8dbfbef271f5cc56564dc83a760407e6c

f0bc00c9

02 Jun, 2022 3 commits

Update QUESST14 getitem (#2435) · ceee6912

Caroline Chen authored Jun 02, 2022

Summary:
update QUESST14 getitem to include docstrings and additionally return sample rate

Pull Request resolved: https://github.com/pytorch/audio/pull/2435

Reviewed By: nateanl

Differential Revision: D36864254

Pulled By: carolineechen

fbshipit-source-id: 9e68bbc5de27ad2f32f6b298414103c4f6784801

ceee6912

Remove mad (#2428) · d2ecba98

moto authored Jun 02, 2022

Summary:
Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354

In https://github.com/pytorch/audio/issues/2419, we mp3 decoding to ffmpeg. But CI tests were still using libmad.
This commit completely removes libmad from torchaudio.

This is BC-breaking change as `apply_sox_effects_file` function cannot handle MP3, and it cannot fallback to ffmpeg.
The workaround for this is to use `torchaudio.load` then `apply_sox_effects_tensor`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2428

Reviewed By: carolineechen

Differential Revision: D36851805

Pulled By: mthrok

fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55

d2ecba98

Use FFmpeg-based I/O as fallback in sox_io backend (#2419) · 19c60a08

moto authored Jun 01, 2022

Summary:
This commit add fallback mechanism to `info` and `load` functions of sox_io backend.
If torchaudio is compiled to use FFmpeg, and runtime dependencies are properly loaded,
in case `info` and `load` fail, it fallback to FFmpeg-based implementation.

BC-breaking changes:
 - FFmpeg does not report the number of frames for MP3, this is because MP3 does not store the information of the number of frames. It can be estimated from the audio duration and sample rate, but it might be inaccurate, so we keep it 0.

Depends on
- https://github.com/pytorch/audio/issues/2416
- https://github.com/pytorch/audio/issues/2417
- https://github.com/pytorch/audio/issues/2418
- https://github.com/pytorch/audio/issues/2423
- https://github.com/pytorch/audio/issues/2427

Pull Request resolved: https://github.com/pytorch/audio/pull/2419

Reviewed By: carolineechen

Differential Revision: D36740306

Pulled By: mthrok

fbshipit-source-id: 9e2ad095b8b39e41404970de0d8d9b5aaa856c97

19c60a08

01 Jun, 2022 3 commits

Tweak StreamReader error messages and tests (#2429) · 5d86054a

moto authored Jun 01, 2022

Summary:
* Update error messages
* Update audio stream tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2429

Reviewed By: carolineechen, nateanl

Differential Revision: D36812769

Pulled By: mthrok

fbshipit-source-id: 7a51d0c4dbae558010d2e59412333e4a7f00d318

5d86054a

Move Seed to Setup (#2425) · ac82bdc4

Sean Kim authored Jun 01, 2022

Summary:
Bringing in move seed commit from previous open commit https://github.com/pytorch/audio/issues/2267. Organizes seed to utils.

Pull Request resolved: https://github.com/pytorch/audio/pull/2425

Reviewed By: carolineechen, nateanl

Differential Revision: D36787599

Pulled By: skim0514

fbshipit-source-id: 37a0d632d13d4336a830c4b98bdb04828ed88c20

ac82bdc4

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

31 May, 2022 1 commit

Fail on Python if sox_io info/load does not succeed (#2423) · b56f60bf

moto authored May 31, 2022

Summary:
Extracted from https://github.com/pytorch/audio/issues/2419. Move the failure of sox_io from C++ to Python layer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2423

Reviewed By: carolineechen

Differential Revision: D36766152

Pulled By: mthrok

fbshipit-source-id: 53f897a608e97b81ebe5df29577374d88ce178f3

b56f60bf

29 May, 2022 1 commit

Update source info (#2418) · bb77cbeb

moto authored May 28, 2022

Summary:
Add num_frames and bits_per_sample to match with the current
`torchaudio.info` capability.

Pull Request resolved: https://github.com/pytorch/audio/pull/2418

Reviewed By: carolineechen

Differential Revision: D36749077

Pulled By: mthrok

fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713

bb77cbeb

23 May, 2022 2 commits

Add assertion checks to multi-channel functions (#2401) · 38e530d7

Zhaoheng Ni authored May 23, 2022

Summary:
- The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
- The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
- The shape of input Tensors need to be verified before the computation. For example, the shape of PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
- The autograd unittest of `apply_beamforming` has wrong dimensions for beamform_weights detected by the assertion check. FIx it in this PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2401

Reviewed By: carolineechen

Differential Revision: D36597689

Pulled By: nateanl

fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6

38e530d7

Add LibriLightLimited dataset (#2302) · af9cab3b

Zhaoheng Ni authored May 23, 2022

Summary:
The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the unsupervised subset and the supervised one, it's clearer to put it in a separate dataset class for fine-tuning purpose.
It contains "10 min", "1 hour", "10 hour" splits.

Pull Request resolved: https://github.com/pytorch/audio/pull/2302

Reviewed By: mthrok

Differential Revision: D36388188

Pulled By: nateanl

fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249

af9cab3b

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments are changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.

## Code structure

The approach is very similar to how file-like object is supported in sox-based I/O.
In Streaming API if the input src is string, it is passed to the implementation bound with TorchBind,
if the src has `read` attribute, it is passed to the same implementation bound via PyBind 11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in `c10` namespace minimum. All the `c10::optional` and `c10::Dict` were converted to the equivalents of `std` at binding layer. But since they work fine with PyBind11, Streamer core methods deal them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d

20 May, 2022 1 commit

Refactor LibriSpeech tests to accommodate different dataset classes (#2392) · 010583b6

Jeff Hwang authored May 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2392

Refactors LibriSpeech tests to accommodate different dataset classes

Reviewed By: xiaohui-zhang

Differential Revision: D36387835

fbshipit-source-id: 73b4e7565b4a077b25f036f4bd854ac7f2194b28

010583b6

19 May, 2022 1 commit

Refactor Streamer implementation (#2402) · eed57534

moto authored May 19, 2022

Summary:
* Move the helper wrapping code in TorchBind layer to proper wrapper class for so that it will be re-used in PyBind11.
* Move `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This will make PyBind11-based binding simpler as it needs not to deal with dtype.
* Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†

† Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to `add_[audio|video]_stream` method, the `StreamReaderOutputStream` was showing it as empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the corresponding filter expression that is actually being used.

Ref https://github.com/pytorch/audio/issues/2400

Pull Request resolved: https://github.com/pytorch/audio/pull/2402

Reviewed By: nateanl

Differential Revision: D36488808

Pulled By: mthrok

fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047

eed57534