Commits · 3229fc55052147e7f59e2471cc0123f717cd9913 · OpenDAS / Torchaudio

04 Jun, 2022 1 commit

Update CTC decoder docs (#2443) · 3229fc55

Caroline Chen authored Jun 03, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2443

Reviewed By: nateanl

Differential Revision: D36909822

Pulled By: carolineechen

fbshipit-source-id: ef3ab2345e7a4666cf29dd02c83d03504e8aa62c

3229fc55

03 Jun, 2022 5 commits

Update audio data augmentation tutorial (#2388) · 41082eb0

moto authored Jun 03, 2022

Summary:
- Adopt `torchaudio.utils.download_asset` to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions

https://output.circle-artifacts.com/output/job/d7dd1b93-6dfe-46da-a080-109bfdc63881/artifacts/0/docs/tutorials/audio_data_augmentation_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2388

Reviewed By: carolineechen

Differential Revision: D36404405

Pulled By: mthrok

fbshipit-source-id: f460ed810519797fce6e2fa7baaee110bddd1d06

41082eb0

Update audio resampling tutorial (#2386) · fd2be89a

moto authored Jun 03, 2022

Summary:
- Replace mis-use of plot_specgram with plot_sweep, and remove plot_specgram
- Move `benchmark_resample` to later section

https://output.circle-artifacts.com/output/job/9f7af187-777d-4d75-840f-2630a36295b7/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2386

Reviewed By: carolineechen

Differential Revision: D36404403

Pulled By: mthrok

fbshipit-source-id: f9df8453e3f531bdc4549b0134e5dbba90653bf7

fd2be89a

Update audio feature extraction tutorial (#2391) · 8e20d546

moto authored Jun 03, 2022

Summary:
- Adopt torchaudio.utils.download_asset to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions

Pull Request resolved: https://github.com/pytorch/audio/pull/2391

Reviewed By: carolineechen, nateanl

Differential Revision: D36885626

Pulled By: mthrok

fbshipit-source-id: 1306f22ab70ab1e7f74ed7e43bf43150015448b6

8e20d546

Remove possible manual seeds from test files. (#2436) · f0bc00c9

Sean Kim authored Jun 03, 2022

Summary:
Removed manual seeds from test files where applicable. Part of the refactoring in https://github.com/pytorch/audio/issues/2267

Pull Request resolved: https://github.com/pytorch/audio/pull/2436

Reviewed By: carolineechen

Differential Revision: D36896854

Pulled By: skim0514

fbshipit-source-id: 7b4dd8a8dbfbef271f5cc56564dc83a760407e6c

f0bc00c9

Refactor M1 logic and fix version (#2438) · b68864ca

Andrey Talman authored Jun 03, 2022

Summary:
Refactor M1 logic.
These improvements were introduced in the following PR: https://github.com/pytorch/vision/pull/6117

Pull Request resolved: https://github.com/pytorch/audio/pull/2438

Reviewed By: nateanl

Differential Revision: D36896028

Pulled By: atalman

fbshipit-source-id: 2ce360bfa78b2a7c77d5d4db800d487d171831a9

b68864ca

02 Jun, 2022 5 commits

Retrieve version from version.txt (#2434) · c05498c8

Andrey Talman authored Jun 02, 2022

Summary:
Retrieve version from version.txt.
These improvements were introduced in the following PR: https://github.com/pytorch/vision/pull/6117
In addition, we add a version.txt file to help us manage the torchaudio version.
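
A minimal sketch of reading a version from a `version.txt` file, for illustration only. The function name `read_version` and the optional `suffix` parameter are hypothetical and are not taken from the torchaudio build scripts.

```python
from pathlib import Path
from typing import Optional


def read_version(root: Path, suffix: Optional[str] = None) -> str:
    """Return the version string stored in ``root / "version.txt"``.

    ``suffix`` is a hypothetical hook for a local build tag; it is not
    something the commit message describes.
    """
    version = (root / "version.txt").read_text(encoding="utf-8").strip()
    if not version:
        raise ValueError("version.txt is empty")
    return version + suffix if suffix else version
```

Keeping the version in one plain-text file means the build script and any release tooling can read the same value without importing the package.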

Pull Request resolved: https://github.com/pytorch/audio/pull/2434

Reviewed By: mthrok

Differential Revision: D36867886

Pulled By: atalman

fbshipit-source-id: 14b6d653e46489d8db1c5ae2016a8202c632861e

c05498c8

Update QUESST14 getitem (#2435) · ceee6912

Caroline Chen authored Jun 02, 2022

Summary:
Update QUESST14 getitem to include docstrings and additionally return the sample rate.

Pull Request resolved: https://github.com/pytorch/audio/pull/2435

Reviewed By: nateanl

Differential Revision: D36864254

Pulled By: carolineechen

fbshipit-source-id: 9e68bbc5de27ad2f32f6b298414103c4f6784801

ceee6912

Remove mad (#2428) · d2ecba98

moto authored Jun 02, 2022

Summary:
Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354

In https://github.com/pytorch/audio/issues/2419, we switched MP3 decoding to FFmpeg. But CI tests were still using libmad.
This commit completely removes libmad from torchaudio.

This is a BC-breaking change, as the `apply_sox_effects_file` function can no longer handle MP3 and cannot fall back to FFmpeg.
The workaround for this is to use `torchaudio.load` then `apply_sox_effects_tensor`.
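
The workaround can be sketched as follows. This assumes the functions named in the message correspond to `torchaudio.load` and `torchaudio.sox_effects.apply_effects_tensor`; the wrapper `apply_effects_to_mp3` and its injectable `load` / `apply_effects_tensor` parameters are hypothetical and exist only so the sketch can be exercised without torchaudio installed.

```python
from typing import Callable, List, Optional, Tuple


def apply_effects_to_mp3(
    path: str,
    effects: List[List[str]],
    load: Optional[Callable] = None,
    apply_effects_tensor: Optional[Callable] = None,
) -> Tuple[object, int]:
    """Decode the file first, then apply sox effects to the decoded tensor."""
    if load is None or apply_effects_tensor is None:
        # Imported lazily so that stand-in callables can be passed instead.
        import torchaudio

        load = load or torchaudio.load
        apply_effects_tensor = (
            apply_effects_tensor or torchaudio.sox_effects.apply_effects_tensor
        )
    # Step 1: decode the audio file into a waveform and its sample rate.
    waveform, sample_rate = load(path)
    # Step 2: run the sox effect chain on the in-memory tensor.
    return apply_effects_tensor(waveform, sample_rate, effects)
```

With torchaudio available, a call such as `apply_effects_to_mp3("speech.mp3", [["rate", "8000"]])` would return the processed waveform and its sample rate; the file name here is only a placeholder.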

Pull Request resolved: https://github.com/pytorch/audio/pull/2428

Reviewed By: carolineechen

Differential Revision: D36851805

Pulled By: mthrok

fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55

d2ecba98

Update MVDR beamforming tutorial (#2398) · d01f5891

Zhaoheng Ni authored Jun 01, 2022

Summary:
- Use `download_asset` to download audios.
- Replace `MVDR` module with new-added `SoudenMVDR` and `RTFMVDR` modules.
- Benchmark performances of `F.rtf_evd` and `F.rtf_power` for RTF computation.
- Visualize the spectrograms and masks.

Pull Request resolved: https://github.com/pytorch/audio/pull/2398

Reviewed By: carolineechen

Differential Revision: D36549402

Pulled By: nateanl

fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5

d01f5891

Use FFmpeg-based I/O as fallback in sox_io backend (#2419) · 19c60a08

moto authored Jun 01, 2022

Summary:
This commit adds a fallback mechanism to the `info` and `load` functions of the sox_io backend.
If torchaudio is compiled to use FFmpeg and the runtime dependencies are properly loaded,
then when `info` or `load` fails, it falls back to the FFmpeg-based implementation.

BC-breaking changes:
 - FFmpeg does not report the number of frames for MP3, because MP3 does not store the number of frames. It can be estimated from the audio duration and sample rate, but the estimate might be inaccurate, so we keep it at 0.
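
The fallback pattern described above can be sketched in plain Python. This is an illustration under assumptions, not the torchaudio implementation: `with_fallback`, `sox_info` and `ffmpeg_info` are hypothetical stand-ins, and the primary backend is assumed to signal failure by returning `None`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_fallback(
    primary: Callable[[str], Optional[T]],
    fallback: Optional[Callable[[str], T]],
    path: str,
) -> T:
    """Try ``primary``; if it returns None, use ``fallback`` when available."""
    result = primary(path)
    if result is not None:
        return result
    if fallback is not None:
        return fallback(path)
    # No fallback is available, so the failure is raised from Python.
    raise RuntimeError(f"Failed to open the input: {path}")


def sox_info(path: str) -> Optional[dict]:
    # Hypothetical stand-in: pretend the primary backend cannot handle MP3.
    if path.endswith(".mp3"):
        return None
    return {"backend": "sox", "num_frames": 100}


def ffmpeg_info(path: str) -> dict:
    # Hypothetical stand-in: the number of frames is reported as 0.
    return {"backend": "ffmpeg", "num_frames": 0}
```

Passing `fallback=None` models a build without FFmpeg support, in which case the original failure surfaces as a `RuntimeError`.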

Depends on
- https://github.com/pytorch/audio/issues/2416
- https://github.com/pytorch/audio/issues/2417
- https://github.com/pytorch/audio/issues/2418
- https://github.com/pytorch/audio/issues/2423
- https://github.com/pytorch/audio/issues/2427

Pull Request resolved: https://github.com/pytorch/audio/pull/2419

Reviewed By: carolineechen

Differential Revision: D36740306

Pulled By: mthrok

fbshipit-source-id: 9e2ad095b8b39e41404970de0d8d9b5aaa856c97

19c60a08

01 Jun, 2022 8 commits

Raising RuntimeErrors when datasets missing (#2430) · a61b90c2

Sean Kim authored Jun 01, 2022

Summary:
Checks the download flag and raises an error when the dataset is missing, for datasets that have a download flag. Unit tested manually.

Edit: Changed the path that is checked as well as the message that is returned.
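
A minimal sketch of such a check, assuming the error is raised when the dataset path does not exist and `download` is false. The function name `ensure_dataset`, the `fetch` callback and the wording of the message are hypothetical.

```python
import os
from typing import Callable


def ensure_dataset(path: str, download: bool, fetch: Callable[[str], None]) -> None:
    """Make sure ``path`` exists, downloading it only when allowed."""
    if os.path.exists(path):
        return
    if not download:
        # Assumed behaviour: fail early with an actionable message.
        raise RuntimeError(
            f"Dataset not found at {path}. Pass download=True to download it."
        )
    fetch(path)
```

Failing at construction time gives a clearer error than a later file-not-found failure while indexing the dataset.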

Pull Request resolved: https://github.com/pytorch/audio/pull/2430

Reviewed By: carolineechen

Differential Revision: D36815729

Pulled By: skim0514

fbshipit-source-id: f062db7919271665b88ec9754d85cfa83b4f6fa3

a61b90c2

Disable OpenMP on mac (#2431) · 6e563839

moto authored Jun 01, 2022

Summary:
A couple of weeks ago we started to see an "OpenMP not found" error on macOS CI.
Since https://github.com/pytorch/audio/issues/2404, we install OpenMP from brew and the build passes, but unit tests have been segfaulting ever since.

https://app.circleci.com/pipelines/github/pytorch/audio/10825/workflows/c0ecae99-d409-4df2-ab91-9bcb126c309d/jobs/671518

The failing test uses `torchaudio.functional.filtfilt`, which uses [OpenMP for parallel execution](https://github.com/pytorch/audio/blob/6057d3cf1c2f3a4c5072a3853a021bb8b4ce61f7/torchaudio/csrc/lfilter.cpp#L20).

This commit reverts https://github.com/pytorch/audio/issues/2404 and disables OpenMP for macOS builds and tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2431

Reviewed By: atalman

Differential Revision: D36819141

Pulled By: mthrok

fbshipit-source-id: 824300866a55f8b029d21649dc96cd80ae2ff697

6e563839

Tweak StreamReader error messages and tests (#2429) · 5d86054a

moto authored Jun 01, 2022

Summary:
* Update error messages
* Update audio stream tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2429

Reviewed By: carolineechen, nateanl

Differential Revision: D36812769

Pulled By: mthrok

fbshipit-source-id: 7a51d0c4dbae558010d2e59412333e4a7f00d318

5d86054a

Move Seed to Setup (#2425) · ac82bdc4

Sean Kim authored Jun 01, 2022

Summary:
Brings in the move-seed commit from the previously opened https://github.com/pytorch/audio/issues/2267. Organizes the seed into utils.
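
As an illustration of seeding in `setUp`, here is a sketch that uses the standard-library RNG as a stand-in. The class names are hypothetical, and the assumption is that the real tests seed PyTorch's RNG in a shared base class instead.

```python
import random
import unittest


class SeededTestCase(unittest.TestCase):
    """Hypothetical base class that seeds the RNG before every test."""

    seed = 0

    def setUp(self):
        super().setUp()
        # Stand-in for seeding PyTorch's RNG in the real test utilities.
        random.seed(self.seed)


class DrawTest(SeededTestCase):
    draws = []

    def test_a(self):
        self.draws.append(random.random())

    def test_b(self):
        self.draws.append(random.random())
```

Because `setUp` runs before each test method, both tests draw the same first value regardless of execution order.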

Pull Request resolved: https://github.com/pytorch/audio/pull/2425

Reviewed By: carolineechen, nateanl

Differential Revision: D36787599

Pulled By: skim0514

fbshipit-source-id: 37a0d632d13d4336a830c4b98bdb04828ed88c20

ac82bdc4

Dataset doc fixes (#2426) · 94653bf4

Caroline Chen authored Jun 01, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2426

Reviewed By: nateanl

Differential Revision: D36791423

Pulled By: carolineechen

fbshipit-source-id: e011147a716c940755032b8c68f5717d11fc91bf

94653bf4

Add conv_tasnet_base factory function to prototype (#2411) · 6057d3cf

Zhaoheng Ni authored Jun 01, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2411

Reviewed By: carolineechen

Differential Revision: D36663904

Pulled By: nateanl

fbshipit-source-id: c6a7dd530c9cfbb58b7121ebe02db6ae293cc2d0

6057d3cf

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

Move FileObj to dedicated source (#2427) · b374cc7b

moto authored May 31, 2022

Summary:
Extracted from https://github.com/pytorch/audio/issues/2419. Move the `FileObj` definition to a dedicated file, so that it can be reused from files other than StreamReader.

Pull Request resolved: https://github.com/pytorch/audio/pull/2427

Reviewed By: carolineechen

Differential Revision: D36794367

Pulled By: mthrok

fbshipit-source-id: 999658f3f4d833566d933c9223e7a5d49d300574

b374cc7b

31 May, 2022 2 commits

Fail on Python if sox_io info/load does not succeed (#2423) · b56f60bf

moto authored May 31, 2022

Summary:
Extracted from https://github.com/pytorch/audio/issues/2419. Move the failure of sox_io from the C++ layer to the Python layer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2423

Reviewed By: carolineechen

Differential Revision: D36766152

Pulled By: mthrok

fbshipit-source-id: 53f897a608e97b81ebe5df29577374d88ce178f3

b56f60bf

Adding m1 builds to torchaudio (#2421) · c209b70d

Andrey Talman authored May 30, 2022

Summary:
This PR adds M1 wheel builds for torchaudio
Based on this PR: https://github.com/pytorch/vision/pull/5948
And this Builder [script](https://github.com/pytorch/builder/blob/main/build_m1_domains.sh)

Pull Request resolved: https://github.com/pytorch/audio/pull/2421

Reviewed By: mthrok

Differential Revision: D36767469

Pulled By: atalman

fbshipit-source-id: 9fc3b74b50ee669a230302fd27682702f83f63dc

c209b70d

30 May, 2022 1 commit

Pin test tool versions in CI (#2422) · 22a5d084

moto authored May 30, 2022

Summary:
All the unittest jobs are failing with import errors caused by protobuf and scipy.
This commit pins both of them to older versions.

## protobuf

https://app.circleci.com/pipelines/github/pytorch/audio/10979/workflows/42005226-ca7e-471c-80f4-db09f4bd2089/jobs/692078

```
E   TypeError: Descriptors cannot not be created directly.
E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E   If you cannot immediately regenerate your protos, some other possible workarounds are:
E    1. Downgrade the protobuf package to 3.20.x or lower.
E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E
E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
```

https://github.com/protocolbuffers/protobuf/issues/10051
https://github.com/PyTorchLightning/pytorch-lightning/issues/13159

## scipy (pypocketfft)

Version 1.8.1 is causing the issue.

https://app.circleci.com/pipelines/github/pytorch/audio/10980/workflows/470a9361-4cc5-4d7c-9264-28fc8b86f1cb/jobs/692267

```
../env/lib/python3.9/site-packages/librosa/core/audio.py:11: in <module>
    import scipy.signal
../env/lib/python3.9/site-packages/scipy/signal/__init__.py:309: in <module>
    from . import _sigtools, windows
../env/lib/python3.9/site-packages/scipy/signal/windows/__init__.py:41: in <module>
    from ._windows import *
../env/lib/python3.9/site-packages/scipy/signal/windows/_windows.py:7: in <module>
    from scipy import linalg, special, fft as sp_fft
../env/lib/python3.9/site-packages/scipy/fft/__init__.py:91: in <module>
    from ._helper import next_fast_len
../env/lib/python3.9/site-packages/scipy/fft/_helper.py:3: in <module>
    from ._pocketfft import helper as _helper
../env/lib/python3.9/site-packages/scipy/fft/_pocketfft/__init__.py:3: in <module>
    from .basic import *
../env/lib/python3.9/site-packages/scipy/fft/_pocketfft/basic.py:6: in <module>
    from . import pypocketfft as pfft
E   ImportError: /home/circleci/project/env/lib/python3.9/site-packages/torch/lib/../../../.././libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/circleci/project/env/lib/python3.9/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-39-x86_64-linux-gnu.so)
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2422

Reviewed By: atalman

Differential Revision: D36764198

Pulled By: mthrok

fbshipit-source-id: 897a79fe9c3165206c2e747147fd0f257fc4f683

22a5d084

29 May, 2022 2 commits

Update source info (#2418) · bb77cbeb

moto authored May 28, 2022

Summary:
Add num_frames and bits_per_sample to match with the current
`torchaudio.info` capability.

Pull Request resolved: https://github.com/pytorch/audio/pull/2418

Reviewed By: carolineechen

Differential Revision: D36749077

Pulled By: mthrok

fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713

bb77cbeb

Change sox_io C++ return type to optional (#2416) · fd7ace17

moto authored May 28, 2022

Summary:
Preparation for an upcoming change in which the load/info functions will use a fallback
if the sox_io backend cannot handle the input.

Pull Request resolved: https://github.com/pytorch/audio/pull/2416

Reviewed By: carolineechen

Differential Revision: D36736969

Pulled By: mthrok

fbshipit-source-id: f804cfda3678f13bf0c2f6557a2f82ae42ae3c03

fd7ace17

28 May, 2022 1 commit

Update I/O initialization (#2417) · 65ab62e6

moto authored May 28, 2022

Summary:
Attempt to load ffmpeg extension at the top level import

Preparation to use ffmpeg-based I/O as a fallback for sox_io backend.

Pull Request resolved: https://github.com/pytorch/audio/pull/2417

Reviewed By: carolineechen

Differential Revision: D36736989

Pulled By: mthrok

fbshipit-source-id: 0beb6f459313b5ea91597393ccb12571444c54d9

65ab62e6

27 May, 2022 1 commit

Refactor Streamer to StreamReader in C++ codebase (#2403) · 9ef6c23d

moto authored May 27, 2022

Summary:
* `Streamer` was renamed to `StreamReader` when it was moved from prototype to beta.
This commit applies the same name change to the C++ source code.

* Fix miscellaneous lint issues

* Make the code compilable on FFmpeg 5

Pull Request resolved: https://github.com/pytorch/audio/pull/2403

Reviewed By: carolineechen

Differential Revision: D36613053

Pulled By: mthrok

fbshipit-source-id: 69fedd6720d488dadf4dfe7d375ee76d216b215d

9ef6c23d

26 May, 2022 1 commit

change Adam to AdamW (#2412) · 752de3e4

nateanl authored May 26, 2022

752de3e4

24 May, 2022 2 commits

Fix documentation (#2409) · 39c2c0a7

moto authored May 24, 2022

Summary:
Follow-up to https://github.com/pytorch/audio/issues/2407: the `<script>` tag was not properly closed on pages other than tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2409

Reviewed By: carolineechen

Differential Revision: D36632668

Pulled By: mthrok

fbshipit-source-id: 9c0409a8011d77f8689e2dcdc1bd9844d3d31f79

39c2c0a7

Fix documentation (#2407) · 474510f2

moto authored May 24, 2022

Summary:
This commit fixes multiple issues with documentation.

https://output.circle-artifacts.com/output/job/23245537-e57b-4b9d-9b81-b3df20996d1f/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

1. Duplicated requirejs
The nbsphinx extension introduced in https://github.com/pytorch/audio/pull/2393 pulled in a second requirejs,
which caused the initialization script to halt.
As a result, the right side bar was left uninitialized.

2. Undefined variable error
It turned out that PyTorch's theme expects downstream projects
to define the `collapsedSections` variable.
Currently the console log shows `collapsedSections is not defined`.
As a result of this fix, we start to see the + symbol on the left side.

3. Fix the behavior of default expand
Tweaks the right-side bar initialization behavior
so that expand-all only happens once, not at every resize.

4. Overwrite the link to GitHub
The `GitHub` tab in the main menu always linked to PyTorch core.
This commit overrides the link so that it points to the torchaudio page.

Pull Request resolved: https://github.com/pytorch/audio/pull/2407

Reviewed By: carolineechen

Differential Revision: D36612904

Pulled By: mthrok

fbshipit-source-id: 56aa7623a8925a241cf4790ac77a87424ad9237c

474510f2

23 May, 2022 3 commits

Add assertion checks to multi-channel functions (#2401) · 38e530d7

Zhaoheng Ni authored May 23, 2022

Summary:
- The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
- The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
- The shapes of input tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
- The autograd unittest of `apply_beamforming` had wrong dimensions for `beamform_weights`, which the assertion check detected. Fix it in this PR.
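
The kind of checks listed above can be sketched on plain shape tuples. The helper names `check_psd` and `check_mask` are hypothetical, and `ValueError` is used here instead of `assert` so that the checks survive `python -O`; the actual functions may differ.

```python
from typing import Sequence


def check_psd(shape: Sequence[int], is_complex: bool) -> None:
    """Validate a PSD matrix described as ``(..., freq, channel, channel)``."""
    if not is_complex:
        raise ValueError("The PSD matrix must be complex-valued.")
    if len(shape) < 3:
        raise ValueError(f"Expected at least 3 dimensions, got {len(shape)}.")
    if shape[-1] != shape[-2]:
        raise ValueError(
            f"The last two dimensions must both be `channel`, got {tuple(shape[-2:])}."
        )


def check_mask(shape: Sequence[int]) -> None:
    """Validate a mask described as ``(..., freq, time)``.

    The mask may be real- or complex-valued, so only the rank is checked.
    """
    if len(shape) < 2:
        raise ValueError(f"Expected at least 2 dimensions, got {len(shape)}.")
```

Checking shapes before the computation turns an obscure broadcasting or matmul error into a message that names the offending argument.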

Pull Request resolved: https://github.com/pytorch/audio/pull/2401

Reviewed By: carolineechen

Differential Revision: D36597689

Pulled By: nateanl

fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6

38e530d7

Add LibriLightLimited dataset (#2302) · af9cab3b

Zhaoheng Ni authored May 23, 2022

Summary:
The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the unsupervised subset from the supervised one, it is clearer to put it in a separate dataset class for fine-tuning purposes.
It contains "10 min", "1 hour", "10 hour" splits.

Pull Request resolved: https://github.com/pytorch/audio/pull/2302

Reviewed By: mthrok

Differential Revision: D36388188

Pulled By: nateanl

fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249

af9cab3b

Add recipe for HuBERT model pre-training (#2198) · 48a0c17a

Zhaoheng Ni authored May 23, 2022

Summary:
Replace https://github.com/pytorch/audio/issues/2129

Pull Request resolved: https://github.com/pytorch/audio/pull/2198

Reviewed By: carolineechen

Differential Revision: D36544163

Pulled By: nateanl

fbshipit-source-id: 3f19ba5b0f2c2b9e93b0603c3b4491c1dbc40ef8

48a0c17a

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
  - So that the arguments common to both audio and video come before the rest of the arguments, the order of the arguments was changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
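
The file-like protocol described above can be illustrated with a small sketch. `ReadOnlySource`, `describe_source` and `read_all` are hypothetical helpers written for this example and are not part of the Streaming API.

```python
import io


class ReadOnlySource:
    """Hypothetical file-like object that implements ``read`` but not ``seek``."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        return self._buffer.read(n)


def describe_source(src) -> dict:
    """Report which parts of the protocol ``src`` supports."""
    if not hasattr(src, "read"):
        raise TypeError("src must implement read(n)")
    # ``seek`` is optional; without it some formats may need an explicit decoder.
    return {"seekable": hasattr(src, "seek")}


def read_all(src, chunk_size: int = 4096) -> bytes:
    """Consume ``src`` chunk by chunk until ``read`` returns no data."""
    chunks = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

An object like `ReadOnlySource` models a network stream: data arrives through `read` only, which is the case where the `decoder` option becomes useful.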

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to a Wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d

20 May, 2022 3 commits

Tweak build doc job to avoid timeout (#2399) · 67762993

moto authored May 20, 2022

Summary:
After https://github.com/pytorch/audio/issues/2395, the build_doc job is exceeding the default no-output-timeout
threshold (10m).

This commit updates the timeout threshold to 30m.
Also it moves the installation of tools to the previous step.

Pull Request resolved: https://github.com/pytorch/audio/pull/2399

Reviewed By: carolineechen

Differential Revision: D36539022

Pulled By: mthrok

fbshipit-source-id: 391764a0fb5bf87cfb2beaab401a90dcb56493e5

67762993

Refactor LibriSpeech tests to accommodate different dataset classes (#2392) · 010583b6

Jeff Hwang authored May 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2392

Refactors LibriSpeech tests to accommodate different dataset classes

Reviewed By: xiaohui-zhang

Differential Revision: D36387835

fbshipit-source-id: 73b4e7565b4a077b25f036f4bd854ac7f2194b28

010583b6

Add tutorial to use NVDEC with Stream API (#2393) · 07ace387

moto authored May 20, 2022

Summary:
This commit adds a tutorial on enabling and using NVDEC with the Stream API.

https://output.circle-artifacts.com/output/job/19e66a4b-1819-4804-8834-d38e6c80c4fd/artifacts/0/docs/hw_acceleration_tutorial.html

Because the use of NVDEC requires building and installing FFmpeg from source,
this tutorial was authored on Google Colab, tailored to its environment.

The tutorial here is the result of the notebook execution, with
the link to the publicly accessible Google Colab notebook.

Pull Request resolved: https://github.com/pytorch/audio/pull/2393

Reviewed By: hwangjeff

Differential Revision: D36404408

Pulled By: mthrok

fbshipit-source-id: 9c820d3db4d06c5b343ecad0708489125ca06948

07ace387

19 May, 2022 2 commits

ci: Install libomp on macos (#2404) · 38cf5b7a

Eli Uriegas authored May 19, 2022

Summary:
To resolve nightly / general build issues relating to OpenMP not being found, see https://hud.pytorch.org/pytorch/audio/commit/c6a376cc5679c1940e49fc3e0ba22eaead6c2467



```
-- Found Torch: /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/torch/lib/libtorch.dylib
CMake Error at /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
Call Stack (most recent call first):
  /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindOpenMP.cmake:544 (find_package_handle_standard_args)
  CMakeLists.txt:131 (find_package)

-- Configuring incomplete, errors occurred!
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/audio/pull/2404

Reviewed By: atalman

Differential Revision: D36495791

Pulled By: seemethere

fbshipit-source-id: 7b6fa2a62fda6fc468cfcbdf8d2163e6b9c327b0

38cf5b7a

Refactor Streamer implementation (#2402) · eed57534

moto authored May 19, 2022

Summary:
* Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be reused in PyBind11.
* Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it does not need to deal with dtype.
* Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†

† Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to `add_[audio|video]_stream` method, the `StreamReaderOutputStream` was showing it as empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the corresponding filter expression that is actually being used.
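
The footnote's behaviour can be sketched as a small helper. `effective_filter_desc` is a hypothetical name; `anull` and `null` are FFmpeg's pass-through filters for audio and video respectively.

```python
from typing import Optional


def effective_filter_desc(filter_desc: Optional[str], media_type: str) -> str:
    """Return the filter expression that is actually applied."""
    if media_type not in ("audio", "video"):
        raise ValueError(f"Unexpected media type: {media_type}")
    if filter_desc:
        return filter_desc
    # An empty expression is replaced by the pass-through filter.
    return "anull" if media_type == "audio" else "null"
```

Reporting the resolved expression rather than the empty string makes the displayed stream information match what the filter graph runs.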

Ref https://github.com/pytorch/audio/issues/2400

Pull Request resolved: https://github.com/pytorch/audio/pull/2402

Reviewed By: nateanl

Differential Revision: D36488808

Pulled By: mthrok

fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047

eed57534

18 May, 2022 1 commit

Add feature_grad_mult argument to HuBERTPretrainModel (#2335) · 647f28e4

Zhaoheng Ni authored May 18, 2022

Summary:
In Wav2Vec2 and HuBERT model training, the convolutional feature extraction layers use `group_norm` for normalization in the `Base` model, while they use `layer_norm` in the `Large` and `XLarge` models. For the `Base` model, the gradients of the feature extraction layers are unstable in pre-training, so we need to scale the gradient down by multiplying it by 0.1.

In this PR, we add such an argument to `HuBERTPretrainModel` to control the gradient of the feature extractor layers. We also put the argument in the factory functions (`hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`). The reason is that in fine-tuning the feature extractor's parameters are fixed, so we can multiply the gradient by 0.0 to avoid back-propagating gradients.
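
Gradient scaling of this kind can be sketched without any framework. The `GradMultiply` class below is a hypothetical illustration that assumes the operation is an identity in the forward pass and multiplies the incoming gradient by a constant in the backward pass; the real model would express this through PyTorch autograd.

```python
from typing import List


class GradMultiply:
    """Identity in the forward pass; scales the gradient in the backward pass."""

    def __init__(self, scale: float):
        self.scale = scale

    def forward(self, x: List[float]) -> List[float]:
        # The features pass through unchanged.
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The gradient flowing to the feature extractor is scaled.
        return [g * self.scale for g in grad_output]
```

With `scale=0.1` the feature extractor receives a tenth of the gradient, and with `scale=0.0` it receives none, which matches the pre-training and fine-tuning cases described in the message.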

Pull Request resolved: https://github.com/pytorch/audio/pull/2335

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D35646928

Pulled By: nateanl

fbshipit-source-id: 6a9563e227aac6e3127b634357946d860f26c994

647f28e4

17 May, 2022 1 commit

Expand subsections in tutorials by default (#2397) · c6a376cc

moto authored May 17, 2022

Summary:
This commit updates the `window.sideMenus.handleRightMenu`, so that
subsections are expanded on tutorials by default.

https://output.circle-artifacts.com/output/job/98508917-87df-4666-9958-c70683b3245d/artifacts/0/docs/tutorials/audio_io_tutorial.html

Tutorial subsections are important because they have anchors, which
allow us to link to specific figures / audio samples.

When responding to issues/questions for which there is a corresponding
code snippet in a tutorial, it is often easy to answer with links to
the tutorial.

However, by default the tutorial page collapses the right side bar, and
I have to click the small "+" symbols to navigate to the subsection,
and the state of expansion does not persist across page refreshes.

This has been a pain point since we updated the Sphinx version to 3
in https://github.com/pytorch/audio/pull/1685.

Pull Request resolved: https://github.com/pytorch/audio/pull/2397

Reviewed By: xiaohui-zhang

Differential Revision: D36429745

Pulled By: mthrok

fbshipit-source-id: 97a5ae9270e68f8e88f0bca766d5a2c1839634e3

c6a376cc