Commits · 94653bf4b9fad703fb134bf32eba97b8fc03580d · OpenDAS / Torchaudio

01 Jun, 2022 4 commits

Caroline Chen authored Jun 01, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2426

Reviewed By: nateanl

Differential Revision: D36791423

Pulled By: carolineechen

fbshipit-source-id: e011147a716c940755032b8c68f5717d11fc91bf

94653bf4

Add conv_tasnet_base factory function to prototype (#2411) · 6057d3cf

Zhaoheng Ni authored Jun 01, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2411

Reviewed By: carolineechen

Differential Revision: D36663904

Pulled By: nateanl

fbshipit-source-id: c6a7dd530c9cfbb58b7121ebe02db6ae293cc2d0

6057d3cf

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

Move FileObj to dedicated source (#2427) · b374cc7b

moto authored May 31, 2022

Summary:
Extracted from https://github.com/pytorch/audio/issues/2419. Move the `FileObj` definition to a dedicated file, so that it can be reused from files other than StreamReader.

Pull Request resolved: https://github.com/pytorch/audio/pull/2427

Reviewed By: carolineechen

Differential Revision: D36794367

Pulled By: mthrok

fbshipit-source-id: 999658f3f4d833566d933c9223e7a5d49d300574

b374cc7b

31 May, 2022 2 commits

Fail on Python if sox_io info/load does not succeed (#2423) · b56f60bf

moto authored May 31, 2022

Summary:
Extracted from https://github.com/pytorch/audio/issues/2419. Move the failure of sox_io from C++ to Python layer.

Pull Request resolved: https://github.com/pytorch/audio/pull/2423

Reviewed By: carolineechen

Differential Revision: D36766152

Pulled By: mthrok

fbshipit-source-id: 53f897a608e97b81ebe5df29577374d88ce178f3

b56f60bf

Adding m1 builds to torchaudio (#2421) · c209b70d

Andrey Talman authored May 30, 2022

Summary:
This PR adds M1 wheel builds for torchaudio
Based on this PR: https://github.com/pytorch/vision/pull/5948
And this Builder [script](https://github.com/pytorch/builder/blob/main/build_m1_domains.sh)

Pull Request resolved: https://github.com/pytorch/audio/pull/2421

Reviewed By: mthrok

Differential Revision: D36767469

Pulled By: atalman

fbshipit-source-id: 9fc3b74b50ee669a230302fd27682702f83f63dc

c209b70d

30 May, 2022 1 commit

Pin test tool versions in CI (#2422) · 22a5d084

moto authored May 30, 2022

Summary:
All the unittest jobs are failing with import errors caused by protobuf and scipy.
This commit pins both packages to older versions.

## protobuf

https://app.circleci.com/pipelines/github/pytorch/audio/10979/workflows/42005226-ca7e-471c-80f4-db09f4bd2089/jobs/692078

```
E   TypeError: Descriptors cannot not be created directly.
E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E   If you cannot immediately regenerate your protos, some other possible workarounds are:
E    1. Downgrade the protobuf package to 3.20.x or lower.
E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E
E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
```

https://github.com/protocolbuffers/protobuf/issues/10051
https://github.com/PyTorchLightning/pytorch-lightning/issues/13159

## scipy (pypocketfft)

scipy 1.8.1 is causing the issue.

https://app.circleci.com/pipelines/github/pytorch/audio/10980/workflows/470a9361-4cc5-4d7c-9264-28fc8b86f1cb/jobs/692267

    ```
    ../env/lib/python3.9/site-packages/librosa/core/audio.py:11: in <module>
        import scipy.signal
    ../env/lib/python3.9/site-packages/scipy/signal/__init__.py:309: in <module>
        from . import _sigtools, windows
    ../env/lib/python3.9/site-packages/scipy/signal/windows/__init__.py:41: in <module>
        from ._windows import *
    ../env/lib/python3.9/site-packages/scipy/signal/windows/_windows.py:7: in <module>
        from scipy import linalg, special, fft as sp_fft
    ../env/lib/python3.9/site-packages/scipy/fft/__init__.py:91: in <module>
        from ._helper import next_fast_len
    ../env/lib/python3.9/site-packages/scipy/fft/_helper.py:3: in <module>
        from ._pocketfft import helper as _helper
    ../env/lib/python3.9/site-packages/scipy/fft/_pocketfft/__init__.py:3: in <module>
        from .basic import *
    ../env/lib/python3.9/site-packages/scipy/fft/_pocketfft/basic.py:6: in <module>
        from . import pypocketfft as pfft
    E   ImportError: /home/circleci/project/env/lib/python3.9/site-packages/torch/lib/../../../.././libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/circleci/project/env/lib/python3.9/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-39-x86_64-linux-gnu.so)
    ```

Pull Request resolved: https://github.com/pytorch/audio/pull/2422

Reviewed By: atalman

Differential Revision: D36764198

Pulled By: mthrok

fbshipit-source-id: 897a79fe9c3165206c2e747147fd0f257fc4f683

22a5d084

29 May, 2022 2 commits

Update source info (#2418) · bb77cbeb

moto authored May 28, 2022

Summary:
Add num_frames and bits_per_sample to match with the current
`torchaudio.info` capability.

Pull Request resolved: https://github.com/pytorch/audio/pull/2418

Reviewed By: carolineechen

Differential Revision: D36749077

Pulled By: mthrok

fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713

bb77cbeb

Change sox_io C++ return type to optional (#2416) · fd7ace17

moto authored May 28, 2022

Summary:
Preparation for upcoming change where load/info function will use fallback
if sox_io backend cannot handle the input.

Pull Request resolved: https://github.com/pytorch/audio/pull/2416

Reviewed By: carolineechen

Differential Revision: D36736969

Pulled By: mthrok

fbshipit-source-id: f804cfda3678f13bf0c2f6557a2f82ae42ae3c03

fd7ace17

28 May, 2022 1 commit

Update I/O initialization (#2417) · 65ab62e6

moto authored May 28, 2022

Summary:
Attempt to load ffmpeg extension at the top level import

Preparation to use ffmpeg-based I/O as a fallback for sox_io backend.

Pull Request resolved: https://github.com/pytorch/audio/pull/2417

Reviewed By: carolineechen

Differential Revision: D36736989

Pulled By: mthrok

fbshipit-source-id: 0beb6f459313b5ea91597393ccb12571444c54d9

65ab62e6

27 May, 2022 1 commit

Refactor Streamer to StreamReader in C++ codebase (#2403) · 9ef6c23d

moto authored May 27, 2022

Summary:
* `Streamer` was renamed to `StreamReader` when it was moved from prototype to beta.
This commit applies the same name change to the C++ source code.

* Fix miscellaneous lint issues

* Make the code compilable on FFmpeg 5

Pull Request resolved: https://github.com/pytorch/audio/pull/2403

Reviewed By: carolineechen

Differential Revision: D36613053

Pulled By: mthrok

fbshipit-source-id: 69fedd6720d488dadf4dfe7d375ee76d216b215d

9ef6c23d

26 May, 2022 1 commit

change Adam to AdamW (#2412) · 752de3e4

nateanl authored May 26, 2022

752de3e4

24 May, 2022 2 commits

Fix documentation (#2409) · 39c2c0a7

moto authored May 24, 2022

Summary:
Follow-up of https://github.com/pytorch/audio/issues/2407, the <script> was not properly closed on pages other than tutorials

Pull Request resolved: https://github.com/pytorch/audio/pull/2409

Reviewed By: carolineechen

Differential Revision: D36632668

Pulled By: mthrok

fbshipit-source-id: 9c0409a8011d77f8689e2dcdc1bd9844d3d31f79

39c2c0a7

Fix documentation (#2407) · 474510f2

moto authored May 24, 2022

Summary:
This commit fixes multiple issues with documentation.

https://output.circle-artifacts.com/output/job/23245537-e57b-4b9d-9b81-b3df20996d1f/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

1. Duplicated requirejs
The nbsphinx extension introduced in https://github.com/pytorch/audio/pull/2393 pulled a requirejs
which caused the initialization script to halt.
As a result, the right side bar was left uninitialized.

2. Undefined variable error
It turned out that PyTorch's theme expected the downstream projects
to define `collapsedSections` variable.
Currently console log shows `collapsedSections is not defined`.
As a result of this fix, we start to see the + symbol on left side.

3. Fix the behavior of default expand
Tweaks the right-side bar initialization behavior
so that expand-all only happens once, not at every resize.

4. Overwrite the link to GitHub
The `GitHub` tab in the main menu always linked to PyTorch core.
This commit overrides the link on the torchaudio page.

Pull Request resolved: https://github.com/pytorch/audio/pull/2407

Reviewed By: carolineechen

Differential Revision: D36612904

Pulled By: mthrok

fbshipit-source-id: 56aa7623a8925a241cf4790ac77a87424ad9237c

474510f2

23 May, 2022 3 commits

Add assertion checks to multi-channel functions (#2401) · 38e530d7

Zhaoheng Ni authored May 23, 2022

Summary:
- The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
- The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
- The shape of input Tensors needs to be verified before the computation. For example, the shape of PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
- The autograd unittest of `apply_beamforming` had wrong dimensions for `beamform_weights`, which the new assertion check detected. This PR fixes it.
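
The shape constraints listed above can be illustrated with a minimal sketch. It works on plain shape tuples rather than tensors so that it runs without PyTorch; the function names `check_psd_shape` and `check_mask_shape` are hypothetical and are not the helpers added in this PR.

```python
def check_psd_shape(shape):
    """Validate a PSD matrix shape of the form (..., freq, channel, channel).

    Hypothetical helper for illustration only.
    """
    if len(shape) < 3:
        raise ValueError(
            f"Expected at least 3 dimensions (..., freq, channel, channel). Found {shape}."
        )
    if shape[-1] != shape[-2]:
        raise ValueError(
            f"Expected the last two dimensions to be equal (channel, channel). Found {shape}."
        )


def check_mask_shape(shape):
    """Validate a mask shape of the form (..., freq, time).

    Hypothetical helper for illustration only.
    """
    if len(shape) < 2:
        raise ValueError(
            f"Expected at least 2 dimensions (..., freq, time). Found {shape}."
        )


# A batch of 2 PSD matrices with 257 frequency bins and 4 channels passes.
check_psd_shape((2, 257, 4, 4))
# A mask with 257 frequency bins and 100 frames passes.
check_mask_shape((2, 257, 100))
```

Failing early with an explicit message, as sketched here, is generally easier to debug than a shape mismatch surfacing later inside a matrix operation.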

Pull Request resolved: https://github.com/pytorch/audio/pull/2401

Reviewed By: carolineechen

Differential Revision: D36597689

Pulled By: nateanl

fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6

38e530d7

Add LibriLightLimited dataset (#2302) · af9cab3b

Zhaoheng Ni authored May 23, 2022

Summary:
The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the unsupervised subset and the supervised one, it's clearer to put it in a separate dataset class for fine-tuning purpose.
It contains "10 min", "1 hour", "10 hour" splits.

Pull Request resolved: https://github.com/pytorch/audio/pull/2302

Reviewed By: mthrok

Differential Revision: D36388188

Pulled By: nateanl

fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249

af9cab3b

Add recipe for HuBERT model pre-training (#2198) · 48a0c17a

Zhaoheng Ni authored May 23, 2022

Summary:
Replace https://github.com/pytorch/audio/issues/2129

Pull Request resolved: https://github.com/pytorch/audio/pull/2198

Reviewed By: carolineechen

Differential Revision: D36544163

Pulled By: nateanl

fbshipit-source-id: 3f19ba5b0f2c2b9e93b0603c3b4491c1dbc40ef8

48a0c17a

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments is changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind.
If the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
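
The dispatch rule described above can be sketched in plain Python. This is only an illustration under stated assumptions: `select_binding` and `ReadOnlyStream` are hypothetical names, and the returned strings are labels standing in for the real TorchBind and PyBind11 bindings, which are not reproduced here.

```python
import io


def select_binding(src):
    """Return a label for the binding that would handle `src`.

    Hypothetical helper: strings go to the TorchBind-bound implementation,
    objects with a `read` attribute go to the PyBind11-bound one.
    """
    if isinstance(src, str):
        return "torchbind"
    if hasattr(src, "read"):
        return "pybind11"
    raise TypeError(f"Unsupported source type: {type(src).__name__}")


class ReadOnlyStream:
    """Minimal file-like object that implements read(n) but not seek."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, n):
        # Return at most n bytes; an empty bytes object signals end of stream.
        return self._buffer.read(n)


stream = ReadOnlyStream(b"\x00" * 16)
print(select_binding("sample.wav"))  # string path
print(select_binding(stream))        # file-like object
print(hasattr(stream, "seek"))       # seek is optional and absent here
```

An object like `ReadOnlyStream` satisfies the minimum `read(self, n)` requirement; as noted in the feature list, formats that need `seek` may not decode properly from such a source.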

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d

20 May, 2022 3 commits

Tweak build doc job to avoid timeout (#2399) · 67762993

moto authored May 20, 2022

Summary:
After https://github.com/pytorch/audio/issues/2395, the build_doc job exceeds the default no-output-timeout
threshold (10m).

This commit updates the timeout threshold to 30m.
Also it moves the installation of tools to the previous step.

Pull Request resolved: https://github.com/pytorch/audio/pull/2399

Reviewed By: carolineechen

Differential Revision: D36539022

Pulled By: mthrok

fbshipit-source-id: 391764a0fb5bf87cfb2beaab401a90dcb56493e5

67762993

Refactor LibriSpeech tests to accommodate different dataset classes (#2392) · 010583b6

Jeff Hwang authored May 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2392

Refactors LibriSpeech tests to accommodate different dataset classes

Reviewed By: xiaohui-zhang

Differential Revision: D36387835

fbshipit-source-id: 73b4e7565b4a077b25f036f4bd854ac7f2194b28

010583b6

Add tutorial to use NVDEC with Stream API (#2393) · 07ace387

moto authored May 20, 2022

Summary:
This commit adds tutorial to enable/use NVDEC with Stream API.

https://output.circle-artifacts.com/output/job/19e66a4b-1819-4804-8834-d38e6c80c4fd/artifacts/0/docs/hw_acceleration_tutorial.html

Because the use of NVDEC requires building and installing FFmpeg from source,
this tutorial was authored on Google Colab, tailored to its environment.

The tutorial here is the result of the notebook execution, with
the link to the publicly accessible Google Colab notebook.

Pull Request resolved: https://github.com/pytorch/audio/pull/2393

Reviewed By: hwangjeff

Differential Revision: D36404408

Pulled By: mthrok

fbshipit-source-id: 9c820d3db4d06c5b343ecad0708489125ca06948

07ace387

19 May, 2022 2 commits

ci: Install libomp on macos (#2404) · 38cf5b7a

Eli Uriegas authored May 19, 2022

Summary:
To resolve nightly / general build issues relating to OpenMP not being found, see https://hud.pytorch.org/pytorch/audio/commit/c6a376cc5679c1940e49fc3e0ba22eaead6c2467



```
-- Found Torch: /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/torch/lib/libtorch.dylib
CMake Error at /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
Call Stack (most recent call first):
  /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindOpenMP.cmake:544 (find_package_handle_standard_args)
  CMakeLists.txt:131 (find_package)

-- Configuring incomplete, errors occurred!
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/audio/pull/2404

Reviewed By: atalman

Differential Revision: D36495791

Pulled By: seemethere

fbshipit-source-id: 7b6fa2a62fda6fc468cfcbdf8d2163e6b9c327b0

38cf5b7a

Refactor Streamer implementation (#2402) · eed57534

moto authored May 19, 2022

Summary:
* Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used in PyBind11.
* Move `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it does not need to deal with dtype.
* Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†

† Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to `add_[audio|video]_stream` method, the `StreamReaderOutputStream` was showing it as empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the corresponding filter expression that is actually being used.

Ref https://github.com/pytorch/audio/issues/2400

Pull Request resolved: https://github.com/pytorch/audio/pull/2402

Reviewed By: nateanl

Differential Revision: D36488808

Pulled By: mthrok

fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047

eed57534

18 May, 2022 1 commit

Add feature_grad_mult argument to HuBERTPretrainModel (#2335) · 647f28e4

Zhaoheng Ni authored May 18, 2022

Summary:
In Wav2Vec2 and HuBERT model training, the convolutional feature extraction layers use `group_norm` for normalization in the `Base` model, while they use `layer_norm` in the `Large` and `XLarge` models. For the `Base` model, the gradients of the feature extraction layers are unstable in pre-training, so we need to scale down the gradient by multiplying it by 0.1.

In this PR, we add such an argument to `HuBERTPretrainModel` to control the gradient of the feature extractor layers. We also put the argument in the factory functions (`hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`). The reason is that in fine-tuning the feature extractor's parameters are fixed, so we can multiply the gradient by 0.0 to avoid back-propagating gradients.

Pull Request resolved: https://github.com/pytorch/audio/pull/2335

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D35646928

Pulled By: nateanl

fbshipit-source-id: 6a9563e227aac6e3127b634357946d860f26c994

647f28e4

17 May, 2022 1 commit

Expand subsections in tutorials by default (#2397) · c6a376cc

moto authored May 17, 2022

Summary:
This commit updates the `window.sideMenus.handleRightMenu`, so that
subsections are expanded on tutorials by default.

https://output.circle-artifacts.com/output/job/98508917-87df-4666-9958-c70683b3245d/artifacts/0/docs/tutorials/audio_io_tutorial.html

Tutorial subsections are important because they have anchors, which
allow us to link to specific figures / audio samples.

When responding to issues/questions for which there is a corresponding
code snippet in a tutorial, it is often easy to answer with links to
the tutorial.

However, by default the tutorial page collapses right side bar, and
I have to click the small "+" symbols to navigate to the subsection,
and the state of expansion does not persist across the page refresh.

This has been a pain point since we updated the Sphinx version to 3
in https://github.com/pytorch/audio/pull/1685.

Pull Request resolved: https://github.com/pytorch/audio/pull/2397

Reviewed By: xiaohui-zhang

Differential Revision: D36429745

Pulled By: mthrok

fbshipit-source-id: 97a5ae9270e68f8e88f0bca766d5a2c1839634e3

c6a376cc

16 May, 2022 1 commit

Update build_doc job to use Conda CUDA package (#2395) · 8fd60cc8

moto authored May 16, 2022

Summary:
This commit moves `build_doc` job to run on top of Conda binary
build job.

The motivation is that Conda provides easy access to third party
tools that are required to build complex documentation.

Specifically in https://github.com/pytorch/audio/pull/2393,
ipynb-style tutorial is being added, which requires `nbsphinx`.

`nbsphinx` requires `pandoc` package and there was some issue
with the version from PyPI. A workaround is to use the one from
Conda package.

Pull Request resolved: https://github.com/pytorch/audio/pull/2395

Reviewed By: carolineechen, nateanl

Differential Revision: D36404407

Pulled By: mthrok

fbshipit-source-id: 26ec5ebfd5be795384306a9f24817a2eb3ec96c1

8fd60cc8

15 May, 2022 1 commit

[codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the dynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.
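
The difference between case-sensitive and case-insensitive ordering can be seen with Python's built-in `sorted`. This is a generic illustration of the sorting behavior described above, not µsort's actual implementation; the sample names are made up.

```python
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Default string comparison is case-sensitive: uppercase letters sort
# before lowercase ones, so the three spellings of "frog" are separated.
case_sensitive = sorted(names)

# Comparing on the case-folded form keeps the spellings adjacent.
# sorted() is stable, so ties keep their original relative order.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)    # ['FROG', 'Frog', 'Zebra', 'apple', 'frog']
print(case_insensitive)  # ['apple', 'frog', 'FROG', 'Frog', 'Zebra']
```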

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting

Reviewed By: lisroach

Differential Revision: D36402214

fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c

d62875cc

13 May, 2022 2 commits

Refactor LibriSpeech dataset (#2387) · 44f4a5ea

hwangjeff authored May 13, 2022

Summary:
Refactors `librispeech.py` to clarify its logic.

Pull Request resolved: https://github.com/pytorch/audio/pull/2387

Reviewed By: nateanl

Differential Revision: D36359176

Pulled By: hwangjeff

fbshipit-source-id: 595dd1421738279896348448dd72ca57bfe7cef2

44f4a5ea

Move Streamer API out of prototype (#2378) · 72b712a1

moto authored May 13, 2022

Summary:
This commit moves the Streaming API out of prototype module.

* The related classes are renamed as follows

  - `Streamer` -> `StreamReader`.
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

This change is a preemptive measure in case we add a
`StreamWriter` API.

* Replace BUILD_FFMPEG build arg with USE_FFMPEG

We are not building FFmpeg, so USE_FFMPEG is more appropriate

 ---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
   - Rename `Streamer` to `StreamReader`.
   - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
   - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
   - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378

Reviewed By: carolineechen

Differential Revision: D36359299

Pulled By: mthrok

fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328

72b712a1

12 May, 2022 4 commits

Use module-level `__getattr__` to implement delayed initialization (#2377) · 9499f642

moto authored May 12, 2022

Summary:
This commit updates the lazy module initialization logic for
`torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`.

- The modules are importable regardless of optional dependencies.
i.e. `import torchaudio.prototype.io` does not trigger the check for
optional dependencies.

- Optional dependencies are checked when the actual
API is imported for the first time.
i.e. `from torchaudio.prototype.io import Streamer` triggers the check
for optional dependencies.

The downside is that:

- `import torchaudio.prototype.io.Streamer` no longer works.

## Details:

Starting from Python 3.7, modules can have a `__getattr__` function,
which serves as a fallback if the import mechanism cannot find the
attribute.

This can be used to implement lazy import.

```python
def __getattr__(name):
    global pi
    if name == 'pi':
        import math
        pi = math.pi
        return pi
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

Ref: https://twitter.com/raymondh/status/1094686528440168453

The implementation performs lazy import for the APIs that work with
external/optional dependencies. In addition, it also checks that the
binding is initialized only once.

## Why is this the preferable approach?

Previously, the optional dependencies were checked at the time the module
was imported:

https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5

As long as this module is in `prototype`, which we ask users to import
explicitly, users had control over whether or not they install
the optional dependencies.

This approach only works with one optional dependency per module.
If, say, we add a different I/O library as an optional dependency, we need to
put all the APIs in a dedicated submodule. This prevents us from having
a flat namespace.
i.e. the I/O modules with multiple optional dependencies would look like

```python
# Client code
from torchaudio.io.foo import FooFeature
from torchaudio.io.bar import BarFeature
```

where the new approach would allow

```python
# Client code
from torchaudio.io import FooFeature, BarFeature
```
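
The flat-namespace idea can be sketched end to end with the standard library. This is a minimal illustration, not torchaudio's implementation: `make_lazy_module` is a hypothetical helper, and the standard-library classes `json.JSONDecoder` and `fractions.Fraction` stand in for features backed by optional dependencies.

```python
import importlib
import types


def make_lazy_module(name, lazy_attrs):
    """Create a module whose listed attributes are imported on first access.

    `lazy_attrs` maps an attribute name to (module name, attribute name).
    Hypothetical helper for illustration only.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        dep_module, dep_attr = lazy_attrs[attr]
        # The dependency is imported here, on first use; a missing
        # dependency would raise ImportError at this point, not earlier.
        value = getattr(importlib.import_module(dep_module), dep_attr)
        # Cache the value on the module so this fallback runs only once.
        setattr(module, attr, value)
        return value

    # Module-level __getattr__ is looked up in the module's __dict__.
    module.__getattr__ = __getattr__
    return module


io_like = make_lazy_module(
    "io_like",
    {
        "FooFeature": ("json", "JSONDecoder"),
        "BarFeature": ("fractions", "Fraction"),
    },
)

# Only the accessed feature triggers its import.
half = io_like.BarFeature(1, 2)
print(half)
print("FooFeature" in vars(io_like))  # not resolved yet
```

Both features live in one namespace, yet each dependency is only touched when its feature is first requested.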

Pull Request resolved: https://github.com/pytorch/audio/pull/2377

Reviewed By: nateanl

Differential Revision: D36305603

Pulled By: mthrok

fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae

9499f642

Refactor MVDR module (#2383) · f5036c71

Zhaoheng Ni authored May 12, 2022

Summary:
- Use `apply_beamforming`, `rtf_evd`, `rtf_power`, `mvdr_weights_souden`, `mvdr_weights_rtf` methods under `torchaudio.functional` to replace the class methods.
- Refactor docstrings in `PSD` and `MVDR`.
- Put `_get_mvdr_vector` outside of `MVDR` class as it doesn't call self methods inside.
- Since MVDR uses einsum for matrix operations, packing and unpacking batches are not necessary. It can be tested by the [batch_consistency_test](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/transforms/batch_consistency_test.py#L202). Removed it from the code.

Pull Request resolved: https://github.com/pytorch/audio/pull/2383

Reviewed By: carolineechen, mthrok

Differential Revision: D36338373

Pulled By: nateanl

fbshipit-source-id: a48a6ae2825657e5967a19656245596cdf037c5f

f5036c71

Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680

Zhaoheng Ni authored May 12, 2022

Summary:
- When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label. After zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform).
This PR fixes the bug by checking whether `label_start` is negative and changing it to zero if so.
- If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training.
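
The clamping fix can be sketched with plain integers, where Python's `//` floors the same way as `torch.div(..., rounding_mode="floor")`. The default values below (`kernel_size=25`, `stride=20`, `sample_rate=16`, read as milliseconds and samples per millisecond) are assumptions chosen for illustration, and `label_start_index` is a hypothetical name rather than the recipe's function.

```python
def label_start_index(audio_start, kernel_size=25, stride=20, sample_rate=16):
    """Map an audio start offset (in samples) to a label start index.

    Hypothetical helper; the default values are illustrative assumptions.
    """
    raw = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    # Without this clamp, crops near the beginning of the waveform yield a
    # negative index, which the commit message says results in an empty label.
    return max(raw, 0)


for start in (0, 400, 1000, 4000):
    print(start, label_start_index(start))
```

With these assumed values, an `audio_start` of 0 gives a raw index of -2, which the clamp turns into 0.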

Pull Request resolved: https://github.com/pytorch/audio/pull/2296

Reviewed By: mthrok

Differential Revision: D36323217

Pulled By: nateanl

fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423

09639680

[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc

595dc5d3

11 May, 2022 6 commits

Move FFmpeg integrity test from conda smoke test to custom smoke test (#2381) · 9877f544

moto authored May 11, 2022

Summary:
Conda package build performs a simple smoke test, which is different
from the smoke_test jobs we define in our CI.

Currently the Conda packaging smoke test verifies the importability of
`torchaudio.prototype.io`, which requires FFmpeg 4.

1. We list FFmpeg 4 as a runtime requirement, but this means that
conda's dependency resolver takes FFmpeg 4 into consideration.
FFmpeg 5 was released this year, and we can expect that the user base
will move to FFmpeg 5 gradually. If the user environment has some constraint
on FFmpeg, torchaudio will conflict with it, which prevents users
from installing torchaudio.

2. In #2377 the way optional dependency is checked/initialized is changed,
so this Conda smoke test will no longer check the integrity with FFmpeg libraries.

To solve the issues above, this commit moves the part that tests integrity with
FFmpeg libraries to the smoke test we define on CircleCI.

Pull Request resolved: https://github.com/pytorch/audio/pull/2381

Reviewed By: carolineechen

Differential Revision: D36323706

Pulled By: mthrok

fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3

9877f544

Move multi-channel modules to a separate file (#2382) · 448f53e1

Zhaoheng Ni authored May 11, 2022

Summary:
The modules include:
- PSD
- MVDR
- RTFMVDR
- SoudenMVDR

Pull Request resolved: https://github.com/pytorch/audio/pull/2382

Reviewed By: carolineechen

Differential Revision: D36314096

Pulled By: nateanl

fbshipit-source-id: 9d7d962b1c70cdc435a579191ad88838dd6fc0ba

448f53e1

Remove CodeQL (#2380) · 961a3ae9

moto authored May 11, 2022

Summary:
For a while now, CodeQL has always been emitting a red signal, but the team
does not know what it means or how to fix it. At this point, it is
purely noise and does not provide a valuable signal.

Ref https://github.com/pytorch/audio/issues/2314

Pull Request resolved: https://github.com/pytorch/audio/pull/2380

Reviewed By: carolineechen

Differential Revision: D36305599

Pulled By: mthrok

fbshipit-source-id: 27ece58730066543600f3873397b9a239e54beb0

961a3ae9

Ignore TempDir clean up error (#2379) · f35ad461

moto authored May 11, 2022

Summary:
On CircleCI, Windows unittests are failing for Python 3.7 with
`PermissionError` at the end of test when it cleans up temporary
directory.

According to the discussion https://github.com/python/cpython/issues/74168,
this is caused by a known issue with `shutil.rmtree`.

In the above thread it is advised to simply ignore the error as it
is not guaranteed that temp directories are cleaned up.

This commit follows the same path and simply ignores the error
so that our CI gets back to green.
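
One way to ignore cleanup failures is sketched below with the standard library. It is an illustration of the idea, not the code in this commit, and `lenient_temp_dir` is a hypothetical name.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def lenient_temp_dir():
    """Yield a temporary directory and ignore errors raised while removing it."""
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        # ignore_errors=True swallows failures such as PermissionError,
        # so a directory that cannot be removed does not fail the caller.
        shutil.rmtree(path, ignore_errors=True)


with lenient_temp_dir() as tmp:
    with open(os.path.join(tmp, "sample.txt"), "w") as f:
        f.write("data")
    created = os.path.isdir(tmp)

print(created)
```

On Python 3.10 and later, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` offers similar behavior without a custom helper.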

Pull Request resolved: https://github.com/pytorch/audio/pull/2379

Reviewed By: carolineechen

Differential Revision: D36305595

Pulled By: mthrok

fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5

f35ad461

Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5

hwangjeff authored May 10, 2022

Summary:
Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
- Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
- Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
- Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
- Modifies the training script to allow specifying the path to a model checkpoint to restart training from.
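
The pickling issue behind the switch from lambdas to `partial` can be reproduced in a few lines. This is a generic sketch using the standard library, not code from the recipe; `can_pickle` is a hypothetical helper and `operator.mul` stands in for a transform function.

```python
import operator
import pickle
from functools import partial

# Two callables that both double their argument.
double_lambda = lambda x: operator.mul(2, x)
double_partial = partial(operator.mul, 2)


def can_pickle(obj):
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


# Functions are pickled by qualified name; a lambda has no importable name,
# while partial wraps a function that can be looked up by name.
print(can_pickle(double_lambda))
print(can_pickle(double_partial))
print(pickle.loads(pickle.dumps(double_partial))(21))
```

This matters when a runtime has to pickle callables, for example to send them to worker processes.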

Pull Request resolved: https://github.com/pytorch/audio/pull/2366

Reviewed By: mthrok

Differential Revision: D36305028

Pulled By: hwangjeff

fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba

69467ea5

Refactor the constructors of pointer wrappers (#2373) · 93c26d63

moto authored May 10, 2022

Summary:
This commit refactors the constructors of the wrapper classes so that
the wrapper classes are only responsible for deallocation of the underlying
FFmpeg custom structures.

The responsibility of custom initialization is moved to helper functions.

Context:

FFmpeg API uses a bunch of raw pointers, which require dedicated allocators
and deallocators. In torchaudio we wrap these pointers with
`std::unique_ptr<>` to adopt RAII semantics.

Currently, all of the customization logic required for `Streamer` is
handled by the constructor of the wrapper class, like the following:

```
AVFormatContextPtr(
      const std::string& src,
      const std::string& device,
      const std::map<std::string, std::string>& option);
```

This constructor allocates the raw `AVFormatContext*` pointer,
while initializing it with the given option, then it parses the
input media.

As we consider the write/encode features, which require a different way
of initializing the `AVFormatContext*`, making it the responsibility
of the `AVFormatContextPtr` constructor reduces flexibility.

Thus this commit moves the customization to helper factory functions.

- `AVFormatContextPtr(...)` -> `get_input_format_context(...)`
- `AVCodecContextPtr(...)` -> `get_decode_context(...)`

Pull Request resolved: https://github.com/pytorch/audio/pull/2373

Reviewed By: hwangjeff

Differential Revision: D36230148

Pulled By: mthrok

fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a

93c26d63

10 May, 2022 1 commit

Add ConvEmformer module (#2358) · 2c79b55a

hwangjeff authored May 10, 2022

Summary:
Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241.

Continuation of https://github.com/pytorch/audio/issues/2324.

Pull Request resolved: https://github.com/pytorch/audio/pull/2358

Reviewed By: nateanl, xiaohui-zhang

Differential Revision: D36137992

Pulled By: hwangjeff

fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063

2c79b55a