1. 24 May, 2022 1 commit
  2. 23 May, 2022 3 commits
  3. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without a `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to tell it which decoder to use.
        - `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
        - To place the arguments common to both audio and video before the rest, the argument order was changed.
        - The `dtype` and `format` arguments were also changed to make them consistent across the audio/video methods.
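The expected interface can be sketched as follows; the class names are hypothetical and `io.BytesIO` stands in for real media data (the actual `StreamReader` call is omitted), but the minimal protocol is just `read`, optionally `seek`:

```python
import io

class SocketLikeSource:
    """Minimal file-like source: only ``read(n)`` is required.

    Without ``seek``, some container formats cannot be parsed, so the
    ``decoder`` option would be needed to name the decoder explicitly."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        return self._buf.read(n)


class FileLikeSource(SocketLikeSource):
    """Adds ``seek(offset, whence)`` so that formats requiring random
    access (e.g. MP4 with a trailing index) can be decoded."""

    def seek(self, offset: int, whence: int = 0) -> int:
        return self._buf.seek(offset, whence)


src = FileLikeSource(b"\x00\x01\x02\x03")
assert src.read(2) == b"\x00\x01"
src.seek(0)
assert src.read(4) == b"\x00\x01\x02\x03"
```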
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in the sox-based I/O.
      In the Streaming API, if the input `src` is a string, it is passed to the implementation bound via TorchBind;
      if `src` has a `read` attribute, it is passed to the same implementation bound via PyBind11.
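The dispatch rule can be sketched as follows (the helper name is hypothetical; the real dispatch happens inside torchaudio):

```python
import io

def dispatch_backend(src):
    """Hypothetical helper illustrating the dispatch rule described
    above: a string path/URL goes to the TorchBind binding, and any
    object exposing ``read`` goes to the PyBind11 binding."""
    if isinstance(src, str):
        return "torchbind"
    if hasattr(src, "read"):
        return "pybind11"
    raise TypeError(f"Unsupported src type: {type(src)}")

assert dispatch_backend("video.mp4") == "torchbind"
assert dispatch_backend(io.BytesIO(b"")) == "pybind11"
```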
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some of the implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be re-used from the PyBind11 bindings. The wrapper class serves to simplify the binding.
        - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types from the `c10` namespace to a minimum: all `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since these types work fine with PyBind11, the Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
  4. 20 May, 2022 3 commits
  5. 19 May, 2022 2 commits
    • ci: Install libomp on macos (#2404) · 38cf5b7a
      Eli Uriegas authored
      Summary:
      To resolve nightly / general build issues relating to OpenMP not being found, see https://hud.pytorch.org/pytorch/audio/commit/c6a376cc5679c1940e49fc3e0ba22eaead6c2467
      
      
      
      ```
      -- Found Torch: /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/torch/lib/libtorch.dylib
      CMake Error at /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
        Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
      Call Stack (most recent call first):
        /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
        /Users/distiller/miniconda3/envs/env3.10/lib/python3.10/site-packages/cmake/data/CMake.app/Contents/share/cmake-3.22/Modules/FindOpenMP.cmake:544 (find_package_handle_standard_args)
        CMakeLists.txt:131 (find_package)
      
      -- Configuring incomplete, errors occurred!
      ```
      Signed-off-by: default avatarEli Uriegas <eliuriegas@fb.com>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2404
      
      Reviewed By: atalman
      
      Differential Revision: D36495791
      
      Pulled By: seemethere
      
      fbshipit-source-id: 7b6fa2a62fda6fc468cfcbdf8d2163e6b9c327b0
    • Refactor Streamer implementation (#2402) · eed57534
      moto authored
      Summary:
      * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used in PyBind11.
      * Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it need not deal with dtype.
      * Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†
      
      † Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as the empty string `""`, even though `"anull"` or `"null"` was used internally. Now `StreamReaderOutputStream` shows the filter expression that is actually being used.
      
      Ref https://github.com/pytorch/audio/issues/2400
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2402
      
      Reviewed By: nateanl
      
      Differential Revision: D36488808
      
      Pulled By: mthrok
      
      fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
  6. 18 May, 2022 1 commit
    • Add feature_grad_mult argument to HuBERTPretrainModel (#2335) · 647f28e4
      Zhaoheng Ni authored
      Summary:
      In Wav2Vec2 and HuBERT model training, the convolutional feature extraction layers use `group_norm` for normalization in the `Base` model, while they use `layer_norm` in the `Large` and `XLarge` models. For the `Base` model, the gradients of the feature extraction layers are unstable in pre-training, so we need to scale the gradient down by multiplying it by 0.1.
      
      In this PR, we add such an argument to `HuBERTPretrainModel` to control the gradient of the feature extractor layers. We also add the argument to the factory functions (`hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`). The reason is that in fine-tuning the feature extractor's parameters are fixed, so we can multiply the gradient by 0.0 to avoid back-propagating gradients through it.
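The idea can be sketched framework-free; this illustrates the `feature_grad_mult` mechanism only, not the actual torchaudio implementation (which uses an autograd function on real tensors):

```python
class GradMultiply:
    """Sketch of the ``feature_grad_mult`` trick: the forward pass is
    the identity, while the backward pass scales the incoming gradient
    by a constant factor."""

    def __init__(self, scale: float):
        self.scale = scale

    def forward(self, x):
        return x  # identity; activations are unchanged

    def backward(self, grad_output):
        # only the gradient is scaled
        return [g * self.scale for g in grad_output]


# feature_grad_mult=0.1 stabilizes Base pre-training;
# feature_grad_mult=0.0 stops gradients entirely (fine-tuning case).
op = GradMultiply(0.1)
assert op.forward([1.0, 2.0]) == [1.0, 2.0]
assert op.backward([1.0, 2.0]) == [0.1, 0.2]
```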
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2335
      
      Reviewed By: xiaohui-zhang, mthrok
      
      Differential Revision: D35646928
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6a9563e227aac6e3127b634357946d860f26c994
  7. 17 May, 2022 1 commit
  8. 16 May, 2022 1 commit
    • Update build_doc job to use Conda CUDA package (#2395) · 8fd60cc8
      moto authored
      Summary:
      This commit moves the `build_doc` job to run on top of the Conda binary
      build job.

      The motivation is that Conda provides easy access to third-party
      tools that are required to build complex documentation.

      Specifically, in https://github.com/pytorch/audio/pull/2393,
      an ipynb-style tutorial is being added, which requires `nbsphinx`.

      `nbsphinx` requires the `pandoc` package, and there was some issue
      with the version from PyPI. A workaround is to use the one from
      the Conda package.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2395
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D36404407
      
      Pulled By: mthrok
      
      fbshipit-source-id: 26ec5ebfd5be795384306a9f24817a2eb3ec96c1
  9. 15 May, 2022 1 commit
    • [codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc
      John Reese authored
      Summary:
      Applies new import merging and sorting from µsort v1.0.
      
      When merging imports, µsort will make a best-effort to move associated
      comments to match merged elements, but there are known limitations due to
      the dynamic nature of Python and developer tooling. These changes should
      not produce any dangerous runtime changes, but may require touch-ups to
      satisfy linters and other tooling.
      
      Note that µsort uses case-insensitive, lexicographical sorting, which
      results in a different ordering compared to isort. This provides a more
      consistent sorting order, matching the case-insensitive order used when
      sorting import statements by module name, and ensures that "frog", "FROG",
      and "Frog" always sort next to each other.
      
      For details on µsort's sorting and merging semantics, see the user guide:
      https://usort.readthedocs.io/en/stable/guide.html#sorting
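The difference can be illustrated with plain Python sorting; `str.casefold` here stands in for µsort's own key function, purely for illustration:

```python
names = ["zebra", "FROG", "apple", "Frog", "frog"]

# Plain (case-sensitive) sort: all uppercase letters sort before lowercase.
assert sorted(names) == ["FROG", "Frog", "apple", "frog", "zebra"]

# Case-insensitive, lexicographical sort: "frog", "FROG" and "Frog"
# always sort next to each other, as described above.
assert sorted(names, key=str.casefold) == ["apple", "FROG", "Frog", "frog", "zebra"]
```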
      
      Reviewed By: lisroach
      
      Differential Revision: D36402214
      
      fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
  10. 13 May, 2022 2 commits
    • Refactor LibriSpeech dataset (#2387) · 44f4a5ea
      hwangjeff authored
      Summary:
      Refactors `librispeech.py` to clarify its logic.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2387
      
      Reviewed By: nateanl
      
      Differential Revision: D36359176
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 595dd1421738279896348448dd72ca57bfe7cef2
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of the prototype module.
      
      * The related classes are renamed as follows:
      
        - `Streamer` -> `StreamReader`
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is a preemptive measure for the possibility of adding a
      `StreamWriter` API.
      
      * Replace the BUILD_FFMPEG build arg with USE_FFMPEG.

      We are not building FFmpeg, so USE_FFMPEG is more appropriate.
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  11. 12 May, 2022 4 commits
    • Use module-level `__getattr__` to implement delayed initialization (#2377) · 9499f642
      moto authored
      Summary:
      This commit updates the lazy module initialization logic for
      `torchaudio.prototype.io` and `torchaudio.prototype.ctc_decoder`.
      
      - The modules are importable regardless of optional dependencies,
      i.e. `import torchaudio.prototype.io` does not trigger the check for
      optional dependencies.

      - Optional dependencies are checked when the actual
      API is imported for the first time,
      i.e. `from torchaudio.prototype.io import Streamer` triggers the check
      for optional dependencies.

      The downside is that:
      
      - `import torchaudio.prototype.io.Streamer` no longer works.
      
      ## Details:
      
      Starting from Python 3.7, modules can have a `__getattr__` function,
      which serves as a fallback when the import mechanism cannot find the
      attribute.
      
      This can be used to implement lazy import.
      
      ```python
      def __getattr__(name):
          global pi
          if name == 'pi':
              import math
              pi = math.pi
              return pi
          raise AttributeError(...)
      ```
      
      Ref: https://twitter.com/raymondh/status/1094686528440168453
      
      The implementation performs lazy import for the APIs that rely on
      external/optional dependencies. In addition, it checks that the
      binding is initialized only once.
      
      ## Why is this preferable approach?
      
      Previously, the optional dependencies were checked at the time the
      module was imported:
      
      https://github.com/pytorch/audio/blob/2f4eb4ac2f48a597825d3631a840afd855fe6b39/torchaudio/prototype/io/__init__.py#L1-L5
      
      As long as this module is in `prototype`, which we ask users to import
      explicitly, users have control over whether or not to install
      the optional dependencies.

      However, this approach only works for one optional dependency per module.
      Say we add a different I/O library as an optional dependency; we would
      need to put all the APIs in a dedicated submodule. This prevents us from
      having a flat namespace,
      i.e. I/O modules with multiple optional dependencies would look like
      
      ```python
      # Client code
      from torchaudio.io.foo import FooFeature
      from torchaudio.io.bar import BarFeature
      ```
      
      where the new approach would allow
      
      ```python
      # Client code
      from torchaudio.io import FooFeature, BarFeature
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2377
      
      Reviewed By: nateanl
      
      Differential Revision: D36305603
      
      Pulled By: mthrok
      
      fbshipit-source-id: c1eb6cac203f6dd0026d99f9a1de1af590a535ae
    • Refactor MVDR module (#2383) · f5036c71
      Zhaoheng Ni authored
      Summary:
      - Use `apply_beamforming`, `rtf_evd`, `rtf_power`, `mvdr_weights_souden`, `mvdr_weights_rtf` methods under `torchaudio.functional` to replace the class methods.
      - Refactor docstrings in `PSD` and `MVDR`.
      - Put `_get_mvdr_vector` outside of the `MVDR` class, as it doesn't call any instance methods.
      - Since MVDR uses einsum for matrix operations, packing and unpacking batches are not necessary; this can be verified by the [batch_consistency_test](https://github.com/pytorch/audio/blob/main/test/torchaudio_unittest/transforms/batch_consistency_test.py#L202). The packing/unpacking was removed from the code.
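The batch point can be illustrated with a numpy sketch; the function below mirrors the idea of einsum-based beamforming application, not the exact torchaudio signature:

```python
import numpy as np

def apply_beamforming(weights, specgram):
    """Sketch of applying beamforming weights with einsum.
    weights: (..., freq, channel); specgram: (..., channel, freq, time).
    The ``...`` batch dimensions are handled transparently, so no
    explicit packing/unpacking of batches is needed."""
    return np.einsum("...fc,...cft->...ft", weights.conj(), specgram)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
x = rng.standard_normal((2, 4, 5)) + 1j * rng.standard_normal((2, 4, 5))

single = apply_beamforming(w, x)                  # no batch dimension
batched = apply_beamforming(w[None], x[None])[0]  # leading batch dimension
assert single.shape == (4, 5)
assert np.allclose(single, batched)
```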
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2383
      
      Reviewed By: carolineechen, mthrok
      
      Differential Revision: D36338373
      
      Pulled By: nateanl
      
      fbshipit-source-id: a48a6ae2825657e5967a19656245596cdf037c5f
    • Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680
      Zhaoheng Ni authored
      Summary:
      - When cropping the waveform and the corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which results in an empty label. After zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform).
      This PR fixes the bug by checking whether `label_start` is negative, and changing it to zero if so.
      - If `pad` is True, `length` should be the length of each waveform instead of the max length. Fix it so that the model ignores the padded component in pre-training.
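The first fix can be sketched as follows; the helper name and the concrete numbers are hypothetical, chosen only to show the floor division and the clamp:

```python
def align_label_start(audio_start, kernel_size, stride, sample_rate):
    """Sketch of the fix: floor-divide to map an audio start index to a
    label frame index, then clamp negative values to zero so a cropped
    segment never ends up with an empty label."""
    label_start = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    return max(label_start, 0)

# Hypothetical numbers, for illustration only.
assert align_label_start(audio_start=100, kernel_size=25, stride=20, sample_rate=16) == 0
assert align_label_start(audio_start=16000, kernel_size=25, stride=20, sample_rate=16) == 48
```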
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2296
      
      Reviewed By: mthrok
      
      Differential Revision: D36323217
      
      Pulled By: nateanl
      
      fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
    • [black][codemod] formatting changes from black 22.3.0 · 595dc5d3
      John Reese authored
      Summary:
      Applies the black-fbsource codemod with the new build of pyfmt.
      
      paintitblack
      
      Reviewed By: lisroach
      
      Differential Revision: D36324783
      
      fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
  12. 11 May, 2022 6 commits
    • Move FFmpeg integrity test from conda smoke test to custom smoke test (#2381) · 9877f544
      moto authored
      Summary:
      The Conda package build performs a simple smoke test, which is different
      from the smoke_test jobs we define on our CI.

      Currently, the Conda packaging smoke test verifies the importability of
      `torchaudio.prototype.io`, which requires FFmpeg 4.
      
      1. We list FFmpeg 4 as a runtime requirement, which means that
      conda's dependency resolver takes FFmpeg 4 into consideration.
      FFmpeg 5 was released this year, and we can expect the user base
      to move to FFmpeg 5 gradually. If a user environment has some constraint
      on FFmpeg, torchaudio will have a conflict, which will prevent users
      from installing torchaudio.
      
      2. In #2377, the way optional dependencies are checked/initialized was changed,
      so this Conda smoke test no longer checks the integrity of the FFmpeg libraries.
      
      To solve the issues above, this commit moves the part that tests integrity with
      FFmpeg libraries to the smoke test we define on CircleCI.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2381
      
      Reviewed By: carolineechen
      
      Differential Revision: D36323706
      
      Pulled By: mthrok
      
      fbshipit-source-id: 57ca816e0f3ad8e16d21e56062f6ed8a09ab93a3
    • Move multi-channel modules to a separate file (#2382) · 448f53e1
      Zhaoheng Ni authored
      Summary:
      The modules include:
      - PSD
      - MVDR
      - RTFMVDR
      - SoudenMVDR
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2382
      
      Reviewed By: carolineechen
      
      Differential Revision: D36314096
      
      Pulled By: nateanl
      
      fbshipit-source-id: 9d7d962b1c70cdc435a579191ad88838dd6fc0ba
    • Remove CodeQL (#2380) · 961a3ae9
      moto authored
      Summary:
      For a while now, CodeQL has been emitting a red signal, but the team
      does not know what it means or how to fix it. At this point, it is
      pure noise and provides no valuable signal.
      
      Ref https://github.com/pytorch/audio/issues/2314
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2380
      
      Reviewed By: carolineechen
      
      Differential Revision: D36305599
      
      Pulled By: mthrok
      
      fbshipit-source-id: 27ece58730066543600f3873397b9a239e54beb0
    • Ignore TempDir clean up error (#2379) · f35ad461
      moto authored
      Summary:
      On CircleCI, Windows unittests are failing for Python 3.7 with a
      `PermissionError` at the end of the test run, when the temporary
      directory is cleaned up.
      
      According to the discussion https://github.com/python/cpython/issues/74168,
      this is caused by a known issue with `shutil.rmtree`.
      
      In the above thread, it is advised to simply ignore the error, as it
      is not guaranteed that temp directories are cleaned up.

      This commit follows the same path and simply ignores the error
      so that our CI gets back to green.
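The approach amounts to the following sketch (for illustration; the actual change lives in the test utilities):

```python
import os
import shutil
import tempfile

# Sketch of the approach: ignore errors during temp-directory cleanup
# instead of letting a PermissionError fail the test run.
path = tempfile.mkdtemp()
with open(os.path.join(path, "data.bin"), "wb") as f:
    f.write(b"\x00")

shutil.rmtree(path, ignore_errors=True)  # does not raise on cleanup failure
assert not os.path.exists(path)
```

On Python 3.10+, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` offers the same behavior, but that keyword is not available on the 3.7 runners mentioned above.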
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2379
      
      Reviewed By: carolineechen
      
      Differential Revision: D36305595
      
      Pulled By: mthrok
      
      fbshipit-source-id: d9049c2ee3447712119786311f639a1f9f8911c5
    • Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5
      hwangjeff authored
      Summary:
      Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
      - Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
      - Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
      - Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
      - Modifies the training script to allow specifying a path to a model checkpoint from which to restart training.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2366
      
      Reviewed By: mthrok
      
      Differential Revision: D36305028
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba
    • Refactor the constructors of pointer wrappers (#2373) · 93c26d63
      moto authored
      Summary:
      This commit refactors the constructors of the wrapper classes so that
      the wrapper classes are only responsible for deallocating the underlying
      FFmpeg structures.
      
      The responsibility for custom initialization is moved to helper functions.
      
      Context:
      
      The FFmpeg API uses a bunch of raw pointers, which require dedicated
      allocators and deallocators. In torchaudio we wrap these pointers with
      `std::unique_ptr<>` to adopt RAII semantics.

      Currently, all of the customization logic required for `Streamer` is
      handled by the constructors of the wrapper classes, like the following:
      
      ```
      AVFormatContextPtr(
            const std::string& src,
            const std::string& device,
            const std::map<std::string, std::string>& option);
      ```
      
      This constructor allocates the raw `AVFormatContext*` pointer while
      initializing it with the given options, then parses the
      input media.
      
      As we consider write/encode features, which require a different way
      of initializing the `AVFormatContext*`, making this the responsibility
      of the `AVFormatContextPtr` constructor reduces flexibility.

      Thus this commit moves the customization to helper factory functions:
      
      - `AVFormatContextPtr(...)` -> `get_input_format_context(...)`
      - `AVCodecContextPtr(...)` -> `get_decode_context(...)`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2373
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36230148
      
      Pulled By: mthrok
      
      fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a
  13. 10 May, 2022 8 commits
    • Add ConvEmformer module (#2358) · 2c79b55a
      hwangjeff authored
      Summary:
      Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with convolution block) described in https://arxiv.org/abs/2110.05241.
      
      Continuation of https://github.com/pytorch/audio/issues/2324.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2358
      
      Reviewed By: nateanl, xiaohui-zhang
      
      Differential Revision: D36137992
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
    • Fix return dtype in MVDR module (#2376) · 2f4eb4ac
      Zhaoheng Ni authored
      Summary:
      Address https://github.com/pytorch/audio/issues/2375
      The MVDR module internally converts complex tensors to `torch.complex128` for computation and converts the result back to the original dtype before returning the Tensor. However, the conversion back did not take effect due to `specgram_enhanced.to(dtype)`, which should be `specgram_enhanced = specgram_enhanced.to(dtype)` (`.to` is not in-place). Fix it to make the output dtype consistent with the original input.
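The bug pattern is easy to reproduce by analogy: like `Tensor.to`, numpy's `astype` is out-of-place, so calling it without assigning the result back is a silent no-op:

```python
import numpy as np

x = np.zeros(3, dtype=np.complex128)

# Buggy pattern: astype (like Tensor.to) returns a NEW array, so the
# result is silently discarded and x keeps its original dtype.
x.astype(np.complex64)
assert x.dtype == np.complex128

# Fixed pattern: assign the converted result back.
x = x.astype(np.complex64)
assert x.dtype == np.complex64
```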
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2376
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36280851
      
      Pulled By: nateanl
      
      fbshipit-source-id: 553d1b98f899547209a4e3ebc59920c7ef1f3112
    • [ROCm] Update to rocm5.1.1 (#2362) · eab2f39d
      Kyle Chen authored
      Summary:
      previous update for rocm: https://github.com/pytorch/audio/pull/2186
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2362
      
      Reviewed By: seemethere
      
      Differential Revision: D36283672
      
      Pulled By: mthrok
      
      fbshipit-source-id: bfd38940d027c8ccd72ab48991e5ab7f84b0e9c0
    • Add RTFMVDR module (#2368) · 4b021ae3
      Zhaoheng Ni authored
      Summary:
      Add a new design of MVDR module.
      The RTFMVDR module supports the method based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
      The input arguments are:
      - the multi-channel spectrum,
      - the RTF vector of the target speech,
      - the PSD matrix of noise,
      - the reference channel in the microphone array,
      - `diagonal_loading`, to enable or disable diagonal loading in the matrix inverse computation,
      - `diag_eps`, for computing the inverse of the matrix,
      - `eps`, for computing the beamforming weight.
      The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
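As a hedged sketch of the underlying math (not the torchaudio implementation, which also handles the reference channel and diagonal loading listed above), the RTF-based MVDR weights per frequency bin are w = Φₙ⁻¹ d / (dᴴ Φₙ⁻¹ d):

```python
import numpy as np

def mvdr_weights_rtf(rtf, psd_n):
    """Sketch of the RTF-based MVDR solution per frequency bin:
        w = Phi_n^{-1} d / (d^H Phi_n^{-1} d)
    rtf ``d``: (freq, channel); psd_n ``Phi_n``: (freq, channel, channel)."""
    numerator = np.linalg.solve(psd_n, rtf[..., None])[..., 0]  # Phi_n^{-1} d
    denominator = np.einsum("fc,fc->f", rtf.conj(), numerator)  # d^H Phi_n^{-1} d
    return numerator / denominator[..., None]

F, C = 4, 3
rng = np.random.default_rng(1)
a = rng.standard_normal((F, C, C)) + 1j * rng.standard_normal((F, C, C))
psd_n = a @ a.conj().swapaxes(-1, -2) + 1e-2 * np.eye(C)  # Hermitian, positive definite
rtf = rng.standard_normal((F, C)) + 1j * rng.standard_normal((F, C))

w = mvdr_weights_rtf(rtf, psd_n)
# Distortionless constraint: w^H d = 1 in every frequency bin.
assert np.allclose(np.einsum("fc,fc->f", w.conj(), rtf), 1.0)
```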
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2368
      
      Reviewed By: carolineechen
      
      Differential Revision: D36214940
      
      Pulled By: nateanl
      
      fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
    • Add diagonal_loading optional to rtf_power (#2369) · da1e83cc
      Zhaoheng Ni authored
      Summary:
      When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. This is also applicable to computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example), and it aligns with the current `torchaudio.transforms.MVDR` module, keeping behavior consistent.
      
      This PR adds the `diagonal_loading` argument, with `True` as the default value, to `torchaudio.functional.rtf_power`.
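Diagonal loading itself can be sketched in numpy; the trace-scaled constants below are illustrative, not the exact values used by torchaudio:

```python
import numpy as np

def diagonal_loading(psd, diag_eps=1e-7, eps=1e-8):
    """Sketch of diagonal loading: add a small multiple of the identity,
    scaled by each matrix's trace, so that the subsequent matrix inverse
    is well conditioned. psd: (..., channel, channel)."""
    c = psd.shape[-1]
    scale = np.asarray(np.real(np.trace(psd, axis1=-2, axis2=-1)))
    return psd + (diag_eps * scale[..., None, None] + eps) * np.eye(c)

# A singular (all-zero) PSD becomes invertible after loading.
psd = np.zeros((2, 3, 3), dtype=np.complex128)
loaded = diagonal_loading(psd)
inv = np.linalg.inv(loaded)  # would raise LinAlgError without loading
assert inv.shape == (2, 3, 3)
```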
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2369
      
      Reviewed By: carolineechen
      
      Differential Revision: D36204130
      
      Pulled By: nateanl
      
      fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
    • Add SoudenMVDR module (#2367) · aed5eb88
      Zhaoheng Ni authored
      Summary:
      Add a new design of MVDR module.
      The `SoudenMVDR` module supports the method proposed by [Souden et, al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
      The input arguments are:
      - the multi-channel spectrum,
      - the PSD matrix of the target speech,
      - the PSD matrix of noise,
      - the reference channel in the microphone array,
      - `diagonal_loading`, to enable or disable diagonal loading in the matrix inverse computation,
      - `diag_eps`, for computing the inverse of the matrix,
      - `eps`, for computing the beamforming weight.
      
      The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2367
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36198015
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
    • Add HW acceleration support on Streamer (#2331) · 54d2d04f
      moto authored
      Summary:
      This commit adds an `hw_accel` option to the `Streamer::add_video_stream` method.
      Specifying `hw_accel="cuda"` allows creating the chunk Tensor directly on a CUDA device
      when the following conditions are met:
      1. the video format is H264,
      2. the underlying FFmpeg is compiled with NVENC, and
      3. the client code specifies `decoder="h264_cuvid"`.

      A simple benchmark yields a 7x improvement in decoding speed.
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
      ]
      
      patterns = [
          ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
          ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
          ("h264_cuvid", None, None),  # NVDEC -> CPU
          (None, None, None),  # CPU
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(t1 - t0, num_frames, flush=True)
      ```
      </details>
      
      ```
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      10.781158386962488 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      10.771313901990652 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      27.88662809302332 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      83.22728440898936 6175
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      12.945253834011964 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      12.870224556012545 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      28.03406483103754 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      82.6120332319988 6175
      ```
      
      With HW resizing
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
      ]
      
      patterns = [
          # Decode with NVDEC, CUDA HW scaling -> CUDA:0
          ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
          # Decoded with NVDEC, CUDA HW scaling -> CPU
          ("h264_cuvid", {"resize": "960x540"}, "", None),
          # CPU decoding, CPU scaling
          (None, None, "scale=width=960:height=540", None),
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(
                  5,
                  decoder=decoder,
                  decoder_options=decoder_options,
                  filter_desc=filter_desc,
                  hw_accel=hw_accel,
              )
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(t1 - t0, num_frames, flush=True)
      ```
      
      </details>
      
      ```
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      12.890056837990414 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      10.697489063022658 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      85.19899423001334 6175
      
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      10.712715593050234 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      11.030170071986504 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      84.8515750519582 6175
      ```
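      For comparison across configurations, the printed timings convert directly to throughput. A small sketch using the local-file numbers from the output above:

      ```python
      # (frames decoded, elapsed seconds) copied from the local-file run above.
      results = {
          "NVDEC decode, HW resize -> cuda:0": (6175, 12.89),
          "NVDEC decode, HW resize -> cpu": (6175, 10.70),
          "CPU decode, CPU scale filter": (6175, 85.20),
      }

      for name, (frames, seconds) in results.items():
          print(f"{name}: {frames / seconds:.0f} fps")
      ```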
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2331
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36217169
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b
      54d2d04f
    • Caroline Chen's avatar
      Add citations for datasets (#2371) · 638120ca
      Caroline Chen authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D36246167
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
      638120ca
  14. 09 May, 2022 1 commit
  15. 06 May, 2022 2 commits
    • moto's avatar
      Use custom FFmpeg libraries for torchaudio binary distributions (#2355) · b7624c60
      moto authored
      Summary:
      This commit changes the way torchaudio binary distributions are built.
      
      * For all the binary distributions (conda/pip on Linux/macOS/Windows), build custom FFmpeg libraries.
      * The custom FFmpeg libraries are configured without `--enable-gpl` or `--enable-nonfree`, so that they stay LGPL.
      * The custom FFmpeg libraries employ rpath so that the torchaudio binary distributions look for the corresponding FFmpeg libraries installed in the runtime environment.
      * The torchaudio binary build process will use them to bootstrap its build process.
      * The custom FFmpeg libraries are NOT shipped.
      
      This commit also adds a disclaimer about FFmpeg to the README.
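      Because the FFmpeg libraries are not shipped, they must be discoverable in the runtime environment. A quick, illustrative way to check this from Python (this is a standard-library lookup, not part of torchaudio's own rpath-based resolution):

      ```python
      # Check whether the FFmpeg shared libraries are discoverable in the
      # current environment (illustrative only; torchaudio resolves them
      # via rpath when its extension module is loaded).
      from ctypes.util import find_library

      for lib in ("avutil", "avcodec", "avformat", "avfilter", "avdevice"):
          path = find_library(lib)
          print(f"lib{lib}: {path if path else 'not found'}")
      ```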
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2355
      
      Reviewed By: nateanl
      
      Differential Revision: D36202087
      
      Pulled By: mthrok
      
      fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef
      b7624c60
    • moto's avatar
      Refactor smoke test executions (#2365) · 6a8a28bb
      moto authored
      Summary:
      The smoke test jobs simply perform `import torchaudio` to check
      that the package artifacts are sane.

      Originally, the CI executed the check in the root directory.
      This was fine as long as the source code was not checked out.
      When the source code is checked out, performing `import torchaudio` in
      the root directory imports the source torchaudio directory instead of
      the installed package.

      This error is difficult to notice, so this commit introduces a common
      script that performs the smoke test after moving out of the root directory.
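      The idea can be sketched as follows (an illustrative Python version, not the actual CI script; `json` stands in for `torchaudio` so the sketch is self-contained):

      ```python
      import importlib
      import os
      import tempfile

      def smoke_test(module_name):
          """Import ``module_name`` from a scratch directory so that a source
          checkout in the current working directory cannot shadow the
          installed package."""
          prev = os.getcwd()
          with tempfile.TemporaryDirectory() as scratch:
              os.chdir(scratch)
              try:
                  module = importlib.import_module(module_name)
                  # The resolved path now points at the installed package.
                  return module.__file__
              finally:
                  os.chdir(prev)

      print(smoke_test("json"))  # stdlib module used as a stand-in
      ```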
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2365
      
      Reviewed By: carolineechen
      
      Differential Revision: D36202069
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
      6a8a28bb
  16. 05 May, 2022 2 commits
  17. 28 Apr, 2022 1 commit