1. 10 May, 2022 6 commits
    • [ROCm] Update to rocm5.1.1 (#2362) · eab2f39d
      Kyle Chen authored
      Summary:
      Previous ROCm update: https://github.com/pytorch/audio/pull/2186
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2362
      
      Reviewed By: seemethere
      
      Differential Revision: D36283672
      
      Pulled By: mthrok
      
      fbshipit-source-id: bfd38940d027c8ccd72ab48991e5ab7f84b0e9c0
    • Add RTFMVDR module (#2368) · 4b021ae3
      Zhaoheng Ni authored
      Summary:
      Adds a new design of the MVDR module.
      The `RTFMVDR` module supports the beamforming method based on the relative transfer function (RTF) and the power spectral density (PSD) matrix of noise.
      The input arguments are:
      - the multi-channel spectrum,
      - the RTF vector of the target speech,
      - the PSD matrix of noise,
      - the reference channel in the microphone array,
      - the `diagonal_loading` option to enable or disable diagonal loading in the matrix inverse computation,
      - `diag_eps` for computing the inverse of the matrix, and
      - `eps` for computing the beamforming weight.
      The output of the module is the single-channel complex-valued spectrum of the enhanced speech; a usage sketch follows.
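
      A minimal usage sketch based on the argument list above; the tensor shapes and the random inputs are illustrative assumptions, not taken from this commit:

      ```python
      import torch
      from torchaudio.transforms import RTFMVDR

      batch, channel, freq, time = 2, 4, 257, 100
      specgram = torch.randn(batch, channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
      rtf = torch.randn(batch, freq, channel, dtype=torch.cfloat)             # RTF vector of the target speech (assumed shape)
      psd_n = torch.randn(batch, freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise (assumed shape)

      transform = RTFMVDR()
      # reference_channel picks the channel of the microphone array to enhance
      enhanced = transform(specgram, rtf, psd_n, reference_channel=0)
      print(enhanced.shape)  # expected: (batch, freq, time) -- single-channel complex spectrum
      ```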
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2368
      
      Reviewed By: carolineechen
      
      Differential Revision: D36214940
      
      Pulled By: nateanl
      
      fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
    • Add diagonal_loading optional to rtf_power (#2369) · da1e83cc
      Zhaoheng Ni authored
      Summary:
      When computing the MVDR beamforming weights using the power iteration method, diagonal loading can be applied to the PSD matrix of noise to improve robustness. This is also applicable to computing the RTF matrix (see https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py#L614 as an example) and aligns with the current `torchaudio.transforms.MVDR` module for consistency.

      This PR adds the `diagonal_loading` argument, with `True` as the default value, to `torchaudio.functional.rtf_power`.
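
      A minimal sketch of the function call with the new argument; the PSD shapes and random inputs are assumptions used only for illustration:

      ```python
      import torch
      import torchaudio.functional as F

      freq, channel = 257, 4
      psd_s = torch.randn(freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of the target speech (assumed shape)
      psd_n = torch.randn(freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise (assumed shape)

      # With diagonal_loading=True (the new default), psd_n is regularized before
      # it is inverted inside the power-iteration update.
      rtf = F.rtf_power(psd_s, psd_n, reference_channel=0, diagonal_loading=True)
      print(rtf.shape)  # expected: (freq, channel)
      ```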
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2369
      
      Reviewed By: carolineechen
      
      Differential Revision: D36204130
      
      Pulled By: nateanl
      
      fbshipit-source-id: 93a58d5c2107841a16c4e32f0c16ab0d6b2d9420
    • Add SoudenMVDR module (#2367) · aed5eb88
      Zhaoheng Ni authored
      Summary:
      Adds a new design of the MVDR module.
      The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
      The input arguments are:
      - the multi-channel spectrum,
      - the PSD matrix of target speech,
      - the PSD matrix of noise,
      - the reference channel in the microphone array,
      - the `diagonal_loading` option to enable or disable diagonal loading in the matrix inverse computation,
      - `diag_eps` for computing the inverse of the matrix, and
      - `eps` for computing the beamforming weight.

      The output of the module is the single-channel complex-valued spectrum of the enhanced speech; a usage sketch follows.
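
      A minimal usage sketch based on the argument list above; the tensor shapes and the random inputs are illustrative assumptions, not taken from this commit:

      ```python
      import torch
      from torchaudio.transforms import SoudenMVDR

      batch, channel, freq, time = 2, 4, 257, 100
      specgram = torch.randn(batch, channel, freq, time, dtype=torch.cfloat)  # multi-channel spectrum
      psd_s = torch.randn(batch, freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of target speech (assumed shape)
      psd_n = torch.randn(batch, freq, channel, channel, dtype=torch.cfloat)  # PSD matrix of noise (assumed shape)

      transform = SoudenMVDR()
      enhanced = transform(specgram, psd_s, psd_n, reference_channel=0)
      print(enhanced.shape)  # expected: (batch, freq, time) -- single-channel complex spectrum
      ```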
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2367
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36198015
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
    • Add HW acceleration support on Streamer (#2331) · 54d2d04f
      moto authored
      Summary:
      This commit adds an `hw_accel` option to the `Streamer::add_video_stream` method.
      Specifying `hw_accel="cuda"` allows the chunk Tensor to be created directly on a CUDA device
      when the following conditions are met:
      1. the video format is H264,
      2. the underlying FFmpeg is compiled with NVDEC support, and
      3. the client code specifies `decoder="h264_cuvid"`.
      
      A simple benchmark shows roughly a 7x improvement in decoding speed.
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
      ]
      
      patterns = [
          ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
          ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
          ("h264_cuvid", None, None),  # NVDEC -> CPU
          (None, None, None),  # CPU
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(t1 - t0, num_frames, flush=True)
      ```
      </details>
      
      ```
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      10.781158386962488 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      10.771313901990652 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      27.88662809302332 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      83.22728440898936 6175
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      12.945253834011964 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      12.870224556012545 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      28.03406483103754 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      82.6120332319988 6175
      ```
      
      With HW resizing
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
      ]
      
      patterns = [
          # Decode with NVDEC, CUDA HW scaling -> CUDA:0
          ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
          # Decoded with NVDEC, CUDA HW scaling -> CPU
          ("h264_cuvid", {"resize": "960x540"}, "", None),
          # CPU decoding, CPU scaling
          (None, None, "scale=width=960:height=540", None),
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(
                  5,
                  decoder=decoder,
                  decoder_options=decoder_options,
                  filter_desc=filter_desc,
                  hw_accel=hw_accel,
              )
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(t1 - t0, num_frames, flush=True)
      ```
      
      </details>
      
      ```
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      12.890056837990414 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      10.697489063022658 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      85.19899423001334 6175
      
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      10.712715593050234 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      11.030170071986504 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      84.8515750519582 6175
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2331
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36217169
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b
    • Add citations for datasets (#2371) · 638120ca
      Caroline Chen authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D36246167
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
  2. 09 May, 2022 1 commit
  3. 06 May, 2022 2 commits
    • Use custom FFmpeg libraries for torchaudio binary distributions (#2355) · b7624c60
      moto authored
      Summary:
      This commit changes the way torchaudio binary distributions are built.
      
      * For all binary distributions (conda/pip on Linux/macOS/Windows), custom FFmpeg libraries are built.
      * The custom FFmpeg libraries are configured without `--enable-gpl` or `--enable-nonfree`, so they stay LGPL.
      * The custom FFmpeg libraries use rpath so that the torchaudio binary distributions look up the corresponding FFmpeg libraries installed in the runtime environment.
      * The torchaudio build process uses these custom libraries to bootstrap the binary build.
      * The custom FFmpeg libraries are NOT shipped.
      
      This commit also adds a disclaimer about FFmpeg to the README.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2355
      
      Reviewed By: nateanl
      
      Differential Revision: D36202087
      
      Pulled By: mthrok
      
      fbshipit-source-id: c30e5222ba190106c897e42f567cac9152dbd8ef
    • Refactor smoke test executions (#2365) · 6a8a28bb
      moto authored
      Summary:
      The smoke test jobs simply perform `import torchaudio` to check
      that the package artifacts are sane.

      Originally, the CI executed this in the repository root directory.
      This was fine unless the source code was checked out.
      When the source code is checked out, performing `import torchaudio` in the
      root directory imports the source torchaudio directory instead of the
      installed package.

      This error is difficult to notice, so this commit introduces a common script
      that performs the smoke test from outside the root directory.
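
      A minimal sketch of such a smoke-test script; it illustrates the idea (run the import from outside the source tree), not the exact script added in this commit:

      ```python
      import os
      import tempfile

      # Move out of the repository root so that `import torchaudio` resolves to the
      # installed package rather than the checked-out source directory.
      os.chdir(tempfile.gettempdir())

      import torchaudio

      print("version:", torchaudio.__version__)
      print("location:", torchaudio.__file__)
      ```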
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2365
      
      Reviewed By: carolineechen
      
      Differential Revision: D36202069
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
  4. 05 May, 2022 2 commits
  5. 28 Apr, 2022 2 commits
  6. 27 Apr, 2022 1 commit
  7. 26 Apr, 2022 5 commits
  8. 25 Apr, 2022 1 commit
  9. 22 Apr, 2022 3 commits
  10. 21 Apr, 2022 2 commits
    • CUDA 11.6 for TorchAudio (#2328) · 2acafdaf
      Andrey Talman authored
      Summary:
      CUDA 11.6 for TorchAudio
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2328
      
      Reviewed By: mthrok
      
      Differential Revision: D35826414
      
      Pulled By: atalman
      
      fbshipit-source-id: 0a471f0566286d69c0c73191aea7fd5ac0647e5f
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
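
      A minimal sketch of the pattern this change enables; the field layout of `Hypothesis` below is hypothetical and used only for illustration, not the actual decoder code:

      ```python
      from typing import List, Tuple

      import torch

      # Hypothetical field layout (tokens, score); the real Hypothesis carries more fields.
      Hypothesis = Tuple[List[int], float]

      @torch.jit.script
      def best_tokens(hypos: List[Hypothesis]) -> List[int]:
          # With plain tuples, elements are accessed by index, so lists of hypotheses
          # can be passed through scripted interfaces without custom classes.
          best = hypos[0]
          for hypo in hypos:
              if hypo[1] > best[1]:
                  best = hypo
          return best[0]

      print(best_tokens([([1, 2, 3], -0.5), ([4, 5], -0.1)]))
      ```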
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  11. 19 Apr, 2022 1 commit
  12. 18 Apr, 2022 1 commit
  13. 15 Apr, 2022 1 commit
  14. 14 Apr, 2022 3 commits
    • Support specifying decoder and its options (#2327) · be243c59
      moto authored
      Summary:
      This commit adds support for specifying the decoder in Streamer's add-stream methods.
      This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.

      This makes it possible to override the decoder codec and/or to specify
      decoder options.

      This change also makes it possible to select the NVIDIA NVDEC decoder for
      supported formats, which uses dedicated hardware to decode the video.
      
       ---
      
      Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them down all the way to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).
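
      A minimal sketch of the new interface, mirroring the benchmark scripts in the HW-acceleration commit above; the file path and the decoder option value are assumptions, not taken from this commit:

      ```python
      from torchaudio.prototype.io import Streamer

      # "input.mp4" is a placeholder path used for illustration.
      s = Streamer("input.mp4")

      # Roughly `ffmpeg -c:v h264_cuvid`; decoder_options is forwarded to the decoder.
      s.add_video_stream(
          5,
          decoder="h264_cuvid",
          decoder_options={"gpu": "0"},
      )

      for (chunk,) in s.stream():
          print(chunk.dtype, chunk.shape)
          break
      ```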
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2327
      
      Reviewed By: carolineechen
      
      Differential Revision: D35626904
      
      Pulled By: mthrok
      
      fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
    • Support NV12 format in video decoding (#2330) · 7972be99
      moto authored
      Summary:
      Support the NV12 format in the Streamer API.

      NV12 is a biplanar format with a full-sized Y plane followed by a single chroma plane with interleaved U and V values.
      https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21

      The UV plane is smaller than the Y plane, so in this implementation
      the UV plane is upsampled to match the size of the Y plane.
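
      A rough Python illustration of the upsampling described above (not the actual C++ implementation); the toy frame sizes are assumptions:

      ```python
      import torch
      import torch.nn.functional as F

      height, width = 4, 6
      y = torch.arange(height * width, dtype=torch.float32).reshape(1, 1, height, width)
      # NV12 stores one interleaved UV plane at half resolution in each dimension.
      uv = torch.randn(1, 2, height // 2, width // 2)

      # Upsample UV to the size of Y so the three channels can be stacked into one (3, H, W) frame.
      uv_up = F.interpolate(uv, size=(height, width), mode="nearest")
      frame = torch.cat([y, uv_up], dim=1)  # (1, 3, H, W): Y, U, V
      print(frame.shape)
      ```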
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2330
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632351
      
      Pulled By: mthrok
      
      fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27
    • Add YUV420P format support to Streamer API (#2334) · 2f70e2f9
      moto authored
      Summary:
      This commit adds YUV420P format support to the Streamer API.
      When the native format of a video is YUV420P, the Streamer
      outputs a Tensor with YUV color channels.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2334
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632916
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7a0078788433060266b8bd3e7cad023f41389f5
  15. 13 Apr, 2022 2 commits
    • Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b
      hwangjeff authored
      Summary:
      Adds a Conformer RNN-T LibriSpeech training recipe to the examples directory.
      
      Produces a 30M-parameter model that achieves the following WER:
      
      |                     |          WER |
      |:-------------------:|-------------:|
      | test-clean          |       0.0310 |
      | test-other          |       0.0805 |
      | dev-clean           |       0.0314 |
      | dev-other           |       0.0827 |
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2329
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35578727
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
    • Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc
      hwangjeff authored
      Summary:
      Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime does not have nightly builds pre-installed. To make it easier for users to run these notebooks in Colab, this PR adds a code block, included as a comment, that installs nightly PyTorch and TorchAudio builds and that users can copy and run.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2325
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35597753
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
  16. 12 Apr, 2022 1 commit
    • Add Conformer RNN-T model prototype (#2322) · b0c8e239
      hwangjeff authored
      Summary:
      Adds the Conformer RNN-T model as a prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, the latter of which instantiates a baseline version of the model (see the sketch after this list). Also includes the following:
      - Modifies `Conformer` to accept the arguments `use_group_norm` and `convolution_first` and pass them to each of its `ConformerLayer` instances.
      - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
      - Introduces tests for `conformer_rnnt_model`.
      - Adds docs.
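
      A minimal sketch of instantiating the baseline factory; the prototype import path and the dummy input shape are assumptions, not taken from this commit:

      ```python
      import torch
      from torchaudio.prototype.models import conformer_rnnt_base

      model = conformer_rnnt_base()  # baseline configuration

      # Dummy batch of 80-dim features and their valid lengths (shapes are assumptions).
      features = torch.randn(2, 100, 80)
      lengths = torch.tensor([100, 80])

      encoder_out, out_lengths = model.transcribe(features, lengths)
      print(encoder_out.shape, out_lengths)
      ```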
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2322
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35565987
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
  17. 11 Apr, 2022 1 commit
    • Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959
      moto authored
      Summary:
      This commit makes the FFmpeg integration support FFmpeg 5.0.

      In FFmpeg 5, functions such as `av_find_input_format` and `avformat_open_input` were changed
      so that they deal with a const-qualified `AVInputFormat`.
      
      > 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
      >  Constified the pointers to AVInputFormats and AVOutputFormats
      >  in AVFormatContext, avformat_alloc_output_context2(),
      >  av_find_input_format(), av_probe_input_format(),
      >  av_probe_input_format2(), av_probe_input_format3(),
      >  av_probe_input_buffer2(), av_probe_input_buffer(),
      >  avformat_open_input(), av_guess_format() and av_guess_codec().
      >  Furthermore, constified the AVProbeData in av_probe_input_format(),
      >  av_probe_input_format2() and av_probe_input_format3().
      
      https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2326
      
      Reviewed By: carolineechen
      
      Differential Revision: D35551380
      
      Pulled By: mthrok
      
      fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666
  18. 08 Apr, 2022 1 commit
    • Add devices/properties badges (#2321) · 72ae755a
      moto authored
      Summary:
      Add badges for supported properties and devices to functionals and transforms.

      This commit adds `.. devices::` and `.. properties::` directives to Sphinx.

      APIs with these directives get badges (based on shields.io) that link to the
      page describing these features.

      Continuation of https://github.com/pytorch/audio/issues/2316
      Dtypes are excluded pending further improvement, and badges are added to most of the functionals and transforms.
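
      A hypothetical sketch of how such directives might appear in an API docstring; the badge arguments shown are assumptions, not taken from this commit:

      ```python
      def my_functional(waveform):
          """Apply a hypothetical operation to the waveform.

          .. devices:: CPU CUDA

          .. properties:: Autograd TorchScript
          """
          return waveform
      ```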
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2321
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35489063
      
      Pulled By: mthrok
      
      fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
  19. 06 Apr, 2022 2 commits
  20. 05 Apr, 2022 2 commits