1. 06 May, 2022 1 commit
    • Refactor smoke test executions (#2365) · 6a8a28bb
      moto authored
      Summary:
      The smoke test jobs simply perform `import torchaudio` to check
      if the package artifacts are sane.
      
      Originally, the CI executed it in the root directory.
      This was fine as long as the source code was not checked out.
      When the source code is checked out, performing `import torchaudio` in the
      root directory imports the source `torchaudio` directory instead of the
      installed package.
      
      This error is difficult to notice, so this commit introduces a common script that
      performs the smoke test after moving out of the root directory.
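
The shadowing problem described above can be reproduced with a small, self-contained sketch. This is not the CI script from the commit: the package name `fakeaudio`, the helper names, and the use of `PYTHONPATH` as a stand-in for the installed location are all assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_origin(module: str, cwd: Path, installed_dir: Path) -> Path:
    """Start a fresh interpreter in ``cwd`` and return the directory ``module`` was found in."""
    env = dict(os.environ, PYTHONPATH=str(installed_dir))
    # Keep the current directory on sys.path, as in a default interpreter setup.
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}; print({module}.__file__)"],
        cwd=str(cwd),
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    # <dir>/<module>/__init__.py -> <dir>
    return Path(result.stdout.strip()).resolve().parent.parent


def make_package(parent: Path, name: str) -> None:
    """Create an empty importable package ``name`` under ``parent``."""
    (parent / name).mkdir(parents=True)
    (parent / name / "__init__.py").write_text("")


def demo() -> dict:
    """Compare the import origin when run from the source root and from elsewhere."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        source_root = root / "checkout"     # stands in for the repository root
        installed = root / "site-packages"  # stands in for the installed location
        elsewhere = root / "elsewhere"      # any directory outside the checkout
        elsewhere.mkdir()
        make_package(source_root, "fakeaudio")  # hypothetical package name
        make_package(installed, "fakeaudio")
        return {
            "root_imports_source": import_origin("fakeaudio", source_root, installed) == source_root,
            "elsewhere_imports_installed": import_origin("fakeaudio", elsewhere, installed) == installed,
        }


if __name__ == "__main__":
    print(demo())
```

Both checks are expected to hold: from the checkout directory the source copy wins because the current directory comes first on `sys.path`, while from any other directory the installed copy is picked up.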
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2365
      
      Reviewed By: carolineechen
      
      Differential Revision: D36202069
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4396f85fec5c54869ada4c08f51304539f1b05cf
  2. 05 May, 2022 2 commits
  3. 28 Apr, 2022 2 commits
  4. 27 Apr, 2022 1 commit
  5. 26 Apr, 2022 5 commits
  6. 25 Apr, 2022 1 commit
  7. 22 Apr, 2022 3 commits
  8. 21 Apr, 2022 2 commits
    • CUDA 11.6 for TorchAudio (#2328) · 2acafdaf
      Andrey Talman authored
      Summary:
      CUDA 11.6 for TorchAudio
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2328
      
      Reviewed By: mthrok
      
      Differential Revision: D35826414
      
      Pulled By: atalman
      
      fbshipit-source-id: 0a471f0566286d69c0c73191aea7fd5ac0647e5f
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
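
As a rough illustration of the idea, a hypothesis can be modelled as a plain tuple with small accessor functions in place of named attributes. The two fields used here (token IDs and a score) and all function names are simplifying assumptions, not the layout used in TorchAudio.

```python
from typing import List, Tuple

# Simplified stand-in for a hypothesis: (token IDs, score).
# The field layout is an assumption made for illustration.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    """Return the token IDs of a hypothesis."""
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    """Return the score of a hypothesis."""
    return hypo[1]


def best_hypothesis(hypos: List[Hypothesis]) -> Hypothesis:
    """Pick the highest-scoring hypothesis from a non-empty list."""
    return max(hypos, key=get_score)
```

Because the container holds only built-in types, a list of such tuples contains no custom classes, which is the property the summary above asks for.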
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  9. 19 Apr, 2022 1 commit
  10. 18 Apr, 2022 1 commit
  11. 15 Apr, 2022 1 commit
  12. 14 Apr, 2022 3 commits
    • Support specifying decoder and its options (#2327) · be243c59
      moto authored
      Summary:
      This commit adds support for specifying the decoder in Streamer's add stream method.
      This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.
      
      This makes it possible to override the decoder codec and/or specify options
      for the decoder.
      
      It also makes it possible to select the Nvidia NVDEC codec for supported formats,
      which uses dedicated hardware to decode the video.
      
       ---
      
      Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them all the way down to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).
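
To make the `ffmpeg` analogy concrete, the sketch below builds the command line that would select explicit decoders. The helper name `ffmpeg_decode_command` is hypothetical and nothing is executed. It relies on `ffmpeg` treating `-c:v` / `-c:a` as decoder selection when they appear before `-i`; `h264_cuvid` is used as an example of an NVDEC-backed decoder name.

```python
from typing import List, Optional


def ffmpeg_decode_command(
    src: str,
    video_decoder: Optional[str] = None,
    audio_decoder: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg command that decodes ``src`` with the given decoders."""
    cmd = ["ffmpeg"]
    # Codec options placed before ``-i`` apply to the input, i.e. pick the decoder.
    if video_decoder:
        cmd += ["-c:v", video_decoder]
    if audio_decoder:
        cmd += ["-c:a", audio_decoder]
    # Decode only and discard the result.
    cmd += ["-i", src, "-f", "null", "-"]
    return cmd


if __name__ == "__main__":
    print(" ".join(ffmpeg_decode_command("input.mp4", video_decoder="h264_cuvid")))
```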
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2327
      
      Reviewed By: carolineechen
      
      Differential Revision: D35626904
      
      Pulled By: mthrok
      
      fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
    • Support NV12 format in video decoding (#2330) · 7972be99
      moto authored
      Summary:
      Support NV12 format in Streamer API.
      
      NV12 is a biplanar format with a full sized Y plane followed by a single chroma plane with interleaved U and V values.
      https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21
      
      The original UV plane is smaller than the Y plane, so in this implementation,
      the UV plane is upsampled to match the size of the Y plane.
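
The layout and the upsampling step can be sketched in pure Python as follows. The summary does not say which upsampling method is used, so nearest-neighbour repetition is an assumption here, as are the function name and the restriction to even frame sizes.

```python
from typing import List, Tuple

Plane = List[List[int]]


def nv12_to_yuv444(buf: bytes, width: int, height: int) -> Tuple[Plane, Plane, Plane]:
    """Split an NV12 buffer into Y, U and V planes, each ``height`` x ``width``."""
    if width % 2 or height % 2:
        raise ValueError("this sketch only handles even width and height")
    y_size = width * height
    if len(buf) != y_size * 3 // 2:
        raise ValueError("unexpected buffer size for NV12")

    # Full-resolution luma plane.
    y = [list(buf[r * width:(r + 1) * width]) for r in range(height)]

    # Chroma plane: height/2 rows of ``width`` bytes laid out as U0 V0 U1 V1 ...
    uv = buf[y_size:]
    u: Plane = []
    v: Plane = []
    for r in range(height):
        row = uv[(r // 2) * width:(r // 2 + 1) * width]
        # Each chroma sample is repeated over a 2x2 block of luma samples.
        u.append([row[2 * (c // 2)] for c in range(width)])
        v.append([row[2 * (c // 2) + 1] for c in range(width)])
    return y, u, v
```

After the conversion all three planes have the same shape, so they can be stacked into a single three-channel image.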
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2330
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632351
      
      Pulled By: mthrok
      
      fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27
    • Add YUV420P format support to Streamer API (#2334) · 2f70e2f9
      moto authored
      Summary:
      This commit adds YUV420P format support to the Streamer API.
      When the native format of a video is YUV420P, the Streamer
      outputs a Tensor with YUV color channels.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2334
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632916
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7a0078788433060266b8bd3e7cad023f41389f5
  13. 13 Apr, 2022 2 commits
    • Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b
      hwangjeff authored
      Summary:
      Adds Conformer RNN-T LibriSpeech training recipe to examples directory.
      
      Produces a 30M-parameter model that achieves the following WER:
      
      |                     |          WER |
      |:-------------------:|-------------:|
      | test-clean          |       0.0310 |
      | test-other          |       0.0805 |
      | dev-clean           |       0.0314 |
      | dev-other           |       0.0827 |
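
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the hypothesis divided by the number of reference words, so 0.0310 corresponds to roughly 3.1 errors per 100 reference words. The function below is a generic sketch of that definition, not the evaluation code of the recipe.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```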
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2329
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35578727
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
    • Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc
      hwangjeff authored
      Summary:
      Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime does not have nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly PyTorch and TorchAudio builds as a comment that users can copy and run locally.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2325
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35597753
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
  14. 12 Apr, 2022 1 commit
    • Add Conformer RNN-T model prototype (#2322) · b0c8e239
      hwangjeff authored
      Summary:
      Adds Conformer RNN-T model as prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, which instantiates a baseline version of the model. Also includes the following:
      - Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
      - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
      - Introduces tests for `conformer_rnnt_model`.
      - Adds docs.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2322
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35565987
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
  15. 11 Apr, 2022 1 commit
    • Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959
      moto authored
      Summary:
      This commit makes the FFmpeg integration support FFmpeg 5.0.
      
      In FFmpeg 5, functions like `av_find_input_format` and `avformat_open_input` were changed
      so that they deal with `const`-qualified pointers to `AVInputFormat`.
      
      > 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
      >  Constified the pointers to AVInputFormats and AVOutputFormats
      >  in AVFormatContext, avformat_alloc_output_context2(),
      >  av_find_input_format(), av_probe_input_format(),
      >  av_probe_input_format2(), av_probe_input_format3(),
      >  av_probe_input_buffer2(), av_probe_input_buffer(),
      >  avformat_open_input(), av_guess_format() and av_guess_codec().
      >  Furthermore, constified the AVProbeData in av_probe_input_format(),
      >  av_probe_input_format2() and av_probe_input_format3().
      
      https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2326
      
      Reviewed By: carolineechen
      
      Differential Revision: D35551380
      
      Pulled By: mthrok
      
      fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666
  16. 08 Apr, 2022 1 commit
    • Add devices/properties badges (#2321) · 72ae755a
      moto authored
      Summary:
      Add badges of supported properties and devices to functionals and transforms.
      
      This commit adds `.. devices::` and `.. properties::` directives to sphinx.
      
      APIs with these directives will have badges (based on shields.io) which link to the
      page describing these features.
      
      Continuation of https://github.com/pytorch/audio/issues/2316
      Dtypes are excluded for now, pending further improvement, and badges are added to most of the functionals and transforms.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2321
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35489063
      
      Pulled By: mthrok
      
      fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
  17. 06 Apr, 2022 2 commits
  18. 05 Apr, 2022 2 commits
  19. 04 Apr, 2022 2 commits
  20. 01 Apr, 2022 5 commits
    • Fix loading checkpoint in hubert preprocessing (#2310) · 87f0d198
      Zhaoheng Ni authored
      Summary:
      When the checkpoint is on a GPU device and preprocessing runs on CPU, the script raises an exception. This fix loads the model state dictionary onto the CPU by default.
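
A common way to get this behaviour is the `map_location` argument of `torch.load`. The helper below is a minimal sketch under that assumption; the function name is hypothetical and the commit may implement the fix differently.

```python
def load_checkpoint_on_cpu(checkpoint_path: str):
    """Load a checkpoint so that all tensors end up on the CPU.

    ``map_location="cpu"`` remaps tensors that were saved from a GPU device,
    so the file can be loaded on a machine without CUDA.
    """
    import torch  # imported here so the sketch can be read without PyTorch installed

    return torch.load(checkpoint_path, map_location="cpu")
```

Tensors loaded this way can still be moved to a GPU afterwards with `.to(device)` if one is available.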
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2310
      
      Reviewed By: mthrok
      
      Differential Revision: D35316903
      
      Pulled By: nateanl
      
      fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed
    • Update GNU config files to support `arm64-apple` system (#2307) · 3ed39e15
      moto authored
      Summary:
      This commit
      1. updates the config.guess and config.sub files and
      2. applies them to all the third party libraries that use them.
      
      This resolves the following build failure on M1 Mac with a newer SDK.
      
      On a MacBook Pro with an M1 chip, a recent OS update changed something
      in the development environment (probably a newer version of Xcode),
      and the build stopped working with the following errors from third
      party dependencies.
      
      ```
      checking build system type... Invalid configuration ‘arm64-apple-darwin20.0.0': machine ‘arm64-apple' not recognized
      ```
      
      note: config files are taken from https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2307
      
      Reviewed By: nateanl
      
      Differential Revision: D35318273
      
      Pulled By: mthrok
      
      fbshipit-source-id: 746ac51dd1816767aa78b88445f76a29acfd29e8
    • Put CONDA_PREFIX second priority of ffmpeg search path (#2312) · 6a418a89
      moto authored
      Summary:
      Change the CMake logic to search CONDA_PREFIX before falling back
      to the other default paths and system paths. The search order becomes:
      
      1. FFMPEG_ROOT
      2. CONDA_PREFIX
      3. Other locations (Package managers and system paths)
      
      For users with a regular conda installation, ffmpeg from conda should
      be picked up automatically.
      Anyone who wants a specific ffmpeg can set the FFMPEG_ROOT
      variable to the location of the desired installation.
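
The priority order above can be mirrored in a few lines of Python. This is only an illustration of the ordering, not the CMake logic from the commit; the function name and the default fallback directories are assumptions.

```python
import os
from typing import List, Mapping, Optional, Sequence


def ffmpeg_search_roots(
    env: Optional[Mapping[str, str]] = None,
    fallback_roots: Sequence[str] = ("/usr/local", "/usr"),  # illustrative defaults
) -> List[str]:
    """Return candidate ffmpeg roots, highest priority first."""
    env = os.environ if env is None else env
    roots: List[str] = []
    # 1. FFMPEG_ROOT, 2. CONDA_PREFIX: used only when set and non-empty.
    for name in ("FFMPEG_ROOT", "CONDA_PREFIX"):
        value = env.get(name)
        if value and value not in roots:
            roots.append(value)
    # 3. Other locations (package managers and system paths).
    roots.extend(root for root in fallback_roots if root not in roots)
    return roots
```

With only `CONDA_PREFIX` set, the conda environment is searched first; setting `FFMPEG_ROOT` as well moves the user-specified location ahead of it.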
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2312
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35317383
      
      Pulled By: mthrok
      
      fbshipit-source-id: 52aef8f3f7f0f8f1eaf7a89a2d1ccfb6265e2c50
    • Refactor the internal of transforms module (#2309) · 72f9a4e3
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2309
      
      For the upcoming improved Kaldi features, which comprise multiple classes / functions, put all the transforms implementations in a dedicated directory.
      
      Reviewed By: nateanl
      
      Differential Revision: D35303682
      
      fbshipit-source-id: 5bc8c07ef639683008c0f76ffe56e3941f772659
    • Loosen atol for melscale batch test for Windows (#2305) · d65a0f3e
      moto authored
      Summary:
      The `transforms.batch_consistency_test.TestTransforms` test is failing for Windows.
      
      https://app.circleci.com/pipelines/github/pytorch/audio/10093/workflows/bbe003c4-3dfa-4729-a3e1-c942ab1243d4/jobs/594272
      
      ```
      >       self.assertEqual(items_result, batch_result, rtol=rtol, atol=atol)
      E       AssertionError: Tensor-likes are not close!
      E
      E       Mismatched elements: 28 / 196608 (0.0%)
      E       Greatest absolute difference: 2.0023435354232788e-07 at index (1, 1, 127, 100) (up to 1e-08 allowed)
      E       Greatest relative difference: 0.0005069057444598896 at index (0, 0, 114, 129) (up to 1e-05 allowed)
      ```
      
      The value of atol==1e-08 seems very strict, but all the other batch
      consistency tests are passing.
      
      The violation affects only a very small number of samples, which looks
      suspicious, but I think it is okay to loosen it to `1e-06` for Windows.
      
      `1e-06` is still stricter than the majority of the comparison tests we have.
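
The effect of the change can be checked by hand with the combined tolerance criterion documented for `torch.testing.assert_close`, `|actual - expected| <= atol + rtol * |expected|`. The sample values below are illustrative and are not taken from the failing test. Only the size of the difference (about `2e-07`) follows the log above.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Closeness criterion: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


expected = 1e-04           # illustrative small value
actual = expected + 2e-07  # roughly the greatest absolute difference in the log

# Old tolerance: allowed error is 1e-08 + 1e-05 * 1e-04 = 1.1e-08, so the check fails.
strict = is_close(actual, expected, rtol=1e-05, atol=1e-08)

# New tolerance: allowed error is 1e-06 + 1e-05 * 1e-04 = 1.001e-06, so the check passes.
relaxed = is_close(actual, expected, rtol=1e-05, atol=1e-06)
```

For values this close to zero the relative term contributes almost nothing, which is why `atol` decides the outcome.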
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2305
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35298056
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7d20f408c16cff7d363f4a9462c64e19d1c99f7
  21. 31 Mar, 2022 1 commit
    • Randomize initial phase of sinusoid data in test (#2301) · c6c6b689
      moto authored
      Summary:
      This commit updates the `get_sinusoid` function in the test utility so that,
      when multiple channels are requested, the channels other than the first
      have a randomized initial phase.
      
      This adds some variety to the test data, which should not break the tests.
      Currently `get_sinusoid` returns identical waveforms for all the channels.
      The multi-channel support was added just to mock the input data so that
      it is easy to test features with multi-channel inputs, so tests should not
      expect all the channels to be identical.
      
      When working on numerical parity, it is more useful if the raw waveforms
      are somewhat different.
      
      Image: waveforms generated by `get_sinusoid` after the change. left: 1st channel, right: 2nd channel
      <img width="524" alt="Screen Shot 2022-03-31 at 10 06 17 AM" src="https://user-images.githubusercontent.com/855818/161111163-1ea58ff6-51ee-4e37-bcd6-411041dd2603.png">
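
The idea can be sketched without PyTorch as follows. This is a simplified stand-in for `get_sinusoid`: the function name, parameters, defaults, and the use of Python lists instead of tensors are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional


def sinusoid(
    frequency: float = 300.0,
    sample_rate: int = 16000,
    duration: float = 1.0,
    n_channels: int = 1,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Generate a sine wave per channel; channels after the first get a random initial phase."""
    rng = random.Random(seed)
    num_frames = int(sample_rate * duration)
    waveform: List[List[float]] = []
    for channel in range(n_channels):
        # Keep the first channel at zero phase and randomize the others.
        phase = 0.0 if channel == 0 else rng.uniform(0.0, 2.0 * math.pi)
        waveform.append(
            [math.sin(2.0 * math.pi * frequency * t / sample_rate + phase) for t in range(num_frames)]
        )
    return waveform
```

Passing a fixed `seed` keeps the generated data reproducible across test runs while still making the channels differ from each other.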
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2301
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35291689
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9160d07ccdd1494acb6d41cb07ac434c0676dbfd