1. 01 Feb, 2022 2 commits
    • Move ASR features out of prototype (#2187) · aca5591c
      hwangjeff authored
      Summary:
      Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2187
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D33918092
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
    • Add CTC decoder timesteps (#2184) · d43ce015
      Caroline Chen authored
      Summary:
      Adds a `timesteps` field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur.
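
As a rough illustration of what such a field could hold, the sketch below collapses a frame-level CTC path and records the frame index at which each emitted token first appears. The function name, the blank index of 0, and the choice to report the first frame of a repeated run are assumptions made for this example, not details taken from the decoder itself.

```python
def collapse_with_timesteps(frame_tokens, blank=0):
    """Collapse a frame-level CTC path and keep the frame index of each emitted token."""
    tokens, timesteps = [], []
    prev = blank
    for t, tok in enumerate(frame_tokens):
        # Emit only when a non-blank token starts a new run.
        if tok != blank and tok != prev:
            tokens.append(tok)
            timesteps.append(t)
        prev = tok
    return tokens, timesteps


# Frames:   0  1  2  3  4  5  6  7
path =     [0, 1, 1, 0, 1, 2, 2, 0]
tokens, timesteps = collapse_with_timesteps(path)
print(tokens, timesteps)
```

The two returned lists have the same length, so each token can be paired with the frame where it was emitted.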
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2184
      
      Reviewed By: mthrok
      
      Differential Revision: D33905530
      
      Pulled By: carolineechen
      
      fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
  2. 27 Jan, 2022 2 commits
    • Add no lm support for CTC decoder (#2174) · 4c3fa875
      Caroline Chen authored
      Summary:
      Add support for the CTC lexicon decoder without an LM by adding a no-op language model, `ZeroLM`, that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (it will likely be separated out from kenlm when support for other kinds of LMs is added in the future).
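
The idea of a language model that scores everything as 0 can be sketched as follows. The `start`/`score`/`finish` method names, the `hypothesis_score` helper, and the `lm_weight` parameter are hypothetical and chosen only for this example; the actual `ZeroLM` interface in the decoder may differ.

```python
class ZeroLM:
    """Hypothetical stand-in language model that contributes nothing to the score."""

    def start(self):
        # No state is needed because nothing is modeled.
        return None

    def score(self, state, token_index):
        # Every token gets a score of 0, regardless of history.
        return state, 0.0

    def finish(self, state):
        # The end-of-sentence score is 0 as well.
        return state, 0.0


def hypothesis_score(emission_scores, token_ids, lm, lm_weight=1.0):
    """Combine per-token emission scores with weighted LM scores."""
    state = lm.start()
    total = 0.0
    for emission, token in zip(emission_scores, token_ids):
        state, lm_score = lm.score(state, token)
        total += emission + lm_weight * lm_score
    _, final_score = lm.finish(state)
    return total + lm_weight * final_score


print(hypothesis_score([-1.0, -0.5], [3, 7], ZeroLM(), lm_weight=2.0))
```

With such a model plugged in, the combined score reduces to the sum of the emission scores, whatever the LM weight is.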
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2174
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33798674
      
      Pulled By: carolineechen
      
      fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
    • Add `is_ffmpeg_available` in test (#2170) · 39fe9df6
      moto authored
      Summary:
      Part of https://github.com/pytorch/audio/issues/2164.
      To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable if ffmpeg features are not available,
      this commit adds `is_ffmpeg_available`.
      
      The availability of the features depends on two factors:
      1. If it was enabled at build.
      2. If the ffmpeg libraries are found at runtime.
      
      A simple way (for the OSS workflow) to detect these is to check whether
      `libtorchaudio_ffmpeg` is present and can be loaded without failure.
      
      To facilitate this, this commit changes
      `torchaudio._extension._load_lib` to return a boolean result.
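
A minimal sketch of this pattern, assuming the library is loaded with `ctypes`: a loader that reports success as a boolean, and a test decorator built on top of it. The names `load_lib` and `skip_if_no_ffmpeg` and the library file name are hypothetical; the real `_load_lib` may locate and load the library differently.

```python
import ctypes
import unittest


def load_lib(path):
    """Return True if the shared library at `path` loads, False otherwise."""
    try:
        ctypes.CDLL(path)
    except OSError:
        # Raised when the file is missing or its dependencies cannot be resolved.
        return False
    return True


# Hypothetical file name; the real name and location depend on the build.
_FFMPEG_LOADED = load_lib("libtorchaudio_ffmpeg_hypothetical.so")


def is_ffmpeg_available():
    return _FFMPEG_LOADED


def skip_if_no_ffmpeg(test_item):
    """Skip the decorated test when the library could not be loaded."""
    reason = "ffmpeg features are not available"
    return unittest.skipIf(not is_ffmpeg_available(), reason)(test_item)
```

A failed load covers both factors at once: the library is absent if the feature was disabled at build time, and it fails to load if the ffmpeg libraries it depends on are not found at runtime.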
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2170
      
      Reviewed By: carolineechen
      
      Differential Revision: D33797695
      
      Pulled By: mthrok
      
      fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
  3. 26 Jan, 2022 3 commits
  4. 21 Jan, 2022 1 commit
  5. 20 Jan, 2022 1 commit
  6. 05 Jan, 2022 1 commit
  7. 30 Dec, 2021 2 commits
  8. 29 Dec, 2021 3 commits
  9. 23 Dec, 2021 3 commits
  10. 21 Dec, 2021 1 commit
    • Fix load behavior for 24-bit input (#2084) · 4554d242
      moto authored
      Summary:
      ## Bug description
      
      When 24 bits-per-sample audio is loaded via a file-like object,
      the loaded Tensor is wrong. It is fine if the audio is loaded
      from a local file.
      
      ## The cause of the bug
      
      The core of sox's decoding mechanism is the `sox_read` function,
      one of whose parameters is the maximum number of samples to decode
      from the given buffer.
      
      https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f
      
      The `sox_read` function is called in the callback of what is called
      the `drain` effect, and this callback receives the output buffer and
      its size in bytes. The previous implementation passed this size value
      as the argument of `sox_read` for the maximum number of samples to
      read. Since the buffer size is larger than the number of samples that
      fit in the buffer, the `sox_read` function always consumed the entire
      buffer. (This behavior is not wrong except when the input is
      24 bits-per-sample and a file-like object.)
      
      When the input is read from a file-like object, inside the drain
      callback, new data are fetched via Python's `read` method and
      loaded into a fixed-size memory region. The size of this memory region
      can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`,
      but the default value is 8096.
      
      If the input format is 24 bits-per-sample, the end of the memory region
      does not necessarily correspond to the end of a valid sample.
      When `sox_read` consumes all the data in the buffer region, the data
      at the end introduces some unexpected values.
      This causes the aforementioned bug.
      
      ## Fix
      
      Pass a proper (better estimated) maximum number of decodable samples to
      `sox_read`.
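
The alignment problem can be illustrated with simple arithmetic. The helper below is a hypothetical sketch, not the estimate used in the actual fix: it counts how many whole samples fit in a byte buffer and how many bytes are left over at the end.

```python
def whole_samples_in_buffer(buffer_bytes, bits_per_sample):
    """Return (number of complete samples, leftover bytes) for a byte buffer."""
    bytes_per_sample = bits_per_sample // 8
    num_samples = buffer_bytes // bytes_per_sample
    leftover = buffer_bytes - num_samples * bytes_per_sample
    return num_samples, leftover


# With the buffer size quoted above, 16-bit samples fill the region exactly,
# while 24-bit samples leave a partial sample at the end.
print(whole_samples_in_buffer(8096, 16))
print(whole_samples_in_buffer(8096, 24))
```

Any leftover bytes belong to a sample that continues in the next chunk, which is why treating the byte count as a sample count is harmful only for the 24-bit case.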
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2084
      
      Reviewed By: carolineechen
      
      Differential Revision: D33236947
      
      Pulled By: mthrok
      
      fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
  11. 30 Nov, 2021 1 commit
    • Revise Griffin-Lim transform test to reduce execution time (#2037) · 96b1fa72
      hwangjeff authored
      Summary:
      Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time.
      
      For one of the four tests:
      Before:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 517.35s (0:08:37) =========================
      ```
      
      After:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 104.59s (0:01:44) =========================
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2037
      
      Reviewed By: mthrok
      
      Differential Revision: D32726213
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
  12. 24 Nov, 2021 1 commit
    • Add RNN-T beam search decoder (#2028) · 60a85b50
      hwangjeff authored
      Summary:
      Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2028
      
      Reviewed By: mthrok
      
      Differential Revision: D32627919
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
  13. 23 Nov, 2021 1 commit
    • Temporarily skip threadpool test (#2025) · 05ae795a
      moto authored
      Summary:
      The sox_effects test that uses `concurrent.futures.ThreadPoolExecutor` started failing a couple of days ago. This commit skips the test while the failure is investigated.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2025
      
      Reviewed By: nateanl
      
      Differential Revision: D32615933
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
  14. 22 Nov, 2021 2 commits
    • Relax dtype for MVDR (#2024) · 392a03c8
      Zhaoheng Ni authored
      Summary:
      Allow users to pass `torch.cfloat` dtype input to the MVDR module. It internally converts the spectrogram into `torch.cdouble` and outputs the tensor with the original dtype of the spectrogram.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2024
      
      Reviewed By: carolineechen
      
      Differential Revision: D32594051
      
      Pulled By: nateanl
      
      fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
    • Improve MVDR stability (#2004) · fb2f9538
      Zhaoheng Ni authored
      Summary:
      Division first, multiplication second. This helps avoid the value overflow issue. It also helps the ``stv_evd`` solution pass the gradient check.
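
Why the order of operations matters can be shown with plain Python floats. This is a scalar illustration with made-up values; the MVDR code itself operates on complex tensors, and the function names here are hypothetical.

```python
import math


def multiply_then_divide(a, b, c):
    # The intermediate product can exceed the floating-point range.
    return (a * b) / c


def divide_then_multiply(a, b, c):
    # Dividing first keeps the intermediate value small.
    return (a / c) * b


a = b = c = 1e200
print(multiply_then_divide(a, b, c))
print(divide_then_multiply(a, b, c))
print(math.isinf(multiply_then_divide(a, b, c)))
```

Both expressions are mathematically equal, but only the second keeps every intermediate value representable.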
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2004
      
      Reviewed By: mthrok
      
      Differential Revision: D32539827
      
      Pulled By: nateanl
      
      fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
  15. 18 Nov, 2021 2 commits
  16. 17 Nov, 2021 1 commit
  17. 04 Nov, 2021 2 commits
  18. 03 Nov, 2021 3 commits
  19. 02 Nov, 2021 3 commits
  20. 28 Oct, 2021 1 commit
  21. 27 Oct, 2021 1 commit
  22. 25 Oct, 2021 1 commit
  23. 22 Oct, 2021 1 commit
  24. 21 Oct, 2021 1 commit