1. 22 Mar, 2022 1 commit
    • Add download utility specialized for torchaudio (#2283) · 64b98521
      moto authored
      Summary:
      In recent updates, torchaudio added features that download assets/models from
      download.pytorch.org/torchaudio.
      
      To reduce code duplication, the implementation uses utilities from
      ``torch.hub``, but some patterns are still repeated when implementing
      the fetch mechanism, notably cache and local file path handling.
      
      This commit introduces a utility function that handles
      download/cache/local path management that can be used for
      fetching pre-trained model data.
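
As a rough illustration of the download/cache/local-path pattern described above, the sketch below uses only the standard library. The function name `fetch_asset`, its parameters, and the `.part` temporary-file convention are assumptions made for this example; the actual utility builds on ``torch.hub`` and may look quite different.

```python
import os
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the commit summary above.
BASE_URL = "https://download.pytorch.org/torchaudio"


def fetch_asset(key, cache_dir, download=urlretrieve):
    """Return a local path for `key`, downloading it only on a cache miss.

    `fetch_asset` and its parameters are hypothetical names for this sketch.
    """
    local_path = Path(cache_dir) / key
    if local_path.exists():
        return str(local_path)  # cache hit: no network access needed
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name first so that an interrupted transfer
    # never leaves a truncated file at the final cache location.
    tmp_path = local_path.with_name(local_path.name + ".part")
    download(f"{BASE_URL}/{key}", str(tmp_path))
    os.replace(tmp_path, local_path)
    return str(local_path)
```

Passing the downloader as an argument keeps the caching logic testable without network access.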
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2283
      
      Reviewed By: carolineechen
      
      Differential Revision: D35050469
      
      Pulled By: mthrok
      
      fbshipit-source-id: 219dd806f9a96c54d2d31e981c1bbe282772702b
  2. 04 Mar, 2022 2 commits
    • Flush and reset internal state after seek (#2264) · 7e1afc40
      moto authored
      Summary:
      This commit adds the following behavior to `seek` so that `seek`
      works after a frame is decoded.
      
      1. Flush the decoder buffer.
      2. Recreate filter graphs (so that internal state is re-initialized).
      3. Discard the buffered tensor (decoded chunks).
      
      It also disallows negative values for the seek timestamp.
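
The three reset steps and the negative-timestamp check can be pictured with a toy model. `ToyStreamDecoder` and all of its attributes are hypothetical stand-ins; the real implementation operates on FFmpeg decoder and filter-graph objects.

```python
class ToyStreamDecoder:
    """Toy stand-in for a streaming decoder; every name here is hypothetical."""

    def __init__(self):
        self.decoder_buffer = []                # packets held inside the decoder
        self.filter_state = {"frames_seen": 0}  # internal filter-graph state
        self.chunks = []                        # decoded chunks not yet consumed
        self.position = 0.0

    def decode(self, packet):
        self.decoder_buffer.append(packet)
        self.filter_state["frames_seen"] += 1
        self.chunks.append(f"decoded:{packet}")

    def seek(self, timestamp):
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative.")
        self.position = float(timestamp)
        self.decoder_buffer.clear()             # 1. flush the decoder buffer
        self.filter_state = {"frames_seen": 0}  # 2. recreate the filter graph
        self.chunks.clear()                     # 3. discard buffered chunks
```

Without these resets, data decoded before the seek would leak into the output produced after it.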
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2264
      
      Reviewed By: carolineechen
      
      Differential Revision: D34497826
      
      Pulled By: mthrok
      
      fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
    • Make Streamer fail if an invalid option is provided (#2263) · 04875eef
      moto authored
      Summary:
      The `torchaudio.prototype.io.Streamer` class takes context-dependent options
      as the `option` argument, in the form of a mapping of strings.
      
      Currently there is no check of whether the provided options are valid for
      the given input.
      
      This commit adds the check and raises an error if an invalid option is given.
      
      This is analogous to `ffmpeg` command error handling.
      
      ```
      $ ffmpeg -foo
      ...
      Unrecognized option 'foo'.
      ```
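
One plausible way to implement such a check is to consume the options that the input understands and treat anything left over as an error. The sketch below assumes this approach; `open_with_options` and the set of supported keys are hypothetical and not part of the torchaudio API.

```python
def open_with_options(options, supported):
    """Apply the supported options and reject any that are left over.

    `options` maps option names to string values; `supported` is the set of
    names that the (hypothetical) input format understands.
    """
    remaining = dict(options)
    applied = {key: remaining.pop(key) for key in list(remaining) if key in supported}
    if remaining:
        unknown = ", ".join(sorted(remaining))
        raise RuntimeError(f"Unrecognized option(s): {unknown}")
    return applied
```

For example, `open_with_options({"foo": "1"}, {"framerate"})` raises `RuntimeError`, mirroring the `ffmpeg -foo` behavior shown above.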
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2263
      
      Reviewed By: hwangjeff
      
      Differential Revision: D34495111
      
      Pulled By: mthrok
      
      fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
  3. 26 Feb, 2022 2 commits
  4. 25 Feb, 2022 5 commits
  5. 24 Feb, 2022 1 commit
  6. 17 Feb, 2022 2 commits
    • Refactor batch consistency test in functional (#2245) · 9cf59e75
      Zhaoheng Ni authored
      Summary:
      In batch_consistency tests, the `assert_batch_consistency` method accepts only a single Tensor, which does not suit some functions. For example, `lfilter` and `filtfilt` require three Tensors as arguments, so their tests do not use `assert_batch_consistency`.
      This PR refactors the test to accept a tuple of Tensors that have a `batch` dimension. Other arguments, such as `int` or `str` values, are passed as `*args` after the tuple.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2245
      
      Reviewed By: mthrok
      
      Differential Revision: D34273035
      
      Pulled By: nateanl
      
      fbshipit-source-id: 0096b4f062fb4e983818e5374bed6efc7b15b056
    • Add unit tests for PyTorch Lightning modules of emformer_rnnt recipes (#2240) · b5d77b15
      Zhaoheng Ni authored
      Summary:
      - Refactor the current `LibriSpeechRNNTModule`'s unit test.
      - Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule`
      - Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test.
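
The summary does not say why the lambda had to go. A common reason for this kind of replacement is that lambdas cannot be pickled while `functools.partial` objects wrapping importable functions can; the sketch below illustrates that general Python behavior under this assumption, with hypothetical names throughout.

```python
import operator
import pickle
from functools import partial

# Two callables with the same behavior: multiply the argument by 2.
scale_lambda = lambda x: operator.mul(2, x)
scale_partial = partial(operator.mul, 2)


def is_picklable(obj):
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

Here `is_picklable(scale_partial)` is true and `is_picklable(scale_lambda)` is false, even though both callables compute the same result.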
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2240
      
      Reviewed By: mthrok
      
      Differential Revision: D34285195
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
  7. 16 Feb, 2022 2 commits
    • Refactor torchscript consistency test in functional (#2246) · 87d79889
      Zhaoheng Ni authored
      Summary:
      In torchscript_consistency tests, the `func` in each test method accepts only one `tensor` as its argument; the other arguments of the `F.xyz` function have to be defined inside `func`. If `F.xyz` has no `Tensor` argument, the tests pass a `dummy` tensor that is not used anywhere. This PR refactors ``_assert_consistency`` and ``_assert_consistency_complex`` to accept a tuple of inputs instead of a single `tensor`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2246
      
      Reviewed By: carolineechen
      
      Differential Revision: D34273057
      
      Pulled By: nateanl
      
      fbshipit-source-id: a3900edb3b2c58638e513e1490279d771ebc3d0b
    • Add complex dtype support in functional autograd test (#2244) · eeba91dc
      Zhaoheng Ni authored
      Summary:
      In autograd tests, to guarantee precision, Tensors are converted to `torch.float64` if they are real. However, complex dtypes are not considered. This PR adds `self.complex_dtype` support for the inputs.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2244
      
      Reviewed By: mthrok
      
      Differential Revision: D34272998
      
      Pulled By: nateanl
      
      fbshipit-source-id: e8698a74d7b8d99ee0fcb5f5cb5f2ffc8c80b9b5
  8. 15 Feb, 2022 1 commit
  9. 11 Feb, 2022 2 commits
  10. 09 Feb, 2022 2 commits
    • Clean up Emformer (#2207) · 87d7694d
      hwangjeff authored
      Summary:
      - Make `segment_length` a required argument rather than optional argument to force users to consciously choose input segment lengths for their use cases.
      - Clarify expected input shapes in API documentation.
      - Adjust `infer` tests to reflect expected usage.
      - Add assertion for input shape for `infer`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2207
      
      Reviewed By: mthrok
      
      Differential Revision: D34101205
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
    • Fix librosa calls (#2208) · e5d567c9
      hwangjeff authored
      Summary:
      Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2208
      
      Reviewed By: mthrok
      
      Differential Revision: D34099793
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
  11. 02 Feb, 2022 1 commit
  12. 01 Feb, 2022 2 commits
    • Move ASR features out of prototype (#2187) · aca5591c
      hwangjeff authored
      Summary:
      Moves ASR features out of `torchaudio.prototype`. Specifically, merges contents of `torchaudio.prototype.models` into `torchaudio.models` and contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines` and updates refs, tests, and docs accordingly.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2187
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D33918092
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
    • Add CTC decoder timesteps (#2184) · d43ce015
      Caroline Chen authored
      Summary:
      Add a `timesteps` field to CTC decoder hypotheses, corresponding to the time steps at which non-blank tokens occur.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2184
      
      Reviewed By: mthrok
      
      Differential Revision: D33905530
      
      Pulled By: carolineechen
      
      fbshipit-source-id: c575d25655fcf252754ee3c2447949a4c059461a
  13. 27 Jan, 2022 2 commits
    • Add no lm support for CTC decoder (#2174) · 4c3fa875
      Caroline Chen authored
      Summary:
      Add support for running the CTC lexicon decoder without a language model by adding `ZeroLM`, a dummy language model that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (it will likely be separated out from kenlm when support for other kinds of LMs is added in the future).
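
The idea of a language model that scores everything as 0 can be sketched as follows. The `start`/`score`/`finish` interface and the `rescore` helper are assumptions made for illustration and are not the decoder's actual API.

```python
class ZeroLM:
    """Language-model stand-in that contributes nothing to the decoding score."""

    def start(self):
        return ()  # a single, empty LM state

    def score(self, state, token_index):
        return state, 0.0  # state is unchanged, score is always zero

    def finish(self, state):
        return state, 0.0


def rescore(acoustic_score, tokens, lm, lm_weight=1.0):
    """Combine an acoustic score with weighted LM scores (hypothetical helper)."""
    state = lm.start()
    total = acoustic_score
    for token in tokens:
        state, lm_score = lm.score(state, token)
        total += lm_weight * lm_score
    _, final_score = lm.finish(state)
    return total + lm_weight * final_score
```

Because every LM score is zero, `rescore(-3.5, [4, 7, 2], ZeroLM(), lm_weight=2.0)` returns the acoustic score unchanged, so the decoder code path needs no special case for the no-LM setting.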
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2174
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33798674
      
      Pulled By: carolineechen
      
      fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
    • Add `is_ffmpeg_available` in test (#2170) · 39fe9df6
      moto authored
      Summary:
      Part of https://github.com/pytorch/audio/issues/2164.
      To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable if ffmpeg features are not available,
      this commit adds `is_ffmpeg_available`.
      
      The availability of the features depends on two factors:
      1. Whether it was enabled at build time.
      2. Whether the ffmpeg libraries are found at runtime.
      
      A simple way (for the OSS workflow) to detect these is to check whether
      `libtorchaudio_ffmpeg` is present and can be loaded without a failure.
      
      To facilitate this, this commit changes
      `torchaudio._extension._load_lib` to return a boolean result.
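
The load-and-report pattern can be sketched with `ctypes` as a stand-in loader. The helper `load_lib`, the library file name, and the `skip_if_no_ffmpeg` decorator are assumptions made for this example; the real `_load_lib` locates and loads the extension through torchaudio's own machinery.

```python
import ctypes
import unittest


def load_lib(path):
    """Return True if the shared library at `path` loads, False otherwise."""
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


def is_ffmpeg_available(lib_path="libtorchaudio_ffmpeg.so"):
    # The file name is a guess for illustration purposes only.
    return load_lib(lib_path)


# Hypothetical decorator that skips tests when the features are unavailable.
skip_if_no_ffmpeg = unittest.skipIf(
    not is_ffmpeg_available(), "ffmpeg features are not available"
)


@skip_if_no_ffmpeg
class StreamerTest(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(is_ffmpeg_available())
```

A single load attempt covers both factors listed above: it fails if the extension was not built, and it also fails if the ffmpeg libraries it links against cannot be found at runtime.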
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2170
      
      Reviewed By: carolineechen
      
      Differential Revision: D33797695
      
      Pulled By: mthrok
      
      fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
  14. 26 Jan, 2022 3 commits
  15. 21 Jan, 2022 1 commit
  16. 20 Jan, 2022 1 commit
  17. 05 Jan, 2022 1 commit
  18. 30 Dec, 2021 2 commits
  19. 29 Dec, 2021 3 commits
  20. 23 Dec, 2021 3 commits
  21. 21 Dec, 2021 1 commit
    • Fix load behavior for 24-bit input (#2084) · 4554d242
      moto authored
      Summary:
      ## Bug description
      
      When 24 bits-per-sample audio is loaded via a file-like object,
      the loaded Tensor is wrong. It is fine if the audio is loaded
      from a local file.
      
      ## The cause of the bug
      
      The core of sox's decoding mechanism is the `sox_read` function,
      one of whose parameters is the maximum number of samples to decode
      from the given buffer.
      
      https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f
      
      The `sox_read` function is called in the callback of what is called the
      `drain` effect, and this callback receives the output buffer and its size
      in bytes. The previous implementation passed this size value as
      the argument of `sox_read` for the maximum number of samples to
      read. Since the buffer size is larger than the number of samples that fit in
      the buffer, `sox_read` always consumed the entire
      buffer. (This behavior is not wrong except when the input is
      24 bits-per-sample and a file-like object.)
      
      When the input is read from a file-like object, new data are fetched
      inside the drain callback via Python's `read` method and
      loaded into a fixed-size memory region. The size of this memory region
      can be adjusted via `torchaudio.utils.sox_utils.set_buffer_size`,
      but the default value is 8096.
      
      If the input format is 24 bits-per-sample, the end of the memory region
      does not necessarily correspond to the end of a valid sample.
      When `sox_read` consumes all the data in the buffer region, the data
      at the end introduce some unexpected values.
      This causes the aforementioned bug.
      
      ## Fix
      
      Pass proper (better estimated) maximum number of samples decodable to
      `sox_read`.
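
The alignment problem is easy to see with a little arithmetic. The sketch below is illustrative only and is not the actual patch: `complete_samples` is a hypothetical helper that reports how many whole samples fit in a byte buffer and how many bytes are left over.

```python
def complete_samples(buffer_bytes, bits_per_sample):
    """Return (whole samples that fit, leftover bytes) for a byte buffer."""
    bytes_per_sample = bits_per_sample // 8
    return divmod(buffer_bytes, bytes_per_sample)


for bits in (8, 16, 24, 32):
    samples, leftover = complete_samples(8096, bits)
    print(f"{bits:2d}-bit: {samples} samples, {leftover} leftover byte(s)")
```

With the default size of 8096 bytes, the 8-, 16- and 32-bit cases divide evenly (8096, 4048 and 2024 samples), whereas the 24-bit case yields 2698 whole samples plus 2 leftover bytes that do not form a valid sample.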
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2084
      
      Reviewed By: carolineechen
      
      Differential Revision: D33236947
      
      Pulled By: mthrok
      
      fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4