1. 15 Feb, 2022 3 commits
    • Improve ffmpeg library discovery (#2204) · 963905e4
      moto authored
      Summary:
      This commit fixes the issue with ffmpeg discovery at build time.
      The original implementation had the following issues:
      
      1. Wrong usage of FindFFMPEG, which caused mixture of ffmpeg libraries from system directory and user directory.
      2. The optional `FFMPEG_ROOT` variable was not set within cmake.
      
      Issue 1 is problematic when a user does not have permission to
      modify the system environment. For example, if an old version of ffmpeg is
      installed in a directory managed by the system (such as `/usr/local/lib`),
      there is no way to point the build at a directory where the user has
      installed a supported version of ffmpeg.
      
      This commit changes the behavior to first search for the libraries
      under the `FFMPEG_ROOT` environment variable, and only then fall back to
      the original behavior of searching the custom paths along with the
      system default paths.
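      The intended search order can be pictured with a small Python-style sketch. This is only an illustration of the order described above, not the actual CMake logic; the directory names and the helper name are examples:
      
      ```python
      import os
      from pathlib import Path
      
      def find_ffmpeg_libs(names=("avutil", "avcodec", "avformat", "avfilter", "avdevice")):
          """Illustrative search order: FFMPEG_ROOT first, then default locations."""
          search_dirs = []
          ffmpeg_root = os.environ.get("FFMPEG_ROOT")
          if ffmpeg_root:
              # 1. A user-specified installation takes precedence.
              search_dirs.append(Path(ffmpeg_root) / "lib")
          # 2. Fall back to the custom paths plus the system default paths.
          search_dirs += [Path("/usr/local/lib"), Path("/usr/lib")]
          found = {}
          for name in names:
              for directory in search_dirs:
                  matches = sorted(directory.glob(f"lib{name}.*"))
                  if matches:
                      found[name] = matches[0]
                      break
          return found
      ```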
      
      This commit also removes support for `libavresample`, which is deprecated in
      ffmpeg 4 and removed in ffmpeg 5.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2204
      
      Reviewed By: carolineechen
      
      Differential Revision: D34225769
      
      Pulled By: mthrok
      
      fbshipit-source-id: 95b0bfaaef31e2e69e6df29f789010f48a48210b
    • Update context building to not delay the inference (#2213) · 8e3c6144
      moto authored
      Summary:
      Updates the context cacher so that the fetched audio chunk is used for inference immediately.
      
      https://github.com/pytorch/audio/pull/2202#discussion_r802838174
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2213
      
      Reviewed By: hwangjeff
      
      Differential Revision: D34235230
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e
    • Adjust Conformer args (#2223) · 411b5dcf
      hwangjeff authored
      Summary:
      Reorders and renames Conformer's initializer args to be more consistent with Emformer's.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2223
      
      Reviewed By: mthrok
      
      Differential Revision: D34226177
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 111c7ff27841aeac302ea5f6f7b50cc72c570829
  2. 11 Feb, 2022 7 commits
  3. 10 Feb, 2022 1 commit
  4. 09 Feb, 2022 2 commits
    • Clean up Emformer (#2207) · 87d7694d
      hwangjeff authored
      Summary:
      - Make `segment_length` a required argument rather than an optional one, to force users to consciously choose input segment lengths for their use cases.
      - Clarify expected input shapes in API documentation.
      - Adjust `infer` tests to reflect expected usage.
      - Add an assertion on the input shape for `infer` (see the usage sketch after this list).
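      A minimal usage sketch of the updated API, assuming `Emformer` is importable from `torchaudio.models` as in the codebase at the time of this change; the dimensions and argument values below are illustrative only:
      
      ```python
      import torch
      from torchaudio.models import Emformer
      
      # `segment_length` is now a required constructor argument.
      emformer = Emformer(
          input_dim=80,
          num_heads=4,
          ffn_dim=512,
          num_layers=2,
          segment_length=16,
          right_context_length=4,
      )
      
      # For `infer`, each chunk is expected to cover one segment plus the right context:
      # shape (batch, segment_length + right_context_length, input_dim).
      chunk = torch.rand(1, 16 + 4, 80)
      lengths = torch.tensor([16 + 4])
      output, output_lengths, states = emformer.infer(chunk, lengths)
      ```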
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2207
      
      Reviewed By: mthrok
      
      Differential Revision: D34101205
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
    • Fix librosa calls (#2208) · e5d567c9
      hwangjeff authored
      Summary:
      Yesterday's release of librosa 0.9.0 made args keyword-only and changed the default padding from "reflect" to "zero" for some functions. This PR adjusts call sites in our tutorials and tests accordingly.
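      As a hedged illustration of the keyword-only change, using `librosa.feature.melspectrogram` as a representative call (the full list of affected functions is in librosa's 0.9.0 release notes, not in this commit):
      
      ```python
      import numpy as np
      import librosa
      
      waveform = np.random.randn(16000).astype(np.float32)
      
      # Before librosa 0.9.0, positional arguments were accepted:
      #     librosa.feature.melspectrogram(waveform, 16000, n_fft=400)
      # With 0.9.0, the audio and sample rate must be passed by keyword:
      mel = librosa.feature.melspectrogram(y=waveform, sr=16000, n_fft=400, hop_length=160)
      ```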
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2208
      
      Reviewed By: mthrok
      
      Differential Revision: D34099793
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
  5. 04 Feb, 2022 1 commit
  6. 03 Feb, 2022 3 commits
  7. 02 Feb, 2022 5 commits
  8. 01 Feb, 2022 6 commits
  9. 31 Jan, 2022 1 commit
  10. 27 Jan, 2022 4 commits
    • Remove invalid token blanking logic from RNN-T decoder (#2180) · ed6256a2
      hwangjeff authored
      Summary:
      This PR removes logic in `RNNTBeamSearch` that blanks out joiner output values corresponding to special tokens, e.g. `<unk>` and `<eos>` (a sketch of that step follows the list below), for the following reasons:
      - Provided that the model was configured and trained properly, it shouldn't be necessary, e.g. the model would naturally produce low probabilities for special tokens if they don't exist in the training set.
      - For our pre-trained LibriSpeech training pipeline, the removal of the logic doesn't affect evaluation WER on any of the dev/test splits.
      - The existing logic doesn't generalize to arbitrary token vocabularies.
      - Internally, it seems to have been acknowledged that this logic was introduced to compensate for quirks in other parts of the modeling infra.
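      For context, the removed step amounted to masking the joiner's scores for special tokens before hypotheses are extended. A minimal sketch of that kind of masking; the tensor names, shapes, and token indices here are hypothetical, not the actual `RNNTBeamSearch` internals:
      
      ```python
      import torch
      
      # Hypothetical joiner output: (num_hypotheses, vocab_size) log-probabilities.
      joiner_logprobs = torch.randn(4, 1024).log_softmax(dim=-1)
      
      # Hypothetical indices of special tokens such as <unk> and <eos>.
      special_token_indices = torch.tensor([1, 2])
      
      # Blanking forces the special-token scores to -inf so beam search never emits them.
      joiner_logprobs[:, special_token_indices] = float("-inf")
      ```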
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2180
      
      Reviewed By: carolineechen, mthrok
      
      Differential Revision: D33822683
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: e7047e294f71c732c77ae0c20fec60412f26f05a
    • Add no lm support for CTC decoder (#2174) · 4c3fa875
      Caroline Chen authored
      Summary:
      Add support for using the CTC lexicon decoder without a language model by adding a non-language-model `ZeroLM` that returns a score of 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the KenLM decoder for now (it will likely be separated out from KenLM when support for other kinds of LMs is added in the future).
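      Conceptually, `ZeroLM` is a language model whose score is always zero, so beam search is driven only by the acoustic/CTC scores. A minimal Python sketch of the idea; the actual implementation lives in the C++ decoder, and the `start`/`score`/`finish` interface below is assumed purely for illustration:
      
      ```python
      class ZeroLMSketch:
          """A stand-in language model that contributes nothing to the beam score."""
      
          def start(self):
              # Return an (empty) LM state for the beginning of a hypothesis.
              return None
      
          def score(self, state, token):
              # Every token gets score 0, so the LM never changes the ranking.
              return None, 0.0
      
          def finish(self, state):
              # No end-of-sentence bonus or penalty either.
              return None, 0.0
      ```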
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2174
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33798674
      
      Pulled By: carolineechen
      
      fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
    • Refactor RNNT factory function to support num_symbols argument (#2178) · 2cb87c6b
      Zhaoheng Ni authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178
      
      Reviewed By: mthrok
      
      Differential Revision: D33797649
      
      Pulled By: nateanl
      
      fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603
    • Add `is_ffmpeg_available` in test (#2170) · 39fe9df6
      moto authored
      Summary:
      Part of https://github.com/pytorch/audio/issues/2164.
      This commit adds `is_ffmpeg_available` so that the tests introduced in
      https://github.com/pytorch/audio/issues/2164 can be skipped when the ffmpeg features are not available.
      
      The availability of the features depends on two factors:
      1. Whether the support was enabled at build time.
      2. Whether the ffmpeg libraries are found at runtime.
      
      A simple way (for the OSS workflow) to detect both is to check whether
      `libtorchaudio_ffmpeg` is present and can be loaded without failure.
      To facilitate this, this commit changes
      `torchaudio._extension._load_lib` to return a boolean result (see the sketch below).
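      A rough, self-contained sketch of the resulting check. The function names follow the commit description, but the internals (using `ctypes` and a `lib/` directory) are assumptions for illustration, not the actual extension-loading code:
      
      ```python
      import ctypes
      from pathlib import Path
      
      def _load_lib(lib: str) -> bool:
          # Hypothetical stand-in for torchaudio._extension._load_lib after this change:
          # return False when the library file is absent instead of raising.
          path = Path(__file__).parent / "lib" / f"{lib}.so"   # illustrative location
          if not path.exists():
              return False
          ctypes.CDLL(str(path))
          return True
      
      def is_ffmpeg_available() -> bool:
          """True only if ffmpeg support was built and its libraries load at runtime."""
          try:
              return _load_lib("libtorchaudio_ffmpeg")
          except OSError:
              # The file exists but its ffmpeg dependencies cannot be resolved at runtime.
              return False
      ```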
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2170
      
      Reviewed By: carolineechen
      
      Differential Revision: D33797695
      
      Pulled By: mthrok
      
      fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
  11. 26 Jan, 2022 6 commits
  12. 24 Jan, 2022 1 commit
    • allow Tacotron2 decoding batch_size 1 examples (#2156) · cea1dc66
      popcornell authored
      Summary:
      It seems to me that the current Tacotron2 model does not allow decoding batch-size-1 examples;
      e.g., the following code fails. I may have a fix for that.
      
      ```python
      import torch
      
      # NOTE: _Decoder is a private class; the import path below assumes
      # torchaudio's tacotron2 module layout.
      from torchaudio.models.tacotron2 import _Decoder
      
      if __name__ == "__main__":
          max_length = 400
          n_batch = 1
          hdim = 32
          dec = _Decoder(
              encoder_embedding_dim=hdim,
              n_mels=hdim,
              n_frames_per_step=1,
              decoder_rnn_dim=1024,
              decoder_max_step=2000,
              decoder_dropout=0.1,
              decoder_early_stopping=True,
              attention_rnn_dim=1024,
              attention_hidden_dim=128,
              attention_location_n_filter=32,
              attention_location_kernel_size=31,
              attention_dropout=0.1,
              prenet_dim=256,
              gate_threshold=0.5,
          )
      
          inp = torch.rand((n_batch, max_length, hdim))
          lengths = torch.tensor([max_length]).expand(n_batch).to(inp.device, inp.dtype)
          dec(inp, torch.rand((n_batch, hdim, max_length)), lengths)[0]
          dec.infer(inp, lengths)[0]
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2156
      
      Reviewed By: carolineechen
      
      Differential Revision: D33744006
      
      Pulled By: nateanl
      
      fbshipit-source-id: 7d04726dfe7e45951ab0007f22f10f90f26379a7