  1. 09 Feb, 2022 2 commits
    • Clean up Emformer (#2207) · 87d7694d
      hwangjeff authored
      Summary:
      - Make `segment_length` a required rather than an optional argument, to force users to consciously choose input segment lengths for their use cases (see the sketch below).
      - Clarify expected input shapes in the API documentation.
      - Adjust `infer` tests to reflect expected usage.
      - Add an input-shape assertion to `infer`.
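      
      A minimal usage sketch of the stricter API, assuming the `torchaudio.models.Emformer` constructor and `infer` signature; the parameter values and shapes below are illustrative and not taken from this commit:
      
      ```python
      import torch
      from torchaudio.models import Emformer
      
      emformer = Emformer(
          input_dim=80,
          num_heads=4,
          ffn_dim=512,
          num_layers=2,
          segment_length=16,          # now required; choose it for your use case
          right_context_length=4,
      )
      
      # Streaming inference consumes one segment plus its right context per call:
      # shape (batch, segment_length + right_context_length, input_dim).
      segment = torch.rand(1, 16 + 4, 80)
      lengths = torch.tensor([segment.size(1)])
      output, output_lengths, states = emformer.infer(segment, lengths)
      ```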
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2207
      
      Reviewed By: mthrok
      
      Differential Revision: D34101205
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1d1233d5edee5818d4669b4e47d44559e7ebb304
    • Fix librosa calls (#2208) · e5d567c9
      hwangjeff authored
      Summary:
      Yesterday's release of librosa 0.9.0 made a number of arguments keyword-only and changed the default padding from "reflect" to "zero" for some functions. This PR adjusts the call sites in our tutorials and tests accordingly.
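      
      An illustrative sketch of the keyword-only change (not code from this PR), using `librosa.feature.melspectrogram` as an example of an affected function:
      
      ```python
      import numpy as np
      import librosa
      
      # One second of silence as a stand-in signal.
      y = np.zeros(22050, dtype=np.float32)
      
      # librosa < 0.9.0 accepted positional arguments: librosa.feature.melspectrogram(y, 22050)
      # librosa >= 0.9.0 requires keywords:
      mel = librosa.feature.melspectrogram(y=y, sr=22050)
      ```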
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2208
      
      Reviewed By: mthrok
      
      Differential Revision: D34099793
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
  2. 04 Feb, 2022 1 commit
  3. 03 Feb, 2022 3 commits
  4. 02 Feb, 2022 5 commits
  5. 01 Feb, 2022 6 commits
  6. 31 Jan, 2022 1 commit
  7. 27 Jan, 2022 4 commits
    • Remove invalid token blanking logic from RNN-T decoder (#2180) · ed6256a2
      hwangjeff authored
      Summary:
      This PR removes logic in `RNNTBeamSearch` that blanks out joiner output values corresponding to special tokens, e.g. <unk> and <eos> (the kind of logic sketched below), for the following reasons:
      - Provided that the model is configured and trained properly, the logic shouldn't be necessary; the model naturally assigns low probabilities to special tokens that don't appear in the training set.
      - For our pre-trained LibriSpeech pipeline, removing the logic doesn't affect evaluation WER on any of the dev/test splits.
      - The existing logic doesn't generalize to arbitrary token vocabularies.
      - Internally, it seems to have been acknowledged that this logic was introduced to compensate for quirks in other parts of the modeling infrastructure.
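      
      A conceptual sketch of the kind of logic being removed; this is hypothetical code, not the actual `RNNTBeamSearch` implementation:
      
      ```python
      import torch
      
      def blank_out_special_tokens(joiner_out: torch.Tensor, special_token_ids: list) -> torch.Tensor:
          # Force special tokens to be unselectable by pushing their joiner scores to -inf
          # before the beam-search update; joiner_out has shape (..., num_tokens).
          joiner_out = joiner_out.clone()
          joiner_out[..., special_token_ids] = float("-inf")
          return joiner_out
      ```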
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2180
      
      Reviewed By: carolineechen, mthrok
      
      Differential Revision: D33822683
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: e7047e294f71c732c77ae0c20fec60412f26f05a
    • Add no lm support for CTC decoder (#2174) · 4c3fa875
      Caroline Chen authored
      Summary:
      Add support for using the CTC lexicon decoder without a language model by adding a `ZeroLM` that returns a score of 0 for everything (sketched below). Generalize the decoder class/API a bit to support this, adding it as an option for the KenLM decoder for now (it will likely be separated from KenLM when support for other kinds of LMs is added in the future).
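      
      A hypothetical Python rendering of the `ZeroLM` idea (the concrete implementation lives in the decoder itself, not in this sketch): a "language model" whose score contribution is always 0, so decoding is driven only by the acoustic scores and the lexicon.
      
      ```python
      class ZeroLM:
          # Mirrors the start/score/finish shape of a typical decoder LM interface.
          def start(self, start_with_nothing: bool):
              return None                # no LM state to maintain
      
          def score(self, state, token_index: int):
              return None, 0.0           # (next state, LM score contribution)
      
          def finish(self, state):
              return None, 0.0           # end of sentence contributes nothing either
      ```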
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2174
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33798674
      
      Pulled By: carolineechen
      
      fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
    • Refactor RNNT factory function to support num_symbols argument (#2178) · 2cb87c6b
      Zhaoheng Ni authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2178
      
      Reviewed By: mthrok
      
      Differential Revision: D33797649
      
      Pulled By: nateanl
      
      fbshipit-source-id: 7a8f54294e7b5bd4d343c8e361e747bfd8b5b603
    • Add `is_ffmpeg_available` in test (#2170) · 39fe9df6
      moto authored
      Summary:
      Part of https://github.com/pytorch/audio/issues/2164.
      To make the tests introduced in https://github.com/pytorch/audio/issues/2164 skippable when the ffmpeg features are not available, this commit adds `is_ffmpeg_available`.
      
      The availability of the features depends on two factors:
      1. whether the ffmpeg extension was enabled at build time, and
      2. whether the ffmpeg libraries can be found at runtime.
      
      A simple way (for the OSS workflow) to detect both is to check whether `libtorchaudio_ffmpeg` is present and can be loaded without failure. To facilitate this, this commit also changes `torchaudio._extension._load_lib` to return a boolean result (see the sketch below).
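      
      A sketch under assumptions (not the verbatim torchaudio code) of how the check can be built on top of the boolean-returning `_load_lib`; the library name string and the exception handling here are assumptions:
      
      ```python
      def is_ffmpeg_available_sketch() -> bool:
          from torchaudio._extension import _load_lib  # private helper referenced above
      
          try:
              # False here would mean the extension was not built into this distribution;
              # a load failure would mean the runtime ffmpeg libraries are missing.
              return _load_lib("libtorchaudio_ffmpeg")
          except OSError:
              return False
      ```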
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2170
      
      Reviewed By: carolineechen
      
      Differential Revision: D33797695
      
      Pulled By: mthrok
      
      fbshipit-source-id: 85e767fc06350b8f99de255bc965b8c92b8cfe97
  8. 26 Jan, 2022 6 commits
  9. 24 Jan, 2022 1 commit
    • allow Tacotron2 decoding batch_size 1 examples (#2156) · cea1dc66
      popcornell authored
      Summary:
      It seems to me that the current Tacotron2 model does not allow decoding batch-size-1 examples; e.g., the following code fails. I may have a fix for that.
      
      ```python
      import torch
      # _Decoder is an internal class; this import path assumes torchaudio's tacotron2 module.
      from torchaudio.models.tacotron2 import _Decoder
      
      if __name__ == "__main__":
          max_length = 400
          n_batch = 1  # batch size 1 is what triggers the failure
          hdim = 32
          dec = _Decoder(
              encoder_embedding_dim=hdim,
              n_mels=hdim,
              n_frames_per_step=1,
              decoder_rnn_dim=1024,
              decoder_max_step=2000,
              decoder_dropout=0.1,
              decoder_early_stopping=True,
              attention_rnn_dim=1024,
              attention_hidden_dim=128,
              attention_location_n_filter=32,
              attention_location_kernel_size=31,
              attention_dropout=0.1,
              prenet_dim=256,
              gate_threshold=0.5,
          )
      
          inp = torch.rand((n_batch, max_length, hdim))
          lengths = torch.tensor([max_length]).expand(n_batch).to(inp.device, inp.dtype)
          # Teacher-forced forward pass, then inference; per the report above, this fails when n_batch == 1.
          dec(inp, torch.rand((n_batch, hdim, max_length)), lengths)[0]
          dec.infer(inp, lengths)[0]
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2156
      
      Reviewed By: carolineechen
      
      Differential Revision: D33744006
      
      Pulled By: nateanl
      
      fbshipit-source-id: 7d04726dfe7e45951ab0007f22f10f90f26379a7
  10. 22 Jan, 2022 1 commit
  11. 21 Jan, 2022 3 commits
  12. 20 Jan, 2022 2 commits
  13. 19 Jan, 2022 2 commits
  14. 18 Jan, 2022 1 commit
  15. 14 Jan, 2022 2 commits