1. 28 Dec, 2021 1 commit
  2. 23 Dec, 2021 3 commits
  3. 24 Nov, 2021 1 commit
    • Add RNN-T beam search decoder (#2028) · 60a85b50
      hwangjeff authored
      Summary:
      Adds beam search decoder for RNN-T implementation ``torchaudio.prototype.RNNT`` that is TorchScript-able and supports both streaming and non-streaming inference.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2028
      
      Reviewed By: mthrok
      
      Differential Revision: D32627919
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
  4. 23 Nov, 2021 1 commit
  5. 18 Nov, 2021 1 commit
    • Add Emformer RNN-T model (#2003) · 78ce7010
      hwangjeff authored
      Summary:
      Adds streaming-capable recurrent neural network transducer (RNN-T) model that uses Emformer for its transcription network. Includes two factory functions — one that allows for building a custom model, and one that builds a preconfigured base model.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2003
      
      Reviewed By: nateanl
      
      Differential Revision: D32440879
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 601cb1de368427f25e3b7d120e185960595d2360
  6. 10 Nov, 2021 1 commit
  7. 05 Nov, 2021 4 commits
  8. 04 Nov, 2021 5 commits
  9. 03 Nov, 2021 1 commit
  10. 02 Nov, 2021 3 commits
  11. 29 Oct, 2021 1 commit
  12. 28 Oct, 2021 1 commit
  13. 27 Oct, 2021 2 commits
  14. 26 Oct, 2021 1 commit
  15. 25 Oct, 2021 1 commit
  16. 18 Oct, 2021 2 commits
  17. 16 Oct, 2021 1 commit
  18. 15 Oct, 2021 5 commits
  19. 08 Oct, 2021 2 commits
  20. 07 Oct, 2021 3 commits
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize the models but are not part of the architecture, and `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update in release 0.10.
      
      Summary of BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
          - Added dropout parameters.
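
The effect of making `aux_num_out` optional can be sketched with a toy factory. This is a minimal illustration and not torchaudio's implementation: `ToyModel`, `toy_base`, `encoder_dropout`, and the encoder width of 768 are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyModel:
    """Hypothetical stand-in for a wav2vec2-style model (not torchaudio's class)."""

    encoder_dropout: float
    aux_num_out: Optional[int]  # None means: no linear layer for fine-tuning

    @property
    def has_aux_head(self) -> bool:
        return self.aux_num_out is not None

    def output_dim(self, encoder_dim: int = 768) -> int:
        # With the auxiliary head, outputs have `aux_num_out` classes;
        # without it, the model exposes the raw encoder features.
        # 768 is an illustrative default, assumed for this sketch.
        if self.aux_num_out is not None:
            return self.aux_num_out
        return encoder_dim


def toy_base(encoder_dropout: float = 0.1, aux_num_out: Optional[int] = None) -> ToyModel:
    """One factory covers both cases that previously needed separate functions."""
    return ToyModel(encoder_dropout=encoder_dropout, aux_num_out=aux_num_out)


# Pre-training style: omit aux_num_out, so no fine-tuning head is attached.
pretrain_model = toy_base()
# Fine-tuning style: pass the number of output classes.
finetune_model = toy_base(aux_num_out=32)
```

The same call site serves both use cases, which is the reason the separate pre-training and fine-tuning factory functions could be merged.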
    • 60aeb78a
    • Make the core wav2vec2 factory function public (#1829) · 31a69c36
      moto authored
      This commit makes the following changes:
      1. Make the factory function with full customizability public.
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout).
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
      
      ### Why?
      
      While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights keeps the code organization simple, because the concerns are cleanly separated. As mentioned in #1821, in this framework,
        1. Model implementation is responsible for computation logic,
        2. factory functions are responsible for customizability and model construction,
        3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
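
The three responsibilities listed above can be sketched as follows. This is a toy illustration under stated assumptions, not the torchaudio code: `ToyModel`, `toy_model`, `toy_base`, `ToyBundle`, and all of their parameters are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


class ToyModel:
    """Responsibility 1: computation logic only (hypothetical class)."""

    def __init__(self, num_layers: int, dropout: float) -> None:
        self.num_layers = num_layers
        self.dropout = dropout
        self.weights: Dict[str, float] = {}

    def load_state(self, state: Dict[str, float]) -> None:
        # Copy so the model does not share the bundle's dictionary.
        self.weights = dict(state)


def toy_model(num_layers: int, dropout: float) -> ToyModel:
    """Responsibility 2: public factory with full customizability."""
    return ToyModel(num_layers=num_layers, dropout=dropout)


def toy_base(dropout: float = 0.1) -> ToyModel:
    """Architecture-specific shortcut; only non-architecture knobs are exposed."""
    return toy_model(num_layers=12, dropout=dropout)


@dataclass
class ToyBundle:
    """Responsibility 3: build a model, load weights, carry extra information."""

    params: Dict[str, Any]    # arguments forwarded to the public factory
    state: Dict[str, float]   # stand-in for pre-trained weights
    labels: Tuple[str, ...]   # complementary information, e.g. class labels

    def get_model(self) -> ToyModel:
        # Uses only the public factory; no private helper is imported.
        model = toy_model(**self.params)
        model.load_state(self.state)
        return model


bundle = ToyBundle(
    params={"num_layers": 12, "dropout": 0.0},
    state={"w": 1.0},
    labels=("a", "b"),
)
model = bundle.get_model()
```

Because the bundle needs every construction parameter, the fully customizable factory has to be part of the public surface; the architecture-specific function is then a thin wrapper over it.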
      
      This means that factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies it by making the mother factory function public.
      This also clarifies the purpose of having the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. So this commit also adds supplemental parameters to them.