1. 28 Jul, 2023 1 commit
    • Move TorchAudio-Squim models to Beta (#3512) · b7d2d928
      Zhaoheng Ni authored
      Summary:
      The PR moves the `SquimObjective` and `SquimSubjective` models and the corresponding factory functions and pre-trained pipelines out of prototype and into the core directory. They will be included in the next official release.
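
      A minimal usage sketch after the move; the bundle name `SQUIM_OBJECTIVE` and the (STOI, PESQ, SI-SDR) output order follow the usual pipeline convention and are assumptions here, not part of this PR's text:

      ```python
      import torch
      import torchaudio

      # Previously under torchaudio.prototype.pipelines; now in the core namespace.
      bundle = torchaudio.pipelines.SQUIM_OBJECTIVE
      model = bundle.get_model()

      waveform = torch.randn(1, bundle.sample_rate)  # one second of dummy audio
      with torch.inference_mode():
          stoi, pesq, si_sdr = model(waveform)       # reference-free quality estimates
      ```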
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3512
      
      Reviewed By: mthrok
      
      Differential Revision: D47837434
      
      Pulled By: nateanl
      
      fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8
  2. 10 Dec, 2022 1 commit
  3. 08 Dec, 2022 1 commit
    • Follow up on WavLM bundles (#2895) · 41d007b4
      Grigory Sizov authored
      Summary:
      Addressed mthrok's comments in https://github.com/pytorch/audio/pull/2833:
      - Moved the model type from `_params` directly into the bundle definition. For now, I defined the model type as "WavLM" for WavLM bundles and "Wav2Vec2" for everything else. We could also distinguish between different Wav2Vec2 flavours (HuBERT, VoxPopuli, etc.), but at the moment this would not imply any functional differences, so I didn't do it.
      - Expanded the title underline to match the title length
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2895
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D41799875
      
      Pulled By: sgrigory
      
      fbshipit-source-id: 0730d4f91ed60e900643bb74d6cccdd7aa5d7b39
  4. 21 Sep, 2022 1 commit
  5. 15 Sep, 2022 1 commit
  6. 14 Sep, 2022 1 commit
  7. 12 Sep, 2022 1 commit
  8. 21 Apr, 2022 1 commit
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
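
      A sketch of what the change amounts to; the exact field layout below is an assumption for illustration, not copied from the PR:

      ```python
      from typing import List, Tuple
      import torch

      # Before: Hypothesis was a NamedTuple, which PyTorch Lite cannot handle
      # inside containers such as List[Hypothesis].
      # After: a plain tuple type alias, fully supported by TorchScript/Lite.
      Hypothesis = Tuple[List[int], torch.Tensor, List[List[torch.Tensor]], float]
      # (tokens, predictor output, predictor state, score) -- assumed ordering

      def get_tokens(hyp: Hypothesis) -> List[int]:
          # Fields are now accessed by index rather than by attribute name.
          return hyp[0]
      ```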
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  9. 01 Feb, 2022 1 commit
    • Move ASR features out of prototype (#2187) · aca5591c
      hwangjeff authored
      Summary:
      Moves ASR features out of `torchaudio.prototype`. Specifically, merges the contents of `torchaudio.prototype.models` into `torchaudio.models` and the contents of `torchaudio.prototype.pipelines` into `torchaudio.pipelines`, and updates references, tests, and docs accordingly.
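
      A sketch of the resulting import change, assuming the Emformer RNN-T model and its LibriSpeech pipeline were among the features moved (the `num_symbols` value below is illustrative):

      ```python
      # Before: from torchaudio.prototype.models import emformer_rnnt_base
      # Before: from torchaudio.prototype.pipelines import EMFORMER_RNNT_BASE_LIBRISPEECH
      from torchaudio.models import emformer_rnnt_base
      from torchaudio.pipelines import EMFORMER_RNNT_BASE_LIBRISPEECH

      model = emformer_rnnt_base(num_symbols=4097)             # un-trained Emformer RNN-T
      decoder = EMFORMER_RNNT_BASE_LIBRISPEECH.get_decoder()   # pre-trained beam-search decoder
      ```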
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2187
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D33918092
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: f003f289a7e5d7d43f85b7c270b58bdf2ed6344c
  10. 28 Dec, 2021 1 commit
    • Add HuBERT pretrain model to enable training from scratch (#2064) · 37a2555f
      Zhaoheng Ni authored
      Summary:
      - Add three factory functions: `hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`, to enable training the HuBERT model from scratch.
      - Add a `num_classes` argument to the `hubert_pretrain_base` factory function, because the base model goes through two iterations of training: in the first iteration `num_cluster` is 100, and in the second iteration it is 500.
      - The model takes `waveforms`, `labels`, and `lengths` as inputs.
      - The model returns the last-layer transformer embedding, `logit_m`, and `logit_u` as outputs (see the sketch below).
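
      A minimal sketch of the training-from-scratch entry point described above; the tensor shapes and random labels are illustrative assumptions, not real k-means cluster targets:

      ```python
      import torch
      from torchaudio.models import hubert_pretrain_base

      model = hubert_pretrain_base(num_classes=100)   # 100 clusters for the first iteration

      waveforms = torch.randn(2, 16000)               # batch of one-second clips
      lengths = torch.tensor([16000, 16000])          # valid samples per clip
      labels = torch.randint(0, 100, (2, 49))         # frame-level cluster ids (illustrative)

      # Per the commit description, the outputs are the last transformer
      # embedding, `logit_m`, and `logit_u`.
      outputs = model(waveforms, labels, lengths)
      ```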
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2064
      
      Reviewed By: hwangjeff, mthrok
      
      Differential Revision: D33338587
      
      Pulled By: nateanl
      
      fbshipit-source-id: 534bc17c576c5f344043d8ba098204b8da6e630a
  11. 04 Nov, 2021 1 commit
  12. 15 Oct, 2021 2 commits
  13. 08 Oct, 2021 2 commits
  14. 07 Oct, 2021 3 commits
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models which are not part of the architecture; `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update for release 0.10.
      
      A summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
          - Added dropout parameters (see the sketch below).
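
      A sketch of the merged factory function after this change; the dropout keyword names below follow the 0.10 API and are assumptions here:

      ```python
      from torchaudio.models import wav2vec2_base

      # Pre-training architecture: omit aux_num_out, no auxiliary Linear layer.
      pretrain_model = wav2vec2_base()

      # Fine-tuning architecture: pass aux_num_out to append the Linear head.
      asr_model = wav2vec2_base(
          encoder_projection_dropout=0.1,
          encoder_attention_dropout=0.1,
          aux_num_out=29,  # e.g. number of output tokens for CTC (illustrative)
      )
      ```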
    • 60aeb78a
      moto authored
    • Make the core wav2vec2 factory function public (#1829) · 31a69c36
      moto authored
      This commit makes the following changes:
      1. Make the factory function with full customizability public.
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout), as sketched below.
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
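
      A sketch of the relationship between the two kinds of factory functions; the specific parameter names are assumptions for illustration:

      ```python
      from torchaudio.models import wav2vec2_model, wav2vec2_base

      # The fully customizable "mother" factory function (formerly the internal
      # _get_model) is now public:
      #   wav2vec2_model(extractor_mode=..., encoder_embed_dim=..., aux_num_out=..., ...)

      # Architecture-specific functions remain as syntax sugar: the architecture is
      # fixed, but non-architectural knobs such as dropout are now exposed.
      model = wav2vec2_base(encoder_projection_dropout=0.0, encoder_layer_drop=0.05)
      ```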
      
      ### Why?
      
      While adding the pre-trained weight support, I realized that separating the API for model construction from the pre-trained weight support achieves a simple code organization because of the good separation of concerns. As mentioned in #1821, in this framework,
        1. Model implementation is responsible for computation logic,
        2. factory functions are responsible for customizability and model construction,
        3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
      
      This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies that by making the mother factory function public.
      This also clarifies the purpose of having the other factory functions as public API: they are simply syntax sugar for constructing an un-trained model with a specific architecture. So this commit also adds the supplemental parameters to them.
  15. 06 Oct, 2021 2 commits
  16. 05 Oct, 2021 1 commit
  17. 29 Sep, 2021 1 commit
    • Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH` (#1804) · 5c01c25f
      moto authored
      * Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`
      
      In #1783, we split the factory functions of wav2vec2 into ones for pretraining models
      and ones for fine-tuning models (pretraining model + extra Linear module).
      
      I picked the naming scheme `wav2vec2_asr_ARCH` for the factory functions of fine-tuning models,
      but it did not feel right, because the architecture code is more generic.
      Even though the resulting model architecture was used for ASR fine-tuning in the paper,
      it does not have to be used for ASR.
      This became more evident as we added pre-trained parameter support, such as #1799.
      For the weight files, what matters is which task and which dataset they were trained on.
      For the factory functions, the ASR task is not relevant.
      
      Therefore, this PR renames the functions, replacing `_asr_` with `_ft_` (fine-tuning).
      
      Note: Since the new functions are not released yet, this PR itself is not BC-breaking.
  18. 28 Sep, 2021 1 commit
    • Add HuBERT model architectures (#1769) · a7854f33
      moto authored
      This commit adds the following HuBERT model architectures
      
       - `base` (pre-training)
       - `large` (pre-training / fine-tuning)
       - `xlarge` (pre-training / fine-tuning)
      
      Since the internal components are the same as `Wav2Vec2Model`, it reuses the existing modules.
      With these models, it is possible to (see the sketch below):
      - import a pre-trained model published by `fairseq` and TorchScript it.
      - fine-tune an existing model for a downstream task.
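
      A sketch of the first use case; `hubert_base` follows the factory-function convention, and the fairseq weight conversion step below is hypothetical:

      ```python
      import torch
      from torchaudio.models import hubert_base

      model = hubert_base()                 # HuBERT Base architecture (reuses Wav2Vec2Model components)
      # state_dict = convert_fairseq_checkpoint(...)   # hypothetical conversion of fairseq weights
      # model.load_state_dict(state_dict)

      scripted = torch.jit.script(model)    # the architecture is TorchScript-able
      ```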
  19. 24 Sep, 2021 1 commit
    • [BC-Breaking] Split pretraining and finetuning factory functions (#1783) · b2e9f1e4
      moto authored
      * [BC-Breaking] Split pretraining and finetuning factory functions
      
      Previously, the factory functions of wav2vec2 only generated the fine-tuning
      architecture used in the wav2vec2 paper for the ASR task, that is, the
      pre-training architecture plus a Linear module, and they did not
      provide a straightforward way to generate architectures for pre-training.
      
      The goal of the original implementation was to allow inference of
      wav2vec2 in non-Python environments via TorchScript. Now we would like to
      expand it to pre-training/fine-tuning and to the HuBERT model as well.
      
      Therefore, we need factory functions for both pre-training and
      fine-tuning. This commit introduces new, separate factory functions
      for pre-training and fine-tuning.
      
      1. New functions for ASR fine-tuning.
      
      We introduce `wav2vec2_asr_XXX` functions, which generate the architecture
      used for the fine-tuning task in the wav2vec2 paper. *1
      
      2. Re-purpose the old functions.
      
      The existing functions, `wav2vec2_XXX`, now generate the architecture with
      the pre-training modules only (no Linear module).
      
      Note
      *1 This architecture is just one way to define an architecture for fine-tuning,
      and it is not a universal definition. The new `wav2vec2_asr_XXX` functions are
      designed to provide these specific fine-tuning configurations; they are not
      meant to support generic architectures for downstream tasks.
  20. 17 Sep, 2021 1 commit
  21. 23 Aug, 2021 1 commit
  22. 18 Aug, 2021 1 commit
  23. 20 Jul, 2021 1 commit
  24. 03 Jun, 2021 1 commit
    • Update docs (#1550) · 0166a851
      moto authored
      * Use `bibtex` for paper citations.
        * add `override.css` for fixing back reference.
        * wav2vec2
        * wav2letter
        * convtasnet
        * deepspeech
        * rnnt-loss
        * griffinlim
      * Fix broken references in `filtering`.
      * Fix note in soundfile backends.
      * Tweak wav2vec2 example.
      * Removes unused `pytorch_theme.css`
  25. 01 Jun, 2021 1 commit
  26. 27 May, 2021 2 commits
  27. 11 May, 2021 1 commit
  28. 01 Oct, 2020 1 commit
  29. 29 Jul, 2020 1 commit
  30. 28 Apr, 2020 1 commit
    • Add model Wav2Letter (#462) · d678357f
      Tomás Osório authored
      * add wav2letter model
      
      * add unit_test to model
      
      * add docstrings
      
      * add documentation
      
      * fix minor error, change logic on forward
      
      * update padding same with ceil
      
      * add inline typing and minor fixes to docstrings
      
      * remove python2
      
      * add formula to docstrings, change param name
      
      * add test with mfcc, add pytest
      
      * fix bug, update docstrings
      
      * change parameter name
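
      A minimal usage sketch of the added model; the argument values and tensor shapes are illustrative:

      ```python
      import torch
      from torchaudio.models import Wav2Letter

      model = Wav2Letter(num_classes=40, input_type="waveform", num_features=1)

      waveform = torch.randn(1, 1, 16000)   # (batch, num_features, time)
      log_probs = model(waveform)           # (batch, num_classes, frames), log-probabilities per frame
      ```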