1. 23 Dec, 2021 2 commits
  2. 21 Dec, 2021 1 commit
    • Fix load behavior for 24-bit input (#2084) · 4554d242
      moto authored
      Summary:
      ## Bug description
      
      When 24 bits-per-sample audio is loaded via a file-like object,
      the loaded Tensor is wrong. The same audio loads fine from a
      local file.
      
      ## The cause of the bug
      
      The core of sox's decoding mechanism is the `sox_read` function,
      one of whose parameters is the maximum number of samples to decode
      from the given buffer.
      
      https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f
      
      The `sox_read` function is called from the so-called `drain` effect
      callback, which receives an output buffer and its size in bytes.
      The previous implementation passed this byte size to `sox_read` as
      the maximum number of samples to read. Since the byte size is larger
      than the number of samples that fit in the buffer, `sox_read` always
      consumed the entire buffer. (This behavior is harmless except when
      the input is 24 bits-per-sample and comes from a file-like object.)
      
      When the input is read from a file-like object, the drain callback
      fetches new data via Python's `read` method into a fixed-size memory
      region. The size of this region can be adjusted via
      `torchaudio.utils.sox_utils.set_buffer_size`, but the default value is 8096.
      
      If the input format is 24 bits-per-sample, the end of the memory region
      does not necessarily coincide with the end of a valid sample.
      When `sox_read` consumes all the data in the buffer region, the partial
      sample at the end introduces unexpected values.
      This causes the aforementioned bug.
      
      ## Fix
      
      Pass a proper (better-estimated) maximum number of decodable samples to
      `sox_read`.
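
      The arithmetic below is a minimal Python sketch (not the actual C++ fix)
      of why a byte count must be converted to a whole-sample count before
      being passed to `sox_read`; the constants mirror the defaults described above.

      ```
      BUFFER_SIZE_BYTES = 8096  # default, adjustable via sox_utils.set_buffer_size
      BYTES_PER_SAMPLE = 3      # 24 bits per sample

      # Wrong: passing the byte size as the sample count over-reads the buffer.
      max_samples_wrong = BUFFER_SIZE_BYTES                # 8096 "samples"

      # Fix: only as many whole samples as actually fit in the buffer.
      max_samples = BUFFER_SIZE_BYTES // BYTES_PER_SAMPLE  # 2698 whole samples
      leftover = BUFFER_SIZE_BYTES % BYTES_PER_SAMPLE      # 2 trailing bytes of a
                                                           # partial sample; not
                                                           # valid audio data
      print(max_samples, leftover)  # 2698 2
      ```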
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2084
      
      Reviewed By: carolineechen
      
      Differential Revision: D33236947
      
      Pulled By: mthrok
      
      fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
  3. 30 Nov, 2021 1 commit
    • Revise Griffin-Lim transform test to reduce execution time (#2037) · 96b1fa72
      hwangjeff authored
      Summary:
      Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time.
      
      For one of the four tests:
      Before:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 517.35s (0:08:37) =========================
      ```
      
      After:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 104.59s (0:01:44) =========================
      ```
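
      A hedged sketch of the kind of parameter reduction involved (the values here are illustrative, not the PR's exact changes):

      ```
      import torch
      import torchaudio.transforms as T

      # Smaller n_fft and fewer iterations shrink the graph autograd must traverse.
      n_fft = 256
      transform = T.GriffinLim(n_fft=n_fft, n_iter=8)  # default n_iter is 32
      spec = torch.rand(1, n_fft // 2 + 1, 20)         # short power spectrogram
      waveform = transform(spec)
      ```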
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2037
      
      Reviewed By: mthrok
      
      Differential Revision: D32726213
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
  4. 24 Nov, 2021 1 commit
    • Add RNN-T beam search decoder (#2028) · 60a85b50
      hwangjeff authored
      Summary:
      Adds a beam search decoder for the RNN-T implementation ``torchaudio.prototype.RNNT``. The decoder is TorchScript-able and supports both streaming and non-streaming inference.
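
      A rough usage sketch follows; the decoder class name, builder function, blank-token id, and tensor shapes are all illustrative assumptions, not the confirmed prototype API.

      ```
      import torch
      from torchaudio.prototype import RNNTBeamSearch, emformer_rnnt_base

      model = emformer_rnnt_base()                 # an RNN-T model (hypothetical builder)
      decoder = RNNTBeamSearch(model, blank=4096)  # blank token id (illustrative)

      features = torch.rand(1, 128, 80)            # (batch, time, feature)
      length = torch.tensor([128])
      hypotheses = decoder(features, length, beam_width=10)  # non-streaming inference
      ```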
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2028
      
      Reviewed By: mthrok
      
      Differential Revision: D32627919
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
  5. 23 Nov, 2021 1 commit
    • Temporarily skip threadpool test (#2025) · 05ae795a
      moto authored
      Summary:
      The sox_effects test that runs under `concurrent.futures.ThreadPoolExecutor` started failing a few days ago. This commit skips the test while we investigate.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2025
      
      Reviewed By: nateanl
      
      Differential Revision: D32615933
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
  6. 22 Nov, 2021 2 commits
    • Relax dtype for MVDR (#2024) · 392a03c8
      Zhaoheng Ni authored
      Summary:
      Allow users to pass `torch.cfloat` spectrograms to the MVDR module. It internally converts the spectrogram to `torch.cdouble` and outputs a tensor with the original dtype of the spectrogram.
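
      A minimal sketch of the upcast-compute-downcast pattern described above (not the actual MVDR code; the covariance computation is a stand-in):

      ```
      import torch

      def stable_step(specgram: torch.Tensor) -> torch.Tensor:
          orig_dtype = specgram.dtype
          spec = specgram.to(torch.cdouble)  # compute in double precision internally
          psd = torch.einsum("...ct,...et->...ce", spec, spec.conj())  # toy covariance
          return psd.to(orig_dtype)          # restore the caller's dtype

      out = stable_step(torch.randn(4, 6, 100, dtype=torch.cfloat))
      assert out.dtype == torch.cfloat
      ```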
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2024
      
      Reviewed By: carolineechen
      
      Differential Revision: D32594051
      
      Pulled By: nateanl
      
      fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
    • Improve MVDR stability (#2004) · fb2f9538
      Zhaoheng Ni authored
      Summary:
      Divide first, multiply second. This helps avoid value-overflow issues. It also helps the ``stv_evd`` solution pass the gradient check.
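
      A toy float32 illustration of why the order of operations matters (values chosen to overflow; these are not the actual MVDR quantities):

      ```
      import torch

      a = torch.tensor(1e20)  # float32 by default; max finite value is ~3.4e38
      b = torch.tensor(1e20)
      c = torch.tensor(1e20)

      print(a * c / b)  # inf: the intermediate product 1e40 overflows float32
      print(a / b * c)  # 1e20: dividing first keeps intermediates in range
      ```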
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2004
      
      Reviewed By: mthrok
      
      Differential Revision: D32539827
      
      Pulled By: nateanl
      
      fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
  7. 18 Nov, 2021 2 commits
  8. 17 Nov, 2021 1 commit
  9. 04 Nov, 2021 1 commit
  10. 03 Nov, 2021 2 commits
  11. 28 Oct, 2021 1 commit
  12. 22 Oct, 2021 1 commit
  13. 13 Oct, 2021 2 commits
  14. 10 Oct, 2021 1 commit
    • Store n_bits in WaveRNN (#1847) · 9637c6bf
      moto authored
      Move the computation of `#classes -> #bits` to the constructor of WaveRNN and store the result on the instance, so that it can be reused elsewhere.
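
      A minimal sketch of the change (hypothetical class and attribute names, not the actual torchaudio implementation):

      ```
      import math

      class WaveRNNSketch:
          def __init__(self, n_classes: int = 256):
              self.n_classes = n_classes
              # Derive and store the bit depth once; e.g. 256 classes -> 8 bits.
              self.n_bits = int(math.log2(n_classes))
      ```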
  15. 08 Oct, 2021 1 commit
  16. 07 Oct, 2021 2 commits
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models which are not part of the architecture; `aux_num_out` falls into this category, so separate functions are no longer necessary. This concludes the wav2vec2/HuBERT API update for release 0.10.
      
      A summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params: float>, aux_num_out: Optional[int] = None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning (see the sketch below).
          - Added dropout parameters.
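
      A hedged usage sketch of the merged factory function (assuming the 0.10 module path `torchaudio.models`; the output size 29 is illustrative):

      ```
      import torchaudio

      # Without aux_num_out: a pre-training-style model, no Linear head.
      model = torchaudio.models.wav2vec2_base()

      # With aux_num_out: the same architecture plus a Linear head for ASR.
      asr_model = torchaudio.models.wav2vec2_base(aux_num_out=29)
      ```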
    • Make the core wav2vec2 factory function public (#1829) · 31a69c36
      moto authored
      This commit makes the following changes:
      1. Make the factory function with full customizability public,
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout); see the sketch at the end of this summary.
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
      
      ### Why?
      
      While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights yields a simple code organization, thanks to the clean separation of concerns. As mentioned in #1821, in this framework,
        1. Model implementation is responsible for computation logic,
        2. factory functions are responsible for customizability and model construction,
        3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
      
      This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies that by making the mother factory function public.
      It also clarifies the purpose of keeping the other factory functions as public API: they are just syntactic sugar for constructing an untrained model with a specific architecture. Accordingly, this commit also adds supplemental parameters to them.
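
      A hedged sketch of item 2 above (parameter names are taken from this summary; the values are illustrative):

      ```
      import torchaudio

      # Architecture-specific factory with non-architectural knobs exposed.
      model = torchaudio.models.wav2vec2_base(
          encoder_projection_dropout=0.0,
          encoder_attention_dropout=0.0,
          encoder_ff_interm_dropout=0.0,
      )
      ```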
  17. 06 Oct, 2021 3 commits
  18. 05 Oct, 2021 3 commits
  19. 01 Oct, 2021 1 commit
    • Fix HuBERT xlarge configuration and test (#1811) · 13b2349a
      moto authored
      1. Fix the HuBERT xlarge model config.
      2. In the 48 transformer layers of the HuBERT xlarge model, very few elements deviate from the equivalent fairseq model by more than the default atol of 1e-5. This commit relaxes it to 3e-5 for that specific test.
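
      A minimal sketch of the relaxed comparison (the tensors below stand in for the two models' outputs):

      ```
      import torch

      ours = torch.zeros(2, 10)
      theirs = ours + 2e-5  # deviation below the relaxed tolerance
      torch.testing.assert_allclose(ours, theirs, rtol=0.0, atol=3e-5)  # passes
      ```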
  20. 30 Sep, 2021 1 commit
  21. 29 Sep, 2021 2 commits
    • Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH` (#1804) · 5c01c25f
      moto authored
      * Rename factory functions `wav2vec2_asr_ARCH` to `wav2vec2_ft_ARCH`
      
      In #1783, we split the factory functions of wav2vec2 into ones for pretraining models
      and ones for fine-tuning models (pretraining model + extra Linear module).
      
      I picked the naming scheme `wav2vec2_asr_ARCH` for the factory functions of
      fine-tuning models, but it did not feel right, because the architecture code
      is more generic. Even though the resulting model architecture was used for
      ASR fine-tuning in the paper, it does not have to be used for ASR.
      This became more evident as we added pre-trained parameter support, such as #1799.
      For the weight files, what matters is which task and which dataset they were
      trained on; for the factory functions, the ASR task is not relevant.
      
      Therefore, this commit renames the functions, replacing `_asr_` with `_ft_`
      (for fine-tuning).
      
      Note: Since the new functions are not released yet, this PR itself is not BC-breaking.
    • Skip hubert_asr_xlarge TS test on Windows (#1800) · a7bdedae
      moto authored
  22. 28 Sep, 2021 1 commit
    • Add HuBERT model architectures (#1769) · a7854f33
      moto authored
      This commit adds the following HuBERT model architectures
      
       - `base` (pre-training)
       - `large` (pre-training / fine-tuning)
       - `xlarge` (pre-training / fine-tuning)
      
      Since the internal components are the same as those of `Wav2Vec2Model`, it reuses the existing modules.
      With these models, it is possible to
      - import the pre-trained models published by `fairseq` and TorchScript them.
      - fine-tune an existing model for a downstream task.
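
      A hedged construction sketch (assuming the factory names follow the architectures above, e.g. `hubert_base` under `torchaudio.models`):

      ```
      import torch
      import torchaudio

      # Build an untrained HuBERT base architecture and script it.
      model = torchaudio.models.hubert_base()
      scripted = torch.jit.script(model)  # TorchScript-able, as noted above
      ```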
  23. 24 Sep, 2021 1 commit
    • [BC-Breaking] Split pretraining and finetuning factory functions (#1783) · b2e9f1e4
      moto authored
      * [BC-Breaking] Split pretraining and finetuning factory functions
      
      Previously, the wav2vec2 factory functions only generated the fine-tuning
      architecture used in the wav2vec2 paper for the ASR task, that is,
      the pre-training architecture + a Linear module. They did not
      provide a straightforward way to generate architectures for pre-training.
      
      The goal of the original implementation was to allow inference of
      wav2vec2 in non-Python environments via TorchScript. Now we would like to
      expand it to pre-training/fine-tuning and the HuBERT model as well.
      
      Therefore, we need factory functions for both pre-training and
      fine-tuning. This commit introduces new factory functions, with separate
      functions for pre-training and fine-tuning.
      
      1. New functions for ASR fine-tuning.
      
      We introduce `wav2vec2_asr_XXX` functions, which generate the architecture
      used for the fine-tuning task in the wav2vec2 paper (see the sketch at the end
      of this summary). *1
      
      2. Re-purpose the old functions
      
      The existing functions, `wav2vec2_XXX`, now generate the architecture with
      the pre-training modules only (no Linear module).
      
      Note
      *1 This architecture is just one way to define an architecture for fine-tuning;
      it is not a universal definition. The new `wav2vec2_asr_XXX` functions are
      designed to provide this specific fine-tuning configuration; they are not
      meant to support generic architectures for downstream tasks.
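
      A hedged sketch of the resulting split (function and parameter names as of this commit; they were later renamed, and 29 is an illustrative output size):

      ```
      import torchaudio

      # Pre-training architecture only: no Linear head.
      pretrain_model = torchaudio.models.wav2vec2_base()

      # Fine-tuning architecture: pre-training modules + a Linear head for ASR.
      asr_model = torchaudio.models.wav2vec2_asr_base(num_out=29)
      ```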
  24. 22 Sep, 2021 3 commits
    • [BC-Breaking] Move fine-tune specific module out of wav2vec2 encoder (#1782) · 40f2a085
      moto authored
      Previously, the Linear module (called `readout`, used only for the ASR fine-tuning
      task) was placed in the encoder module. Conceptually, the encoder has nothing to
      do with a module specific to fine-tuning / a downstream task.
      
      The problems here are that:
      1. The encoder can also be used in the pre-training phase, in which such a module
      should not be present.
      2. The choice of a Linear module is arbitrary, and a hard-coded module structure
      in the encoder is inconvenient for users.
      
      Therefore, this commit moves the Linear module out of the encoder and places it
      as the `aux` attribute of `Wav2Vec2Model`. (As a result, `Wav2Vec2Model` has
      `feature_extractor`, `encoder` and `aux` attributes; see the sketch at the end of this summary.)
      
      An alternative approach is to define another module that holds `Wav2Vec2Model`
      and the aux module side by side, but that would introduce a new class we need
      to maintain.
      The expected use of `aux` is only for 1. loading the pre-trained parameters
      published by `fairseq` (and their variations from HF) and 2. creating the same model
      architectures for comparison experiments.
      The newly introduced class would not be general enough for downstream adaptations,
      where there will be a bunch of different, more complicated models (e.g. s3prl).
      
      Therefore, following the minimalistic approach, we put the aux module inside `Wav2Vec2Model`.
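
      A hedged sketch of the resulting attribute layout (assuming the 0.10-era factory signature; the output size 29 is illustrative):

      ```
      import torchaudio

      model = torchaudio.models.wav2vec2_base(aux_num_out=29)
      # The three top-level attributes described above:
      print(model.feature_extractor)  # convolutional feature extractor
      print(model.encoder)            # transformer encoder, no task-specific head
      print(model.aux)                # Linear head used for ASR fine-tuning
      ```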
    • Fix HF model integration (#1781) · e9cab8f8
      moto authored
      * Fix HF model integration
      
      Previously, when testing wav2vec2 models from HF transformers, all the models were
      instantiated as the `Wav2Vec2ForCTC` class, while some of them were supposed to be
      `Wav2Vec2Model`.
      
      Fixing this revealed that the model importer could not correctly handle importing `Wav2Vec2Model`.
      
      This PR fixes these issues.
    • Update reference from master to main elsewhere (#1784) · 1b4b82e0
      moto authored
      Summary: Update fairseq reference from master to main elsewhere
      
      Reviewed By: alexeib
      
      Differential Revision: D30938472
      
      fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8
      Co-authored-by: Diana Liskovich <dianaml@fb.com>
  25. 21 Sep, 2021 1 commit
  26. 20 Sep, 2021 2 commits