1. 23 Dec, 2021 2 commits
  2. 21 Dec, 2021 1 commit
    • Fix load behavior for 24-bit input (#2084) · 4554d242
      moto authored
      Summary:
      ## bug description
      
      When 24 bits-per-sample audio is loaded via a file-like object,
      the resulting Tensor is wrong. Loading the same audio from a
      local file was fine.
      
      ## The cause of the bug
      
      The core of sox's decoding mechanism is the `sox_read` function,
      one of whose parameters is the maximum number of samples to decode
      from the given buffer.
      
      https://fossies.org/dox/sox-14.4.2/formats_8c.html#a2a4f0194a0f919d4f38c57b81aa2c06f
      
      The `sox_read` function is called from the so-called `drain` effect
      callback, which receives an output buffer and its size in bytes.
      The previous implementation passed this size value to `sox_read` as
      the maximum number of samples to read. Since the buffer size in
      bytes is larger than the number of samples that fit in the buffer,
      `sox_read` always consumed the entire buffer. (This behavior is not
      wrong, except when the input is 24 bits-per-sample and comes from a
      file-like object.)
      
      When the input is read from a file-like object, the drain callback
      fetches new data via Python's `read` method and places it in a
      fixed-size memory region. The size of this memory region can be
      adjusted via `torchaudio.utils.sox_utils.set_buffer_size`,
      but the default value is 8096.

      If the input format is 24 bits-per-sample, the end of the memory
      region does not necessarily correspond to the end of a valid sample.
      When `sox_read` consumes all the data in the buffer region, the
      bytes at the end introduce unexpected values.
      This causes the aforementioned bug.
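
      As a rough illustration (the numbers are only for demonstration), a
      byte-sized buffer does not hold a whole number of 24-bit samples, so
      treating the byte count as a sample count lets `sox_read` run past
      the last complete sample:

      ```
      buffer_size_in_bytes = 8096          # the default mentioned above
      bytes_per_sample = 3                 # 24 bits-per-sample

      # 8096 bytes hold 2698 complete samples, with 2 bytes left over.
      complete_samples, leftover_bytes = divmod(buffer_size_in_bytes, bytes_per_sample)
      print(complete_samples, leftover_bytes)  # 2698 2

      # Passing the byte count (8096) as the "maximum number of samples"
      # makes `sox_read` treat the trailing 2 bytes as a valid sample.
      ```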
      
      ## Fix
      
      Pass a proper (better-estimated) maximum number of decodable samples
      to `sox_read`.
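
      A minimal sketch of the affected scenario, assuming a hypothetical
      24-bit PCM file `example_24bit.wav`; with this fix, loading via a
      file-like object should match loading from the path:

      ```
      import io

      import torch
      import torchaudio

      path = "example_24bit.wav"  # hypothetical 24-bit PCM file

      # Loading from a local path was already correct.
      expected, sample_rate = torchaudio.load(path)

      # Loading via a file-like object goes through the drain callback and
      # used to produce corrupted samples for 24-bit input.
      with open(path, "rb") as f:
          fileobj = io.BytesIO(f.read())
      found, _ = torchaudio.load(fileobj, format="wav")

      assert torch.allclose(expected, found)
      ```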
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2084
      
      Reviewed By: carolineechen
      
      Differential Revision: D33236947
      
      Pulled By: mthrok
      
      fbshipit-source-id: 171d9b7945f81db54f98362a68b20f2f95bb11a4
  3. 30 Nov, 2021 1 commit
    • Revise Griffin-Lim transform test to reduce execution time (#2037) · 96b1fa72
      hwangjeff authored
      Summary:
      Our Griffin-Lim autograd tests take a long time to run. This PR adjusts some parameters to shorten the run time.
      
      For one of the four tests:
      Before:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 517.35s (0:08:37) =========================
      ```
      
      After:
      ```
      test/torchaudio_unittest/transforms/autograd_cpu_test.py . [100%]
      
      ======================== 1 passed in 104.59s (0:01:44) =========================
      ```
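
      For reference, a minimal sketch of how such an autograd check can be
      kept cheap (parameter values here are illustrative, not the ones
      used in the actual test):

      ```
      import torch
      import torchaudio.transforms as T

      # A small spectrogram and few iterations keep gradcheck fast;
      # rand_init=False keeps the forward pass deterministic across
      # gradcheck's repeated evaluations.
      n_fft = 32
      transform = T.GriffinLim(n_fft=n_fft, n_iter=4, rand_init=False).to(torch.float64)

      spec = torch.rand(1, n_fft // 2 + 1, 6, dtype=torch.float64, requires_grad=True)
      torch.autograd.gradcheck(transform, (spec,))
      ```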
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2037
      
      Reviewed By: mthrok
      
      Differential Revision: D32726213
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: c785323ab380aea4b63fb1683b557c8ae842f54e
  4. 24 Nov, 2021 1 commit
    • Add RNN-T beam search decoder (#2028) · 60a85b50
      hwangjeff authored
      Summary:
      Adds a beam search decoder for the RNN-T implementation
      ``torchaudio.prototype.RNNT``; the decoder is TorchScript-able and
      supports both streaming and non-streaming inference.
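
      A rough usage sketch. The import path and the Emformer RNN-T factory
      `emformer_rnnt_base` are assumptions for illustration; only the
      decoder itself is what this commit adds:

      ```
      import torch
      from torchaudio.prototype import RNNTBeamSearch, emformer_rnnt_base  # assumed prototype location

      # An untrained model, purely for illustration; in practice the model is
      # trained and the blank index must match its token lexicon.
      model = emformer_rnnt_base(num_symbols=4097)
      decoder = RNNTBeamSearch(model, blank=4096)

      # Non-streaming use: decoder(features, length, beam_width) over a full utterance.
      # Streaming use: decoder.infer(...) called segment by segment with carried-over state.
      # The decoder is TorchScript-able:
      scripted = torch.jit.script(decoder)
      ```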
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2028
      
      Reviewed By: mthrok
      
      Differential Revision: D32627919
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: aab99e346d6514a3207a9fb69d4b42978b4cdbbd
  5. 23 Nov, 2021 1 commit
    • Temporarily skip threadpool test (#2025) · 05ae795a
      moto authored
      Summary:
      The sox_effects test that runs in `concurrent.futures.ThreadPoolExecutor` started failing a couple of days ago. Skipping the test while we investigate.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2025
      
      Reviewed By: nateanl
      
      Differential Revision: D32615933
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4f7301c0d3c0d11f687011e42e06d9c87ce4197f
  6. 22 Nov, 2021 2 commits
    • Relax dtype for MVDR (#2024) · 392a03c8
      Zhaoheng Ni authored
      Summary:
      Allow users to pass `torch.cfloat` input to the MVDR module. It internally converts the spectrogram to `torch.cdouble` and outputs a tensor with the original dtype of the spectrogram.
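
      A minimal sketch, assuming illustrative shapes of (channel, freq, time) for the complex spectrogram and (freq, time) for the time-frequency masks:

      ```
      import torch
      import torchaudio.transforms as T

      mvdr = T.MVDR(ref_channel=0, solution="ref_channel")

      # Single-precision complex spectrogram: (channel, freq, time).
      specgram = torch.rand(4, 201, 100, dtype=torch.cfloat)
      mask_s = torch.rand(201, 100)  # speech time-frequency mask
      mask_n = torch.rand(201, 100)  # noise time-frequency mask

      enhanced = mvdr(specgram, mask_s, mask_n)
      print(enhanced.dtype)  # torch.cfloat -- the input dtype is preserved
      ```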
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2024
      
      Reviewed By: carolineechen
      
      Differential Revision: D32594051
      
      Pulled By: nateanl
      
      fbshipit-source-id: e32609ccdc881b36300d579c90daba41c9234b46
    • Improve MVDR stability (#2004) · fb2f9538
      Zhaoheng Ni authored
      Summary:
      Division first, multiplication second. This helps avoid value overflow, and it also helps the ``stv_evd`` solution pass the gradient check.
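
      A toy illustration of why the ordering matters (values are chosen only to trigger the overflow):

      ```
      import torch

      x = torch.tensor(1e200, dtype=torch.float64)
      y = torch.tensor(1e200, dtype=torch.float64)
      z = torch.tensor(1e150, dtype=torch.float64)

      print(x * y / z)  # inf: the intermediate product overflows float64
      print(x / z * y)  # 1e250: dividing first keeps intermediates in range
      ```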
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2004
      
      Reviewed By: mthrok
      
      Differential Revision: D32539827
      
      Pulled By: nateanl
      
      fbshipit-source-id: 70a386608324bb6e1b1c7238c78d403698590f22
  7. 18 Nov, 2021 2 commits
  8. 17 Nov, 2021 1 commit
  9. 04 Nov, 2021 2 commits
  10. 03 Nov, 2021 3 commits
  11. 02 Nov, 2021 3 commits
  12. 28 Oct, 2021 1 commit
  13. 27 Oct, 2021 1 commit
  14. 25 Oct, 2021 1 commit
  15. 22 Oct, 2021 1 commit
  16. 21 Oct, 2021 1 commit
  17. 15 Oct, 2021 2 commits
    • Add TTS bundle/pipelines (#1872) · e885204e
      moto authored
      Future work items:
      - length computation of GriffinLim
      - better way to make InverseMelScale work in inference_mode
    • Move wav2vec2 pretrained models to pipelines module (#1876) · fad855cd
      moto authored
      - Move wav2vec2 pretrained weights to the `torchaudio.pipelines` namespace, to align with #1872.
      - Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-trained models) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR); a usage sketch follows this list.
      - Update the base URL.
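
      A short usage sketch of the new namespace (the bundle below is one of the ASR entries under `torchaudio.pipelines`):

      ```
      import torch
      import torchaudio

      bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H  # a Wav2Vec2ASRBundle
      model = bundle.get_model()

      waveform = torch.rand(1, int(bundle.sample_rate))  # one second of dummy audio
      with torch.inference_mode():
          emissions, _ = model(waveform)
      print(emissions.shape, bundle.get_labels()[:5])
      ```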
  18. 13 Oct, 2021 2 commits
  19. 10 Oct, 2021 1 commit
    • Store n_bits in WaveRNN (#1847) · 9637c6bf
      moto authored
      Move the computation of `#classes -> #bits` to the constructor of WaveRNN and attach it to the instance, so that it can be reused elsewhere.
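
      The relationship being precomputed is just classes-to-bits (illustrative values):

      ```
      import math

      n_classes = 256                      # e.g. 8-bit mu-law output
      n_bits = int(math.log2(n_classes))   # 8; computed once and stored on the instance
      ```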
  20. 08 Oct, 2021 2 commits
  21. 07 Oct, 2021 2 commits
    • Merge factory functions of pre-training model and fine-tuned model (#1830) · 274ada80
      moto authored
      This commit merges the wav2vec2/HuBERT factory functions for pre-training and fine-tuning. In #1829, we added parameters that customize aspects of the models which are not part of the architecture, and `aux_num_out` falls into this category, so it is no longer necessary to have separate functions. This concludes the wav2vec2/HuBERT API update for release 0.10.
      
      A summary of the BC-breaking changes to the wav2vec2 APIs between 0.9 and 0.10 (once this commit is incorporated):
      1. `Wav2Vec2Model.extract_features`
      In 0.9, it returned the output of the `FeatureExtractor` module. In 0.10, it returns the list of outputs from the intermediate layers of the `TransformerEncoder` block.
      2. `wav2vec2_base(num_out: int)` -> `wav2vec2_base(<dropout_params:float>, aux_num_out: Optional[int]=None)`
          - `num_out` was renamed to `aux_num_out` and made optional. If it is omitted, the resulting model does not have the linear layer for fine-tuning.
          - Added dropout parameters (see the usage sketch below).
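
      A hedged sketch of the merged factory (`aux_num_out=32` is an arbitrary example vocabulary size):

      ```
      import torchaudio

      # Without aux_num_out: a model suitable for pre-training, with no final linear layer.
      pretrain_model = torchaudio.models.wav2vec2_base()

      # With aux_num_out: adds the linear layer used for ASR fine-tuning.
      asr_model = torchaudio.models.wav2vec2_base(aux_num_out=32)
      ```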
    • Make the core wav2vec2 factory function public (#1829) · 31a69c36
      moto authored
      This commit makes the following changes:
      1. Make the factory function with full customizability public.
          i.e. `_get_model(...) -> wav2vec2_model(...)`.
      2. Change the other architecture-specific factory functions so that they accept parameters not related to the model architecture (such as dropout).
          i.e. `wav2vec2_base() -> wav2vec2_base(encoder_projection_dropout, encoder_attention_dropout, encoder_ff_interm_dropout, ...)`
      
      ### Why?
      
      While adding pre-trained weight support, I realized that separating the API for model construction from the API for pre-trained weights leads to simple code organization thanks to the good separation of concerns. As mentioned in #1821, in this framework,
        1. Model implementation is responsible for computation logic,
        2. factory functions are responsible for customizability and model construction,
        3. and pre-trained weight API is responsible for constructing a model and loading pre-trained weights along with the complementary information (such as pre-processing and class labels).
      
      (note: for simple models, combining 1 and 2 is also okay.)
      
      This means that the factory functions have to support all the customizability required by the pre-trained weight API. The current implementation imports an internal function, as in `from .model import Wav2Vec2Model, _get_model`, which is a bit strange.
      
      This PR rectifies that by making the mother factory function public.
      This also clarifies the purpose of the other public factory functions: they are just syntax sugar for constructing an untrained model with a specific architecture. Accordingly, this commit also adds the supplemental parameters to them.
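
      A sketch of the added knobs on an architecture-specific factory (parameter names as listed above; values are illustrative):

      ```
      import torchaudio

      # The fully customizable mother factory, wav2vec2_model(...), is now public;
      # the architecture-specific factories additionally expose non-architectural knobs:
      model = torchaudio.models.wav2vec2_base(
          encoder_projection_dropout=0.0,
          encoder_attention_dropout=0.0,
          encoder_ff_interm_dropout=0.0,
      )
      ```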
  22. 06 Oct, 2021 5 commits
  23. 05 Oct, 2021 2 commits