1. 24 Feb, 2023 2 commits
  2. 23 Feb, 2023 1 commit
    • Add TCPGen context-biasing Conformer RNN-T (#2890) · 1ed330b5
      G. Sun authored
      Summary:
      This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing.
      
      An example for Librispeech can be found in audio/examples/asr/librispeech_biasing.
      
      Maintainer's note (mthrok):
      It seems that TrieNode would be better typed as a tuple, but changing the implementation from list to tuple
      without running the code could cause issues, so the code is unchanged, though the annotation uses tuple.
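
To make the prefix-tree idea behind tree-constrained biasing concrete, here is a minimal sketch. It is not the code added in this commit: the nested-dict node layout, the `END` marker, and the function names are assumptions chosen for illustration. It only shows how a tree built from a biasing list restricts which tokens may follow a given prefix.

```python
from typing import Dict, List, Sequence

END = "<end>"  # hypothetical marker meaning "a biasing word ends here"


def build_prefix_tree(words: Sequence[Sequence[str]]) -> Dict:
    """Build a prefix tree (nested dicts) from tokenized biasing words."""
    root: Dict = {}
    for tokens in words:
        node = root
        for tok in tokens:
            # Reuse the child node if it exists, otherwise create it.
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def allowed_next_tokens(root: Dict, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that can extend `prefix` within the tree."""
    node = root
    for tok in prefix:
        if tok not in node:
            return []  # the prefix left the tree, so nothing is biased
        node = node[tok]
    return sorted(t for t in node if t != END)


# Hypothetical biasing list, already split into subword tokens.
biasing_list = [["tur", "ner"], ["tur", "bine"], ["vi", "ola"]]
tree = build_prefix_tree(biasing_list)
print(allowed_next_tokens(tree, ["tur"]))
```

At each decoding step, a biasing component built this way would only need to consider the children of the current node rather than the whole vocabulary.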
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2890
      
      Reviewed By: nateanl
      
      Differential Revision: D43171447
      
      Pulled By: mthrok
      
      fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e
  3. 16 Feb, 2023 2 commits
  4. 15 Feb, 2023 1 commit
  5. 14 Feb, 2023 1 commit
    • Update ssl example (#3060) · ff01be0f
      Zhaoheng Ni authored
      Summary:
      - Rename the current `ssl` example to `self_supervised_learning`
      - Add a README that demonstrates how to run the recipe with the HuBERT task
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3060
      
      Reviewed By: mthrok
      
      Differential Revision: D43287868
      
      Pulled By: nateanl
      
      fbshipit-source-id: 10352682485ef147ca32f4c4c9f9cde995444aa0
  6. 30 Jan, 2023 1 commit
    • Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627
      Yan Li authored
      Summary:
      Currently, running this tutorial on a CUDA device produces a few errors.
      
      The reasons are:
      - The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
      - When analyzing and displaying the output audio, we need to move it back from the GPU to the CPU, because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).
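
The two points above can be sketched as follows. This is an illustrative snippet, not the tutorial code: the placeholder waveform and the scaling step that stands in for the separation model are assumptions. It falls back to the CPU when no CUDA device is available.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder stereo signal (1 second at 16 kHz), standing in for real audio.
waveform = torch.zeros(2, 16000)

# Tensor.to() returns a tensor on the target device and leaves the original
# untouched, so the result has to be assigned back to the variable.
waveform = waveform.to(device)

# Stand-in for running a separation model on `device`.
separated = waveform * 0.5

# Move the result back to the CPU before CPU-only processing; converting to
# a NumPy array, for instance, fails for tensors that live on a CUDA device.
separated_np = separated.cpu().numpy()
print(separated_np.shape)
```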
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3017
      
      Reviewed By: mthrok
      
      Differential Revision: D42828526
      
      Pulled By: nateanl
      
      fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
  7. 19 Jan, 2023 2 commits
    • Add modularized SSL training recipe (#2876) · 2eaefe27
      Zhaoheng Ni authored
      Summary:
      TorchAudio currently has one training recipe, for HuBERT + LibriSpeech pre-training. It may not be suitable when users want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). This PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, lr scheduler, and datamodule for training an SSL model.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2876
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42617414
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
    • Simplify train step in Conformer RNN-T LibriSpeech recipe (#2981) · c6a52355
      hwangjeff authored
      Summary:
      In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2981
      
      Reviewed By: mthrok
      
      Differential Revision: D42507228
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
  8. 17 Jan, 2023 1 commit
  9. 16 Jan, 2023 1 commit
    • Fixes examples/source_separation for WSJ0_2mix dataset (#2987) · f9d38796
      Robin Scheibler authored
      Summary:
      The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following:
      
      1. Use `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset
      2. Correct `args.data_dir` to `args.root_dir` in eval.py
      3. Update the parameters of `pytorch_lightning.Trainer` to match the latest version (use `accelerator="gpu"` and `devices=args.num_devices` instead of just `gpus=args.num_devices`)
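
As a rough illustration of item 3, the helper below builds the newer-style device arguments as a plain dict. The helper name and the CPU fallback for `num_devices == 0` are assumptions made for this sketch and are not part of the PR; only `accelerator` and `devices` come from the summary above.

```python
from typing import Any, Dict


def trainer_device_kwargs(num_devices: int) -> Dict[str, Any]:
    """Hypothetical helper returning device-related Trainer keyword arguments."""
    if num_devices < 0:
        raise ValueError("num_devices must be non-negative")
    if num_devices == 0:
        # Assumption for this sketch: fall back to a single CPU process.
        return {"accelerator": "cpu", "devices": 1}
    return {"accelerator": "gpu", "devices": num_devices}


# Before: Trainer(gpus=args.num_devices)
# After:  Trainer(**trainer_device_kwargs(args.num_devices))
print(trainer_device_kwargs(4))
```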
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2987
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42536992
      
      Pulled By: nateanl
      
      fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
  10. 13 Jan, 2023 1 commit
  11. 30 Dec, 2022 1 commit
  12. 17 Dec, 2022 1 commit
  13. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      resolves https://github.com/pytorch/audio/issues/2891
      
      Rename `resampling_method` options to more accurately describe what is happening. Previously the options were `sinc_interpolation` and `kaiser_window`, which can be confusing because both use sinc interpolation and differ only in the window function. Therefore, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option raises a warning, and the old options will be deprecated in 2 releases. The numerical behavior is unchanged.
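
The renaming scheme can be pictured with a small compatibility shim. This is a sketch only and not the code changed in torchaudio: the helper name, the warning text, and the `ValueError` for unknown names are assumptions. The old and new option names are the ones listed in the summary.

```python
import warnings

# Old option name -> new option name, as listed in the commit summary.
_RENAMED = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name: str) -> str:
    """Hypothetical helper: map an old option name to its new name."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f'"{name}" is an old resampling_method name; use "{new_name}" instead.',
            stacklevel=2,
        )
        return new_name
    if name in _RENAMED.values():
        return name  # already a new-style name
    raise ValueError(f"Unknown resampling_method: {name!r}")


print(normalize_resampling_method("sinc_interp_kaiser"))
```

Callers that already pass a new-style name get it back unchanged and see no warning.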
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  14. 29 Nov, 2022 1 commit
  15. 28 Nov, 2022 1 commit
  16. 17 Nov, 2022 1 commit
  17. 16 Nov, 2022 2 commits
  18. 17 Oct, 2022 1 commit
  19. 14 Oct, 2022 2 commits
  20. 13 Oct, 2022 2 commits
  21. 12 Oct, 2022 2 commits
    • Fix typos in tacotron2 tutorial (#2761) · 7aabcbd4
      Nikita Shulga authored
      Summary:
      `publishe`->`published`
      
      Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2761
      
      Reviewed By: carolineechen
      
      Differential Revision: D40313042
      
      Pulled By: malfet
      
      fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b
    • Improve hubert recipe for pre-training and fine-tuning (#2744) · 27433050
      Zhaoheng Ni authored
      Summary:
      Follow-up to https://github.com/pytorch/audio/issues/2716
      - For preprocessing
        - The HuBERT features take a lot of memory, which may not fit on some machines. Allow using a subset of the features to train the k-means model.
      
      - For pre-training
        - Normalize the loss based on the total number of masked frames across all GPUs.
        - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
        - Log accuracies of masked/unmasked frames during training.
        - Clip the gradients with norm `10.0`.
      
      - For ASR fine-tuning
        - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
        - Use mixed precision training.
        - Add "|" after the end of the transcription to capture the silence/word termination, same as in the fairseq recipe.
      
      - Update the WER results on LibriSpeech dev and test sets.
      
      |                   | WER% (Viterbi)|  WER% (KenLM) |
      |:-----------------:|--------------:|--------------:|
      | dev-clean         |       10.9    |       4.2     |
      | dev-other         |       17.5    |       9.4     |
      | test-clean        |       10.9    |       4.4     |
      | test-other        |       17.8    |       9.5     |
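
The pre-training item about normalizing the loss by the total number of masked frames across all GPUs can be illustrated with plain arithmetic. The sketch below is not the recipe code: the function names and the per-GPU numbers are made up for illustration. In a real distributed run the two totals would typically be obtained with an all-reduce, which is left out here.

```python
from typing import Sequence


def globally_normalized_loss(
    loss_sums: Sequence[float], masked_frames: Sequence[int]
) -> float:
    """Sum of per-GPU loss sums divided by the total masked-frame count."""
    total_frames = sum(masked_frames)
    if total_frames == 0:
        raise ValueError("no masked frames in this step")
    return sum(loss_sums) / total_frames


def mean_of_per_gpu_means(
    loss_sums: Sequence[float], masked_frames: Sequence[int]
) -> float:
    """Average of per-GPU mean losses, shown only for comparison."""
    means = [loss / n for loss, n in zip(loss_sums, masked_frames)]
    return sum(means) / len(means)


# Made-up values for two GPUs with very different masked-frame counts.
loss_sums = [30.0, 10.0]
masked_frames = [10, 40]

print(globally_normalized_loss(loss_sums, masked_frames))
print(mean_of_per_gpu_means(loss_sums, masked_frames))
```

With global normalization every masked frame carries the same weight, whereas averaging per-GPU means gives frames on a GPU with few masked frames more influence.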
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2744
      
      Reviewed By: carolineechen
      
      Differential Revision: D40282322
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
  22. 07 Oct, 2022 1 commit
  23. 06 Oct, 2022 1 commit
  24. 05 Oct, 2022 1 commit
  25. 03 Oct, 2022 1 commit
  26. 23 Sep, 2022 2 commits
  27. 22 Sep, 2022 2 commits
  28. 21 Sep, 2022 2 commits
  29. 14 Sep, 2022 1 commit
  30. 13 Sep, 2022 1 commit