1. 19 Jan, 2023 2 commits
    • Add modularized SSL training recipe (#2876) · 2eaefe27
      Zhaoheng Ni authored
      Summary:
      TorchAudio currently has one training recipe, for HuBERT + LibriSpeech pre-training. It may not suit users who want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). This PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, LR scheduler, and datamodule for training an SSL model.
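      The injection pattern described above can be sketched framework-free as follows. All names here (`SSLTask`, `apply_update`, etc.) are hypothetical illustrations, not the recipe's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SSLTask:
    # Hypothetical sketch of the "inject your own components" idea:
    # each piece of the training step is a swappable callable.
    model: Callable[[Any], Any]            # batch -> predictions
    loss_fn: Callable[[Any, Any], float]   # (predictions, batch) -> scalar loss
    apply_update: Callable[[float], None]  # stand-in for optimizer/scheduler step

    def training_step(self, batch):
        preds = self.model(batch)
        loss = self.loss_fn(preds, batch)
        self.apply_update(loss)
        return loss
```

      Swapping `loss_fn` for, say, a contrastive loss changes the training objective without touching the rest of the loop, which is the point of the modularization.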
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2876
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42617414
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
    • Simplify train step in Conformer RNN-T LibriSpeech recipe (#2981) · c6a52355
      hwangjeff authored
      Summary:
      In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead.
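      For context: in PyTorch Lightning, manual optimization means `training_step` drives backward and the optimizer itself, while automatic optimization means it only returns the loss and the framework applies the update. The inversion of control can be illustrated with a framework-free toy loop (names hypothetical, not the recipe's code):

```python
def fit_auto(step_fn, apply_update, batches):
    # step_fn only computes and returns the loss (automatic optimization);
    # this loop, standing in for the framework, owns the update call
    # (backward(), optimizer.step(), zero_grad() in the real thing).
    losses = []
    for batch in batches:
        loss = step_fn(batch)
        apply_update(loss)
        losses.append(loss)
    return losses
```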
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2981
      
      Reviewed By: mthrok
      
      Differential Revision: D42507228
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
  2. 17 Jan, 2023 1 commit
  3. 16 Jan, 2023 1 commit
    • Fixes examples/source_separation for WSJ0_2mix dataset (#2987) · f9d38796
      Robin Scheibler authored
      Summary:
      The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following.
      
      1. Uses `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset
      2. Corrects `args.data_dir` to `args.root_dir` in eval.py
      3. Updates the parameters of `pytorch_lightning.Trainer` for the latest version (uses `accelerator="gpu"` and `devices=args.num_devices` instead of just `gpus=args.num_devices`)
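      The `Trainer` argument migration in item 3 can be sketched with plain dicts, so the sketch runs without pytorch_lightning installed; the helper name is hypothetical.

```python
def migrate_trainer_kwargs(kwargs):
    # Translate the deprecated `gpus=N` form into the current
    # `accelerator="gpu", devices=N` pair; other kwargs pass through.
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    if "gpus" in kwargs:
        n = kwargs.pop("gpus")
        kwargs.update(accelerator="gpu", devices=n)
    return kwargs
```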
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2987
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42536992
      
      Pulled By: nateanl
      
      fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
  4. 13 Jan, 2023 1 commit
  5. 30 Dec, 2022 1 commit
  6. 17 Dec, 2022 1 commit
  7. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      resolves https://github.com/pytorch/audio/issues/2891
      
      Rename the `resampling_method` options to more accurately describe what is happening. Previously the options were `sinc_interpolation` and `kaiser_window`, which can be confusing, as both options actually use sinc interpolation and differ only in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will throw a warning, and the old options will be deprecated in 2 releases. The numerical behavior is unchanged.
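      The rename-with-warning behavior described above amounts to a small lookup; the helper name below is hypothetical, not torchaudio's internal implementation.

```python
import warnings

# Old option name -> new, window-explicit option name
_RESAMPLING_RENAMES = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}

def resolve_resampling_method(name):
    # Map a deprecated option to its replacement, warning once per call;
    # new-style names pass through unchanged (numerics are unaffected).
    if name in _RESAMPLING_RENAMES:
        new = _RESAMPLING_RENAMES[name]
        warnings.warn(f"{name!r} is deprecated; use {new!r} instead.", UserWarning)
        return new
    return name
```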
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  8. 29 Nov, 2022 1 commit
  9. 28 Nov, 2022 1 commit
  10. 17 Nov, 2022 1 commit
  11. 16 Nov, 2022 2 commits
  12. 17 Oct, 2022 1 commit
  13. 14 Oct, 2022 2 commits
  14. 13 Oct, 2022 2 commits
  15. 12 Oct, 2022 2 commits
    • Fix typos in tacotron2 tutorial (#2761) · 7aabcbd4
      Nikita Shulga authored
      Summary:
      `publishe`->`published`
      
      Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2761
      
      Reviewed By: carolineechen
      
      Differential Revision: D40313042
      
      Pulled By: malfet
      
      fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b
    • Improve hubert recipe for pre-training and fine-tuning (#2744) · 27433050
      Zhaoheng Ni authored
      Summary:
      Follow-up to PR https://github.com/pytorch/audio/issues/2716
      - For preprocessing
        - The HuBERT features take a lot of memory, which may not fit on some machines. Enable using a subset of the features for training a k-means model.
      
      - For pre-training
        - Normalize the loss based on the total number of masked frames across all GPUs.
        - Use mixed precision training, since fp16 is not well supported in pytorch_lightning.
        - Log accuracies of masked/unmasked frames during training.
        - Clip the gradients with norm `10.0`.
      
      - For ASR fine-tuning
        - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
        - Use mixed precision training.
        - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.
      
      - Update the WER results on LibriSpeech dev and test sets.
      
      |                   | WER% (Viterbi)|  WER% (KenLM) |
      |:-----------------:|--------------:|--------------:|
      | dev-clean         |       10.9    |       4.2     |
      | dev-other         |       17.5    |       9.4     |
      | test-clean        |       10.9    |       4.4     |
      | test-other        |       17.8    |       9.5     |
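      The pre-training loss normalization above amounts to dividing the summed loss by the total masked-frame count gathered across workers. A plain-Python stand-in for the cross-GPU reduction (the helper name is hypothetical):

```python
def normalize_masked_loss(per_gpu_loss_sums, per_gpu_masked_frames):
    # sum(...) over the per-GPU lists stands in for an all-reduce across
    # workers; the loss is then normalized by the global masked-frame count,
    # so the gradient scale does not depend on how frames are sharded.
    total_frames = sum(per_gpu_masked_frames)
    return sum(per_gpu_loss_sums) / total_frames
```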
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2744
      
      Reviewed By: carolineechen
      
      Differential Revision: D40282322
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
  16. 07 Oct, 2022 1 commit
  17. 06 Oct, 2022 1 commit
  18. 05 Oct, 2022 1 commit
  19. 03 Oct, 2022 1 commit
  20. 23 Sep, 2022 2 commits
  21. 22 Sep, 2022 2 commits
  22. 21 Sep, 2022 2 commits
  23. 14 Sep, 2022 1 commit
  24. 13 Sep, 2022 1 commit
  25. 09 Sep, 2022 1 commit
  26. 06 Sep, 2022 1 commit
  27. 26 Aug, 2022 1 commit
  28. 18 Aug, 2022 3 commits
    • Update ASR inference tutorial (#2631) · 189edb1b
      moto authored
      Summary:
      * Use download_asset
      * Remove notes around nightly
      * Print versions first
      * Remove duplicated import
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2631
      
      Reviewed By: carolineechen
      
      Differential Revision: D38830395
      
      Pulled By: mthrok
      
      fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6
    • Update notes around nightly build and third parties (#2632) · 55ce80b1
      moto authored
      Summary:
      Google Colab now has torchaudio 0.12 pre-installed.
      This commit removes the note about nightly build.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2632
      
      Reviewed By: carolineechen
      
      Differential Revision: D38827632
      
      Pulled By: mthrok
      
      fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb
    • Tweak tutorials (#2630) · cab2bb44
      moto authored
      Summary:
      Resolves the following warnings
      
      ```
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
      /torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2630
      
      Reviewed By: nateanl
      
      Differential Revision: D38816632
      
      Pulled By: mthrok
      
      fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635
  29. 10 Aug, 2022 1 commit
  30. 05 Aug, 2022 1 commit
    • Add note for lexicon free decoder output (#2603) · 33485b8c
      Caroline Chen authored
      Summary:
      The ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon-free usage. This PR adds a note in both the docs and the tutorial.
      
      Follow-up: determine whether we want to modify the behavior of ``words`` in the lexicon-free case. One option is to merge and then split the generated tokens on the input silence token to populate the words field, but this is tricky, since the meaning of a "word" in the lexicon-free case can be vague, not all languages have whitespace between words, etc.
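      The merge-and-split option mentioned above can be sketched as follows. This is a hypothetical helper (including the default `"|"` silence token), not TorchAudio's actual behavior.

```python
def tokens_to_words(tokens, silence="|"):
    # Merge the decoded token sequence into one string, then split on the
    # silence token to recover candidate words; empty segments (e.g. from
    # leading/trailing/repeated silence) are dropped.
    return [w for w in "".join(tokens).split(silence) if w]
```

      As the summary notes, this only makes sense for scripts where the silence token marks word boundaries.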
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2603
      
      Reviewed By: mthrok
      
      Differential Revision: D38459709
      
      Pulled By: carolineechen
      
      fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934