1. 13 Oct, 2022 2 commits
  2. 12 Oct, 2022 4 commits
    • Fix typos in tacotron2 tutorial (#2761) · 7aabcbd4
      Nikita Shulga authored
      Summary:
      `publishe`->`published`
      
      Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2761
      
      Reviewed By: carolineechen
      
      Differential Revision: D40313042
      
      Pulled By: malfet
      
      fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b
    • Improve hubert recipe for pre-training and fine-tuning (#2744) · 27433050
      Zhaoheng Ni authored
      Summary:
      Follow-up to PR https://github.com/pytorch/audio/issues/2716
      - For preprocessing
        - The HuBERT features take a lot of memory and may not fit on some machines. Allow training the k-means model on a subset of the features.
      
      - For pre-training
        - Normalize the loss based on the total number of masked frames across all GPUs.
        - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
        - Log accuracies of masked/unmasked frames during training.
        - Clip the gradients with norm `10.0`.
      
      - For ASR fine-tuning
        - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
        - Use mixed precision training.
        - Add "|" after the end of the transcription to capture the silence/word termination, same as in the fairseq recipe.
      
      - Update the WER results on LibriSpeech dev and test sets.
      
      |                   | WER% (Viterbi)|  WER% (KenLM) |
      |:-----------------:|--------------:|--------------:|
      | dev-clean         |       10.9    |       4.2     |
      | dev-other         |       17.5    |       9.4     |
      | test-clean        |       10.9    |       4.4     |
      | test-other        |       17.8    |       9.5     |
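
      The pre-training item on loss normalization can be illustrated with a small sketch. It is not the recipe's code: the per-GPU numbers are made up, plain lists stand in for GPUs, and in a real distributed run the two sums would come from an all-reduce. It only shows why dividing by the total number of masked frames differs from averaging each GPU's local mean.

```python
# Hypothetical per-GPU statistics: (summed loss over masked frames, masked-frame count).
# The values are invented for illustration only.
per_gpu = [(12.0, 4), (30.0, 6), (5.0, 10)]


def normalize_by_total_frames(stats):
    """Divide the global loss sum by the global masked-frame count."""
    total_loss = sum(loss for loss, _ in stats)
    total_frames = sum(frames for _, frames in stats)
    return total_loss / total_frames


def average_of_local_means(stats):
    """Average each GPU's own mean; every GPU gets equal weight regardless of frame count."""
    return sum(loss / frames for loss, frames in stats) / len(stats)


global_loss = normalize_by_total_frames(per_gpu)
local_mean_loss = average_of_local_means(per_gpu)
print(global_loss, local_mean_loss)
```

      With global normalization every masked frame contributes equally, whichever GPU it landed on.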
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2744
      
      Reviewed By: carolineechen
      
      Differential Revision: D40282322
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
    • Improve wav2vec2/hubert model for pre-training (#2716) · c5bd93b6
      Zhaoheng Ni authored
      Summary:
      This PR improves the Wav2Vec2/HuBERT model regarding model pre-training.
      
      - Initializing the positional embedding and transformer modules is essential for model pre-training. Predicting unmasked frames is the easier task, so their accuracy should be higher than that of masked frames. Without the initialization, however, the accuracy of masked frames is higher than that of unmasked frames.
        Performance after two epochs on 16 GPUs:
        - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11.
        - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04.
      - After adding the model initialization, the gradient overflows easily (i.e. becomes `nan`). The paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778) proposes a simple but effective way to mitigate the overflow: scale down the product of query and key, then subtract its maximum value (subtracting a constant does not change the output of softmax). This guarantees that the values do not overflow.
      - In the original fairseq, the mask indices are generated by `numpy.random.choice`. This PR replaces `torch.multinomial` with `torch.randperm`. (cc carolineechen).
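
      The claim that subtracting a constant leaves softmax unchanged can be checked with a minimal pure-Python sketch. This is not the code from the PR, which works on attention scores in reduced precision and also scales them down; the function names and inputs here are illustrative assumptions.

```python
import math


def naive_softmax(scores):
    """Softmax computed directly; math.exp overflows for large scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(scores):
    """Softmax after subtracting the maximum, so every exponent is <= 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]  # same pairwise differences as `small`

try:
    naive_softmax(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

print(naive_overflowed, stable_softmax(large))
```

      Because softmax depends only on the differences between scores, `small` and `large` yield the same probabilities once the maximum is subtracted.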
      
      Other improvements within training scripts will be included in a separate PR.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2716
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D39832189
      
      Pulled By: nateanl
      
      fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27
    • Skip hubert xlarge torchscript test (#2758) · c2ea6898
      Caroline Chen authored
      Summary:
      A couple of CircleCI unit tests are failing during the hubert xlarge torchscript test, which has been known to fail on Windows in the past (#65776). This PR disables the test on CircleCI.
      
      cc atalman
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2758
      
      Reviewed By: mthrok
      
      Differential Revision: D40290535
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57
  3. 11 Oct, 2022 4 commits
  4. 10 Oct, 2022 2 commits
    • Add unit test for LibriMix dataset (#2659) · c5b8e585
      Zhaoheng Ni authored
      Summary:
      Besides the unit test, the PR also addresses these issues:
      - The original `LibriMix` dataset only supports "min" mode, in which the audio length is the minimum over all clean sources. This is the default for the source separation task. Users may also want "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument so users can choose which version of the dataset to use.
      - If the task is ``"enh_both"``, the target should be the audio in ``mix_clean`` rather than the separate clean sources. The PR fixes the dataset to use ``mix_clean`` as the target.
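
      The difference between the two modes can be sketched in plain Python. This is not the dataset's implementation: `align_sources` and `mix` are hypothetical helpers, and the sketch assumes that "max" mode zero-pads shorter sources to the longest one, as in the LibriMix generation scripts.

```python
def align_sources(sources, mode="min"):
    """Bring all sources to a common length (hypothetical helper, not torchaudio API).

    "min": truncate every source to the shortest length.
    "max": zero-pad every source to the longest length (assumed behaviour).
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    lengths = [len(s) for s in sources]
    if mode == "min":
        target = min(lengths)
        return [list(s[:target]) for s in sources]
    target = max(lengths)
    return [list(s) + [0.0] * (target - len(s)) for s in sources]


def mix(sources):
    """Sum equally long sources sample by sample."""
    return [sum(samples) for samples in zip(*sources)]


speaker_1 = [1.0, 2.0, 3.0]
speaker_2 = [10.0, 20.0]

min_mix = mix(align_sources([speaker_1, speaker_2], mode="min"))
max_mix = mix(align_sources([speaker_1, speaker_2], mode="max"))
print(min_mix, max_mix)
```

      In "max" mode the longer utterance is kept whole, which is what a downstream recognizer needs in order to see complete transcripts.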
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2659
      
      Reviewed By: carolineechen
      
      Differential Revision: D40229227
      
      Pulled By: nateanl
      
      fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235
    • Fix HuBERT docstring (#2746) · be938e7e
      Zhaoheng Ni authored
      Summary:
      The docstring of the `wav2vec2` argument is wrong. This PR fixes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2746
      
      Reviewed By: carolineechen
      
      Differential Revision: D40225995
      
      Pulled By: nateanl
      
      fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da
  5. 09 Oct, 2022 1 commit
  6. 08 Oct, 2022 1 commit
  7. 07 Oct, 2022 3 commits
  8. 06 Oct, 2022 3 commits
  9. 05 Oct, 2022 1 commit
  10. 03 Oct, 2022 3 commits
  11. 01 Oct, 2022 1 commit
  12. 29 Sep, 2022 1 commit
  13. 28 Sep, 2022 3 commits
  14. 27 Sep, 2022 2 commits
  15. 26 Sep, 2022 2 commits
    • Fix windows tests related to old conda on circleci (#2704) · 1c8accfc
      Andrey Talman authored
      Summary:
      The conda version on CircleCI prints the following message:
      ```
      ==> WARNING: A newer version of conda exists. <==
        current version: 4.6.14
        latest version: 4.14.0
      ```
      and, as a result, the install step fails with this error:
      
      ```
      + /c/tools/miniconda3/Scripts/conda.exe install -v -y -c pytorch-nightly -c nvidia pytorch numpy ffmpeg pytorch-cuda=11.6
      Collecting package metadata: ...working... done
      Solving environment: ...working...
      
      Too long with no output (exceeded 30m0s): context deadline exceeded
      ```
      
      This should update the conda version running on the system and allow us to install pytorch and run some tests.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2704
      
      Reviewed By: weiwangmeta
      
      Differential Revision: D39820037
      
      Pulled By: atalman
      
      fbshipit-source-id: 4a82a7a6cbe3dc1a5807ac669e2fa79f454037fa
    • Remove linux wheel from circleci (#2714) · 14714f29
      Andrey Talman authored
      Summary:
      Remove linux wheel from circleci
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2714
      
      Reviewed By: weiwangmeta
      
      Differential Revision: D39816121
      
      Pulled By: atalman
      
      fbshipit-source-id: a3c99b530896888d7b4271d8b3f27f3c986b3480
  16. 24 Sep, 2022 2 commits
    • Fix CUDA check (#2710) · ce08f8d2
      hwangjeff authored
      Summary:
      `torch.version.cuda` can return a string of form X.X or X.X.X. This PR modifies the CUDA version check to account for this.
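
      A version comparison that tolerates both string forms could look like the sketch below. It is not the check that TorchAudio ships: `cuda_major_minor` and `same_cuda_version` are hypothetical names, and comparing only the major and minor components is an assumption made for illustration.

```python
def cuda_major_minor(version):
    """Return (major, minor) from a string such as '11.6' or '11.6.2'."""
    parts = version.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected X.X or X.X.X, got {version!r}")
    return int(parts[0]), int(parts[1])


def same_cuda_version(first, second):
    """Compare two CUDA version strings, ignoring any patch component."""
    return cuda_major_minor(first) == cuda_major_minor(second)


print(same_cuda_version("11.6", "11.6.2"))
print(same_cuda_version("11.6", "11.7"))
```

      Parsing into integer tuples avoids the false mismatch that a plain string comparison of "11.6" and "11.6.2" would report.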
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2710
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D39796810
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: b483bd8200195844d65d0caddebaf1b10f939b64
    • Add CUDA version check (#2707) · 0a5825ae
      hwangjeff authored
      Summary:
      Adds a check to ensure that TorchAudio and PyTorch are built against the same CUDA version.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2707
      
      Reviewed By: mthrok
      
      Differential Revision: D39791154
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: de00889c7bac897c6b8762502f9d37797016b71d
  17. 23 Sep, 2022 3 commits
  18. 22 Sep, 2022 2 commits