1. 28 Nov, 2022 1 commit
  2. 17 Nov, 2022 1 commit
  3. 16 Nov, 2022 2 commits
  4. 17 Oct, 2022 1 commit
  5. 14 Oct, 2022 2 commits
  6. 13 Oct, 2022 2 commits
  7. 12 Oct, 2022 2 commits
    • Fix typos in tacotron2 tutorial (#2761) · 7aabcbd4
      Nikita Shulga authored
      Summary:
      `publishe`->`published`
      
      Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2761
      
      Reviewed By: carolineechen
      
      Differential Revision: D40313042
      
      Pulled By: malfet
      
      fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b
    •
      Improve hubert recipe for pre-training and fine-tuning (#2744) · 27433050
      Zhaoheng Ni authored
      Summary:
      Follow-up to https://github.com/pytorch/audio/issues/2716
      - For preprocessing
        - The HuBERT features take a lot of memory and may not fit on some machines. Allow using a subset of the features to train the k-means model.
      
      - For pre-training
        - Normalize the loss based on the total number of masked frames across all GPUs.
        - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
        - Log accuracies of masked/unmasked frames during training.
        - Clip the gradients with norm `10.0`.
      
      - For ASR fine-tuning
        - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
        - Use mixed precision training.
        - Add "|" after the end of each transcription to capture the silence/word termination, same as in the fairseq recipe.
      
      - Update the WER results on LibriSpeech dev and test sets.
      
      |                   | WER% (Viterbi)|  WER% (KenLM) |
      |:-----------------:|--------------:|--------------:|
      | dev-clean         |       10.9    |       4.2     |
      | dev-other         |       17.5    |       9.4     |
      | test-clean        |       10.9    |       4.4     |
      | test-other        |       17.8    |       9.5     |
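
The loss normalization and gradient clipping listed in the summary above can be illustrated with a small plain-Python sketch. This is not the recipe's code: the function names `normalize_loss` and `clip_by_global_norm` are hypothetical, and the per-GPU values are passed in as ordinary lists, whereas a real multi-GPU run would gather them with a distributed reduction.

```python
import math


def normalize_loss(per_gpu_loss_sums, per_gpu_masked_frames):
    """Divide the summed loss by the masked-frame count over all GPUs.

    Both arguments are hypothetical stand-ins for values that a real
    training loop would collect from every worker.
    """
    total_frames = sum(per_gpu_masked_frames)
    if total_frames == 0:
        raise ValueError("no masked frames to normalize by")
    return sum(per_gpu_loss_sums) / total_frames


def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Two simulated GPUs with different numbers of masked frames.
loss = normalize_loss([6.0, 2.0], [3, 1])
clipped = clip_by_global_norm([30.0, 40.0], max_norm=10.0)
```

Dividing by the global frame count rather than a per-GPU count keeps the loss scale independent of how the masked frames happen to be distributed across workers.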
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2744
      
      Reviewed By: carolineechen
      
      Differential Revision: D40282322
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
  8. 07 Oct, 2022 1 commit
  9. 06 Oct, 2022 1 commit
  10. 05 Oct, 2022 1 commit
  11. 03 Oct, 2022 1 commit
  12. 23 Sep, 2022 2 commits
  13. 22 Sep, 2022 2 commits
  14. 21 Sep, 2022 2 commits
  15. 14 Sep, 2022 1 commit
  16. 13 Sep, 2022 1 commit
  17. 09 Sep, 2022 1 commit
  18. 06 Sep, 2022 1 commit
  19. 26 Aug, 2022 1 commit
  20. 18 Aug, 2022 3 commits
    • Update ASR inference tutorial (#2631) · 189edb1b
      moto authored
      Summary:
      * Use download_asset
      * Remove notes around nightly
      * Print versions first
      * Remove duplicated import
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2631
      
      Reviewed By: carolineechen
      
      Differential Revision: D38830395
      
      Pulled By: mthrok
      
      fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6
    • Update notes around nightly build and third parties (#2632) · 55ce80b1
      moto authored
      Summary:
      Google Colab now has torchaudio 0.12 pre-installed.
      This commit removes the note about nightly build.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2632
      
      Reviewed By: carolineechen
      
      Differential Revision: D38827632
      
      Pulled By: mthrok
      
      fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb
    • Tweak tutorials (#2630) · cab2bb44
      moto authored
      Summary:
      Resolves the following warnings
      
      ```
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
      /torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2630
      
      Reviewed By: nateanl
      
      Differential Revision: D38816632
      
      Pulled By: mthrok
      
      fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635
  21. 10 Aug, 2022 1 commit
  22. 05 Aug, 2022 1 commit
    • Add note for lexicon free decoder output (#2603) · 33485b8c
      Caroline Chen authored
      Summary:
      ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon free usage. This PR adds a note in both docs and tutorial.
      
      Follow-up: determine whether we want to modify the behavior of ``words`` in the lexicon-free case. One option is to merge the generated tokens and then split them by the input silent token to populate the ``words`` field, but this is tricky since the meaning of a "word" in the lexicon-free case can be vague, not all languages have whitespace between words, etc.
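
The merge-then-split option mentioned in the follow-up could look roughly like the sketch below. It is only an illustration: `tokens_to_words` is a hypothetical helper, not part of the decoder API, and `"|"` is assumed to be the silent token.

```python
def tokens_to_words(tokens, sil_token="|"):
    """Join decoded tokens and split on the silent token to recover words.

    `sil_token="|"` is an assumption; empty pieces caused by leading,
    trailing, or repeated silent tokens are dropped.
    """
    merged = "".join(tokens)
    return [word for word in merged.split(sil_token) if word]


words = tokens_to_words(list("|hello|world|"))
```

As the follow-up note points out, this only makes sense for languages where a silence/boundary token actually separates words.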
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2603
      
      Reviewed By: mthrok
      
      Differential Revision: D38459709
      
      Pulled By: carolineechen
      
      fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934
  23. 01 Aug, 2022 1 commit
  24. 29 Jul, 2022 2 commits
    • Update forced alignment tutorial (#2544) · c26b38b2
      moto authored
      Summary:
      1. Fix initialization.
      Previously, the SOS token score was initialized to 0 across the time axis.
      This was biasing the alignment to delay the start.
      The proper way to delay the SOS is via blank token.
      The new initialization takes the cumulative sum of blank scores.
      2. Fill the end of trellis with Inf
      Similar to the start, at the end, where the number of remaining time frames is less
      than the number of tokens, it is no longer possible to align the text, so
      we fill with Inf for better visualization.
      3. Clean up asset management code.
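
A minimal plain-Python sketch of the two trellis changes described above follows. It is not the tutorial's code: `init_trellis` is a hypothetical name, the trellis is a nested list instead of a tensor, and column 0 is assumed to hold the start position.

```python
import math
from itertools import accumulate


def init_trellis(blank_scores, num_tokens):
    """Build a (num_frames x num_tokens) trellis with the boundary cells set.

    blank_scores: per-frame log-probability of the blank token (assumed input).
    """
    num_frames = len(blank_scores)
    trellis = [[0.0] * num_tokens for _ in range(num_frames)]

    # Start column: staying at the start costs the accumulated blank score,
    # instead of a flat 0 that would bias the alignment toward a late start.
    for t, score in enumerate(accumulate(blank_scores[1:]), start=1):
        trellis[t][0] = score

    # At the first frame, only the start column is reachable.
    for j in range(1, num_tokens):
        trellis[0][j] = -math.inf

    # Near the end, too few frames remain to emit every token from the
    # start column, so those cells are filled with Inf.
    first_blocked = max(num_frames - num_tokens + 1, 0)
    for t in range(first_blocked, num_frames):
        trellis[t][0] = math.inf

    return trellis


trellis = init_trellis([-0.1, -0.2, -0.3, -0.4, -0.5], num_tokens=3)
```

The interior cells are left at zero here; the alignment recursion that fills them is outside the scope of this sketch.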
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2544
      
      Reviewed By: nateanl
      
      Differential Revision: D38276478
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
    • Improve speech enhancement tutorial (#2527) · d6267031
      Zhaoheng Ni authored
      Summary:
      - The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of mixture speech.
      - Show the Si-SNR score of mixture speech when visualizing the mixture spectrogram.
      - Fix the figure in `rtf_power` subsection.
          - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
      - Print PESQ, STOI, and SDR metric scores.
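
The idea of amplifying the noise to lower the mixture SNR can be sketched in plain Python as below. This is an assumption-laden illustration, not the tutorial's code: the helper names are hypothetical, waveforms are plain lists of samples, and the gain is derived from a target SNR rather than whatever fixed factor the tutorial may use.

```python
import math


def power(samples):
    """Mean squared amplitude of a waveform given as a list of floats."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(speech) / power(noise))


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Return the noise rescaled so that speech/noise has the target SNR."""
    target_ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(speech) / (power(noise) * target_ratio))
    return [gain * v for v in noise]


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, 0.1]
louder_noise = scale_noise_to_snr(speech, noise, target_snr_db=0.0)
mixture = [s + n for s, n in zip(speech, louder_noise)]
```

A lower target SNR yields a larger gain on the noise, which makes the enhancement task harder and the effect of beamforming easier to see.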
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2527
      
      Reviewed By: mthrok
      
      Differential Revision: D38190218
      
      Pulled By: nateanl
      
      fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de
  25. 28 Jul, 2022 2 commits
  26. 11 Jul, 2022 1 commit
  27. 23 Jun, 2022 1 commit
  28. 17 Jun, 2022 1 commit
  29. 08 Jun, 2022 1 commit