1. 28 Jul, 2023 2 commits
  2. 26 Jul, 2023 1 commit
  3. 25 Jul, 2023 2 commits
      Update avsr recipe (#3493) · d4644793
      Pingchuan Ma authored
      Summary:
      This PR makes several changes to the AV-ASR recipe: better results, a faster face detector (MediaPipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3493
      
      Reviewed By: mthrok
      
      Differential Revision: D47758072
      
      Pulled By: mpc001
      
      fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc
      Update nvdec/nvenc tutorials (#3483) · 56e22664
      moto authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3483
      
      Differential Revision: D47725664
      
      Pulled By: mthrok
      
      fbshipit-source-id: e4249e1488fa7af8670be4a5077957912ff3420b
  4. 24 Jul, 2023 1 commit
  5. 18 Jul, 2023 1 commit
  6. 15 Jul, 2023 1 commit
  7. 05 Jul, 2023 1 commit
  8. 28 Jun, 2023 1 commit
  9. 26 Jun, 2023 1 commit
  10. 21 Jun, 2023 1 commit
  11. 16 Jun, 2023 1 commit
      Add LRS3 data preparation (#3421) · 77cdd160
      Pingchuan Ma authored
      Summary:
      This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.
      
      This PR also updates the reported word error rate (WER) for the AV-ASR LRS3 models and improves code readability.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3421
      
      Reviewed By: mpc001
      
      Differential Revision: D46799748
      
      Pulled By: mthrok
      
      fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
  12. 15 Jun, 2023 1 commit
      Update forced alignment tutorial (#3440) · 18601691
      moto authored
      Summary:
      * Fix backtrack visualization (the coordinate was off by one)
      * Add note about the simplification and the new align API
      * Explicitly handle SOS and EOS
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3440
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D46761282
      
      Pulled By: mthrok
      
      fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8
  13. 07 Jun, 2023 1 commit
  14. 06 Jun, 2023 1 commit
  15. 04 Jun, 2023 1 commit
  16. 02 Jun, 2023 2 commits
      [BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5
      moto authored
      Summary:
      This commit removes the `compute_kaldi_pitch` function and the underlying Kaldi integration from torchaudio.
      
      The Kaldi pitch function was added quickly by integrating the original Kaldi implementation instead of reimplementing it in PyTorch.
      
      The Kaldi integration relied on a hack that replaces Kaldi's base vector/matrix implementation with PyTorch Tensor, so that there is only one BLAS library within torchaudio.
      
      We are making torchaudio leaner and have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.
      
      See some of the discussion at https://github.com/pytorch/audio/issues/1269
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3368
      
      Differential Revision: D46406176
      
      Pulled By: mthrok
      
      fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
      Update data augmentation tutorial (#3375) · 2ba36b47
      moto authored
      Summary:
      Replace sox_effects with `torchaudio.io.AudioEffector`
      
      1. To showcase the new and better feature
      2. To prepare for the upcoming removal of file-like object support
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3375
      
      Reviewed By: nateanl
      
      Differential Revision: D46379016
      
      Pulled By: mthrok
      
      fbshipit-source-id: 70f24b62494204949f327f6ac6c49f315c9ee315
  17. 31 May, 2023 1 commit
  18. 26 May, 2023 2 commits
      Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9
      atalman authored
      Summary:
      This reverts commit d38a7854.
      
      This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3377
      
      Reviewed By: mthrok
      
      Differential Revision: D46230498
      
      Pulled By: atalman
      
      fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d
      Improve RNN-T streaming decoding (#3295) · 9fc0dcaa
      Lakshmi Krishnan authored
      Summary:
      This commit fixes the following issues affecting streaming decoding quality:
      1. The `init_b` hypothesis is now regenerated from the blank token only if no initial hypotheses are provided.
      2. The decoder can now receive the top-K hypotheses to continue decoding from, instead of only the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
      3. Some minor errors in shape checking for lengths.
      
      This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3295
      
      Reviewed By: nateanl
      
      Differential Revision: D46216113
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
  19. 25 May, 2023 1 commit
      Add LRS3 AV-ASR recipe (#3278) · c6624fa6
      Pingchuan Ma authored
      Summary:
      This PR adds an AV-ASR recipe containing sample implementations of training and evaluation pipelines for RNNT-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. The recipe covers both streaming and non-streaming modes.
      
      CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3278
      
      Reviewed By: nateanl
      
      Differential Revision: D46121550
      
      Pulled By: mpc001
      
      fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6
  20. 23 May, 2023 1 commit
  21. 21 May, 2023 2 commits
  22. 16 May, 2023 1 commit
  23. 10 May, 2023 2 commits
  24. 05 May, 2023 1 commit
      Update squim tutorial (#3313) · 05ef7dc6
      Zhaoheng Ni authored
      Summary:
      Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3313
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45620311
      
      Pulled By: nateanl
      
      fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
  25. 29 Apr, 2023 1 commit
  26. 28 Apr, 2023 1 commit
      Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA-based CTC prefix beam search decoder.
      
      Several benchmark results on a V100 are attached below:
      
      | decoder type | model | dataset | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
      |--------------|-------|---------|-------------------|-----------|------------|------------|--------------------|------------|
      | cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
      | cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |
      
      Note:
      1. The design parallelizes computation across the batch and vocab axes and loops sequentially over the frame axis. Compared with CPU implementations, it therefore favors shorter sequences and larger vocabularies.
      2. WER is the same as with the CPU implementations. However, decoding with an LM is not supported yet.
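
For orientation, the search described in the note can be sketched on CPU in plain Python. This is not the CUDA implementation from the PR: the function name, its arguments, and the input format (a list of frames, each a list of per-token log-probabilities) are assumptions made for illustration, and there is no batch axis. The outer loop over frames is the sequential part; the inner loops over prefixes and tokens are the kind of work a GPU decoder could spread across threads.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size=8, blank=0):
    """Hypothetical reference sketch of CTC prefix beam search (no LM).

    Returns the best token sequence and its total log-probability.
    """
    # Each prefix tracks two scores: paths ending in blank and in non-blank.
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential loop over the frame axis
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix and moves mass to "ends in blank".
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(n_b, p_b + lp, p_nb + lp), n_nb)
                    continue
                new_prefix = prefix + (token,)
                n_b, n_nb = next_beams[new_prefix]
                if prefix and prefix[-1] == token:
                    # A repeated token extends the prefix only after a blank ...
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp))
                    # ... otherwise it collapses into the same prefix.
                    s_b, s_nb = next_beams[prefix]
                    next_beams[prefix] = (s_b, logsumexp(s_nb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp, p_nb + lp))
        # Prune to the beam_size most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, best_scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*best_scores)
```

With two frames that each assign 0.6 to blank and 0.4 to a single token, greedy decoding picks blank twice, while the prefix search sums the three alignments of that token (0.24 + 0.24 + 0.16) and returns it instead.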
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
  27. 18 Apr, 2023 1 commit
  28. 31 Mar, 2023 1 commit
  29. 29 Mar, 2023 1 commit
      Remove the note about AAC (#3214) · c07a96ab
      moto authored
      Summary:
      Part of the StreamWriter tutorial warns about corrupted AAC audio output. This is no longer relevant, so this commit deletes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3214
      
      Reviewed By: nateanl
      
      Differential Revision: D44504030
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef
  30. 28 Mar, 2023 1 commit
  31. 16 Mar, 2023 1 commit
  32. 07 Mar, 2023 1 commit
      Fix Adam and AdamW initializers in wav2letter example (#3145) · cea12eaf
      Maciej Torhan authored
      Summary:
      The wav2letter example passes `momentum` to the `Adam` and `AdamW` initializers, which is not a valid parameter for them. To fix that, this PR adds `beta_1` and `beta_2` arguments and uses them in place of `momentum`. It also adds `eps`, similar to the `Adadelta` initializer.
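
The shape of such a fix can be sketched as follows. This is not the code from the example script: the flag names, default values, and the helper `optimizer_kwargs` are hypothetical. The only library facts it relies on are that `torch.optim.Adam` and `torch.optim.AdamW` accept `lr`, `betas` (a pair), and `eps`, and do not accept `momentum`. The sketch only builds the keyword arguments, so it runs without PyTorch installed.

```python
import argparse


def build_parser():
    """Hypothetical command-line flags; names and defaults are illustrative."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--optimizer", default="adam", choices=["adadelta", "adam", "adamw"])
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--beta-1", type=float, default=0.9)
    parser.add_argument("--beta-2", type=float, default=0.999)
    parser.add_argument("--eps", type=float, default=1e-8)
    return parser


def optimizer_kwargs(args):
    """Build keyword arguments for the chosen optimizer class."""
    if args.optimizer in ("adam", "adamw"):
        # Adam-style optimizers take a (beta_1, beta_2) pair, not `momentum`.
        return {"lr": args.learning_rate, "betas": (args.beta_1, args.beta_2), "eps": args.eps}
    # Adadelta has no betas; it takes lr and eps (among others).
    return {"lr": args.learning_rate, "eps": args.eps}
```

The result would then be passed along the lines of `torch.optim.AdamW(model.parameters(), **optimizer_kwargs(args))`, where `model` is whatever module is being trained.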
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3145
      
      Reviewed By: mthrok
      
      Differential Revision: D43847713
      
      Pulled By: nateanl
      
      fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6
  33. 02 Mar, 2023 1 commit
  34. 24 Feb, 2023 1 commit