1. 31 May, 2023 1 commit
  2. 26 May, 2023 2 commits
    • Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9
      atalman authored
      Summary:
      This reverts commit d38a7854.
      
      This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3377
      
      Reviewed By: mthrok
      
      Differential Revision: D46230498
      
      Pulled By: atalman
      
      fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d
    • Improve RNN-T streaming decoding (#3295) · 9fc0dcaa
      Lakshmi Krishnan authored
      Summary:
      This commit fixes the following issues affecting streaming decoding quality:
      1. The `init_b` hypothesis is now regenerated from the blank token only if no initial hypotheses are provided.
      2. The decoder can now receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This substantially improves decoding quality, especially for speech with long pauses and disfluencies.
      3. Some minor errors in shape checking for lengths are corrected.
      
      This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3295
      
      Reviewed By: nateanl
      
      Differential Revision: D46216113
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
  3. 25 May, 2023 1 commit
    • Add LRS3 AV-ASR recipe (#3278) · c6624fa6
      Pingchuan Ma authored
      Summary:
      This PR adds an AV-ASR recipe, which contains sample implementations of training and evaluation pipelines for RNN-T-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. The recipe includes both streaming and non-streaming modes.
      
      CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3278
      
      Reviewed By: nateanl
      
      Differential Revision: D46121550
      
      Pulled By: mpc001
      
      fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6
  4. 23 May, 2023 1 commit
  5. 21 May, 2023 2 commits
  6. 16 May, 2023 1 commit
  7. 10 May, 2023 2 commits
  8. 05 May, 2023 1 commit
    • Update squim tutorial (#3313) · 05ef7dc6
      Zhaoheng Ni authored
      Summary:
      Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3313
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45620311
      
      Pulled By: nateanl
      
      fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
  9. 29 Apr, 2023 1 commit
  10. 28 Apr, 2023 1 commit
    • Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA-based CTC prefix beam search decoder.
      
      Several benchmark results measured on a V100 are attached below:
      | decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
      |--------------|-------|---------|----------------------|-----------|------------|------------|-------------------|------------|
      | cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
      | cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |
      
      Note:
      1. The design parallelizes computation across the batch and vocabulary axes and loops sequentially over the frame axis. Compared with CPU implementations, it therefore favors shorter sequences and larger vocabularies.
      2. WER is the same as with CPU implementations. However, decoding with an LM is not supported yet.
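
For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of CTC prefix beam search without an LM. It is not the cuctc implementation: the function name `ctc_prefix_beam_search`, the use of linear-domain probabilities, and blank index 0 are all assumptions made for illustration. The outer loop runs over frames, the axis the note above describes as sequential; the two inner loops (over beam entries and vocabulary) are the kind of work a GPU decoder can spread across threads.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size=8, blank=0):
    """Hypothetical reference sketch of CTC prefix beam search (no LM).

    probs: list of frames, each a list of per-token probabilities.
    Returns the most probable collapsed token sequence as a tuple.
    """
    # Each prefix maps to (p_blank, p_non_blank): the total probability of
    # alignments of that prefix ending in blank / in a non-blank token.
    beams = {(): (1.0, 0.0)}

    for frame in probs:  # sequential over time frames
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():  # parallelizable
            for token, p in enumerate(frame):  # parallelizable
                if token == blank:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and token == prefix[-1]:
                    # A repeated token merges into the same prefix unless a
                    # blank separated the two, in which case it extends it.
                    next_beams[prefix][1] += p_nb * p
                    next_beams[prefix + (token,)][1] += p_b * p
                else:
                    next_beams[prefix + (token,)][1] += (p_b + p_nb) * p
        # Keep only the `beam_size` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_size]}

    return max(beams.items(), key=lambda kv: sum(kv[1]))[0]
```

Because the frame loop carries state from one step to the next, the running time of this sketch grows with sequence length regardless of how the inner loops are executed, which is consistent with the note that shorter sequences are the friendlier case.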
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
  11. 18 Apr, 2023 1 commit
  12. 31 Mar, 2023 1 commit
  13. 29 Mar, 2023 1 commit
    • Remove the note about AAC (#3214) · c07a96ab
      moto authored
      Summary:
      A part of the StreamWriter tutorial warns about corrupted AAC audio output. The warning is no longer relevant, so this commit deletes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3214
      
      Reviewed By: nateanl
      
      Differential Revision: D44504030
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef
  14. 28 Mar, 2023 1 commit
  15. 16 Mar, 2023 1 commit
  16. 07 Mar, 2023 1 commit
    • Fix Adam and AdamW initializers in wav2letter example (#3145) · cea12eaf
      Maciej Torhan authored
      Summary:
      The wav2letter example passes `momentum` to the `Adam` and `AdamW` initializers, which is not a valid parameter. To fix that, this PR adds `beta_1` and `beta_2` arguments and uses them in place of `momentum`. I also added `eps`, similar to the `Adadelta` initializer.
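
A minimal sketch of how such arguments could be wired up is shown below. The flag names `--beta_1`, `--beta_2`, and `--eps` follow the commit message; the `--lr` flag, the default values, and the helper names `build_parser` and `adam_kwargs` are assumptions for illustration and are not taken from the example's actual code.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the Adam/AdamW hyperparameters."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-3)
    # Adam and AdamW take two beta coefficients rather than `momentum`.
    parser.add_argument("--beta_1", type=float, default=0.9)
    parser.add_argument("--beta_2", type=float, default=0.999)
    parser.add_argument("--eps", type=float, default=1e-8)
    return parser


def adam_kwargs(args):
    """Hypothetical helper: build keyword arguments for Adam or AdamW."""
    return {
        "lr": args.lr,
        "betas": (args.beta_1, args.beta_2),  # passed as a single tuple
        "eps": args.eps,
    }
```

The resulting dictionary can then be unpacked into the optimizer, for example `torch.optim.Adam(model.parameters(), **adam_kwargs(args))`, where `model` stands for whatever module is being trained.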
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3145
      
      Reviewed By: mthrok
      
      Differential Revision: D43847713
      
      Pulled By: nateanl
      
      fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6
  17. 02 Mar, 2023 1 commit
  18. 24 Feb, 2023 2 commits
  19. 23 Feb, 2023 1 commit
    • Add TCPGen context-biasing Conformer RNN-T (#2890) · 1ed330b5
      G. Sun authored
      Summary:
      This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing.
      
      An example for Librispeech can be found in audio/examples/asr/librispeech_biasing.
      
      Maintainer's note (mthrok):
      It seems that TrieNode would be better typed as a tuple, but changing the implementation from list to tuple
      without running the code could cause issues, so the code is left unchanged, though the annotation uses tuple.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2890
      
      Reviewed By: nateanl
      
      Differential Revision: D43171447
      
      Pulled By: mthrok
      
      fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e
  20. 16 Feb, 2023 2 commits
  21. 15 Feb, 2023 1 commit
  22. 14 Feb, 2023 1 commit
    • Update ssl example (#3060) · ff01be0f
      Zhaoheng Ni authored
      Summary:
      - Rename the current `ssl` example to `self_supervised_learning`
      - Add README to demonstrate how to run the recipe with hubert task
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3060
      
      Reviewed By: mthrok
      
      Differential Revision: D43287868
      
      Pulled By: nateanl
      
      fbshipit-source-id: 10352682485ef147ca32f4c4c9f9cde995444aa0
  23. 30 Jan, 2023 1 commit
    • Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627
      Yan Li authored
      Summary:
      Currently, running this tutorial on a CUDA device raises a few errors.
      
      The reasons are:
      - The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call to the variable (otherwise the Tensor would still be on the CPU).
      - When further analyzing and displaying the output audio, we need to move the Tensors back from the GPU to the CPU. This is because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3017
      
      Reviewed By: mthrok
      
      Differential Revision: D42828526
      
      Pulled By: nateanl
      
      fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
  24. 19 Jan, 2023 2 commits
    • Add modularized SSL training recipe (#2876) · 2eaefe27
      Zhaoheng Ni authored
      Summary:
      TorchAudio currently has one training recipe for HuBERT + LibriSpeech pre-training. It may not be suitable when users want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). This PR addresses the issue by providing a modularized training recipe for audio self-supervised learning. Users can inject a customized model module, loss function, optimizer, LR scheduler, and datamodule for training an SSL model.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2876
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42617414
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
    • Simplify train step in Conformer RNN-T LibriSpeech recipe (#2981) · c6a52355
      hwangjeff authored
      Summary:
      In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2981
      
      Reviewed By: mthrok
      
      Differential Revision: D42507228
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
  25. 17 Jan, 2023 1 commit
  26. 16 Jan, 2023 1 commit
    • Fixes examples/source_separation for WSJ0_2mix dataset (#2987) · f9d38796
      Robin Scheibler authored
      Summary:
      The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following.
      
      1. Use `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset
      2. Correct `args.data_dir` to `args.root_dir` in eval.py
      3. Modify the parameters of `pytorch_lightning.Trainer` according to the latest version (use `accelerator="gpu"` and `devices=args.num_devices`, instead of just `gpus=args.num_devices`)
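
The argument change in item 3 can be sketched as a small helper that builds the device-related keyword arguments. The helper name `trainer_device_kwargs` is hypothetical, and the CPU fallback for a device count of zero is an assumption added for illustration; only the `accelerator="gpu"` and `devices=...` pair comes from the commit message.

```python
def trainer_device_kwargs(num_devices):
    """Hypothetical helper: device arguments for `pytorch_lightning.Trainer`."""
    if num_devices > 0:
        # Replaces the older `gpus=num_devices` argument.
        return {"accelerator": "gpu", "devices": num_devices}
    # Assumption: fall back to a single CPU process when no GPU is requested.
    return {"accelerator": "cpu", "devices": 1}
```

It would be used as `pytorch_lightning.Trainer(**trainer_device_kwargs(args.num_devices))`, alongside whatever other arguments the script already passes.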
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2987
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42536992
      
      Pulled By: nateanl
      
      fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
  27. 13 Jan, 2023 1 commit
  28. 30 Dec, 2022 1 commit
  29. 17 Dec, 2022 1 commit
  30. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      resolves https://github.com/pytorch/audio/issues/2891
      
      Rename the `resampling_method` options to describe more accurately what they do. Previously the options were `sinc_interpolation` and `kaiser_window`, which can be confusing because both actually use sinc interpolation but differ in the window function used. This PR therefore renames `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will raise a warning, and the old options will be deprecated in 2 releases. The numerical behavior is unchanged.
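
One way such a backward-compatible rename could be handled is sketched below. This is not torchaudio's code: the helper name `normalize_resampling_method`, the choice of `DeprecationWarning`, and the message wording are assumptions. Only the old and new option names are taken from the commit message.

```python
import warnings

# Old option name -> new option name, as listed in the commit message.
_RENAMED_METHODS = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name):
    """Hypothetical helper: map a deprecated option to its new name."""
    if name in _RENAMED_METHODS:
        new_name = _RENAMED_METHODS[name]
        # Warn but keep working, so existing callers are not broken.
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if name not in _RENAMED_METHODS.values():
        raise ValueError(f"Unknown resampling method: {name!r}")
    return name
```

Since only the names change, a caller passing an old option gets the same resampling result as before, plus the warning.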
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  31. 29 Nov, 2022 1 commit
  32. 28 Nov, 2022 1 commit
  33. 17 Nov, 2022 1 commit
  34. 16 Nov, 2022 1 commit