1. 05 May, 2023 1 commit
    • Update squim tutorial (#3313) · 05ef7dc6
      Zhaoheng Ni authored
      Summary:
      Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3313
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45620311
      
      Pulled By: nateanl
      
      fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
  2. 29 Apr, 2023 1 commit
  3. 31 Mar, 2023 1 commit
  4. 29 Mar, 2023 1 commit
    • Remove the note about AAC (#3214) · c07a96ab
      moto authored
      Summary:
      Part of the StreamWriter tutorial warns about corrupted AAC audio output. The warning is no longer relevant, so this commit deletes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3214
      
      Reviewed By: nateanl
      
      Differential Revision: D44504030
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef
  5. 28 Mar, 2023 1 commit
  6. 16 Mar, 2023 1 commit
  7. 02 Mar, 2023 1 commit
  8. 15 Feb, 2023 1 commit
  9. 30 Jan, 2023 1 commit
    • Fix hybrid demucs tutorial for CUDA (#3017) · da9d1627
      Yan Li authored
      Summary:
      Running this tutorial on a CUDA device currently produces a few errors.
      
      The reasons are:
      - The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so the return value of the call must be assigned back to the variable (otherwise the Tensor stays on the CPU).
      - For further analysis and display of the output audio, the Tensors need to be moved back from the GPU to the CPU, because some of the functions called require CPU Tensors (e.g. `stft()` and `bss_eval_sources()`).
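
The two points above can be illustrated with a minimal sketch. It is not the tutorial's code: the placeholder `waveform`, the stand-in `separated` output, and the dtype conversion used to make the non-in-place behavior visible on CPU-only machines are all assumptions for illustration.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder stereo waveform (assumption; the tutorial loads real audio).
waveform = torch.zeros(2, 16000)

# Tensor.to() returns a Tensor and leaves the original untouched.
# The dtype conversion below shows this even without a GPU.
converted = waveform.to(torch.float64)
original_dtype = waveform.dtype  # still torch.float32

# Wrong: the result is discarded, so `waveform` is not moved.
waveform.to(device)

# Right: rebind the variable to the returned Tensor.
waveform = waveform.to(device)

# Stand-in for a model output living on `device` (assumption).
separated = waveform * 0.5

# Move results back to the CPU before CPU-only analysis or plotting.
separated_cpu = separated.cpu()
```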
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3017
      
      Reviewed By: mthrok
      
      Differential Revision: D42828526
      
      Pulled By: nateanl
      
      fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
  10. 17 Jan, 2023 1 commit
  11. 13 Jan, 2023 1 commit
  12. 30 Dec, 2022 1 commit
  13. 17 Dec, 2022 1 commit
  14. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      resolves https://github.com/pytorch/audio/issues/2891
      
      Rename `resampling_method` options to more accurately describe what is happening. Previously the options were `sinc_interpolation` and `kaiser_window`, which can be confusing because both use sinc interpolation and differ only in the window function. This commit therefore renames `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option emits a warning, and the old options will be deprecated in 2 releases. The numerical behavior is unchanged.
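
A rename of this kind is commonly handled with a small compatibility shim. The sketch below is not torchaudio's implementation: the helper name `normalize_resampling_method`, the choice of `DeprecationWarning`, and the `ValueError` for unknown names are assumptions. Only the old and new option names come from the commit summary.

```python
import warnings

# Old option name -> new option name, as listed in the commit summary.
_RENAMED_METHODS = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name):
    """Hypothetical helper: map an old option name to its new name, with a warning."""
    if name in _RENAMED_METHODS:
        new_name = _RENAMED_METHODS[name]
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if name not in _RENAMED_METHODS.values():
        raise ValueError(f"Unknown resampling method: {name!r}")
    return name
```

Because the old name is only translated, never reinterpreted, a shim like this keeps the numerical behavior identical for both spellings.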
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  15. 29 Nov, 2022 1 commit
  16. 28 Nov, 2022 1 commit
  17. 17 Oct, 2022 1 commit
  18. 14 Oct, 2022 2 commits
  19. 13 Oct, 2022 2 commits
  20. 12 Oct, 2022 1 commit
  21. 07 Oct, 2022 1 commit
  22. 06 Oct, 2022 1 commit
  23. 05 Oct, 2022 1 commit
  24. 03 Oct, 2022 1 commit
  25. 23 Sep, 2022 1 commit
  26. 22 Sep, 2022 2 commits
  27. 21 Sep, 2022 2 commits
  28. 14 Sep, 2022 1 commit
  29. 13 Sep, 2022 1 commit
  30. 18 Aug, 2022 3 commits
    • Update ASR inference tutorial (#2631) · 189edb1b
      moto authored
      Summary:
      * Use download_asset
      * Remove notes around nightly
      * Print versions first
      * Remove duplicated import
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2631
      
      Reviewed By: carolineechen
      
      Differential Revision: D38830395
      
      Pulled By: mthrok
      
      fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6
    • Update notes around nightly build and third parties (#2632) · 55ce80b1
      moto authored
      Summary:
      Google Colab now has torchaudio 0.12 pre-installed.
      This commit removes the note about nightly build.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2632
      
      Reviewed By: carolineechen
      
      Differential Revision: D38827632
      
      Pulled By: mthrok
      
      fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb
    • Tweak tutorials (#2630) · cab2bb44
      moto authored
      Summary:
      Resolves the following warnings
      
      ```
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
      /torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2630
      
      Reviewed By: nateanl
      
      Differential Revision: D38816632
      
      Pulled By: mthrok
      
      fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635
  31. 05 Aug, 2022 1 commit
    • Add note for lexicon free decoder output (#2603) · 33485b8c
      Caroline Chen authored
      Summary:
      The ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon-free usage. This PR adds a note in both the docs and the tutorial.
      
      Followup: determine whether we want to modify the behavior of ``words`` in the lexicon-free case. One option is to merge the generated tokens and then split them by the input silent token to populate the words field, but this is tricky because the meaning of a "word" in the lexicon-free case can be vague, not all languages have whitespace between words, etc.
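
The merge-then-split option mentioned in the followup could look roughly like the sketch below. The helper name `tokens_to_words` and the `"|"` silent token are assumptions for illustration; this is not the decoder's behavior, and it inherits the caveat above about languages without whitespace between words.

```python
def tokens_to_words(tokens, sil_token="|"):
    """Hypothetical helper: join decoded tokens, then split on the silent token."""
    merged = "".join(tokens)
    # Drop empty strings produced by leading, trailing, or repeated silent tokens.
    return [word for word in merged.split(sil_token) if word]


example_tokens = ["|", "h", "e", "l", "l", "o", "|", "|", "w", "o", "r", "l", "d", "|"]
example_words = tokens_to_words(example_tokens)
```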
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2603
      
      Reviewed By: mthrok
      
      Differential Revision: D38459709
      
      Pulled By: carolineechen
      
      fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934
  32. 01 Aug, 2022 1 commit
  33. 29 Jul, 2022 2 commits
    • Update forced alignment tutorial (#2544) · c26b38b2
      moto authored
      Summary:
      1. Fix initialization.
      Previously, the SOS token score was initialized to 0 across the time axis.
      This biased the alignment toward delaying the start.
      The proper way to delay the SOS is via the blank token.
      The new initialization takes the cumulative sum of blank scores.
      2. Fill the end of the trellis with Inf.
      Similar to the start, at the end, where the number of remaining time frames is less
      than the number of tokens, it is no longer possible to align the text, so
      we fill with Inf for better visualization.
      3. Clean up asset management code.
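
The initialization described in items 1 and 2 can be sketched in plain Python. This is not the tutorial's code: the function name `init_trellis`, the emission layout (one list of log-probabilities per frame), `blank_id=0`, and the choice to apply the `Inf` fill to the first trellis column are all assumptions.

```python
import math
from itertools import accumulate


def init_trellis(emission, num_tokens, blank_id=0):
    """Hypothetical sketch of the trellis initialization for forced alignment."""
    num_frames = len(emission)
    trellis = [[0.0] * num_tokens for _ in range(num_frames)]

    # Item 1: staying at the start costs the cumulative sum of blank scores,
    # instead of a constant 0 across the time axis.
    blank_scores = (frame[blank_id] for frame in emission[1:])
    for t, score in enumerate(accumulate(blank_scores), start=1):
        trellis[t][0] = score

    # At the first frame only the first token is reachable.
    for j in range(1, num_tokens):
        trellis[0][j] = -math.inf

    # Item 2: near the end, too few frames remain to emit every token,
    # so mark those start-column cells with Inf (assumed placement).
    first_infeasible = max(num_frames - num_tokens + 1, 0)
    for t in range(first_infeasible, num_frames):
        trellis[t][0] = math.inf

    return trellis


# Four frames, two classes (blank first), three tokens to align.
demo_emission = [[-1.0, -2.0], [-0.5, -3.0], [-0.25, -4.0], [-0.125, -5.0]]
demo_trellis = init_trellis(demo_emission, num_tokens=3)
```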
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2544
      
      Reviewed By: nateanl
      
      Differential Revision: D38276478
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
    • Improve speech enhancement tutorial (#2527) · d6267031
      Zhaoheng Ni authored
      Summary:
      - The "speech + noise" mixture still has a high SNR, so it cannot show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of the mixture speech.
      - Show the Si-SNR score of the mixture speech when visualizing the mixture spectrogram.
      - Fix the figure in the `rtf_power` subsection.
          - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
      - Print PESQ, STOI, and SDR metric scores.
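
Amplifying the noise to reach a lower SNR, as in the first item above, amounts to scaling the noise by a gain derived from the signal powers. The sketch below uses plain Python lists and ordinary SNR (not Si-SNR). The helper names `scale_noise_to_snr` and `snr_db`, and the sample values, are assumptions and do not come from the tutorial.

```python
import math


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(speech) / power(noise))


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Hypothetical helper: rescale `noise` so that speech/noise hits the target SNR."""
    target_noise_power = power(speech) / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [gain * n for n in noise]


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, 0.1]
louder_noise = scale_noise_to_snr(speech, noise, target_snr_db=3.0)
mixture = [s + n for s, n in zip(speech, louder_noise)]
```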
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2527
      
      Reviewed By: mthrok
      
      Differential Revision: D38190218
      
      Pulled By: nateanl
      
      fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de