1. 28 Nov, 2022 1 commit
  2. 17 Oct, 2022 1 commit
  3. 14 Oct, 2022 2 commits
  4. 13 Oct, 2022 2 commits
  5. 12 Oct, 2022 1 commit
  6. 07 Oct, 2022 1 commit
  7. 06 Oct, 2022 1 commit
  8. 05 Oct, 2022 1 commit
  9. 03 Oct, 2022 1 commit
  10. 23 Sep, 2022 1 commit
  11. 22 Sep, 2022 2 commits
  12. 21 Sep, 2022 2 commits
  13. 14 Sep, 2022 1 commit
  14. 13 Sep, 2022 1 commit
  15. 18 Aug, 2022 3 commits
    • Update ASR inference tutorial (#2631) · 189edb1b
      moto authored
      Summary:
      * Use download_asset
      * Remove notes around nightly
      * Print versions first
      * Remove duplicated import
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2631
      
      Reviewed By: carolineechen
      
      Differential Revision: D38830395
      
      Pulled By: mthrok
      
      fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6
    • Update notes around nightly build and third parties (#2632) · 55ce80b1
      moto authored
      Summary:
      Google Colab now has torchaudio 0.12 pre-installed.
      This commit removes the note about nightly build.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2632
      
      Reviewed By: carolineechen
      
      Differential Revision: D38827632
      
      Pulled By: mthrok
      
      fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb
    • Tweak tutorials (#2630) · cab2bb44
      moto authored
      Summary:
      Resolves the following warnings
      
      ```
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
      /torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
      /torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2630
      
      Reviewed By: nateanl
      
      Differential Revision: D38816632
      
      Pulled By: mthrok
      
      fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635
  16. 05 Aug, 2022 1 commit
    • Add note for lexicon free decoder output (#2603) · 33485b8c
      Caroline Chen authored
      Summary:
      ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon free usage. This PR adds a note in both docs and tutorial.
      
      Follow-up: determine whether to modify the behavior of ``words`` in the lexicon free case. One option is to merge the generated tokens and then split them by the input silent token to populate the ``words`` field, but this is tricky: the meaning of a "word" in the lexicon free case can be vague, and not all languages put whitespace between words.
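
The merge-then-split option mentioned above can be sketched in plain Python. This is only an illustration of the idea, not torchaudio behavior: the helper name `tokens_to_words`, the `"|"` silent token, and the sample tokens are all assumptions made for the example.

```python
def tokens_to_words(tokens, sil_token="|"):
    """Join decoded tokens and split them on the silent token.

    Hypothetical helper for illustration; it is not part of torchaudio.
    """
    merged = "".join(tokens)
    # Leading, trailing, or repeated silent tokens yield empty strings; drop them.
    return [word for word in merged.split(sil_token) if word]


# Assumed sample output of a character-level, lexicon free decoder.
tokens = ["|", "h", "e", "l", "l", "o", "|", "w", "o", "r", "l", "d", "|"]
words = tokens_to_words(tokens)
print(words)
```

The sketch also shows why the follow-up is hard: it only yields sensible "words" when the token set has a separator that lines up with word boundaries, which is not true for every language.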
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2603
      
      Reviewed By: mthrok
      
      Differential Revision: D38459709
      
      Pulled By: carolineechen
      
      fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934
  17. 01 Aug, 2022 1 commit
  18. 29 Jul, 2022 2 commits
    • Update forced alignment tutorial (#2544) · c26b38b2
      moto authored
      Summary:
      1. Fix initialization.
      Previously, the SOS token score was initialized to 0 across the time axis.
      This biased the alignment toward delaying the start.
      The proper way to delay the SOS is via the blank token.
      The new initialization takes the cumulative sum of blank scores.
      2. Fill the end of the trellis with Inf.
      Similar to the start, at the end, where the number of remaining time frames is less
      than the number of tokens, it is no longer possible to align the text, so
      we fill with Inf for better visualization.
      3. Clean up asset management code.
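
Items 1 and 2 above can be illustrated with a pure-Python sketch of the trellis boundary setup. This is not the tutorial's code: the function name `init_trellis`, the list-of-lists layout (rows are time frames, column 0 is the SOS column), and the use of `-inf` for the first row are assumptions made for the example. Only the boundary cells are filled; the interior recursion is left out.

```python
import math


def init_trellis(emission, num_tokens, blank_id=0):
    """Build a (num_frames + 1) x (num_tokens + 1) trellis with boundary cells set.

    emission[t][c] is the log-probability of class c at frame t.
    Hypothetical helper for illustration only.
    """
    num_frames = len(emission)
    trellis = [[0.0] * (num_tokens + 1) for _ in range(num_frames + 1)]

    # SOS column: staying in SOS for t frames scores the cumulative sum of
    # blank scores, instead of a constant 0 along the time axis.
    running = 0.0
    for t in range(num_frames):
        running += emission[t][blank_id]
        trellis[t + 1][0] = running

    # Before the first frame, no token can have been emitted yet.
    for j in range(1, num_tokens + 1):
        trellis[0][j] = -math.inf

    # In the last num_tokens rows, too few frames remain to fit every token,
    # so the SOS column is marked with Inf.
    for t in range(num_frames + 1 - num_tokens, num_frames + 1):
        trellis[t][0] = math.inf

    return trellis


# Four frames, two classes (index 0 is blank), two tokens to align.
emission = [[-1.0, -2.0] for _ in range(4)]
trellis = init_trellis(emission, num_tokens=2)
sos_column = [row[0] for row in trellis]
print(sos_column)
```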
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2544
      
      Reviewed By: nateanl
      
      Differential Revision: D38276478
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
    • Improve speech enhancement tutorial (#2527) · d6267031
      Zhaoheng Ni authored
      Summary:
      - The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of mixture speech.
      - Show the Si-SNR score of mixture speech when visualizing the mixture spectrogram.
      - Fix the figure in `rtf_power` subsection.
          - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
      - Print PESQ, STOI, and SDR metric scores.
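
The first item above (amplifying the noise to lower the SNR of the mixture) can be sketched with plain Python lists. The commit message only says the noise is amplified; choosing the gain from a target SNR in dB, and the helper names `scale_noise_to_snr` and `measure_snr_db`, are assumptions for illustration. This computes plain SNR, not the Si-SNR metric the tutorial reports.

```python
import math


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Rescale noise so that the speech-to-noise power ratio equals target_snr_db."""
    # SNR_dB = 10 * log10(P_speech / (gain**2 * P_noise)), solved for gain.
    ratio = 10 ** (target_snr_db / 10)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * ratio))
    return [gain * n for n in noise]


def measure_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Toy signals: the original noise is quiet, so the starting SNR is high (20 dB).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, 0.1]
louder_noise = scale_noise_to_snr(speech, noise, target_snr_db=3.0)
mixture = [s + n for s, n in zip(speech, louder_noise)]
print(measure_snr_db(speech, noise), measure_snr_db(speech, louder_noise))
```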
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2527
      
      Reviewed By: mthrok
      
      Differential Revision: D38190218
      
      Pulled By: nateanl
      
      fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de
  19. 28 Jul, 2022 1 commit
    • Create tutorial for HDemucs (#2572) · 919fd0c4
      Sean Kim authored
      Summary:
      Add tutorial Python file. This is a draft PR; it will continue to be modified according to feedback.
      
      Future plan: modify spectrogram and bottom audio design and work on finding best audio track and segments
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2572
      
      Reviewed By: carolineechen, nateanl, mthrok
      
      Differential Revision: D38234001
      
      Pulled By: skim0514
      
      fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5
  20. 08 Jun, 2022 2 commits
  21. 07 Jun, 2022 2 commits
  22. 03 Jun, 2022 3 commits
  23. 02 Jun, 2022 1 commit
    • Update MVDR beamforming tutorial (#2398) · d01f5891
      Zhaoheng Ni authored
      Summary:
      - Use `download_asset` to download audio files.
      - Replace `MVDR` module with the newly added `SoudenMVDR` and `RTFMVDR` modules.
      - Benchmark performances of `F.rtf_evd` and `F.rtf_power` for RTF computation.
      - Visualize the spectrograms and masks.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2398
      
      Reviewed By: carolineechen
      
      Differential Revision: D36549402
      
      Pulled By: nateanl
      
      fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5
  24. 01 Jun, 2022 1 commit
    • Move CTC beam search decoder to beta (#2410) · 93024ace
      Caroline Chen authored
      Summary:
      Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.
      
      hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2410
      
      Reviewed By: mthrok
      
      Differential Revision: D36784521
      
      Pulled By: carolineechen
      
      fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
  25. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to specify which decoder to use.
        - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
        - The order of the arguments was changed so that the arguments common to both audio and video come before the rest.
        - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
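
The interface described above can be sketched as a pair of minimal file-like classes. This is an illustration of the `read`/`seek` contract only; the class names `ReadOnlySource` and `SeekableSource` and the sample bytes are assumptions, and the sketch does not call torchaudio.

```python
import io


class ReadOnlySource:
    """Minimal file-like object: implements read(n) and deliberately no seek()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to n bytes; an empty bytes object signals end of stream.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Same as ReadOnlySource, plus the optional seek(offset, whence) method."""

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buffer.seek(offset, whence)


# Placeholder bytes standing in for real media data.
src = SeekableSource(b"RIFF....WAVE")
header = src.read(4)
src.seek(0)  # rewind so a consumer can re-read the header
```

Going by the summary above, an object like `ReadOnlySource` would be the case where some formats cannot be decoded properly and the `decoder` option becomes necessary.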
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in sox-based I/O.
      In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
      if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
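
The dispatch rule in the paragraph above can be sketched as follows. The function name `pick_binding` and the returned labels are hypothetical; the real code constructs the bound implementation instead of returning a string, and how it handles other input types is not stated in this commit message.

```python
import io


def pick_binding(src):
    """Return a label for the binding that would handle src (illustrative only)."""
    if isinstance(src, str):
        # Paths and URLs go to the TorchBind-bound implementation.
        return "torchbind"
    if hasattr(src, "read"):
        # File-like objects go to the same implementation bound via PyBind11.
        return "pybind11"
    # Assumption for the sketch: anything else is rejected.
    raise TypeError(f"unsupported src type: {type(src).__name__}")


print(pick_binding("video.mp4"), pick_binding(io.BytesIO(b"")))
```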
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some of the implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
        - `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they just constructed a string and passed it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. Since they work fine with PyBind11, the Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
  26. 13 May, 2022 1 commit
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of prototype module.
      
      * The related classes are renamed as follows:

        - `Streamer` -> `StreamReader`
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change is a preemptive measure for the possibility of adding a
      `StreamWriter` API.
      
      * Replace BUILD_FFMPEG build arg with USE_FFMPEG
      
      We are not building FFmpeg, so USE_FFMPEG is more appropriate.
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  27. 12 May, 2022 1 commit
  28. 28 Apr, 2022 1 commit
  29. 26 Apr, 2022 1 commit