1. 29 Jul, 2022 4 commits
    • Update forced alignment tutorial (#2544) · c26b38b2
      moto authored
      Summary:
      1. Fix initialization.
      Previously, the SOS token score was initialized to 0 across the time axis.
      This biased the alignment toward a delayed start.
      The proper way to delay the SOS is via the blank token,
      so the new initialization takes the cumulative sum of blank scores.
      2. Fill the end of the trellis with Inf.
      Similar to the start, at the end, where the number of remaining time frames
      is less than the number of tokens, the text can no longer be aligned, so
      we fill these entries with Inf for better visualization.
      3. Clean up asset management code.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2544
      
      Reviewed By: nateanl
      
      Differential Revision: D38276478
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
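The two initialization rules in this commit can be sketched in plain Python. This is a minimal sketch, not the tutorial's code. It assumes `emission` is a list of per-frame log-probability lists, `blank_id` is 0, and the trellis has one extra row (no frame consumed yet) and one extra column (SOS). `build_trellis` is a hypothetical name.

```python
import math
from typing import List


def build_trellis(
    emission: List[List[float]], tokens: List[int], blank_id: int = 0
) -> List[List[float]]:
    """Build an alignment trellis (hypothetical helper, for illustration only).

    emission[t][c] is assumed to be the log-probability of class c at frame t.
    Row 0 means "no frame consumed yet"; column 0 is the SOS state.
    """
    num_frame, num_tokens = len(emission), len(tokens)
    trellis = [[0.0] * (num_tokens + 1) for _ in range(num_frame + 1)]

    # Rule 1: the start is delayed only by emitting blanks, so the SOS
    # column holds the cumulative sum of blank scores.
    for t in range(1, num_frame + 1):
        trellis[t][0] = trellis[t - 1][0] + emission[t - 1][blank_id]
    # No token can have been emitted before the first frame.
    for j in range(1, num_tokens + 1):
        trellis[0][j] = -math.inf
    # Rule 2: once fewer frames remain than tokens, the text can no longer be
    # aligned from SOS; mark those entries with +inf (keep trellis[0][0] = 0).
    for t in range(max(1, num_frame + 1 - num_tokens), num_frame + 1):
        trellis[t][0] = math.inf

    # Each step either stays on the current token (scored with the blank)
    # or advances to the next token.
    for t in range(num_frame):
        for j in range(1, num_tokens + 1):
            stay = trellis[t][j] + emission[t][blank_id]
            change = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, change)
    return trellis
```

With three frames and one token, `build_trellis([[-1.0, -2.0], [-0.5, -3.0], [-2.0, -0.1]], [1])` gives an SOS column of `[0.0, -1.0, -1.5, inf]`. In this sketch the `+inf` entries spread only through cells from which the last token can no longer be reached, so the final cell stays finite.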
    • Enable CTC decoder in Windows (#2587) · 67cb420d
      moto authored
      Summary:
      This commit enables the CTC decoder on Windows.
      
      The functionality appears to work as expected:
      the tests pass and the decoding tutorial runs successfully.
      
      The only difference from the Linux/macOS version is that
      loading a model in XZ-compressed format is not supported.
      
      ![289961785_399620772041679_7768117002438616376_n](https://user-images.githubusercontent.com/855818/181420923-cfbd8402-20de-4e63-b9e4-e39f9aa9fc50.png)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2587
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D38276490
      
      Pulled By: mthrok
      
      fbshipit-source-id: f2203b2235c5bbb0220fe560aaaf0e1d5530347a
    • Replace 'runtime_error' exception with 'TORCH_CHECK' in TorchAudio sox (#2592) · f234e51f
      Javier Cardenete Morales authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2592
      
      std::runtime_error does not preserve the C++ stack trace, so it is unclear to users what went wrong internally.
      
      PyTorch's TORCH_CHECK macro makes it possible to print the C++ stack trace when the TORCH_SHOW_CPP_STACKTRACES environment variable is set to 1.
      
      Reviewed By: mthrok
      
      Differential Revision: D38219331
      
      fbshipit-source-id: f51c27111077e927f97127f73f83a31b8e74f61f
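To see the C++ stack trace that this commit makes available, the TORCH_SHOW_CPP_STACKTRACES variable has to be set in the environment of the process that runs TorchAudio. Here is a minimal sketch of doing that for a child Python process. Only the variable name comes from this commit; `env_with_cpp_stacktraces` and `run_with_cpp_stacktraces` are hypothetical helpers.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def env_with_cpp_stacktraces(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and enable C++ stack traces for TORCH_CHECK errors."""
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    env["TORCH_SHOW_CPP_STACKTRACES"] = "1"
    return env


def run_with_cpp_stacktraces(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command whose environment has TORCH_SHOW_CPP_STACKTRACES=1.

    Setting the variable in the child environment guarantees it is in place
    before the child process imports torch/torchaudio.
    """
    return subprocess.run(cmd, env=env_with_cpp_stacktraces(), capture_output=True, text=True)
```

For example, `run_with_cpp_stacktraces([sys.executable, "load_audio.py"])` would run a hypothetical `load_audio.py` that calls the sox-based loader. If a TORCH_CHECK fails there, the C++ frames should appear in the result's `stderr`.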
    • Improve speech enhancement tutorial (#2527) · d6267031
      Zhaoheng Ni authored
      Summary:
      - The "speech + noise" mixture has a high SNR, so it cannot demonstrate the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of the mixture speech.
      - Show the Si-SNR score of the mixture speech when visualizing the mixture spectrogram.
      - Fix the figure in the `rtf_power` subsection.
          - The description of the spectrogram enhanced by `rtf_power` was wrong. Correct it to refer to `rtf_power`.
      - Print PESQ, STOI, and SDR metric scores.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2527
      
      Reviewed By: mthrok
      
      Differential Revision: D38190218
      
      Pulled By: nateanl
      
      fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de
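The two ideas in this commit, lowering the mixture SNR by amplifying the noise and reporting Si-SNR, can be sketched in plain Python. The commit does not say how much the tutorial amplifies the noise, so this sketch assumes the noise is scaled to a chosen target SNR. The Si-SNR follows the usual scale-invariant definition: project the zero-mean estimate onto the zero-mean reference and compare the projection's energy with the residual's. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def power(x: Sequence[float]) -> float:
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Plain SNR in dB between a clean signal and additive noise."""
    return 10 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(
    speech: Sequence[float], noise: Sequence[float], target_snr_db: float
) -> List[float]:
    """Amplify (or attenuate) noise so that speech + noise has the target SNR."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    return [gain * v for v in noise]


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB of an estimate with respect to a reference."""
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [v - est_mean for v in estimate]
    ref = [v - ref_mean for v in reference]
    # Project the estimate onto the reference; the projection is the "target".
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    return 10 * math.log10(
        (sum(t * t for t in target) + eps) / (sum(r * r for r in residual) + eps)
    )
```

Lowering `target_snr_db` makes the mixture harder to enhance, which is the effect the commit is after. Because Si-SNR discards the estimate's overall scale, `si_snr([6, 0, 0, -6], ref)` and `si_snr([2, 0, 0, -2], ref)` agree for `ref = [1, -1, 1, -1]`.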
  2. 28 Jul, 2022 7 commits
  3. 27 Jul, 2022 3 commits
  4. 26 Jul, 2022 5 commits
  5. 25 Jul, 2022 3 commits
  6. 22 Jul, 2022 2 commits
    • Add dimension and shape check (#2563) · b1f510fa
      Sean Kim authored
      Summary:
      Prevent users from passing inputs with incorrect dimensions.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2563
      
      Reviewed By: carolineechen
      
      Differential Revision: D38074360
      
      Pulled By: skim0514
      
      fbshipit-source-id: 7bcae515706eb358ca6f68c50c7c0ccace1c3f95
    • Add documents for SourceSeparationBundle (#2559) · 6cee56ab
      Zhaoheng Ni authored
      Summary:
      - Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
      - Add a citation for the Libri2Mix dataset in the bundle documentation.
      - The URL in the integration test should be built with `/` instead of `os.path.join`, which inserts a backslash on Windows and breaks the URL. Change it to an f-string.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2559
      
      Reviewed By: carolineechen
      
      Differential Revision: D38036116
      
      Pulled By: nateanl
      
      fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836
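The following shows why `os.path.join` is unsafe for URLs. On Windows, `os.path` is `ntpath`, and its `join` uses a backslash. The standard-library modules `ntpath` and `posixpath` make both behaviors reproducible on any platform. The base URL and file name are hypothetical.

```python
import ntpath
import posixpath

# Hypothetical base URL and file name, for illustration only.
BASE_URL = "https://example.com/models"
FILE_NAME = "model.pt"

# What os.path.join does on Windows: a backslash separator, i.e. a broken URL.
windows_join = ntpath.join(BASE_URL, FILE_NAME)
# What os.path.join does on Linux/macOS.
posix_join = posixpath.join(BASE_URL, FILE_NAME)
# An f-string with an explicit "/" gives the same URL on every platform.
url = f"{BASE_URL}/{FILE_NAME}"

print(windows_join)
print(url)
```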
  7. 21 Jul, 2022 4 commits
  8. 20 Jul, 2022 1 commit
  9. 19 Jul, 2022 3 commits
  10. 18 Jul, 2022 1 commit
  11. 15 Jul, 2022 1 commit
  12. 12 Jul, 2022 6 commits
    • Simplify the requirements to minimum runtime dependencies (#2313) · 632ea670
      moto authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2313
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D37799552
      
      Pulled By: mthrok
      
      fbshipit-source-id: 12e27fccb7098f3142e9ca0b748c71325cd324ee
    • Docstring change for Hybrid Demucs (#2542) · 99303143
      Sean Kim authored
      Summary:
      Small edit to docstring for kernel
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2542
      
      Reviewed By: carolineechen
      
      Differential Revision: D37797937
      
      Pulled By: skim0514
      
      fbshipit-source-id: 4bdd1e3ddb49cbdf2bd5367edb03cf9603d4ec6e
    • Simplify HW acceleration code (#2534) · 4ba56323
      moto authored
      Summary:
      FFmpeg's API provides multiple ways to initialize a decoder. This PR simplifies the initialization by delegating hardware device context management to FFmpeg's native code.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2534
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37734573
      
      Pulled By: mthrok
      
      fbshipit-source-id: e61736b4d4d2ca6e94d8965abd93b4e9a68e7351
    • Hybrid Demucs model implementation (#2506) · 608b8ea6
      Sean Kim authored
      Summary:
      Draft PR with the initial model implementation, with minor changes from the previous implementation.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2506
      
      Reviewed By: nateanl
      
      Differential Revision: D37762671
      
      Pulled By: skim0514
      
      fbshipit-source-id: b7dc0a6ef725d6ae6d76c23c882623f7d339977c
    • Clean up the interface around dictionary (#2533) · e2641452
      moto authored
      Summary:
      A Python dictionary is bound to different types in TorchBind and PyBind.
      StreamReader has methods that receive and return dictionaries.
      
      This commit cleans up the handling of dictionaries and consolidates the
      helper functions.
      
      * The core implementation and TorchBind both use `c10::Dict`.
      * The PyBind version uses `std::map` and converts it to `c10::Dict`.
      * The helper functions to convert `std::map` <-> `c10::Dict` are consolidated in the pybind directory.
      * The wrapper methods are implemented in the `pybind` directory.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2533
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37731866
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5a5cf1372668f7d3aacc0bb461bc69fa07212f3f
    • Fix docstring (#2540) · 05d2580a
      Zhaoheng Ni authored
      Summary:
      The docstring of `apply_beamforming` produces a warning when building the documentation page. This PR fixes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2540
      
      Reviewed By: mthrok
      
      Differential Revision: D37763745
      
      Pulled By: nateanl
      
      fbshipit-source-id: 0e9f1e098865af032b00ac56d918cb9d2ffc5024