1. 05 Aug, 2022 3 commits
  2. 04 Aug, 2022 1 commit
  3. 03 Aug, 2022 2 commits
    • Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2
      Sean Kim authored
      Summary:
      Add new model pretrained weights and tests
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2601
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D38396673
      
      Pulled By: skim0514
      
      fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1
    • An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a
      bshall authored
      Summary:
      I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
      - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
      - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
      - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
      - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
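
The gating step that these constants feed into can be sketched in plain Python. This is a minimal illustration, not the code from the PR: it assumes the K-weighting filter has already been applied and that the input is the per-channel mean square of each 400 ms block (75% overlap). The function and constant names are hypothetical; the threshold values (-70 LKFS absolute, -10 LU relative) and the channel weights are taken from the BS.1770-4 recommendation.

```python
import math

# Channel weights from BS.1770-4 for L, R, C, Ls, Rs (up to 5 channels).
CHANNEL_WEIGHTS = [1.0, 1.0, 1.0, 1.41, 1.41]
ABSOLUTE_THRESHOLD = -70.0  # LKFS
RELATIVE_OFFSET = -10.0     # LU, relative to the absolute-gated loudness


def block_loudness(weighted_power):
    """Loudness (LKFS) of one block given its channel-weighted power."""
    return -0.691 + 10.0 * math.log10(weighted_power)


def gated_loudness(block_mean_squares):
    """Integrated loudness from K-weighted block powers.

    block_mean_squares[j][i] is the mean square of channel i in block j
    (hypothetical input layout, assumed for this sketch).
    """
    num_channels = len(block_mean_squares[0])
    weights = CHANNEL_WEIGHTS[:num_channels]
    powers = [
        sum(g * z for g, z in zip(weights, block))
        for block in block_mean_squares
    ]
    # Absolute gate: drop blocks at or below -70 LKFS (and silent blocks).
    kept = [p for p in powers if p > 0 and block_loudness(p) > ABSOLUTE_THRESHOLD]
    if not kept:
        return -math.inf
    # Relative gate: 10 LU below the loudness of the absolute-gated blocks.
    relative_threshold = block_loudness(sum(kept) / len(kept)) + RELATIVE_OFFSET
    kept = [p for p in kept if block_loudness(p) > relative_threshold]
    return block_loudness(sum(kept) / len(kept))
```

Calling `gated_loudness([[1.0], [1.0], [1e-9]])` on mono input discards the near-silent third block at the absolute gate, so only the two loud blocks contribute to the result.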
      
      I hope this is helpful! Looking forward to hearing from you.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2472
      
      Reviewed By: hwangjeff
      
      Differential Revision: D38389155
      
      Pulled By: carolineechen
      
      fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
  4. 02 Aug, 2022 1 commit
  5. 01 Aug, 2022 3 commits
  6. 30 Jul, 2022 1 commit
  7. 29 Jul, 2022 4 commits
    • Update forced alignment tutorial (#2544) · c26b38b2
      moto authored
      Summary:
      1. Fix initialization.
      Previously, the SOS token score was initialized to 0 across the time axis.
      This biased the alignment toward delaying the start.
      The proper way to delay the SOS is via the blank token.
      The new initialization takes the cumulative sum of the blank scores.
      2. Fill the end of the trellis with Inf.
      Similar to the start, at the end, where the number of remaining time frames is less
      than the number of tokens, it is no longer possible to align the text, so
      we fill with Inf for better visualization.
      3. Clean up asset management code.
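
The first two points can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tutorial's code: it uses nested lists instead of tensors, `build_trellis` is a hypothetical name, and `emission[t][c]` is assumed to hold the log-probability of class `c` at frame `t`.

```python
import math
from itertools import accumulate


def build_trellis(emission, tokens, blank_id=0):
    """Build an alignment trellis of shape (num_frames + 1, num_tokens + 1).

    Column 0 represents the SOS token; row 0 is an extra time step
    before the first frame.
    """
    num_frames = len(emission)
    num_tokens = len(tokens)
    trellis = [[0.0] * (num_tokens + 1) for _ in range(num_frames + 1)]

    # SOS column: cumulative sum of blank scores, so that delaying the
    # start costs exactly the blank scores of the skipped frames.
    blank_cumsum = list(accumulate(row[blank_id] for row in emission))
    for t in range(num_frames):
        trellis[t + 1][0] = blank_cumsum[t]

    # Before the first frame, no token can have been emitted yet.
    for j in range(1, num_tokens + 1):
        trellis[0][j] = -math.inf

    # Near the end, fewer frames remain than tokens, so staying in SOS
    # can no longer lead to a full alignment: fill with Inf.
    for t in range(num_frames + 1 - num_tokens, num_frames + 1):
        trellis[t][0] = math.inf

    for t in range(num_frames):
        for j in range(1, num_tokens + 1):
            stay = trellis[t][j] + emission[t][blank_id]
            change = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, change)
    return trellis
```

The Inf values in the SOS column propagate into the lower-left corner of the trellis, marking the region where alignment is impossible, while the final cell stays finite.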
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2544
      
      Reviewed By: nateanl
      
      Differential Revision: D38276478
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
    • Enable CTC decoder in Windows (#2587) · 67cb420d
      moto authored
      Summary:
      This commit enables CTC decoder on Windows.
      
      The functionality seems to work fine.
      The tests pass and the decoding tutorial runs fine.

      The only difference from the Linux/macOS version is that
      loading a model in XZ compression format is not supported.
      
      ![289961785_399620772041679_7768117002438616376_n](https://user-images.githubusercontent.com/855818/181420923-cfbd8402-20de-4e63-b9e4-e39f9aa9fc50.png)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2587
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D38276490
      
      Pulled By: mthrok
      
      fbshipit-source-id: f2203b2235c5bbb0220fe560aaaf0e1d5530347a
    • Replace 'runtime_error' exception with 'TORCH_CHECK' in TorchAudio sox (#2592) · f234e51f
      Javier Cardenete Morales authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2592
      
      std::runtime_error does not preserve the C++ stack trace, so it is unclear to users what went wrong internally.
      
      PyTorch's TORCH_CHECK macro prints the C++ stack trace when the TORCH_SHOW_CPP_STACKTRACES environment variable is set to 1.
      
      Reviewed By: mthrok
      
      Differential Revision: D38219331
      
      fbshipit-source-id: f51c27111077e927f97127f73f83a31b8e74f61f
    • Improve speech enhancement tutorial (#2527) · d6267031
      Zhaoheng Ni authored
      Summary:
      - The "speech + noise" mixture still has a high SNR, so it does not demonstrate the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of the mixture speech.
      - Show the Si-SNR score of mixture speech when visualizing the mixture spectrogram.
      - Fix the figure in `rtf_power` subsection.
          - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
      - Print PESQ, STOI, and SDR metric scores.
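
The noise amplification in the first point amounts to rescaling the noise so the mixture reaches a lower target SNR. The following is a minimal plain-Python sketch of that arithmetic, not the tutorial's code; the function names and the list-based waveforms are assumptions made for illustration.

```python
import math


def mean_power(signal):
    """Average power of a waveform given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Rescale the noise so that speech vs. scaled noise has the target SNR."""
    target_ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    return [gain * x for x in noise]


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, 0.1]
louder_noise = scale_noise_to_snr(speech, noise, target_snr_db=0.0)
mixture = [s + n for s, n in zip(speech, louder_noise)]
```

Here the original SNR is about 20 dB, and the rescaled noise brings it down to about 0 dB, which makes the enhancement task harder.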
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2527
      
      Reviewed By: mthrok
      
      Differential Revision: D38190218
      
      Pulled By: nateanl
      
      fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de
  8. 28 Jul, 2022 7 commits
  9. 27 Jul, 2022 3 commits
  10. 26 Jul, 2022 5 commits
  11. 25 Jul, 2022 3 commits
  12. 22 Jul, 2022 2 commits
    • Add dimension and shape check (#2563) · b1f510fa
      Sean Kim authored
      Summary:
      Don't allow users to input incorrect dimensions
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2563
      
      Reviewed By: carolineechen
      
      Differential Revision: D38074360
      
      Pulled By: skim0514
      
      fbshipit-source-id: 7bcae515706eb358ca6f68c50c7c0ccace1c3f95
    • Add documents for SourceSeparationBundle (#2559) · 6cee56ab
      Zhaoheng Ni authored
      Summary:
      - Add documentation page for `SourceSeparationBundle` and `CONVTASNET_BASE_LIBRI2MIX`.
      - Add citation of Libri2Mix dataset in the bundle documentation.
      - The URL in the integration test should be built with forward slashes instead of `os.path.join`, which inserts backslashes on Windows and breaks the URL. Change it to an f-string.
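
The Windows failure in the last point can be reproduced on any platform with the standard library's `ntpath` module, which is what `os.path` resolves to on Windows. This is a small sketch; the base URL and file name are hypothetical placeholders, not the values used in the test.

```python
import ntpath
import posixpath

BASE_URL = "https://example.com/models"  # hypothetical placeholder
FILENAME = "model.pt"                    # hypothetical placeholder

# What os.path.join produces on Windows: a backslash separator.
windows_joined = ntpath.join(BASE_URL, FILENAME)

# What os.path.join produces on Linux/macOS: happens to look like a URL.
posix_joined = posixpath.join(BASE_URL, FILENAME)

# Platform-independent: build the URL explicitly with a forward slash.
url = f"{BASE_URL}/{FILENAME}"

print(windows_joined)
print(posix_joined)
print(url)
```

Because the f-string never consults the platform's path separator, the resulting URL is the same on every operating system.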
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2559
      
      Reviewed By: carolineechen
      
      Differential Revision: D38036116
      
      Pulled By: nateanl
      
      fbshipit-source-id: 736732805191113955badfec3955e2e24e8f4836
  13. 21 Jul, 2022 4 commits
  14. 20 Jul, 2022 1 commit