1. 07 Aug, 2023 1 commit
  2. 31 Jul, 2023 1 commit
  3. 28 Jul, 2023 1 commit
  4. 25 Jul, 2023 1 commit
  5. 12 Jul, 2023 1 commit
  6. 11 Jul, 2023 1 commit
  7. 05 Jul, 2023 1 commit
  8. 13 Jun, 2023 1 commit
  9. 07 Jun, 2023 1 commit
  10. 06 Jun, 2023 2 commits
  11. 02 Jun, 2023 1 commit
    • [BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5
      moto authored
      Summary:
      This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.
      
      The Kaldi pitch function was added quickly by integrating the original Kaldi implementation instead of reimplementing it in PyTorch.
      
      The Kaldi integration employed a hack that replaces Kaldi's base vector/matrix implementation with PyTorch Tensor so that there is only one BLAS library within torchaudio.
      
      We have recently been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it together with the Kaldi integration.
      
      See some of the discussion at https://github.com/pytorch/audio/issues/1269
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3368
      
      Differential Revision: D46406176
      
      Pulled By: mthrok
      
      fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
  12. 01 Jun, 2023 2 commits
  13. 24 May, 2023 1 commit
  14. 22 May, 2023 1 commit
  15. 20 May, 2023 1 commit
  16. 04 May, 2023 1 commit
    • Extend mask_along_axis{,_iid} (#3289) · 74bd971a
      Xiaohui Zhang authored
      Summary:
      (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)
      
      The previous way of doing SpecAugment via Frequency/TimeMasking transforms has the following problems:
      - Only zero masking can be done; masking by mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors as a result of the above hard-coding.
      - For 2D spectrogram tensors without a batch or channel dimension, Time/Frequency masking can't be applied at all, since mask_along_axis only supports 3D tensors as a result of the above hard-coding.
      - It's not straightforward to apply multiple time/frequency masks with the current design.
      
      To solve these issues, here we
      - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
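The idea of masking one of the last two dimensions of a 2D spectrogram, with an arbitrary fill value such as the mean, can be sketched in plain Python. This is only an illustration under stated assumptions: `mask_along_axis_2d` and its `rng` parameter are hypothetical names, the input is a list of lists rather than a tensor, and the sampling of the band width and start is a simplification, not torchaudio's implementation.

```python
import random


def mask_along_axis_2d(spec, mask_param, mask_value, axis, rng=None):
    """Hypothetical sketch: mask one contiguous band of a 2D (freq, time) grid.

    axis=0 masks rows (frequency), axis=1 masks columns (time).
    Returns a new grid; the input is left unchanged.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (freq) or 1 (time) for a 2D input")
    size = len(spec) if axis == 0 else len(spec[0])
    if not 0 < mask_param <= size:
        raise ValueError("mask_param must be in (0, size of the masked axis]")
    rng = rng or random.Random()
    width = int(rng.random() * mask_param)      # band width, below mask_param
    start = int(rng.random() * (size - width))  # start chosen so the band fits
    end = start + width
    out = [row[:] for row in spec]
    for i, row in enumerate(out):
        for j in range(len(row)):
            idx = i if axis == 0 else j
            if start <= idx < end:
                row[j] = mask_value
    return out


# Usage: mask along time (axis=1) and fill with the mean instead of zero.
spec = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
mean = sum(map(sum, spec)) / 12
masked = mask_along_axis_2d(spec, mask_param=3, mask_value=mean, axis=1,
                            rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sketch reproducible; the real functions draw their random values through PyTorch.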
      
      The introduction of SpecAugment transform will be done in another PR.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3289
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45460357
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
  17. 08 Mar, 2023 1 commit
  18. 17 Feb, 2023 1 commit
  19. 15 Feb, 2023 1 commit
  20. 24 Jan, 2023 1 commit
  21. 12 Jan, 2023 1 commit
    • Refactor extension modules initialization (#2968) · 5dfe0b22
      mthrok authored
      Summary:
      * Refactor _extension module so that
        * the implementation of initialization logic and its execution are separated.
          * logic goes to `_extension.utils`
          * the execution is at `_extension.__init__`
          * global variables are defined and modified in `__init__`.
      * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`
      * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`
      * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
      * Merge the sox-related initialization logic in `_extension.utils` module.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2968
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42387251
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
  22. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      resolves https://github.com/pytorch/audio/issues/2891
      
      Rename `resampling_method` options to describe more accurately what is happening. Previously the methods were named `sinc_interpolation` and `kaiser_window`, which can be confusing because both options use sinc interpolation and differ only in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will emit a warning, and the old options will be deprecated in 2 releases. The numerical behavior is unchanged.
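A minimal sketch of how such a rename with a warning can be handled. The option names come from the commit message above; the helper `normalize_resampling_method` and the warning text are hypothetical and are not taken from torchaudio.

```python
import warnings

# Old option name -> new option name, as listed in the commit message.
_RENAMED = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name):
    """Hypothetical helper: map a deprecated option to its new name and warn."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            stacklevel=2,
        )
        return new_name
    if name not in _RENAMED.values():
        raise ValueError(f"Unknown resampling_method: {name!r}")
    return name
```

Because the old name is only translated to the new one, the computation behind each option stays the same, which is consistent with the unchanged numerical behavior stated above.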
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  23. 14 Nov, 2022 1 commit
  24. 10 Nov, 2022 1 commit
  25. 08 Nov, 2022 1 commit
    • Enable log probs input for rnnt loss (#2798) · ca478823
      Caroline Chen authored
      Summary:
      Add a `fused_log_softmax` argument (default `True`, which matches the current behavior) to rnnt loss.

      When setting it to `False`, call `log_softmax` on the logits before passing them to the rnnt loss function.
      
      The following should produce the same output:
      ```
      rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
      ```
      
      ```
      log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
      rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
      ```
      
      Testing: unit tests pass, and the conformer rnnt recipe produces the same results.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2798
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D41083523
      
      Pulled By: carolineechen
      
      fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61
  26. 15 Sep, 2022 1 commit
  27. 16 Aug, 2022 1 commit
  28. 03 Aug, 2022 1 commit
    • An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a
      bshall authored
      Summary:
      I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
      - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
      - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
      - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
      - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
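For context, the per-block loudness in BS.1770 is a weighted sum of the per-channel mean-square energies of the K-weighted signal, mapped to a log scale. The sketch below shows that one step for up to 5 channels. The names `CHANNEL_WEIGHTS` and `block_loudness` are hypothetical, the channel order (L, R, C, Ls, Rs) is an assumption about the input layout, and K-weighting and gating are left out entirely.

```python
import math

# Channel weights G_i for (L, R, C, Ls, Rs); the order is an assumption here.
CHANNEL_WEIGHTS = (1.0, 1.0, 1.0, 1.41, 1.41)


def block_loudness(mean_squares):
    """Loudness of one block from per-channel mean-square energies.

    Computes -0.691 + 10 * log10(sum_i G_i * z_i), where z_i is the
    mean-square energy of the K-weighted signal in channel i.
    """
    if not 1 <= len(mean_squares) <= len(CHANNEL_WEIGHTS):
        raise ValueError("expected between 1 and 5 channels")
    weighted = sum(g * z for g, z in zip(CHANNEL_WEIGHTS, mean_squares))
    if weighted <= 0.0:
        return float("-inf")  # a silent block has no finite loudness
    return -0.691 + 10.0 * math.log10(weighted)
```

In the full measurement, blocks quieter than the absolute threshold would then be dropped before the remaining blocks are averaged.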
      
      I hope this is helpful! Looking forward to hearing from you.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2472
      
      Reviewed By: hwangjeff
      
      Differential Revision: D38389155
      
      Pulled By: carolineechen
      
      fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
  29. 28 Jul, 2022 1 commit
  30. 27 Jul, 2022 1 commit
    • Replace assert with raise (#2579) · 0f4e1e8c
      Piyush Soni authored
      Summary:
      `assert` is not executed when running in optimized mode.
      
      This commit replaces all instances of "assert" in /fbcode/pytorch/audio/torchaudio/functional/functional.py
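The difference can be sketched as follows. Both functions and the `n_fft` check are hypothetical illustrations, not code from `functional.py`.

```python
def check_with_assert(n_fft):
    # Skipped entirely when Python runs with -O, so bad input passes through.
    assert n_fft > 0, "n_fft must be positive"
    return n_fft


def check_with_raise(n_fft):
    # Always executed, regardless of the optimization flag.
    if n_fft <= 0:
        raise ValueError(f"n_fft must be positive. Found: {n_fft}")
    return n_fft
```

Raising a specific exception type such as `ValueError` also lets callers catch invalid-argument errors without catching `AssertionError`.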
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2579
      
      Reviewed By: mthrok
      
      Differential Revision: D38158280
      
      fbshipit-source-id: f8d7fca1c8f9b3955c6ca312b16947eb12894d81
  31. 25 Jul, 2022 1 commit
  32. 21 Jul, 2022 1 commit
  33. 20 Jul, 2022 1 commit
  34. 12 Jul, 2022 1 commit
    • Fix docstring (#2540) · 05d2580a
      Zhaoheng Ni authored
      Summary:
      The docstring of `apply_beamforming` triggers a warning when building the documentation page. This PR fixes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2540
      
      Reviewed By: mthrok
      
      Differential Revision: D37763745
      
      Pulled By: nateanl
      
      fbshipit-source-id: 0e9f1e098865af032b00ac56d918cb9d2ffc5024
  35. 13 Jun, 2022 1 commit
  36. 10 Jun, 2022 1 commit
  37. 02 Jun, 2022 1 commit
    • Remove mad (#2428) · d2ecba98
      moto authored
      Summary:
      Remove the code related to libmad, which had been disabled in https://github.com/pytorch/audio/issues/2354
      
      In https://github.com/pytorch/audio/issues/2419, we moved MP3 decoding to ffmpeg, but CI tests were still using libmad.
      This commit completely removes libmad from torchaudio.

      This is a BC-breaking change, as the `apply_sox_effects_file` function can no longer handle MP3 and cannot fall back to ffmpeg.
      The workaround is to use `torchaudio.load` followed by `apply_sox_effects_tensor`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2428
      
      Reviewed By: carolineechen
      
      Differential Revision: D36851805
      
      Pulled By: mthrok
      
      fbshipit-source-id: f98795c59a1ac61cef511f2bbeac37f7c3c69d55
  38. 23 May, 2022 1 commit
    • Add assertion checks to multi-channel functions (#2401) · 38e530d7
      Zhaoheng Ni authored
      Summary:
      - The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
      - The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
      - The shapes of the input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
      - The autograd unittest of `apply_beamforming` has wrong dimensions for beamform_weights, which the assertion check detected. Fix it in this PR.
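Shape checks of this kind can be sketched on plain shape tuples. The function names below are hypothetical, and the assumed spectrogram layout `(..., channel, freq, time)` is not stated in the commit message.

```python
def check_psd_shape(psd_shape):
    """Hypothetical check: a PSD matrix must be (..., freq, channel, channel)."""
    if len(psd_shape) < 3:
        raise ValueError(
            f"Expected at least 3 dimensions (..., freq, channel, channel). "
            f"Found {len(psd_shape)}."
        )
    if psd_shape[-1] != psd_shape[-2]:
        raise ValueError(
            f"The last two dimensions must both be the channel count. "
            f"Found {tuple(psd_shape)}."
        )


def check_mask_shape(mask_shape, specgram_shape):
    """Hypothetical check: a (..., freq, time) mask must match the
    (freq, time) dimensions of a (..., channel, freq, time) spectrogram."""
    if len(mask_shape) < 2 or len(specgram_shape) < 3:
        raise ValueError("mask needs >= 2 dims and specgram needs >= 3 dims")
    if tuple(mask_shape[-2:]) != tuple(specgram_shape[-2:]):
        raise ValueError(
            f"mask (freq, time) {tuple(mask_shape[-2:])} does not match "
            f"specgram (freq, time) {tuple(specgram_shape[-2:])}."
        )
```

With a tensor library, the same functions could be called with `tensor.shape`.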
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2401
      
      Reviewed By: carolineechen
      
      Differential Revision: D36597689
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6