1. 07 Aug, 2023 1 commit
  2. 31 Jul, 2023 1 commit
  3. 28 Jul, 2023 1 commit
  4. 25 Jul, 2023 1 commit
  5. 12 Jul, 2023 1 commit
  6. 11 Jul, 2023 1 commit
  7. 05 Jul, 2023 1 commit
  8. 13 Jun, 2023 1 commit
  9. 08 Jun, 2023 1 commit
      Optimize Torchaudio Vad (#3382) · 1e117f57
      Kuba Rad authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3382
      
      The voice activity detector function was unoptimized, confusingly written, and buggy.
      
      The optimizations in this change make the function run roughly 17x faster.
      The main optimization is to loop over windows of audio rather than over individual audio samples. Reducing the number of copies also helped.
      
      There was also an off-by-one error: the array slice referenced was [1:16001] (for the default settings) instead of [0:16000].
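
      A minimal sketch of the two ideas above, not the torchaudio code: it contrasts the intended slice [0:16000] with the shifted slice [1:16001], and it steps over a signal one window at a time instead of one sample at a time. The helper name `iter_windows` and the window and hop sizes are assumptions chosen for illustration.

```python
def iter_windows(samples, window_size, hop):
    """Yield (start, window) pairs, one per hop, instead of one per sample."""
    for start in range(0, len(samples) - window_size + 1, hop):
        yield start, samples[start:start + window_size]


# A toy "signal" whose values equal their own indices.
signal = list(range(16001))

# Python slices are half-open: [0:16000] covers samples 0..15999.
correct = signal[0:16000]
# The shifted slice has the same length but drops sample 0 and
# picks up sample 16000 instead.
shifted = signal[1:16001]

# Window-level iteration (hypothetical sizes: 400-sample window, 160-sample hop).
windows = list(iter_windows(correct, window_size=400, hop=160))
```

      Both slices have 16000 elements, which is why this kind of bug is easy to miss: only the content is shifted by one sample.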
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44749359
      
      fbshipit-source-id: c76c9412e70cdc6fcd527d113603c88f78480558
  10. 07 Jun, 2023 1 commit
  11. 06 Jun, 2023 2 commits
  12. 02 Jun, 2023 1 commit
      [BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5
      moto authored
      Summary:
      This commit removes the compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.
      
      The Kaldi pitch function was added quickly by integrating the original Kaldi implementation instead of reimplementing it in PyTorch.
      
      The Kaldi integration employed a hack that replaces Kaldi's base vector/matrix implementation with PyTorch Tensor, so that there is only one BLAS library within torchaudio.
      
      We are making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.
      
      See some of the discussion at https://github.com/pytorch/audio/issues/1269
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3368
      
      Differential Revision: D46406176
      
      Pulled By: mthrok
      
      fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
  13. 01 Jun, 2023 2 commits
  14. 24 May, 2023 1 commit
  15. 22 May, 2023 1 commit
  16. 20 May, 2023 1 commit
  17. 04 May, 2023 1 commit
      Extend mask_along_axis{,_iid} (#3289) · 74bd971a
      Xiaohui Zhang authored
      Summary:
      (1/2 of the previous [PR](https://github.com/pytorch/audio/pull/2360) which I accidentally closed)
      
      The previous way of doing SpecAugment via the Frequency/TimeMasking transforms has the following problems:
      - Only zero masking can be done; masking by mean value is not supported.
      - mask_along_axis is hard-coded to mask the 1st dimension, and mask_along_axis_iid is hard-coded to mask the 2nd or 3rd dimension of the input tensor.
      - For 3D spectrogram tensors where the first dimension is batch or channel, features from the same batch or from different channels have to use the same mask, because mask_along_axis_iid only supports 4D tensors due to the hard-coding above.
      - For 2D spectrogram tensors without a batch or channel dimension, time/frequency masking can't be applied at all, because mask_along_axis only supports 3D tensors due to the hard-coding above.
      - Applying multiple time/frequency masks is not straightforward with the current design.
      
      To solve these issues, here we
      - Extend mask_along_axis_iid to support 3D tensors and mask_along_axis to support 2D tensors. Now both of them are able to mask one of the last two dimensions (where the time or frequency dimension lives) of the input tensor.
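
      To illustrate what masking one of the last two dimensions of a 2D spectrogram means, here is a small pure-Python sketch. It is not torchaudio's `mask_along_axis`: the function name `mask_2d` is hypothetical, the mask position and width are passed explicitly rather than sampled, and nested lists stand in for tensors.

```python
def mask_2d(spec, axis, start, width, mask_value=0.0):
    """Return a copy of a [freq][time] nested list with one band masked.

    axis=0 masks a band of rows (frequency bins);
    axis=1 masks a band of columns (time frames).
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (frequency) or 1 (time)")
    masked = [row[:] for row in spec]  # copy so the input is left untouched
    for f, row in enumerate(masked):
        for t in range(len(row)):
            index = f if axis == 0 else t
            if start <= index < start + width:
                row[t] = mask_value
    return masked


spec = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# Mask two time frames with the mean value instead of zero.
mean = sum(sum(row) for row in spec) / 12
time_masked = mask_2d(spec, axis=1, start=1, width=2, mask_value=mean)

# Mask the first frequency bin with zeros.
freq_masked = mask_2d(spec, axis=0, start=0, width=1)
```

      Calling the function repeatedly on its own output applies several masks, which is one simple way to address the "multiple masks" problem listed above.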
      
      The SpecAugment transform itself will be introduced in another PR.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3289
      
      Reviewed By: hwangjeff
      
      Differential Revision: D45460357
      
      Pulled By: xiaohui-zhang
      
      fbshipit-source-id: 91bf448294799f13789d96a13d4bae2451461ef3
  18. 08 Mar, 2023 1 commit
  19. 17 Feb, 2023 1 commit
  20. 15 Feb, 2023 1 commit
  21. 24 Jan, 2023 1 commit
  22. 12 Jan, 2023 1 commit
      Refactor extension modules initialization (#2968) · 5dfe0b22
      mthrok authored
      Summary:
      * Refactor _extension module so that
        * the implementation of initialization logic and its execution are separated.
          * logic goes to `_extension.utils`
          * the execution is at `_extension.__init__`
          * global variables are defined and modified in `__init__`.
      * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`
      * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`
      * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
      * Merge the sox-related initialization logic in `_extension.utils` module.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2968
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42387251
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
  23. 05 Jan, 2023 1 commit
      Fix filtering function fallback mechanism (#2953) · 5428e283
      moto authored
      Summary:
      lfilter and overdrive have faster implementations written in C++. If these are not available, torchaudio is supposed to fall back to the Python-based implementations.
      
      The original fallback mechanism relied on the error types and messages raised by PyTorch core, which have since changed.
      
      This commit replaces it with a more reliable fallback mechanism.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2953
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42344893
      
      Pulled By: mthrok
      
      fbshipit-source-id: 18ce5c1aa1c69d0d2ab469b0b0c36c0221f5ccfd
  24. 16 Dec, 2022 1 commit
      Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      Resolves https://github.com/pytorch/audio/issues/2891
      
      Rename the `resampling_method` options to describe more accurately what is happening. Previously the methods were named `sinc_interpolation` and `kaiser_window`, which can be confusing because both options use sinc interpolation and differ only in the window function used. As a result, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will throw a warning, and those options will be deprecated in 2 releases. The numerical behavior is unchanged.
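
      A rename like this is commonly handled by mapping old names to new ones and warning on the old spelling. The following is a sketch of that pattern under stated assumptions, not torchaudio's code: the helper name `normalize_resampling_method`, the warning text, and the default warning category are all illustrative.

```python
import warnings

# Old option name -> new option name, as described in the commit message.
_RENAMED = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}
_VALID = set(_RENAMED.values())


def normalize_resampling_method(name):
    """Map an old option name to its new name, warning on old names."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f'"{name}" has been renamed to "{new_name}"; '
            "the old name is scheduled for deprecation.",
            stacklevel=2,
        )
        return new_name
    if name not in _VALID:
        raise ValueError(f"Unknown resampling method: {name!r}")
    return name


# Usage: capture the warning so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = normalize_resampling_method("sinc_interpolation")
```

      Since only the name changes, the old and new spellings resolve to the same code path, which matches the statement that the numerical behavior is unchanged.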
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  25. 14 Nov, 2022 1 commit
  26. 10 Nov, 2022 1 commit
  27. 08 Nov, 2022 1 commit
      Enable log probs input for rnnt loss (#2798) · ca478823
      Caroline Chen authored
      Summary:
      Add `fused_log_softmax` argument (default/current behavior = True) to rnnt loss.
      
      When setting it to `False`, call `log_softmax` on the logits before passing them to the rnnt loss function.
      
      The following should produce the same output:
      ```
      rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
      ```
      
      ```
      log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
      rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
      ```
      
      Testing: unit tests, plus the same results on the conformer rnnt recipe.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2798
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D41083523
      
      Pulled By: carolineechen
      
      fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61
  28. 15 Sep, 2022 1 commit
  29. 16 Aug, 2022 1 commit
  30. 10 Aug, 2022 1 commit
  31. 03 Aug, 2022 1 commit
      An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a
      bshall authored
      Summary:
      I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
      - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
      - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
      - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
      - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
      
      I hope this is helpful! Looking forward to hearing from you.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2472
      
      Reviewed By: hwangjeff
      
      Differential Revision: D38389155
      
      Pulled By: carolineechen
      
      fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
  32. 28 Jul, 2022 1 commit
  33. 27 Jul, 2022 1 commit
      Replace assert with raise (#2579) · 0f4e1e8c
      Piyush Soni authored
      Summary:
      `assert` is not executed when running in optimized mode.
      
      This commit replaces all instances of "assert" in /fbcode/pytorch/audio/torchaudio/functional/functional.py
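
      The difference can be shown with the built-in `compile`, whose `optimize` argument mirrors the interpreter's `-O` flag. This is a small illustration, not code from the commit; the parameter name `n_fft` and the error message are only examples.

```python
def check_with_raise(n_fft):
    """Validation that still runs in optimized mode."""
    if n_fft <= 0:
        raise ValueError(f"n_fft must be positive, got {n_fft}")
    return n_fft


source = "assert n_fft > 0, 'n_fft must be positive'"

# optimize=0 keeps assert statements, so the invalid value is caught.
try:
    exec(compile(source, "<demo>", "exec", optimize=0), {"n_fft": 0})
    assert_fired = False
except AssertionError:
    assert_fired = True

# optimize=1 strips assert statements (like `python -O`), so the same
# invalid value passes through without any error.
exec(compile(source, "<demo>", "exec", optimize=1), {"n_fft": 0})
assert_skipped = True

# The explicit raise behaves the same in both modes.
try:
    check_with_raise(0)
    raise_fired = False
except ValueError:
    raise_fired = True
```

      An explicit `raise` also lets the function choose a specific exception type such as `ValueError`, which callers can catch more precisely than `AssertionError`.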
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2579
      
      Reviewed By: mthrok
      
      Differential Revision: D38158280
      
      fbshipit-source-id: f8d7fca1c8f9b3955c6ca312b16947eb12894d81
  34. 25 Jul, 2022 1 commit
  35. 21 Jul, 2022 1 commit
  36. 20 Jul, 2022 1 commit
  37. 12 Jul, 2022 1 commit
      Fix docstring (#2540) · 05d2580a
      Zhaoheng Ni authored
      Summary:
      The docstring of `apply_beamforming` triggers a warning when building the documentation page. This PR fixes it.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2540
      
      Reviewed By: mthrok
      
      Differential Revision: D37763745
      
      Pulled By: nateanl
      
      fbshipit-source-id: 0e9f1e098865af032b00ac56d918cb9d2ffc5024
  38. 13 Jun, 2022 1 commit