1. 15 Jan, 2023 1 commit
    • Add pre-trained pipelines for XLS-R models (#2978) · 9b7b64e4
      Zhaoheng Ni authored
      Summary:
      The PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models:
      - WAV2VEC2_XLSR_300M
      - WAV2VEC2_XLSR_1B
      - WAV2VEC2_XLSR_2B
      
      All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`.
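
      As a usage sketch (the dummy waveform, its shape, and the use of `extract_features` are illustrative assumptions; any of the three bundles can be substituted):

      ```python
      import torch
      import torchaudio

      # Pick one of the new XLS-R bundles and instantiate the model.
      bundle = torchaudio.pipelines.WAV2VEC2_XLSR_300M
      model = bundle.get_model()

      # One second of dummy mono audio at the bundle's expected sample rate.
      waveform = torch.randn(1, int(bundle.sample_rate))

      with torch.inference_mode():
          # Returns the intermediate feature representations from the transformer layers.
          features, _ = model.extract_features(waveform)
      ```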
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2978
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42501491
      
      Pulled By: nateanl
      
      fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3
  2. 14 Jan, 2023 1 commit
  3. 13 Jan, 2023 1 commit
  4. 12 Jan, 2023 2 commits
    • Refactor extension modules initialization (#2968) · 5dfe0b22
      mthrok authored
      Summary:
      * Refactor the `_extension` module so that the implementation of the initialization logic and its execution are separated (a rough sketch follows below):
        * the logic goes to `_extension.utils`,
        * the execution happens in `_extension.__init__`,
        * global variables are defined and modified in `__init__`.
      * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`.
      * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`.
      * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
      * Merge the sox-related initialization logic into the `_extension.utils` module.
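
      A rough, hypothetical sketch of the logic/execution split described above (module names follow the bullet points; the bodies are placeholders, not the actual torchaudio code):

      ```python
      # --- _extension/utils.py: initialization logic only, no side effects at import time ---
      def _init_sox() -> None:
          """Load the sox extension and register its ops; raise if it is unavailable."""
          ...

      # --- _extension/__init__.py: runs the logic and owns the global flags ---
      _SOX_INITIALIZED = False

      try:
          _init_sox()
          _SOX_INITIALIZED = True
      except Exception:
          pass
      ```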
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2968
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42387251
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
    • Add `buffer_chunk_size=-1` option (#2969) · 22788a8f
      moto authored
      Summary:
      This commit adds a `buffer_chunk_size=-1` option, with which buffered frames are never dropped.
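
      A minimal sketch of the new option (the input path and chunk size are placeholders):

      ```python
      from torchaudio.io import StreamReader

      reader = StreamReader("example.wav")
      reader.add_basic_audio_stream(
          frames_per_chunk=4096,
          buffer_chunk_size=-1,  # keep every buffered chunk instead of dropping old ones
      )
      reader.process_all_packets()
      (chunk,) = reader.pop_chunks()
      ```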
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2969
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42403467
      
      Pulled By: mthrok
      
      fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992
  5. 10 Jan, 2023 1 commit
    • Update the handling of videos without PTS values (#2970) · 1717edaa
      moto authored
      Summary:
      The filter graph does not fall back to `best_effort_timestamp`, so applying filters (such as changing the frame rate) to videos without PTS values failed.

      This commit changes the behavior by overwriting the PTS values with `best_effort_timestamp`.
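
      A sketch of the affected use case (the file name and filter are placeholders): applying a frame-rate filter to a video stream, which previously failed for files whose frames carry no PTS values.

      ```python
      from torchaudio.io import StreamReader

      reader = StreamReader("input.mp4")
      # The fps filter needs valid timestamps; missing PTS is now filled in from best_effort_timestamp.
      reader.add_video_stream(frames_per_chunk=16, filter_desc="fps=10")
      reader.process_all_packets()
      (chunks,) = reader.pop_chunks()
      ```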
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2970
      
      Reviewed By: YosuaMichael
      
      Differential Revision: D42425771
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7b7a033ea2ad89bb49d6e1663d35d377dab2aae9
  6. 06 Jan, 2023 2 commits
  7. 05 Jan, 2023 2 commits
  8. 04 Jan, 2023 1 commit
  9. 30 Dec, 2022 1 commit
    • Refactor and optimize yuv420p and nv12 processing (#2945) · cc0d1e0b
      moto authored
      Summary:
      This commit refactors and optimizes the functions that convert AVFrames of `yuv420p` and `nv12` into PyTorch Tensors.
      Performance improves by about 30%.

      1. Reduce the number of intermediate Tensors allocated.
      2. Replace two calls to `repeat_interleave` with `F::interpolate`.

         (`F::interpolate` is about 5x faster than `repeat_interleave`.)
          <details><summary>code</summary>
      
          ```bash
          #!/usr/bin/env bash
      
          set -e
      
          python -c """
          import torch
          import torch.nn.functional as F
      
          a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
          val1 = a.repeat_interleave(2, -1).repeat_interleave(2, -2)
          val2 = F.interpolate(a.view((1, 1, 7, 7, 1)), size=[14, 14, 1], mode=\"nearest\")
          print(torch.sum(torch.abs(val1 - val2[0, 0, :, :, 0])))
          """
      
          python3 -m timeit \
                  --setup """
          import torch
      
          a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
          """ \
                  """
          a.repeat_interleave(2, -1).repeat_interleave(2, -2)
          """
      
          python3 -m timeit \
                  --setup """
          import torch
          import torch.nn.functional as F
      
          a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
          """ \
                  """
          F.interpolate(a.view((1, 1, 7, 7, 1)), size=[14, 14, 1], mode=\"nearest\")
          """
          ```
      
          </details>
      
          ```
          tensor(0)
          10000 loops, best of 5: 38.3 usec per loop
          50000 loops, best of 5: 7.1 usec per loop
          ```
      
      ## Benchmark Result
      
      <details><summary>code</summary>
      
      ```bash
      #!/usr/bin/env bash
      
      set -e
      
      mkdir -p tmp
      
      for ext in avi mp4; do
          for duration in 1 5 10 30 60; do
              printf "Testing ${ext} ${duration} [sec]\n"
      
              test_data="tmp/test_${duration}.${ext}"
              if [ ! -f "${test_data}" ]; then
                  printf "Generating test data\n"
                  ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${test_data}" > /dev/null 2>&1
              fi
      
              python -m timeit \
                     --setup="from torchaudio.io import StreamReader" \
                     """
      r = StreamReader(\"${test_data}\")
      r.add_basic_video_stream(frames_per_chunk=-1, format=\"yuv420p\")
      r.process_all_packets()
      r.pop_chunks()
      """
          done
      done
      ```
      
      </details>
      
      ![Time to decode AVI file](https://user-images.githubusercontent.com/855818/210008881-8cc83f18-0e51-46e3-afe9-a5ff5dff041e.png)
      
      <details><summary>raw data</summary>
      
      Video Type - AVI
      Duration (sec) | Before | After
      -- | -- | --
      1 | 10.3 | 6.29
      5 | 44.3 | 28.3
      10 | 89.3 | 56.9
      30 | 265 | 185
      60 | 555 | 353
      </details>
      
      ![Time to decode MP4 file](https://user-images.githubusercontent.com/855818/210008891-c4546c52-43d7-49d0-8eff-d866ad627129.png)
      
      <details><summary>raw data</summary>
      
      Video Type - MP4
      Duration (sec) | Before | After
      -- | -- | --
      1 | 15.3 | 10.5
      5 | 62.1 | 43.2
      10 | 124 | 83.8
      30 | 380 | 252
      60 | 721 | 511
      </details>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2945
      
      Reviewed By: carolineechen
      
      Differential Revision: D42283269
      
      Pulled By: mthrok
      
      fbshipit-source-id: 59840f943ff516b69ab8ad35fed7104c48a0bf0c
  10. 22 Dec, 2022 1 commit
  11. 21 Dec, 2022 1 commit
    • Extract libsox integration from libtorchaudio (#2929) · 1706a72f
      moto authored
      Summary:
      This commit makes the following changes to the C++ library organization
      - Move sox-related feature implementations from `libtorchaudio` to `libtorchaudio_sox`.
      - Remove the C++ implementations of `is_sox_available` and `is_ffmpeg_available`, since checking for the existence of `libtorchaudio_sox` and `libtorchaudio_ffmpeg` is now sufficient to determine availability (illustrated in the sketch after this list). This also makes `libtorchaudio_sox` and `libtorchaudio_ffmpeg` independent of `libtorchaudio`.
      - Move PyBind11-based bindings (`_torchaudio_sox`, `_torchaudio_ffmpeg`) into `torchaudio.lib` so that the built library structure is less cluttered.
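
      As a hypothetical illustration of the availability check (not the actual torchaudio code), with the bindings living under `torchaudio.lib`, "is sox available" reduces to whether the binding module can be found:

      ```python
      import importlib.util

      def _is_sox_available() -> bool:
          # Availability is inferred from the presence of the libtorchaudio_sox binding module.
          return importlib.util.find_spec("torchaudio.lib._torchaudio_sox") is not None
      ```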
      
      Background:
      Originally, `libsox` was the only C++ extension and `libtorchaudio` was supposed to contain all the C++ code.
      Things are different now: there are several C++ extensions, and the code/build structure needs to be more modular.
      
      The new `libtorchaudio_sox` contains the implementations and `_torchaudio_sox` contains the PyBind11-based bindings.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2929
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42159594
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1a0fbca9e4143137f6363fc001b2378ce6029aa7
  12. 20 Dec, 2022 1 commit
  13. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      resolves https://github.com/pytorch/audio/issues/2891
      
      Rename the `resampling_method` options to more accurately describe what is happening. Previously the options were `sinc_interpolation` and `kaiser_window`, which can be confusing since both actually use sinc interpolation and differ only in the window function used. Accordingly, rename `sinc_interpolation` to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will emit a warning, and the old options will be deprecated in two releases. The numerical behavior is unchanged.
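
      A short usage sketch of the renamed options (the sample rates and dummy signal are assumptions):

      ```python
      import torch
      import torchaudio.functional as F
      import torchaudio.transforms as T

      waveform = torch.randn(1, 16_000)  # dummy 1-second signal at 16 kHz

      # New option names.
      resampled = F.resample(waveform, orig_freq=16_000, new_freq=8_000, resampling_method="sinc_interp_kaiser")
      resampled2 = T.Resample(orig_freq=16_000, new_freq=8_000, resampling_method="sinc_interp_hann")(waveform)

      # Old names still work for now, but emit a deprecation warning.
      legacy = F.resample(waveform, 16_000, 8_000, resampling_method="kaiser_window")
      ```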
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  14. 09 Dec, 2022 2 commits
  15. 08 Dec, 2022 1 commit
  16. 07 Dec, 2022 2 commits
  17. 06 Dec, 2022 1 commit
  18. 04 Dec, 2022 1 commit
  19. 02 Dec, 2022 1 commit
  20. 30 Nov, 2022 1 commit
  21. 29 Nov, 2022 3 commits
  22. 28 Nov, 2022 2 commits
  23. 19 Nov, 2022 1 commit
  24. 18 Nov, 2022 1 commit
  25. 17 Nov, 2022 2 commits
  26. 15 Nov, 2022 1 commit
  27. 14 Nov, 2022 1 commit
  28. 10 Nov, 2022 2 commits
  29. 09 Nov, 2022 1 commit
  30. 08 Nov, 2022 1 commit
    • Enable log probs input for rnnt loss (#2798) · ca478823
      Caroline Chen authored
      Summary:
      Add a `fused_log_softmax` argument (default `True`, matching the current behavior) to the RNN-T loss.

      If it is set to `False`, call `log_softmax` on the logits before passing them to the RNN-T loss function.
      
      The following should produce the same output:
      ```
      rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)
      ```
      
      ```
      log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
      rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)
      ```
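
      A self-contained sketch of the equivalence (shapes and values are arbitrary assumptions):

      ```python
      import torch
      from torchaudio.functional import rnnt_loss

      B, T, U, D = 2, 8, 5, 10  # batch, time steps, target length + 1, vocab size (incl. blank)
      logits = torch.randn(B, T, U, D)
      targets = torch.randint(0, D - 1, (B, U - 1), dtype=torch.int32)
      logit_lengths = torch.full((B,), T, dtype=torch.int32)
      target_lengths = torch.full((B,), U - 1, dtype=torch.int32)

      fused = rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True)

      log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
      unfused = rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False)

      print(torch.allclose(fused, unfused))  # expected: True
      ```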
      
      Testing: unit tests, plus matching results on the Conformer RNN-T recipe.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2798
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D41083523
      
      Pulled By: carolineechen
      
      fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61