  1. 15 Feb, 2023 1 commit
  2. 14 Feb, 2023 1 commit
  3. 07 Feb, 2023 1 commit
    • Add playback function (#3026) · 2ead941e
      juan.azcarreta.ortiz authored
      Summary:
      Allows the user to play audio through the device speaker.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3026
      
      Test Plan:
      Created a new test that mocks the call to the `write_audio_chunk` method of `StreamWriter`. To run the test:
      
      `pytest test/torchaudio_unittest/io/_playback_test.py`
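
The mocking approach described in the test plan can be sketched with the standard library alone. This is a minimal illustration, not the test added by the PR: the `Writer` class, the `play` helper, and `run_mocked_playback` are hypothetical stand-ins, and only the method name `write_audio_chunk` is taken from the commit message.

```python
from unittest import mock


class Writer:
    """Hypothetical stand-in for a stream writer; not torchaudio's StreamWriter."""

    def write_audio_chunk(self, index, chunk):
        raise RuntimeError("would touch real audio hardware")


def play(writer, samples):
    """Hypothetical playback helper that forwards samples to stream 0."""
    writer.write_audio_chunk(0, samples)


def run_mocked_playback(samples):
    # Replace the hardware-facing method so the test never opens a device.
    with mock.patch.object(Writer, "write_audio_chunk") as fake_write:
        play(Writer(), samples)
    # The mock records how it was called, which is what the test asserts on.
    return fake_write.call_args
```

Patching the method on the class keeps the code under test unchanged, while the recorded call shows that the samples reached the writer.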
      
      Reviewed By: mthrok
      
      Differential Revision: D43082062
      
      Pulled By: jazcarretao
      
      fbshipit-source-id: 01a85b32ce925687a633d1208d15d54556e89dd8
  4. 04 Feb, 2023 1 commit
  5. 03 Feb, 2023 1 commit
    • Add Linux GPU unit tests on GHA (#3029) · 6bdd3830
      moto authored
      Summary:
      Add GitHub Action-based GPU test jobs.
      - There appears to be a 2-hour upper limit, so only the CUDA/GPU tests are run.
      - Since Kaldi-related features are not available, they are disabled.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3029
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42983800
      
      Pulled By: mthrok
      
      fbshipit-source-id: 47fefe39c635d1c73ad6799ddacefd2666fe5403
  6. 01 Feb, 2023 2 commits
  7. 27 Jan, 2023 1 commit
  8. 26 Jan, 2023 1 commit
  9. 24 Jan, 2023 1 commit
  10. 22 Jan, 2023 1 commit
    • Make StreamReader return PTS (#2975) · 0dd59e0d
      moto authored
      Summary:
      This commit makes `StreamReader` also report the PTS (presentation timestamp) of each returned chunk.
      
      Example
      
      ```python
      from torchaudio.io import StreamReader
      
      s = StreamReader(...)
      s.add_video_stream(...)
      for (video_chunk, ) in s.stream():
          # video_chunk is a Tensor with an extra `pts` attribute
          print(video_chunk.pts)  # PTS of the first frame in the video chunk
      ```
      
      For backward compatibility, we introduce `_ChunkTensor`, a composition
      of a Tensor and metadata that works like a normal tensor in PyTorch operations.
      
      The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).
      
      It was also suggested to attach the metadata directly to the Tensor object,
      but the possibility of a collision between torchaudio's metadata and new attributes introduced in
      PyTorch cannot be ignored, so we use a Tensor subclass implementation.
      
      If any unexpected issue arises from a metadata attribute name collision, client code can
      fetch the bare Tensor and continue.
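
The composition idea can be shown in a library-free form. The sketch below is not `_ChunkTensor`: the `Chunk` class and its `data` attribute are hypothetical, and a plain list stands in for the Tensor. It only illustrates keeping metadata on a wrapper while delegating everything else to the wrapped payload, which stays retrievable as the bare object.

```python
class Chunk:
    """Hypothetical wrapper: a payload plus a `pts` attribute (not `_ChunkTensor`)."""

    def __init__(self, data, pts):
        self.data = data  # the bare payload, always retrievable
        self.pts = pts    # metadata lives on the wrapper, not on the payload

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate to the payload.
        if name == "data":
            raise AttributeError(name)  # avoid recursion before `data` is set
        return getattr(self.data, name)

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        return iter(self.data)


chunk = Chunk([3, 1, 2], pts=0.5)
```

Because the metadata is stored on the wrapper, a future attribute named `pts` on the payload type would not be overwritten; it would only be shadowed on the wrapper, and `chunk.data` still exposes the untouched payload.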
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2975
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42526945
      
      Pulled By: mthrok
      
      fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35
  11. 19 Jan, 2023 1 commit
  12. 16 Jan, 2023 1 commit
  13. 15 Jan, 2023 1 commit
    • Add pre-trained pipelines for XLS-R models (#2978) · 9b7b64e4
      Zhaoheng Ni authored
      Summary:
      The PR adds three `Wav2Vec2Bundle` pipeline objects for XLS-R models:
      - WAV2VEC2_XLSR_300M
      - WAV2VEC2_XLSR_1B
      - WAV2VEC2_XLSR_2B
      
      All three models use layer normalization in the feature extraction layers, hence `_normalize_waveform` is set to `True`.
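
Assuming that `_normalize_waveform=True` means the input waveform is normalized to zero mean and unit variance before feature extraction (the commit message does not spell this out), the operation can be sketched in plain Python. The function name and the `eps` default below are illustrative choices, not values taken from the PR.

```python
import math


def normalize_waveform(samples, eps=1e-5):
    """Scale samples to zero mean and unit variance (biased variance, as in layer norm)."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    # eps keeps the division stable for near-silent input.
    return [(x - mean) / math.sqrt(var + eps) for x in samples]
```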
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2978
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42501491
      
      Pulled By: nateanl
      
      fbshipit-source-id: 2429ec880cc14798034843381e458e1b4664dac3
  14. 14 Jan, 2023 1 commit
  15. 13 Jan, 2023 1 commit
  16. 12 Jan, 2023 2 commits
    • Refactor extension modules initialization (#2968) · 5dfe0b22
      mthrok authored
      Summary:
      * Refactor the `_extension` module so that
        * the implementation of initialization logic and its execution are separated.
          * logic goes to `_extension.utils`
          * the execution is at `_extension.__init__`
          * global variables are defined and modified in `__init__`.
      * Replace `is_sox_available()` with `_extension._SOX_INITIALIZED`
      * Replace `is_kaldi_available()` with `_extension._IS_KALDI_AVAILABLE`
      * Move `requires_sox()` and `requires_kaldi()` to break the circular dependency between `_extension` and `_internal.module_utils`.
      * Merge the sox-related initialization logic into the `_extension.utils` module.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2968
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42387251
      
      Pulled By: mthrok
      
      fbshipit-source-id: 0c3245dfab53f9bc1b8a83ec2622eb88ec96673f
    • Add `buffer_chunk_size=-1` option (#2969) · 22788a8f
      moto authored
      Summary:
      This commit adds the `buffer_chunk_size=-1` option, with which buffered frames are never dropped.
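
The difference between a bounded buffer and the `-1` setting can be sketched with `collections.deque`. This only illustrates the semantics and is not the StreamReader implementation; `make_chunk_buffer` is a hypothetical helper.

```python
from collections import deque


def make_chunk_buffer(buffer_chunk_size):
    """Return a FIFO buffer; -1 means unbounded, so no chunk is ever dropped."""
    maxlen = None if buffer_chunk_size == -1 else buffer_chunk_size
    return deque(maxlen=maxlen)


bounded = make_chunk_buffer(3)
unbounded = make_chunk_buffer(-1)
for i in range(5):
    bounded.append(i)    # oldest chunks are discarded once 3 are held
    unbounded.append(i)  # every chunk is kept
```

An unbounded buffer trades memory for completeness, so it suits cases where every frame must be retained until it is popped.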
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2969
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42403467
      
      Pulled By: mthrok
      
      fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992
  17. 10 Jan, 2023 1 commit
    • Update the handling of videos without PTS values (#2970) · 1717edaa
      moto authored
      Summary:
      The filter graph does not fall back to `best_effort_timestamp`, so applying filters (such as changing the fps) to videos without PTS values failed.

      This commit changes the behavior by overwriting the PTS values with `best_effort_timestamp`.
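
As a rough illustration of the change, the sketch below models a decoded frame as a small dataclass. `Frame` and `overwrite_pts` are hypothetical names; only the field names `pts` and `best_effort_timestamp` come from the commit text, and the real change is made in the C++ decoding code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    """Hypothetical decoded frame; field names mirror the commit text only."""
    pts: Optional[int]
    best_effort_timestamp: int


def overwrite_pts(frame):
    # Always take the best-effort value so downstream filters see a usable PTS.
    frame.pts = frame.best_effort_timestamp
    return frame
```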
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2970
      
      Reviewed By: YosuaMichael
      
      Differential Revision: D42425771
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7b7a033ea2ad89bb49d6e1663d35d377dab2aae9
  18. 06 Jan, 2023 2 commits
  19. 05 Jan, 2023 2 commits
  20. 04 Jan, 2023 1 commit
  21. 30 Dec, 2022 1 commit
    • Refactor and optimize yuv420p and nv12 processing (#2945) · cc0d1e0b
      moto authored
      Summary:
      This commit refactors and optimizes the functions that convert AVFrames of `yuv420p` and `nv12` into PyTorch Tensors.
      Performance improves by about 30%.
      
      1. Reduce the number of intermediate Tensors allocated.
      2. Replace 2 calls to `repeat_interleave` with `F::interpolate`.
      
       * (`F::interpolate` is about 5x faster than `repeat_interleave`.)
          <details><summary>code</summary>
      
          ```bash
          #!/usr/bin/env bash
      
          set -e
      
          python -c """
          import torch
          import torch.nn.functional as F
      
          a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
          val1 = a.repeat_interleave(2, -1).repeat_interleave(2, -2)
          val2 = F.interpolate(a.view((1, 1, 7, 7, 1)), size=[14, 14, 1], mode=\"nearest\")
          print(torch.sum(torch.abs(val1 - val2[0, 0, :, :, 0])))
          """
      
          python3 -m timeit \
                  --setup """
          import torch
      
          a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
          """ \
                  """
          a.repeat_interleave(2, -1).repeat_interleave(2, -2)
          """
      
          python3 -m timeit \
                  --setup """
          import torch
          import torch.nn.functional as F
      
          a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
          """ \
                  """
          F.interpolate(a.view((1, 1, 7, 7, 1)), size=[14, 14, 1], mode=\"nearest\")
          """
          ```
      
          </details>
      
          ```
          tensor(0)
          10000 loops, best of 5: 38.3 usec per loop
          50000 loops, best of 5: 7.1 usec per loop
          ```
      
      ## Benchmark Result
      
      <details><summary>code</summary>
      
      ```bash
      #!/usr/bin/env bash
      
      set -e
      
      mkdir -p tmp
      
      for ext in avi mp4; do
          for duration in 1 5 10 30 60; do
              printf "Testing ${ext} ${duration} [sec]\n"
      
              test_data="tmp/test_${duration}.${ext}"
              if [ ! -f "${test_data}" ]; then
                  printf "Generating test data\n"
                  ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${test_data}" > /dev/null 2>&1
              fi
      
              python -m timeit \
                     --setup="from torchaudio.io import StreamReader" \
                     """
      r = StreamReader(\"${test_data}\")
      r.add_basic_video_stream(frames_per_chunk=-1, format=\"yuv420p\")
      r.process_all_packets()
      r.pop_chunks()
      """
          done
      done
      ```
      
      </details>
      
      ![Time to decode AVI file](https://user-images.githubusercontent.com/855818/210008881-8cc83f18-0e51-46e3-afe9-a5ff5dff041e.png)
      
      <details><summary>raw data</summary>
      
      Video Type - AVI
      Duration | Before | After
      -- | -- | --
      1 | 10.3 | 6.29
      5 | 44.3 | 28.3
      10 | 89.3 | 56.9
      30 | 265 | 185
      60 | 555 | 353
      </details>
      
      ![Time to decode MP4 file](https://user-images.githubusercontent.com/855818/210008891-c4546c52-43d7-49d0-8eff-d866ad627129.png)
      
      <details><summary>raw data</summary>
      
      Video Type - MP4
      Duration | Before | After
      -- | -- | --
      1 | 15.3 | 10.5
      5 | 62.1 | 43.2
      10 | 124 | 83.8
      30 | 380 | 252
      60 | 721 | 511
      </details>
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2945
      
      Reviewed By: carolineechen
      
      Differential Revision: D42283269
      
      Pulled By: mthrok
      
      fbshipit-source-id: 59840f943ff516b69ab8ad35fed7104c48a0bf0c
  22. 22 Dec, 2022 1 commit
  23. 21 Dec, 2022 1 commit
    • Extract libsox integration from libtorchaudio (#2929) · 1706a72f
      moto authored
      Summary:
      This commit makes the following changes to the C++ library organization
      - Move sox-related feature implementations from `libtorchaudio` to `libtorchaudio_sox`.
      - Remove C++ implementation of `is_sox_available` and `is_ffmpeg_available` as it is now sufficient to check the existence of `libtorchaudio_sox` and `libtorchaudio_ffmpeg` to check the availability. This makes `libtorchaudio_sox` and `libtorchaudio_ffmpeg` independent from `libtorchaudio`.
      - Move PyBind11-based bindings (`_torchaudio_sox`, `_torchaudio_ffmpeg`) into `torchaudio.lib` so that the built library structure is less cluttered.
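
The commit does not say how the existence of the libraries is checked. One way to express "available if it can be found" in Python is `importlib.util.find_spec`, sketched below with a hypothetical helper name.

```python
import importlib.util


def is_extension_available(module_name):
    """Report whether a top-level module can be located, without importing it."""
    # Note: for dotted names, find_spec imports the parent package first.
    return importlib.util.find_spec(module_name) is not None
```

For example, `is_extension_available("json")` returns `True` on a standard Python installation, while a name that is not installed returns `False`.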
      
      Background:
      Originally, `libsox` was the only C++ extension, and `libtorchaudio` was supposed to contain all the C++ code.
      Things are different now. We have a number of C++ extensions, and we need to make the code/build structure more modular.
      
      The new `libtorchaudio_sox` contains the implementations and `_torchaudio_sox` contains the PyBind11-based bindings.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2929
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42159594
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1a0fbca9e4143137f6363fc001b2378ce6029aa7
  24. 20 Dec, 2022 1 commit
  25. 16 Dec, 2022 1 commit
    • Rename resampling_method options (#2922) · e6bebe6a
      Caroline Chen authored
      Summary:
      Resolves https://github.com/pytorch/audio/issues/2891

      Rename the `resampling_method` options to describe more accurately what is happening. Previously the options were named `sinc_interpolation` and `kaiser_window`, which can be confusing because both use sinc interpolation and differ only in the window function. Therefore, `sinc_interpolation` is renamed to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option emits a warning, and the old options will be deprecated in 2 releases. The numerical behavior is unchanged.
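
A rename with a warning period is commonly handled by a small lookup table. The sketch below assumes that pattern; `resolve_resampling_method` and `_RENAMED` are hypothetical names and the warning text is illustrative, while the old and new option names are those listed in the commit message.

```python
import warnings

# Old option name -> new option name, as listed in the commit message.
_RENAMED = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def resolve_resampling_method(name):
    """Hypothetical helper: map a legacy name to its replacement and warn."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f'"{name}" is deprecated; use "{new_name}" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    return name
```

Resolving the name once at the entry point lets the numerical code see only the new names, which is consistent with the behavior staying unchanged.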
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2922
      
      Reviewed By: mthrok
      
      Differential Revision: D42083619
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
  26. 09 Dec, 2022 2 commits
  27. 08 Dec, 2022 1 commit
  28. 07 Dec, 2022 2 commits
  29. 06 Dec, 2022 1 commit
  30. 04 Dec, 2022 1 commit
  31. 02 Dec, 2022 1 commit
  32. 30 Nov, 2022 1 commit
  33. 29 Nov, 2022 2 commits