  1. 26 Oct, 2023 1 commit
  2. 25 Oct, 2023 1 commit
  3. 24 Oct, 2023 1 commit
  4. 11 Oct, 2023 1 commit
  5. 09 Oct, 2023 2 commits
  6. 20 Aug, 2023 1 commit
    • Fix I/O test (#3568) · 0688863c
      moto authored
      Summary:
      It turned out that FFmpeg 5 installed via conda reports a video frame rate of -1. FFmpeg 4 and 6 are fine. This is a regression either in FFmpeg or in the underlying decoding library.
      
      Make the reference value adaptive.
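One way to read "adaptive" is a reference value that depends on the FFmpeg major version in use. The following is only a sketch under that assumption: `expected_frame_rate` and the nominal rate of 25.0 are hypothetical and are not taken from the torchaudio test suite.

```python
def expected_frame_rate(ffmpeg_major: int, nominal: float) -> float:
    """Return the frame rate a test should expect for a given FFmpeg major version.

    Assumption: only FFmpeg 5 reports -1, as described in the summary above.
    """
    return -1.0 if ffmpeg_major == 5 else nominal


# Hypothetical nominal frame rate, used for illustration only.
for major in (4, 5, 6):
    print(major, expected_frame_rate(major, 25.0))
```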
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3568
      
      Reviewed By: huangruizhe
      
      Differential Revision: D48499621
      
      Pulled By: mthrok
      
      fbshipit-source-id: fb64187bcf0dc57b753cb6c05f04d436238f5c51
  7. 12 Jul, 2023 1 commit
    • Support multiple FFmpeg versions (#3464) · 786066b4
      moto authored
      Summary:
      This commit introduces support for multiple FFmpeg versions for OSS binary distributions.
      
      Currently torchaudio only works with FFmpeg 4, which is inconvenient at every step from installation to runtime linking.
      This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4.
      
      The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all three.
      At runtime, we look for a specific version of libavutil and, when one is found, load the corresponding FFmpeg extension.
      The order of preference is 6, 5, then 4.
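The selection order described above can be sketched roughly as follows. This is an illustration, not torchaudio's loader: `pick_ffmpeg_version` and the `is_available` probe are hypothetical names, and the mapping relies on the libavutil major versions 56, 57 and 58 that ship with FFmpeg 4, 5 and 6.

```python
from typing import Callable, Optional, Sequence

# libavutil major version shipped with each FFmpeg major release.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(
    is_available: Callable[[int], bool],
    order: Sequence[int] = (6, 5, 4),
) -> Optional[int]:
    """Return the first FFmpeg major version whose libavutil can be found.

    `is_available` is a caller-supplied probe (hypothetical) that receives a
    libavutil major version and reports whether that library is present.
    """
    for ffmpeg_major in order:
        if is_available(LIBAVUTIL_MAJOR[ffmpeg_major]):
            return ffmpeg_major
    return None


# Pretend that only libavutil 57 and 56 are installed on this machine.
installed = {57, 56}
print(pick_ffmpeg_version(lambda major: major in installed))
```

Passing the probe in as a function keeps the ordering logic separate from the platform-specific way of locating a shared library.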
      
      To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
      They are LGPL builds, downloaded from S3 at build time instead of being compiled every time.
      
      The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
      a single-FFmpeg-version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
      so that they support only one specific version of FFmpeg.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3464
      
      Differential Revision: D47300223
      
      Pulled By: mthrok
      
      fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04
  8. 05 Jul, 2023 1 commit
  9. 03 Jun, 2023 1 commit
  10. 02 Jun, 2023 1 commit
  11. 01 Jun, 2023 1 commit
    • Use dlopen for FFmpeg (#3353) · b14ced1a
      moto authored
      Summary:
      This commit changes the way the FFmpeg extension is built and used.
      Instead of linking the (LGPL) FFmpeg libraries to torchaudio at build time,
      it uses dlopen to search for and link them at run time.
      
      For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides
      a portable wrapper.
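For readers unfamiliar with run-time loading, the same idea can be shown in Python with the standard `ctypes` module. This is only an analogy to the C++ `at::DynamicLibrary` approach, not the code from this commit; `try_load` is a hypothetical helper and the library name is a Linux-style assumption.

```python
import ctypes
from typing import Optional


def try_load(name: str) -> Optional[ctypes.CDLL]:
    """Open a shared library at run time; return None if it cannot be loaded."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return None


# Linux-style file name, used here purely as an example.
lib = try_load("libavutil.so.58")
print("loaded" if lib is not None else "not found")
```

Because nothing is linked at build time, a missing library becomes a condition the program can handle at run time instead of a failure to start.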
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3353
      
      Differential Revision: D46059199
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d
  12. 09 May, 2023 1 commit
  13. 03 Apr, 2023 1 commit
  14. 31 Mar, 2023 1 commit
  15. 27 Mar, 2023 1 commit
    • Revise encoder config arg and docstrings (#3203) · b1de9f1a
      hwangjeff authored
      Summary:
      For `StreamWriter`,
      * Renames arg `config` to `codec_config`.
      * Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
      * Adds docstrings for arg `codec_config`.
      * Updates `chunk` to `frames` in `write_*_chunk` methods.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3203
      
      Reviewed By: mthrok
      
      Differential Revision: D44350153
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343
  16. 17 Mar, 2023 2 commits
  17. 08 Mar, 2023 1 commit
    • Include format information after filter (#3155) · 146195d8
      moto authored
      Summary:
      This commit adds fields to OutputStream that show the result
      of filters, such as the width and height after filtering.
      
      Before
      
      ```
      OutputStream(
          source_index=0,
          filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
      ```
      
      After
      
      ```
      OutputVideoStream(
          source_index=0,
          filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
          media_type='video',
          format='gray',
          width=320,
          height=320,
          frame_rate=3.0)
      ```
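The shape of the new output can be approximated with plain dataclasses. The field names below are copied from the repr shown above; the types and the inheritance between the two classes are assumptions made for illustration and may not match the actual torchaudio definitions.

```python
from dataclasses import dataclass


@dataclass
class OutputStream:
    # Fields visible in the "Before" repr.
    source_index: int
    filter_description: str


@dataclass
class OutputVideoStream(OutputStream):
    # Fields that appear only in the "After" repr; types are guesses.
    media_type: str
    format: str
    width: int
    height: int
    frame_rate: float


info = OutputVideoStream(
    source_index=0,
    filter_description="fps=3,scale=width=320:height=320,format=pix_fmts=gray",
    media_type="video",
    format="gray",
    width=320,
    height=320,
    frame_rate=3.0,
)
print(info.width, info.height, info.frame_rate)
```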
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3155
      
      Reviewed By: nateanl
      
      Differential Revision: D43882399
      
      Pulled By: mthrok
      
      fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
  18. 24 Feb, 2023 2 commits
    • Cleanup ffmpeg bidings (#3095) · b46628ba
      moto authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3095
      
      Reviewed By: nateanl
      
      Differential Revision: D43544998
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4359cdbbdbee53084016a84129cb3d65900b0457
    • Bind StreamReader/Writer with PyBind11 (#3091) · b012b452
      moto authored
      Summary:
      This commit is a cleanup and preparation for future
      development.
      
      We plan to pass around more complicated objects among
      StreamReader and StreamWriter, and TorchBind is not expressive enough
      for defining intermediate objects, so we use PyBind11 for binding
      StreamWriter.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3091
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D43515714
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2
  19. 23 Feb, 2023 1 commit
    • Replace c10::Dict with std::map in StreamReader/Writer (#3092) · c3310018
      moto authored
      Summary:
      This commit is a cleanup and preparation for future development.
      
      We plan to pass around more complicated objects among StreamReader and StreamWriter, and TorchBind is not expressive enough for defining intermediate objects, so we want to use PyBind11 for binding StreamReader/Writer.
      
      PyBind11 converts a Python dict into std::map, while TorchBind converts it into c10::Dict. Because of this discrepancy, conversion from c10::Dict to std::map has to happen in multiple places, which makes the binding code thicker because it requires wrapper methods.
      
      Using std::map reduces the number of wrapper methods and conversions, because the same method can be bound for file-like objects and for the other sources.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3092
      
      Reviewed By: nateanl
      
      Differential Revision: D43524808
      
      Pulled By: mthrok
      
      fbshipit-source-id: f7467c66ccd37dbf4abc337bbb18ffaac21a0058
  20. 27 Jan, 2023 1 commit
  21. 26 Jan, 2023 1 commit
    • Abstract away AVFormatContext from StreamReader/Writer constructor (#3007) · 7ea69e61
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3007
      
      Simplify the construction of StreamReader/Writer in C++.
      
      Currently these classes require client code to build AVFormatContext
      manually. This is tedious and not user friendly.
      
      Some client code actually uses the same helper function that
      TorchAudio codebase uses.
      
      This commit moves the helper logic inside of the constructor of
      StreamReader/Writer, so that the signatures of these constructors
      are easy to use and similar to Python interface.
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D42662520
      
      fbshipit-source-id: d95e5236810c48d7d9bd2d89c05d4f60a44b3ba1
  22. 04 Jan, 2023 1 commit
    • Make fill_buffer a public API and move the impl to C++ (#2954) · bf085b1f
      moto authored
      Summary:
      Currently, when iterating over media data with StreamReader, a for-loop is the only way to do so with the public API.
      
      This does not support use cases like "fetch one chunk after seek" well.
      
      ```python
      s = StreamReader(...)
      s.add_audio_stream(...)
      s.seek(10)
      chunk = None
      for chunk, in s.stream():
          break
      ```
      
      This commit makes the `fill_buffer` method, which is used by the iterative method, a public API so that one can do
      
      ```python
      s.seek(10)
      s.fill_buffer()
      chunk, = s.pop_chunks()
      ```
      
       ---
      
      This commit also moves the implementation to C++, which reduces the number of FFI boundary crossings.
      This improves performance when the iteration is longer.
      
      AVI (generated with `ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${file}.avi"`)
      
      | Video Duration [sec] | Original [msec] | Fill Buffer C++ | One Go (reference) |
      |----------------------|-----------------|-----------------|--------------------|
      |                    1 |              18 |            18.4 |               16.6 |
      |                    5 |              44 |            42.6 |               35.1 |
      |                   10 |            75.3 |            74.4 |               60.9 |
      |                   30 |             200 |             195 |                158 |
      |                   60 |             423 |             382 |                343 |
      
      MP4 (generated with `ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${file}.mp4"`)
      
      | Video Duration [sec] | Original [msec] | Fill Buffer C++ | One Go |
      |----------------------|-----------------|-----------------|--------|
      |                    1 |            18.7 |            18.1 |   10.3 |
      |                    5 |            42.2 |            40.6 |   25.2 |
      |                   10 |            73.9 |            71.8 |   43.6 |
      |                   30 |             202 |             194 |    116 |
      |                   60 |             396 |             386 |    227 |

      * Original (Python implementation)
      
      ```python
      r = StreamReader(src)
      r.add_video_stream(1, decoder_option={"threads": "1"})
      for chunk, in r.stream():
          pass
      ```
      
      * This (C++)
      
      ```python
      r = StreamReader(src)
      r.add_video_stream(1, decoder_option={"threads": "1"})
      for chunk, in r.stream():
          pass
      ```
      
      * Using `process_all_packets` (process all in one go)
      
      ```python
      r = StreamReader(src)
      r.add_video_stream(1, decoder_option={"threads": "1"})
      r.process_all_packets()
      ```
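To put the two tables in relative terms, the short script below computes the percentage of time saved by the C++ `fill_buffer` path compared with the original Python path. The numbers are copied from the tables; `saving_percent` is a hypothetical helper, and the sketch assumes all timing columns are in milliseconds.

```python
# (original_msec, cpp_msec) per video duration in seconds, copied from the tables.
AVI = {1: (18.0, 18.4), 5: (44.0, 42.6), 10: (75.3, 74.4), 30: (200.0, 195.0), 60: (423.0, 382.0)}
MP4 = {1: (18.7, 18.1), 5: (42.2, 40.6), 10: (73.9, 71.8), 30: (202.0, 194.0), 60: (396.0, 386.0)}


def saving_percent(original: float, new: float) -> float:
    """Percentage of time saved relative to the original measurement."""
    return 100.0 * (original - new) / original


for name, rows in (("AVI", AVI), ("MP4", MP4)):
    for duration, (original, cpp) in rows.items():
        print(f"{name} {duration:>2}s: {saving_percent(original, cpp):+.1f}%")
```

A negative value means the C++ path was slower in that row, which happens only for the 1-second AVI clip.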
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2954
      
      Reviewed By: carolineechen
      
      Differential Revision: D42349446
      
      Pulled By: mthrok
      
      fbshipit-source-id: 9e4e37923e46299c3f43f4ad17a2a2b938b2b197
  23. 01 Sep, 2022 1 commit
  24. 12 Jul, 2022 1 commit
    • Clean up the interface around dictionary (#2533) · e2641452
      moto authored
      Summary:
      A Python dictionary is bound to different types in TorchBind and PyBind.
      StreamReader has methods that receive and return dictionaries.
      
      This commit cleans up the treatment of dictionaries and consolidates
      the helper functions.
      
      * The core implementation and TorchBind both use `c10::Dict`.
      * The PyBind version uses `std::map` and converts it to `c10::Dict`.
      * The helper functions to convert `std::map` <-> `c10::Dict` are consolidated in the `pybind` directory.
      * The wrapper methods are implemented in the `pybind` directory.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2533
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37731866
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5a5cf1372668f7d3aacc0bb461bc69fa07212f3f
  25. 08 Jun, 2022 2 commits
  26. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to specify which decoder to use.
        - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
        - The order of the arguments was changed so that the arguments common to both audio and video come before the rest.
        - The `dtype` and `format` arguments were also changed to make them consistent across the audio and video methods.
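As a rough illustration of the interface described in the list above, the classes below expose `read(self, n)` and, optionally, `seek(self, offset, whence)`. `ReadOnlySource` and `SeekableSource` are hypothetical names, and the in-memory buffer stands in for a real source such as a network stream.

```python
import io


class ReadOnlySource:
    """Exposes only `read(n)`, the minimum interface named above."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to n bytes; an empty bytes object signals end of data.
        return self._buf.read(n)


class SeekableSource(ReadOnlySource):
    """Adds the optional `seek(offset, whence)` method."""

    def seek(self, offset: int, whence: int = 0) -> int:
        # Same semantics as io.IOBase.seek; returns the new absolute position.
        return self._buf.seek(offset, whence)


src = SeekableSource(b"0123456789")
print(src.read(4))
print(src.seek(0, 0))
```

An object like `ReadOnlySource` corresponds to the case without `seek`, for which the list above notes that some formats cannot be decoded properly.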
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in sox-based I/O.
      In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
      if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
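The dispatch rule described above could be sketched along these lines. `select_binding` and the returned labels are hypothetical and only name the two paths; the real code hands `src` to the corresponding bound implementation.

```python
import io


def select_binding(src) -> str:
    """Name the binding path a source would take under the rule above."""
    if isinstance(src, str):
        return "torchbind"  # string sources (paths, URLs) go through TorchBind
    if hasattr(src, "read"):
        return "pybind11"  # file-like objects go through PyBind11
    raise TypeError("src must be a string or an object with a `read` method")


print(select_binding("video.mp4"))
print(select_binding(io.BytesIO(b"")))
```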
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some of the implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
        - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. Since they work fine with PyBind11, the Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6