1. 01 Sep, 2022 1 commit
  2. 24 Aug, 2022 1 commit
    • Add StreamWriter (#2628) · 72404de9
      moto authored
      Summary:
      This commit adds an FFmpeg-based encoder class, StreamWriter.
      StreamWriter is the counterpart of the StreamReader class, and
      it supports:
      
      * Encoding audio / still image / video
      * Exporting to local file / streaming protocol / devices etc...
      * File-like object support (in later commit)
      * HW video encoding (in later commit)
      
      See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2628
      
      Reviewed By: nateanl
      
      Differential Revision: D38816650
      
      Pulled By: mthrok
      
      fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8
  3. 19 Jul, 2022 1 commit
  4. 18 Jul, 2022 1 commit
  5. 12 Jul, 2022 2 commits
    • Simplify HW acceleration code (#2534) · 4ba56323
      moto authored
      Summary:
      FFmpeg's API provides multiple ways to initialize a decoder. This PR simplifies the initialization by delegating the HW device context management to FFmpeg's native code.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2534
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37734573
      
      Pulled By: mthrok
      
      fbshipit-source-id: e61736b4d4d2ca6e94d8965abd93b4e9a68e7351
    • Clean up the interface around dictionary (#2533) · e2641452
      moto authored
      Summary:
      A Python dictionary is bound to different types in TorchBind and PyBind.
      StreamReader has methods that receive and return dictionaries.
      
      This commit cleans up the treatment of dictionaries and consolidates
      the helper functions.
      
      * The core implementation and TorchBind both use `c10::Dict`.
      * PyBind version uses `std::map` and converts it to `c10::Dict`.
      * The helper functions to convert `std::map` <-> `c10::Dict` are consolidated in pybind directory.
      * The wrapper methods are implemented in `pybind` dir.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2533
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37731866
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5a5cf1372668f7d3aacc0bb461bc69fa07212f3f
  6. 08 Jul, 2022 1 commit
  7. 07 Jul, 2022 3 commits
  8. 28 Jun, 2022 2 commits
    • Refactor FilterGraph interface (#2508) · 0dd57236
      moto authored
      Summary:
      FilterGraph is necessary for StreamWriter when saving video, because
      the Tensor array format cannot express common video formats like yuv420.
      
      The current implementation of FilterGraph is specific to StreamReader,
      as it takes an AVCodecParameters object rather than individual parameters.
      
      This PR refactors the FilterGraph interface so that it can be constructed
      from more primitive information.
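
To illustrate what "more primitive information" can look like: FFmpeg's `buffer` and `abuffer` source filters are configured from an argument string made of plain values (size, pixel or sample format, time base). The Python sketch below only builds such strings. The helper names are hypothetical and this is not the C++ code touched by the PR.

```python
from fractions import Fraction


def video_buffer_args(width: int, height: int, pix_fmt: str, time_base: Fraction) -> str:
    """Build the argument string for FFmpeg's `buffer` source filter.

    Hypothetical helper for illustration; it is not part of torchaudio.
    """
    return (
        f"video_size={width}x{height}:pix_fmt={pix_fmt}:"
        f"time_base={time_base.numerator}/{time_base.denominator}"
    )


def audio_buffer_args(sample_rate: int, sample_fmt: str, channel_layout: str) -> str:
    """Build the argument string for FFmpeg's `abuffer` source filter.

    Assumes a time base of one tick per sample, which is a common choice.
    """
    return (
        f"time_base=1/{sample_rate}:sample_rate={sample_rate}:"
        f"sample_fmt={sample_fmt}:channel_layout={channel_layout}"
    )


print(video_buffer_args(1920, 1080, "yuv420p", Fraction(1, 30)))
print(audio_buffer_args(16000, "fltp", "mono"))
```

Because none of these values require an `AVCodecParameters` object, the same construction path can serve both a reader and a writer.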
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2508
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37466033
      
      Pulled By: mthrok
      
      fbshipit-source-id: 8414e985da7579c2dfe260b4dccd2afe113bb573
    • Refactor AVDictionary clean up (#2507) · 0ad03adf
      moto authored
      Summary:
      Small clean up in ffmpeg binding code.
      
      1. Make `get_option_dict` and `clean_up_dict` public utilities
      2. Merge the exception into `clean_up_dict`
      3. Get rid of custom string join function and use `c10::Join`.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2507
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37466022
      
      Pulled By: mthrok
      
      fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969
  9. 27 Jun, 2022 1 commit
  10. 08 Jun, 2022 2 commits
  11. 04 Jun, 2022 1 commit
    • Make FFmpeg log level configurable (#2439) · 877a88c5
      moto authored
      Summary:
      Undesired logs are one of the loudest UX complaints we get.
      Yet loading media files involves uncertainty, which is
      difficult to debug without debug logs.
      
      This commit introduces utility functions to configure logging level
      so that we can ask users to enable it when they encounter an issue,
      while defaulting to non-verbose option.
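
For reference, FFmpeg expresses log levels as integers (see `libavutil/log.h`), and a utility like the one described ultimately hands such an integer to `av_log_set_level`. The sketch below is a hypothetical Python helper that maps level names to those integers. It is not the utility added by this commit.

```python
# FFmpeg log level constants as defined in libavutil/log.h.
FFMPEG_LOG_LEVELS = {
    "quiet": -8,
    "panic": 0,
    "fatal": 8,
    "error": 16,
    "warning": 24,
    "info": 32,
    "verbose": 40,
    "debug": 48,
    "trace": 56,
}


def resolve_log_level(level) -> int:
    """Accept a level name or an integer and return the FFmpeg integer level.

    Hypothetical helper for illustration only.
    """
    if isinstance(level, int):
        return level
    try:
        return FFMPEG_LOG_LEVELS[level.lower()]
    except KeyError:
        raise ValueError(f"Unknown FFmpeg log level: {level!r}") from None


print(resolve_log_level("debug"))
```

Higher values are more verbose, so asking a user to switch to `debug` while investigating an issue surfaces messages that a quieter setting hides.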
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2439
      
      Reviewed By: hwangjeff, xiaohui-zhang
      
      Differential Revision: D36903763
      
      Pulled By: mthrok
      
      fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987
  12. 02 Jun, 2022 1 commit
  13. 01 Jun, 2022 2 commits
  14. 29 May, 2022 1 commit
    • Update source info (#2418) · bb77cbeb
      moto authored
      Summary:
      Add `num_frames` and `bits_per_sample` to match the current
      `torchaudio.info` capability.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2418
      
      Reviewed By: carolineechen
      
      Differential Revision: D36749077
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713
  15. 27 May, 2022 1 commit
    • Refactor Streamer to StreamReader in C++ codebase (#2403) · 9ef6c23d
      moto authored
      Summary:
      * `Streamer` was renamed to `StreamReader` when it was moved from prototype to beta.
      This commit applies the same name change to the C++ source code.
      
      * Fix miscellaneous lint issues
      
      * Make the code compilable on FFmpeg 5
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2403
      
      Reviewed By: carolineechen
      
      Differential Revision: D36613053
      
      Pulled By: mthrok
      
      fbshipit-source-id: 69fedd6720d488dadf4dfe7d375ee76d216b215d
  16. 21 May, 2022 1 commit
    • Add file-like object support to Streaming API (#2400) · a984872d
      moto authored
      Summary:
      This commit adds file-like object support to Streaming API.
      
      ## Features
      - File-like objects are expected to implement `read(self, n)`.
      - Additionally `seek(self, offset, whence)` is used if available.
      - Without `seek` method, some formats cannot be decoded properly.
        - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
        - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
        - So that the arguments common to both audio and video come before the rest, the order of the arguments was changed.
        - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
      
      ## Code structure
      
      The approach is very similar to how file-like objects are supported in sox-based I/O.
      In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
      if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
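
The dispatch rule can be sketched in a few lines of Python. `select_binding` and `ReadOnlySource` are hypothetical names, and the returned labels merely stand in for the two bound implementations.

```python
import io


def select_binding(src) -> str:
    """Return which binding a source would be routed to.

    Hypothetical helper: the labels stand in for the TorchBind and
    PyBind11 implementations described in the commit message.
    """
    if isinstance(src, str):
        return "torchbind"
    if hasattr(src, "read"):
        return "pybind11"
    raise TypeError("src must be a string or an object with a `read` method")


class ReadOnlySource:
    """Minimal file-like object exposing only `read(n)`, without `seek`."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        return self._buf.read(n)


print(select_binding("video.mp4"))
print(select_binding(ReadOnlySource(b"\x00" * 16)))
print(hasattr(ReadOnlySource(b""), "seek"))
```

An object like `ReadOnlySource` satisfies the minimum requirement, but since it lacks `seek`, formats that need seeking may only decode when the decoder is specified explicitly, as noted under Features.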
      
      ![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
      
      ## Refactoring involved
      - Extracted to https://github.com/pytorch/audio/issues/2402
        - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
        - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
        - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.
      
      ## TODO:
      - [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2400
      
      Reviewed By: carolineechen
      
      Differential Revision: D36520073
      
      Pulled By: mthrok
      
      fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
  17. 19 May, 2022 1 commit
    • Refactor Streamer implementation (#2402) · eed57534
      moto authored
      Summary:
      * Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be reused from PyBind11.
      * Move `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it does not need to deal with dtype.
      * Move `add_[audio|video]_stream` wrapper signature to Streamer core, so that Streamer directly deals with `c10::optional`.†
      
      † Related to this, there is a slight change in how the empty filter expression is stored. Originally, if an empty filter expression was given to `add_[audio|video]_stream` method, the `StreamReaderOutputStream` was showing it as empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the corresponding filter expression that is actually being used.
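
The "just string manipulation" part can be pictured as building an FFmpeg filter description from a few optional arguments. The sketch below uses real FFmpeg filter names (`aresample`, `aformat`, `fps`, `scale`, `format`, `anull`, `null`), but the function names and signatures are hypothetical and do not reproduce torchaudio's Python code.

```python
from typing import Optional


def basic_audio_filter(sample_rate: Optional[int] = None,
                       sample_fmt: Optional[str] = None) -> str:
    """Compose an audio filter description; fall back to the pass-through `anull`."""
    parts = []
    if sample_rate is not None:
        parts.append(f"aresample={sample_rate}")
    if sample_fmt is not None:
        parts.append(f"aformat=sample_fmts={sample_fmt}")
    return ",".join(parts) or "anull"


def basic_video_filter(frame_rate: Optional[int] = None,
                       width: Optional[int] = None,
                       height: Optional[int] = None,
                       pix_fmt: Optional[str] = None) -> str:
    """Compose a video filter description; fall back to the pass-through `null`."""
    parts = []
    if frame_rate is not None:
        parts.append(f"fps={frame_rate}")
    if width is not None or height is not None:
        # -1 asks FFmpeg's scale filter to keep the aspect ratio for that side.
        w = width if width is not None else -1
        h = height if height is not None else -1
        parts.append(f"scale=width={w}:height={h}")
    if pix_fmt is not None:
        parts.append(f"format=pix_fmts={pix_fmt}")
    return ",".join(parts) or "null"


print(basic_audio_filter(8000, "fltp"))
print(basic_video_filter(width=960, height=540))
print(basic_audio_filter(), basic_video_filter())
```

The fallback values mirror the footnote: when no filter is requested, the description that is actually in effect is `"anull"` or `"null"` rather than an empty string.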
      
      Ref https://github.com/pytorch/audio/issues/2400
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2402
      
      Reviewed By: nateanl
      
      Differential Revision: D36488808
      
      Pulled By: mthrok
      
      fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
  18. 11 May, 2022 1 commit
    • Refactor the constructors of pointer wrappers (#2373) · 93c26d63
      moto authored
      Summary:
      This commit refactors the constructors of the wrapper classes so that
      the wrapper classes are only responsible for deallocating the underlying
      FFmpeg custom structures.
      
      The responsibility of custom initialization is moved to helper functions.
      
      Context:
      
      The FFmpeg API uses a bunch of raw pointers, which require dedicated allocators
      and deallocators. In torchaudio we wrap these pointers with
      `std::unique_ptr<>` to adopt RAII semantics.
      
      Currently all of the customization logic required for `Streamer` is
      handled by the constructors of the wrapper classes, like the following:
      
      ```
      AVFormatContextPtr(
            const std::string& src,
            const std::string& device,
            const std::map<std::string, std::string>& option);
      ```
      
      This constructor allocates the raw `AVFormatContext*` pointer,
      while initializing it with the given option, then it parses the
      input media.
      
      As we consider the write/encode features, which require a different way
      of initializing the `AVFormatContext*`, making it the responsibility
      of the constructors of `AVFormatContextPtr` reduces flexibility.
      
      Thus this commit moves the customization to helper factory functions.
      
      - `AVFormatContextPtr(...)` -> `get_input_format_context(...)`
      - `AVCodecContextPtr(...)` -> `get_decode_context(...)`
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2373
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36230148
      
      Pulled By: mthrok
      
      fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a
  19. 10 May, 2022 1 commit
    • Add HW acceleration support on Streamer (#2331) · 54d2d04f
      moto authored
      Summary:
      This commit adds the `hw_accel` option to the `Streamer::add_video_stream` method.
      Specifying `hw_accel="cuda"` allows the chunk Tensor to be created directly on the CUDA device
      when the following conditions are met:
      1. the video format is H264,
      2. underlying ffmpeg is compiled with NVENC, and
      3. the client code specifies `decoder="h264_cuvid"`.
      
      A simple benchmark yields roughly a 7x improvement in decoding speed.
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
      ]
      
      patterns = [
          ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
          ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
          ("h264_cuvid", None, None),  # NVDEC -> CPU
          (None, None, None),  # CPU
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(time.monotonic() - t0, num_frames, flush=True)
      ```
      </details>
      
      ```
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      10.781158386962488 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      10.771313901990652 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      27.88662809302332 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      83.22728440898936 6175
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
      12.945253834011964 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
      12.870224556012545 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      28.03406483103754 6175
      torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
      82.6120332319988 6175
      ```
      
      With HW resizing
      
      <details>
      
      ```python
      import time
      
      from torchaudio.prototype.io import Streamer
      
      srcs = [
          "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
          "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
      ]
      
      patterns = [
          # Decode with NVDEC, CUDA HW scaling -> CUDA:0
          ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
          # Decoded with NVDEC, CUDA HW scaling -> CPU
          ("h264_cuvid", {"resize": "960x540"}, "", None),
          # CPU decoding, CPU scaling
          (None, None, "scale=width=960:height=540", None),
      ]
      
      for src in srcs:
          print(src, flush=True)
          for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
              s = Streamer(src)
              s.add_video_stream(
                  5,
                  decoder=decoder,
                  decoder_options=decoder_options,
                  filter_desc=filter_desc,
                  hw_accel=hw_accel,
              )
      
              t0 = time.monotonic()
              num_frames = 0
              for i, (chunk, ) in enumerate(s.stream()):
                  num_frames += chunk.shape[0]
              t1 = time.monotonic()
              print(chunk.dtype, chunk.shape, chunk.device)
              print(time.monotonic() - t0, num_frames, flush=True)
      ```
      
      </details>
      
      ```
      ./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      12.890056837990414 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      10.697489063022658 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      85.19899423001334 6175
      
      https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
      torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
      10.712715593050234 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      11.030170071986504 6175
      torch.uint8 torch.Size([5, 3, 540, 960]) cpu
      84.8515750519582 6175
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2331
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36217169
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b
  20. 14 Apr, 2022 3 commits
    • Support specifying decoder and its options (#2327) · be243c59
      moto authored
      Summary:
      This commit adds support for specifying the decoder in Streamer's add-stream methods.
      This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.
      
      This allows overriding the decoder codec and/or specifying the options of
      the decoder.
      
      This change makes it possible to specify the Nvidia NVDEC codec for supported formats,
      which uses dedicated hardware for decoding the video.
      
       ---
      
      Note: The CL might look overwhelming, but essentially it adds new parameters in Python and passes them all the way down to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2327
      
      Reviewed By: carolineechen
      
      Differential Revision: D35626904
      
      Pulled By: mthrok
      
      fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
    • Support NV12 format in video decoding (#2330) · 7972be99
      moto authored
      Summary:
      Support NV12 format in Streamer API.
      
      NV12 is a biplanar format with a full sized Y plane followed by a single chroma plane with weaved U and V values.
      https://chromium.googlesource.com/libyuv/libyuv/+/HEAD/docs/formats.md#nv12-and-nv21
      
      The original UV plane is smaller than the Y plane, so in this implementation
      the UV plane is upsampled to match the size of the Y plane.
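
A pure-Python sketch of the layout may help. It splits an NV12 buffer into three planes of equal size, upsampling chroma by nearest-neighbour repetition. The upsampling method is an assumption made for illustration (the commit message does not say which one is used), and `nv12_to_yuv_planes` is a hypothetical helper.

```python
def nv12_to_yuv_planes(data: bytes, width: int, height: int):
    """Split an NV12 buffer into Y, U and V planes of equal size.

    Each plane is returned as a list of rows. Chroma is upsampled by
    repeating every sample 2x2 (nearest neighbour), which is an assumption.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    y_size = width * height
    if len(data) != y_size * 3 // 2:
        raise ValueError("unexpected buffer size for NV12")

    # Full-resolution luma plane comes first.
    y = [list(data[r * width:(r + 1) * width]) for r in range(height)]

    # The chroma plane follows: height/2 rows, each holding width/2
    # interleaved (U, V) pairs, i.e. `width` bytes per row.
    uv = data[y_size:]
    u, v = [], []
    for r in range(height):
        row = uv[(r // 2) * width:(r // 2 + 1) * width]
        u.append([row[(c // 2) * 2] for c in range(width)])
        v.append([row[(c // 2) * 2 + 1] for c in range(width)])
    return y, u, v


# A 4x2 frame: 8 luma bytes, then one chroma row "U0 V0 U1 V1".
frame = bytes(range(8)) + bytes([100, 200, 101, 201])
y, u, v = nv12_to_yuv_planes(frame, width=4, height=2)
print(y)
print(u)
print(v)
```

After this step all three planes share the same height and width, so they can be stacked as channels of a single array.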
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2330
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632351
      
      Pulled By: mthrok
      
      fbshipit-source-id: aab4fbc0ce2bb7a1fb67264c27208b610fb56e27
    • Add YUV420P format support to Streamer API (#2334) · 2f70e2f9
      moto authored
      Summary:
      This commit adds YUV420P format support to Streamer API.
      When the native format of a video is YUV420P, the Streamer
      outputs a Tensor with YUV color channels.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2334
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35632916
      
      Pulled By: mthrok
      
      fbshipit-source-id: a7a0078788433060266b8bd3e7cad023f41389f5
  21. 11 Apr, 2022 1 commit
    • Fix ffmpeg integration for ffmpeg 5.0 (#2326) · bd319959
      moto authored
      Summary:
      This commit makes the FFmpeg integration support FFmpeg 5.0.
      
      In FFmpeg 5, functions like `av_find_input_format` and `avformat_open_input` were changed
      so that they deal with the `const` version of `AVInputFormat`.
      
      > 2021-04-27 - 56450a0ee4 - lavf 59.0.100 - avformat.h
      >  Constified the pointers to AVInputFormats and AVOutputFormats
      >  in AVFormatContext, avformat_alloc_output_context2(),
      >  av_find_input_format(), av_probe_input_format(),
      >  av_probe_input_format2(), av_probe_input_format3(),
      >  av_probe_input_buffer2(), av_probe_input_buffer(),
      >  avformat_open_input(), av_guess_format() and av_guess_codec().
      >  Furthermore, constified the AVProbeData in av_probe_input_format(),
      >  av_probe_input_format2() and av_probe_input_format3().
      
      https://github.com/FFmpeg/FFmpeg/blob/4e6debe1df7d53f3f59b37449b82265d5c08a172/doc/APIchanges#L252-L260
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2326
      
      Reviewed By: carolineechen
      
      Differential Revision: D35551380
      
      Pulled By: mthrok
      
      fbshipit-source-id: ccb4f713076ae8693d8d77ac2cb4ad865556a666
  22. 10 Mar, 2022 1 commit
  23. 04 Mar, 2022 2 commits
    • Flush and reset internal state after seek (#2264) · 7e1afc40
      moto authored
      Summary:
      This commit adds the following behavior to `seek` so that `seek`
      works after a frame is decoded.
      
      1. Flush the decoder buffer.
      2. Recreate the filter graphs (so that their internal state is re-initialized).
      3. Discard the buffered tensors (decoded chunks).
      
      Also it disallows negative values for seek timestamp.
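
The three steps and the timestamp check can be modelled with a toy Python class. Every name here is hypothetical, and the attributes only stand in for the decoder, filter graph and chunk buffer of the real C++ implementation.

```python
class SeekableDecoderState:
    """Toy model of the state that has to be reset on seek (hypothetical)."""

    def __init__(self):
        self.decoder_buffer = []  # frames still pending inside the decoder
        self.filter_state = {}    # stands in for filter graph internal state
        self.chunks = []          # decoded chunks waiting to be fetched
        self.position = 0.0

    def seek(self, timestamp: float) -> None:
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative")
        self.decoder_buffer.clear()  # 1. flush the decoder buffer
        self.filter_state = {}       # 2. recreate the filter graph state
        self.chunks.clear()          # 3. discard buffered chunks
        self.position = timestamp


state = SeekableDecoderState()
state.decoder_buffer.append("frame-0")
state.filter_state["history"] = [0.1, 0.2]
state.chunks.append("chunk-0")
state.seek(10.0)
print(state.decoder_buffer, state.filter_state, state.chunks, state.position)
```

Without the reset, data decoded before the seek would be mixed with data from the new position.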
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2264
      
      Reviewed By: carolineechen
      
      Differential Revision: D34497826
      
      Pulled By: mthrok
      
      fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
    • Make Streamer fail if an invalid option is provided (#2263) · 04875eef
      moto authored
      Summary:
      The `torchaudio.prototype.io.Streamer` class takes context-dependent options
      as the `option` argument, in the form of a mapping of strings.
      
      Currently there is no check of whether the provided options are valid for
      the given input.
      
      This commit adds the check and raises an error if an invalid option is given.
      
      This is analogous to `ffmpeg` command error handling.
      
      ```
      $ ffmpeg -foo
      ...
      Unrecognized option 'foo'.
      ```
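
FFmpeg functions such as `avformat_open_input` consume the options they recognize and leave the rest in the dictionary, so a check of this kind amounts to inspecting what is left over. The Python sketch below imitates that idea. `check_unused_options` and the `recognized` set are hypothetical and do not reproduce the C++ code.

```python
def check_unused_options(options: dict, recognized: set) -> None:
    """Raise if `options` contains keys that nothing consumed.

    Hypothetical helper: `recognized` stands in for the options that
    FFmpeg would have consumed for the given input.
    """
    leftover = sorted(key for key in options if key not in recognized)
    if leftover:
        raise RuntimeError("Unexpected options: " + ", ".join(leftover))


recognized = {"framerate", "video_size"}
check_unused_options({"framerate": "30"}, recognized)  # passes silently

try:
    check_unused_options({"framerate": "30", "foo": "1"}, recognized)
except RuntimeError as err:
    print(err)
```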
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2263
      
      Reviewed By: hwangjeff
      
      Differential Revision: D34495111
      
      Pulled By: mthrok
      
      fbshipit-source-id: cd068de0dc1d1273bdd5d40312c3faccb47b253f
  24. 26 Feb, 2022 1 commit
    • Improve device streaming (#2202) · 365313ed
      moto authored
      Summary:
      This commit adds a tutorial for device ASR and updates the API for device streaming.
      
      The changes for the interface are
      1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
      2. Move `fill_buffer` method to private.
      
      When dealing with a device stream, there are situations where the device buffer is not
      ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
      `process_packet` method raised an exception in the Python layer, but for device ASR
      this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
      The new `timeout` parameter serves this purpose.
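
The retry behaviour can be sketched as a loop that keeps calling the reader until it succeeds or a deadline passes. The sketch is in Python for brevity, whereas the commit performs the retry in C++. The function name, the use of seconds as the unit, and the meaning of `timeout=None` (retry indefinitely) are assumptions.

```python
import time


def retry_on_eagain(read_once, timeout=None, backoff=0.01):
    """Call `read_once` until it stops raising BlockingIOError (EAGAIN).

    timeout: seconds to keep retrying; None means retry indefinitely
             (an assumption made for this sketch).
    backoff: seconds to sleep between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_once()
        except BlockingIOError:
            if deadline is not None and time.monotonic() >= deadline:
                raise
            time.sleep(backoff)


# A fake device that is not ready for the first two reads.
attempts = {"n": 0}


def flaky_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise BlockingIOError(11, "Resource temporarily unavailable")
    return "packet"


print(retry_on_eagain(flaky_read, timeout=1.0, backoff=0.001), attempts["n"])
```

Keeping the loop below the Python layer avoids raising and catching an exception for every not-ready read.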
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2202
      
      Reviewed By: nateanl
      
      Differential Revision: D34475829
      
      Pulled By: mthrok
      
      fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
  25. 02 Feb, 2022 1 commit
  26. 21 Jan, 2022 1 commit
  27. 05 Jan, 2022 1 commit
  28. 30 Dec, 2021 4 commits