- 14 May, 2024 1 commit
Richard Barnes authored
Differential Revision: D57294298 Pull Request resolved: https://github.com/pytorch/audio/pull/3791
-
- 24 Oct, 2023 1 commit
moto-meta authored
Differential Revision: D50506299 Pull Request resolved: https://github.com/pytorch/audio/pull/3669
-
- 11 Oct, 2023 1 commit
moto-meta authored
Differential Revision: D50082877 Pull Request resolved: https://github.com/pytorch/audio/pull/3646
-
- 09 Oct, 2023 1 commit
moto-meta authored
Differential Revision: D49965263 Pull Request resolved: https://github.com/pytorch/audio/pull/3639
-
- 05 Jul, 2023 1 commit
moto authored
Summary: This reverts commit b7d3e89a. We will use pre-built binaries instead of dlopen. Pull Request resolved: https://github.com/pytorch/audio/pull/3456 Differential Revision: D47239681 Pulled By: mthrok fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6
-
- 03 Jun, 2023 1 commit
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3402 This is a second attempt at https://github.com/pytorch/audio/pull/3353. The basic logic for enabling dlopen of the FFmpeg libraries is the same: it uses `at::DynamicLibrary`, which allows compiling torchaudio without linking against the FFmpeg libraries. This time, an option to enable the feature, DLOPEN_FFMPEG, has been added, so that users have a way to disable it and keep using build-time linking. Please refer to stub.h for more technical detail. Differential Revision: D46403783 fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16
-
- 02 Jun, 2023 1 commit
Moto Hira authored
Differential Revision: D46059199 Original commit changeset: 4493a5fd8a4c Original Phabricator Diff: D46059199 fbshipit-source-id: 71cde3f8cd870d1ad9114e3e87cdd1ba564441c0
-
- 01 Jun, 2023 1 commit
moto authored
Summary: This commit changes the way the FFmpeg extension is built and used. Instead of linking the (LGPL) FFmpeg libraries to torchaudio at build time, it uses dlopen to locate and load them at run time. For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides a portable wrapper. Pull Request resolved: https://github.com/pytorch/audio/pull/3353 Differential Revision: D46059199 Pulled By: mthrok fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d
-
- 07 Apr, 2023 2 commits
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3249
- Make the ptr member private so that it is better encapsulated and subclasses cannot manipulate it directly
- Remove the unused `reset` method
- Do not default-construct the managed object
- Introduce a helper function for default allocation (for AVFrame and AVPacket, as they are allocated in both reader and writer); for the others, allocation logic is moved to where it is used
- Remove the unused `pHWBufferRef` attribute from `StreamWriter`
Reviewed By: hwangjeff Differential Revision: D44775297 fbshipit-source-id: ff6db528152cd54c1ae398191110c30b9c1e238c
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3220 Introduces methods to `StreamReader` and `StreamWriter` that allow for reading and writing `AVPacket` instances rather than tensors. Useful for efficiently remuxing data pulled as-is from the source. Reviewed By: mthrok Differential Revision: D44271536 fbshipit-source-id: 9b9d743c0119a5eb564fa628fd6a67806d120985
-
- 17 Mar, 2023 1 commit
moto authored
Summary: TODO: add cache release Pull Request resolved: https://github.com/pytorch/audio/pull/3178 Reviewed By: hwangjeff Differential Revision: D44136275 Pulled By: mthrok fbshipit-source-id: 4eaf646fe17a469e8bbbdf43441d5532f9f8461d
-
- 06 Mar, 2023 1 commit
Moto Hira authored
Summary: After the series of simplifications, the audio and video encoding processes can be merged, which gets rid of the boilerplate code. Pull Request resolved: https://github.com/pytorch/audio/pull/3146 (Note: this ignores all push blocking failures!) Reviewed By: xiaohui-zhang Differential Revision: D43815640 fbshipit-source-id: 2a14e372b2cc75db7eeabc27d855a24c3f7d5063
-
- 23 Feb, 2023 1 commit
moto authored
Summary: This commit is a cleanup and preparation for future development. We plan to pass more complicated objects between StreamReader and StreamWriter, and TorchBind is not expressive enough for defining intermediate objects, so we want to use PyBind11 for binding StreamReader/Writer. PyBind11 converts a Python dict into std::map, while TorchBind converts it into c10::Dict. Because of this discrepancy, conversions from c10::Dict to std::map have to happen in multiple places, which makes the binding code thicker as it requires wrapper methods. Using std::map reduces the number of wrapper methods and conversions, because the same method can be bound for file-like objects and the others. Pull Request resolved: https://github.com/pytorch/audio/pull/3092 Reviewed By: nateanl Differential Revision: D43524808 Pulled By: mthrok fbshipit-source-id: f7467c66ccd37dbf4abc337bbb18ffaac21a0058
-
- 01 Feb, 2023 1 commit
moto authored
Summary: Adding C++ documentation. (The C++ APIs are categorized as prototype, though they are used by the Python beta APIs.) https://output.circle-artifacts.com/output/job/69654229-a99e-4b15-9ce0-7bc6bcf01101/artifacts/0/docs/libtorchaudio.html <img width="1202" alt="Screenshot 2023-01-31 at 11 48 47 AM" src="https://user-images.githubusercontent.com/855818/215828167-d23032f8-9e40-4413-b5b1-5cbd12d705e9.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2994 Reviewed By: hwangjeff Differential Revision: D42876621 Pulled By: mthrok fbshipit-source-id: d8b8d610b87ec766501baa88b7506368a9905a6a
-
- 27 Jan, 2023 1 commit
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3013 Namespace clean up before publishing the torchaudio C++ API as prototype. Reviewed By: hwangjeff Differential Revision: D42699903 fbshipit-source-id: 8a9eed0390dfa4a152124b42f2b927dbdd3e23d2
-
- 24 Aug, 2022 1 commit
moto authored
Summary: This commit adds the FFmpeg-based encoder class StreamWriter. StreamWriter is pretty much the opposite of the StreamReader class, and it supports:
* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices, etc.
* File-like object support (in a later commit)
* HW video encoding (in a later commit)
See also: https://fburl.com/gslide/z85kn5a9 (Meta internal) Pull Request resolved: https://github.com/pytorch/audio/pull/2628 Reviewed By: nateanl Differential Revision: D38816650 Pulled By: mthrok fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8
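A minimal usage sketch of the encoding workflow described above. It assumes the interface that later shipped as `torchaudio.io.StreamWriter`; the prototype introduced in this commit lived under a different module path, so names may differ slightly:

```python
import torch
from torchaudio.io import StreamWriter

# Encode one second of silence as a 16 kHz mono WAV file.
sample_rate = 16000
chunk = torch.zeros((sample_rate, 1))  # shape: (time, channel)

writer = StreamWriter(dst="out.wav")
writer.add_audio_stream(sample_rate=sample_rate, num_channels=1)
with writer.open():
    writer.write_audio_chunk(0, chunk)  # 0 = index of the output stream configured above
```

The destination can also be a streaming protocol or a device, mirroring how StreamReader handles sources.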
-
- 12 Jul, 2022 1 commit
moto authored
Summary: A Python dictionary is bound to different types in TorchBind and PyBind. StreamReader has methods that receive and return dictionaries. This commit cleans up the treatment of dictionaries and consolidates the helper functions.
* The core implementation and TorchBind all use `c10::Dict`.
* The PyBind version uses `std::map` and converts it to `c10::Dict`.
* The helper functions that convert `std::map` <-> `c10::Dict` are consolidated in the pybind directory.
* The wrapper methods are implemented in the `pybind` dir.
Pull Request resolved: https://github.com/pytorch/audio/pull/2533 Reviewed By: hwangjeff Differential Revision: D37731866 Pulled By: mthrok fbshipit-source-id: 5a5cf1372668f7d3aacc0bb461bc69fa07212f3f
-
- 07 Jul, 2022 2 commits
moto authored
Summary: Preparation to add save features with ffmpeg. Pull Request resolved: https://github.com/pytorch/audio/pull/2530 Reviewed By: carolineechen Differential Revision: D37698147 Pulled By: mthrok fbshipit-source-id: feb5cbb6349a2b6b7faf44b629c574fdae47ecab
-
moto authored
Summary: This commit moves helper functions/definitions around so that better locality of logic is achieved.
## Detail
`ffmpeg.[h|cpp]` implements classes that convert FFmpeg structures into RAII semantics. Initially these classes included the construction logic in their constructors, but that logic was extracted into factory functions in https://github.com/pytorch/audio/issues/2373. The reason the factory functions stayed in `ffmpeg.[h|cpp]` was that the logic for initializing and cleaning up the AVDictionary class was only available in `ffmpeg.cpp`. Now that AVDictionary handling is properly defined in https://github.com/pytorch/audio/issues/2507, the factory functions, which are not that reusable, better stay with the implementations that use them. This makes `ffmpeg.h` lean and clean, and makes it easier to see what can be reused. Pull Request resolved: https://github.com/pytorch/audio/pull/2512 Reviewed By: hwangjeff Differential Revision: D37477592 Pulled By: mthrok fbshipit-source-id: 8c1b5059ea5f44649cc0eb1f82d1a92877ef186e
-
- 28 Jun, 2022 1 commit
moto authored
Summary: Small clean-up in the ffmpeg binding code.
1. Make `get_option_dict` and `clean_up_dict` public utilities
2. Merge the exception handling into `clean_up_dict`
3. Get rid of the custom string-join function and use `c10::Join`
Pull Request resolved: https://github.com/pytorch/audio/pull/2507 Reviewed By: hwangjeff Differential Revision: D37466022 Pulled By: mthrok fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969
-
- 27 May, 2022 1 commit
moto authored
Summary:
* `Streamer` was renamed to `StreamReader` when it was moved from prototype to beta. This commit applies the same name change to the C++ source code.
* Fix miscellaneous lint issues
* Make the code compilable on FFmpeg 5
Pull Request resolved: https://github.com/pytorch/audio/pull/2403 Reviewed By: carolineechen Differential Revision: D36613053 Pulled By: mthrok fbshipit-source-id: 69fedd6720d488dadf4dfe7d375ee76d216b215d
-
- 21 May, 2022 1 commit
moto authored
Summary: This commit adds file-like object support to the Streaming API.
## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally, `seek(self, offset, whence)` is used if available.
- Without a `seek` method, some formats cannot be decoded properly.
- To work around this, one can use the existing `decoder` option to tell it what decoder to use.
- The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
- So as to have the arguments common to both audio and video in front of the rest, the order of the arguments was changed.
- Also, the `dtype` and `format` arguments were changed to make them consistent across the audio/video methods.
## Code structure
The approach is very similar to how file-like objects are supported in the sox-based I/O. In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind; if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
- Some of the implementation in the original TorchBind surface layer was converted to a Wrapper class so that it can be re-used from the PyBind11 bindings. The wrapper class serves to simplify the binding.
- The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
- The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, Streamer core methods now deal with them directly.
## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode it (with/without HW decoding).
Pull Request resolved: https://github.com/pytorch/audio/pull/2400 Reviewed By: carolineechen Differential Revision: D36520073 Pulled By: mthrok fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
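A sketch of the file-like-object usage this commit enables: any object exposing `read(self, n)` (and ideally `seek(self, offset, whence)`) can be passed as the source. Shown here with the later stable name `torchaudio.io.StreamReader`; at the time of this commit the class still lived under the prototype namespace:

```python
from torchaudio.io import StreamReader

# An open binary file works, as does io.BytesIO or a network client's response body.
with open("input.mp4", "rb") as fileobj:
    reader = StreamReader(src=fileobj)
    reader.add_basic_audio_stream(frames_per_chunk=4096)
    for (chunk,) in reader.stream():
        ...  # process each decoded audio chunk (a Tensor)
```

If the object lacks `seek`, some container formats cannot be decoded; as noted above, passing an explicit `decoder` is the workaround.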
-
- 19 May, 2022 1 commit
moto authored
Summary:
* Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be re-used from PyBind11.
* Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This will make the PyBind11-based binding simpler, as it does not need to deal with dtype.
* Move the `add_[audio|video]_stream` wrapper signature to the Streamer core, so that Streamer directly deals with `c10::optional`.†
† Related to this, there is a slight change in how an empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as the empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the filter expression that is actually being used.
Ref https://github.com/pytorch/audio/issues/2400 Pull Request resolved: https://github.com/pytorch/audio/pull/2402 Reviewed By: nateanl Differential Revision: D36488808 Pulled By: mthrok fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047
-
- 11 May, 2022 1 commit
moto authored
Summary: This commit refactors the constructors of the wrapper classes so that the wrapper classes are only responsible for deallocating the underlying FFmpeg structures. The responsibility for custom initialization is moved to helper functions.
Context: The FFmpeg API uses a bunch of raw pointers, which require dedicated allocators and deallocators. In torchaudio we wrap these pointers with `std::unique_ptr<>` to adopt RAII semantics. Currently all of the customization logic required for `Streamer` is handled by the constructors of the wrapper classes, like the following:
```
AVFormatContextPtr(
    const std::string& src,
    const std::string& device,
    const std::map<std::string, std::string>& option);
```
This constructor allocates the raw `AVFormatContext*` pointer, initializes it with the given options, and then parses the input media. As we consider the write/encode features, which require a different way of initializing the `AVFormatContext*`, making this the responsibility of the `AVFormatContextPtr` constructor reduces flexibility. Thus this commit moves the customization to helper factory functions:
- `AVFormatContextPtr(...)` -> `get_input_format_context(...)`
- `AVCodecContextPtr(...)` -> `get_decode_context(...)`
Pull Request resolved: https://github.com/pytorch/audio/pull/2373 Reviewed By: hwangjeff Differential Revision: D36230148 Pulled By: mthrok fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a
-
- 10 May, 2022 1 commit
moto authored
Summary: This commit adds the `hw_accel` option to the `Streamer::add_video_stream` method. Specifying `hw_accel="cuda"` allows the chunk Tensor to be created directly on CUDA when the following conditions are met:
1. the video format is H264,
2. the underlying ffmpeg is compiled with NVENC, and
3. the client code specifies `decoder="h264_cuvid"`.
A simple benchmark yields a roughly 7x improvement in decoding speed.
<details>

```python
import time

from torchaudio.prototype.io import Streamer

srcs = [
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
]

patterns = [
    ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
    ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
    ("h264_cuvid", None, None),      # NVDEC -> CPU
    (None, None, None),              # CPU
]

for src in srcs:
    print(src, flush=True)
    for (decoder, decoder_options, hw_accel) in patterns:
        s = Streamer(src)
        s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)
        t0 = time.monotonic()
        num_frames = 0
        for i, (chunk, ) in enumerate(s.stream()):
            num_frames += chunk.shape[0]
        t1 = time.monotonic()
        print(chunk.dtype, chunk.shape, chunk.device)
        print(time.monotonic() - t0, num_frames, flush=True)
```

</details>

```
https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
10.781158386962488 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
10.771313901990652 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
27.88662809302332 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
83.22728440898936 6175
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
12.945253834011964 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
12.870224556012545 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
28.03406483103754 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
82.6120332319988 6175
```

With HW resizing

<details>

```python
import time

from torchaudio.prototype.io import Streamer

srcs = [
    "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
]

patterns = [
    # Decode with NVDEC, CUDA HW scaling -> CUDA:0
    ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
    # Decoded with NVDEC, CUDA HW scaling -> CPU
    ("h264_cuvid", {"resize": "960x540"}, "", None),
    # CPU decoding, CPU scaling
    (None, None, "scale=width=960:height=540", None),
]

for src in srcs:
    print(src, flush=True)
    for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
        s = Streamer(src)
        s.add_video_stream(
            5,
            decoder=decoder,
            decoder_options=decoder_options,
            filter_desc=filter_desc,
            hw_accel=hw_accel,
        )
        t0 = time.monotonic()
        num_frames = 0
        for i, (chunk, ) in enumerate(s.stream()):
            num_frames += chunk.shape[0]
        t1 = time.monotonic()
        print(chunk.dtype, chunk.shape, chunk.device)
        print(time.monotonic() - t0, num_frames, flush=True)
```

</details>

```
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
12.890056837990414 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
10.697489063022658 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
85.19899423001334 6175
https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
10.712715593050234 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
11.030170071986504 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
84.8515750519582 6175
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2331 Reviewed By: hwangjeff Differential Revision: D36217169 Pulled By: mthrok fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b
-
- 14 Apr, 2022 1 commit
moto authored
Summary: This commit adds support for specifying the decoder in Streamer's add-stream methods. This is roughly equivalent to ffmpeg's `-c:v foo` and `-c:a foo` options. It allows overriding the decoder and/or specifying options for the decoder. This change makes it possible to specify the Nvidia NVDEC decoder for supported formats, which uses dedicated hardware for decoding the video. --- Note: The CL might look overwhelming, but essentially it adds new parameters in Python and passes them down all the way to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`). Pull Request resolved: https://github.com/pytorch/audio/pull/2327 Reviewed By: carolineechen Differential Revision: D35626904 Pulled By: mthrok fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede
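A small usage sketch of the `decoder` / decoder-option arguments described above, roughly the ffmpeg `-c:v h264_cuvid` equivalent. The keyword for decoder options changed across releases (`decoder_options` in this era, `decoder_option` in later stable releases), so treat the exact names and option values as illustrative:

```python
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")
# Override the video decoder; "h264_cuvid" selects the NVDEC-backed decoder
# when the underlying FFmpeg build supports it.
reader.add_video_stream(
    frames_per_chunk=5,
    decoder="h264_cuvid",
    decoder_option={"gpu": "0"},  # decoder-specific options (illustrative)
)
```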
-
- 04 Mar, 2022 1 commit
moto authored
Summary: This commit adds the following behavior to `seek` so that `seek` works after a frame has been decoded:
1. Flush the decoder buffer.
2. Recreate the filter graphs (so that their internal state is re-initialized).
3. Discard the buffered tensors (decoded chunks).
It also disallows negative values for the seek timestamp. Pull Request resolved: https://github.com/pytorch/audio/pull/2264 Reviewed By: carolineechen Differential Revision: D34497826 Pulled By: mthrok fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d
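A sketch of what this fix enables — seeking after frames have already been decoded (using the later `torchaudio.io.StreamReader` name; in this commit the class was still called `Streamer`):

```python
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")
reader.add_basic_audio_stream(frames_per_chunk=4096)

# Decode a little from the beginning ...
(first_chunk,) = next(iter(reader.stream()))

# ... then jump to the 10-second mark. Per the behavior above, the decoder
# buffer is flushed, the filter graphs are recreated, and buffered chunks
# are discarded, so decoding resumes cleanly from the new position.
reader.seek(10.0)
(chunk,) = next(iter(reader.stream()))
```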
-
- 02 Feb, 2022 1 commit
moto authored
Summary: This PR adds the prototype streaming API. The implementation is based on ffmpeg libraries. For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2164 Reviewed By: hwangjeff Differential Revision: D33934457 Pulled By: mthrok fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe
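A minimal end-to-end sketch of the prototype API added here, following the pattern in the linked tutorial (module path as of this prototype release; parameter names may differ slightly from later `torchaudio.io` versions):

```python
from torchaudio.prototype.io import Streamer

# Open a media source (local file, URL, or device) and configure output streams.
streamer = Streamer("input.mp4")
streamer.add_basic_audio_stream(frames_per_chunk=8000, sample_rate=8000)
streamer.add_basic_video_stream(frames_per_chunk=1, frame_rate=1, width=320, height=240)

# Each iteration yields one chunk (Tensor) per configured output stream.
for audio_chunk, video_chunk in streamer.stream():
    print(audio_chunk.shape, video_chunk.shape)
```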
-
- 30 Dec, 2021 1 commit
moto authored
Summary:
- Introduce AudioBuffer and VideoBuffer for different ways of handling frames
- Update the way the option dictionary is passed
- Remove the unused AutoFrameUnref
- Add SrcStreamInfo/OutputStreamInfo classes
Pull Request resolved: https://github.com/pytorch/audio/pull/2113 Reviewed By: nateanl Differential Revision: D33356144 Pulled By: mthrok fbshipit-source-id: e837e84fae48baa7befd5c70599bcd2cbb61514d
-
- 07 Dec, 2021 1 commit
moto authored
Summary: Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review. Add wrapper classes that automatically release memory allocated by the ffmpeg libraries. For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md. Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later. - [x] Needs to be imported after updating the TARGETS file. Pull Request resolved: https://github.com/pytorch/audio/pull/2041 Reviewed By: carolineechen Differential Revision: D32688964 Pulled By: mthrok fbshipit-source-id: 165bef5b292dbedae4e9599d53fb2a3f06978db8
-