Commits · 10d1bd89e8adcf5210adcd4d25593f8588138816 · hehl2 / Torchaudio

08 Jun, 2022 1 commit

Add metadata to source stream info (#2461) · 10d1bd89

moto authored Jun 07, 2022

Summary:
Add metadata, such as ID3 tags, to `StreamReaderSourceAudioStream`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2461

Reviewed By: hwangjeff

Differential Revision: D36985656

Pulled By: mthrok

fbshipit-source-id: e66f9e6e980eb57c378cc643a8979b6b7813dae7


01 Jun, 2022 1 commit

Tweak StreamReader error messages and tests (#2429) · 5d86054a

moto authored Jun 01, 2022

Summary:
* Update error messages
* Update audio stream tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2429

Reviewed By: carolineechen, nateanl

Differential Revision: D36812769

Pulled By: mthrok

fbshipit-source-id: 7a51d0c4dbae558010d2e59412333e4a7f00d318


29 May, 2022 1 commit

Update source info (#2418) · bb77cbeb

moto authored May 28, 2022

Summary:
Add num_frames and bits_per_sample to match with the current
`torchaudio.info` capability.

Pull Request resolved: https://github.com/pytorch/audio/pull/2418

Reviewed By: carolineechen

Differential Revision: D36749077

Pulled By: mthrok

fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713


27 May, 2022 1 commit

Refactor Streamer to StreamReader in C++ codebase (#2403) · 9ef6c23d

moto authored May 27, 2022

Summary:
* `Streamer` was renamed to `StreamReader` when it moved from prototype to beta.
This commit applies the same name change to the C++ source code.

* Fix miscellaneous lint issues

* Make the code compilable on FFmpeg 5

Pull Request resolved: https://github.com/pytorch/audio/pull/2403

Reviewed By: carolineechen

Differential Revision: D36613053

Pulled By: mthrok

fbshipit-source-id: 69fedd6720d488dadf4dfe7d375ee76d216b215d


21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to the Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally, `seek(self, offset, whence)` is used if available.
- Without a `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to specify which decoder to use.
  - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
  - The order of the arguments was changed so that the arguments common to both audio and video come before the rest.
  - The `dtype` and `format` arguments were also changed to make them consistent across the audio and video methods.
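
As a rough illustration of the interface described above, the following sketch defines two hypothetical wrapper classes (`ReadOnlySource` and `SeekableSource` are names made up for this example and are not part of torchaudio). One exposes only `read(self, n)`; the other also exposes `seek(self, offset, whence)`. The sketch does not call the Streaming API itself.

```python
import io


class ReadOnlySource:
    """Hypothetical file-like object that implements only `read(self, n)`."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to `n` bytes; an empty bytes object signals end of data.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Hypothetical file-like object that additionally implements `seek`."""

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Same contract as `io.IOBase.seek`: returns the new absolute position.
        return self._buffer.seek(offset, whence)


payload = b"RIFF....WAVEfmt "  # placeholder bytes, not a valid media file
plain = ReadOnlySource(payload)
seekable = SeekableSource(payload)

first_chunk = plain.read(4)
has_seek = (hasattr(plain, "seek"), hasattr(seekable, "seek"))
seekable.read(8)
rewound_position = seekable.seek(0)
```

An object like `ReadOnlySource` satisfies the minimum requirement, while `SeekableSource` also covers the formats that need `seek`.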

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input `src` is a string, it is passed to the implementation bound with TorchBind;
if `src` has a `read` attribute, it is passed to the same implementation bound via PyBind11.
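
The dispatch rule above can be sketched in plain Python. `select_binding` and the labels it returns are hypothetical names used only for illustration; the real code hands `src` to the corresponding C++ binding rather than returning a string.

```python
import io


def select_binding(src) -> str:
    """Pick a binding for `src` following the rule described above.

    Hypothetical helper: returns a label instead of constructing a binding.
    """
    if isinstance(src, str):
        # Paths and URLs go to the implementation bound with TorchBind.
        return "torchbind"
    if hasattr(src, "read"):
        # File-like objects go to the same implementation bound via PyBind11.
        return "pybind11"
    raise TypeError(f"Unsupported source type: {type(src).__name__}")


path_binding = select_binding("example.mp4")
filelike_binding = select_binding(io.BytesIO(b""))
```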

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some of the implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
  - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.
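
To illustrate why the string construction is simpler in Python, here is a minimal sketch of building an audio filter description from basic arguments. `build_audio_filter_desc` is a hypothetical name, and the use of FFmpeg's `aresample` and `aformat` filters is an assumption made for this example, not a statement of what torchaudio emits.

```python
from typing import Optional


def build_audio_filter_desc(
    sample_rate: Optional[int] = None,
    sample_fmt: Optional[str] = None,
) -> Optional[str]:
    """Join the requested conversions into one FFmpeg filter description.

    Hypothetical helper; returns None when no conversion is requested.
    """
    parts = []
    if sample_rate is not None:
        # Resample to the requested rate.
        parts.append(f"aresample={sample_rate}")
    if sample_fmt is not None:
        # Convert to the requested sample format.
        parts.append(f"aformat=sample_fmts={sample_fmt}")
    return ",".join(parts) if parts else None


desc = build_audio_filter_desc(sample_rate=8000, sample_fmt="fltp")
no_desc = build_audio_filter_desc()
```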

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6


19 May, 2022 1 commit

Refactor Streamer implementation (#2402) · eed57534

moto authored May 19, 2022

Summary:
* Move the helper wrapping code in the TorchBind layer to a proper wrapper class so that it can be reused in PyBind11.
* Move the `add_basic_[audio|video]_stream` methods from C++ to Python, as they are just string manipulation. This makes the PyBind11-based binding simpler, as it need not deal with dtype.
* Move the `add_[audio|video]_stream` wrapper signature to the Streamer core, so that Streamer deals with `c10::optional` directly.†

† Related to this, there is a slight change in how an empty filter expression is stored. Originally, if an empty filter expression was given to the `add_[audio|video]_stream` method, `StreamReaderOutputStream` showed it as an empty string `""`, even though internally it was using `"anull"` or `"null"`. Now `StreamReaderOutputStream` shows the filter expression that is actually being used.
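
A small sketch of the behavior described in the footnote, with `effective_filter_desc` as a hypothetical helper name: an empty filter expression is replaced by FFmpeg's pass-through filter, `anull` for audio or `null` for video, and that replacement is the value reported.

```python
from typing import Optional


def effective_filter_desc(filter_desc: Optional[str], media_type: str) -> str:
    """Return the filter expression that is actually used.

    Hypothetical helper; `media_type` must be "audio" or "video".
    """
    if media_type not in ("audio", "video"):
        raise ValueError(f"Unexpected media type: {media_type!r}")
    if filter_desc:
        # A non-empty expression is used as given.
        return filter_desc
    # Empty or missing expression: fall back to the pass-through filter.
    return "anull" if media_type == "audio" else "null"


audio_default = effective_filter_desc("", "audio")
video_default = effective_filter_desc(None, "video")
custom = effective_filter_desc("scale=width=960:height=540", "video")
```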

Ref https://github.com/pytorch/audio/issues/2400

Pull Request resolved: https://github.com/pytorch/audio/pull/2402

Reviewed By: nateanl

Differential Revision: D36488808

Pulled By: mthrok

fbshipit-source-id: 877ca731364d10fc0cb9d97e75d55df9180f2047


11 May, 2022 1 commit

Refactor the constructors of pointer wrappers (#2373) · 93c26d63

moto authored May 10, 2022

Summary:
This commit refactors the constructors of the wrapper classes so that
the wrapper classes are only responsible for deallocation of the underlying
FFmpeg custom structures.

The responsibility of custom initialization is moved to helper functions.

Context:

The FFmpeg API uses a number of raw pointers, which require dedicated allocators
and deallocators. In torchaudio we wrap these pointers with
`std::unique_ptr<>` to adopt RAII semantics.

Currently, all of the customization logic required for `Streamer` is
handled by the constructors of the wrapper classes, such as the following:

```
AVFormatContextPtr(
      const std::string& src,
      const std::string& device,
      const std::map<std::string, std::string>& option);
```

This constructor allocates the raw `AVFormatContext*` pointer,
initializes it with the given options, and then parses the
input media.

As we consider the write/encode features, which require a different way
of initializing the `AVFormatContext*`, making initialization the responsibility
of the `AVFormatContextPtr` constructors reduces flexibility.

Thus this commit moves the customization to helper factory functions.

- `AVFormatContextPtr(...)` -> `get_input_format_context(...)`
- `AVCodecContextPtr(...)` -> `get_decode_context(...)`

Pull Request resolved: https://github.com/pytorch/audio/pull/2373

Reviewed By: hwangjeff

Differential Revision: D36230148

Pulled By: mthrok

fbshipit-source-id: 202d57d549223904ee958193f3b386ef5a9cda3a


10 May, 2022 1 commit

Add HW acceleration support on Streamer (#2331) · 54d2d04f

moto authored May 09, 2022

Summary:
This commit adds the `hw_accel` option to the `Streamer::add_video_stream` method.
Specifying `hw_accel="cuda"` allows the chunk Tensor to be created directly on the CUDA device
when the following conditions are met:
1. the video format is H264,
2. underlying ffmpeg is compiled with NVENC, and
3. the client code specifies `decoder="h264_cuvid"`.

A simple benchmark shows an improvement of more than 7x in decoding speed over CPU decoding.

<details>

```python
import time

from torchaudio.prototype.io import Streamer

srcs = [
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",  # offline version
]

patterns = [
    ("h264_cuvid", None, "cuda:0"),  # NVDEC on CUDA:0 -> CUDA:0
    ("h264_cuvid", None, "cuda:1"),  # NVDEC on CUDA:1 -> CUDA:1
    ("h264_cuvid", None, None),  # NVDEC -> CPU
    (None, None, None),  # CPU
]

for src in srcs:
    print(src, flush=True)
    for (decoder, decoder_options, hw_accel) in patterns:
        s = Streamer(src)
        s.add_video_stream(5, decoder=decoder, decoder_options=decoder_options, hw_accel=hw_accel)

        t0 = time.monotonic()
        num_frames = 0
        for i, (chunk, ) in enumerate(s.stream()):
            num_frames += chunk.shape[0]
        t1 = time.monotonic()
        print(chunk.dtype, chunk.shape, chunk.device)
        print(t1 - t0, num_frames, flush=True)
```
</details>

```
https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
10.781158386962488 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
10.771313901990652 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
27.88662809302332 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
83.22728440898936 6175
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:0
12.945253834011964 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cuda:1
12.870224556012545 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
28.03406483103754 6175
torch.uint8 torch.Size([5, 3, 1080, 1920]) cpu
82.6120332319988 6175
```
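
For reference, the speedup can be computed from the timings printed above for the remote source. The numbers are copied from that output and rounded to three decimals; the labels and variable names exist only for this example.

```python
# Wall-clock seconds for decoding 6175 frames from the remote source,
# copied from the output above (rounded to three decimals).
timings = {
    "NVDEC -> cuda:0": 10.781,
    "NVDEC -> cuda:1": 10.771,
    "NVDEC -> CPU": 27.887,
    "CPU": 83.227,
}
num_frames = 6175

# Speedup relative to plain CPU decoding, and decoding throughput.
baseline = timings["CPU"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}
frames_per_second = {name: num_frames / seconds for name, seconds in timings.items()}

for name in timings:
    print(f"{name}: {speedups[name]:.1f}x, {frames_per_second[name]:.0f} frames/s")
```

With these numbers, decoding with NVDEC into CUDA memory is about 7.7 times faster than CPU decoding, and decoding with NVDEC into CPU memory is about 3 times faster.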

With HW resizing

<details>

```python
import time

from torchaudio.prototype.io import Streamer

srcs = [
    "./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
]

patterns = [
    # Decode with NVDEC, CUDA HW scaling -> CUDA:0
    ("h264_cuvid", {"resize": "960x540"}, "", "cuda:0"),
    # Decoded with NVDEC, CUDA HW scaling -> CPU
    ("h264_cuvid", {"resize": "960x540"}, "", None),
    # CPU decoding, CPU scaling
    (None, None, "scale=width=960:height=540", None),
]

for src in srcs:
    print(src, flush=True)
    for (decoder, decoder_options, filter_desc, hw_accel) in patterns:
        s = Streamer(src)
        s.add_video_stream(
            5,
            decoder=decoder,
            decoder_options=decoder_options,
            filter_desc=filter_desc,
            hw_accel=hw_accel,
        )

        t0 = time.monotonic()
        num_frames = 0
        for i, (chunk, ) in enumerate(s.stream()):
            num_frames += chunk.shape[0]
        t1 = time.monotonic()
        print(chunk.dtype, chunk.shape, chunk.device)
        print(t1 - t0, num_frames, flush=True)
```

</details>

```
./NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
12.890056837990414 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
10.697489063022658 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
85.19899423001334 6175

https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4
torch.uint8 torch.Size([5, 3, 540, 960]) cuda:0
10.712715593050234 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
11.030170071986504 6175
torch.uint8 torch.Size([5, 3, 540, 960]) cpu
84.8515750519582 6175
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2331

Reviewed By: hwangjeff

Differential Revision: D36217169

Pulled By: mthrok

fbshipit-source-id: 7979570b083cfc238ad4735b44305d8649f0607b


14 Apr, 2022 1 commit

Support specifying decoder and its options (#2327) · be243c59

moto authored Apr 14, 2022

Summary:
This commit adds support for specifying the decoder in Streamer's add-stream methods.
This is roughly equivalent to `ffmpeg`'s `-c:v foo` and `-c:a foo` options.

This allows overriding the decoder codec and/or specifying the decoder's
options.

This change allows specifying the Nvidia NVDEC codec for supported formats,
which uses dedicated hardware for decoding the video.

 ---

Note: The CL might look overwhelming, but it essentially adds new parameters in Python and passes them all the way down to `AVCodecContextPtr`, which initializes the actual decoder implementation (`AVCodecContext`).

Pull Request resolved: https://github.com/pytorch/audio/pull/2327

Reviewed By: carolineechen

Differential Revision: D35626904

Pulled By: mthrok

fbshipit-source-id: a115ed548624e53c16bacfecff5aa6c9d4e8bede


10 Mar, 2022 1 commit

Fix typos and remove comments (#2270) · 4b47412e

moto authored Mar 10, 2022

Summary:
Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202

Pull Request resolved: https://github.com/pytorch/audio/pull/2270

Reviewed By: hwangjeff

Differential Revision: D34793460

Pulled By: mthrok

fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723


04 Mar, 2022 1 commit

Flush and reset internal state after seek (#2264) · 7e1afc40

moto authored Mar 04, 2022

Summary:
This commit adds the following behavior to `seek` so that `seek`
works after a frame is decoded.

1. Flush the decoder buffer.
2. Recreate filter graphs (so that internal state is re-initialized)
3. Discard the buffered tensor. (decoded chunks)

It also disallows negative values for the seek timestamp.
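
The steps above can be modeled with a toy class. `ToySeekState` and its attributes are hypothetical and only mimic the bookkeeping; the real implementation operates on FFmpeg decoder and filter graph objects in C++.

```python
class ToySeekState:
    """Toy model of the state that `seek` resets; not the real implementation."""

    def __init__(self):
        self.decoder_buffer = []   # packets still held inside the decoder
        self.filter_graph_id = 0   # incremented whenever the filter graph is rebuilt
        self.buffered_chunks = []  # decoded chunks not yet consumed
        self.position = 0.0

    def decode(self, packet: str) -> None:
        # Pretend that decoding leaves state behind in every stage.
        self.decoder_buffer.append(packet)
        self.buffered_chunks.append(f"chunk-from-{packet}")

    def seek(self, timestamp: float) -> None:
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative.")
        self.decoder_buffer.clear()   # 1. flush the decoder buffer
        self.filter_graph_id += 1     # 2. recreate the filter graphs
        self.buffered_chunks.clear()  # 3. discard the buffered chunks
        self.position = timestamp


state = ToySeekState()
state.decode("packet-0")
state.seek(10.0)
```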

Pull Request resolved: https://github.com/pytorch/audio/pull/2264

Reviewed By: carolineechen

Differential Revision: D34497826

Pulled By: mthrok

fbshipit-source-id: 8b9a5bf160dfeb15f5cced3eed2288c33e2eb35d


26 Feb, 2022 1 commit

Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The changes to the interface are:
1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
2. Move `fill_buffer` method to private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
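
The retry idea can be sketched in Python, although the actual retry happens in the C++ layer. Everything here is an assumption made for illustration: `process_with_retry` and `read_once` are hypothetical names, `read_once` is assumed to return `0` on success or `errno.EAGAIN` when the device is not ready, and `timeout` and `backoff` are taken to be in seconds.

```python
import errno
import time
from typing import Callable, Optional


def process_with_retry(
    read_once: Callable[[], int],
    timeout: Optional[float] = None,
    backoff: float = 0.01,
) -> int:
    """Call `read_once` until it stops reporting EAGAIN or `timeout` elapses.

    `timeout=None` retries indefinitely. Units are seconds (an assumption).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        code = read_once()
        if code != errno.EAGAIN:
            return code
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("device was not ready before the timeout.")
        # Wait a little before asking the device again.
        time.sleep(backoff)


attempts = []


def flaky_read() -> int:
    # Simulated device: not ready for the first two calls, then succeeds.
    attempts.append(len(attempts))
    return errno.EAGAIN if len(attempts) < 3 else 0


result = process_with_retry(flaky_read, timeout=1.0, backoff=0.001)
```

Blocking inside the retry loop keeps the caller from having to catch an exception and call again for every `EAGAIN`.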

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59


02 Feb, 2022 1 commit

Add Streaming API (#2164) · 7a3e262d

moto authored Feb 01, 2022

Summary:
This PR adds the prototype streaming API.
The implementation is based on ffmpeg libraries.

For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2164

Reviewed By: hwangjeff

Differential Revision: D33934457

Pulled By: mthrok

fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe


21 Jan, 2022 1 commit

Remove debug code and associated arguments from ffmpeg code (#2168) · 984b169e

moto authored Jan 21, 2022

Summary:
Part of https://github.com/pytorch/audio/issues/2164.
Removes debug code and associated arguments/fields left in previous PRs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2168

Reviewed By: hwangjeff

Differential Revision: D33712999

Pulled By: mthrok

fbshipit-source-id: 0729e9fbc146c48887379b6231e4d6e8cb520c44


30 Dec, 2021 2 commits

Build ffmpeg-features in Linux/macOS unittests (#2114) · 9f14fa63

moto authored Dec 30, 2021

Summary:
Preparation for landing the Python front-end of ffmpeg-related features.

- Set BUILD_FFMPEG=1 in Linux/macOS unit test jobs
- Install ffmpeg and pkg-config from conda-forge
- Add note about Windows build process
- Temporarily avoid `av_err2str`

Pull Request resolved: https://github.com/pytorch/audio/pull/2114

Reviewed By: hwangjeff

Differential Revision: D33371346

Pulled By: mthrok

fbshipit-source-id: b0e16a35959a49a2166109068f3e0cbbb836e888


Update and fill the rest of ffmpeg-integration C++ code (#2113) · 9cb75e74

moto authored Dec 30, 2021

Summary:
- Introduce AudioBuffer and VideoBuffer for different ways of handling frames
- Update the way the option dictionary is passed
- Remove unused AutoFrameUnref
- Add SrcStreamInfo/OutputStreamInfo classes

Pull Request resolved: https://github.com/pytorch/audio/pull/2113

Reviewed By: nateanl

Differential Revision: D33356144

Pulled By: mthrok

fbshipit-source-id: e837e84fae48baa7befd5c70599bcd2cbb61514d


29 Dec, 2021 1 commit

Add Streamer class (#2046) · bb528d7e

moto authored Dec 29, 2021

Summary:
Part of https://github.com/pytorch/audio/issues/1986. Splitting the PR for easier review.

Add `Streamer` class that bundles `StreamProcessor` and handles input.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after https://github.com/pytorch/audio/issues/2045.

Pull Request resolved: https://github.com/pytorch/audio/pull/2046

Reviewed By: carolineechen

Differential Revision: D33299863

Pulled By: mthrok

fbshipit-source-id: 6470cbe061057c8cb970ce7bb5692be04efb5fe9
