Commits · a78ba38932bf990b3d7e44d027239bbcb76fad0e · OpenDAS / Torchaudio

24 Oct, 2023 1 commit

moto-meta authored Oct 24, 2023

Differential Revision: D50506299

Pull Request resolved: https://github.com/pytorch/audio/pull/3669

a78ba389

12 Oct, 2023 1 commit

Resolve lint issues · d947dee0

moto-meta authored Oct 12, 2023

Differential Revision: D50205775

Pull Request resolved: https://github.com/pytorch/audio/pull/3651


11 Oct, 2023 1 commit

Move libtorchaudio_ffmpeg to dedicated directory · 2836a23d

moto-meta authored Oct 11, 2023

Differential Revision: D50082877

Pull Request resolved: https://github.com/pytorch/audio/pull/3646


09 Oct, 2023 1 commit

Migrate to src-layout · ec13a815

moto-meta authored Oct 09, 2023

Differential Revision: D49965263

Pull Request resolved: https://github.com/pytorch/audio/pull/3639


13 Jul, 2023 1 commit

Revert D47402174: [audio][PR] Resolve some compilation warnings · 155d1bae

Moto Hira authored Jul 13, 2023

Differential Revision: D47402174

Original commit changeset: 00c0719ab184

Original Phabricator Diff: D47402174

fbshipit-source-id: b1f6ea4cc3ecef3f72a87bf2f67bf9644c847546


12 Jul, 2023 1 commit

Resolve some compilation warnings (#3471) · a6d1fec0

moto authored Jul 12, 2023

Summary:
- Use of attributes deprecated in FFmpeg 6
- Guard CUDA-specific functions that are not used in CPU builds

Pull Request resolved: https://github.com/pytorch/audio/pull/3471

Differential Revision: D47402174

Pulled By: mthrok

fbshipit-source-id: 00c0719ab1849b50c0b56b03d8fb38bc7aa74538


05 Jul, 2023 1 commit

Revert "[audio][PR] Add option to dlopen FFmpeg libraries (#3402)" (#3456) · ca66a1d3

moto authored Jul 05, 2023

Summary:
This reverts commit b7d3e89a.

We will use pre-built binaries instead of dlopen.

Pull Request resolved: https://github.com/pytorch/audio/pull/3456

Differential Revision: D47239681

Pulled By: mthrok

fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6


08 Jun, 2023 1 commit

Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca

moto authored Jun 08, 2023

Summary:
StreamReader's decoding process is composed of three steps:

1. Decode the incoming AVPacket into an AVFrame
2. Pass the AVFrame through AVFilter to perform post-processing
3. Convert the resulting AVFrame

The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved.

For the CPU decoder, this works fine because resizing happens in step 2, so the resulting shape can be retrieved.
However, this is problematic for the GPU decoder, because resizing is currently done through a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. This refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405

AVFilter internally adapts to changes in the input frame size. This commit makes the conversion process behave similarly, so that it waits until the first frame arrives before finalizing the frame shape.
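The idea of deferring shape finalization until the first frame arrives can be sketched in Python as below. This is a minimal illustration only: `LazyFrameConverter` and the list-of-rows frame representation are hypothetical stand-ins, not torchaudio's actual C++ converter classes, and rejecting later size changes is a choice made for this sketch.

```python
class LazyFrameConverter:
    """Hypothetical converter that fixes its output shape on the first frame.

    Frames are modelled as lists of rows (height x width) purely for illustration.
    """

    def __init__(self):
        self.shape = None  # unknown until the first frame is seen

    def convert(self, frame):
        height, width = len(frame), len(frame[0])
        if self.shape is None:
            # Finalize the shape lazily, e.g. after a GPU decoder has resized the frame.
            self.shape = (height, width)
        elif (height, width) != self.shape:
            raise ValueError(f"frame shape changed from {self.shape} to {(height, width)}")
        # Copy the rows so the output does not alias the decoder's buffer.
        return [list(row) for row in frame]


converter = LazyFrameConverter()
print(converter.shape)  # None: nothing has been decoded yet
converter.convert([[0, 1, 2], [3, 4, 5]])
print(converter.shape)  # (2, 3)
```

Because the shape is only known after `convert` runs once, any code that asks for the output shape before decoding has to tolerate `None`.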

Fix https://github.com/pytorch/audio/issues/3405

Pull Request resolved: https://github.com/pytorch/audio/pull/3419

Differential Revision: D46557505

Pulled By: mthrok

fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6


03 Jun, 2023 1 commit

[audio][PR] Add option to dlopen FFmpeg libraries (#3402) · b7d3e89a

Moto Hira authored Jun 02, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3402

This is a second attempt at https://github.com/pytorch/audio/pull/3353.

The basic logic for dlopen-ing FFmpeg libraries is the same.
It uses `at::DynamicLibrary`, which allows torchaudio to be compiled without
linking FFmpeg libraries.

This time, an option to enable this feature, DLOPEN_FFMPEG, has been added,
so that users have a way to disable it and keep using build-time
linking.

Please refer to stub.h for more technical details.

Differential Revision: D46403783

fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16


02 Jun, 2023 1 commit

Revert D46059199: [audio][PR] Use dlopen for FFmpeg · ab7a39f7

Moto Hira authored Jun 02, 2023

Differential Revision: D46059199

Original commit changeset: 4493a5fd8a4c

Original Phabricator Diff: D46059199

fbshipit-source-id: 71cde3f8cd870d1ad9114e3e87cdd1ba564441c0


01 Jun, 2023 1 commit

Use dlopen for FFmpeg (#3353) · b14ced1a

moto authored Jun 01, 2023

Summary:
This commit changes the way the FFmpeg extension is built and used.
Instead of linking (LGPL) FFmpeg libraries to torchaudio at build time,
it uses dlopen to search for and load them at run time.

For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides
a portable wrapper.
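The contrast between build-time linking and run-time loading can be illustrated from Python with the standard `ctypes` module, which also wraps `dlopen` on POSIX systems. This is only an analogy: torchaudio's C++ code uses `at::DynamicLibrary`, not `ctypes`, and the helper `load_first_available` and the candidate name `"avutil"` are assumptions made for the sketch.

```python
import ctypes
import ctypes.util


def load_first_available(names):
    """Return a handle to the first shared library that can be loaded, or None."""
    for name in names:
        path = ctypes.util.find_library(name)  # e.g. a versioned libavutil soname on Linux
        if path is None:
            continue
        try:
            return ctypes.CDLL(path)  # dlopen() happens here, at run time
        except OSError:
            continue
    return None


# Nothing is linked at build time; a missing library is only detected when loading.
avutil = load_first_available(["avutil"])
if avutil is None:
    print("FFmpeg's libavutil was not found at run time")
else:
    avutil.avutil_version.restype = ctypes.c_uint
    version = avutil.avutil_version()  # packed as (major << 16) | (minor << 8) | micro
    print("libavutil", version >> 16, (version >> 8) & 0xFF, version & 0xFF)
```

The trade-off is the one the commit describes: the extension can be compiled without FFmpeg present, but library discovery errors move from build time to run time.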

Pull Request resolved: https://github.com/pytorch/audio/pull/3353

Differential Revision: D46059199

Pulled By: mthrok

fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d


17 May, 2023 3 commits

Improve the performance of YUV420P frame conversion (#3342) · 72d3fe09

moto authored May 17, 2023

Summary:
This commit improves the performance of converting YUV420P frames from AVFrame to torch Tensor.

It changes two things:
1. Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
2. Get rid of the intermediate UV plane copy.
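The two changes can be pictured with a pure-Python sketch that expands a YUV420P frame into a YUV444P-like layout. Each chroma sample is copied directly into the 2x2 block it covers (nearest-neighbor upsampling by manual copy), and the U and V planes are written straight into their output channels without an intermediate UV buffer. The sketch assumes even frame dimensions and uses nested lists in place of tensors; the function name is hypothetical, and the real implementation is C++ operating on `AVFrame` buffers.

```python
def yuv420p_to_yuv444(y, u, v):
    """Upsample 4:2:0 chroma by nearest neighbor and stack planes as [Y, U, V].

    y is an H x W list of rows; u and v are (H/2) x (W/2). H and W are assumed even.
    """
    height, width = len(y), len(y[0])
    out = [[row[:] for row in y]]  # channel 0: luma is copied as-is
    for plane in (u, v):
        # Allocate the full-resolution chroma channel once and fill it in place.
        channel = [[0] * width for _ in range(height)]
        for i in range(height):
            src_row = plane[i // 2]  # each chroma row covers two output rows
            dst_row = channel[i]
            for j in range(width):
                dst_row[j] = src_row[j // 2]  # ...and two output columns
        out.append(channel)
    return out


y = [[10, 11, 12, 13], [14, 15, 16, 17]]
u = [[1, 2]]
v = [[7, 8]]
print(yuv420p_to_yuv444(y, u, v)[1])  # [[1, 1, 2, 2], [1, 1, 2, 2]]
```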

The following table compares the time it takes to process 30 seconds of YUV420P frames at 25 FPS with a resolution of 320x240. The measured times in each column are sorted in ascending order.

Some observations:
* `torch::nn::functional::interpolate` with the `torch::kNearest` option is not as fast as copying data manually.
* Switching from `interpolate` to manual data copy reduces the variance.

run | main | 1 | 1+2 | improvement (from main to 1+2)
-- | -- | -- | -- | --
1 | 0.452250583 | 0.417490125 | 0.40155375 | 11.21%
2 | 0.462039958 | 0.42006675 | 0.401764125 | 13.05%
3 | 0.463067666 | 0.42416 | 0.402651334 | 13.05%
4 | 0.464228166 | 0.424545458 | 0.402985667 | 13.19%
5 | 0.465777375 | 0.425629208 | 0.405604625 | 12.92%
6 | 0.469628666 | 0.427044333 | 0.40628525 | 13.49%
7 | 0.475935125 | 0.42805875 | 0.406412167 | 14.61%
8 | 0.482277667 | 0.429921209 | 0.407279 | 15.55%
9 | 0.496695208 | 0.431182792 | 0.442013791 | 11.01%
10 | 0.546653625 | 0.541639584 | 0.4711585 | 13.81%

[second]
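The improvement column matches the relative reduction in time from `main`, that is (main - new) / main. The short check below recomputes it for the first row of the table above; the helper name is illustrative.

```python
def improvement(baseline_s, new_s):
    """Relative time reduction, in percent, of new_s compared with baseline_s."""
    return (baseline_s - new_s) / baseline_s * 100


# Row 1 of the 320x240 YUV420P table: main vs. 1+2.
print(f"{improvement(0.452250583, 0.40155375):.2f}%")  # 11.21%
```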

At a higher resolution (1080x720), the improvement is smaller but consistent.

run | main | 1+2 | improvement
-- | -- | -- | --
1 | 4.032393 | 3.991784667 | 1.01%
2 | 4.052248084 | 3.992672208 | 1.47%
3 | 4.07705575 | 4.000541666 | 1.88%
4 | 4.143954792 | 4.020671584 | 2.98%
5 | 4.170711959 | 4.025753125 | 3.48%
6 | 4.240229292 | 4.045504875 | 4.59%
7 | 4.267384042 | 4.045588125 | 5.20%
8 | 4.277025958 | 4.061980083 | 5.03%
9 | 4.312192042 | 4.163251959 | 3.45%
10 | 4.406109875 | 4.312560334 | 2.12%

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader

def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=yuv420p")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)

for _ in range(10):
    test()
```
</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
[conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3342

Reviewed By: xiaohui-zhang

Differential Revision: D45947716

Pulled By: mthrok

fbshipit-source-id: 17e5930f57544b4f2e48a9b2185464694a88ab68


Improve the performance of NV12 frame conversion (#3344) · c11661e0

moto authored May 17, 2023

Summary:
Similar to https://github.com/pytorch/audio/pull/3342, this commit improves the performance of NV12 frame conversion.

It changes two things:

- Change the implementation of nearest-neighbor upsampling from `torch::nn::functional::interpolate` to manual data copy.
- Get rid of the intermediate UV plane copy
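NV12 differs from YUV420P in that U and V share a single interleaved plane (U0 V0 U1 V1 ...). A pure-Python sketch of de-interleaving that plane directly into full-resolution U and V channels, with nearest-neighbor upsampling and no intermediate UV copy, might look like the following. Nested lists stand in for tensors, even frame dimensions are assumed, and the function name is hypothetical.

```python
def nv12_to_yuv444(y, uv):
    """Split an interleaved NV12 chroma plane into upsampled U and V channels.

    y is an H x W list of rows; uv has H/2 rows of W values laid out as U, V, U, V, ...
    """
    height, width = len(y), len(y[0])
    u_out = [[0] * width for _ in range(height)]
    v_out = [[0] * width for _ in range(height)]
    for i in range(height):
        src_row = uv[i // 2]  # one chroma row covers two output rows
        for j in range(width):
            k = 2 * (j // 2)  # index of the U sample; the paired V sample is at k + 1
            u_out[i][j] = src_row[k]
            v_out[i][j] = src_row[k + 1]
    return [[row[:] for row in y], u_out, v_out]


frame = nv12_to_yuv444([[10, 11, 12, 13], [14, 15, 16, 17]], [[1, 7, 2, 8]])
print(frame[2])  # [[7, 7, 8, 8], [7, 7, 8, 8]]
```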

With 320x240 video:

run | main | pr | improvement
-- | -- | -- | --
1 | 0.600671417 | 0.464993125 | 22.59%
2 | 0.638846084 | 0.456763542 | 28.50%
3 | 0.64158175 | 0.458295333 | 28.57%
4 | 0.649868584 | 0.455450583 | 29.92%
5 | 0.612171333 | 0.462435625 | 24.46%
6 | 0.6128095 | 0.456716166 | 25.47%
7 | 0.632084583 | 0.463357083 | 26.69%
8 | 0.610733083 | 0.46148625 | 24.44%
9 | 0.613825834 | 0.4559555 | 25.72%
10 | 0.653857458 | 0.455375375 | 30.36%

[second]

With 1080x720 video:

run | main | pr | improvement
-- | -- | -- | --
1 | 4.984154333 | 4.21090375 | 15.51%
2 | 4.988090625 | 4.239649375 | 15.00%
3 | 4.988896375 | 4.227277458 | 15.27%
4 | 4.998186584 | 4.161077042 | 16.75%
5 | 5.06180425 | 4.191672584 | 17.19%
6 | 5.108769667 | 4.198468458 | 17.82%
7 | 5.151363625 | 4.181942167 | 18.82%
8 | 5.199527875 | 4.239319084 | 18.47%
9 | 5.224903708 | 4.194901959 | 19.71%
10 | 5.333422583 | 4.320925792 | 18.98%

[second]

<details><summary>code</summary>

```python
import time

from torchaudio.io import StreamReader

def test():
    r = StreamReader(src="testsrc=duration=30", format="lavfi")
    # r = StreamReader(src="testsrc=duration=30:size=1080x720", format="lavfi")
    r.add_video_stream(-1, filter_desc="format=nv12")
    t0 = time.monotonic()
    r.process_all_packets()
    elapsed = time.monotonic() - t0
    print(elapsed)

for _ in range(10):
    test()
```
</details>

<details><summary>env</summary>

```
PyTorch version: 2.1.0.dev20230325
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.9.16 (main, Mar  8 2023, 04:29:24)  [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] torch==2.1.0.dev20230325
[pip3] torchaudio==2.1.0a0+541b525
[conda] pytorch                   2.1.0.dev20230325         py3.9_0    pytorch-nightly
[conda] torchaudio                2.1.0a0+541b525           dev_0    <develop>
```

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/3344

Reviewed By: xiaohui-zhang

Differential Revision: D45948511

Pulled By: mthrok

fbshipit-source-id: ae9b300cbcb4295f3f7470736f258280005a21e5


Add 420p10le CPU support to StreamReader (#3332) · c12f4734

moto authored May 16, 2023

Summary:
This commit adds support for decoding the YUV420P10LE format.

The image tensor returned for this format has:
- NCHW layout (C == 3)
- int16 dtype
- values in the range [0, 2^10).

Note that this value range differs from what the "hevc_cuvid" decoder
returns. The "hevc_cuvid" decoder uses the full range of int16 (internally,
uint16) to express colors (with some intervals), whereas the values
returned by the CPU "hevc" decoder are within [0, 2^10).
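The relationship between the two value ranges can be sketched as a bit shift. This assumes that "hevc_cuvid" produces a P010-style layout, in which each 10-bit sample sits in the upper bits of a 16-bit word; under that assumption, full-range values come in steps of 64 (presumably the "intervals" mentioned above), and values above 32767 appear negative when the same bits are read as int16. The helper names are hypothetical.

```python
SHIFT = 16 - 10  # assumed: 10 significant bits stored in the top of a 16-bit word


def cpu_to_full_range(value):
    """Map a CPU "hevc" sample in [0, 2**10) to the assumed full 16-bit layout."""
    assert 0 <= value < 2**10
    return value << SHIFT


def int16_to_uint16(value):
    """Reinterpret an int16 value (as stored in the tensor) as uint16."""
    return value & 0xFFFF


def full_range_to_cpu(value_int16):
    """Recover the 10-bit sample from an int16 element in the assumed full-range layout."""
    return int16_to_uint16(value_int16) >> SHIFT


print(cpu_to_full_range(1023))  # 65472, the largest full-range value
print(full_range_to_cpu(-64))   # 1023: -64 as int16 is 0xFFC0 as uint16
```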

Address https://github.com/pytorch/audio/issues/3331

Pull Request resolved: https://github.com/pytorch/audio/pull/3332

Reviewed By: hwangjeff

Differential Revision: D45925097

Pulled By: mthrok

fbshipit-source-id: 4e669b65c030f388bba2fdbb8f00faf7e2981508


23 Mar, 2023 2 commits

Support YUV444P in GPU decoder (#3199) · 3240de92

moto authored Mar 23, 2023

Summary:
With the CUDA filter support added in https://github.com/pytorch/audio/issues/3183, it is now possible to change the pixel format of CUDA frames.

This commit adds conversion for the YUV444P format.

Pull Request resolved: https://github.com/pytorch/audio/pull/3199

Reviewed By: hwangjeff

Differential Revision: D44323928

Pulled By: mthrok

fbshipit-source-id: 6d9b205e7235df5f21e7d3e06166b3a169f1ae9f


Warn if decoding YUV images with different plane size (#3201) · 92eff154

moto authored Mar 23, 2023

Summary:
StreamReader behaves differently when dealing with YUV formats.
It implicitly converts the image format to YUV444P because
otherwise the image planes do not have the same shape, and it is not
possible to express them as a regular PyTorch Tensor with a dedicated
dimension for each color channel.

This commit adds warnings for such conversions.
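A sketch of the kind of check involved: compare the chroma plane shape with the luma plane shape and warn before the implicit conversion to a YUV444P-like layout. The function name, the message text, and the use of Python's `warnings` module are assumptions for illustration; the actual warning is emitted by torchaudio's C++ code and its wording may differ.

```python
import warnings


def check_yuv_planes(luma_shape, chroma_shape, pix_fmt):
    """Warn when chroma planes differ in shape from luma and will be upsampled."""
    if chroma_shape != luma_shape:
        warnings.warn(
            f"The pixel format {pix_fmt} has chroma planes of shape {chroma_shape} "
            f"and a luma plane of shape {luma_shape}; frames are converted to "
            "yuv444p so that every channel has the same shape."
        )
        return True
    return False


check_yuv_planes((240, 320), (120, 160), "yuv420p")  # emits a UserWarning
check_yuv_planes((240, 320), (240, 320), "yuv444p")  # no warning
```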

Pull Request resolved: https://github.com/pytorch/audio/pull/3201

Reviewed By: nateanl

Differential Revision: D44311017

Pulled By: mthrok

fbshipit-source-id: 73a02a19c013c0263f349e1f3a3603e3d3eddb6a


16 Mar, 2023 1 commit

Refactor Tensor conversion in StreamReader (#3170) · 014d7140

moto authored Mar 15, 2023

Summary:
Currently, when the Buffer converts AVFrame* to torch::Tensor,
it checks the format each time a frame is passed, and then
performs the conversion.

This commit changes it so that the conversion operation is
pre-instantiated at the time the output stream is configured.

It introduces Converter implementations for various formats,
and uses templates to embed them in the Buffer class.
This way, branching such as if/switch is eliminated from the
decoding path.
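The difference between per-frame branching and a converter chosen once at configuration time can be sketched in Python. `Buffer`, the converter functions, and the format names used as keys are hypothetical stand-ins; torchaudio does this in C++ with templates, which Python can only approximate by storing a function object.

```python
def convert_gray(frame):
    return [list(row) for row in frame]  # single plane, copied as-is


def convert_rgb24(frame):
    # Interleaved (R, G, B) per pixel -> three separate channels.
    return [[[px[c] for px in row] for row in frame] for c in range(3)]


CONVERTERS = {"gray": convert_gray, "rgb24": convert_rgb24}


class Buffer:
    def __init__(self, pix_fmt):
        # Pick the converter once, when the output stream is configured.
        try:
            self._convert = CONVERTERS[pix_fmt]
        except KeyError:
            raise ValueError(f"unsupported pixel format: {pix_fmt}") from None
        self.chunks = []

    def push_frame(self, frame):
        # No format check here: the decoding path just calls the stored converter.
        self.chunks.append(self._convert(frame))


buf = Buffer("rgb24")
buf.push_frame([[(255, 0, 0), (0, 255, 0)]])
print(buf.chunks[0])  # [[[255, 0]], [[0, 255]], [[0, 0]]]
```

An unsupported format now fails when the stream is configured rather than on the first decoded frame.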

Pull Request resolved: https://github.com/pytorch/audio/pull/3170

Reviewed By: xiaohui-zhang

Differential Revision: D44048293

Pulled By: mthrok

fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f
