Commits · a4036248fd4de3662f8fb61c8783bf6bc49b3de7 · OpenDAS / Torchaudio

01 Apr, 2023 1 commit

moto authored Mar 31, 2023

Summary:
This commit adds a new feature, AudioEffector, which can be used to
apply various effects and codecs to waveforms stored as Tensors.

Under the hood it uses StreamWriter and StreamReader to apply
filters and encode/decode.

This is going to replace the deprecated `apply_codec` and
`apply_sox_effect_tensor` functions.

It can also perform online, chunk-by-chunk filtering.

Tutorial to follow.

closes https://github.com/pytorch/audio/issues/3161

Pull Request resolved: https://github.com/pytorch/audio/pull/3163

Reviewed By: hwangjeff

Differential Revision: D44576660

Pulled By: mthrok

fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb

a4036248

30 Mar, 2023 2 commits

Support encode spec change in StreamWriter (#3207) · 1b648626

moto authored Mar 30, 2023

Summary:
This commit adds support for changing the spec of media
(such as sample rate, #channels, image size and frame rate)
on-the-fly at encoding time.

The motivation behind this addition is that certain media
formats support only a limited set of specs, and it is
cumbersome to require client code to change the spec
every time.

For example, OPUS supports only a 48 kHz sampling rate, and
Vorbis supports only stereo.

To make it easy to work with media of different formats,
this commit makes it so that anything that's not compatible
with the format is automatically converted, and allows
users to specify the override.

A notable implementation detail is that, for sample format and
pixel format, the encoder's default value takes precedence over
the source value, while for other attributes like sample rate and
#channels, the source value takes precedence as long as
it is supported.
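
The precedence rule above can be sketched in plain Python. This is a minimal illustration, not the StreamWriter implementation: the function names `pick_sample_rate` and `pick_sample_format` are hypothetical, and falling back to the closest supported rate is an assumption made for the example.

```python
from typing import Optional, Sequence


def pick_sample_rate(
    source_rate: int,
    supported: Sequence[int],
    override: Optional[int] = None,
) -> int:
    """Sample rate: the source value wins as long as the encoder supports it."""
    if override is not None:
        if supported and override not in supported:
            raise ValueError(f"{override} Hz is not supported by the encoder")
        return override
    # An empty `supported` list is taken to mean "no restriction" (assumption).
    if not supported or source_rate in supported:
        return source_rate
    # Assumption: convert to the closest supported rate.
    return min(supported, key=lambda rate: abs(rate - source_rate))


def pick_sample_format(
    source_fmt: str,
    encoder_default: Optional[str],
    override: Optional[str] = None,
) -> str:
    """Sample format: the encoder default wins over the source value."""
    if override is not None:
        return override
    return encoder_default if encoder_default is not None else source_fmt


print(pick_sample_rate(16000, [48000]))         # 48000
print(pick_sample_rate(44100, [44100, 48000]))  # 44100
print(pick_sample_format("s16", "fltp"))        # fltp
```

The first call mirrors the OPUS example: a 16 kHz source is converted because only 48 kHz is supported.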

Pull Request resolved: https://github.com/pytorch/audio/pull/3207

Reviewed By: nateanl

Differential Revision: D44439622

Pulled By: mthrok

fbshipit-source-id: 09524f201d485d201150481884a3e9e4d2aab081

1b648626

Support changing the number of channels in StreamReader (#3216) · 4bc4ca75

moto authored Mar 29, 2023

Summary:
This commit adds the `num_channels` argument,
which allows one to change the number of channels on-the-fly.

Pull Request resolved: https://github.com/pytorch/audio/pull/3216

Reviewed By: hwangjeff

Differential Revision: D44516925

Pulled By: mthrok

fbshipit-source-id: 3e5a11b3fdbb19071f712a8148e27aff60341df3

4bc4ca75

29 Mar, 2023 1 commit

Reduce io tests (#3217) · 09ccf7cc

Moto Hira authored Mar 29, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3217

This commit removes some tests for file-like object from StreamWriter test.

The rationale is that, once the output file is opened, the behavior under test is
the same for file-like objects and regular files. Things like filter graph and
encoder format changes do not affect how the encoded binary is written.

Reviewed By: hwangjeff

Differential Revision: D44518626

fbshipit-source-id: 821ec20deca92e5e5c85bf4d47997eed51735374

09ccf7cc

28 Mar, 2023 1 commit

Add additional filter graph option to StreamWriter (#3194) · 715eb34a

moto authored Mar 28, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3194

Reviewed By: hwangjeff

Differential Revision: D44283910

Pulled By: mthrok

fbshipit-source-id: 49125724896bf7190ec27f056b6bfef260019f8e

715eb34a

27 Mar, 2023 1 commit

Revise encoder config arg and docstrings (#3203) · b1de9f1a

hwangjeff authored Mar 27, 2023

Summary:
For `StreamWriter`,
* Renames arg `config` to `codec_config`.
* Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
* Adds docstrings for arg `codec_config`.
* Updates `chunk` to `frames` in `write_*_chunk` methods.

Pull Request resolved: https://github.com/pytorch/audio/pull/3203

Reviewed By: mthrok

Differential Revision: D44350153

Pulled By: hwangjeff

fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343

b1de9f1a

25 Mar, 2023 1 commit

Properly set #samples passed to encoder (#3204) · d8a37a21

moto authored Mar 25, 2023

Summary:
Some audio encoders expect a specific, exact number of samples, as described in `AVCodecContext.frame_size`.

The `AVFrame.nb_samples` is set for the frames passed to `AVFilterGraph`,
but frames coming out of the graph do not necessarily have the same number of samples.

This causes issues with encoding OPUS (among others).

This commit fixes it by inserting `asetnsamples` to filter graph if a fixed number of samples is requested.
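
Conceptually, a filter like `asetnsamples` regroups a stream of variable-sized frames into frames of a fixed size. The following is a minimal pure-Python sketch of that idea, not FFmpeg code; the function name `regroup_samples` and the `pad` flag are hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, List


def regroup_samples(
    frames: Iterable[List[float]],
    frame_size: int,
    pad: bool = False,
) -> Iterator[List[float]]:
    """Re-emit incoming samples in frames of exactly `frame_size` samples."""
    pending: List[float] = []
    for frame in frames:
        pending.extend(frame)
        # Emit as many full frames as the accumulated samples allow.
        while len(pending) >= frame_size:
            yield pending[:frame_size]
            pending = pending[frame_size:]
    if pending:
        # The final frame may be short; optionally pad it with zeros.
        if pad:
            pending = pending + [0.0] * (frame_size - len(pending))
        yield pending


incoming = [[0.0] * 3, [0.0] * 5, [0.0] * 2]  # 10 samples in uneven frames
print([len(f) for f in regroup_samples(incoming, frame_size=4)])            # [4, 4, 2]
print([len(f) for f in regroup_samples(incoming, frame_size=4, pad=True)])  # [4, 4, 4]
```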

Note:
It turned out that FFmpeg 4.1 has an issue with OPUS encoding. It does not properly discard some samples.
We should probably move the minimum required FFmpeg to 4.2, but I am not sure if we can enforce it via ABI.
A workaround will be to issue a warning if encoding OPUS with 4.1. (follow-up)

Pull Request resolved: https://github.com/pytorch/audio/pull/3204

Reviewed By: nateanl

Differential Revision: D44374668

Pulled By: mthrok

fbshipit-source-id: 10ef5333dc0677dfb83c8e40b78edd8ded1b21dc

d8a37a21

23 Mar, 2023 2 commits

Support YUV444P in GPU decoder (#3199) · 3240de92

moto authored Mar 23, 2023

Summary:
With the support of CUDA filter in https://github.com/pytorch/audio/issues/3183, it is now possible to change the pixel format of CUDA frame.

This commit adds conversion for YUV444P format.

Pull Request resolved: https://github.com/pytorch/audio/pull/3199

Reviewed By: hwangjeff

Differential Revision: D44323928

Pulled By: mthrok

fbshipit-source-id: 6d9b205e7235df5f21e7d3e06166b3a169f1ae9f

3240de92

Set "experimental" automatically when using native opus/vorbis encoder (#3192) · bf1214a9

moto authored Mar 23, 2023

Summary:
The OPUS and VORBIS encoders require the "strict=experimental" flag. This commit enables it automatically.

The rationale is that we typically care whether we can encode these formats at all, not how they are encoded. (This might become a concern when these encoders mature on the FFmpeg side and providing the flag results in unexpected behavior.)

Also, when writing high-level functions that use StreamWriter, if we do not set this flag, those functions have to add new options that are passed down to StreamWriter, which turned out to be very painful in https://github.com/pytorch/audio/issues/3163

Pull Request resolved: https://github.com/pytorch/audio/pull/3192

Reviewed By: nateanl

Differential Revision: D44275089

Pulled By: mthrok

fbshipit-source-id: 74a757b4b7fc8467c8c88ffcb54fbaf89d6e4384

bf1214a9

20 Mar, 2023 1 commit

Support CUDA frame in FilterGraph (#3183) · c5b96558

moto authored Mar 20, 2023

Summary:
This commit adds CUDA frame support to FilterGraph.

It initializes and attaches a CUDA frames context to FilterGraph,
so that CUDA frames can be processed in FilterGraph.

As a result, it enables
1. CUDA filter support such as `scale_cuda`
2. Proper retrieval of the pixel format coming out of FilterGraph when
   CUDA HW acceleration is enabled. (currently it is reported as "cuda")

Resolves https://github.com/pytorch/audio/issues/3159

Pull Request resolved: https://github.com/pytorch/audio/pull/3183

Reviewed By: hwangjeff

Differential Revision: D44183722

Pulled By: mthrok

fbshipit-source-id: 522d21039c361ddfaa87fa89cf49c19d210ac62f

c5b96558

17 Mar, 2023 1 commit

Add EncodingConfig (#3179) · 9bb35070

moto authored Mar 16, 2023

Summary:
Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level.

Pull Request resolved: https://github.com/pytorch/audio/pull/3179

Pull Request resolved: https://github.com/pytorch/audio/pull/3164

Reviewed By: mthrok

Differential Revision: D43861413

Pulled By: hwangjeff

fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161

9bb35070

16 Mar, 2023 1 commit

Refactor Tensor conversion in StreamReader (#3170) · 014d7140

moto authored Mar 15, 2023

Summary:
Currently, when the Buffer converts AVFrame* to torch::Tensor,
it checks the format each time a frame is passed, and then
performs the conversion.

This commit changes it so that the conversion operation is
pre-instantiated at the time the output stream is configured.

It introduces Converter implementations for various formats,
and uses templates to embed them in the Buffer class.
This way, branching like if/switch is eliminated from
the decoding path.
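
The idea of resolving the conversion once at configuration time can be illustrated with a small Python analogue. The commit itself uses C++ templates; the sketch below only mirrors the control flow at runtime, and every name in it (`_convert_u8`, `_convert_s16`, `_CONVERTERS`, this `Buffer` class) is hypothetical.

```python
from typing import Callable, Dict, List

Converter = Callable[[List[int]], List[float]]


def _convert_u8(samples: List[int]) -> List[float]:
    # Unsigned 8-bit samples are centered at 128.
    return [(s - 128) / 128 for s in samples]


def _convert_s16(samples: List[int]) -> List[float]:
    # Signed 16-bit samples are scaled to [-1.0, 1.0).
    return [s / 32768 for s in samples]


_CONVERTERS: Dict[str, Converter] = {"u8": _convert_u8, "s16": _convert_s16}


class Buffer:
    def __init__(self, fmt: str) -> None:
        # The format is resolved exactly once, when the stream is configured.
        try:
            self._convert = _CONVERTERS[fmt]
        except KeyError:
            raise ValueError(f"Unsupported format: {fmt}") from None
        self.chunks: List[List[float]] = []

    def push(self, samples: List[int]) -> None:
        # Hot path: no per-frame format check, just the pre-selected converter.
        self.chunks.append(self._convert(samples))


buf = Buffer("s16")
buf.push([0, 16384, -32768])
print(buf.chunks)  # [[0.0, 0.5, -1.0]]
```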

Pull Request resolved: https://github.com/pytorch/audio/pull/3170

Reviewed By: xiaohui-zhang

Differential Revision: D44048293

Pulled By: mthrok

fbshipit-source-id: 30d8b240a5695d7513f499ce17853f2f0ffcab9f

014d7140

08 Mar, 2023 2 commits

Include format information after filter (#3155) · 146195d8

moto authored Mar 08, 2023

Summary:
This commit adds fields to OutputStream, which show the result
of filters, such as width and height after filtering.

Before

```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```

After

```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```

Pull Request resolved: https://github.com/pytorch/audio/pull/3155

Reviewed By: nateanl

Differential Revision: D43882399

Pulled By: mthrok

fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d

146195d8

Support overwriting PTS in StreamWriter (#3135) · 8d2f6f8d

moto authored Mar 08, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3135

Reviewed By: xiaohui-zhang

Differential Revision: D43724273

Pulled By: mthrok

fbshipit-source-id: 9b52823618948945a26e57d5b3deccbf5f9268c1

8d2f6f8d

07 Mar, 2023 1 commit

Raise an error if StreamWriter is not opened (#3152) · 502d5811

Moto Hira authored Mar 07, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3152

In StreamWriter, if the destination is not opened when attempting to write data, it causes a segmentation fault.
This commit adds a guard so that it raises an error instead of segfaulting.

Reviewed By: nateanl

Differential Revision: D43852649

fbshipit-source-id: aef5db7c1508f8a7db5834c2ab6de3cad09f9d60

502d5811

02 Mar, 2023 1 commit

Fix PTS regression (#3131) · fbf05f28

Moto Hira authored Mar 01, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3131

In https://github.com/pytorch/audio/pull/3122, the intermediate `num_frames` variable
is removed.

PTS can be incremented the same way, but the timing was wrong in #3122.
This commit fixes it.

Reviewed By: xiaohui-zhang

Differential Revision: D43712046

fbshipit-source-id: 2fe0082969296f4f3964e62e55b5325fcd45f4f9

fbf05f28

23 Feb, 2023 1 commit

Remove Tensor binding from StreamReader (#3093) · d3c9295c

mthrok authored Feb 23, 2023

Summary:
Remove the Tensor input support from StreamReader

Follow up of https://github.com/pytorch/audio/pull/3086

Pull Request resolved: https://github.com/pytorch/audio/pull/3093

Reviewed By: xiaohui-zhang

Differential Revision: D43526066

Pulled By: mthrok

fbshipit-source-id: 57ba4866c413649173e1c2c3b23ba7de3231b7bc

d3c9295c

07 Feb, 2023 1 commit

Add playback function (#3026) · 2ead941e

juan.azcarreta.ortiz authored Feb 07, 2023

Summary:
Allows user to play audio through the
device speaker.

Pull Request resolved: https://github.com/pytorch/audio/pull/3026

Test Plan:
Created a new test that mocks a call to the write audio chunk method from StreamWriter. To run the test:

`pytest test/torchaudio_unittest/io/_playback_test.py`

Reviewed By: mthrok

Differential Revision: D43082062

Pulled By: jazcarretao

fbshipit-source-id: 01a85b32ce925687a633d1208d15d54556e89dd8

2ead941e

04 Feb, 2023 1 commit

Add rgb48le and CUDA p010 support (HDR/10bit) to StreamReader (#3023) · b7e173fa

Tristan Rice authored Feb 04, 2023

Summary:
This adds two 10-bit pixel formats, one for CPU and one for CUDA. This allows for training on HDR/10-bit video datasets.

Pull Request resolved: https://github.com/pytorch/audio/pull/3023

Test Plan:
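```py
from torchaudio.io import StreamReader

# `reader` is the HEVC source to decode; it is not defined in this snippet.
r = StreamReader(
    reader, format='hevc',
)
# Decode with the `hevc_cuvid` hardware decoder.
stream = r.add_video_stream(
    frames_per_chunk=-1,
    decoder="hevc_cuvid",
    hw_accel="cuda",
)
frame = next(r.stream())
```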

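```py
from torchaudio.io import StreamReader

# `reader` is the HEVC source to decode; it is not defined in this snippet.
r = StreamReader(
    reader, format='hevc',
)
# Decode on the CPU and convert frames to `rgb48le` with a filter.
stream = r.add_video_stream(
    frames_per_chunk=-1,
    filter_desc="format=rgb48le",
)
frame = next(r.stream())
```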

![audio-example](https://user-images.githubusercontent.com/909104/215696543-ed3dc5a3-3013-4a57-8b98-05aa4a5a9a7c.png)

Reviewed By: xiaohui-zhang

Differential Revision: D43019191

Pulled By: mthrok

fbshipit-source-id: fe4359e525b24c8b856dfdf3d2f8596871566350

b7e173fa

22 Jan, 2023 1 commit

Make StreamReader return PTS (#2975) · 0dd59e0d

moto authored Jan 22, 2023

Summary:
This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.

Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is Torch tensor type but has extra attribute of PTS
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```

For backward compatibility, we introduce `_ChunkTensor`, which is a composition
of Tensor and metadata, but works like a normal tensor in PyTorch operations.

The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).

It was also suggested to attach metadata directly to the Tensor object,
but the possibility of a collision between torchaudio's metadata and new attributes introduced in
PyTorch cannot be ignored, so we use a Tensor subclass implementation.

If any unexpected issue arises from a metadata attribute name collision, client code can
fetch the bare Tensor and continue.
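
The composition idea can be illustrated without PyTorch. The sketch below is only an analogy: it wraps a plain list rather than a Tensor, and the class name `ChunkWithPTS` and its `_elem` attribute are hypothetical. The real `_ChunkTensor` is a Tensor subclass and integrates with PyTorch operations, which this sketch does not attempt.

```python
from typing import Any, List


class ChunkWithPTS:
    """Holds the payload by composition and carries `pts` as extra metadata."""

    def __init__(self, data: List[Any], pts: float) -> None:
        self._elem = data  # the bare payload, always retrievable
        self.pts = pts     # metadata lives on the wrapper, not on the payload

    def __getattr__(self, name: str) -> Any:
        # Invoked only when normal lookup fails, so `pts` and `_elem`
        # are served by the wrapper and everything else is forwarded.
        if name == "_elem":
            raise AttributeError(name)
        return getattr(self._elem, name)

    def __len__(self) -> int:
        return len(self._elem)

    def __getitem__(self, index: Any) -> Any:
        return self._elem[index]


chunk = ChunkWithPTS([1, 2, 2, 3], pts=0.5)
print(chunk.pts)       # 0.5
print(len(chunk))      # 4
print(chunk.count(2))  # 2, forwarded to the wrapped list
print(chunk._elem)     # [1, 2, 2, 3], the bare payload
```

Keeping the metadata on the wrapper means the payload type can gain new attributes without clashing with `pts`.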

Pull Request resolved: https://github.com/pytorch/audio/pull/2975

Reviewed By: hwangjeff

Differential Revision: D42526945

Pulled By: mthrok

fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35

0dd59e0d

16 Jan, 2023 1 commit

Refactor chunked buffer implementation (#2984) · 52b6bc3b

moto authored Jan 16, 2023

Summary:
Refactors the chunked buffer so that the number of Tensor frames stored in the buffer is always a multiple of `frames_per_chunk`.

This makes it easy to store PTS values in an aligned manner.

Pull Request resolved: https://github.com/pytorch/audio/pull/2984

Reviewed By: nateanl

Differential Revision: D42526670

Pulled By: mthrok

fbshipit-source-id: d83ee914b7e50de3b51758069b0e0b6b3ebe2e54

52b6bc3b

12 Jan, 2023 1 commit

Add `buffer_chunk_size=-1` option (#2969) · 22788a8f

moto authored Jan 11, 2023

Summary:
This commit adds `buffer_chunk_size=-1`, which does not drop buffered frames.
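
A toy model of the option, in plain Python. This is a sketch, not the torchaudio buffer: the `ChunkQueue` class is hypothetical, and it assumes that a bounded buffer drops its oldest chunk when full.

```python
from collections import deque
from typing import Deque, List, Optional


class ChunkQueue:
    """Toy chunk buffer; `buffer_chunk_size=-1` disables dropping."""

    def __init__(self, buffer_chunk_size: int) -> None:
        if buffer_chunk_size == 0 or buffer_chunk_size < -1:
            raise ValueError("buffer_chunk_size must be positive or -1")
        # -1 means "never drop". A positive value bounds the queue, and
        # deque(maxlen=...) discards the oldest chunk when it is full.
        maxlen = None if buffer_chunk_size == -1 else buffer_chunk_size
        self._chunks: Deque[List[int]] = deque(maxlen=maxlen)

    def push(self, chunk: List[int]) -> None:
        self._chunks.append(chunk)

    def pop(self) -> Optional[List[int]]:
        return self._chunks.popleft() if self._chunks else None


bounded = ChunkQueue(buffer_chunk_size=2)
unbounded = ChunkQueue(buffer_chunk_size=-1)
for i in range(4):
    bounded.push([i])
    unbounded.push([i])

print(bounded.pop())    # [2], chunks [0] and [1] were dropped
print(unbounded.pop())  # [0], nothing was dropped
```

The trade-off is memory: with `-1`, the buffer grows until the client pops the chunks.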

Pull Request resolved: https://github.com/pytorch/audio/pull/2969

Reviewed By: xiaohui-zhang

Differential Revision: D42403467

Pulled By: mthrok

fbshipit-source-id: a0847e6878874ce7e4b0ec3f56e5fbb8ebdb5992

22788a8f

10 Jan, 2023 1 commit

Update the handling of videos without PTS values (#2970) · 1717edaa

moto authored Jan 10, 2023

Summary:
The filter graph does not fall back to `best_effort_timestamp`, so applying filters (like changing fps) to videos without PTS values failed.

This commit changes the behavior by overwriting the PTS values with `best_effort_timestamp`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2970

Reviewed By: YosuaMichael

Differential Revision: D42425771

Pulled By: mthrok

fbshipit-source-id: 7b7a033ea2ad89bb49d6e1663d35d377dab2aae9

1717edaa

30 Dec, 2022 1 commit

Refactor and optimize yuv420p and nv12 processing (#2945) · cc0d1e0b

moto authored Dec 29, 2022

Summary:
This commit refactors and optimizes the functions that convert AVFrames of `yuv420p` and `nv12` into PyTorch Tensors.
The performance is improved by about 30%.

1. Reduce the number of intermediate Tensors allocated.
2. Replace 2 calls to `repeat_interleave` with `F::interpolate`.

 * (`F::interpolate` is about 5x faster than `repeat_interleave`. )
    <details><summary>code</summary>

    ```bash
    #!/usr/bin/env bash

    set -e

    python -c """
    import torch
    import torch.nn.functional as F

    a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
    val1 = a.repeat_interleave(2, -1).repeat_interleave(2, -2)
    val2 = F.interpolate(a.view((1, 1, 7, 7, 1)), size=[14, 14, 1], mode=\"nearest\")
    print(torch.sum(torch.abs(val1 - val2[0, 0, :, :, 0])))
    """

    python3 -m timeit \
            --setup """
    import torch

    a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
    """ \
            """
    a.repeat_interleave(2, -1).repeat_interleave(2, -2)
    """

    python3 -m timeit \
            --setup """
    import torch
    import torch.nn.functional as F

    a = torch.arange(49, dtype=torch.uint8).reshape(7, 7).clone()
    """ \
            """
    F.interpolate(a.view((1, 1, 7, 7, 1)), size=[14, 14, 1], mode=\"nearest\")
    """
    ```

    </details>

    ```
    tensor(0)
    10000 loops, best of 5: 38.3 usec per loop
    50000 loops, best of 5: 7.1 usec per loop
    ```

## Benchmark Result

<details><summary>code</summary>

```bash
#!/usr/bin/env bash

set -e

mkdir -p tmp

for ext in avi mp4; do
    for duration in 1 5 10 30 60; do
        printf "Testing ${ext} ${duration} [sec]\n"

        test_data="tmp/test_${duration}.${ext}"
        if [ ! -f "${test_data}" ]; then
            printf "Generating test data\n"
            ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${test_data}" > /dev/null 2>&1
        fi

        python -m timeit \
               --setup="from torchaudio.io import StreamReader" \
               """
r = StreamReader(\"${test_data}\")
r.add_basic_video_stream(frames_per_chunk=-1, format=\"yuv420p\")
r.process_all_packets()
r.pop_chunks()
"""
    done
done
```

</details>

![Time to decode AVI file](https://user-images.githubusercontent.com/855818/210008881-8cc83f18-0e51-46e3-afe9-a5ff5dff041e.png)

<details><summary>raw data</summary>

Video Type - AVI

Duration [sec] | Before | After
-- | -- | --
1 | 10.3 | 6.29
5 | 44.3 | 28.3
10 | 89.3 | 56.9
30 | 265 | 185
60 | 555 | 353

</details>

![Time to decode MP4 file](https://user-images.githubusercontent.com/855818/210008891-c4546c52-43d7-49d0-8eff-d866ad627129.png)

<details><summary>raw data</summary>

Video Type - MP4

Duration [sec] | Before | After
-- | -- | --
1 | 15.3 | 10.5
5 | 62.1 | 43.2
10 | 124 | 83.8
30 | 380 | 252
60 | 721 | 511

</details>

Pull Request resolved: https://github.com/pytorch/audio/pull/2945

Reviewed By: carolineechen

Differential Revision: D42283269

Pulled By: mthrok

fbshipit-source-id: 59840f943ff516b69ab8ad35fed7104c48a0bf0c

cc0d1e0b

20 Dec, 2022 1 commit

Fallback to best_effort_timestamp in case of invalid PTS (#2916) · c6bc65fd

moto authored Dec 20, 2022

Summary:
If the input video has invalid PTS, the current precise seek fails except when seeking to t=0.

This commit updates the discard mechanism to fall back to `best_effort_timestamp` in such cases.

`best_effort_timestamp` is just the number of frames that went through the decoder, counted from the beginning of the file.

This means that if the input file is very long and the seek target is near the end of the file, the StreamReader still decodes all the frames before it.

For videos with valid PTS, `best_effort_timestamp` should be same as `pts`. [[src](https://ffmpeg.org/doxygen/4.1/decode_8c.html#a8d86329cf58a4adbd24ac840d47730cf)]
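
The discard logic described above can be sketched as follows. This is a simplified model in plain Python, not the StreamReader code: `frames_after_seek` is a hypothetical name, `None` stands for an invalid PTS, and `best_effort_timestamp` is modeled as the decode-order frame index, following the description in this commit.

```python
from typing import Iterable, Iterator, Optional, Tuple


def frames_after_seek(
    frames: Iterable[Tuple[Optional[int], str]],
    target: int,
) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, payload) for frames at or after `target`.

    Every frame is still visited (i.e. decoded); earlier ones are discarded.
    """
    for index, (pts, payload) in enumerate(frames):
        # Fall back to the frame counter when the PTS is invalid.
        timestamp = pts if pts is not None else index
        if timestamp >= target:
            yield timestamp, payload


no_pts = [(None, "a"), (None, "b"), (None, "c"), (None, "d")]
print(list(frames_after_seek(no_pts, target=2)))    # [(2, 'c'), (3, 'd')]

with_pts = [(0, "a"), (10, "b"), (20, "c")]
print(list(frames_after_seek(with_pts, target=10)))  # [(10, 'b'), (20, 'c')]
```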

Pull Request resolved: https://github.com/pytorch/audio/pull/2916

Reviewed By: YosuaMichael

Differential Revision: D42170204

Pulled By: mthrok

fbshipit-source-id: 80c04dc376e0f427d41eb9feb44c251a1648a998

c6bc65fd

04 Nov, 2022 1 commit

Fix decimal FPS handling StreamWriter (#2831) · 6bd38512

moto authored Nov 04, 2022

Summary:
StreamWriter assumed that the time base can always be expressed as 1/something (that is, an integer frame rate), which is a reasonable assumption but does not hold for decimal frame rates.

This commit fixes it by properly computing time_base from frame rate.
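
As a minimal sketch of the arithmetic, the time base is the reciprocal of the frame rate expressed as a rational number. The helper below uses Python's `fractions` module; the function name `time_base_from_frame_rate` and the `max_denominator` bound are assumptions for illustration, and the actual fix works with FFmpeg's own rational type in C++.

```python
from fractions import Fraction


def time_base_from_frame_rate(
    frame_rate: float,
    max_denominator: int = 1_000_000,
) -> Fraction:
    """Return the time base (seconds per frame) as an exact rational."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    # Approximate the (possibly decimal) frame rate with a small rational.
    rate = Fraction(frame_rate).limit_denominator(max_denominator)
    return 1 / rate


print(time_base_from_frame_rate(30))     # 1/30
print(time_base_from_frame_rate(29.97))  # 100/2997
```

An integer rate such as 30 gives the familiar 1/30, while 29.97 gives 100/2997, which cannot be written as 1/something.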

Addresses https://github.com/pytorch/audio/issues/2830

Pull Request resolved: https://github.com/pytorch/audio/pull/2831

Reviewed By: carolineechen

Differential Revision: D41036084

Pulled By: mthrok

fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609

6bd38512

31 Oct, 2022 1 commit

Add precise seek (#2737) · 60f29ca0

Joao Gomes authored Oct 31, 2022

Summary:
cc mthrok

Implements precise seek and seek to any frame in torchaudio

Pull Request resolved: https://github.com/pytorch/audio/pull/2737

Reviewed By: mthrok

Differential Revision: D40546716

Pulled By: jdsgomes

fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e

60f29ca0

25 Oct, 2022 1 commit

Fix issue with the missing video frame in StreamWriter (#2789) · 17a2b93b

moto authored Oct 24, 2022

Summary:
Addresses https://github.com/pytorch/audio/issues/2790.

Previously, AVPacket objects had duration==0.

The `av_interleaved_write_frame` function was inferring the duration of packets by
comparing them against the next ones, but it could not infer the duration of
the last packet, as there is no subsequent frame, and thus was omitting it from the final data.

This commit fixes it by explicitly setting packet duration = 1 (one frame)
only for video. (An audio AVPacket contains multiple samples, so it is different.
To ensure correctness for audio, tests were added.)
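
The failure mode can be reproduced with a toy model. The code below is not FFmpeg's muxing logic; it is a hypothetical `written_packets` function that only imitates the behavior described in this commit message: a missing duration is inferred from the next packet, and a packet whose duration cannot be inferred is omitted.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (pts, duration); duration 0 means "unknown"


def written_packets(packets: List[Packet]) -> List[Packet]:
    """Toy muxer: fill in unknown durations, drop packets that stay unknown."""
    out: List[Packet] = []
    for i, (pts, duration) in enumerate(packets):
        if duration == 0:
            if i + 1 == len(packets):
                # Last packet: there is no next packet to compare against.
                continue
            duration = packets[i + 1][0] - pts
        out.append((pts, duration))
    return out


before = [(0, 0), (1, 0), (2, 0)]  # durations left at 0
after = [(0, 1), (1, 1), (2, 1)]   # duration explicitly set to one frame

print(written_packets(before))  # [(0, 1), (1, 1)], the last frame is missing
print(written_packets(after))   # [(0, 1), (1, 1), (2, 1)]
```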

Pull Request resolved: https://github.com/pytorch/audio/pull/2789

Reviewed By: xiaohui-zhang

Differential Revision: D40627439

Pulled By: mthrok

fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320

17a2b93b

21 Sep, 2022 1 commit

Support in-memory decoding via Tensor wrapper in StreamReader (#2694) · c5a43372

Moto Hira authored Sep 20, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds Tensor type as input to `StreamReader`.
The Tensor is interpreted as byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

c5a43372

01 Sep, 2022 1 commit

Add file-like object support to StreamWriter (#2648) · 28da8b84

moto authored Aug 31, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

28da8b84

24 Aug, 2022 1 commit

Add StreamWriter (#2628) · 72404de9

moto authored Aug 24, 2022

Summary:
This commit adds the FFmpeg-based encoder class, StreamWriter.
StreamWriter is pretty much the opposite of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

72404de9

07 Jul, 2022 1 commit

Add YUV444P support to StreamReader (#2516) · b2a90f91

moto authored Jul 06, 2022

Summary:
This commit adds support for the `"yuv444p"` type as an output format of StreamReader.

Pull Request resolved: https://github.com/pytorch/audio/pull/2516

Reviewed By: hwangjeff

Differential Revision: D37659715

Pulled By: mthrok

fbshipit-source-id: eae9b5590d8f138a6ebf3808c08adfe068f11a2b

b2a90f91

28 Jun, 2022 1 commit

Refactor AVDictionary clean up (#2507) · 0ad03adf

moto authored Jun 27, 2022

Summary:
Small clean up in ffmpeg binding code.

1. Make `get_option_dict` and `clean_up_dict` public utility
2. Merge the exception into `clean_up_dict`
3. Get rid of custom string join function and use `c10::Join`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2507

Reviewed By: hwangjeff

Differential Revision: D37466022

Pulled By: mthrok

fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969

0ad03adf

27 Jun, 2022 2 commits

Add missing __init__ in io test directory (#2511) · d50ed521

moto authored Jun 27, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2511

Reviewed By: nateanl

Differential Revision: D37461021

Pulled By: mthrok

fbshipit-source-id: 6f894c02bbefc5afda0f9584d26ad785f7c71ee4

d50ed521

Add utility function to fetch FFmpeg library versions (#2467) · 4ba7dc38

moto authored Jun 27, 2022

Summary:
Follow-up of https://github.com/pytorch/audio/issues/2464. Add utility function to fetch the versions of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/2467

Reviewed By: carolineechen

Differential Revision: D37028006

Pulled By: mthrok

fbshipit-source-id: 72adce1e6b43985760ce55b715b0e59af5244fdb

4ba7dc38

08 Jun, 2022 2 commits

Fix metadata fetch (#2464) · 4d2fa190

moto authored Jun 08, 2022

Summary:
In https://github.com/pytorch/audio/issues/2461, the `metadata` field was added to StreamInfo.
However, the value attached to this new field was source-level metadata,
while each stream can have different metadata.

* source level metadata
[AVFormatContext->metadata](https://ffmpeg.org/doxygen/4.1/structAVFormatContext.html#a3019a56080ed2e3297ff25bc2ff88adf)
* stream level metadata
[AVFormatContext->streams[]->metadata](https://ffmpeg.org/doxygen/4.1/structAVStream.html#a50d250a128a3da9ce3d135e84213fb82)

This commit moves source-level metadata to a dedicated method, `get_metadata`, and
fixes the stream-level metadata so that it reports the metadata of each stream.

Pull Request resolved: https://github.com/pytorch/audio/pull/2464

Reviewed By: hwangjeff, xiaohui-zhang

Differential Revision: D36995452

Pulled By: mthrok

fbshipit-source-id: 534be1f7feb07790a0ce8624c336cdb7b65a8697

4d2fa190

Add metadata to source stream info (#2461) · 10d1bd89

moto authored Jun 07, 2022

Summary:
Add metadata, such as ID3 (https://github.com/pytorch/audio/commit/7d98db0567cb60fabcc173949b8c08e3a3487ac2) tags, to `StreamReaderSourceAudioStream`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2461

Reviewed By: hwangjeff

Differential Revision: D36985656

Pulled By: mthrok

fbshipit-source-id: e66f9e6e980eb57c378cc643a8979b6b7813dae7

10d1bd89

01 Jun, 2022 1 commit

Tweak StreamReader error messages and tests (#2429) · 5d86054a

moto authored Jun 01, 2022

Summary:
* Update error messages
* Update audio stream tests

Pull Request resolved: https://github.com/pytorch/audio/pull/2429

Reviewed By: carolineechen, nateanl

Differential Revision: D36812769

Pulled By: mthrok

fbshipit-source-id: 7a51d0c4dbae558010d2e59412333e4a7f00d318

5d86054a

29 May, 2022 1 commit

Update source info (#2418) · bb77cbeb

moto authored May 28, 2022

Summary:
Add num_frames and bits_per_sample to match with the current
`torchaudio.info` capability.

Pull Request resolved: https://github.com/pytorch/audio/pull/2418

Reviewed By: carolineechen

Differential Revision: D36749077

Pulled By: mthrok

fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713

bb77cbeb

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - The order of the arguments was changed so that the arguments common to both audio and video come before the rest.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
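
As an illustration of the interface described in this list, here is a minimal file-like object that exposes only `read` and `seek`. The class name `ReadOnlySource` is hypothetical, and the example exercises the object directly rather than through the Streaming API, so it shows the expected method shapes and nothing about the decoder's behavior.

```python
import io


class ReadOnlySource:
    """Minimal file-like wrapper exposing `read(n)` and `seek(offset, whence)`."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Returns up to `n` bytes; an empty result signals end of data.
        return self._buffer.read(n)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Optional for the consumer, but some formats need it to be decoded.
        return self._buffer.seek(offset, whence)


src = ReadOnlySource(b"RIFF....WAVE")
print(src.read(4))  # b'RIFF'
src.seek(8)
print(src.read(4))  # b'WAVE'
print(src.read(4))  # b''
```

A wrapper like this is useful when the underlying object (for example, a network response body) offers more methods than the consumer should rely on.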

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d