Commits · fcf38946e423a306da5428a498ac98ecb7a0d757 · OpenDAS / Torchaudio

26 Oct, 2023 1 commit

Update StreamReader/Writer name · fcf38946

moto-meta authored Oct 26, 2023

Differential Revision: D50696105

Pull Request resolved: https://github.com/pytorch/audio/pull/3682

25 Oct, 2023 1 commit

Update library/extension name · 478a852f

moto-meta authored Oct 25, 2023

Differential Revision: D50633306

Pull Request resolved: https://github.com/pytorch/audio/pull/3675

24 Oct, 2023 1 commit

Change namespace to torio · a78ba389

moto-meta authored Oct 24, 2023

Differential Revision: D50506299

Pull Request resolved: https://github.com/pytorch/audio/pull/3669

11 Oct, 2023 1 commit

Move libtorchaudio_ffmpeg to dedicated directory · 2836a23d

moto-meta authored Oct 11, 2023

Differential Revision: D50082877

Pull Request resolved: https://github.com/pytorch/audio/pull/3646

09 Oct, 2023 2 commits

Add bytes support to StreamReader (#3642) · 2994ce2e

moto authored Oct 09, 2023

Addresses https://github.com/pytorch/audio/issues/3640

Migrate to src-layout · ec13a815

moto-meta authored Oct 09, 2023

Differential Revision: D49965263

Pull Request resolved: https://github.com/pytorch/audio/pull/3639

20 Aug, 2023 1 commit

Fix I/O test (#3568) · 0688863c

moto authored Aug 20, 2023

Summary:
It turned out that FFmpeg 5 installed via conda reports a video frame rate of -1. FFmpeg 4 and 6 are fine. This is either a regression in FFmpeg or in the underlying decoding library.

Make the reference value adaptive.

Pull Request resolved: https://github.com/pytorch/audio/pull/3568

Reviewed By: huangruizhe

Differential Revision: D48499621

Pulled By: mthrok

fbshipit-source-id: fb64187bcf0dc57b753cb6c05f04d436238f5c51

12 Jul, 2023 1 commit

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4. This is inconvenient at every step, from installation to runtime linking.
This commit allows picking FFmpeg 4, 5, or 6 at runtime, instead of looking only for v4.

The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all three.
At runtime, we look for a specific version of libavutil, and when one is found, we load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.
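
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the actual torchaudio loader: `pick_ffmpeg_version` and the `has_libavutil` callback are hypothetical names, and the table relies on the libavutil major versions that ship with FFmpeg 4, 5 and 6 (56, 57 and 58).

```python
from typing import Callable, Optional

# libavutil major version shipped with each supported FFmpeg major version.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(has_libavutil: Callable[[int], bool]) -> Optional[int]:
    """Return the preferred FFmpeg major version whose libavutil is present.

    `has_libavutil` is a hypothetical probe: it receives a libavutil major
    version and reports whether that library can be found on the system.
    """
    for ffmpeg_version in (6, 5, 4):  # order of preference: 6, 5, then 4
        if has_libavutil(LIBAVUTIL_MAJOR[ffmpeg_version]):
            return ffmpeg_version
    return None  # no supported FFmpeg found


# Example: a system that has FFmpeg 4 and 5 installed, but not 6.
installed = {56, 57}
selected = pick_ffmpeg_version(lambda major: major in installed)
```

Passing the probe in as a callback keeps the preference logic separate from the platform-specific library search.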

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL and downloaded from S3 at build time, instead of building every time.

The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
a single FFmpeg version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
so that they support only one specific version of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04

05 Jul, 2023 1 commit

Revert "[audio][PR] Add option to dlopen FFmpeg libraries (#3402)" (#3456) · ca66a1d3

moto authored Jul 05, 2023

Summary:
This reverts commit b7d3e89a.

We will use pre-built binaries instead of dlopen.

Pull Request resolved: https://github.com/pytorch/audio/pull/3456

Differential Revision: D47239681

Pulled By: mthrok

fbshipit-source-id: 0446a62410d914081184fc20c386afa00b1e41b6

03 Jun, 2023 1 commit

[audio][PR] Add option to dlopen FFmpeg libraries (#3402) · b7d3e89a

Moto Hira authored Jun 02, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3402

This is a second attempt at https://github.com/pytorch/audio/pull/3353.

The basic logic to enable dlopen for FFmpeg libraries is the same.
It uses `at::DynamicLibrary`, which allows compiling torchaudio without
linking FFmpeg libraries.

This time, an option to enable this feature, DLOPEN_FFMPEG, has been added,
so that users have a way to disable this feature and keep using build-time
linking.

Please refer to stub.h for more technical detail.
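
For readers unfamiliar with the technique, run-time loading can be sketched in Python with `ctypes`, whose `CDLL` constructor wraps the platform's dynamic loader (dlopen on POSIX). This is only an analogy for what `at::DynamicLibrary` does in C++. The helper name `dlopen_first` and the example library names are assumptions made for illustration, and stub.h remains the reference for the real mechanism.

```python
import ctypes
from typing import Callable, Iterable, Optional


def dlopen_first(
    names: Iterable[str],
    loader: Callable[[str], object] = ctypes.CDLL,
) -> Optional[object]:
    """Try each shared-library name in order and return the first that loads.

    Nothing is linked at build time: a missing library is only discovered
    here, at run time, and is reported by returning None.
    """
    for name in names:
        try:
            return loader(name)
        except OSError:
            continue  # not installed under this name, try the next one
    return None


# Example library names (assumed, Linux-style); the result depends on the system.
handle = dlopen_first(["libavutil.so.58", "libavutil.so.57", "libavutil.so.56"])
```

The `loader` parameter exists only so that the search logic can be exercised without any FFmpeg installation.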

Differential Revision: D46403783

fbshipit-source-id: ca3db57ff6bdc50c8c225d22f12f3e76c6dc3f16

02 Jun, 2023 1 commit

Revert D46059199: [audio][PR] Use dlopen for FFmpeg · ab7a39f7

Moto Hira authored Jun 02, 2023

Differential Revision: D46059199

Original commit changeset: 4493a5fd8a4c

Original Phabricator Diff: D46059199

fbshipit-source-id: 71cde3f8cd870d1ad9114e3e87cdd1ba564441c0

01 Jun, 2023 1 commit

Use dlopen for FFmpeg (#3353) · b14ced1a

moto authored Jun 01, 2023

Summary:
This commit changes the way the FFmpeg extension is built and used.
Instead of linking (LGPL) FFmpeg libraries to torchaudio at build time,
it uses dlopen to search for and link them at run time.

For dlopen-ing, we use PyTorch's `at::DynamicLibrary` class, which provides
a portable wrapper.

Pull Request resolved: https://github.com/pytorch/audio/pull/3353

Differential Revision: D46059199

Pulled By: mthrok

fbshipit-source-id: 4493a5fd8a4c802178d20276522f5334d637307d

09 May, 2023 1 commit

Refactor StreamReader/Writer PyBinding (#3296) · 8d7268f1

Moto Hira authored May 09, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3296

Reviewed By: hwangjeff

Differential Revision: D45503774

fbshipit-source-id: 806c22bd0f54fd0cea43d61ef3dbedd67ffeb012

03 Apr, 2023 1 commit

Migrate the binding of FFmpeg utils to PyBind11 (#3228) · 61c31bc0

moto authored Apr 03, 2023

Summary:
Utility functions are only available to Python, so there is no need to use TorchBind for them.
This should allow us to remove the link-whole flag when linking the `libtorchaudio_ffmpeg` part.

Pull Request resolved: https://github.com/pytorch/audio/pull/3228

Reviewed By: nateanl

Differential Revision: D44639560

Pulled By: mthrok

fbshipit-source-id: 5116073ee8c5ab572c63ad123942c4826bfe1100

31 Mar, 2023 1 commit

Add qscale to CodecConfig option (#3224) · 493b5018

moto authored Mar 30, 2023

Summary:
This commit adds the equivalent of FFmpeg's `qscale` option to StreamWriter.CodecConfig.
`qscale` enables variable bit rate.

The following figure illustrates the difference between the currently available configs.
From top to bottom: original, `compression_level=1`, `compression_level=9`, `bit_rate=192k`, `bit_rate=8k`, `qscale=9`, `qscale=1`.
![Figure_1](https://user-images.githubusercontent.com/855818/228990681-368bf84f-00a7-4248-80ac-6ee728da8f1a.png)
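
For orientation, the three knobs compared in the figure have counterparts in the `ffmpeg` command-line tool: `-b:a` for audio bit rate, `-compression_level`, and `-q:a` (an alias of `-qscale:a`). The sketch below only builds such an argument list. The function name `ffmpeg_audio_args` is hypothetical, and this is not how StreamWriter passes the values to the encoder.

```python
from typing import List, Optional


def ffmpeg_audio_args(
    bit_rate: Optional[str] = None,
    compression_level: Optional[int] = None,
    qscale: Optional[int] = None,
) -> List[str]:
    """Build ffmpeg CLI arguments for the audio quality knobs (illustration only)."""
    args: List[str] = []
    if bit_rate is not None:
        args += ["-b:a", bit_rate]  # fixed target bit rate
    if compression_level is not None:
        args += ["-compression_level", str(compression_level)]
    if qscale is not None:
        args += ["-q:a", str(qscale)]  # quality scale, i.e. variable bit rate
    return args


# The settings listed in the figure, expressed as CLI arguments.
configs = [
    ffmpeg_audio_args(compression_level=1),
    ffmpeg_audio_args(compression_level=9),
    ffmpeg_audio_args(bit_rate="192k"),
    ffmpeg_audio_args(bit_rate="8k"),
    ffmpeg_audio_args(qscale=9),
    ffmpeg_audio_args(qscale=1),
]
```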

Pull Request resolved: https://github.com/pytorch/audio/pull/3224

Reviewed By: hwangjeff

Differential Revision: D44563633

Pulled By: mthrok

fbshipit-source-id: ff74cd803b5abf1222f087e3e46ba7d81a35f672

27 Mar, 2023 1 commit

Revise encoder config arg and docstrings (#3203) · b1de9f1a

hwangjeff authored Mar 27, 2023

Summary:
For `StreamWriter`,
* Renames arg `config` to `codec_config`.
* Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
* Adds docstrings for arg `codec_config`.
* Updates `chunk` to `frames` in `write_*_chunk` methods.

Pull Request resolved: https://github.com/pytorch/audio/pull/3203

Reviewed By: mthrok

Differential Revision: D44350153

Pulled By: hwangjeff

fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343

17 Mar, 2023 2 commits

Cache HW device context (#3178) · 0c8c138c

moto authored Mar 17, 2023

Summary:
TODO: add cache release

Pull Request resolved: https://github.com/pytorch/audio/pull/3178

Reviewed By: hwangjeff

Differential Revision: D44136275

Pulled By: mthrok

fbshipit-source-id: 4eaf646fe17a469e8bbbdf43441d5532f9f8461d

Add EncodingConfig (#3179) · 9bb35070

moto authored Mar 16, 2023

Summary:
Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level.

Pull Request resolved: https://github.com/pytorch/audio/pull/3179

Pull Request resolved: https://github.com/pytorch/audio/pull/3164

Reviewed By: mthrok

Differential Revision: D43861413

Pulled By: hwangjeff

fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161

08 Mar, 2023 1 commit

Include format information after filter (#3155) · 146195d8

moto authored Mar 08, 2023

Summary:
This commit adds fields to OutputStream that show the result
of filters, such as width and height after filtering.

Before

```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```

After

```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```
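
One reason the post-filter fields are useful is that downstream code can size buffers without parsing the filter string. The sketch below assumes a plain dataclass, `ToyOutputVideoStream`, that merely mirrors the fields printed above; it is a hypothetical stand-in, not the torchaudio class, and `gray_bytes_per_second` is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class ToyOutputVideoStream:
    """Hypothetical mirror of the fields shown in the "After" output."""

    source_index: int
    filter_description: str
    media_type: str
    format: str
    width: int
    height: int
    frame_rate: float


def gray_bytes_per_second(info: ToyOutputVideoStream) -> int:
    """Raw data rate for 8-bit gray frames: one byte per pixel."""
    return int(info.width * info.height * info.frame_rate)


info = ToyOutputVideoStream(
    source_index=0,
    filter_description="fps=3,scale=width=320:height=320,format=pix_fmts=gray",
    media_type="video",
    format="gray",
    width=320,
    height=320,
    frame_rate=3.0,
)
rate = gray_bytes_per_second(info)  # 320 * 320 * 3 bytes per second
```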

Pull Request resolved: https://github.com/pytorch/audio/pull/3155

Reviewed By: nateanl

Differential Revision: D43882399

Pulled By: mthrok

fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d

24 Feb, 2023 2 commits

Cleanup ffmpeg bindings (#3095) · b46628ba

moto authored Feb 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3095

Reviewed By: nateanl

Differential Revision: D43544998

Pulled By: mthrok

fbshipit-source-id: 4359cdbbdbee53084016a84129cb3d65900b0457

Bind StreamReader/Writer with PyBind11 (#3091) · b012b452

moto authored Feb 24, 2023

Summary:
This commit is a cleanup and preparation for future
development.

We plan to pass around more complicated objects among
StreamReader and StreamWriter, and TorchBind is not expressive enough
for defining intermediate objects, so we use PyBind11 for binding
StreamWriter.

Pull Request resolved: https://github.com/pytorch/audio/pull/3091

Reviewed By: xiaohui-zhang

Differential Revision: D43515714

Pulled By: mthrok

fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2

23 Feb, 2023 1 commit

Replace c10::Dict with std::map in StreamReader/Writer (#3092) · c3310018

moto authored Feb 23, 2023

Summary:
This commit is a cleanup and preparation for future development.

We plan to pass around more complicated objects among StreamReader and StreamWriter, and TorchBind is not expressive enough for defining intermediate objects, so we want to use PyBind11 for binding StreamReader/Writer.

PyBind11 converts a Python dict into std::map, while TorchBind converts it into c10::Dict. Because of this discrepancy, conversion from c10::Dict to std::map has to happen in multiple places, which makes the binding code thicker because it requires wrapper methods.

Using std::map reduces the number of wrapper methods and conversions, because the same method can be bound for file-like objects and for other sources.

Pull Request resolved: https://github.com/pytorch/audio/pull/3092

Reviewed By: nateanl

Differential Revision: D43524808

Pulled By: mthrok

fbshipit-source-id: f7467c66ccd37dbf4abc337bbb18ffaac21a0058

27 Jan, 2023 1 commit

Replace torchaudio::ffmpeg with torchaudio::io (#3013) · 51aae466

Moto Hira authored Jan 27, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3013

Namespace clean up before publishing the torchaudio C++ API as prototype.

Reviewed By: hwangjeff

Differential Revision: D42699903

fbshipit-source-id: 8a9eed0390dfa4a152124b42f2b927dbdd3e23d2

26 Jan, 2023 1 commit

Abstract away AVFormatContext from StreamReader/Writer constructor (#3007) · 7ea69e61

Moto Hira authored Jan 26, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3007

Simplify the construction of StreamReader/Writer in C++.

Currently these classes require client code to build AVFormatContext
manually. This is tedious and not user friendly.

Some client code actually uses the same helper function that
the TorchAudio codebase uses.

This commit moves the helper logic inside the constructors of
StreamReader/Writer, so that the signatures of these constructors
are easy to use and similar to the Python interface.

Reviewed By: xiaohui-zhang

Differential Revision: D42662520

fbshipit-source-id: d95e5236810c48d7d9bd2d89c05d4f60a44b3ba1

04 Jan, 2023 1 commit

Make fill_buffer a public API and move the impl to C++ (#2954) · bf085b1f

moto authored Jan 04, 2023

Summary:
Currently, when iterating over media data with StreamReader, a for-loop is the only way to do so with the public API.

This does not support use cases like "Fetch one chunk after seek" well.

```python
s = StreamReader(src)
s.add_audio_stream(...)
s.seek(10)
chunk = None
for chunk, in s.stream():
    break
```

This commit makes `fill_buffer`, which is used by the iterative method, a public API so that one can do

```python
s.seek(10)
s.fill_buffer()
chunk, = s.pop_chunks()
```
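
The relationship between the two styles can be sketched with a toy reader. `ToyReader` below is a hypothetical stand-in that slices a Python list; it is not the torchaudio implementation and its return-code convention (0 for success, 1 for end of data) is an assumption of this sketch. It only shows that `stream()` can be expressed as repeated `fill_buffer()` and `pop_chunks()` calls, so exposing `fill_buffer` lets callers take exactly one step.

```python
from typing import Iterator, List, Optional, Tuple

Chunk = Optional[List[int]]


class ToyReader:
    """Toy chunked reader over a list of samples (illustration only)."""

    def __init__(self, samples: List[int], frames_per_chunk: int) -> None:
        self._samples = samples
        self._frames_per_chunk = frames_per_chunk
        self._pos = 0
        self._buffer: Chunk = None

    def seek(self, position: int) -> None:
        self._pos = position
        self._buffer = None  # drop anything buffered before the seek

    def fill_buffer(self) -> int:
        """Buffer one chunk. Returns 0 on success, 1 when no data is left."""
        if self._pos >= len(self._samples):
            return 1
        end = self._pos + self._frames_per_chunk
        self._buffer = self._samples[self._pos:end]
        self._pos = end
        return 0

    def pop_chunks(self) -> Tuple[Chunk]:
        """Hand out the buffered chunk (one output stream in this toy)."""
        chunk, self._buffer = self._buffer, None
        return (chunk,)

    def stream(self) -> Iterator[Tuple[Chunk]]:
        # The for-loop API is just fill_buffer + pop_chunks in a loop.
        while self.fill_buffer() == 0:
            yield self.pop_chunks()


reader = ToyReader(list(range(20)), frames_per_chunk=4)
reader.seek(10)
reader.fill_buffer()
(chunk,) = reader.pop_chunks()
```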

 ---

Also, this commit moves the implementation to C++, which reduces the number of FFI boundary crossings.
This improves performance when the iteration is longer.

AVI (generated with `ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${file}.avi"`)

| Video Duration [sec] | Original [msec] | Fill Buffer C++ [msec] | One Go (reference) [msec] |
|----------------------|-----------------|------------------------|---------------------------|
|                    1 |              18 |                   18.4 |                      16.6 |
|                    5 |              44 |                   42.6 |                      35.1 |
|                   10 |            75.3 |                   74.4 |                      60.9 |
|                   30 |             200 |                    195 |                       158 |
|                   60 |             423 |                    382 |                       343 |

MP4 (generated with `ffmpeg -hide_banner -f lavfi -t ${duration} -i testsrc "${file}.mp4"`)

| Video Duration [sec] | Original [msec] | Fill Buffer C++ [msec] | One Go (reference) [msec] |
|----------------------|-----------------|------------------------|---------------------------|
|                    1 |            18.7 |                   18.1 |                      10.3 |
|                    5 |            42.2 |                   40.6 |                      25.2 |
|                   10 |            73.9 |                   71.8 |                      43.6 |
|                   30 |             202 |                    194 |                       116 |
|                   60 |             396 |                    386 |                       227 |

* Original (Python implementation)

```python
r = StreamReader(src)
r.add_video_stream(1, decoder_option={"threads": "1"})
for chunk, in r.stream():
    pass
```

* This (C++)

```python
r = StreamReader(src)
r.add_video_stream(1, decoder_option={"threads": "1"})
for chunk, in r.stream():
    pass
```

* Using `process_all_packets` (process all in one go)

```python
r = StreamReader(src)
r.add_video_stream(1, decoder_option={"threads": "1"})
r.process_all_packets()
```
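
To put the tables in relative terms, the reduction in wall time can be computed from the measured values. The snippet below copies the "Original" and "Fill Buffer C++" columns from the AVI and MP4 tables; the helper name `reduction_percent` is introduced here for illustration.

```python
# Timings in msec copied from the tables above: duration -> (original, fill_buffer in C++).
AVI = {1: (18, 18.4), 5: (44, 42.6), 10: (75.3, 74.4), 30: (200, 195), 60: (423, 382)}
MP4 = {1: (18.7, 18.1), 5: (42.2, 40.6), 10: (73.9, 71.8), 30: (202, 194), 60: (396, 386)}


def reduction_percent(original_ms: float, new_ms: float) -> float:
    """Relative time saved by the new implementation, in percent."""
    return (original_ms - new_ms) / original_ms * 100.0


for name, table in (("AVI", AVI), ("MP4", MP4)):
    for duration, (original, new) in table.items():
        print(f"{name} {duration:>2}s: {reduction_percent(original, new):+.1f}%")
```

For the 60-second AVI clip this is (423 - 382) / 423, roughly a 9.7% reduction, while the 1-second AVI clip is marginally slower, which is consistent with the note that the gain shows up when the iteration is longer.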

Pull Request resolved: https://github.com/pytorch/audio/pull/2954

Reviewed By: carolineechen

Differential Revision: D42349446

Pulled By: mthrok

fbshipit-source-id: 9e4e37923e46299c3f43f4ad17a2a2b938b2b197

01 Sep, 2022 1 commit

Add file-like object support to StreamWriter (#2648) · 28da8b84

moto authored Aug 31, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

12 Jul, 2022 1 commit

Clean up the interface around dictionary (#2533) · e2641452

moto authored Jul 11, 2022

Summary:
A Python dictionary is bound to different types in TorchBind and PyBind.
StreamReader has methods that receive and return dictionaries.

This commit cleans up the treatment of dictionaries and consolidates
helper functions.

* The core implementation and TorchBind both use `c10::Dict`.
* The PyBind version uses `std::map` and converts it to `c10::Dict`.
* The helper functions to convert `std::map` <-> `c10::Dict` are consolidated in the `pybind` directory.
* The wrapper methods are implemented in the `pybind` directory.

Pull Request resolved: https://github.com/pytorch/audio/pull/2533

Reviewed By: hwangjeff

Differential Revision: D37731866

Pulled By: mthrok

fbshipit-source-id: 5a5cf1372668f7d3aacc0bb461bc69fa07212f3f

08 Jun, 2022 2 commits

Fix metadata fetch (#2464) · 4d2fa190

moto authored Jun 08, 2022

Summary:
In https://github.com/pytorch/audio/issues/2461, the `metadata` field was added to StreamInfo.
However, the value attached to this new field was source-level metadata,
while each stream can have different metadata.

* source level metadata
[AVFormatContext->metadata](https://ffmpeg.org/doxygen/4.1/structAVFormatContext.html#a3019a56080ed2e3297ff25bc2ff88adf)
* stream level metadata
[AVFormatContext->streams[]->metadata](https://ffmpeg.org/doxygen/4.1/structAVStream.html#a50d250a128a3da9ce3d135e84213fb82)

This commit moves source-level metadata to a dedicated method, `get_metadata`, and
fixes the stream-level `metadata` field to report stream metadata.

Pull Request resolved: https://github.com/pytorch/audio/pull/2464

Reviewed By: hwangjeff, xiaohui-zhang

Differential Revision: D36995452

Pulled By: mthrok

fbshipit-source-id: 534be1f7feb07790a0ce8624c336cdb7b65a8697

Add metadata to source stream info (#2461) · 10d1bd89

moto authored Jun 07, 2022

Summary:
Add metadata, such as ID3 (https://github.com/pytorch/audio/commit/7d98db0567cb60fabcc173949b8c08e3a3487ac2) tags, to `StreamReaderSourceAudioStream`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2461

Reviewed By: hwangjeff

Differential Revision: D36985656

Pulled By: mthrok

fbshipit-source-id: e66f9e6e980eb57c378cc643a8979b6b7813dae7

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
  - The order of the arguments was changed so that the arguments common to both audio and video come before the rest.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
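
As a minimal sketch of the protocol listed above, the two classes below wrap in-memory bytes. `ReadOnlySource` exposes only `read(n)`, like a non-seekable network stream, and `SeekableSource` additionally offers `seek(offset, whence)`. Both class names are hypothetical and exist only for this illustration; any object with the same methods, such as `io.BytesIO`, fits the description.

```python
import io


class ReadOnlySource:
    """File-like object that implements only `read(n)`."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to n bytes; an empty result signals the end of the data.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """File-like object that also implements `seek(offset, whence)`."""

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buffer.seek(offset, whence)


source = SeekableSource(b"abcdef")
head = source.read(2)   # consume the first two bytes
source.seek(0)          # rewind, which a read-only source cannot do
again = source.read(3)
```

With a source of the first kind the decoder cannot jump around in the data, which is why some formats need the `decoder` hint mentioned above.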

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
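
The dispatch rule described in this section can be sketched as a small function. `select_binding` and the returned labels are hypothetical; the real code constructs different C++ objects rather than returning strings, so this only illustrates the decision itself.

```python
import io


def select_binding(src: object) -> str:
    """Pick the binding layer for a source, following the rule described above."""
    if isinstance(src, str):
        return "torchbind"  # path or URL string
    if hasattr(src, "read"):
        return "pybind11"   # file-like object
    raise ValueError("src must be a string or an object with a `read` method")


path_choice = select_binding("sample.mp4")
file_choice = select_binding(io.BytesIO(b"\x00\x01"))
```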

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer was converted to a Wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
