Commits · 2e0dfafa4242d05aea0a3fd38dbca896c1cab119 · OpenDAS / Torchaudio

14 Aug, 2023 1 commit

Move essential backend implementations to _backend (#3549) · 2e0dfafa

moto authored Aug 14, 2023

Summary:
Move the actual I/O implementation to the `_backend` submodule so that the existing `backend` submodule contains only what is related to legacy backend utilities.

Pull Request resolved: https://github.com/pytorch/audio/pull/3549

Reviewed By: huangruizhe

Differential Revision: D48253550

Pulled By: mthrok

fbshipit-source-id: c23f1664458c723f63e134c7974b3f7cf17a1e98


10 Aug, 2023 1 commit

Refactor _backend module (#3547) · 1e6a8f93

moto authored Aug 10, 2023

Summary:
* Move Backend implementations to separate files

Pull Request resolved: https://github.com/pytorch/audio/pull/3547

Reviewed By: hwangjeff

Differential Revision: D48233538

Pulled By: mthrok

fbshipit-source-id: bcc63fc07a5dfcd48929f0a2fb64bfcb3282eb92


29 Jul, 2023 1 commit

Refactor compat (#3518) · 8497ee91

moto authored Jul 29, 2023

Summary:
The I/O functions in the `_compat` module were introduced there so that
everything related to FFmpeg is in torchaudio.io and FFmpeg library
initialization can be carried out in `torchaudio.io.__init__`.

Now that this constraint has been removed (all the initialization happens
in `torchaudio._extension.__init__`) and `_compat` is only used by the
FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
for better locality.

Pull Request resolved: https://github.com/pytorch/audio/pull/3518

Reviewed By: huangruizhe

Differential Revision: D47877412

Pulled By: mthrok

fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f


12 Jul, 2023 1 commit

Support multiple FFmpeg versions (#3464) · 786066b4

moto authored Jul 11, 2023

Summary:
This commit introduces support for multiple FFmpeg versions for OSS binary distributions.

Currently torchaudio only works with FFmpeg 4. This is inconvenient at every step from installation to runtime linking.
This commit makes it possible to pick FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4.

The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all of them.
At runtime, we look for a specific version of libavutil and, when one is found, load the corresponding FFmpeg extension.
The order of preference is 6, 5, then 4.
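
The selection order described above can be sketched roughly as follows. This is only an illustration, not torchaudio's actual loader: the function name `pick_ffmpeg_version` and the `probe` callback are hypothetical, and the mapping from FFmpeg major versions to libavutil major versions (4 to 56, 5 to 57, 6 to 58) is background knowledge rather than something stated in this commit.

```python
from typing import Callable, Optional

# libavutil major version shipped with each FFmpeg major release
# (background knowledge, not taken from the commit message).
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(probe: Callable[[str], bool]) -> Optional[int]:
    """Return the first FFmpeg major version whose libavutil is found.

    `probe` is a hypothetical callback reporting whether a library with
    the given file name can be loaded (for example via ctypes).
    """
    for ffmpeg_major in (6, 5, 4):  # order of preference: 6, 5, then 4
        lib_name = f"libavutil.so.{LIBAVUTIL_MAJOR[ffmpeg_major]}"
        if probe(lib_name):
            return ffmpeg_major
    return None  # no supported FFmpeg found
```

Passing the probe as an argument keeps the sketch independent of the platform-specific library naming and loading details.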

To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
They are LGPL and are downloaded from S3 at build time, instead of being built every time.

The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
a single-FFmpeg-version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
so that they support only one specific version of FFmpeg.

Pull Request resolved: https://github.com/pytorch/audio/pull/3464

Differential Revision: D47300223

Pulled By: mthrok

fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04


01 Jun, 2023 1 commit

Refactor arg mapping in ffmpeg save function (#3387) · b99e5f46

moto authored May 31, 2023

Summary:
The arguments of TorchAudio's save function ("format", "bits_per_sample" and "encoding")
do not map one-to-one to the arguments of FFmpeg encoding.

For example, to use the vorbis codec, FFmpeg expects the "ogg" container/extension with the "vorbis"
encoder. It does not recognize the "vorbis" extension as TorchAudio (libsox) does.

This commit refactors the logic to parse/map the arguments.

As a result, saving now works properly with the vorbis and mp3 extensions.
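
A minimal sketch of this kind of mapping is shown below. It is not the refactored torchaudio code: `map_save_format` and the lookup table are hypothetical, only the "vorbis" row comes from the description above, and using `None` to mean "let FFmpeg choose its default encoder" is an assumption made for the example.

```python
from typing import Optional, Tuple

# Hypothetical table: extension -> (container format, encoder).
# Only the "vorbis" entry is described in the commit message.
_EXT_TO_FFMPEG = {
    "vorbis": ("ogg", "vorbis"),
}


def map_save_format(ext: str) -> Tuple[str, Optional[str]]:
    """Map a TorchAudio-style format name to a (container, encoder) pair.

    An encoder of None is an assumption of this sketch, standing for
    "no explicit encoder requested".
    """
    ext = ext.lower().lstrip(".")
    # Extensions without a special case are used as the container name.
    return _EXT_TO_FFMPEG.get(ext, (ext, None))
```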

Pull Request resolved: https://github.com/pytorch/audio/pull/3387

Reviewed By: hwangjeff

Differential Revision: D46328787

Pulled By: mthrok

fbshipit-source-id: 36f993952a062bfec58a8b51be6aa86297571f90


07 Apr, 2023 1 commit

Fix path normalization for StreamWriter-based save operation (#3248) · 9da92cdb

moto authored Apr 07, 2023

Summary:
Follow-up to https://github.com/pytorch/audio/issues/3243. The save compat module had different semantics from info and load, which requires a different way of performing path normalization.

Pull Request resolved: https://github.com/pytorch/audio/pull/3248

Reviewed By: hwangjeff

Differential Revision: D44774997

Pulled By: mthrok

fbshipit-source-id: 4b967ae3ca6b45850d455b8e95aaa31762c5457e


04 Apr, 2023 1 commit

[BC-breaking] Make I/O optional arguments kw-only (#3227) · ab40a3a3

moto authored Apr 04, 2023

Summary:
Recently, we added a bunch of options to make StreamReader/Writer flexible. As a result, their methods have a large number of arguments, some of which form semantic groups.

For example, the arguments of `StreamWriter.add_video_stream` are roughly grouped as follows:

- Information about input media format
   `frame_rate`, `width`, `height`, `format`
- Information about encoder
   `encoder`, `encoder_option`
- Information about codec configuration
   `codec_config`
- Information about encoded media format
   `encoder_format`, `encoder_frame_rate`, `encoder_width`, `encoder_height`
- Information about additional processing
   `filter_desc`
- Hardware acceleration
   `hw_accel`

We do not know what arguments will be added in the future, but when we add them,
we want to keep them roughly grouped by inserting each new argument
somewhere in the middle without breaking backward compatibility.

This commit makes most of them keyword-only arguments, so that we can
rearrange them without breaking backward compatibility.
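
As an illustration of why keyword-only arguments allow reordering, consider the simplified stand-in below. The function `add_video_stream_sketch` and its reduced parameter list are hypothetical and do not reproduce the real `StreamWriter.add_video_stream` signature.

```python
def add_video_stream_sketch(frame_rate, width, height, *,
                            encoder=None, filter_desc=None):
    """Simplified stand-in for a method with keyword-only options.

    Everything after the bare `*` must be passed by name, so these
    parameters can be reordered, or new ones inserted between them,
    without breaking existing callers.
    """
    return {
        "frame_rate": frame_rate,
        "width": width,
        "height": height,
        "encoder": encoder,
        "filter_desc": filter_desc,
    }


# Callers name the optional arguments explicitly.
config = add_video_stream_sketch(30, 640, 480, encoder="libx264")
```

Passing `"libx264"` positionally as a fourth argument raises `TypeError`, which is the BC-breaking part of such a change.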

Pull Request resolved: https://github.com/pytorch/audio/pull/3227

Reviewed By: hwangjeff

Differential Revision: D44681620

Pulled By: mthrok

fbshipit-source-id: b55f6168f4c2f3d0f59731b9bb0db4ae54e5a90f


01 Mar, 2023 1 commit

Fix stylecheck in io (#3126) · b0faecb2

Zhaoheng Ni authored Mar 01, 2023

Summary:
`Dict` is not used. Fix the stylecheck by removing the import of `Dict`.

Pull Request resolved: https://github.com/pytorch/audio/pull/3126

Reviewed By: mthrok

Differential Revision: D43699410

Pulled By: nateanl

fbshipit-source-id: 8d6b5335124903453387c488f96f297d6fe3c819


24 Feb, 2023 2 commits

Clean up ffmpeg bindings (#3095) · b46628ba

moto authored Feb 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3095

Reviewed By: nateanl

Differential Revision: D43544998

Pulled By: mthrok

fbshipit-source-id: 4359cdbbdbee53084016a84129cb3d65900b0457


Bind StreamReader/Writer with PyBind11 (#3091) · b012b452

moto authored Feb 24, 2023

Summary:
This commit is mostly cleanup and preparation for future
development.

We plan to pass more complicated objects between
StreamReader and StreamWriter, and TorchBind is not expressive enough
for defining intermediate objects, so we use PyBind11 for binding
StreamWriter.

Pull Request resolved: https://github.com/pytorch/audio/pull/3091

Reviewed By: xiaohui-zhang

Differential Revision: D43515714

Pulled By: mthrok

fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2


15 Feb, 2023 1 commit

Add FFmpeg compat save function (#3058) · fb932674

Jeff Hwang authored Feb 15, 2023

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3058

Adds FFmpeg-based save function.

Reviewed By: mthrok

Differential Revision: D43264858

fbshipit-source-id: ae3f89012bc2520f3de11af65348ba8f77f0acff


22 Jan, 2023 1 commit

Make StreamReader return PTS (#2975) · 0dd59e0d

moto authored Jan 22, 2023

Summary:
This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.

Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is Torch tensor type but has extra attribute of PTS
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```

For backward compatibility, we introduce `_ChunkTensor`, a composition
of Tensor and metadata that works like a normal tensor in PyTorch operations.

The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).

It was also suggested to attach metadata directly to the Tensor object,
but the possibility of a collision between torchaudio's metadata and new attributes introduced in
PyTorch cannot be ignored, so we use a Tensor subclass implementation.

If any unexpected issue arises from a metadata attribute name collision, client code can
fetch the bare Tensor and continue.

Pull Request resolved: https://github.com/pytorch/audio/pull/2975

Reviewed By: hwangjeff

Differential Revision: D42526945

Pulled By: mthrok

fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35


21 Dec, 2022 1 commit

Extract libsox integration from libtorchaudio (#2929) · 1706a72f

moto authored Dec 21, 2022

Summary:
This commit makes the following changes to the C++ library organization
- Move sox-related feature implementations from `libtorchaudio` to `libtorchaudio_sox`.
- Remove C++ implementation of `is_sox_available` and `is_ffmpeg_available` as it is now sufficient to check the existence of `libtorchaudio_sox` and `libtorchaudio_ffmpeg` to check the availability. This makes `libtorchaudio_sox` and `libtorchaudio_ffmpeg` independent from `libtorchaudio`.
- Move PyBind11-based bindings (`_torchaudio_sox`, `_torchaudio_ffmpeg`) into `torchaudio.lib` so that the built library structure is less cluttered.
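
The idea of inferring availability from the existence of an extension module can be sketched in pure Python as below. This is an assumption-laden illustration, not the check torchaudio performs: the helper name `is_extension_available` is hypothetical, and the real code may locate the shared libraries differently.

```python
import importlib.util


def is_extension_available(module_name: str) -> bool:
    """Report whether `module_name` can be located by the import system.

    For a dotted name, `find_spec` imports the parent packages first,
    so a missing parent raises ModuleNotFoundError, which is treated
    here as "not available".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


# Hypothetical usage with a binding module named in this commit:
# is_extension_available("torchaudio.lib._torchaudio_sox")
```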

Background:
Originally, `libsox` was the only C++ extension and `libtorchaudio` was supposed to contain all the C++ code.
Things are different now. We have a bunch of C++ extensions and we need to make the code/build structure more modular.

The new `libtorchaudio_sox` contains the implementations and `_torchaudio_sox` contains the PyBind11-based bindings.

Pull Request resolved: https://github.com/pytorch/audio/pull/2929

Reviewed By: hwangjeff

Differential Revision: D42159594

Pulled By: mthrok

fbshipit-source-id: 1a0fbca9e4143137f6363fc001b2378ce6029aa7


10 Dec, 2022 1 commit

Fix type of arguments in torchaudio.io classes (#2913) · 54a664b9

Zhaoheng Ni authored Dec 10, 2022

Summary:
The `src` or `dst` argument can be a `str` or a file-like object. Annotating it as `str` may mislead users into thinking that it only accepts `str`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2913

Reviewed By: mthrok

Differential Revision: D41896668

Pulled By: nateanl

fbshipit-source-id: 1446a9f84186a0376cdbe4c61817fae4d5eaaab4


02 Nov, 2022 1 commit

Make buffer size configurable in ffmpeg file object operations and set size in backend (#2810) · 87eca36d

hwangjeff authored Nov 01, 2022

Summary:
Partly addresses https://github.com/pytorch/audio/issues/2686 and https://github.com/pytorch/audio/issues/2356.

Currently, when the buffer used for file object decoding is insufficiently large, `torchaudio.load` returns a shorter waveform than expected. To deal with this, the user is expected to increase the buffer size via `torchaudio.utils.sox_utils.set_buffer_size`, but this does not influence the buffer used by the FFmpeg fallback. To fix this, this PR applies the buffer size set for the SoX backend to FFmpeg as well.

As a follow-up, we should see whether it's possible to programmatically detect that the buffer's too small and flag it to the user.

Pull Request resolved: https://github.com/pytorch/audio/pull/2810

Reviewed By: mthrok

Differential Revision: D40906978

Pulled By: hwangjeff

fbshipit-source-id: 256fe1da8b21610b05bea9a0e043f484f9ea2e76


07 Oct, 2022 1 commit

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740) · 7729723b

hwangjeff authored Oct 07, 2022

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.
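
The behaviour can be pictured with the sketch below. It is not the `info_audio` implementation: `resolve_num_frames` is a hypothetical helper, and it assumes that decoded audio arrives as chunks whose length equals their frame count.

```python
from typing import Iterable, Sized


def resolve_num_frames(reported: int, chunks: Iterable[Sized]) -> int:
    """Return the reported frame count, or count frames by decoding.

    `reported` is the value from the stream info (0 when unknown, as
    for MP3). `chunks` yields decoded chunks; this sketch assumes
    `len(chunk)` is the number of frames in that chunk.
    """
    if reported > 0:
        return reported  # trust the container metadata
    # Fall back to iterating over the decoded audio and summing up.
    return sum(len(chunk) for chunk in chunks)
```

The fallback costs a full pass over the audio, so it only runs when the stream info lacks the frame count.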

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24


08 Jun, 2022 1 commit

Add metadata to source stream info (#2461) · 10d1bd89

moto authored Jun 07, 2022

Summary:
Add metadata, such as ID3 tags (https://github.com/pytorch/audio/commit/7d98db0567cb60fabcc173949b8c08e3a3487ac2), to `StreamReaderSourceAudioStream`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2461

Reviewed By: hwangjeff

Differential Revision: D36985656

Pulled By: mthrok

fbshipit-source-id: e66f9e6e980eb57c378cc643a8979b6b7813dae7


02 Jun, 2022 1 commit

Use FFmpeg-based I/O as fallback in sox_io backend (#2419) · 19c60a08

moto authored Jun 01, 2022

Summary:
This commit adds a fallback mechanism to the `info` and `load` functions of the sox_io backend.
If torchaudio is compiled to use FFmpeg and the runtime dependencies are properly loaded,
then when `info` or `load` fails, it falls back to the FFmpeg-based implementation.
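
A generic version of this fallback pattern might look like the sketch below. The names `load_with_fallback`, `primary` and `fallback` are hypothetical, and the actual sox_io code may catch a narrower set of errors.

```python
from typing import Any, Callable, Optional


def load_with_fallback(
    path: str,
    primary: Callable[[str], Any],
    fallback: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Try `primary` first and use `fallback` only if it fails.

    `fallback` is None when the alternative implementation is not
    available (for example, FFmpeg was not compiled in or not loaded);
    in that case the original error is re-raised unchanged.
    """
    try:
        return primary(path)
    except Exception:
        if fallback is None:
            raise
        return fallback(path)
```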

BC-breaking changes:
 - FFmpeg does not report the number of frames for MP3, because MP3 does not store that information. It can be estimated from the audio duration and sample rate, but the estimate might be inaccurate, so we keep it at 0.

Depends on
- https://github.com/pytorch/audio/issues/2416
- https://github.com/pytorch/audio/issues/2417
- https://github.com/pytorch/audio/issues/2418
- https://github.com/pytorch/audio/issues/2423
- https://github.com/pytorch/audio/issues/2427

Pull Request resolved: https://github.com/pytorch/audio/pull/2419

Reviewed By: carolineechen

Differential Revision: D36740306

Pulled By: mthrok

fbshipit-source-id: 9e2ad095b8b39e41404970de0d8d9b5aaa856c97
