Commit 667c6a9e authored by moto, committed by Facebook GitHub Bot

Update `torchaudio` doc and tutorial (#3285)

Summary:
This commit is preparation for landing dispatcher switch in https://github.com/pytorch/audio/issues/3241

Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it.
The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 is landed to accommodate the change.

Since it is necessary to mention the changes related to migration in the IO tutorial,
I also update the IO documentation to include migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285

Reviewed By: nateanl

Differential Revision: D45671237

Pulled By: mthrok

fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133
parent 5a85a461
torchaudio
==========
-.. note::
-   Release 2.1 will revise ``torchaudio.info``, ``torchaudio.load``, and ``torchaudio.save`` to allow for backend selection via function parameter rather than ``torchaudio.set_audio_backend``, with FFmpeg being the default backend.
-   The new API can be enabled in the current release by setting environment variable ``TORCHAUDIO_USE_BACKEND_DISPATCHER=1``.
-   See :ref:`future_api` for details on the new API.
+I/O
+---
``torchaudio`` top-level module provides the following functions that make
it easy to handle audio data.
- :py:func:`torchaudio.info`
- :py:func:`torchaudio.load`
- :py:func:`torchaudio.save`
Under the hood, these functions are implemented using various decoding/encoding
libraries. There are currently three variants.
- ``FFmpeg``
- ``libsox``
- ``SoundFile``
``libsox`` backend is the first backend implemented in TorchAudio, and it
works on Linux and macOS.
``SoundFile`` backend was added to extend audio I/O support to Windows.
It also works on Linux and macOS.
``FFmpeg`` backend is the latest addition, and it supports a wide range of audio and video
formats and protocols.
It works on Linux, macOS and Windows.
.. _dispatcher_migration:
Introduction of Dispatcher
~~~~~~~~~~~~~~~~~~~~~~~~~~
Conventionally, torchaudio has had its IO backend set globally at runtime based on availability.
However, this approach does not allow applications to use different
backends, and it is not well-suited for large codebases.
For these reasons, we are introducing a dispatcher, a new mechanism to allow users to
choose a backend for each function call, and migrating the I/O functions.
This entails multiple changes, some of which break backward compatibility and require
users to change their function calls.
The (planned) changes are as follows. For up-to-date information,
please refer to https://github.com/pytorch/audio/issues/2950
* In 2.0, the audio I/O backend dispatcher was introduced.
  Users can opt in to using the dispatcher by setting the environment variable
  ``TORCHAUDIO_USE_BACKEND_DISPATCHER=1``.
* In 2.1, the dispatcher becomes the default mechanism for I/O.
  Those who need to keep using the previous mechanism (global backend) can do
  so by setting ``TORCHAUDIO_USE_BACKEND_DISPATCHER=0``.
Furthermore, we are removing file-like object support from the libsox backend, as this
is better supported by the FFmpeg backend and makes the build process simpler.
Therefore, beginning with 2.1, FFmpeg and SoundFile are the only backends that support file-like objects.
The changes in 2.1 will mark the :ref:`backend utilities <backend_utils>` as deprecated.
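As a minimal sketch of the opt-in described above, the environment variable can be set from Python before ``torchaudio`` is imported. That the variable is read at import time is an assumption here, not something this page states; setting it in the shell before launching the process avoids relying on it.

```python
import os

# Opt in to the dispatcher (2.0 behaviour described above).
# Assumption: torchaudio reads this variable when it is imported,
# so it is set before the import below.
os.environ["TORCHAUDIO_USE_BACKEND_DISPATCHER"] = "1"

try:
    import torchaudio
except ImportError:
    torchaudio = None  # torchaudio is not installed in this environment

if torchaudio is not None:
    print(torchaudio.__version__)
```

Setting the value to ``0`` instead is the opt-out described for 2.1.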
Current API
-----------
...@@ -18,35 +67,62 @@ Audio I/O functions are implemented in :ref:`torchaudio.backend<backend>` module
Please refer to :ref:`backend` for the detail, and the :doc:`Audio I/O tutorial <../tutorials/audio_io_tutorial>` for the usage.
torchaudio.info
~~~~~~~~~~~~~~~
.. function:: torchaudio.info(filepath: str, ...)
   Fetch meta data of an audio file. Refer to :ref:`backend` for the detail.
torchaudio.load
~~~~~~~~~~~~~~~
.. function:: torchaudio.load(filepath: str, ...)
   Load audio file into torch.Tensor object. Refer to :ref:`backend` for the detail.
torchaudio.save
~~~~~~~~~~~~~~~
.. function:: torchaudio.save(filepath: str, src: torch.Tensor, sample_rate: int, ...)
   Save torch.Tensor object into an audio format. Refer to :ref:`backend` for the detail.
.. currentmodule:: torchaudio
.. _backend_utils:
Backend Utilities
~~~~~~~~~~~~~~~~~
The following functions take effect only when the backend dispatcher is disabled.
They are effectively deprecated.
.. autofunction:: list_audio_backends
.. autofunction:: get_audio_backend
.. autofunction:: set_audio_backend
.. _future_api:
Future API
----------
Dispatcher
~~~~~~~~~~
The dispatcher tries to use the I/O backends in the following order of precedence:
1. FFmpeg
2. libsox
3. SoundFile
One can pass the ``backend`` argument to the I/O functions to override this.
See :ref:`future_api` for details on the new API.
In the next release, each of ``torchaudio.info``, ``torchaudio.load``, and ``torchaudio.save`` will allow for selecting a backend to use via parameter ``backend``.
The functions will support using any of FFmpeg, SoX, and SoundFile, provided that the corresponding library is installed.
If a backend is not explicitly chosen, the functions will select a backend to use given order of precedence (FFmpeg, SoX, SoundFile) and library availability.
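The selection rule described above can be illustrated with a small, self-contained sketch. This is not torchaudio's implementation: ``select_backend``, ``PRECEDENCE`` and the lowercase backend names are hypothetical, chosen only to show how an explicit choice and the order of precedence might interact.

```python
from typing import Iterable, Optional

# Order of precedence as stated in the text (names are illustrative).
PRECEDENCE = ("ffmpeg", "sox", "soundfile")


def select_backend(available: Iterable[str], backend: Optional[str] = None) -> str:
    """Return the explicit choice if given, else the first available backend."""
    available = set(available)
    if backend is not None:
        # An explicit choice overrides the order of precedence.
        if backend not in available:
            raise RuntimeError(f"Requested backend {backend!r} is not available.")
        return backend
    for name in PRECEDENCE:
        if name in available:
            return name
    raise RuntimeError("No audio I/O backend is available.")


print(select_backend({"sox", "soundfile"}))                         # sox
print(select_backend({"ffmpeg", "soundfile"}, backend="soundfile"))  # soundfile
```

In the real API the choice would be expressed by passing ``backend`` to ``torchaudio.info``, ``torchaudio.load`` or ``torchaudio.save``, as the text states.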
...@@ -57,11 +133,20 @@ These functions can be enabled in the current release by setting environment var
.. currentmodule:: torchaudio._backend
torchaudio.info
~~~~~~~~~~~~~~~
.. autofunction:: info
   :noindex:
torchaudio.load
~~~~~~~~~~~~~~~
.. autofunction:: load
   :noindex:
torchaudio.save
~~~~~~~~~~~~~~~
.. autofunction:: save
   :noindex:
...@@ -5,8 +5,15 @@ Audio I/O
**Author**: `Moto Hira <moto@meta.com>`__
-This tutorial shows how to use TorchAudio's basic I/O API to load audio files
-into PyTorch's Tensor object, and save Tensor objects to audio files.
+This tutorial shows how to use TorchAudio's basic I/O API to inspect audio data,
+load them into PyTorch Tensors and save PyTorch Tensors.
.. warning::
   There are multiple changes planned/made to audio I/O in recent releases.
   For details of these changes, please refer to
   :ref:`Introduction of Dispatcher <dispatcher_migration>`.
"""
import torch
...@@ -47,6 +54,15 @@ SAMPLE_WAV = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch12753
SAMPLE_WAV_8000 = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042-8000hz.wav")
def _hide_seek(obj):
    class _wrapper:
        def __init__(self, obj):
            self.obj = obj

        def read(self, n):
            return self.obj.read(n)

    return _wrapper(obj)
######################################################################
# Querying audio metadata
...@@ -113,7 +129,7 @@ print(metadata)
url = "https://download.pytorch.org/torchaudio/tutorial-assets/steam-train-whistle-daniel_simon.wav"
with requests.get(url, stream=True) as response:
-    metadata = torchaudio.info(response.raw)
+    metadata = torchaudio.info(_hide_seek(response.raw))
print(metadata)
######################################################################
...@@ -215,7 +231,7 @@ Audio(waveform.numpy()[0], rate=sample_rate)
# Load audio data as HTTP request
url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
with requests.get(url, stream=True) as response:
-    waveform, sample_rate = torchaudio.load(response.raw)
+    waveform, sample_rate = torchaudio.load(_hide_seek(response.raw))
plot_specgram(waveform, sample_rate, title="HTTP datasource")
######################################################################
...@@ -237,7 +253,7 @@ bucket = "pytorch-tutorial-assets"
key = "VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
client = boto3.client("s3", config=Config(signature_version=UNSIGNED))
response = client.get_object(Bucket=bucket, Key=key)
-waveform, sample_rate = torchaudio.load(response["Body"])
+waveform, sample_rate = torchaudio.load(_hide_seek(response["Body"]))
plot_specgram(waveform, sample_rate, title="From S3")
...@@ -271,13 +287,14 @@ frame_offset, num_frames = 16000, 16000 # Fetch and decode the 1 - 2 seconds
url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
print("Fetching all the data...")
with requests.get(url, stream=True) as response:
-    waveform1, sample_rate1 = torchaudio.load(response.raw)
+    waveform1, sample_rate1 = torchaudio.load(_hide_seek(response.raw))
    waveform1 = waveform1[:, frame_offset : frame_offset + num_frames]
    print(f" - Fetched {response.raw.tell()} bytes")
print("Fetching until the requested frames are available...")
with requests.get(url, stream=True) as response:
-    waveform2, sample_rate2 = torchaudio.load(response.raw, frame_offset=frame_offset, num_frames=num_frames)
+    waveform2, sample_rate2 = torchaudio.load(
+        _hide_seek(response.raw), frame_offset=frame_offset, num_frames=num_frames)
    print(f" - Fetched {response.raw.tell()} bytes")
print("Checking the resulting waveform ... ", end="")
...@@ -351,11 +368,11 @@ with tempfile.TemporaryDirectory() as tempdir:
formats = [
    "flac",
-    "vorbis",
-    "sph",
-    "amb",
-    "amr-nb",
-    "gsm",
+    # "vorbis",
+    # "sph",
+    # "amb",
+    # "amr-nb",
+    # "gsm",
]
######################################################################
......