Update `torchaudio` doc and tutorial (#3285)

Summary:
This commit is preparation for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241.
Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it. The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 has landed to accommodate the change.
Since it is necessary to mention the migration-related changes in the IO tutorial, I also update the IO documentation to include the migration work so that it's easy to redirect.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285
Reviewed By: nateanl
Differential Revision: D45671237
Pulled By: mthrok
fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133

Commit 667c6a9e authored May 09, 2023 by moto Committed by Facebook GitHub Bot May 09, 2023

Showing with 119 additions and 17 deletions

docs/source/torchaudio.rst +90 -5

examples/tutorials/audio_io_tutorial.py +29 -12

--- a/docs/source/torchaudio.rst
+++ b/docs/source/torchaudio.rst
 torchaudio
 ==========
-.. note::
+I/O
-   Release 2.1 will revise ``torchaudio.info``, ``torchaudio.load``, and ``torchaudio.save`` to allow for backend selection via function parameter rather than ``torchaudio.set_audio_backend``, with FFmpeg being the default backend.
+---
-   The new API can be enabled in the current release by setting environment variable ``TORCHAUDIO_USE_BACKEND_DISPATCHER=1``.
-   See :ref:`future_api` for details on the new API.
+``torchaudio`` top-level module provides the following functions that make
+it easy to handle audio data.
+- :py:func:`torchaudio.info`
+- :py:func:`torchaudio.load`
+- :py:func:`torchaudio.save`
+Under the hood, these functions are implemented using various decoding/encoding
+libraries. There are currently three variants.
+- ``FFmpeg``
+- ``libsox``
+- ``SoundFile``
+``libsox`` backend is the first backend implemented in TorchAudio, and it
+works on Linux and macOS.
+``SoundFile`` backend was added to extend audio I/O support to Windows.
+It also works on Linux and macOS.
+``FFmpeg`` backend is the latest addition and it supports a wide range of audio and video
+formats and protocols.
+It works on Linux, macOS and Windows.
+.. _dispatcher_migration:
+Introduction of Dispatcher
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Conventionally, torchaudio has had its IO backend set globally at runtime based on availability.
+However, this approach does not allow applications to use different
+backends, and it is not well-suited for large codebases.
+For these reasons, we are introducing a dispatcher, a new mechanism to allow users to
+choose a backend for each function call, and migrating the I/O functions.
+This incurs multiple changes, some of which involve backward-compatibility-breaking changes, and require
+users to change their function call.
+The (planned) changes are as follows. For up-to-date information,
+please refer to https://github.com/pytorch/audio/issues/2950
+* In 2.0, audio I/O backend dispatcher was introduced.
+  Users can opt-in to using dispatcher by setting the environment variable
+  ``TORCHAUDIO_USE_BACKEND_DISPATCHER=1``
+* In 2.1, the dispatcher becomes the default mechanism for I/O.
+  Those who need to keep using the previous mechanism (global backend) can do
+  so by setting ``TORCHAUDIO_USE_BACKEND_DISPATCHER=0``.
+Furthermore, we are removing file-like object support from libsox backend, as this
+is better supported by FFmpeg backend and makes the build process simpler.
+Therefore, beginning with 2.1, FFmpeg and SoundFile are the sole backends that support file-like objects.
+The changes in 2.1 will mark the :ref:`backend utilities <backend_utils>` deprecated.
 Current API
 -----------
@@ -18,35 +67,62 @@ Audio I/O functions are implemented in :ref:`torchaudio.backend<backend>` module
 Please refer to :ref:`backend` for the detail, and the :doc:`Audio I/O tutorial <../tutorials/audio_io_tutorial>` for the usage.
+torchaudio.info
+~~~~~~~~~~~~~~~
 .. function:: torchaudio.info(filepath: str, ...)
   Fetch meta data of an audio file. Refer to :ref:`backend` for the detail.
+torchaudio.load
+~~~~~~~~~~~~~~~
 .. function:: torchaudio.load(filepath: str, ...)
   Load audio file into torch.Tensor object. Refer to :ref:`backend` for the detail.
+torchaudio.save
+~~~~~~~~~~~~~~~
 .. function:: torchaudio.save(filepath: str, src: torch.Tensor, sample_rate: int, ...)
   Save torch.Tensor object into an audio format. Refer to :ref:`backend` for the detail.
 .. currentmodule:: torchaudio
+.. _backend_utils:
 Backend Utilities
 ~~~~~~~~~~~~~~~~~
+The following functions are effective only when backend dispatcher is disabled.
+They are effectively deprecated.
 .. autofunction:: list_audio_backends
 .. autofunction:: get_audio_backend
 .. autofunction:: set_audio_backend
 .. _future_api:
 Future API
 ----------
+Dispatcher
+~~~~~~~~~~
+The dispatcher tries to use the I/O backend in the following order of precedence
+1. FFmpeg
+2. libsox
+3. soundfile
+One can pass ``backend`` argument to I/O functions to override this.
+See :ref:`future_api` for details on the new API.
 In the next release, each of ``torchaudio.info``, ``torchaudio.load``, and ``torchaudio.save`` will allow for selecting a backend to use via parameter ``backend``.
 The functions will support using any of FFmpeg, SoX, and SoundFile, provided that the corresponding library is installed.
 If a backend is not explicitly chosen, the functions will select a backend to use given order of precedence (FFmpeg, SoX, SoundFile) and library availability.
@@ -57,11 +133,20 @@ These functions can be enabled in the current release by setting environment var
 .. currentmodule:: torchaudio._backend
+torchaudio.info
+~~~~~~~~~~~~~~~
 .. autofunction:: info
   :noindex:
+torchaudio.load
+~~~~~~~~~~~~~~~
 .. autofunction:: load
   :noindex:
+torchaudio.save
+~~~~~~~~~~~~~~~
 .. autofunction:: save
   :noindex:
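
The backend selection described in the new "Dispatcher" section (try FFmpeg, then libsox, then soundfile, unless the caller passes ``backend``) can be illustrated with a small pure-Python sketch. This is not torchaudio's implementation: the function `select_backend`, the `available` argument, the backend name strings, and the error messages are all hypothetical, chosen only to show the order-of-precedence logic with an explicit override.

```python
from typing import Iterable, Optional

# Order of precedence as listed in the documentation diff above.
# The lowercase name strings are an assumption made for this sketch.
PRECEDENCE = ("ffmpeg", "sox", "soundfile")


def select_backend(available: Iterable[str], backend: Optional[str] = None) -> str:
    """Pick a backend name (hypothetical helper, not a torchaudio API).

    If ``backend`` is given, it overrides the precedence order but must be
    available. Otherwise the first available entry of PRECEDENCE is used.
    """
    available = set(available)
    if backend is not None:
        if backend not in available:
            raise RuntimeError(f"Requested backend {backend!r} is not available")
        return backend
    for name in PRECEDENCE:
        if name in available:
            return name
    raise RuntimeError("No audio I/O backend is available")


# Without an explicit choice, the highest-precedence available backend wins.
default_choice = select_backend(["soundfile", "sox"])
# An explicit choice overrides the precedence order.
explicit_choice = select_backend(["ffmpeg", "soundfile"], backend="soundfile")
```

In this sketch the per-call `backend` argument plays the role that the global `torchaudio.set_audio_backend` setting played before the dispatcher, which is the design change the documentation describes.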
--- a/examples/tutorials/audio_io_tutorial.py
+++ b/examples/tutorials/audio_io_tutorial.py
@@ -5,8 +5,15 @@ Audio I/O
 **Author**: `Moto Hira <moto@meta.com>`__
-This tutorial shows how to use TorchAudio's basic I/O API to load audio files
+This tutorial shows how to use TorchAudio's basic I/O API to inspect audio data,
-into PyTorch's Tensor object, and save Tensor objects to audio files.
+load them into PyTorch Tensors and save PyTorch Tensors.
+.. warning::
+   There are multiple changes planned/made to audio I/O in recent releases.
+   For the detail of these changes please refer to
+   :ref:`Introduction of Dispatcher <dispatcher_migration>`.
 """
 import torch
@@ -47,6 +54,15 @@ SAMPLE_WAV = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch12753
 SAMPLE_WAV_8000 = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042-8000hz.wav")
+def _hide_seek(obj):
+    class _wrapper:
+        def __init__(self, obj):
+            self.obj = obj
+        def read(self, n):
+            return self.obj.read(n)
+    return _wrapper(obj)
 ######################################################################
 # Querying audio metadata
@@ -113,7 +129,7 @@ print(metadata)
 url = "https://download.pytorch.org/torchaudio/tutorial-assets/steam-train-whistle-daniel_simon.wav"
 with requests.get(url, stream=True) as response:
-    metadata = torchaudio.info(response.raw)
+    metadata = torchaudio.info(_hide_seek(response.raw))
 print(metadata)
 ######################################################################
@@ -215,7 +231,7 @@ Audio(waveform.numpy()[0], rate=sample_rate)
 # Load audio data as HTTP request
 url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
 with requests.get(url, stream=True) as response:
-    waveform, sample_rate = torchaudio.load(response.raw)
+    waveform, sample_rate = torchaudio.load(_hide_seek(response.raw))
 plot_specgram(waveform, sample_rate, title="HTTP datasource")
 ######################################################################
@@ -237,7 +253,7 @@ bucket = "pytorch-tutorial-assets"
 key = "VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
 client = boto3.client("s3", config=Config(signature_version=UNSIGNED))
 response = client.get_object(Bucket=bucket, Key=key)
-waveform, sample_rate = torchaudio.load(response["Body"])
+waveform, sample_rate = torchaudio.load(_hide_seek(response["Body"]))
 plot_specgram(waveform, sample_rate, title="From S3")
@@ -271,13 +287,14 @@ frame_offset, num_frames = 16000, 16000  # Fetch and decode the 1 - 2 seconds
 url = "https://download.pytorch.org/torchaudio/tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
 print("Fetching all the data...")
 with requests.get(url, stream=True) as response:
-    waveform1, sample_rate1 = torchaudio.load(response.raw)
+    waveform1, sample_rate1 = torchaudio.load(_hide_seek(response.raw))
    waveform1 = waveform1[:, frame_offset : frame_offset + num_frames]
    print(f" - Fetched {response.raw.tell()} bytes")
 print("Fetching until the requested frames are available...")
 with requests.get(url, stream=True) as response:
-    waveform2, sample_rate2 = torchaudio.load(response.raw, frame_offset=frame_offset, num_frames=num_frames)
+    waveform2, sample_rate2 = torchaudio.load(
+        _hide_seek(response.raw), frame_offset=frame_offset, num_frames=num_frames)
    print(f" - Fetched {response.raw.tell()} bytes")
 print("Checking the resulting waveform ... ", end="")
@@ -351,11 +368,11 @@ with tempfile.TemporaryDirectory() as tempdir:
 formats = [
    "flac",
-    "vorbis",
+    # "vorbis",
-    "sph",
+    # "sph",
-    "amb",
+    # "amb",
-    "amr-nb",
+    # "amr-nb",
-    "gsm",
+    # "gsm",
 ]
 ######################################################################
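
The `_hide_seek` helper added in this diff wraps a file-like object so that only `read` is exposed. The commit does not state the reason in detail, so the following is only a sketch of what the wrapper does, using an in-memory `io.BytesIO` stream as a stand-in for `response.raw` (the byte string is made-up placeholder data, not a valid audio file).

```python
import io


def _hide_seek(obj):
    """Return a wrapper around ``obj`` that exposes only ``read``."""

    class _wrapper:
        def __init__(self, obj):
            self.obj = obj

        def read(self, n):
            # Forward reads to the wrapped object unchanged.
            return self.obj.read(n)

    return _wrapper(obj)


# Placeholder bytes standing in for streamed audio data.
stream = io.BytesIO(b"RIFF....WAVEfmt ")
wrapped = _hide_seek(stream)

# The original stream is seekable; the wrapper has no ``seek`` attribute,
# so code that checks for ``seek`` will treat it as a read-only stream.
original_has_seek = hasattr(stream, "seek")
wrapped_has_seek = hasattr(wrapped, "seek")

first_chunk = wrapped.read(4)
```

Because the wrapper shares the underlying object, reads through `wrapped` advance the position of `stream`, which is why the tutorial can still call `response.raw.tell()` after loading.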