Unverified Commit 2381dd89 authored by moto's avatar moto Committed by GitHub
Browse files

Update documentation and fix docstrings (#788)

- Addresses #549 #638 #786 
- Add `torchaudio` top level module doc
- Separate `torchaudio` top level module doc from `index.html`
- Add `backend` module doc.
- Remove `-> None` from function signature as it adds noise to documentation
- Changed function argument name of `torchaudio.backend.sox_io_backend.save` from `tensor` to `src`, so that it matches with the reset of backends.
- Tweak bunch of docstrings
parent 937d52f8
.. _backend:
torchaudio.backend
==================
:mod:`torchaudio.backend` module provides implementations for audio file I/O, using different backend libraries.
To switch backend, use :py:func:`torchaudio.set_audio_backend`. To check the current backend use :py:func:`torchaudio.get_audio_backend`.
.. warning::
Although ``sox`` backend is default for backward compatibility reason, it has a number of issues, therefore it is highly recommended to use ``sox_io`` backend instead. Note, however, that due to the interface refinement, functions defined in ``sox`` backend and those defined in ``sox_io`` backend do not have the same signatures.
.. note::
Instead of calling functions in :mod:`torchaudio.backend` directly, please use ``torchaudio.info``, ``torhcaudio.load``, ``torchaudio.load_wav`` and ``torchaudio.save`` with proper backend set with :func:`torchaudio.get_audio_backend`.
There are currently three implementations available.
* :ref:`sox<sox_backend>`
* :ref:`sox_io<sox_io_backend>`
* :ref:`soundfile<soundfile_backend>`
``sox`` backend is the original backend which is built on ``libsox``. This module is currently default but is known to have number of issues, such as wrong handling of WAV files other than 16-bit signed integer. Users are encouraged to use ``sox_io`` backend. This backend requires C++ extension module and is not available on Windows system.
``sox_io`` backend is the new backend which is built on ``libsox`` and bound to Python with ``Torchscript``. This module addresses all the known issues ``sox`` backend has. Function calls to this backend can be Torchscriptable. This backend requires C++ extension module and is not available on Windows system.
``soundfile`` backend is built on ``PySoundFile``. You need to install ``PySoundFile`` separately.
Common Data Structure
~~~~~~~~~~~~~~~~~~~~~
Structures used to exchange data between Python interface and ``libsox``. They are used by :ref:`sox<sox_backend>` and :ref:`soundfile<soundfile_backend>` but not by :ref:`sox_io<sox_io_backend>`.
.. autoclass:: torchaudio.backend.common.SignalInfo
.. autoclass:: torchaudio.backend.common.EncodingInfo
.. _sox_backend:
Sox Backend
~~~~~~~~~~~
``sox`` backend is available on ``torchaudio`` installation with C++ extension. It is currently not available on Windows system.
It is currently default backend when it's available. You can switch from another backend to ``sox`` backend with the following;
.. code::
torchaudio.set_audio_backend("sox")
info
----
.. autofunction:: torchaudio.backend.sox_backend.info
load
----
.. autofunction:: torchaudio.backend.sox_backend.load
.. autofunction:: torchaudio.backend.sox_backend.load_wav
save
----
.. autofunction:: torchaudio.backend.sox_backend.save
others
------
.. automodule:: torchaudio.backend.sox_backend
:members:
:exclude-members: info, load, load_wav, save
.. _sox_io_backend:
Sox IO Backend
~~~~~~~~~~~~~~
``sox_io`` backend is available on ``torchaudio`` installation with C++ extension. It is currently not available on Windows system.
This new backend is recommended over ``sox`` backend. You can switch from another backend to ``sox_io`` backend with the following;
.. code::
torchaudio.set_audio_backend("sox_io")
The function call to this backend can be Torchsript-able. You can apply :func:`torch.jit.script` and dump the object to file, then call it from C++ application.
info
----
.. autoclass:: torchaudio.backend.sox_io_backend.AudioMetaData
.. autofunction:: torchaudio.backend.sox_io_backend.info
load
----
.. autofunction:: torchaudio.backend.sox_io_backend.load
.. autofunction:: torchaudio.backend.sox_io_backend.load_wav
save
----
.. autofunction:: torchaudio.backend.sox_io_backend.save
.. _soundfile_backend:
Soundfile Backend
~~~~~~~~~~~~~~~~~
``soundfile`` backend is available when ``PySoundFile`` is installed. This backend works on ``torchaudio`` installation without C++ extension. (i.e. Windows)
You can switch from another backend to ``soundfile`` backend with the following;
.. code::
torchaudio.set_audio_backend("soundfile")
info
----
.. autofunction:: torchaudio.backend.soundfile_backend.info
load
----
.. autofunction:: torchaudio.backend.soundfile_backend.load
.. autofunction:: torchaudio.backend.soundfile_backend.load_wav
save
----
.. autofunction:: torchaudio.backend.soundfile_backend.save
torchaudio
===========
==========
The :mod:`torchaudio` package consists of I/O, popular datasets and common audio transformations.
......@@ -7,13 +7,13 @@ The :mod:`torchaudio` package consists of I/O, popular datasets and common audio
:maxdepth: 2
:caption: Package Reference
sox_effects
torchaudio
backend
functional
transforms
datasets
models
sox_effects
compliance.kaldi
kaldi_io
transforms
functional
utils
.. automodule:: torchaudio
:members:
.. role:: hidden
:class: hidden-section
.. _sox_effects:
torchaudio.sox_effects
======================
.. currentmodule:: torchaudio.sox_effects
.. warning::
The :py:class:`SoxEffect` and :py:class:`SoxEffectsChain` classes are deprecated. Please migrate to :func:`apply_effects_tensor` and :func:`apply_effects_file`.
Resource initialization / shutdown
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: init_sox_effects
.. autofunction:: shutdown_sox_effects
Listing supported effects
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: effect_names
Applying effects
~~~~~~~~~~~~~~~~
Apply SoX effects chain on torch.Tensor or on file and load as torch.Tensor.
Applying effects on Tensor
--------------------------
.. autofunction:: apply_effects_tensor
Applying effects on file
------------------------
.. autofunction:: apply_effects_file
Create SoX effects chain for preprocessing audio.
Legacy
~~~~~~
:hidden:`SoxEffect`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SoxEffect
---------
.. autoclass:: SoxEffect
:members:
:hidden:`SoxEffectsChain`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SoxEffectsChain
---------------
.. autoclass:: SoxEffectsChain
:members: append_effect_to_chain, sox_build_flow_effects, clear_chain, set_input_file
torchaudio
==========
I/O functionalities
~~~~~~~~~~~~~~~~~~~
Audio I/O functions are implemented in :ref:`torchaudio.backend<backend>` module, but for the ease of use, the following functions are made available on :mod:`torchaudio` module. There are different backends available and you can switch backends with :func:`set_audio_backend`.
Refer to :ref:`backend` for the detail.
.. function:: torchaudio.info(filepath: str, ...)
Fetch meta data of an audio file. Refer to :ref:`backend` for the detail.
.. function:: torchaudio.load(filepath: str, ...)
Load audio file into torch.Tensor object. Refer to :ref:`backend` for the detail.
.. function:: torchaudio.load_wav(filepath: str, ...)
Load audio file into torch.Tensor, Refer to :ref:`backend` for the detail.
.. function:: torchaudio.save(filepath: str, src: torch.Tensor, sample_rate: int, ...)
Save torch.Tensor object into an audio format. Refer to :ref:`backend` for the detail.
.. currentmodule:: torchaudio
Backend Utilities
~~~~~~~~~~~~~~~~~
.. autofunction:: list_audio_backends
.. autofunction:: get_audio_backend
.. autofunction:: set_audio_backend
Sox Effects Utilities
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: initialize_sox
.. autofunction:: shutdown_sox
.. role:: hidden
:class: hidden-section
torchaudio.utils
================
torchaudio.utils.sox_utils
==========================
~~~~~~~~~~~~~~~~~~~~~~~~~~
Utility module to configure libsox. This affects functionalities in ``sox_io`` backend and ``torchaudio.sox_effects``.
Utility module to configure libsox.
This affects functionalities in :ref:`Sox IO backend<sox_io_backend>` and :ref:`Sox Effects<sox_effects>`.
.. currentmodule:: torchaudio.utils.sox_utils
.. autofunction:: set_seed
.. autofunction:: set_verbosity
.. autofunction:: set_buffer_size
.. autofunction:: set_use_threads
.. autofunction:: list_effects
.. autofunction:: list_formats
.. automodule:: torchaudio.utils.sox_utils
:members:
......@@ -35,10 +35,10 @@ except ImportError:
@_mod_utils.deprecated(
"Please remove the function call to initialize_sox. "
"Resource initialization is now automatically handled.")
def initialize_sox() -> int:
def initialize_sox():
"""Initialize sox effects.
This function is deprecated. See ``torchaudio.sox_effects.init_sox_effects``
This function is deprecated. See :py:func:`torchaudio.sox_effects.init_sox_effects`
"""
_init_sox_effects()
......@@ -51,6 +51,6 @@ def initialize_sox() -> int:
def shutdown_sox():
"""Shutdown sox effects.
This function is deprecated. See ``torchaudio.sox_effects.shutdown_sox_effects``
This function is deprecated. See :py:func:`torchaudio.sox_effects.shutdown_sox_effects`
"""
_shutdown_sox_effects()
......@@ -2,6 +2,18 @@ from typing import Any, Optional
class SignalInfo:
"""Data class returned ``info`` functions.
Used by :ref:`sox backend<sox_backend>` and :ref:`soundfile backend<soundfile_backend>`
See https://fossies.org/dox/sox-14.4.2/structsox__signalinfo__t.html
:ivar Optional[int] channels: The number of channels
:ivar Optional[float] rate: Sampleing rate
:ivar Optional[int] precision: Bit depth
:ivar Optional[int] length: For :ref:`sox backend<sox_backend>`, the number of samples.
(frames * channels). For :ref:`soundfile backend<soundfile_backend>`, the number of frames.
"""
def __init__(self,
channels: Optional[int] = None,
rate: Optional[float] = None,
......@@ -14,6 +26,20 @@ class SignalInfo:
class EncodingInfo:
"""Data class returned ``info`` functions.
Used by :ref:`sox backend<sox_backend>` and :ref:`soundfile backend<soundfile_backend>`
See https://fossies.org/dox/sox-14.4.2/structsox__encodinginfo__t.html
:ivar Optional[int] encoding: sox_encoding_t
:ivar Optional[int] bits_per_sample: bit depth
:ivar Optional[float] compression: Compression option
:ivar Any reverse_bytes:
:ivar Any reverse_nibbles:
:ivar Any reverse_bits:
:ivar Optional[bool] opposite_endian:
"""
def __init__(self,
encoding: Any = None,
bits_per_sample: Optional[int] = None,
......
......@@ -7,6 +7,12 @@ from torchaudio._internal import (
class AudioMetaData:
"""Data class to be returned by :py:func:`~torchaudio.backend.sox_io_backend.info`.
:ivar int sample_rate: Sample rate
:ivar int num_frames: The number of frames
:ivar int num_channels: The number of channels
"""
def __init__(self, sample_rate: int, num_frames: int, num_channels: int):
self.sample_rate = sample_rate
self.num_frames = num_frames
......@@ -15,7 +21,14 @@ class AudioMetaData:
@_mod_utils.requires_module('torchaudio._torchaudio')
def info(filepath: str) -> AudioMetaData:
"""Get signal information of an audio file."""
"""Get signal information of an audio file.
Args:
filepath (str): Path to audio file
Returns:
AudioMetaData: meta data of the given audio.
"""
sinfo = torch.ops.torchaudio.sox_io_get_info(filepath)
return AudioMetaData(sinfo.get_sample_rate(), sinfo.get_num_frames(), sinfo.get_num_channels())
......@@ -30,21 +43,28 @@ def load(
) -> Tuple[torch.Tensor, int]:
"""Load audio data from file.
This function can handle all the codecs that underlying libsox can handle, however note the
followings.
Note:
This function is tested on the following formats;
- WAV
- 32-bit floating-point
- 32-bit signed integer
- 16-bit signed integer
- 8-bit unsigned integer
- MP3
- FLAC
- OGG/VORBIS
By default, this function returns Tensor with ``float32`` dtype and the shape of ``[channel, time]``.
This function can handle all the codecs that underlying libsox can handle,
however it is tested on the following formats;
* WAV
* 32-bit floating-point
* 32-bit signed integer
* 16-bit signed integer
* 8-bit unsigned integer
* MP3
* FLAC
* OGG/VORBIS
* OPUS
To load ``MP3``, ``FLAC``, ``OGG/VORBIS``, ``OPUS`` and other codecs ``libsox`` does not
handle natively, your installation of ``torchaudio`` has to be linked to ``libsox``
and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
``float32`` dtype and the shape of ``[channel, time]``.
The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
......@@ -54,24 +74,33 @@ def load(
for 32-bit signed PCM, ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM.
``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
flac and mp3. For these formats, this function always returns ``float32`` Tensor with values
normalized to ``[-1.0, 1.0]``.
``flac`` and ``mp3``.
For these formats, this function always returns ``float32`` Tensor with values normalized to
``[-1.0, 1.0]``.
Args:
filepath: Path to audio file
frame_offset: Number of frames to skip before start reading data.
num_frames: Maximum number of frames to read. -1 reads all the remaining samples, starting
from ``frame_offset``. This function may return the less number of frames if there is
not enough frames in the given file.
normalize: When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``. If input file is integer WAV, giving ``False`` will change
the resulting Tensor type to integer type. This argument has no effect for formats other
than integer WAV type.
channels_first: When True, the returned Tensor has dimension ``[channel, time]``.
filepath (str):
Path to audio file
frame_offset (int):
Number of frames to skip before start reading data.
num_frames (int):
Maximum number of frames to read. ``-1`` reads all the remaining samples,
starting from ``frame_offset``.
This function may return the less number of frames if there is not enough
frames in the given file.
normalize (bool):
When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type.
This argument has no effect for formats other than integer WAV type.
channels_first (bool):
When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``.
Returns:
torch.Tensor: If the input file has integer wav format and normalization is off, then it has
torch.Tensor:
If the input file has integer wav format and normalization is off, then it has
integer type, else ``float32`` type. If ``channels_first=True``, it has
``[channel, time]`` else ``[time, channel]``.
"""
......@@ -83,37 +112,49 @@ def load(
@_mod_utils.requires_module('torchaudio._torchaudio')
def save(
filepath: str,
tensor: torch.Tensor,
src: torch.Tensor,
sample_rate: int,
channels_first: bool = True,
compression: Optional[float] = None,
):
"""Save audio data to file.
Note:
Supported formats are;
- WAV
- 32-bit floating-point
- 32-bit signed integer
- 16-bit signed integer
- 8-bit unsigned integer
- MP3
- FLAC
- OGG/VORBIS
* WAV
* 32-bit floating-point
* 32-bit signed integer
* 16-bit signed integer
* 8-bit unsigned integer
* MP3
* FLAC
* OGG/VORBIS
To save ``MP3``, ``FLAC``, ``OGG/VORBIS``, and other codecs ``libsox`` does not
handle natively, your installation of ``torchaudio`` has to be linked to ``libsox``
and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
Args:
filepath: Path to save file.
tensor: Audio data to save. must be 2D tensor.
sample_rate: sampling rate
channels_first: If True, the given tensor is interpreted as ``[channel, time]``.
compression: Used for formats other than WAV. This corresponds to ``-C`` option
of ``sox`` command.
filepath (str): Path to save file.
tensor (torch.Tensor): Audio data to save. must be 2D tensor.
sample_rate (int): sampling rate
channels_first (bool):
If ``True``, the given tensor is interpreted as ``[channel, time]``,
otherwise ``[time, channel]``.
compression (Optional[float]):
Used for formats other than WAV. This corresponds to ``-C`` option of ``sox`` command.
* | ``MP3``: Either bitrate (in ``kbps``) with quality factor, such as ``128.2``, or
| VBR encoding with quality factor such as ``-4.2``. Default: ``-4.5``.
* | ``FLAC``: compression level. Whole number from ``0`` to ``8``.
| ``8`` is default and highest compression.
* | ``OGG/VORBIS``: number from ``-1`` to ``10``; ``-1`` is the highest compression
| and lowest quality. Default: ``3``.
See the detail at http://sox.sourceforge.net/soxformat.html.
- MP3: Either bitrate [kbps] with quality factor, such as ``128.2`` or
VBR encoding with quality factor such as ``-4.2``. Default: ``-4.5``
- FLAC: compression level. Whole number from ``0`` to ``8``.
``8`` is default and highest compression.
- OGG/VORBIS: number from -1 to 10; -1 is the highest compression and lowest
quality. Default: ``3``.
"""
if compression is None:
ext = str(filepath)[-3:].lower()
......@@ -127,8 +168,22 @@ def save(
compression = 3.
else:
raise RuntimeError(f'Unsupported file type: "{ext}"')
signal = torch.classes.torchaudio.TensorSignal(tensor, sample_rate, channels_first)
signal = torch.classes.torchaudio.TensorSignal(src, sample_rate, channels_first)
torch.ops.torchaudio.sox_io_save_audio_file(filepath, signal, compression)
load_wav = load
@_mod_utils.requires_module('torchaudio._torchaudio')
def load_wav(
filepath: str,
frame_offset: int = 0,
num_frames: int = -1,
channels_first: bool = True,
) -> Tuple[torch.Tensor, int]:
"""Load wave file.
This function is defined only for the purpose of compatibility against other backend
for simple usecases, such as ``torchaudio.load_wav(filepath)``.
The implementation is same as :py:func:`load`.
"""
return load(filepath, frame_offset, num_frames, normalize=False, channels_first=channels_first)
......@@ -19,7 +19,11 @@ __all__ = [
def list_audio_backends() -> List[str]:
"""List available backends"""
"""List available backends
Returns:
List[str]: The list of available backends.
"""
backends = []
if is_module_available('soundfile'):
backends.append('soundfile')
......@@ -29,12 +33,13 @@ def list_audio_backends() -> List[str]:
return backends
def set_audio_backend(backend: Optional[str]) -> None:
def set_audio_backend(backend: Optional[str]):
"""Set the backend for I/O operation
Args:
backend (str): Name of the backend. One of "sox" or "soundfile",
based on availability of the system.
backend (Optional[str]): Name of the backend.
One of ``"sox"``, ``"sox_io"`` or ``"soundfile"`` based on availability
of the system. If ``None`` is provided the current backend is unassigned.
"""
if backend is not None and backend not in list_audio_backends():
raise RuntimeError(
......@@ -68,7 +73,11 @@ def _init_audio_backend():
def get_audio_backend() -> Optional[str]:
"""Get the name of the current backend"""
"""Get the name of the current backend
Returns:
Optional[str]: The name of the current backend or ``None`` if no backend is assigned.
"""
if torchaudio.load == no_backend.load:
return None
if torchaudio.load == sox_backend.load:
......
......@@ -1646,11 +1646,11 @@ def compute_deltas(
r"""Compute delta coefficients of a tensor, usually a spectrogram:
.. math::
d_t = \frac{\sum_{n=1}^{\text{N}} n (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{\text{N} n^2}
d_t = \frac{\sum_{n=1}^{\text{N}} n (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{\text{N}} n^2}
where :math:`d_t` is the deltas at time :math:`t`,
:math:`c_t` is the spectrogram coeffcients at time :math:`t`,
:math:`N` is (`win_length`-1)//2.
:math:`N` is ``(win_length-1)//2``.
Args:
specgram (Tensor): Tensor of audio of dimension (..., freq, time)
......
from typing import Optional
from torch import Tensor
from torch import nn
......@@ -7,8 +5,9 @@ __all__ = ["Wav2Letter"]
class Wav2Letter(nn.Module):
r"""Wav2Letter model architecture from the `"Wav2Letter: an End-to-End ConvNet-based Speech Recognition System"
<https://arxiv.org/abs/1609.03193>`_ paper.
r"""Wav2Letter model architecture from the `Wav2Letter an End-to-End ConvNet-based Speech Recognition System`_.
.. _Wav2Letter an End-to-End ConvNet-based Speech Recognition System: https://arxiv.org/abs/1609.03193
:math:`\text{padding} = \frac{\text{ceil}(\text{kernel} - \text{stride})}{2}`
......@@ -63,7 +62,7 @@ class Wav2Letter(nn.Module):
def forward(self, x: Tensor) -> Tensor:
r"""
Args:
x (Tensor): Tensor of dimension (batch_size, num_features, input_length).
x (torch.Tensor): Tensor of dimension (batch_size, num_features, input_length).
Returns:
Tensor: Predictor tensor of dimension (batch_size, number_of_classes, input_length).
......
......@@ -15,31 +15,29 @@ if _mod_utils.is_module_available('torchaudio._torchaudio'):
@_mod_utils.requires_module('torchaudio._torchaudio')
def init_sox_effects() -> None:
"""Initialize resources required to use ``SoxEffectsChain``
def init_sox_effects():
"""Initialize resources required to use sox effects.
Note:
You do not need to call this function manually. It is called automatically.
Once initialized, you do not need to call this function again across the multiple call of
``SoxEffectsChain.sox_build_flow_effects``, though it is safe to do so as long as
``shutdown_sox_effects`` is not called yet.
Once ``shutdown_sox_effects`` is called, you can no longer use SoX effects and
initializing again will result in error.
Note:
This function is not required for simple loading.
Once initialized, you do not need to call this function again across the multiple uses of
sox effects though it is safe to do so as long as :func:`shutdown_sox_effects` is not called yet.
Once :func:`shutdown_sox_effects` is called, you can no longer use SoX effects and initializing
again will result in error.
"""
torch.ops.torchaudio.sox_effects_initialize_sox_effects()
@_mod_utils.requires_module("torchaudio._torchaudio")
def shutdown_sox_effects() -> None:
"""Clean up resources required to use ``SoxEffectsChain``
def shutdown_sox_effects():
"""Clean up resources required to use sox effects.
Note:
You do not need to call this function manually. It is called automatically.
It is safe to call this function multiple times.
Once ``shutdown_sox_effects`` is called, you can no longer use SoX effects and
Once :py:func:`shutdown_sox_effects` is called, you can no longer use SoX effects and
initializing again will result in error.
"""
torch.ops.torchaudio.sox_effects_shutdown_sox_effects()
......@@ -49,10 +47,12 @@ def shutdown_sox_effects() -> None:
def effect_names() -> List[str]:
"""Gets list of valid sox effect names
Returns: list[str]
Returns:
List[str]: list of available effect names.
Example
>>> EFFECT_NAMES = torchaudio.sox_effects.effect_names()
>>> torchaudio.sox_effects.effect_names()
['allpass', 'band', 'bandpass', ... ]
"""
return list(list_effects().keys())
......@@ -66,6 +66,13 @@ def apply_effects_tensor(
) -> Tuple[torch.Tensor, int]:
"""Apply sox effects to given Tensor
Note:
This function works in the way very similar to ``sox`` command, however there are slight
differences. For example, ``sox`` commnad adds certain effects automatically (such as
``rate`` effect after ``speed`` and ``pitch`` and other effects), but this function does
only applies the given effects. (Therefore, to actually apply ``speed`` effect, you also
need to give ``rate`` effect with desired sampling rate.)
Args:
tensor (torch.Tensor): Input 2D Tensor.
sample_rate (int): Sample rate
......@@ -79,20 +86,15 @@ def apply_effects_tensor(
the same channels order. The shape of the Tensor can be different based on the
effects applied. Sample rate can also be different based on the effects applied.
Notes:
This function works in the way very similar to ``sox`` command, however there are slight
differences. For example, ``sox`` commnad adds certain effects automatically (such as
``rate`` effect after ``speed`` and ``pitch`` and other effects), but this function does
only applies the given effects. (Therefore, to actually apply ``speed`` effect, you also
need to give ``rate`` effect with desired sampling rate.)
Examples:
Example - Basic usage
>>>
>>> # Defines the effects to apply
>>> effects = [
... ['gain', '-n'], # normalises to 0dB
... ['pitch', '5'], # 5 cent pitch shift
... ['rate', '8000'], # resample to 8000 Hz
... ]
>>>
>>> # Generate pseudo wave:
>>> # normalized, channels first, 2ch, sampling rate 16000, 1 second
>>> sample_rate = 16000
......@@ -102,9 +104,12 @@ def apply_effects_tensor(
>>> waveform
tensor([[ 0.3138, 0.7620, -0.9019, ..., -0.7495, -0.4935, 0.5442],
[-0.0832, 0.0061, 0.8233, ..., -0.5176, -0.9140, -0.2434]])
>>>
>>> # Apply effects
>>> waveform, sample_rate = apply_effects_tensor(
... wave_form, sample_rate, effects, channels_first=True)
>>>
>>> # Check the result
>>> # The new waveform is sampling rate 8000, 1 second.
>>> # normalization and channel order are preserved
>>> waveform.shape
......@@ -114,6 +119,40 @@ def apply_effects_tensor(
[ 0.1331, 0.0436, -0.3783, ..., -0.0035, 0.0012, 0.0008]])
>>> sample_rate
8000
Example - Torchscript-able transform
>>>
>>> # Use `apply_effects_tensor` in `torch.nn.Module` and dump it to file,
>>> # then run sox effect via Torchscript runtime.
>>>
>>> class SoxEffectTransform(torch.nn.Module):
... effects: List[List[str]]
...
... def __init__(self, effects: List[List[str]]):
... super().__init__()
... self.effects = effects
...
... def forward(self, tensor: torch.Tensor, sample_rate: int):
... return sox_effects.apply_effects_tensor(
... tensor, sample_rate, self.effects)
...
...
>>> # Create transform object
>>> effects = [
... ["lowpass", "-1", "300"], # apply single-pole lowpass filter
... ["rate", "8000"], # change sample rate to 8000
... ]
>>> transform = SoxEffectTensorTransform(effects, input_sample_rate)
>>>
>>> # Dump it to file and load
>>> path = 'sox_effect.zip'
>>> torch.jit.script(trans).save(path)
>>> transform = torch.jit.load(path)
>>>
>>>> # Run transform
>>> waveform, input_sample_rate = torchaudio.load("input.wav")
>>> waveform, sample_rate = transform(waveform, input_sample_rate)
>>> assert sample_rate == 8000
"""
in_signal = torch.classes.torchaudio.TensorSignal(tensor, sample_rate, channels_first)
out_signal = torch.ops.torchaudio.sox_effects_apply_effects_tensor(in_signal, effects)
......@@ -129,12 +168,22 @@ def apply_effects_file(
) -> Tuple[torch.Tensor, int]:
"""Apply sox effects to the audio file and load the resulting data as Tensor
Note:
This function works in the way very similar to ``sox`` command, however there are slight
differences. For example, ``sox`` commnad adds certain effects automatically (such as
``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate``
effect with desired sampling rate, because internally, ``speed`` effects only alter sampling
rate and leave samples untouched.
Args:
path (str): Path to the audio file.
effects (List[List[str]]): List of effects.
normalize (bool): When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``. If input file is integer WAV, giving ``False`` will change
the resulting Tensor type to integer type. This argument has no effect for formats other
normalize (bool):
When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type. This argument has no effect for formats other
than integer WAV type.
channels_first (bool): When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``.
......@@ -147,23 +196,19 @@ def apply_effects_file(
If ``channels_first=True``, the resulting Tensor has dimension ``[channel, time]``,
otherwise ``[time, channel]``.
Notes:
This function works in the way very similar to ``sox`` command, however there are slight
differences. For example, ``sox`` commnad adds certain effects automatically (such as
``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate``
effect with desired sampling rate, because internally, ``speed`` effects only alter sampling
rate and leave samples untouched.
Examples:
Example - Basic usage
>>>
>>> # Defines the effects to apply
>>> effects = [
... ['gain', '-n'], # normalises to 0dB
... ['pitch', '5'], # 5 cent pitch shift
... ['rate', '8000'], # resample to 8000 Hz
... ]
>>>
>>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
......@@ -173,6 +218,42 @@ def apply_effects_file(
-5.6159e-07, 4.8103e-07]])
>>> sample_rate
8000
Example - Apply random speed perturbation to dataset
>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
... \"\"\"Given flist, apply random speed perturbation
...
... Suppose all the input files are at least one second long.
... \"\"\"
... def __init__(self, flist: List[str], sample_rate: int):
... super().__init__()
... self.flist = flist
... self.sample_rate = sample_rate
... self.rng = None
...
... def __getitem__(self, index):
... speed = self.rng.uniform(0.5, 2.0)
... effects = [
... ['gain', '-n', '-10'], # apply 10 db attenuation
... ['remix', '-'], # merge all the channels
... ['speed', f'{speed:.5f}'], # duration is now 0.5 ~ 2.0 seconds.
... ['rate', f'{self.sample_rate}'],
... ['pad', '0', '1.5'], # add 1.5 seconds silence at the end
... ['trim', '0', '2'], # get the first 2 seconds
... ]
... waveform, _ = torchaudio.sox_effects.apply_effects_file(
... self.flist[index], effects)
... return waveform
...
... def __len__(self):
... return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
>>> pass
"""
signal = torch.ops.torchaudio.sox_effects_apply_effects_file(path, effects, normalize, channels_first)
return signal.get_tensor(), signal.get_sample_rate()
......@@ -183,7 +264,7 @@ def apply_effects_file(
def SoxEffect():
r"""Create an object for passing sox effect information between python and c++
Note:
Warning:
This function is deprecated.
Please migrate to :func:`apply_effects_file` or :func:`apply_effects_tensor`.
......@@ -198,50 +279,56 @@ def SoxEffect():
class SoxEffectsChain(object):
r"""SoX effects chain class.
Note:
Warning:
This class is deprecated.
Please migrate to :func:`apply_effects_file` or :func:`apply_effects_tensor`.
Args:
normalization (bool, number, or callable, optional): If boolean `True`, then output is divided by `1 << 31`
(assumes signed 32-bit audio), and normalizes to `[-1, 1]`. If `number`, then output is divided by that
number. If `callable`, then the output is passed as a parameter to the given function, then the
output is divided by the result. (Default: ``True``)
channels_first (bool, optional): Set channels first or length first in result. (Default: ``True``)
out_siginfo (sox_signalinfo_t, optional): a sox_signalinfo_t type, which could be helpful if the
audio type cannot be automatically determined. (Default: ``None``)
out_encinfo (sox_encodinginfo_t, optional): a sox_encodinginfo_t type, which could be set if the
audio type cannot be automatically determined. (Default: ``None``)
filetype (str, optional): a filetype or extension to be set if sox cannot determine it
automatically. . (Default: ``'raw'``)
normalization (bool, number, or callable, optional):
If boolean ``True``, then output is divided by ``1 << 31``
(assumes signed 32-bit audio), and normalizes to ``[-1, 1]``.
If ``number``, then output is divided by that number.
If ``callable``, then the output is passed as a parameter to the given function, then
the output is divided by the result. (Default: ``True``)
channels_first (bool, optional):
Set channels first or length first in result. (Default: ``True``)
out_siginfo (sox_signalinfo_t, optional):
a sox_signalinfo_t type, which could be helpful if the audio type cannot be
automatically determined. (Default: ``None``)
out_encinfo (sox_encodinginfo_t, optional):
a sox_encodinginfo_t type, which could be set if the audio type cannot be
automatically determined. (Default: ``None``)
filetype (str, optional):
a filetype or extension to be set if sox cannot determine it automatically.
(Default: ``'raw'``)
Returns:
Tuple[Tensor, int]: An output Tensor of size `[C x L]` or `[L x C]` where L is the number
Tuple[Tensor, int]:
An output Tensor of size ``[C x L]`` or ``[L x C]`` where L is the number
of audio frames and C is the number of channels. An integer which is the sample rate of the
audio (as listed in the metadata of the file)
Example
>>> class MyDataset(Dataset):
>>> def __init__(self, audiodir_path):
>>> self.data = [os.path.join(audiodir_path, fn) for fn in os.listdir(audiodir_path)]
>>> self.E = torchaudio.sox_effects.SoxEffectsChain()
>>> self.E.append_effect_to_chain("rate", [16000]) # resample to 16000hz
>>> self.E.append_effect_to_chain("channels", ["1"]) # mono signal
>>> def __getitem__(self, index):
>>> fn = self.data[index]
>>> self.E.set_input_file(fn)
>>> x, sr = self.E.sox_build_flow_effects()
>>> return x, sr
>>>
>>> def __len__(self):
>>> return len(self.data)
>>>
>>> torchaudio.initialize_sox()
... def __init__(self, audiodir_path):
... self.data = [
... os.path.join(audiodir_path, fn)
... for fn in os.listdir(audiodir_path)]
... self.E = torchaudio.sox_effects.SoxEffectsChain()
... self.E.append_effect_to_chain("rate", [16000]) # resample to 16000hz
... self.E.append_effect_to_chain("channels", ["1"]) # mono signal
... def __getitem__(self, index):
... fn = self.data[index]
... self.E.set_input_file(fn)
... x, sr = self.E.sox_build_flow_effects()
... return x, sr
...
... def __len__(self):
... return len(self.data)
...
>>> ds = MyDataset(path_to_audio_files)
>>> for sig, sr in ds:
>>> [do something here]
>>> torchaudio.shutdown_sox()
... pass
"""
EFFECTS_UNIMPLEMENTED = {"spectrogram", "splice", "noiseprof", "fir"}
......@@ -298,9 +385,9 @@ class SoxEffectsChain(object):
out (Tensor, optional): Where the output will be written to. (Default: ``None``)
Returns:
Tuple[Tensor, int]: An output Tensor of size `[C x L]` or `[L x C]` where L is the number
of audio frames and C is the number of channels. An integer which is the sample rate of the
audio (as listed in the metadata of the file)
Tuple[Tensor, int]: An output Tensor of size `[C x L]` or `[L x C]` where
L is the number of audio frames and C is the number of channels.
An integer which is the sample rate of the audio (as listed in the metadata of the file)
"""
# initialize output tensor
if out is not None:
......
......@@ -86,20 +86,8 @@ class Spectrogram(torch.nn.Module):
class GriffinLim(torch.nn.Module):
r"""Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.
Implementation ported from `librosa`.
.. [1] McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto.
"librosa: Audio and music signal analysis in python."
In Proceedings of the 14th python in science conference, pp. 18-25. 2015.
.. [2] Perraudin, N., Balazs, P., & Søndergaard, P. L.
"A fast Griffin-Lim algorithm,"
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 1-4),
Oct. 2013.
.. [3] D. W. Griffin and J. S. Lim,
"Signal estimation from modified short-time Fourier transform,"
IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
Implementation ported from ``librosa`` [1]_, [2]_, [3]_.
Args:
n_fft (int, optional): Size of FFT, creates ``n_fft // 2 + 1`` bins. (Default: ``400``)
......@@ -117,6 +105,24 @@ class GriffinLim(torch.nn.Module):
Values near 1 can lead to faster convergence, but above 1 may not converge. (Default: ``0.99``)
length (int, optional): Array length of the expected output. (Default: ``None``)
rand_init (bool, optional): Initializes phase randomly if True and to zero otherwise. (Default: ``True``)
References:
.. [1]
| McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg,
and Oriol Nieto.
| "librosa: Audio and music signal analysis in python."
| In Proceedings of the 14th python in science conference, pp. 18-25. 2015.
.. [2]
| Perraudin, N., Balazs, P., & Søndergaard, P. L.
| "A fast Griffin-Lim algorithm,"
| IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 1-4),
| Oct. 2013.
.. [3]
| D. W. Griffin and J. S. Lim,
| "Signal estimation from modified short-time Fourier transform,"
| IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
"""
__constants__ = ['n_fft', 'n_iter', 'win_length', 'hop_length', 'power', 'normalized',
'length', 'momentum', 'rand_init']
......@@ -153,7 +159,8 @@ class GriffinLim(torch.nn.Module):
def forward(self, specgram: Tensor) -> Tensor:
r"""
Args:
specgram (Tensor): A magnitude-only STFT spectrogram of dimension (..., freq, frames)
specgram (Tensor):
A magnitude-only STFT spectrogram of dimension (..., freq, frames)
where freq is ``n_fft // 2 + 1``.
Returns:
......
......@@ -55,8 +55,8 @@ def set_use_threads(use_threads: bool):
"""Set multithread option for sox effect chain
Args:
use_threads (bool): When True, enables libsox's parallel effects channels processing.
To use mutlithread, the underlying libsox has to be compiled with OpenMP support.
use_threads (bool): When ``True``, enables ``libsox``'s parallel effects channels processing.
To use mutlithread, the underlying ``libsox`` has to be compiled with OpenMP support.
See Also:
http://sox.sourceforge.net/sox.html
......@@ -69,7 +69,7 @@ def list_effects() -> Dict[str, str]:
"""List the available sox effect names
Returns:
Dict[str, str]: Mapping from "effect name" to "usage"
Dict[str, str]: Mapping from ``effect name`` to ``usage``
"""
return dict(torch.ops.torchaudio.sox_utils_list_effects())
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment