Commit 2381dd89 authored by moto, committed by GitHub

Update documentation and fix docstrings (#788)

- Addresses #549 #638 #786 
- Add `torchaudio` top level module doc
- Separate `torchaudio` top level module doc from `index.html`
- Add `backend` module doc.
- Remove `-> None` from function signature as it adds noise to documentation
- Changed function argument name of `torchaudio.backend.sox_io_backend.save` from `tensor` to `src`, so that it matches the rest of the backends.
- Tweak bunch of docstrings
parent 937d52f8
.. _backend:
torchaudio.backend
==================
:mod:`torchaudio.backend` module provides implementations for audio file I/O, using different backend libraries.
To switch backend, use :py:func:`torchaudio.set_audio_backend`. To check the current backend use :py:func:`torchaudio.get_audio_backend`.
.. warning::
   Although the ``sox`` backend is the default for backward-compatibility reasons, it has a number of issues, so using the ``sox_io`` backend instead is highly recommended. Note, however, that due to interface refinement, the functions defined in the ``sox`` backend and those defined in the ``sox_io`` backend do not have the same signatures.

.. note::
   Instead of calling functions in :mod:`torchaudio.backend` directly, please use ``torchaudio.info``, ``torchaudio.load``, ``torchaudio.load_wav`` and ``torchaudio.save`` with the proper backend set with :func:`torchaudio.set_audio_backend`.

There are currently three implementations available.
* :ref:`sox<sox_backend>`
* :ref:`sox_io<sox_io_backend>`
* :ref:`soundfile<soundfile_backend>`
The ``sox`` backend is the original backend, built on ``libsox``. It is currently the default but is known to have a number of issues, such as incorrect handling of WAV files other than 16-bit signed integer. Users are encouraged to use the ``sox_io`` backend instead. This backend requires the C++ extension module and is not available on Windows.
The ``sox_io`` backend is the new backend, built on ``libsox`` and bound to Python with ``TorchScript``. It addresses all the known issues of the ``sox`` backend, and function calls to this backend are TorchScript-able. This backend requires the C++ extension module and is not available on Windows.
The ``soundfile`` backend is built on ``PySoundFile``, which you need to install separately.
Common Data Structure
~~~~~~~~~~~~~~~~~~~~~
Structures used to exchange data between Python interface and ``libsox``. They are used by :ref:`sox<sox_backend>` and :ref:`soundfile<soundfile_backend>` but not by :ref:`sox_io<sox_io_backend>`.
.. autoclass:: torchaudio.backend.common.SignalInfo
.. autoclass:: torchaudio.backend.common.EncodingInfo
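The ``length`` field of ``SignalInfo`` is documented with a different meaning per backend: the total number of samples (frames * channels) for ``sox``, and the number of frames for ``soundfile``. The following is a minimal sketch of normalizing both to a frame count. ``num_frames`` is a hypothetical helper written for illustration; it is not part of ``torchaudio``.

```python
def num_frames(length: int, channels: int, backend: str) -> int:
    """Convert ``SignalInfo.length`` to a frame count (hypothetical helper)."""
    if backend == "sox":
        # The sox backend reports the total number of samples: frames * channels.
        return length // channels
    if backend == "soundfile":
        # The soundfile backend already reports the number of frames.
        return length
    raise ValueError(f"SignalInfo is not used by backend: {backend!r}")


# One second of 16 kHz stereo audio, as each backend would report it.
print(num_frames(32000, 2, "sox"))        # 16000
print(num_frames(16000, 2, "soundfile"))  # 16000
```

Keeping the conversion in one place avoids mixing the two conventions when code has to work with either backend.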
.. _sox_backend:
Sox Backend
~~~~~~~~~~~
``sox`` backend is available on ``torchaudio`` installation with C++ extension. It is currently not available on Windows system.
It is currently the default backend when it is available. You can switch from another backend to the ``sox`` backend with the following:

.. code::

   torchaudio.set_audio_backend("sox")

info
----
.. autofunction:: torchaudio.backend.sox_backend.info
load
----
.. autofunction:: torchaudio.backend.sox_backend.load
.. autofunction:: torchaudio.backend.sox_backend.load_wav
save
----
.. autofunction:: torchaudio.backend.sox_backend.save
others
------
.. automodule:: torchaudio.backend.sox_backend
   :members:
   :exclude-members: info, load, load_wav, save

.. _sox_io_backend:
Sox IO Backend
~~~~~~~~~~~~~~
``sox_io`` backend is available on ``torchaudio`` installation with C++ extension. It is currently not available on Windows system.
This new backend is recommended over the ``sox`` backend. You can switch from another backend to the ``sox_io`` backend with the following:

.. code::

   torchaudio.set_audio_backend("sox_io")

Function calls to this backend are TorchScript-able. You can apply :func:`torch.jit.script`, dump the object to a file, and then call it from a C++ application.

info
----
.. autoclass:: torchaudio.backend.sox_io_backend.AudioMetaData
.. autofunction:: torchaudio.backend.sox_io_backend.info
load
----
.. autofunction:: torchaudio.backend.sox_io_backend.load
.. autofunction:: torchaudio.backend.sox_io_backend.load_wav
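The ``load`` docstring of this backend describes how ``normalize`` affects the dtype of the returned Tensor: only integer WAV input can come back as an integer Tensor, and everything else is ``float32``. The rule can be sketched in plain Python as below. The encoding labels and the ``expected_dtype`` helper are hypothetical names chosen for this illustration, and dtypes are given as strings so the sketch does not depend on ``torch``.

```python
# Integer WAV encodings and the dtype documented for ``normalize=False``.
# The keys are illustrative labels, not identifiers used by torchaudio.
INTEGER_WAV_DTYPES = {
    "wav_int32": "int32",  # 32-bit signed PCM
    "wav_int16": "int16",  # 16-bit signed PCM
    "wav_uint8": "uint8",  # 8-bit unsigned PCM
}


def expected_dtype(encoding: str, normalize: bool = True) -> str:
    """Name of the dtype ``load`` is documented to return (hypothetical helper)."""
    if not normalize and encoding in INTEGER_WAV_DTYPES:
        return INTEGER_WAV_DTYPES[encoding]
    # normalize=True, 32-bit floating-point WAV, and other formats such as
    # flac and mp3 are all documented as float32 in the range [-1.0, 1.0].
    return "float32"


print(expected_dtype("wav_int16"))                   # float32
print(expected_dtype("wav_int16", normalize=False))  # int16
print(expected_dtype("mp3", normalize=False))        # float32
```

In this commit ``load_wav`` is implemented as a call to ``load`` with ``normalize=False``, so for integer WAV files it corresponds to the second case.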
save
----
.. autofunction:: torchaudio.backend.sox_io_backend.save
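When ``compression`` is not given, ``save`` picks a default based on the file type. The sketch below reproduces the defaults stated in the docstring (``-4.5`` for MP3, ``8`` for FLAC, ``3`` for OGG/VORBIS). It is an illustration under assumptions, not the library code: ``default_compression`` is a hypothetical helper, it looks at the extension with ``os.path.splitext`` rather than the last three characters of the path, and returning ``None`` for WAV is an assumption based on the statement that ``compression`` is used for formats other than WAV.

```python
import os
from typing import Optional

# Defaults stated in the ``save`` docstring.
DEFAULT_COMPRESSION = {
    ".mp3": -4.5,  # VBR encoding with quality factor
    ".flac": 8.0,  # highest compression level
    ".ogg": 3.0,   # OGG/VORBIS quality
}


def default_compression(filepath: str) -> Optional[float]:
    """Pick a default ``compression`` value from the file extension."""
    ext = os.path.splitext(str(filepath))[1].lower()
    if ext == ".wav":
        # Assumption: compression does not apply to WAV, so no value is chosen.
        return None
    if ext in DEFAULT_COMPRESSION:
        return DEFAULT_COMPRESSION[ext]
    raise RuntimeError(f'Unsupported file type: "{ext}"')


print(default_compression("speech.mp3"))   # -4.5
print(default_compression("speech.flac"))  # 8.0
```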
.. _soundfile_backend:
Soundfile Backend
~~~~~~~~~~~~~~~~~
The ``soundfile`` backend is available when ``PySoundFile`` is installed. This backend works on ``torchaudio`` installations without the C++ extension (e.g. on Windows).
You can switch from another backend to the ``soundfile`` backend with the following:

.. code::

   torchaudio.set_audio_backend("soundfile")

info
----
.. autofunction:: torchaudio.backend.soundfile_backend.info
load
----
.. autofunction:: torchaudio.backend.soundfile_backend.load
.. autofunction:: torchaudio.backend.soundfile_backend.load_wav
save
----
.. autofunction:: torchaudio.backend.soundfile_backend.save
torchaudio torchaudio
=========== ==========
The :mod:`torchaudio` package consists of I/O, popular datasets and common audio transformations. The :mod:`torchaudio` package consists of I/O, popular datasets and common audio transformations.
...@@ -7,13 +7,13 @@ The :mod:`torchaudio` package consists of I/O, popular datasets and common audio ...@@ -7,13 +7,13 @@ The :mod:`torchaudio` package consists of I/O, popular datasets and common audio
:maxdepth: 2 :maxdepth: 2
:caption: Package Reference :caption: Package Reference
sox_effects torchaudio
backend
functional
transforms
datasets datasets
models
sox_effects
compliance.kaldi compliance.kaldi
kaldi_io kaldi_io
transforms
functional
utils utils
.. automodule:: torchaudio
:members:
.. role:: hidden .. _sox_effects:
:class: hidden-section
torchaudio.sox_effects torchaudio.sox_effects
====================== ======================
.. currentmodule:: torchaudio.sox_effects .. currentmodule:: torchaudio.sox_effects
.. warning::
The :py:class:`SoxEffect` and :py:class:`SoxEffectsChain` classes are deprecated. Please migrate to :func:`apply_effects_tensor` and :func:`apply_effects_file`.
Resource initialization / shutdown
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: init_sox_effects
.. autofunction:: shutdown_sox_effects
Listing supported effects
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: effect_names
Applying effects
~~~~~~~~~~~~~~~~
Apply SoX effects chain on torch.Tensor or on file and load as torch.Tensor. Apply SoX effects chain on torch.Tensor or on file and load as torch.Tensor.
Applying effects on Tensor
--------------------------
.. autofunction:: apply_effects_tensor .. autofunction:: apply_effects_tensor
Applying effects on file
------------------------
.. autofunction:: apply_effects_file .. autofunction:: apply_effects_file
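The docstrings of ``apply_effects_tensor`` and ``apply_effects_file`` note that, unlike the ``sox`` command, these functions do not add a ``rate`` effect automatically after effects such as ``speed`` and ``pitch``. One way to handle this on the caller side is sketched below. ``with_explicit_rate`` is a hypothetical helper, and the set of effects it checks contains only the two named in the docstrings.

```python
from typing import List

# Effects after which the ``sox`` command is described as adding ``rate``
# automatically. Only the two named in the docstrings are listed here.
RATE_CHANGING_EFFECTS = {"speed", "pitch"}


def with_explicit_rate(effects: List[List[str]], sample_rate: int) -> List[List[str]]:
    """Return a copy of ``effects`` with a trailing ``rate`` effect when one is needed."""
    # Position of the last effect that alters the sampling rate, or -1 if none.
    last_change = max(
        (i for i, effect in enumerate(effects) if effect[0] in RATE_CHANGING_EFFECTS),
        default=-1,
    )
    has_rate_after = any(effect[0] == "rate" for effect in effects[last_change + 1:])
    if last_change >= 0 and not has_rate_after:
        return effects + [["rate", str(sample_rate)]]
    return list(effects)


print(with_explicit_rate([["speed", "1.5"]], 16000))
# [['speed', '1.5'], ['rate', '16000']]
```

The returned list has the ``List[List[str]]`` shape that the ``effects`` argument expects.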
Create SoX effects chain for preprocessing audio. Legacy
~~~~~~
:hidden:`SoxEffect` SoxEffect
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ---------
.. autoclass:: SoxEffect .. autoclass:: SoxEffect
:members: :members:
:hidden:`SoxEffectsChain` SoxEffectsChain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ---------------
.. autoclass:: SoxEffectsChain .. autoclass:: SoxEffectsChain
:members: append_effect_to_chain, sox_build_flow_effects, clear_chain, set_input_file :members: append_effect_to_chain, sox_build_flow_effects, clear_chain, set_input_file
torchaudio
==========
I/O functionalities
~~~~~~~~~~~~~~~~~~~
Audio I/O functions are implemented in the :ref:`torchaudio.backend<backend>` module, but for ease of use, the following functions are made available on the :mod:`torchaudio` module. There are different backends available and you can switch backends with :func:`set_audio_backend`.
Refer to :ref:`backend` for details.

.. function:: torchaudio.info(filepath: str, ...)

   Fetch meta data of an audio file. Refer to :ref:`backend` for details.

.. function:: torchaudio.load(filepath: str, ...)

   Load audio file into torch.Tensor object. Refer to :ref:`backend` for details.

.. function:: torchaudio.load_wav(filepath: str, ...)

   Load audio file into torch.Tensor. Refer to :ref:`backend` for details.

.. function:: torchaudio.save(filepath: str, src: torch.Tensor, sample_rate: int, ...)

   Save torch.Tensor object into an audio format. Refer to :ref:`backend` for details.

.. currentmodule:: torchaudio
Backend Utilities
~~~~~~~~~~~~~~~~~
.. autofunction:: list_audio_backends
.. autofunction:: get_audio_backend
.. autofunction:: set_audio_backend
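The three utilities cooperate by swapping the I/O functions exposed on the top-level module: ``set_audio_backend`` rejects names that ``list_audio_backends`` does not report, and ``get_audio_backend`` identifies the current backend by checking which implementation ``load`` points to. The following self-contained sketch imitates that mechanism without importing ``torchaudio``. The ``api`` namespace, the registry and the stand-in ``load`` functions are all hypothetical.

```python
from types import SimpleNamespace
from typing import Callable, Dict, List, Optional


def _no_backend_load(filepath: str) -> str:
    raise RuntimeError("No audio I/O backend is available.")


def _sox_io_load(filepath: str) -> str:
    return f"sox_io loaded {filepath}"


def _soundfile_load(filepath: str) -> str:
    return f"soundfile loaded {filepath}"


# Hypothetical registry of the backends available on this system.
_BACKENDS: Dict[str, Callable[[str], str]] = {
    "sox_io": _sox_io_load,
    "soundfile": _soundfile_load,
}

# Stand-in for the top-level ``torchaudio`` module.
api = SimpleNamespace(load=_no_backend_load)


def list_audio_backends() -> List[str]:
    return list(_BACKENDS)


def set_audio_backend(backend: Optional[str]):
    if backend is not None and backend not in list_audio_backends():
        raise RuntimeError(f'Backend "{backend}" is not available.')
    # ``None`` unassigns the current backend.
    api.load = _no_backend_load if backend is None else _BACKENDS[backend]


def get_audio_backend() -> Optional[str]:
    # The current backend is the one whose ``load`` is installed on the module.
    for name, load in _BACKENDS.items():
        if api.load is load:
            return name
    return None


set_audio_backend("soundfile")
print(get_audio_backend())     # soundfile
print(api.load("speech.wav"))  # soundfile loaded speech.wav
set_audio_backend(None)
print(get_audio_backend())     # None
```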
Sox Effects Utilities
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: initialize_sox
.. autofunction:: shutdown_sox
.. role:: hidden torchaudio.utils
:class: hidden-section ================
torchaudio.utils.sox_utils torchaudio.utils.sox_utils
========================== ~~~~~~~~~~~~~~~~~~~~~~~~~~
Utility module to configure libsox. This affects functionalities in ``sox_io`` backend and ``torchaudio.sox_effects``. Utility module to configure libsox.
This affects functionalities in :ref:`Sox IO backend<sox_io_backend>` and :ref:`Sox Effects<sox_effects>`.
.. currentmodule:: torchaudio.utils.sox_utils .. automodule:: torchaudio.utils.sox_utils
:members:
.. autofunction:: set_seed
.. autofunction:: set_verbosity
.. autofunction:: set_buffer_size
.. autofunction:: set_use_threads
.. autofunction:: list_effects
.. autofunction:: list_formats
...@@ -35,10 +35,10 @@ except ImportError: ...@@ -35,10 +35,10 @@ except ImportError:
@_mod_utils.deprecated( @_mod_utils.deprecated(
"Please remove the function call to initialize_sox. " "Please remove the function call to initialize_sox. "
"Resource initialization is now automatically handled.") "Resource initialization is now automatically handled.")
def initialize_sox() -> int: def initialize_sox():
"""Initialize sox effects. """Initialize sox effects.
This function is deprecated. See ``torchaudio.sox_effects.init_sox_effects`` This function is deprecated. See :py:func:`torchaudio.sox_effects.init_sox_effects`
""" """
_init_sox_effects() _init_sox_effects()
...@@ -51,6 +51,6 @@ def initialize_sox() -> int: ...@@ -51,6 +51,6 @@ def initialize_sox() -> int:
def shutdown_sox(): def shutdown_sox():
"""Shutdown sox effects. """Shutdown sox effects.
This function is deprecated. See ``torchaudio.sox_effects.shutdown_sox_effects`` This function is deprecated. See :py:func:`torchaudio.sox_effects.shutdown_sox_effects`
""" """
_shutdown_sox_effects() _shutdown_sox_effects()
...@@ -2,6 +2,18 @@ from typing import Any, Optional ...@@ -2,6 +2,18 @@ from typing import Any, Optional
class SignalInfo: class SignalInfo:
"""Data class returned by ``info`` functions.
Used by :ref:`sox backend<sox_backend>` and :ref:`soundfile backend<soundfile_backend>`
See https://fossies.org/dox/sox-14.4.2/structsox__signalinfo__t.html
:ivar Optional[int] channels: The number of channels
:ivar Optional[float] rate: Sampling rate
:ivar Optional[int] precision: Bit depth
:ivar Optional[int] length: For :ref:`sox backend<sox_backend>`, the number of samples
(frames * channels). For :ref:`soundfile backend<soundfile_backend>`, the number of frames.
"""
def __init__(self, def __init__(self,
channels: Optional[int] = None, channels: Optional[int] = None,
rate: Optional[float] = None, rate: Optional[float] = None,
...@@ -14,6 +26,20 @@ class SignalInfo: ...@@ -14,6 +26,20 @@ class SignalInfo:
class EncodingInfo: class EncodingInfo:
"""Data class returned by ``info`` functions.
Used by :ref:`sox backend<sox_backend>` and :ref:`soundfile backend<soundfile_backend>`
See https://fossies.org/dox/sox-14.4.2/structsox__encodinginfo__t.html
:ivar Optional[int] encoding: sox_encoding_t
:ivar Optional[int] bits_per_sample: bit depth
:ivar Optional[float] compression: Compression option
:ivar Any reverse_bytes:
:ivar Any reverse_nibbles:
:ivar Any reverse_bits:
:ivar Optional[bool] opposite_endian:
"""
def __init__(self, def __init__(self,
encoding: Any = None, encoding: Any = None,
bits_per_sample: Optional[int] = None, bits_per_sample: Optional[int] = None,
......
...@@ -7,6 +7,12 @@ from torchaudio._internal import ( ...@@ -7,6 +7,12 @@ from torchaudio._internal import (
class AudioMetaData: class AudioMetaData:
"""Data class to be returned by :py:func:`~torchaudio.backend.sox_io_backend.info`.
:ivar int sample_rate: Sample rate
:ivar int num_frames: The number of frames
:ivar int num_channels: The number of channels
"""
def __init__(self, sample_rate: int, num_frames: int, num_channels: int): def __init__(self, sample_rate: int, num_frames: int, num_channels: int):
self.sample_rate = sample_rate self.sample_rate = sample_rate
self.num_frames = num_frames self.num_frames = num_frames
...@@ -15,7 +21,14 @@ class AudioMetaData: ...@@ -15,7 +21,14 @@ class AudioMetaData:
@_mod_utils.requires_module('torchaudio._torchaudio') @_mod_utils.requires_module('torchaudio._torchaudio')
def info(filepath: str) -> AudioMetaData: def info(filepath: str) -> AudioMetaData:
"""Get signal information of an audio file.""" """Get signal information of an audio file.
Args:
filepath (str): Path to audio file
Returns:
AudioMetaData: meta data of the given audio.
"""
sinfo = torch.ops.torchaudio.sox_io_get_info(filepath) sinfo = torch.ops.torchaudio.sox_io_get_info(filepath)
return AudioMetaData(sinfo.get_sample_rate(), sinfo.get_num_frames(), sinfo.get_num_channels()) return AudioMetaData(sinfo.get_sample_rate(), sinfo.get_num_frames(), sinfo.get_num_channels())
...@@ -30,21 +43,28 @@ def load( ...@@ -30,21 +43,28 @@ def load(
) -> Tuple[torch.Tensor, int]: ) -> Tuple[torch.Tensor, int]:
"""Load audio data from file. """Load audio data from file.
This function can handle all the codecs that underlying libsox can handle, however note the
followings.
Note: Note:
This function is tested on the following formats; This function can handle all the codecs that underlying libsox can handle,
- WAV however it is tested on the following formats;
- 32-bit floating-point
- 32-bit signed integer * WAV
- 16-bit signed integer
- 8-bit unsigned integer * 32-bit floating-point
- MP3 * 32-bit signed integer
- FLAC * 16-bit signed integer
- OGG/VORBIS * 8-bit unsigned integer
By default, this function returns Tensor with ``float32`` dtype and the shape of ``[channel, time]``. * MP3
* FLAC
* OGG/VORBIS
* OPUS
To load ``MP3``, ``FLAC``, ``OGG/VORBIS``, ``OPUS`` and other codecs ``libsox`` does not
handle natively, your installation of ``torchaudio`` has to be linked to ``libsox``
and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
``float32`` dtype and the shape of ``[channel, time]``.
The samples are normalized to fit in the range of ``[-1.0, 1.0]``. The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
...@@ -54,24 +74,33 @@ def load( ...@@ -54,24 +74,33 @@ def load(
for 32-bit signed PCM, ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. for 32-bit signed PCM, ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM.
``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as ``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
flac and mp3. For these formats, this function always returns ``float32`` Tensor with values ``flac`` and ``mp3``.
normalized to ``[-1.0, 1.0]``. For these formats, this function always returns ``float32`` Tensor with values normalized to
``[-1.0, 1.0]``.
Args: Args:
filepath: Path to audio file filepath (str):
frame_offset: Number of frames to skip before start reading data. Path to audio file
num_frames: Maximum number of frames to read. -1 reads all the remaining samples, starting frame_offset (int):
from ``frame_offset``. This function may return the less number of frames if there is Number of frames to skip before start reading data.
not enough frames in the given file. num_frames (int):
normalize: When ``True``, this function always return ``float32``, and sample values are Maximum number of frames to read. ``-1`` reads all the remaining samples,
normalized to ``[-1.0, 1.0]``. If input file is integer WAV, giving ``False`` will change starting from ``frame_offset``.
the resulting Tensor type to integer type. This argument has no effect for formats other This function may return the less number of frames if there is not enough
than integer WAV type. frames in the given file.
channels_first: When True, the returned Tensor has dimension ``[channel, time]``. normalize (bool):
When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type.
This argument has no effect for formats other than integer WAV type.
channels_first (bool):
When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``. Otherwise, the returned Tensor's dimension is ``[time, channel]``.
Returns: Returns:
torch.Tensor: If the input file has integer wav format and normalization is off, then it has torch.Tensor:
If the input file has integer wav format and normalization is off, then it has
integer type, else ``float32`` type. If ``channels_first=True``, it has integer type, else ``float32`` type. If ``channels_first=True``, it has
``[channel, time]`` else ``[time, channel]``. ``[channel, time]`` else ``[time, channel]``.
""" """
...@@ -83,37 +112,49 @@ def load( ...@@ -83,37 +112,49 @@ def load(
@_mod_utils.requires_module('torchaudio._torchaudio') @_mod_utils.requires_module('torchaudio._torchaudio')
def save( def save(
filepath: str, filepath: str,
tensor: torch.Tensor, src: torch.Tensor,
sample_rate: int, sample_rate: int,
channels_first: bool = True, channels_first: bool = True,
compression: Optional[float] = None, compression: Optional[float] = None,
): ):
"""Save audio data to file. """Save audio data to file.
Supported formats are; Note:
- WAV Supported formats are;
- 32-bit floating-point
- 32-bit signed integer * WAV
- 16-bit signed integer
- 8-bit unsigned integer * 32-bit floating-point
- MP3 * 32-bit signed integer
- FLAC * 16-bit signed integer
- OGG/VORBIS * 8-bit unsigned integer
* MP3
* FLAC
* OGG/VORBIS
To save ``MP3``, ``FLAC``, ``OGG/VORBIS``, and other codecs ``libsox`` does not
handle natively, your installation of ``torchaudio`` has to be linked to ``libsox``
and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
Args: Args:
filepath: Path to save file. filepath (str): Path to save file.
tensor: Audio data to save. must be 2D tensor. src (torch.Tensor): Audio data to save. Must be a 2D tensor.
sample_rate: sampling rate sample_rate (int): sampling rate
channels_first: If True, the given tensor is interpreted as ``[channel, time]``. channels_first (bool):
compression: Used for formats other than WAV. This corresponds to ``-C`` option If ``True``, the given tensor is interpreted as ``[channel, time]``,
of ``sox`` command. otherwise ``[time, channel]``.
compression (Optional[float]):
Used for formats other than WAV. This corresponds to ``-C`` option of ``sox`` command.
* | ``MP3``: Either bitrate (in ``kbps``) with quality factor, such as ``128.2``, or
| VBR encoding with quality factor such as ``-4.2``. Default: ``-4.5``.
* | ``FLAC``: compression level. Whole number from ``0`` to ``8``.
| ``8`` is default and highest compression.
* | ``OGG/VORBIS``: number from ``-1`` to ``10``; ``-1`` is the highest compression
| and lowest quality. Default: ``3``.
See the detail at http://sox.sourceforge.net/soxformat.html. See the detail at http://sox.sourceforge.net/soxformat.html.
- MP3: Either bitrate [kbps] with quality factor, such as ``128.2`` or
VBR encoding with quality factor such as ``-4.2``. Default: ``-4.5``
- FLAC: compression level. Whole number from ``0`` to ``8``.
``8`` is default and highest compression.
- OGG/VORBIS: number from -1 to 10; -1 is the highest compression and lowest
quality. Default: ``3``.
""" """
if compression is None: if compression is None:
ext = str(filepath)[-3:].lower() ext = str(filepath)[-3:].lower()
...@@ -127,8 +168,22 @@ def save( ...@@ -127,8 +168,22 @@ def save(
compression = 3. compression = 3.
else: else:
raise RuntimeError(f'Unsupported file type: "{ext}"') raise RuntimeError(f'Unsupported file type: "{ext}"')
signal = torch.classes.torchaudio.TensorSignal(tensor, sample_rate, channels_first) signal = torch.classes.torchaudio.TensorSignal(src, sample_rate, channels_first)
torch.ops.torchaudio.sox_io_save_audio_file(filepath, signal, compression) torch.ops.torchaudio.sox_io_save_audio_file(filepath, signal, compression)
load_wav = load @_mod_utils.requires_module('torchaudio._torchaudio')
def load_wav(
filepath: str,
frame_offset: int = 0,
num_frames: int = -1,
channels_first: bool = True,
) -> Tuple[torch.Tensor, int]:
"""Load wave file.
This function is defined only for compatibility with the other backends
for simple use cases, such as ``torchaudio.load_wav(filepath)``.
It calls :py:func:`load` with ``normalize=False``.
"""
return load(filepath, frame_offset, num_frames, normalize=False, channels_first=channels_first)
...@@ -19,7 +19,11 @@ __all__ = [ ...@@ -19,7 +19,11 @@ __all__ = [
def list_audio_backends() -> List[str]: def list_audio_backends() -> List[str]:
"""List available backends""" """List available backends
Returns:
List[str]: The list of available backends.
"""
backends = [] backends = []
if is_module_available('soundfile'): if is_module_available('soundfile'):
backends.append('soundfile') backends.append('soundfile')
...@@ -29,12 +33,13 @@ def list_audio_backends() -> List[str]: ...@@ -29,12 +33,13 @@ def list_audio_backends() -> List[str]:
return backends return backends
def set_audio_backend(backend: Optional[str]) -> None: def set_audio_backend(backend: Optional[str]):
"""Set the backend for I/O operation """Set the backend for I/O operation
Args: Args:
backend (str): Name of the backend. One of "sox" or "soundfile", backend (Optional[str]): Name of the backend.
based on availability of the system. One of ``"sox"``, ``"sox_io"`` or ``"soundfile"`` based on availability
of the system. If ``None`` is provided the current backend is unassigned.
""" """
if backend is not None and backend not in list_audio_backends(): if backend is not None and backend not in list_audio_backends():
raise RuntimeError( raise RuntimeError(
...@@ -68,7 +73,11 @@ def _init_audio_backend(): ...@@ -68,7 +73,11 @@ def _init_audio_backend():
def get_audio_backend() -> Optional[str]: def get_audio_backend() -> Optional[str]:
"""Get the name of the current backend""" """Get the name of the current backend
Returns:
Optional[str]: The name of the current backend or ``None`` if no backend is assigned.
"""
if torchaudio.load == no_backend.load: if torchaudio.load == no_backend.load:
return None return None
if torchaudio.load == sox_backend.load: if torchaudio.load == sox_backend.load:
......
...@@ -1646,11 +1646,11 @@ def compute_deltas( ...@@ -1646,11 +1646,11 @@ def compute_deltas(
r"""Compute delta coefficients of a tensor, usually a spectrogram: r"""Compute delta coefficients of a tensor, usually a spectrogram:
.. math:: .. math::
d_t = \frac{\sum_{n=1}^{\text{N}} n (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{\text{N} n^2} d_t = \frac{\sum_{n=1}^{\text{N}} n (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{\text{N}} n^2}
where :math:`d_t` is the deltas at time :math:`t`, where :math:`d_t` is the deltas at time :math:`t`,
:math:`c_t` is the spectrogram coefficients at time :math:`t`, :math:`c_t` is the spectrogram coefficients at time :math:`t`,
:math:`N` is (`win_length`-1)//2. :math:`N` is ``(win_length-1)//2``.
Args: Args:
specgram (Tensor): Tensor of audio of dimension (..., freq, time) specgram (Tensor): Tensor of audio of dimension (..., freq, time)
......
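The delta formula in the ``compute_deltas`` docstring can be written out in plain Python for a single one-dimensional sequence of coefficients. This is a sketch for checking the arithmetic, not the library implementation: ``deltas_1d`` is a hypothetical name, and repeating the edge values for indices outside the sequence is an assumption made here.

```python
from typing import List


def deltas_1d(coeffs: List[float], win_length: int = 5) -> List[float]:
    """Delta coefficients d_t of a 1D sequence c_t, with N = (win_length - 1) // 2."""
    n_max = (win_length - 1) // 2
    if n_max < 1:
        raise ValueError("win_length must be at least 3")
    if not coeffs:
        return []
    # Denominator of the formula: 2 * sum_{n=1}^{N} n^2
    denom = 2 * sum(n * n for n in range(1, n_max + 1))
    last = len(coeffs) - 1

    def c(t: int) -> float:
        # Assumption: indices outside the sequence repeat the edge value.
        return coeffs[min(max(t, 0), last)]

    return [
        sum(n * (c(t + n) - c(t - n)) for n in range(1, n_max + 1)) / denom
        for t in range(len(coeffs))
    ]


# A linear ramp has a slope of 1 away from the edges.
print(deltas_1d([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])[2:5])  # [1.0, 1.0, 1.0]
```

With ``win_length=5`` the window reaches two steps on each side (``N = 2``), so only the first and last two outputs are affected by the edge handling.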
from typing import Optional
from torch import Tensor from torch import Tensor
from torch import nn from torch import nn
...@@ -7,8 +5,9 @@ __all__ = ["Wav2Letter"] ...@@ -7,8 +5,9 @@ __all__ = ["Wav2Letter"]
class Wav2Letter(nn.Module): class Wav2Letter(nn.Module):
r"""Wav2Letter model architecture from the `"Wav2Letter: an End-to-End ConvNet-based Speech Recognition System" r"""Wav2Letter model architecture from the `Wav2Letter an End-to-End ConvNet-based Speech Recognition System`_.
<https://arxiv.org/abs/1609.03193>`_ paper.
.. _Wav2Letter an End-to-End ConvNet-based Speech Recognition System: https://arxiv.org/abs/1609.03193
:math:`\text{padding} = \frac{\text{ceil}(\text{kernel} - \text{stride})}{2}` :math:`\text{padding} = \frac{\text{ceil}(\text{kernel} - \text{stride})}{2}`
...@@ -63,7 +62,7 @@ class Wav2Letter(nn.Module): ...@@ -63,7 +62,7 @@ class Wav2Letter(nn.Module):
def forward(self, x: Tensor) -> Tensor: def forward(self, x: Tensor) -> Tensor:
r""" r"""
Args: Args:
x (Tensor): Tensor of dimension (batch_size, num_features, input_length). x (torch.Tensor): Tensor of dimension (batch_size, num_features, input_length).
Returns: Returns:
Tensor: Predictor tensor of dimension (batch_size, number_of_classes, input_length). Tensor: Predictor tensor of dimension (batch_size, number_of_classes, input_length).
......
...@@ -15,31 +15,29 @@ if _mod_utils.is_module_available('torchaudio._torchaudio'): ...@@ -15,31 +15,29 @@ if _mod_utils.is_module_available('torchaudio._torchaudio'):
@_mod_utils.requires_module('torchaudio._torchaudio') @_mod_utils.requires_module('torchaudio._torchaudio')
def init_sox_effects() -> None: def init_sox_effects():
"""Initialize resources required to use ``SoxEffectsChain`` """Initialize resources required to use sox effects.
You do not need to call this function manually. It is called automatically.
Once initialized, you do not need to call this function again across the multiple call of
``SoxEffectsChain.sox_build_flow_effects``, though it is safe to do so as long as
``shutdown_sox_effects`` is not called yet.
Once ``shutdown_sox_effects`` is called, you can no longer use SoX effects and
initializing again will result in error.
Note: Note:
This function is not required for simple loading. You do not need to call this function manually. It is called automatically.
Once initialized, you do not need to call this function again across the multiple uses of
sox effects though it is safe to do so as long as :func:`shutdown_sox_effects` is not called yet.
Once :func:`shutdown_sox_effects` is called, you can no longer use SoX effects and initializing
again will result in error.
""" """
torch.ops.torchaudio.sox_effects_initialize_sox_effects() torch.ops.torchaudio.sox_effects_initialize_sox_effects()
@_mod_utils.requires_module("torchaudio._torchaudio") @_mod_utils.requires_module("torchaudio._torchaudio")
def shutdown_sox_effects() -> None: def shutdown_sox_effects():
"""Clean up resources required to use ``SoxEffectsChain`` """Clean up resources required to use sox effects.
You do not need to call this function manually. It is called automatically. Note:
You do not need to call this function manually. It is called automatically.
It is safe to call this function multiple times. It is safe to call this function multiple times.
Once ``shutdown_sox_effects`` is called, you can no longer use SoX effects and Once :py:func:`shutdown_sox_effects` is called, you can no longer use SoX effects and
initializing again will result in error. initializing again will result in error.
""" """
torch.ops.torchaudio.sox_effects_shutdown_sox_effects() torch.ops.torchaudio.sox_effects_shutdown_sox_effects()
...@@ -49,10 +47,12 @@ def shutdown_sox_effects() -> None: ...@@ -49,10 +47,12 @@ def shutdown_sox_effects() -> None:
def effect_names() -> List[str]: def effect_names() -> List[str]:
"""Gets list of valid sox effect names """Gets list of valid sox effect names
Returns: list[str] Returns:
List[str]: list of available effect names.
Example Example
>>> EFFECT_NAMES = torchaudio.sox_effects.effect_names() >>> torchaudio.sox_effects.effect_names()
['allpass', 'band', 'bandpass', ... ]
""" """
return list(list_effects().keys()) return list(list_effects().keys())
...@@ -66,6 +66,13 @@ def apply_effects_tensor( ...@@ -66,6 +66,13 @@ def apply_effects_tensor(
) -> Tuple[torch.Tensor, int]: ) -> Tuple[torch.Tensor, int]:
"""Apply sox effects to given Tensor """Apply sox effects to given Tensor
Note:
This function works in a way very similar to the ``sox`` command, however there are slight
differences. For example, the ``sox`` command adds certain effects automatically (such as
the ``rate`` effect after ``speed``, ``pitch`` and other effects), but this function
only applies the given effects. (Therefore, to actually apply the ``speed`` effect, you also
need to give the ``rate`` effect with the desired sampling rate.)
Args: Args:
tensor (torch.Tensor): Input 2D Tensor. tensor (torch.Tensor): Input 2D Tensor.
sample_rate (int): Sample rate sample_rate (int): Sample rate
...@@ -79,20 +86,15 @@ def apply_effects_tensor( ...@@ -79,20 +86,15 @@ def apply_effects_tensor(
the same channels order. The shape of the Tensor can be different based on the the same channels order. The shape of the Tensor can be different based on the
effects applied. Sample rate can also be different based on the effects applied. effects applied. Sample rate can also be different based on the effects applied.
Notes: Example - Basic usage
This function works in the way very similar to ``sox`` command, however there are slight >>>
differences. For example, ``sox`` commnad adds certain effects automatically (such as
``rate`` effect after ``speed`` and ``pitch`` and other effects), but this function does
only applies the given effects. (Therefore, to actually apply ``speed`` effect, you also
need to give ``rate`` effect with desired sampling rate.)
Examples:
>>> # Defines the effects to apply >>> # Defines the effects to apply
>>> effects = [ >>> effects = [
... ['gain', '-n'], # normalises to 0dB ... ['gain', '-n'], # normalises to 0dB
... ['pitch', '5'], # 5 cent pitch shift ... ['pitch', '5'], # 5 cent pitch shift
... ['rate', '8000'], # resample to 8000 Hz ... ['rate', '8000'], # resample to 8000 Hz
... ] ... ]
>>>
>>> # Generate pseudo wave: >>> # Generate pseudo wave:
>>> # normalized, channels first, 2ch, sampling rate 16000, 1 second >>> # normalized, channels first, 2ch, sampling rate 16000, 1 second
>>> sample_rate = 16000 >>> sample_rate = 16000
...@@ -102,9 +104,12 @@ def apply_effects_tensor( ...@@ -102,9 +104,12 @@ def apply_effects_tensor(
>>> waveform >>> waveform
tensor([[ 0.3138, 0.7620, -0.9019, ..., -0.7495, -0.4935, 0.5442], tensor([[ 0.3138, 0.7620, -0.9019, ..., -0.7495, -0.4935, 0.5442],
[-0.0832, 0.0061, 0.8233, ..., -0.5176, -0.9140, -0.2434]]) [-0.0832, 0.0061, 0.8233, ..., -0.5176, -0.9140, -0.2434]])
>>>
>>> # Apply effects >>> # Apply effects
>>> waveform, sample_rate = apply_effects_tensor( >>> waveform, sample_rate = apply_effects_tensor(
... waveform, sample_rate, effects, channels_first=True) ... waveform, sample_rate, effects, channels_first=True)
>>>
>>> # Check the result
>>> # The new waveform is sampling rate 8000, 1 second. >>> # The new waveform is sampling rate 8000, 1 second.
>>> # normalization and channel order are preserved >>> # normalization and channel order are preserved
>>> waveform.shape >>> waveform.shape
...@@ -114,6 +119,40 @@ def apply_effects_tensor( ...@@ -114,6 +119,40 @@ def apply_effects_tensor(
[ 0.1331, 0.0436, -0.3783, ..., -0.0035, 0.0012, 0.0008]]) [ 0.1331, 0.0436, -0.3783, ..., -0.0035, 0.0012, 0.0008]])
>>> sample_rate >>> sample_rate
8000 8000
Example - Torchscript-able transform
>>>
>>> # Use `apply_effects_tensor` in `torch.nn.Module` and dump it to file,
>>> # then run sox effect via Torchscript runtime.
>>>
>>> class SoxEffectTransform(torch.nn.Module):
... effects: List[List[str]]
...
... def __init__(self, effects: List[List[str]]):
... super().__init__()
... self.effects = effects
...
... def forward(self, tensor: torch.Tensor, sample_rate: int):
... return sox_effects.apply_effects_tensor(
... tensor, sample_rate, self.effects)
...
...
>>> # Create transform object
>>> effects = [
... ["lowpass", "-1", "300"], # apply single-pole lowpass filter
... ["rate", "8000"], # change sample rate to 8000
... ]
>>> transform = SoxEffectTransform(effects)
>>>
>>> # Dump it to file and load
>>> path = 'sox_effect.zip'
>>> torch.jit.script(transform).save(path)
>>> transform = torch.jit.load(path)
>>>
>>> # Run transform
>>> waveform, input_sample_rate = torchaudio.load("input.wav")
>>> waveform, sample_rate = transform(waveform, input_sample_rate)
>>> assert sample_rate == 8000
""" """
in_signal = torch.classes.torchaudio.TensorSignal(tensor, sample_rate, channels_first) in_signal = torch.classes.torchaudio.TensorSignal(tensor, sample_rate, channels_first)
out_signal = torch.ops.torchaudio.sox_effects_apply_effects_tensor(in_signal, effects) out_signal = torch.ops.torchaudio.sox_effects_apply_effects_tensor(in_signal, effects)
...@@ -129,12 +168,22 @@ def apply_effects_file( ...@@ -129,12 +168,22 @@ def apply_effects_file(
) -> Tuple[torch.Tensor, int]: ) -> Tuple[torch.Tensor, int]:
"""Apply sox effects to the audio file and load the resulting data as Tensor """Apply sox effects to the audio file and load the resulting data as Tensor
Note:
This function works very similarly to the ``sox`` command, however there are slight
differences. For example, the ``sox`` command adds certain effects automatically (such as
the ``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
effects. Therefore, to actually apply the ``speed`` effect, you also need to give the ``rate``
effect with the desired sampling rate, because internally, the ``speed`` effect only alters the
sampling rate and leaves the samples untouched.
Args: Args:
path (str): Path to the audio file. path (str): Path to the audio file.
effects (List[List[str]]): List of effects. effects (List[List[str]]): List of effects.
normalize (bool): When ``True``, this function always returns ``float32``, and sample values are normalize (bool):
normalized to ``[-1.0, 1.0]``. If input file is integer WAV, giving ``False`` will change When ``True``, this function always returns ``float32``, and sample values are
the resulting Tensor type to integer type. This argument has no effect for formats other normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type. This argument has no effect for formats other
than integer WAV type. than integer WAV type.
channels_first (bool): When True, the returned Tensor has dimension ``[channel, time]``. channels_first (bool): When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``. Otherwise, the returned Tensor's dimension is ``[time, channel]``.
...@@ -147,23 +196,19 @@ def apply_effects_file( ...@@ -147,23 +196,19 @@ def apply_effects_file(
If ``channels_first=True``, the resulting Tensor has dimension ``[channel, time]``, If ``channels_first=True``, the resulting Tensor has dimension ``[channel, time]``,
otherwise ``[time, channel]``. otherwise ``[time, channel]``.
Notes: Example - Basic usage
This function works in the way very similar to ``sox`` command, however there are slight >>>
differences. For example, ``sox`` commnad adds certain effects automatically (such as
``rate`` effect after ``speed``, ``pitch`` etc), but this function only applies the given
effects. Therefore, to actually apply ``speed`` effect, you also need to give ``rate``
effect with desired sampling rate, because internally, ``speed`` effects only alter sampling
rate and leave samples untouched.
Examples:
>>> # Defines the effects to apply >>> # Defines the effects to apply
>>> effects = [ >>> effects = [
... ['gain', '-n'], # normalises to 0dB ... ['gain', '-n'], # normalises to 0dB
... ['pitch', '5'], # 5 cent pitch shift ... ['pitch', '5'], # 5 cent pitch shift
... ['rate', '8000'], # resample to 8000 Hz ... ['rate', '8000'], # resample to 8000 Hz
... ] ... ]
>>>
>>> # Apply effects and load data with channels_first=True >>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True) >>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape >>> waveform.shape
torch.Size([2, 8000]) torch.Size([2, 8000])
>>> waveform >>> waveform
...@@ -173,6 +218,42 @@ def apply_effects_file( ...@@ -173,6 +218,42 @@ def apply_effects_file(
-5.6159e-07, 4.8103e-07]]) -5.6159e-07, 4.8103e-07]])
>>> sample_rate >>> sample_rate
8000 8000
Example - Apply random speed perturbation to dataset
>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
... \"\"\"Given flist, apply random speed perturbation
...
... Suppose all the input files are at least one second long.
... \"\"\"
... def __init__(self, flist: List[str], sample_rate: int):
... super().__init__()
... self.flist = flist
... self.sample_rate = sample_rate
... self.rng = None
...
... def __getitem__(self, index):
... speed = self.rng.uniform(0.5, 2.0)
... effects = [
... ['gain', '-n', '-10'], # normalise to -10 dB
... ['remix', '-'], # merge all the channels
... ['speed', f'{speed:.5f}'], # duration is now 0.5 ~ 2.0 seconds.
... ['rate', f'{self.sample_rate}'],
... ['pad', '0', '1.5'], # add 1.5 seconds silence at the end
... ['trim', '0', '2'], # get the first 2 seconds
... ]
... waveform, _ = torchaudio.sox_effects.apply_effects_file(
... self.flist[index], effects)
... return waveform
...
... def __len__(self):
... return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
... pass
""" """
signal = torch.ops.torchaudio.sox_effects_apply_effects_file(path, effects, normalize, channels_first) signal = torch.ops.torchaudio.sox_effects_apply_effects_file(path, effects, normalize, channels_first)
return signal.get_tensor(), signal.get_sample_rate() return signal.get_tensor(), signal.get_sample_rate()
...@@ -183,7 +264,7 @@ def apply_effects_file( ...@@ -183,7 +264,7 @@ def apply_effects_file(
def SoxEffect(): def SoxEffect():
r"""Create an object for passing sox effect information between python and c++ r"""Create an object for passing sox effect information between python and c++
Note: Warning:
This function is deprecated. This function is deprecated.
Please migrate to :func:`apply_effects_file` or :func:`apply_effects_tensor`. Please migrate to :func:`apply_effects_file` or :func:`apply_effects_tensor`.
...@@ -198,50 +279,56 @@ def SoxEffect(): ...@@ -198,50 +279,56 @@ def SoxEffect():
class SoxEffectsChain(object): class SoxEffectsChain(object):
r"""SoX effects chain class. r"""SoX effects chain class.
Note: Warning:
This class is deprecated. This class is deprecated.
Please migrate to :func:`apply_effects_file` or :func:`apply_effects_tensor`. Please migrate to :func:`apply_effects_file` or :func:`apply_effects_tensor`.
Args: Args:
normalization (bool, number, or callable, optional): If boolean `True`, then output is divided by `1 << 31` normalization (bool, number, or callable, optional):
(assumes signed 32-bit audio), and normalizes to `[-1, 1]`. If `number`, then output is divided by that If boolean ``True``, then output is divided by ``1 << 31``
number. If `callable`, then the output is passed as a parameter to the given function, then the (assumes signed 32-bit audio), and normalizes to ``[-1, 1]``.
output is divided by the result. (Default: ``True``) If ``number``, then output is divided by that number.
channels_first (bool, optional): Set channels first or length first in result. (Default: ``True``) If ``callable``, then the output is passed as a parameter to the given function, then
out_siginfo (sox_signalinfo_t, optional): a sox_signalinfo_t type, which could be helpful if the the output is divided by the result. (Default: ``True``)
audio type cannot be automatically determined. (Default: ``None``) channels_first (bool, optional):
out_encinfo (sox_encodinginfo_t, optional): a sox_encodinginfo_t type, which could be set if the Set channels first or length first in result. (Default: ``True``)
audio type cannot be automatically determined. (Default: ``None``) out_siginfo (sox_signalinfo_t, optional):
filetype (str, optional): a filetype or extension to be set if sox cannot determine it a sox_signalinfo_t type, which could be helpful if the audio type cannot be
automatically. . (Default: ``'raw'``) automatically determined. (Default: ``None``)
out_encinfo (sox_encodinginfo_t, optional):
a sox_encodinginfo_t type, which could be set if the audio type cannot be
automatically determined. (Default: ``None``)
filetype (str, optional):
a filetype or extension to be set if sox cannot determine it automatically.
(Default: ``'raw'``)
Returns: Returns:
Tuple[Tensor, int]: An output Tensor of size `[C x L]` or `[L x C]` where L is the number Tuple[Tensor, int]:
An output Tensor of size ``[C x L]`` or ``[L x C]`` where L is the number
of audio frames and C is the number of channels. An integer which is the sample rate of the of audio frames and C is the number of channels. An integer which is the sample rate of the
audio (as listed in the metadata of the file) audio (as listed in the metadata of the file)
Example Example
>>> class MyDataset(Dataset): >>> class MyDataset(Dataset):
>>> def __init__(self, audiodir_path): ... def __init__(self, audiodir_path):
>>> self.data = [os.path.join(audiodir_path, fn) for fn in os.listdir(audiodir_path)] ... self.data = [
>>> self.E = torchaudio.sox_effects.SoxEffectsChain() ... os.path.join(audiodir_path, fn)
>>> self.E.append_effect_to_chain("rate", [16000]) # resample to 16000hz ... for fn in os.listdir(audiodir_path)]
>>> self.E.append_effect_to_chain("channels", ["1"]) # mono signal ... self.E = torchaudio.sox_effects.SoxEffectsChain()
>>> def __getitem__(self, index): ... self.E.append_effect_to_chain("rate", [16000]) # resample to 16000hz
>>> fn = self.data[index] ... self.E.append_effect_to_chain("channels", ["1"]) # mono signal
>>> self.E.set_input_file(fn) ... def __getitem__(self, index):
>>> x, sr = self.E.sox_build_flow_effects() ... fn = self.data[index]
>>> return x, sr ... self.E.set_input_file(fn)
>>> ... x, sr = self.E.sox_build_flow_effects()
>>> def __len__(self): ... return x, sr
>>> return len(self.data) ...
>>> ... def __len__(self):
>>> torchaudio.initialize_sox() ... return len(self.data)
...
>>> ds = MyDataset(path_to_audio_files) >>> ds = MyDataset(path_to_audio_files)
>>> for sig, sr in ds: >>> for sig, sr in ds:
>>> [do something here] ... pass
>>> torchaudio.shutdown_sox()
""" """
EFFECTS_UNIMPLEMENTED = {"spectrogram", "splice", "noiseprof", "fir"} EFFECTS_UNIMPLEMENTED = {"spectrogram", "splice", "noiseprof", "fir"}
...@@ -298,9 +385,9 @@ class SoxEffectsChain(object): ...@@ -298,9 +385,9 @@ class SoxEffectsChain(object):
out (Tensor, optional): Where the output will be written to. (Default: ``None``) out (Tensor, optional): Where the output will be written to. (Default: ``None``)
Returns: Returns:
Tuple[Tensor, int]: An output Tensor of size `[C x L]` or `[L x C]` where L is the number Tuple[Tensor, int]: An output Tensor of size `[C x L]` or `[L x C]` where
of audio frames and C is the number of channels. An integer which is the sample rate of the L is the number of audio frames and C is the number of channels.
audio (as listed in the metadata of the file) An integer which is the sample rate of the audio (as listed in the metadata of the file)
""" """
# initialize output tensor # initialize output tensor
if out is not None: if out is not None:
......
...@@ -86,20 +86,8 @@ class Spectrogram(torch.nn.Module): ...@@ -86,20 +86,8 @@ class Spectrogram(torch.nn.Module):
class GriffinLim(torch.nn.Module): class GriffinLim(torch.nn.Module):
r"""Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. r"""Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation.
Implementation ported from `librosa`.
.. [1] McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. Implementation ported from ``librosa`` [1]_, [2]_, [3]_.
"librosa: Audio and music signal analysis in python."
In Proceedings of the 14th python in science conference, pp. 18-25. 2015.
.. [2] Perraudin, N., Balazs, P., & Søndergaard, P. L.
"A fast Griffin-Lim algorithm,"
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 1-4),
Oct. 2013.
.. [3] D. W. Griffin and J. S. Lim,
"Signal estimation from modified short-time Fourier transform,"
IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
Args: Args:
n_fft (int, optional): Size of FFT, creates ``n_fft // 2 + 1`` bins. (Default: ``400``) n_fft (int, optional): Size of FFT, creates ``n_fft // 2 + 1`` bins. (Default: ``400``)
...@@ -117,6 +105,24 @@ class GriffinLim(torch.nn.Module): ...@@ -117,6 +105,24 @@ class GriffinLim(torch.nn.Module):
Values near 1 can lead to faster convergence, but above 1 may not converge. (Default: ``0.99``) Values near 1 can lead to faster convergence, but above 1 may not converge. (Default: ``0.99``)
length (int, optional): Array length of the expected output. (Default: ``None``) length (int, optional): Array length of the expected output. (Default: ``None``)
rand_init (bool, optional): Initializes phase randomly if True and to zero otherwise. (Default: ``True``) rand_init (bool, optional): Initializes phase randomly if True and to zero otherwise. (Default: ``True``)
References:
.. [1]
| McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg,
and Oriol Nieto.
| "librosa: Audio and music signal analysis in python."
| In Proceedings of the 14th python in science conference, pp. 18-25. 2015.
.. [2]
| Perraudin, N., Balazs, P., & Søndergaard, P. L.
| "A fast Griffin-Lim algorithm,"
| IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 1-4),
| Oct. 2013.
.. [3]
| D. W. Griffin and J. S. Lim,
| "Signal estimation from modified short-time Fourier transform,"
| IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
""" """
__constants__ = ['n_fft', 'n_iter', 'win_length', 'hop_length', 'power', 'normalized', __constants__ = ['n_fft', 'n_iter', 'win_length', 'hop_length', 'power', 'normalized',
'length', 'momentum', 'rand_init'] 'length', 'momentum', 'rand_init']
...@@ -153,8 +159,9 @@ class GriffinLim(torch.nn.Module): ...@@ -153,8 +159,9 @@ class GriffinLim(torch.nn.Module):
def forward(self, specgram: Tensor) -> Tensor: def forward(self, specgram: Tensor) -> Tensor:
r""" r"""
Args: Args:
specgram (Tensor): A magnitude-only STFT spectrogram of dimension (..., freq, frames) specgram (Tensor):
where freq is ``n_fft // 2 + 1``. A magnitude-only STFT spectrogram of dimension (..., freq, frames)
where freq is ``n_fft // 2 + 1``.
Returns: Returns:
Tensor: waveform of (..., time), where time equals the ``length`` parameter if given. Tensor: waveform of (..., time), where time equals the ``length`` parameter if given.
......
...@@ -55,8 +55,8 @@ def set_use_threads(use_threads: bool): ...@@ -55,8 +55,8 @@ def set_use_threads(use_threads: bool):
"""Set multithread option for sox effect chain """Set multithread option for sox effect chain
Args: Args:
use_threads (bool): When True, enables libsox's parallel effects channels processing. use_threads (bool): When ``True``, enables ``libsox``'s parallel effects channels processing.
To use multithreading, the underlying libsox has to be compiled with OpenMP support. To use multithreading, the underlying ``libsox`` has to be compiled with OpenMP support.
See Also: See Also:
http://sox.sourceforge.net/sox.html http://sox.sourceforge.net/sox.html
...@@ -69,7 +69,7 @@ def list_effects() -> Dict[str, str]: ...@@ -69,7 +69,7 @@ def list_effects() -> Dict[str, str]:
"""List the available sox effect names """List the available sox effect names
Returns: Returns:
Dict[str, str]: Mapping from "effect name" to "usage" Dict[str, str]: Mapping from ``effect name`` to ``usage``
""" """
return dict(torch.ops.torchaudio.sox_utils_list_effects()) return dict(torch.ops.torchaudio.sox_utils_list_effects())
......
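A note on the ``speed`` caveat documented in ``apply_effects_file`` above: since only the given effects are applied, a ``speed`` entry has to be followed by a ``rate`` entry for the samples to actually be resampled. Below is a minimal sketch of a helper that builds such a chain. The name ``speed_perturbation_effects`` is hypothetical and not part of torchaudio; the returned list is only meant to be passed as the ``effects`` argument.

```python
from typing import List


def speed_perturbation_effects(speed: float, sample_rate: int) -> List[List[str]]:
    """Build an effects list that changes speed and then resamples.

    Hypothetical helper, not a torchaudio API. It pairs ``speed`` with
    ``rate`` because, per the docstring note, ``speed`` alone only alters
    the sampling rate and leaves the samples untouched.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [
        ["speed", f"{speed:.5f}"],   # change playback speed
        ["rate", str(sample_rate)],  # resample back to the desired rate
    ]


# Effects for a 1.5x speed-up, resampled to 8000 Hz
effects = speed_perturbation_effects(1.5, 8000)
```

The resulting list could then be passed to ``apply_effects_file(path, effects)`` or ``apply_effects_tensor(waveform, sample_rate, effects)``, in the same way as the effect lists in the docstring examples.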