Commit 6fa5732c authored by moto, committed by Facebook GitHub Bot

Add note about `normalize` argument (#2449)

Summary:
The `load` function has a `normalize` argument, which converts the native
sample type to `torch.float32`.

This argument is confusing for audio practitioners, as its name suggests
that it performs [volume normalization](https://en.wikipedia.org/wiki/Audio_normalization).
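
The distinction drawn above can be sketched in plain Python. This is only an illustration, not torchaudio code: the helper names `convert_sample_type` and `peak_normalize` are hypothetical, and the fixed scale of 32768 for 16-bit samples is an assumption about how such a conversion is typically done.

```python
# Illustrative stand-ins only; these helpers are hypothetical, not torchaudio APIs.

INT16_SCALE = 32768  # assumed scale factor: 2**15, the magnitude of the lowest int16 value


def convert_sample_type(samples_int16):
    """Map int16 PCM samples to floats using a fixed, signal-independent scale."""
    return [s / INT16_SCALE for s in samples_int16]


def peak_normalize(samples_float, target_peak=1.0):
    """Rescale so the loudest sample reaches target_peak (volume normalization)."""
    peak = max(abs(s) for s in samples_float)
    if peak == 0.0:
        return list(samples_float)  # pure silence: nothing to rescale
    return [s * target_peak / peak for s in samples_float]


# A quiet signal whose peak is half of the int16 full scale.
quiet = [0, 8192, -16384, 4096]

converted = convert_sample_type(quiet)   # what a sample type conversion does
normalized = peak_normalize(converted)   # what volume normalization would do

print(converted)   # [0.0, 0.25, -0.5, 0.125]
print(normalized)  # [0.0, 0.5, -1.0, 0.25]
```

Under these assumptions the converted signal stays as quiet as the original, while peak normalization changes its level. That is the behaviour the argument name could be mistaken for.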

See https://github.com/pytorch/audio/issues/2253

Because renaming the argument would break backward compatibility (BC), we cannot easily change its name.
This commit adds warnings to the documentation instead.

Fix https://github.com/pytorch/audio/issues/2253

Pull Request resolved: https://github.com/pytorch/audio/pull/2449

Reviewed By: nateanl

Differential Revision: D36995756

Pulled By: carolineechen

fbshipit-source-id: 0b7db2758a355f6aafe06a2273bc72a1027690bd
parent a9c1e3a3
@@ -147,19 +147,25 @@ def load(
 * SPHERE
 By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-``float32`` dtype and the shape of `[channel, time]`.
-The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
-When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
-signed integer and 8-bit unsigned integer (24-bit signed integer is not supported),
-by providing ``normalize=False``, this function can return integer Tensor, where the samples
-are expressed within the whole range of the corresponding dtype, that is, ``int32`` tensor
-for 32-bit signed PCM, ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM.
-``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
-``flac`` and ``mp3``.
-For these formats, this function always returns ``float32`` Tensor with values normalized to
-``[-1.0, 1.0]``.
+``float32`` dtype, and the shape of `[channel, time]`.
+.. warning::
+``normalize`` argument does not perform volume normalization.
+It only converts the sample type to `torch.float32` from the native sample
+type.
+When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
+signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
+this function can return integer Tensor, where the samples are expressed within the whole range
+of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
+``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
+support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+``normalize`` argument has no effect on 32-bit floating-point WAV and other formats, such as
+``flac`` and ``mp3``.
+For these formats, this function always returns ``float32`` Tensor with values.
 Note:
 ``filepath`` argument is intentionally annotated as ``str`` only, even though it accepts
@@ -177,11 +183,13 @@ def load(
 This function may return the less number of frames if there is not enough
 frames in the given file.
 normalize (bool, optional):
-When ``True``, this function always return ``float32``, and sample values are
-normalized to ``[-1.0, 1.0]``.
+When ``True``, this function converts the native sample type to ``float32``.
+Default: ``True``.
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type.
 This argument has no effect for formats other than integer WAV type.
 channels_first (bool, optional):
 When True, the returned Tensor has dimension `[channel, time]`.
 Otherwise, the returned Tensor's dimension is `[time, channel]`.
......
@@ -130,20 +130,25 @@ def load(
 and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
 By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-``float32`` dtype and the shape of `[channel, time]`.
-The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
-When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
-signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
-this function can return integer Tensor, where the samples are expressed within the whole range
-of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
-``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
-support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
-``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
-``flac`` and ``mp3``.
-For these formats, this function always returns ``float32`` Tensor with values normalized to
-``[-1.0, 1.0]``.
+``float32`` dtype, and the shape of `[channel, time]`.
+.. warning::
+``normalize`` argument does not perform volume normalization.
+It only converts the sample type to `torch.float32` from the native sample
+type.
+When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
+signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
+this function can return integer Tensor, where the samples are expressed within the whole range
+of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
+``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
+support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+``normalize`` argument has no effect on 32-bit floating-point WAV and other formats, such as
+``flac`` and ``mp3``.
+For these formats, this function always returns ``float32`` Tensor with values.
 Args:
 filepath (path-like object or file-like object):
@@ -166,11 +171,13 @@ def load(
 This function may return the less number of frames if there is not enough
 frames in the given file.
 normalize (bool, optional):
-When ``True``, this function always return ``float32``, and sample values are
-normalized to ``[-1.0, 1.0]``.
+When ``True``, this function converts the native sample type to ``float32``.
+Default: ``True``.
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type.
 This argument has no effect for formats other than integer WAV type.
 channels_first (bool, optional):
 When True, the returned Tensor has dimension `[channel, time]`.
 Otherwise, the returned Tensor's dimension is `[time, channel]`.
@@ -181,7 +188,7 @@ def load(
 Returns:
 (torch.Tensor, int): Resulting Tensor and sample rate.
-If the input file has integer wav format and normalization is off, then it has
+If the input file has integer wav format and ``normalize=False``, then it has
 integer type, else ``float32`` type. If ``channels_first=True``, it has
 `[channel, time]` else `[time, channel]`.
 """
......
@@ -192,11 +192,13 @@ def apply_effects_file(
 TorchScript compiler compatibility.
 effects (List[List[str]]): List of effects.
 normalize (bool, optional):
-When ``True``, this function always return ``float32``, and sample values are
-normalized to ``[-1.0, 1.0]``.
+When ``True``, this function converts the native sample type to ``float32``.
+Default: ``True``.
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
-integer type. This argument has no effect for formats other
-than integer WAV type.
+integer type.
+This argument has no effect for formats other than integer WAV type.
 channels_first (bool, optional): When True, the returned Tensor has dimension `[channel, time]`.
 Otherwise, the returned Tensor's dimension is `[time, channel]`.
 format (str or None, optional):
......