Add note about `normalize` argument (#2449)

Summary:
The `load` function has a `normalize` argument, which converts the native sample type to `torch.float32`. This argument is confusing for audio practitioners because its name suggests that it performs [volume normalization](https://en.wikipedia.org/wiki/Audio_normalization).
See https://github.com/pytorch/audio/issues/2253

Because of backward-compatibility (BC) concerns, the argument name cannot easily be changed. This commit adds warnings to the documentation.

Fix https://github.com/pytorch/audio/issues/2253

Pull Request resolved: https://github.com/pytorch/audio/pull/2449
Reviewed By: nateanl
Differential Revision: D36995756
Pulled By: carolineechen
fbshipit-source-id: 0b7db2758a355f6aafe06a2273bc72a1027690bd
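
To make the distinction in the summary concrete, here is a minimal sketch contrasting sample-type conversion with volume (peak) normalization. It uses plain Python lists instead of tensors so that it runs without torch. The sample values, the helper names `to_float` and `peak_normalize`, and the `2 ** 15` scaling for 16-bit PCM are illustrative assumptions; this commit does not state the exact scaling each backend applies.

```python
# Assumed convention for 16-bit PCM: divide by 2**15 to land in [-1.0, 1.0).
# The commit itself does not state the scaling used by the backends.
INT16_SCALE = 2 ** 15


def to_float(samples):
    """Sample-type conversion: rescale by a fixed constant, so the level is unchanged."""
    return [s / INT16_SCALE for s in samples]


def peak_normalize(samples):
    """Volume (peak) normalization: rescale so the largest magnitude becomes 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence stays silence
    return [s / peak for s in samples]


# Hypothetical quiet recording, given as 16-bit integer samples.
pcm = [0, 1024, -2048, 4096]

converted = to_float(pcm)               # roughly what a dtype conversion yields
normalized = peak_normalize(converted)  # what "volume normalization" would yield

print(converted)   # [0.0, 0.03125, -0.0625, 0.125]
print(normalized)  # [0.0, 0.25, -0.5, 1.0]
```

The converted signal stays quiet (its peak is 0.125), while the peak-normalized signal is stretched to full scale. Only the first behavior corresponds to what the summary describes for `normalize=True`.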

Commit 6fa5732c authored Jun 13, 2022 by moto Committed by Facebook GitHub Bot Jun 13, 2022
3 changed files
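
The updated docstrings in the diff below describe which dtype `load` returns for integer WAV input when `normalize=False`. A minimal sketch of that mapping follows. The helper `expected_dtype` and the `(encoding, bits)` keys are hypothetical and not part of torchaudio, and dtype names are plain strings so the block runs without torch.

```python
# Integer WAV encodings and the dtype returned with normalize=False, as stated in
# the updated docstrings. The dict layout and key names are illustrative only.
NATIVE_DTYPE = {
    ("signed", 32): "int32",
    ("signed", 24): "int32",  # torch has no int24 dtype, so 24-bit PCM becomes int32
    ("signed", 16): "int16",
    ("unsigned", 8): "uint8",
}


def expected_dtype(encoding, bits, normalize=True):
    """Hypothetical helper: predict the dtype name of the Tensor returned by load."""
    if normalize:
        # normalize=True converts the native sample type to float32.
        return "float32"
    # normalize=False only matters for integer WAV; every other format,
    # including 32-bit floating-point WAV, flac and mp3, stays float32.
    return NATIVE_DTYPE.get((encoding, bits), "float32")


print(expected_dtype("signed", 16, normalize=False))  # int16
print(expected_dtype("signed", 24, normalize=False))  # int32
print(expected_dtype("float", 32, normalize=False))   # float32
```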
--- a/torchaudio/backend/soundfile_backend.py
+++ b/torchaudio/backend/soundfile_backend.py
@@ -147,19 +147,25 @@ def load(
        * SPHERE
    By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-    ``float32`` dtype and the shape of `[channel, time]`.
-    The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
-    When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
-    signed integer and 8-bit unsigned integer (24-bit signed integer is not supported),
-    by providing ``normalize=False``, this function can return integer Tensor, where the samples
-    are expressed within the whole range of the corresponding dtype, that is, ``int32`` tensor
-    for 32-bit signed PCM, ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM.
-    ``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
-    ``flac`` and ``mp3``.
-    For these formats, this function always returns ``float32`` Tensor with values normalized to
-    ``[-1.0, 1.0]``.
+    ``float32`` dtype, and the shape of `[channel, time]`.
+    .. warning::
+       ``normalize`` argument does not perform volume normalization.
+       It only converts the sample type to `torch.float32` from the native sample
+       type.
+       When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
+       signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
+       this function can return integer Tensor, where the samples are expressed within the whole range
+       of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
+       ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
+       support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+       ``normalize`` argument has no effect on 32-bit floating-point WAV and other formats, such as
+       ``flac`` and ``mp3``.
+       For these formats, this function always returns ``float32`` Tensor with values.
    Note:
        ``filepath`` argument is intentionally annotated as ``str`` only, even though it accepts
@@ -177,11 +183,13 @@ def load(
            This function may return the less number of frames if there is not enough
            frames in the given file.
        normalize (bool, optional):
-            When ``True``, this function always return ``float32``, and sample values are
-            normalized to ``[-1.0, 1.0]``.
+            When ``True``, this function converts the native sample type to ``float32``.
+            Default: ``True``.
            If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
            integer type.
            This argument has no effect for formats other than integer WAV type.
        channels_first (bool, optional):
            When True, the returned Tensor has dimension `[channel, time]`.
            Otherwise, the returned Tensor's dimension is `[time, channel]`.

--- a/torchaudio/backend/sox_io_backend.py
+++ b/torchaudio/backend/sox_io_backend.py
@@ -130,20 +130,25 @@ def load(
        and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
    By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-    ``float32`` dtype and the shape of `[channel, time]`.
-    The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
-    When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
-    signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
-    this function can return integer Tensor, where the samples are expressed within the whole range
-    of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
-    ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
-    support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
-    ``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
-    ``flac`` and ``mp3``.
-    For these formats, this function always returns ``float32`` Tensor with values normalized to
-    ``[-1.0, 1.0]``.
+    ``float32`` dtype, and the shape of `[channel, time]`.
+    .. warning::
+       ``normalize`` argument does not perform volume normalization.
+       It only converts the sample type to `torch.float32` from the native sample
+       type.
+       When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
+       signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
+       this function can return integer Tensor, where the samples are expressed within the whole range
+       of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
+       ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
+       support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+       ``normalize`` argument has no effect on 32-bit floating-point WAV and other formats, such as
+       ``flac`` and ``mp3``.
+       For these formats, this function always returns ``float32`` Tensor with values.
    Args:
        filepath (path-like object or file-like object):
@@ -166,11 +171,13 @@ def load(
            This function may return the less number of frames if there is not enough
            frames in the given file.
        normalize (bool, optional):
-            When ``True``, this function always return ``float32``, and sample values are
-            normalized to ``[-1.0, 1.0]``.
+            When ``True``, this function converts the native sample type to ``float32``.
+            Default: ``True``.
            If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
            integer type.
            This argument has no effect for formats other than integer WAV type.
        channels_first (bool, optional):
            When True, the returned Tensor has dimension `[channel, time]`.
            Otherwise, the returned Tensor's dimension is `[time, channel]`.
@@ -181,7 +188,7 @@ def load(
    Returns:
        (torch.Tensor, int): Resulting Tensor and sample rate.
-            If the input file has integer wav format and normalization is off, then it has
+            If the input file has integer wav format and ``normalize=False``, then it has
            integer type, else ``float32`` type. If ``channels_first=True``, it has
            `[channel, time]` else `[time, channel]`.
    """

--- a/torchaudio/sox_effects/sox_effects.py
+++ b/torchaudio/sox_effects/sox_effects.py
@@ -192,11 +192,13 @@ def apply_effects_file(
            TorchScript compiler compatibility.
        effects (List[List[str]]): List of effects.
        normalize (bool, optional):
-            When ``True``, this function always return ``float32``, and sample values are
-            normalized to ``[-1.0, 1.0]``.
+            When ``True``, this function converts the native sample type to ``float32``.
+            Default: ``True``.
            If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
-            integer type. This argument has no effect for formats other
-            than integer WAV type.
+            integer type.
+            This argument has no effect for formats other than integer WAV type.
        channels_first (bool, optional): When True, the returned Tensor has dimension `[channel, time]`.
            Otherwise, the returned Tensor's dimension is `[time, channel]`.
        format (str or None, optional):