Commit 476ab9ab authored by moto, committed by Facebook GitHub Bot

Consolidate bibliography / reference (#2676)

Summary:
Preparation for the adoption of `autosummary`.

Replace `:footcite:` with `:cite:` and introduce a dedicated reference page, as `:footcite:` does not work well with `autosummary`.

Example:

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html
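For reference, the migration assumes the standard sphinxcontrib-bibtex wiring; a minimal sketch (file names such as `refs.bib` are illustrative, not taken from this PR):

```python
# docs/source/conf.py -- minimal sketch assuming sphinxcontrib-bibtex is installed
extensions = [
    "sphinx.ext.autosummary",
    "sphinxcontrib.bibtex",
]
bibtex_bibfiles = ["refs.bib"]  # BibTeX database backing the :cite: role
```

The dedicated reference page is then a single RST file containing a `.. bibliography::` directive, which collects every `:cite:` entry in one place; `:footcite:`, by contrast, needs a `.. footbibliography::` directive on each citing page, which is presumably what interacts badly with `autosummary`-generated pages.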

Pull Request resolved: https://github.com/pytorch/audio/pull/2676

Reviewed By: carolineechen

Differential Revision: D39509431

Pulled By: mthrok

fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a
parent 50c66721
@@ -218,7 +218,7 @@ TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH.__doc__ = """Character-based TTS pipeline with
The text processor encodes the input texts character-by-character.
-Tacotron2 was trained on *LJSpeech* [:footcite:`ljspeech17`] for 1,500 epochs.
+Tacotron2 was trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs.
You can find the training script `here <https://github.com/pytorch/audio/tree/main/examples/pipeline_tacotron2>`__.
The default parameters were used.
@@ -264,7 +264,7 @@ graphemes to phonemes.
The model (*en_us_cmudict_forward*) was trained on
`CMUDict <http://www.speech.cs.cmu.edu/cgi-bin/cmudict>`__.
-Tacotron2 was trained on *LJSpeech* [:footcite:`ljspeech17`] for 1,500 epochs.
+Tacotron2 was trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs.
You can find the training script `here <https://github.com/pytorch/audio/tree/main/examples/pipeline_tacotron2>`__.
The text processor is set to *"english_phonemes"*.
@@ -309,13 +309,13 @@ TACOTRON2_WAVERNN_CHAR_LJSPEECH.__doc__ = """Character-based TTS pipeline with :
The text processor encodes the input texts character-by-character.
-Tacotron2 was trained on *LJSpeech* [:footcite:`ljspeech17`] for 1,500 epochs.
+Tacotron2 was trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs.
You can find the training script `here <https://github.com/pytorch/audio/tree/main/examples/pipeline_tacotron2>`__.
The following parameters were used: ``win_length=1100``, ``hop_length=275``, ``n_fft=2048``,
``mel_fmin=40``, and ``mel_fmax=11025``.
The vocoder is based on :py:class:`torchaudio.models.WaveRNN`.
-It was trained on 8 bits depth waveform of *LJSpeech* [:footcite:`ljspeech17`] for 10,000 epochs.
+It was trained on the 8-bit waveform of *LJSpeech* :cite:`ljspeech17` for 10,000 epochs.
You can find the training script `here <https://github.com/pytorch/audio/tree/main/examples/pipeline_wavernn>`__.
Please refer to :func:`torchaudio.pipelines.Tacotron2TTSBundle` for the usage.
@@ -360,13 +360,13 @@ graphemes to phonemes.
The model (*en_us_cmudict_forward*) was trained on
`CMUDict <http://www.speech.cs.cmu.edu/cgi-bin/cmudict>`__.
-Tacotron2 was trained on *LJSpeech* [:footcite:`ljspeech17`] for 1,500 epochs.
+Tacotron2 was trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs.
You can find the training script `here <https://github.com/pytorch/audio/tree/main/examples/pipeline_tacotron2>`__.
The following parameters were used: ``win_length=1100``, ``hop_length=275``, ``n_fft=2048``,
``mel_fmin=40``, and ``mel_fmax=11025``.
The vocoder is based on :py:class:`torchaudio.models.WaveRNN`.
-It was trained on 8 bits depth waveform of *LJSpeech* [:footcite:`ljspeech17`] for 10,000 epochs.
+It was trained on the 8-bit waveform of *LJSpeech* :cite:`ljspeech17` for 10,000 epochs.
You can find the training script `here <https://github.com/pytorch/audio/tree/main/examples/pipeline_wavernn>`__.
Please refer to :func:`torchaudio.pipelines.Tacotron2TTSBundle` for the usage.
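These bundles are driven through the common `Tacotron2TTSBundle` interface; a hedged usage sketch (the input sentence is arbitrary):

```python
import torch
import torchaudio

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
processor = bundle.get_text_processor()  # character-based text processor
tacotron2 = bundle.get_tacotron2()       # text -> mel spectrogram
vocoder = bundle.get_vocoder()           # mel spectrogram -> waveform (WaveRNN)

tokens, lengths = processor("Hello world!")
with torch.inference_mode():
    spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
    waveforms, _ = vocoder(spec, spec_lengths)
```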
@@ -444,7 +444,7 @@ class _ConvEmformerLayer(torch.nn.Module):
class ConvEmformer(_EmformerImpl):
r"""Implements the convolution-augmented streaming transformer architecture introduced in
*Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution*
-[:footcite:`9747706`].
+:cite:`9747706`.
Args:
input_dim (int): input dimension.
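A hypothetical instantiation of the class above; the positional argument order (input_dim, num_heads, ffn_dim, num_layers, segment_length, kernel_size) and every value below are assumptions for illustration, not taken from this diff:

```python
import torch
from torchaudio.prototype.models import ConvEmformer

# Illustrative hyperparameters: 80-dim features, 4 heads, FFN dim 1024,
# 12 layers, segment length 16, conv kernel size 8, right context 4.
conv_emformer = ConvEmformer(80, 4, 1024, 12, 16, 8, right_context_length=4)

inputs = torch.rand(10, 200, 80)        # (batch, num_frames, input_dim)
lengths = torch.randint(1, 200, (10,))  # valid frames per utterance
output, output_lengths = conv_emformer(inputs, lengths)
```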
@@ -104,7 +104,7 @@ class MVDR(torch.nn.Module):
Based on https://github.com/espnet/espnet/blob/master/espnet2/enh/layers/beamformer.py
We provide three solutions of MVDR beamforming. One is based on *reference channel selection*
-[:footcite:`souden2009optimal`] (``solution=ref_channel``).
+:cite:`souden2009optimal` (``solution=ref_channel``).
.. math::
\\textbf{w}_{\\text{MVDR}}(f) =\
@@ -126,7 +126,7 @@ class MVDR(torch.nn.Module):
:math:`.^{\\mathsf{H}}` denotes the Hermitian Conjugate operation.
We apply either *eigenvalue decomposition*
-[:footcite:`higuchi2016robust`] or the *power method* [:footcite:`mises1929praktische`] to get the
+:cite:`higuchi2016robust` or the *power method* :cite:`mises1929praktische` to get the
steering vector from the PSD matrix of speech.
After estimating the beamforming weight, the enhanced Short-time Fourier Transform (STFT) is obtained by
@@ -137,7 +137,7 @@ class MVDR(torch.nn.Module):
where :math:`\\bf{Y}` and :math:`\\hat{\\bf{S}}` are the STFT of the multi-channel noisy speech and\
the single-channel enhanced speech, respectively.
-For online streaming audio, we provide a *recursive method* [:footcite:`higuchi2017online`] to update the
+For online streaming audio, we provide a *recursive method* :cite:`higuchi2017online` to update the
PSD matrices of speech and noise, respectively.
Args:
@@ -341,7 +341,7 @@ class MVDR(torch.nn.Module):
class RTFMVDR(torch.nn.Module):
-r"""Minimum Variance Distortionless Response (*MVDR* [:footcite:`capon1969high`]) module
+r"""Minimum Variance Distortionless Response (*MVDR* :cite:`capon1969high`) module
based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise.
.. devices:: CPU CUDA
@@ -405,8 +405,8 @@ class RTFMVDR(torch.nn.Module):
class SoudenMVDR(torch.nn.Module):
-r"""Minimum Variance Distortionless Response (*MVDR* [:footcite:`capon1969high`]) module
-based on the method proposed by *Souden et, al.* [:footcite:`souden2009optimal`].
+r"""Minimum Variance Distortionless Response (*MVDR* :cite:`capon1969high`) module
+based on the method proposed by *Souden et al.* :cite:`souden2009optimal`.
.. devices:: CPU CUDA
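As a rough usage sketch for the Souden variant above, with a random tensor standing in for a real time-frequency mask (mask estimation is out of scope here):

```python
import torch
import torchaudio.transforms as T

stft = T.Spectrogram(n_fft=400, power=None)  # power=None keeps the complex STFT
specgram = stft(torch.randn(4, 16000))       # (channel, freq, time), dummy 4-channel mix

mask = torch.rand(specgram.shape[-2:])       # stand-in for an estimated speech mask
psd = T.PSD()
psd_s = psd(specgram, mask)                  # speech PSD matrix
psd_n = psd(specgram, 1 - mask)              # noise PSD matrix

mvdr = T.SoudenMVDR()
enhanced = mvdr(specgram, psd_s, psd_n, reference_channel=0)  # single-channel STFT
```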
@@ -214,8 +214,8 @@ class GriffinLim(torch.nn.Module):
.. properties:: Autograd TorchScript
Implementation ported from
-*librosa* [:footcite:`brian_mcfee-proc-scipy-2015`], *A fast Griffin-Lim algorithm* [:footcite:`6701851`]
-and *Signal estimation from modified short-time Fourier transform* [:footcite:`1172092`].
+*librosa* :cite:`brian_mcfee-proc-scipy-2015`, *A fast Griffin-Lim algorithm* :cite:`6701851`
+and *Signal estimation from modified short-time Fourier transform* :cite:`1172092`.
Args:
n_fft (int, optional): Size of FFT, creates ``n_fft // 2 + 1`` bins. (Default: ``400``)
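A minimal round-trip sketch for the transform documented above; the main requirement is that the ``power`` settings of the forward and inverse transforms match (values below are illustrative):

```python
import torch
import torchaudio.transforms as T

n_fft = 400
to_spec = T.Spectrogram(n_fft=n_fft, power=2)     # magnitude-squared spectrogram
griffin_lim = T.GriffinLim(n_fft=n_fft, power=2)  # recovers phase iteratively

waveform = torch.randn(1, 16000)  # one second of dummy audio at 16 kHz
reconstructed = griffin_lim(to_spec(waveform))
```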
@@ -1040,7 +1040,7 @@ class TimeStretch(torch.nn.Module):
.. properties:: Autograd TorchScript
-Proposed in *SpecAugment* [:footcite:`specaugment`].
+Proposed in *SpecAugment* :cite:`specaugment`.
Args:
hop_length (int or None, optional): Length of hop between STFT windows. (Default: ``win_length // 2``)
@@ -1226,7 +1226,7 @@ class FrequencyMasking(_AxisMasking):
.. properties:: Autograd TorchScript
-Proposed in *SpecAugment* [:footcite:`specaugment`].
+Proposed in *SpecAugment* :cite:`specaugment`.
Args:
freq_mask_param (int): maximum possible length of the mask.
@@ -1260,7 +1260,7 @@ class TimeMasking(_AxisMasking):
.. properties:: Autograd TorchScript
-Proposed in *SpecAugment* [:footcite:`specaugment`].
+Proposed in *SpecAugment* :cite:`specaugment`.
Args:
time_mask_param (int): maximum possible length of the mask.
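The two masking transforms above compose directly into a SpecAugment-style pipeline; a small sketch with illustrative mask parameters (`TimeStretch` additionally requires a complex spectrogram, so it is omitted here):

```python
import torch
import torchaudio.transforms as T

spec = T.Spectrogram(n_fft=400)(torch.randn(1, 16000))  # (channel, freq, time)

# Mask up to 15 consecutive frequency bins, then up to 35 consecutive time frames.
augmented = T.FrequencyMasking(freq_mask_param=15)(spec)
augmented = T.TimeMasking(time_mask_param=35)(augmented)
```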
@@ -1724,7 +1724,7 @@ class PitchShift(LazyModuleMixin, torch.nn.Module):
class RNNTLoss(torch.nn.Module):
"""Compute the RNN Transducer loss from *Sequence Transduction with Recurrent Neural Networks*
-[:footcite:`graves2012sequence`].
+:cite:`graves2012sequence`.
.. devices:: CPU CUDA
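Finally, a hedged sketch of the loss module above; shapes follow the RNN-T convention of (batch, max frame length, max target length + 1, classes), and every size below is made up:

```python
import torch
import torchaudio.transforms as T

rnnt_loss = T.RNNTLoss(blank=0)

logits = torch.randn(2, 50, 11, 20, requires_grad=True)     # joint-network output
targets = torch.randint(1, 20, (2, 10), dtype=torch.int32)  # labels; 0 reserved for blank
logit_lengths = torch.full((2,), 50, dtype=torch.int32)
target_lengths = torch.full((2,), 10, dtype=torch.int32)

loss = rnnt_loss(logits, targets, logit_lengths, target_lengths)
loss.backward()
```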