Add prototype.tacotron2 page to docs (#1695)

Commit 9c641849 authored Aug 12, 2021 by yangarbiter; committed by GitHub Aug 12, 2021
4 changed files
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -40,6 +40,7 @@ The :mod:`torchaudio` package consists of I/O, popular datasets and common audio
   kaldi_io
   utils
   rnnt_loss
+   tacotron2
 .. toctree::

--- a/docs/source/refs.bib
+++ b/docs/source/refs.bib
@@ -38,14 +38,6 @@
      archivePrefix={arXiv},
      primaryClass={cs.SD}
 }
-@inproceedings{shen2018natural,
-      title={Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions},
-      author={Shen, Jonathan and Pang, Ruoming and Weiss, Ron J and Schuster, Mike and Jaitly, Navdeep and Yang, Zongheng and Chen, Zhifeng and Zhang, Yu and Wang, Yuxuan and Skerrv-Ryan, Rj and others},
-      year={2017},
-      eprint={1712.05884},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
-}
 @article{Luo_2019,
   title={Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation},
   volume={27},
@@ -96,3 +88,11 @@
  number={},
  pages={2494-2498},
  doi={10.1109/ICASSP.2014.6854049}}
+@inproceedings{shen2018natural,
+  title={Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions},
+  author={Shen, Jonathan and Pang, Ruoming and Weiss, Ron J and Schuster, Mike and Jaitly, Navdeep and Yang, Zongheng and Chen, Zhifeng and Zhang, Yu and Wang, Yuxuan and Skerry-Ryan, Rj and others},
+  booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+  pages={4779--4783},
+  year={2018},
+  organization={IEEE}
+}
\ No newline at end of file
--- a/docs/source/tacotron2.rst
+++ b/docs/source/tacotron2.rst
+.. role:: hidden
+    :class: hidden-section
+torchaudio.prototype.tacotron2
+==============================
+.. currentmodule:: torchaudio.prototype.tacotron2
+.. note::
+    The Tacotron2 model is a prototype feature; see `here <https://pytorch.org/audio>`_ to learn more about the nomenclature.
+    It is only available in the nightly builds and must be imported
+    explicitly using :code:`from torchaudio.prototype.tacotron2 import Tacotron2, tacotron2`.
+Tacotron2
+~~~~~~~~~
+.. autoclass:: Tacotron2
+  .. automethod:: forward
+  .. automethod:: infer
+Factory Functions
+-----------------
+tacotron2
+---------
+.. autofunction:: tacotron2
+References
+~~~~~~~~~~
+.. footbibliography::
--- a/torchaudio/prototype/tacotron2.py
+++ b/torchaudio/prototype/tacotron2.py
@@ -1139,13 +1139,14 @@ class Tacotron2(nn.Module):
        The input `text` should be padded with zeros to length max of ``text_lengths``.
        Args:
-            text (Tensor): the input text to Tacotron2.  (n_batch, max of ``text_lengths``)
+            text (Tensor): The input text to Tacotron2 with shape (n_batch, max of ``text_lengths``).
-            text_lengths (Tensor): the length of each text (n_batch)
+            text_lengths (Tensor): The length of each text with shape (n_batch, ).
        Return:
-            mel_specgram (Tensor): the predicted mel spectrogram
+            mel_specgram (Tensor): The predicted mel spectrogram
-                with shape (n_batch, n_mels, max of ``mel_specgram_lengths.max()``)
+                with shape (n_batch, n_mels, max of ``mel_specgram_lengths``).
-            mel_specgram_lengths (Tensor): the length of the predicted mel spectrogram (n_batch, ))
+            mel_specgram_lengths (Tensor): The length of the predicted mel spectrogram
+                with shape (n_batch, ).
            alignments (Tensor): Sequence of attention weights from the decoder.
                with shape (n_batch, max of ``mel_specgram_lengths``, max of ``text_lengths``).
        """