Commit 9c641849, authored by yangarbiter, committed by GitHub

Add prototype.tacotron2 page to docs (#1695)

parent 9535d83e
@@ -40,6 +40,7 @@ The :mod:`torchaudio` package consists of I/O, popular datasets and common audio
 kaldi_io
 utils
 rnnt_loss
+tacotron2
 .. toctree::
......
@@ -38,14 +38,6 @@
 archivePrefix={arXiv},
 primaryClass={cs.SD}
 }
-@inproceedings{shen2018natural,
-title={Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions},
-author={Shen, Jonathan and Pang, Ruoming and Weiss, Ron J and Schuster, Mike and Jaitly, Navdeep and Yang, Zongheng and Chen, Zhifeng and Zhang, Yu and Wang, Yuxuan and Skerrv-Ryan, Rj and others},
-year={2017},
-eprint={1712.05884},
-archivePrefix={arXiv},
-primaryClass={cs.CL}
-}
 @article{Luo_2019,
 title={Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation},
 volume={27},
@@ -96,3 +88,11 @@
 number={},
 pages={2494-2498},
 doi={10.1109/ICASSP.2014.6854049}}
+@inproceedings{shen2018natural,
+title={Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions},
+author={Shen, Jonathan and Pang, Ruoming and Weiss, Ron J and Schuster, Mike and Jaitly, Navdeep and Yang, Zongheng and Chen, Zhifeng and Zhang, Yu and Wang, Yuxuan and Skerrv-Ryan, Rj and others},
+booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+pages={4779--4783},
+year={2018},
+organization={IEEE}
+}
\ No newline at end of file
.. role:: hidden
    :class: hidden-section
torchaudio.prototype.tacotron2
==============================
.. currentmodule:: torchaudio.prototype.tacotron2
.. note::
   The Tacotron2 model is a prototype feature; see `here <https://pytorch.org/audio>`_ to learn more about the nomenclature.
   It is only available in the nightly builds and must be imported
   explicitly using: :code:`from torchaudio.prototype.tacotron2 import Tacotron2, tacotron2`.
Tacotron2
~~~~~~~~~
.. autoclass:: Tacotron2

  .. automethod:: forward

  .. automethod:: infer
Factory Functions
~~~~~~~~~~~~~~~~~
tacotron2
---------
.. autofunction:: tacotron2
References
~~~~~~~~~~
.. footbibliography::
@@ -1139,13 +1139,14 @@ class Tacotron2(nn.Module):
 The input `text` should be padded with zeros to length max of ``text_lengths``.
 Args:
-text (Tensor): the input text to Tacotron2. (n_batch, max of ``text_lengths``)
-text_lengths (Tensor): the length of each text (n_batch)
+text (Tensor): The input text to Tacotron2 with shape (n_batch, max of ``text_lengths``).
+text_lengths (Tensor): The length of each text with shape (n_batch, ).
 Return:
-mel_specgram (Tensor): the predicted mel spectrogram
-with shape (n_batch, n_mels, max of ``mel_specgram_lengths.max()``)
-mel_specgram_lengths (Tensor): the length of the predicted mel spectrogram (n_batch, ))
+mel_specgram (Tensor): The predicted mel spectrogram
+with shape (n_batch, n_mels, max of ``mel_specgram_lengths.max()``).
+mel_specgram_lengths (Tensor): The length of the predicted mel spectrogram
+with shape (n_batch, ).
 alignments (Tensor): Sequence of attention weights from the decoder.
 with shape (n_batch, max of ``mel_specgram_lengths``, max of ``text_lengths``).
 """
......
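The docstring above requires `text` to be zero-padded to the longest entry in `text_lengths`. The following is a minimal pure-Python sketch of that batching step, not code from this commit: the helper name `pad_token_batch` and the example token IDs are hypothetical, and it assumes token ID 0 is reserved for padding.

```python
from typing import List, Sequence, Tuple


def pad_token_batch(
    sequences: Sequence[Sequence[int]], pad_value: int = 0
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad variable-length token ID sequences to a common length.

    Returns (padded, lengths): padded has shape (n_batch, max(lengths))
    and lengths has shape (n_batch,), matching the documented inputs.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    # Append pad_value until every row reaches the longest length.
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Hypothetical token IDs; a real pipeline would obtain them from a text processor.
batch = [[5, 12, 7], [9, 3], [4, 8, 15, 16]]
text, text_lengths = pad_token_batch(batch)
```

With PyTorch installed, the two lists could presumably be wrapped with `torch.tensor(...)` before being passed as `text` and `text_lengths`; that step is left out here so the sketch runs without extra dependencies.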