Commit 10cce198 authored by moto's avatar moto Committed by Facebook GitHub Bot

Update prototype documentations (#2108)

Summary:
### Change list

* Split the prototype documentation into separate pages.
* Add a new API reference section dedicated to prototypes.
* Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of RNNT constructor. (cc hwangjeff )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak CTC tutorial
  * Replace hyperlinks to API reference with backlinks
  * Add `progress=False` to download

### Follow-up

The RNNT decoder and the CTC decoder return their own `Hypothesis` classes. When I tried to add the CTC decoder's `Hypothesis` to the documentation, the build process complained that the name is ambiguous.
I think the `Hypothesis` classes could be nested inside each decoder class (if TorchScript supports that), or given distinct names; in the latter case the interface of each `Hypothesis` has to be generic enough.
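A rough sketch of the renaming option (the class names and the RNNT fields below are hypothetical, not from this PR; only the CTC fields `tokens`/`words`/`score` match the existing namedtuple):

```python
from typing import List, NamedTuple


class CTCHypothesis(NamedTuple):
    """Hypothetical distinct name for the CTC decoder result."""
    tokens: List[int]  # predicted token IDs
    words: List[str]   # predicted words from the lexicon
    score: float       # decoding score


class RNNTHypothesis(NamedTuple):
    """Hypothetical distinct name for the RNN-T decoder result."""
    tokens: List[int]  # predicted token IDs
    score: float       # decoding score


# Distinct class names keep the Sphinx cross-references unambiguous,
# while the shared fields (tokens, score) keep the interfaces generic
# enough to handle hypotheses from either decoder uniformly.
hyp = CTCHypothesis(tokens=[1, 2, 3], words=["hello"], score=-1.5)
print(hyp.score)
```

`NamedTuple` subclasses are also TorchScript-compatible, which matters if the classes end up nested in scripted decoders.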

### Before

https://pytorch.org/audio/main/prototype.html

<img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png">

### After

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html

<img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html

<img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html

<img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2108

Reviewed By: hwangjeff, carolineechen, nateanl

Differential Revision: D33340816

Pulled By: mthrok

fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187
parent 72a98a86
@@ -48,7 +48,17 @@ API References
    compliance.kaldi
    kaldi_io
    utils
+
+Prototype API References
+------------------------
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Prototype API Reference
+
    prototype
+   prototype.rnnt
+   prototype.ctc_decoder

 Getting Started
 ---------------
...
torchaudio.prototype.ctc_decoder
================================

.. currentmodule:: torchaudio.prototype.ctc_decoder

Decoder Class
-------------

KenLMLexiconDecoder
~~~~~~~~~~~~~~~~~~~

.. autoclass:: KenLMLexiconDecoder

   .. automethod:: __call__

   .. automethod:: idxs_to_tokens

Factory Function
----------------

kenlm_lexicon_decoder
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: kenlm_lexicon_decoder
torchaudio.prototype.rnnt
=========================

.. py:module:: torchaudio.prototype
.. currentmodule:: torchaudio.prototype

Model Classes
-------------

Conformer
~~~~~~~~~

.. autoclass:: Conformer

   .. automethod:: forward

Emformer
~~~~~~~~

.. autoclass:: Emformer

   .. automethod:: forward

   .. automethod:: infer

RNNT
~~~~

.. autoclass:: RNNT

   .. automethod:: forward

   .. automethod:: transcribe_streaming

   .. automethod:: transcribe

   .. automethod:: predict

   .. automethod:: join

Model Factory Functions
-----------------------

emformer_rnnt_base
~~~~~~~~~~~~~~~~~~

.. autofunction:: emformer_rnnt_base

emformer_rnnt_model
~~~~~~~~~~~~~~~~~~~

.. autofunction:: emformer_rnnt_model

Decoder Classes
---------------

RNNTBeamSearch
~~~~~~~~~~~~~~

.. autoclass:: RNNTBeamSearch

   .. automethod:: forward

   .. automethod:: infer

Hypothesis
~~~~~~~~~~

.. autoclass:: Hypothesis

Pipeline Primitives (Pre-trained Models)
----------------------------------------

RNNTBundle
~~~~~~~~~~

.. autoclass:: RNNTBundle
   :members: sample_rate, n_fft, n_mels, hop_length, segment_length, right_context_length

   .. automethod:: get_decoder

   .. automethod:: get_feature_extractor

   .. automethod:: get_streaming_feature_extractor

   .. automethod:: get_token_processor

.. autoclass:: torchaudio.prototype::RNNTBundle.FeatureExtractor
   :special-members: __call__

.. autoclass:: torchaudio.prototype::RNNTBundle.TokenProcessor
   :special-members: __call__

EMFORMER_RNNT_BASE_LIBRISPEECH
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. container:: py attribute

   .. autodata:: EMFORMER_RNNT_BASE_LIBRISPEECH
      :no-value:

References
----------

.. footbibliography::
 .. role:: hidden
    :class: hidden-section

 torchaudio.prototype
 ====================

 .. py:module:: torchaudio.prototype
 .. currentmodule:: torchaudio.prototype

 ``torchaudio.prototype`` provides prototype features;
-see `here <https://pytorch.org/audio>`_ for more information on prototype features.
-The module is available only within nightly builds and must be imported
-explicitly, e.g. ``import torchaudio.prototype``.
+they are at an early stage for feedback and testing.
+Their interfaces might be changed without prior notice.

-Conformer
-~~~~~~~~~
-.. autoclass:: Conformer
-   .. automethod:: forward
-
-Emformer
-~~~~~~~~
-.. autoclass:: Emformer
-   .. automethod:: forward
-   .. automethod:: infer
-
-RNNT
-~~~~
-.. autoclass:: RNNT
-   .. automethod:: forward
-   .. automethod:: transcribe_streaming
-   .. automethod:: transcribe
-   .. automethod:: predict
-   .. automethod:: join
-
-emformer_rnnt_base
-~~~~~~~~~~~~~~~~~~
-.. autofunction:: emformer_rnnt_base
-
-emformer_rnnt_model
-~~~~~~~~~~~~~~~~~~~
-.. autofunction:: emformer_rnnt_model
-
-RNNTBeamSearch
-~~~~~~~~~~~~~~
-.. autoclass:: RNNTBeamSearch
-   .. automethod:: forward
-   .. automethod:: infer
-
-Hypothesis
-~~~~~~~~~~
-.. autoclass:: Hypothesis
-
-RNNTBundle
-~~~~~~~~~~
-.. autoclass:: RNNTBundle
-   :members: sample_rate, n_fft, n_mels, hop_length, segment_length, right_context_length
-   .. automethod:: get_decoder
-   .. automethod:: get_feature_extractor
-   .. automethod:: get_streaming_feature_extractor
-   .. automethod:: get_token_processor
-.. autoclass:: torchaudio.prototype::RNNTBundle.FeatureExtractor
-   :special-members: __call__
-.. autoclass:: torchaudio.prototype::RNNTBundle.TokenProcessor
-   :special-members: __call__
-
-EMFORMER_RNNT_BASE_LIBRISPEECH
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autodata:: EMFORMER_RNNT_BASE_LIBRISPEECH
-   :no-value:
-
-KenLMLexiconDecoder
-~~~~~~~~~~~~~~~~~~~
-.. currentmodule:: torchaudio.prototype.ctc_decoder
-.. autoclass:: KenLMLexiconDecoder
-   .. automethod:: __call__
-   .. automethod:: idxs_to_tokens
-
-kenlm_lexicon_decoder
-~~~~~~~~~~~~~~~~~~~~~
-.. currentmodule:: torchaudio.prototype.ctc_decoder
-.. autoclass:: kenlm_lexicon_decoder
-
-References
-~~~~~~~~~~
-.. footbibliography::
+Most modules of prototypes are excluded from release.
+Please refer to `here <https://pytorch.org/audio>`_ for
+more information on prototype features.
+
+The modules under ``torchaudio.prototype`` must be
+imported explicitly, e.g.
+
+.. code-block:: python
+
+   import torchaudio.prototype.rnnt
+
+.. toctree::
+
+   prototype.rnnt
+   prototype.ctc_decoder
...
@@ -50,7 +50,7 @@ import torchaudio
 # We use the pretrained `Wav2Vec 2.0 <https://arxiv.org/abs/2006.11477>`__
 # Base model that is finetuned on 10 min of the `LibriSpeech
 # dataset <http://www.openslr.org/12>`__, which can be loaded in using
-# ``torchaudio.pipelines``. For more detail on running Wav2Vec 2.0 speech
+# py:func:`torchaudio.pipelines`. For more detail on running Wav2Vec 2.0 speech
 # recognition pipelines in torchaudio, please refer to `this
 # tutorial <https://pytorch.org/audio/main/tutorials/speech_recognition_pipeline_tutorial.html>`__.
 #
...
@@ -161,12 +161,11 @@ torch.hub.download_url_to_file(kenlm_url, kenlm_file)
 # Construct Beam Search Decoder
 # -----------------------------
 #
-# The decoder can be constructed using the ``kenlm_lexicon_decoder``
-# factory function from ``torchaudio.prototype.ctc_decoder``. In addition
-# to the previously mentioned components, it also takes in various beam
-# search decoding parameters and token/word parameters. The full list of
-# parameters can be found
-# `here <https://pytorch.org/audio/main/prototype.html#kenlm-lexicon-decoder>`__.
+# The decoder can be constructed using the
+# :py:func:`torchaudio.prototype.ctc_decoder.kenlm_lexicon_decoder`
+# factory function.
+# In addition to the previously mentioned components, it also takes in
+# various beam search decoding parameters and token/word parameters.
 #
 from torchaudio.prototype.ctc_decoder import kenlm_lexicon_decoder
...
@@ -190,7 +189,7 @@ beam_search_decoder = kenlm_lexicon_decoder(
 # --------------
 #
 # For comparison against the beam search decoder, we also construct a
-# basic greedy decoder.\ **bold text**
+# basic greedy decoder.
 #
...
...
@@ -24,6 +24,14 @@ Hypothesis = namedtuple("Hypothesis", ["tokens", "words", "score"])
 class KenLMLexiconDecoder:
+    """torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder()
+
+    Note:
+        To build the decoder, please use factory function
+        :py:func:`kenlm_lexicon_decoder`.
+    """
+
     def __init__(
         self,
         nbest: int,
...
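The docstring added above is what hides the constructor signature in the rendered docs: Sphinx autodoc (with its default `autodoc_docstring_signature` setting) treats a first docstring line of the form `Name(args)` as the signature to display, so `KenLMLexiconDecoder()` renders an empty constructor. A minimal standalone illustration of the convention (`beam_size`/`nbest` here are illustrative stand-ins, not the full real argument list):

```python
class KenLMLexiconDecoder:
    """KenLMLexiconDecoder()

    Note:
        To build the decoder, please use the factory function.
    """

    def __init__(self, nbest: int, beam_size: int) -> None:
        # Internal constructor arguments that should not appear in the
        # generated documentation.
        self.nbest = nbest
        self.beam_size = beam_size


# The override only affects how autodoc renders the signature; at runtime
# the class still takes its real arguments.
first_line = KenLMLexiconDecoder.__doc__.splitlines()[0]
print(first_line)
```

This keeps internal-only constructor parameters out of the public API page while leaving the class itself unchanged.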
...
@@ -426,7 +426,9 @@ class _Joiner(torch.nn.Module):
 class RNNT(torch.nn.Module):
-    r"""Recurrent neural network transducer (RNN-T) model.
+    r"""torchaudio.prototype.rnnt.RNNT()
+
+    Recurrent neural network transducer (RNN-T) model.

     Note:
         To build the model, please use one of the factory functions.
...