"docs/git@developer.sourcefind.cn:yaoyuping/nndetection.git" did not exist on "537a3032c6383a6faca00fe23459c32339dc09d7"
Commit 996ea169 authored by Przemek Tredak's avatar Przemek Tredak

Initial code drop


Co-authored-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: default avatarPrzemek Tredak <ptredak@nvidia.com>
*.o
*.swp
*.ii
*.ptx
*.cubin
*.fatbin*
*.module_id
*.nsys-rep
*.ncu-rep
*.sqlite
.eggs
build/
*.so
*.egg-info
__pycache__
.ycm_extra_conf.py
.vimrc
tests/cpp/build/
docs/_build
.ipynb_checkpoints
docs/doxygen
[submodule "3rdparty/googletest"]
path = 3rdparty/googletest
url = https://github.com/google/googletest.git
Subproject commit 58d77fa8070e8cec2dc1ed015d66b454c8d78850
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
Contribution Rules
==================
.. role:: bash(code)
   :language: bash
Coding Guidelines
-----------------
* We follow the `Google C++ Style Guide <https://google.github.io/styleguide/cppguide.html>`_. When no
  rule applies, follow the conventions already occurring in the code. If there is no precedent in our
  codebase, we are open to discussion.
* Prior to your contribution, please make sure that the code passes the linter check. We do both C++
and Python linting. To invoke the check, please use
  .. code-block:: bash

      TE_PATH=<path to TE source> bash qa/L0_lint/test.sh
* Avoid introducing unnecessary complexity into existing code so that maintainability and
readability are preserved.
* Try to keep pull requests (PRs) as concise as possible:
- Avoid committing commented-out code.
- Wherever possible, each PR should address a single concern. If there are several
otherwise-unrelated things that should be fixed to reach a desired endpoint, our recommendation
is to open several PRs and indicate the dependencies in the description. The more complex the
changes are in a single PR, the more time it will take to review those changes.
* Write PR and commit titles using imperative mood.
- Format commit messages sticking to rules described in
`this <https://chris.beams.io/posts/git-commit/>`_ guide.
* Make sure all `L0_*` tests pass:
  - In the :bash:`qa/` directory, there are basic sanity tests scripted in directories named
    `L0_...`. A given test can be executed by running the :bash:`./test.sh` command in each test
    directory. The :bash:`test.sh` script assumes that the TE source can be found under
    :bash:`/opt/transformerengine`. This location can be overridden by setting the :bash:`TE_PATH`
    environment variable to the directory containing the TE source.
  - One of the tests, `L0_license`, checks for valid NVIDIA copyright and license text. If you create
    a new file and do not want to assign copyright to NVIDIA, please add the file to the
    `exclude_copyright` list in :bash:`qa/L0_license/config.json`.
* Transformer Engine's default build assumes recent versions of TE's dependencies (CUDA toolkit
etc.). Contributions that add compatibility with older versions of those dependencies will be
considered, but NVIDIA cannot guarantee that all possible build configurations work, are not
broken by future contributions, and retain highest performance.
* Make sure that you can contribute your work to open source (no license and/or patent conflict is
introduced by your code). You need to `Sign Your Work`_.
* Thanks in advance for your patience as we review your contributions; we do appreciate them!
Sign Your Work
--------------
* We require that all contributors "sign-off" on their commits. This certifies that the contribution
is your original work, or you have rights to submit it under the same license, or a compatible
license.
* Any contribution which contains commits that are not Signed-Off will not be accepted.
* To sign off on a commit you simply use the `--signoff` (or `-s`) option when committing your changes:
  .. code-block:: bash

      $ git commit -s -m "Add cool feature."

  This will append the following to your commit message:

  .. code-block:: text

      Signed-off-by: Your Name <your@email.com>
* Full text of the DCO:
  .. code-block:: text

      Developer Certificate of Origin
      Version 1.1

      Copyright (C) 2004, 2006 The Linux Foundation and its contributors.

      Everyone is permitted to copy and distribute verbatim copies of this
      license document, but changing it is not allowed.

      Developer's Certificate of Origin 1.1

      By making a contribution to this project, I certify that:

      (a) The contribution was created in whole or in part by me and I
          have the right to submit it under the open source license
          indicated in the file; or

      (b) The contribution is based upon previous work that, to the best
          of my knowledge, is covered under an appropriate open source
          license and I have the right under that license to submit that
          work with modifications, whether created in whole or in part
          by me, under the same open source license (unless I am
          permitted to submit under a different license), as indicated
          in the file; or

      (c) The contribution was provided directly to me by some other
          person who certified (a), (b) or (c) and I have not modified
          it.

      (d) I understand and agree that this project and the contribution
          are public and that a record of the contribution (including all
          personal information I submit with it, including my sign-off) is
          maintained indefinitely and may be redistributed consistent with
          this project or the open source license(s) involved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
|License|
Transformer Engine
==================
.. overview-begin-marker-do-not-remove
Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including
using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower
memory utilization in both training and inference. TE provides a collection of highly optimized
building blocks for popular Transformer architectures and an automatic mixed precision-like API that
can be used seamlessly with your PyTorch code. TE also includes a framework agnostic C++ API that
can be integrated with other deep learning libraries to enable FP8 support for Transformers.
As the number of parameters in Transformer models continues to grow, training and inference for
architectures such as BERT, GPT and T5 becomes very memory and compute intensive. Most deep learning
frameworks train with FP32 by default. This is not essential, however, to achieve full accuracy for
many deep learning models. Using mixed-precision training, which combines single-precision (FP32)
with lower precision (e.g. FP16) format when training a model, results in significant speedups with
minimal differences in accuracy as compared to FP32 training. The Hopper GPU architecture
introduces FP8 precision, which offers improved performance over FP16 with no degradation in
accuracy. Although all major deep learning frameworks support FP16, native FP8 support is not
available in them today.
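As a rough illustration of why FP8 needs careful scaling: an 8-bit float has a far narrower dynamic range than FP16 or FP32, and the range depends on how the bits are split between exponent and mantissa. The sketch below (not TE code) computes the largest finite value of an FP8 format from its bit layout; E4M3 is the format referenced by `recipe.Format.E4M3`, and E5M2 is mentioned here only as the other commonly described FP8 variant.

```python
def fp8_max(exp_bits: int, man_bits: int, reserve_top_code: bool) -> float:
    """Largest finite value of an FP8 format with the given bit split.

    reserve_top_code=True models E4M3-style formats, where only the
    all-ones exponent+mantissa pattern is reserved for NaN (no infinity),
    so the top exponent field still encodes finite numbers.
    reserve_top_code=False models IEEE-style formats (like E5M2), where
    the entire top exponent field is reserved for inf/NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    if reserve_top_code:
        max_exp = (2 ** exp_bits - 1) - bias          # top exponent usable
        max_man = (2 ** man_bits - 2) / 2 ** man_bits  # all-ones mantissa is NaN
    else:
        max_exp = (2 ** exp_bits - 2) - bias          # top exponent is inf/NaN
        max_man = (2 ** man_bits - 1) / 2 ** man_bits
    return 2.0 ** max_exp * (1 + max_man)

# E4M3: 4 exponent bits, 3 mantissa bits -> 448.0
print(fp8_max(4, 3, reserve_top_code=True))
# E5M2: 5 exponent bits, 2 mantissa bits -> 57344.0
print(fp8_max(5, 2, reserve_top_code=False))
```

Compared with FP16's maximum of 65504, E4M3 tops out at 448, which is why tensors must be multiplied by per-tensor scaling factors before being cast to FP8.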
TE addresses the problem of FP8 support by providing APIs that integrate with popular Large Language
Model (LLM) libraries. It provides a Python layer (initially supporting pyTorch, with support for
more frameworks in the future) consisting of modules that make it easy to build a Transformer layer,
as well as a framework-agnostic C++ library including the structs and kernels needed for FP8
support. Modules provided by TE internally maintain the scaling factors and other values needed for
FP8 training, greatly simplifying mixed precision training for users.
Transformer Engine in action:
.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Set dimensions.
    in_features = 768
    out_features = 3072
    hidden_size = 2048

    # Initialize model and inputs.
    model = te.Linear(in_features, out_features, use_bias=True)
    inp = torch.randn(hidden_size, in_features, device="cuda")

    # Create FP8 recipe. Note: All input args are optional.
    fp8_recipe = recipe.DelayedScaling(margin=0, interval=1, fp8_format=recipe.Format.E4M3)

    # Enables autocasting for the forward pass
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(inp)

    loss = out.sum()
    loss.backward()
Highlights
----------
* Easy-to-use pyTorch modules for building Transformer layers with FP8 support on H100
  GPUs.
* Optimizations (e.g. fused kernels) for Transformer models across all precisions and NVIDIA GPU
  architectures.
.. overview-end-marker-do-not-remove
Installation
------------
In the NGC container
^^^^^^^^^^^^^^^^^^^^
Transformer Engine comes preinstalled in the pyTorch container on
`NVIDIA GPU Cloud <https://ngc.nvidia.com>`_ (versions 22.09 and later).
From source
^^^^^^^^^^^
Clone the repository and inside it type:
.. code-block:: bash

    pip install .
Transformer Architectures
-------------------------
While the more granular modules in Transformer Engine allow building any Transformer architecture,
the `TransformerLayer` API of Transformer Engine is flexible enough to build multiple major
variations of Transformers.
GPT
^^^
`GPT` architecture has `LayerNorm` at the input side (before `QKV Gemm`) and the residual connection
is taken from the input of that `LayerNorm`. In TE this can be achieved by setting the following
arguments in the `TransformerLayer` API.
.. code-block:: python

    transformer_engine.pytorch.TransformerLayer(
        ...,
        ...,
        apply_residual_connection_post_layernorm=False,
        output_layernorm=False,
        layer_type="encoder",
    )
BERT
^^^^
`BERT` architecture has `LayerNorm` at the output side (after the final `BiasDropoutAdd`) and the
residual connection is taken from the output of that `LayerNorm`. In TE this can be achieved by
setting the following arguments in the `TransformerLayer` API.
.. code-block:: python

    transformer_engine.pytorch.TransformerLayer(
        ...,
        ...,
        apply_residual_connection_post_layernorm=True,
        output_layernorm=True,
        layer_type="encoder",
    )
T5
^^
`T5` architecture has an additional `cross-attention` + `BiasDropoutAdd` + `LayerNorm` block before
the `MLP` layer. In TE this can be added by setting the `layer_type` to `decoder` in the
`TransformerLayer` API.
.. code-block:: python

    transformer_engine.pytorch.TransformerLayer(
        ...,
        ...,
        layer_type="decoder",
    )
Contributing to Transformer Engine
----------------------------------
We welcome contributions to Transformer Engine. To contribute to TE and make pull requests,
follow the guidelines outlined in the `<CONTRIBUTING.rst>`_ document.
Useful Links
------------
* `Attention original paper <https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf>`_
* `Megatron-LM tensor parallel <https://arxiv.org/pdf/1909.08053.pdf>`_
* `Megatron-LM sequence parallel <https://arxiv.org/pdf/2205.05198.pdf>`_
.. |License| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :target: https://opensource.org/licenses/Apache-2.0
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
{% extends "!layout.html" %}
{% block sidebartitle %} {{ super() }}
<style>
/* Sidebar header (and topbar for mobile) */
.wy-side-nav-search, .wy-nav-top {
background: #76b900;
}
.wy-menu > p > span.caption-text {
color: #76b900;
}
.wy-menu-vertical p {
height: 32px;
line-height: 32px;
padding: 0 1.618em;
margin: 12px 0 0;
display: block;
font-weight: 700;
text-transform: uppercase;
font-size: 85%;
white-space: nowrap;
}
.wy-side-nav-search a:link, .wy-nav-top a:link {
color: #fff;
}
.wy-side-nav-search a:visited, .wy-nav-top a:visited {
color: #fff;
}
.wy-side-nav-search a:hover, .wy-nav-top a:hover {
color: #fff;
}
.wy-menu-vertical a:link, .wy-menu-vertical a:visited {
color: #d9d9d9
}
.wy-menu-vertical a:active {
background-color: #76b900
}
.wy-side-nav-search>div.version {
color: rgba(0, 0, 0, 0.3)
}
.wy-nav-content {
max-width: 1000px;
}
/* override table width restrictions */
.wy-table-responsive table td, .wy-table-responsive table th {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
</style>
{% endblock %}
{% block footer %} {{ super() }}
<style>
a:link, a:visited {
color: #76b900;
}
a:hover {
color: #8c0;
}
html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple)>dt {
background: rgba(118, 185, 0, 0.1);
color: rgba(59,93,0,1);
border-top: solid 3px rgba(59,93,0,1);
}
html.writer-html4 .rst-content dl:not(.docutils) .property, html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .property {
text-transform: capitalize;
display: inline-block;
padding-right: 8px;
}
</style>
{%- if nvidia_analytics_id %}
<script type="text/javascript">_satellite.pageBottom();</script>
{%- endif %}
{% endblock %}
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
activation.h
============
.. doxygenfile:: activation.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
cast.h
======
.. doxygenfile:: cast.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
gemm.h
======
.. doxygenfile:: gemm.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
C/C++ API
=========
.. Caution:: This feature is not officially supported yet and may change without notice.
The C/C++ API allows you to access the custom kernels defined in the `libtransformer_engine.so`
library directly from C/C++, without Python.
.. toctree::
   :caption: Headers

   activation.h <activation>
   cast.h <cast>
   gemm.h <gemm>
   layer_norm.h <layer_norm>
   transformer_engine.h <transformer_engine>
   transpose.h <transpose>
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
layer_norm.h
============
.. doxygenfile:: layer_norm.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
transformer_engine.h
====================
.. doxygenfile:: transformer_engine.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
transpose.h
===========
.. doxygenfile:: transpose.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
Common API
==========
Classes
-------
.. autoclass:: transformer_engine.common.recipe.Format
.. autoclass:: transformer_engine.common.recipe.DelayedScaling(margin=0, interval=1, fp8_format=Format.E4M3, amax_history_len=1, amax_compute_algo="most_recent", scaling_factor_compute_algo=None, override_linear_precision=(False, False, False))
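The delayed scaling recipe derives FP8 scaling factors from a rolling history of observed
per-tensor absolute maxima (amax). A minimal pure-Python sketch of the idea follows; it is
illustrative only — the parameter names mirror the `DelayedScaling` signature above, the constant
448 is the largest finite E4M3 value, and the exact update rule inside TE may differ.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


class DelayedScalingSketch:
    """Toy model of delayed scaling: the scale for the next step is
    computed from amax values observed on previous steps."""

    def __init__(self, margin=0, amax_history_len=1, amax_compute_algo="most_recent"):
        self.margin = margin
        self.history_len = amax_history_len
        self.algo = amax_compute_algo
        self.history = []
        self.scale = 1.0

    def update(self, amax: float) -> float:
        """Record the amax seen this step and refresh the scaling factor."""
        # Keep only the most recent `history_len` amax values.
        self.history = (self.history + [amax])[-self.history_len:]
        # Choose the reference amax per the configured algorithm.
        chosen = self.history[-1] if self.algo == "most_recent" else max(self.history)
        # Scale so the chosen amax maps just inside the FP8 range,
        # backed off by a factor of 2**margin for safety.
        self.scale = FP8_E4M3_MAX / (chosen * 2.0 ** self.margin)
        return self.scale


sketch = DelayedScalingSketch(margin=0, amax_history_len=4, amax_compute_algo="max")
print(sketch.update(224.0))  # 2.0: values up to 224 are doubled into the FP8 range
```

The point of the *delay* is that the scale applied at step *t* comes from history up to step
*t-1*, so the cast can happen in a single pass without first scanning the current tensor.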
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
Framework-specific API
======================
.. toctree::

   pytorch
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
pyTorch
=======
Modules
-------
.. autoclass:: transformer_engine.pytorch.Linear(in_features, out_features, bias=True, **kwargs)
   :members: forward

.. autoclass:: transformer_engine.pytorch.LayerNorm(hidden_size, eps=1e-5, **kwargs)

.. autoclass:: transformer_engine.pytorch.LayerNormLinear(in_features, out_features, eps=1e-5, bias=True, **kwargs)
   :members: forward

.. autoclass:: transformer_engine.pytorch.LayerNormMLP(hidden_size, ffn_hidden_size, eps=1e-5, bias=True, **kwargs)
   :members: forward

.. autoclass:: transformer_engine.pytorch.TransformerLayer(hidden_size, ffn_hidden_size, num_attention_heads, **kwargs)
   :members: forward
Functions
---------
.. autofunction:: transformer_engine.pytorch.fp8_autocast
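`fp8_autocast` follows the familiar autocast pattern: a context manager sets global state on
entry and restores it on exit, and modules consult that state inside `forward` to decide whether
to take the FP8 path. A generic, self-contained sketch of the pattern (not TE internals; all
names here are illustrative):

```python
from contextlib import contextmanager

# Hypothetical global state a framework's modules would consult.
_FP8_ENABLED = False
_FP8_RECIPE = None


def fp8_enabled() -> bool:
    """What a module would query inside forward() to pick FP8 vs. FP32 paths."""
    return _FP8_ENABLED


@contextmanager
def fp8_autocast_sketch(enabled=True, fp8_recipe=None):
    """Enable FP8 state for the duration of the block, restoring it afterwards."""
    global _FP8_ENABLED, _FP8_RECIPE
    prev = (_FP8_ENABLED, _FP8_RECIPE)
    _FP8_ENABLED, _FP8_RECIPE = enabled, fp8_recipe
    try:
        yield
    finally:
        # Restore on normal exit or exception, so nesting composes correctly.
        _FP8_ENABLED, _FP8_RECIPE = prev
```

Because the previous state is captured and restored in a `finally` block, these contexts nest
safely, which is why the README example can wrap only the forward pass in the autocast.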