"docs/git@developer.sourcefind.cn:yaoyuping/nndetection.git" did not exist on "537a3032c6383a6faca00fe23459c32339dc09d7"
Commit 996ea169 authored by Przemek Tredak's avatar Przemek Tredak

Initial code drop


Co-authored-by: default avatarKirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: default avatarPrzemek Tredak <ptredak@nvidia.com>
*.o
*.swp
*.ii
*.ptx
*.cubin
*.fatbin*
*.module_id
*.nsys-rep
*.ncu-rep
*.sqlite
.eggs
build/
*.so
*.egg-info
__pycache__
.ycm_extra_conf.py
.vimrc
tests/cpp/build/
docs/_build
.ipynb_checkpoints
docs/doxygen
[submodule "3rdparty/googletest"]
path = 3rdparty/googletest
url = https://github.com/google/googletest.git
Subproject commit 58d77fa8070e8cec2dc1ed015d66b454c8d78850
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
Contribution Rules
==================
.. role:: bash(code)
   :language: bash
Coding Guidelines
-----------------
* We follow the `Google C++ Style Guide <https://google.github.io/styleguide/cppguide.html>`_. When no
  rule applies, follow the conventions already occurring in the code. If there is no precedent in our
  codebase, we are open to discussion.
* Prior to your contribution, please make sure that the code passes the linter check. We do both C++
and Python linting. To invoke the check, please use
  .. code-block:: bash

      TE_PATH=<path to TE source> bash qa/L0_lint/test.sh
* Avoid introducing unnecessary complexity into existing code so that maintainability and
readability are preserved.
* Try to keep pull requests (PRs) as concise as possible:
- Avoid committing commented-out code.
- Wherever possible, each PR should address a single concern. If there are several
otherwise-unrelated things that should be fixed to reach a desired endpoint, our recommendation
is to open several PRs and indicate the dependencies in the description. The more complex the
changes are in a single PR, the more time it will take to review those changes.
* Write PR and commit titles using imperative mood.
- Format commit messages sticking to rules described in
`this <https://chris.beams.io/posts/git-commit/>`_ guide.
* Make sure all `L0_*` tests pass:
  - In the :bash:`qa/` directory, there are basic sanity tests scripted in directories named
    `L0_...`. A given test can be executed by running the :bash:`./test.sh` command in each test
    directory. The :bash:`test.sh` script assumes that the TE source can be found under
    :bash:`/opt/transformerengine`. This location can be overridden by setting the :bash:`TE_PATH`
    environment variable to the directory containing the TE source.
  - One of the tests, `L0_license`, checks for valid NVIDIA copyright and license text. If you create
    a new file and do not want to assign copyright to NVIDIA, please add the file to the
    `exclude_copyright` list in :bash:`qa/L0_license/config.json`.
* Transformer Engine's default build assumes recent versions of TE's dependencies (CUDA toolkit
etc.). Contributions that add compatibility with older versions of those dependencies will be
considered, but NVIDIA cannot guarantee that all possible build configurations work, are not
broken by future contributions, and retain highest performance.
* Make sure that you can contribute your work to open source (no license and/or patent conflict is
introduced by your code). You need to `Sign Your Work`_.
* Thanks in advance for your patience as we review your contributions; we do appreciate them!
Sign Your Work
--------------
* We require that all contributors "sign-off" on their commits. This certifies that the contribution
is your original work, or you have rights to submit it under the same license, or a compatible
license.
* Any contribution which contains commits that are not Signed-Off will not be accepted.
* To sign off on a commit you simply use the `--signoff` (or `-s`) option when committing your changes:
  .. code-block:: bash

      $ git commit -s -m "Add cool feature."

  This will append the following to your commit message:

  .. code-block:: text

      Signed-off-by: Your Name <your@email.com>
* Full text of the DCO:
  .. code-block:: text

      Developer Certificate of Origin
      Version 1.1

      Copyright (C) 2004, 2006 The Linux Foundation and its contributors.

      Everyone is permitted to copy and distribute verbatim copies of this
      license document, but changing it is not allowed.

      Developer's Certificate of Origin 1.1

      By making a contribution to this project, I certify that:

      (a) The contribution was created in whole or in part by me and I
          have the right to submit it under the open source license
          indicated in the file; or

      (b) The contribution is based upon previous work that, to the best
          of my knowledge, is covered under an appropriate open source
          license and I have the right under that license to submit that
          work with modifications, whether created in whole or in part
          by me, under the same open source license (unless I am
          permitted to submit under a different license), as indicated
          in the file; or

      (c) The contribution was provided directly to me by some other
          person who certified (a), (b) or (c) and I have not modified
          it.

      (d) I understand and agree that this project and the contribution
          are public and that a record of the contribution (including all
          personal information I submit with it, including my sign-off) is
          maintained indefinitely and may be redistributed consistent with
          this project or the open source license(s) involved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
|License|
Transformer Engine
==================
.. overview-begin-marker-do-not-remove
Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including
using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower
memory utilization in both training and inference. TE provides a collection of highly optimized
building blocks for popular Transformer architectures and an automatic mixed precision-like API that
can be used seamlessly with your PyTorch code. TE also includes a framework agnostic C++ API that
can be integrated with other deep learning libraries to enable FP8 support for Transformers.
As the number of parameters in Transformer models continues to grow, training and inference for
architectures such as BERT, GPT and T5 becomes very memory and compute intensive. Most deep learning
frameworks train with FP32 by default. This is not essential, however, to achieve full accuracy for
many deep learning models. Using mixed-precision training, which combines single-precision (FP32)
with lower precision (e.g. FP16) format when training a model, results in significant speedups with
minimal differences in accuracy as compared to FP32 training. The Hopper GPU architecture
introduces FP8 precision, which offers improved performance over FP16 with no degradation in
accuracy. Although all major deep learning frameworks support FP16, native FP8 support is not
available in them today.
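As a rough illustration of why FP8 needs careful scaling: an 8-bit float has a far narrower dynamic range than FP16 or FP32, and the range depends on how the bits are split between exponent and mantissa. The sketch below (not TE code) computes the largest finite value of an FP8 format from its bit layout; E4M3 is the format referenced by `recipe.Format.E4M3`, and E5M2 is mentioned here only as the other commonly described FP8 variant.

```python
def fp8_max(exp_bits: int, man_bits: int, reserve_top_code: bool) -> float:
    """Largest finite value of an FP8 format with the given bit split.

    reserve_top_code=True models E4M3-style formats, where only the
    all-ones exponent+mantissa pattern is reserved for NaN (no infinity),
    so the top exponent field still encodes finite numbers.
    reserve_top_code=False models IEEE-style formats (like E5M2), where
    the entire top exponent field is reserved for inf/NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    if reserve_top_code:
        max_exp = (2 ** exp_bits - 1) - bias          # top exponent usable
        max_man = (2 ** man_bits - 2) / 2 ** man_bits  # all-ones mantissa is NaN
    else:
        max_exp = (2 ** exp_bits - 2) - bias          # top exponent is inf/NaN
        max_man = (2 ** man_bits - 1) / 2 ** man_bits
    return 2.0 ** max_exp * (1 + max_man)

# E4M3: 4 exponent bits, 3 mantissa bits -> 448.0
print(fp8_max(4, 3, reserve_top_code=True))
# E5M2: 5 exponent bits, 2 mantissa bits -> 57344.0
print(fp8_max(5, 2, reserve_top_code=False))
```

Compared with FP16's maximum of 65504, E4M3 tops out at 448, which is why tensors must be multiplied by per-tensor scaling factors before being cast to FP8.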
TE addresses the problem of FP8 support by providing APIs that integrate with popular Large Language
Model (LLM) libraries. It provides a Python layer (initially supporting pyTorch, with support for
more frameworks in the future) consisting of modules that make it easy to build a Transformer layer,
as well as a framework-agnostic C++ library including the structs and kernels needed for FP8
support. Modules provided by TE internally maintain the scaling factors and other values needed for
FP8 training, greatly simplifying mixed precision training for users.
Transformer Engine in action:
.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Set dimensions.
    in_features = 768
    out_features = 3072
    hidden_size = 2048

    # Initialize model and inputs.
    model = te.Linear(in_features, out_features, use_bias=True)
    inp = torch.randn(hidden_size, in_features, device="cuda")

    # Create FP8 recipe. Note: All input args are optional.
    fp8_recipe = recipe.DelayedScaling(margin=0, interval=1, fp8_format=recipe.Format.E4M3)

    # Enables autocasting for the forward pass
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(inp)

    loss = out.sum()
    loss.backward()
Highlights
----------
* Easy-to-use pyTorch modules for building Transformer layers with FP8 support on H100
  GPUs.
* Optimizations (e.g. fused kernels) for Transformer models across all precisions and NVIDIA GPU
  architectures.
.. overview-end-marker-do-not-remove
Installation
------------
In the NGC container
^^^^^^^^^^^^^^^^^^^^
Transformer Engine comes preinstalled in the pyTorch container on
`NVIDIA GPU Cloud <https://ngc.nvidia.com>`_ (versions 22.09 and later).
From source
^^^^^^^^^^^
Clone the repository and inside it type:
.. code-block:: bash

    pip install .
Transformer Architectures
-------------------------
While the more granular modules in Transformer Engine allow building any Transformer architecture,
the `TransformerLayer` API of Transformer Engine is flexible enough to build multiple major
variations of Transformers.
GPT
^^^
`GPT` architecture has `LayerNorm` at the input side (before `QKV Gemm`) and the residual connection
is taken from the input of that `LayerNorm`. In TE this can be achieved by setting the following
arguments in the `TransformerLayer` API.
.. code-block:: python

    transformer_engine.pytorch.TransformerLayer(
        ...,
        ...,
        apply_residual_connection_post_layernorm=False,
        output_layernorm=False,
        layer_type="encoder",
    )
BERT
^^^^
`BERT` architecture has `LayerNorm` at the output side (after the final `BiasDropoutAdd`) and the
residual connection is taken from the output of that `LayerNorm`. In TE this can be achieved by
setting the following arguments in the `TransformerLayer` API.
.. code-block:: python

    transformer_engine.pytorch.TransformerLayer(
        ...,
        ...,
        apply_residual_connection_post_layernorm=True,
        output_layernorm=True,
        layer_type="encoder",
    )
T5
^^
`T5` architecture has an additional `cross-attention` + `BiasDropoutAdd` + `LayerNorm` block before
the `MLP` layer. In TE this can be added by setting the `layer_type` to `decoder` in the
`TransformerLayer` API.
.. code-block:: python

    transformer_engine.pytorch.TransformerLayer(
        ...,
        ...,
        layer_type="decoder",
    )
Contributing to Transformer Engine
----------------------------------
We welcome contributions to Transformer Engine. To contribute to TE and make pull requests,
follow the guidelines outlined in the `<CONTRIBUTING.rst>`_ document.
Useful Links
------------
* `Attention original paper <https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf>`_
* `Megatron-LM tensor parallel <https://arxiv.org/pdf/1909.08053.pdf>`_
* `Megatron-LM sequence parallel <https://arxiv.org/pdf/2205.05198.pdf>`_
.. |License| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :target: https://opensource.org/licenses/Apache-2.0
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
{% extends "!layout.html" %}
{% block sidebartitle %} {{ super() }}
<style>
/* Sidebar header (and topbar for mobile) */
.wy-side-nav-search, .wy-nav-top {
background: #76b900;
}
.wy-menu > p > span.caption-text {
color: #76b900;
}
.wy-menu-vertical p {
height: 32px;
line-height: 32px;
padding: 0 1.618em;
margin: 12px 0 0;
display: block;
font-weight: 700;
text-transform: uppercase;
font-size: 85%;
white-space: nowrap;
}
.wy-side-nav-search a:link, .wy-nav-top a:link {
color: #fff;
}
.wy-side-nav-search a:visited, .wy-nav-top a:visited {
color: #fff;
}
.wy-side-nav-search a:hover, .wy-nav-top a:hover {
color: #fff;
}
.wy-menu-vertical a:link, .wy-menu-vertical a:visited {
color: #d9d9d9
}
.wy-menu-vertical a:active {
background-color: #76b900
}
.wy-side-nav-search>div.version {
color: rgba(0, 0, 0, 0.3)
}
.wy-nav-content {
max-width: 1000px;
}
/* override table width restrictions */
.wy-table-responsive table td, .wy-table-responsive table th {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
</style>
{% endblock %}
{% block footer %} {{ super() }}
<style>
a:link, a:visited {
color: #76b900;
}
a:hover {
color: #8c0;
}
html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple)>dt {
background: rgba(118, 185, 0, 0.1);
color: rgba(59,93,0,1);
border-top: solid 3px rgba(59,93,0,1);
}
html.writer-html4 .rst-content dl:not(.docutils) .property, html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .property {
text-transform: capitalize;
display: inline-block;
padding-right: 8px;
}
</style>
{%- if nvidia_analytics_id %}
<script type="text/javascript">_satellite.pageBottom();</script>
{%- endif %}
{% endblock %}
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
activation.h
============
.. doxygenfile:: activation.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
cast.h
======
.. doxygenfile:: cast.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
gemm.h
======
.. doxygenfile:: gemm.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
C/C++ API
=========
.. Caution:: This feature is not officially supported yet and may change without notice.
The C/C++ API allows you to access the custom kernels defined in the `libtransformer_engine.so`
library directly from C/C++, without Python.
.. toctree::
   :caption: Headers

   activation.h <activation>
   cast.h <cast>
   gemm.h <gemm>
   layer_norm.h <layer_norm>
   transformer_engine.h <transformer_engine>
   transpose.h <transpose>
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
layer_norm.h
============
.. doxygenfile:: layer_norm.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
transformer_engine.h
====================
.. doxygenfile:: transformer_engine.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
transpose.h
===========
.. doxygenfile:: transpose.h
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
Common API
==========
Classes
-------
.. autoclass:: transformer_engine.common.recipe.Format
.. autoclass:: transformer_engine.common.recipe.DelayedScaling(margin=0, interval=1, fp8_format=Format.E4M3, amax_history_len=1, amax_compute_algo="most_recent", scaling_factor_compute_algo=None, override_linear_precision=(False, False, False))
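The delayed scaling recipe derives FP8 scaling factors from a rolling history of observed
per-tensor absolute maxima (amax). A minimal pure-Python sketch of the idea follows; it is
illustrative only — the parameter names mirror the `DelayedScaling` signature above, the constant
448 is the largest finite E4M3 value, and the exact update rule inside TE may differ.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


class DelayedScalingSketch:
    """Toy model of delayed scaling: the scale for the next step is
    computed from amax values observed on previous steps."""

    def __init__(self, margin=0, amax_history_len=1, amax_compute_algo="most_recent"):
        self.margin = margin
        self.history_len = amax_history_len
        self.algo = amax_compute_algo
        self.history = []
        self.scale = 1.0

    def update(self, amax: float) -> float:
        """Record the amax seen this step and refresh the scaling factor."""
        # Keep only the most recent `history_len` amax values.
        self.history = (self.history + [amax])[-self.history_len:]
        # Choose the reference amax per the configured algorithm.
        chosen = self.history[-1] if self.algo == "most_recent" else max(self.history)
        # Scale so the chosen amax maps just inside the FP8 range,
        # backed off by a factor of 2**margin for safety.
        self.scale = FP8_E4M3_MAX / (chosen * 2.0 ** self.margin)
        return self.scale


sketch = DelayedScalingSketch(margin=0, amax_history_len=4, amax_compute_algo="max")
print(sketch.update(224.0))  # 2.0: values up to 224 are doubled into the FP8 range
```

The point of the *delay* is that the scale applied at step *t* comes from history up to step
*t-1*, so the cast can happen in a single pass without first scanning the current tensor.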
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
Framework-specific API
======================
.. toctree::

   pytorch
..
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
See LICENSE for license information.
pyTorch
=======
Modules
-------
.. autoclass:: transformer_engine.pytorch.Linear(in_features, out_features, bias=True, **kwargs)
   :members: forward

.. autoclass:: transformer_engine.pytorch.LayerNorm(hidden_size, eps=1e-5, **kwargs)

.. autoclass:: transformer_engine.pytorch.LayerNormLinear(in_features, out_features, eps=1e-5, bias=True, **kwargs)
   :members: forward

.. autoclass:: transformer_engine.pytorch.LayerNormMLP(hidden_size, ffn_hidden_size, eps=1e-5, bias=True, **kwargs)
   :members: forward

.. autoclass:: transformer_engine.pytorch.TransformerLayer(hidden_size, ffn_hidden_size, num_attention_heads, **kwargs)
   :members: forward
Functions
---------
.. autofunction:: transformer_engine.pytorch.fp8_autocast
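`fp8_autocast` follows the familiar autocast pattern: a context manager sets global state on
entry and restores it on exit, and modules consult that state inside `forward` to decide whether
to take the FP8 path. A generic, self-contained sketch of the pattern (not TE internals; all
names here are illustrative):

```python
from contextlib import contextmanager

# Hypothetical global state a framework's modules would consult.
_FP8_ENABLED = False
_FP8_RECIPE = None


def fp8_enabled() -> bool:
    """What a module would query inside forward() to pick FP8 vs. FP32 paths."""
    return _FP8_ENABLED


@contextmanager
def fp8_autocast_sketch(enabled=True, fp8_recipe=None):
    """Enable FP8 state for the duration of the block, restoring it afterwards."""
    global _FP8_ENABLED, _FP8_RECIPE
    prev = (_FP8_ENABLED, _FP8_RECIPE)
    _FP8_ENABLED, _FP8_RECIPE = enabled, fp8_recipe
    try:
        yield
    finally:
        # Restore on normal exit or exception, so nesting composes correctly.
        _FP8_ENABLED, _FP8_RECIPE = prev
```

Because the previous state is captured and restored in a `finally` block, these contexts nest
safely, which is why the README example can wrap only the forward pass in the autocast.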