Commits · f7608d89e6701449d7b8ff0952c5c7e02b695e98 · OpenDAS / TransformerEngine

09 May, 2023 1 commit

ONNX export refactoring (#197) · 83911ddb

Neta Zmora authored May 09, 2023



* ONNX export refactoring

* Remove infer_ort (to enable more testing)
* Add BF16 ORT tests for Q/DQ ops and GELU.
  * Use FP32 i/o instead of BF16 (because ORT doesn't support BF16 i/o) and add casts from FP32 to BF16 (this is only for subgraph inputs and outputs).
  * We'll need to add more BF16 testing.
* GEMM:
  * Add cast after DQ to achieve better performance (matmul at sub-fp32 precisions).
  * Fold bias into Gemm operation (=> smaller graphs)
  * Wrap GEMM-GELU with FP32 (TE implements GELU in FP32)
* Enable tests for cross attention (test_export_multihead_attention)
* Reduce test thresholds for test_export_layernorm_mlp, test_export_layernorm_linear, test_export_layernorm
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
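
The FP32 i/o arrangement for the BF16 tests listed above can be pictured with a small pure-Python sketch. It is not TE or ORT code: `to_bf16` and `bf16_subgraph` are hypothetical helpers that emulate a BF16 cast by rounding a float32 bit pattern to its upper 16 bits, so the caller only exchanges ordinary FP32 values while the op in the middle sees BF16-representable ones.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest BF16-representable value (returned as a float).

    BF16 keeps the top 16 bits of an IEEE-754 float32. NaN/inf handling is
    omitted to keep the sketch short.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_subgraph(op, fp32_inputs):
    """Run `op` between an FP32->BF16 cast on inputs and a BF16->FP32 cast on outputs."""
    bf16_inputs = [to_bf16(v) for v in fp32_inputs]       # Cast(FP32 -> BF16) at the input
    bf16_outputs = [to_bf16(op(v)) for v in bf16_inputs]  # op result stored as BF16
    return [float(v) for v in bf16_outputs]               # Cast(BF16 -> FP32) at the output


if __name__ == "__main__":
    print(bf16_subgraph(lambda v: v * 2.0, [0.1, 3.0]))
```

Only the subgraph boundary changes type in this picture, which matches the note that the casts apply to subgraph inputs and outputs only.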

* Loosen MHA export validation thresholds for FP16
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

---------
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
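
The "fold bias into Gemm" item in this commit can be illustrated with a minimal sketch, assuming the standard ONNX Gemm definition Y = alpha*A*B + beta*C. The helper names below are hypothetical and the matrices are plain nested lists; this is not the exporter's code, only a check that one Gemm node computes the same values as a MatMul node followed by an Add node.

```python
def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matmul_then_add(a, b, bias):
    """Two-node form: MatMul followed by a broadcast Add of the bias row."""
    return [[v + c for v, c in zip(row, bias)] for row in matmul(a, b)]


def gemm(a, b, c, alpha=1.0, beta=1.0):
    """Single-node form: Y = alpha*A*B + beta*C.

    transA/transB are omitted; C is a 1-D bias broadcast over rows.
    """
    return [[alpha * v + beta * ci for v, ci in zip(row, c)] for row in matmul(a, b)]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    bias = [0.5, -1.0]
    print(gemm(a, b, bias) == matmul_then_add(a, b, bias))
```

With the default alpha and beta of 1.0 the two forms agree element for element, so the graph loses one node per biased GEMM without changing results.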


03 May, 2023 1 commit

test_onnx_export - bugfix (#192) · 49a161e4

galagam authored May 03, 2023


Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


29 Apr, 2023 1 commit

Fixes to test_onnx_export when saving input and output tensors (#173) · 1bc86400

galagam authored Apr 30, 2023



* Fixes to test_onnx_export when saving input and output tensors

- Allow saving i/o tensors when onnxruntime inference is skipped
- Support saving multiple outputs
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>

* fix
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


28 Apr, 2023 1 commit

Fix LayerNorm ONNX export (#174) · 2a1069f4

Neta Zmora authored Apr 29, 2023



* Fix LN ONNX export

When exporting LayerNorm make sure that the weights and bias
inputs have the same type as the LN input.
Also:
 * Add a regression test.
 * Add environment variable to override directory of generated test artifacts
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
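
A minimal sketch of the environment-variable override mentioned above, assuming a temp-directory default. The variable name `EXAMPLE_TEST_ARTIFACTS_DIR` and the helper `artifacts_dir` are hypothetical; the commit message does not state the actual names.

```python
import os
import tempfile

# Hypothetical variable name; the commit does not state the actual one.
ARTIFACTS_DIR_ENV = "EXAMPLE_TEST_ARTIFACTS_DIR"


def artifacts_dir() -> str:
    """Return the directory for generated test artifacts, creating it if needed."""
    # Assumed default: a subdirectory of the system temp directory.
    default = os.path.join(tempfile.gettempdir(), "example_onnx_artifacts")
    path = os.environ.get(ARTIFACTS_DIR_ENV, default)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    print(artifacts_dir())
```

Reading the variable at call time rather than at import time lets a test runner set it after the module is loaded.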

* fix envvar
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix linting
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


18 Apr, 2023 1 commit

Tighten tolerances for graph capture test (#153) · b2b3fbe7

Tim Moon authored Apr 17, 2023


Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


17 Apr, 2023 1 commit

[PyTorch] Add tests for cuda graph capture (#144) · f126a04f

Kirthi Shankar Sivamani authored Apr 16, 2023



* Add tests for cuda graph capture
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* add sanity test and address reviews
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


13 Apr, 2023 1 commit

Fix model load exception when state resides on GPU (#140) · b921c0d1

Neta Zmora authored Apr 14, 2023



* Fix model load exception when state resides on GPU

- Whenever converting a torch.tensor to numpy, we need to first
migrate the tensor storage to the host CPU.

- Add a warning not to do constant-folding when exporting to ONNX.
This is due to a torch.onnx export bug.

- Refactor compare_outputs
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

* Onnx export: Improve remark text
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

---------
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


07 Apr, 2023 2 commits

make bias configurable (#130) · 82dde778

ngoyal2707 authored Apr 07, 2023



* made bias configurable
Signed-off-by: Naman Goyal <naman@fb.com>

* removed commented lines
Signed-off-by: Naman Goyal <naman@fb.com>

* Update transformer_engine/pytorch/jit.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>

* Update transformer_engine/pytorch/jit.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>

* fixed incorrect call to fused bias dropout add kernel
Signed-off-by: Naman Goyal <naman@fb.com>

* Update transformer_engine/pytorch/jit.py
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Separate FC1 and FC2 use_bias args; solves all ci errors
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* jit fusion improvement
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Naman Goyal <naman@fb.com>
Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Naman Goyal <naman@fb.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


Conditional dgrad computation for Linear API (#134) · a2e19b7a

Kirthi Shankar Sivamani authored Apr 06, 2023



* small cleanup before starting
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* conditional dgrad for Linear
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* add tests and small improvements to LNLinear and LNMLP
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


04 Apr, 2023 1 commit

Add FP8 support for Ada (#129) · 96ad903c

Kirthi Shankar Sivamani authored Apr 04, 2023



* Add FP8 support for Ada
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* better message
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* lint fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Address review comments
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* better message for no fp8
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* same thing for onnx test
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix CI and review
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


03 Apr, 2023 1 commit

Bugfix - compute scale_inv when loading checkpoint (#123) · 66c10f7a

galagam authored Apr 04, 2023



* Bugfix - compute scale_inv when loading checkpoint
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>

* Save inverse scale in extra state tensor + minor CR fixes
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>

* Fix lint
Co-authored-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


30 Mar, 2023 1 commit

Change FP8 recipe defaults (#112) · 80542a0a

Kirthi Shankar Sivamani authored Mar 29, 2023



* Change FP8 recipe defaults
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Increase default amax history length
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Always check history size
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* no amax history for onnx export
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* revert onnx export test changes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix indices in onnx test
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>


29 Mar, 2023 1 commit

Fix FlashAttention tests (#99) · bcbd4be0

tcherckez-nvidia authored Mar 29, 2023


Signed-off-by: Tal Cherckez <tcherckez@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


23 Mar, 2023 1 commit

Fix GELU ONNX export (#111) · 06486a00

Neta Zmora authored Mar 23, 2023



* Fix GELU ONNX export

* Wrap GELU export with cast to/from FP32 to achieve same compute precision as TE.
* Increase GELU export test thresholds.
* Change export to ONNX opset 17 for smaller representation of LN (single node instead of subgraph).
* Remove the need for LN work-around for ORT
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
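
The cast-to/from-FP32 wrapping described above can be sketched in pure Python by emulating the reduced-precision types with `struct`. This is an illustration under stated assumptions, not the code in te_onnx_extensions.py: the helper names are hypothetical, FP16 stands in for the low-precision type, and the erf-based GELU is an assumption because the commit does not say which GELU variant is exported.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def gelu(x: float) -> float:
    """Erf-based GELU (assumed variant)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_fp16_wrapped_in_fp32(x_fp16: float) -> float:
    """Cast(FP16 -> FP32), GELU in FP32, Cast(FP32 -> FP16)."""
    x = to_fp32(x_fp16)   # upcast is exact: every FP16 value is also an FP32 value
    y = to_fp32(gelu(x))  # emulate an FP32 result (Python itself computes in float64)
    return to_fp16(y)     # downcast once, at the very end


if __name__ == "__main__":
    print(gelu_fp16_wrapped_in_fp32(to_fp16(1.0)))
```

Rounding to the low-precision type happens only once, on the final result, which is the point of wrapping the op rather than running each intermediate step at reduced precision.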

* Add docstring to te_onnx_extensions.py::compute_in_fp32
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

* Tune threshold for GELU ONNX export

Ran 8K test instances to verify the threshold.
Allow up to 2 coefficients to exceed the threshold; two out-of-tolerance
coefficients are not treated as a failure.
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

---------
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


18 Mar, 2023 1 commit

Add an option to serialize test i/o to file (ONNX export tests) (#107) · e4a84a8d

Neta Zmora authored Mar 18, 2023



Add an option to serialize test i/o to file

Small refactoring of the inferencing code.
Change the default directory where generated ONNX files are stored.
Use the temp directory to avoid clogging the file system.
Add an option to serialize test input/output tensors to a
Polygraphy RunResults object.
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


17 Mar, 2023 1 commit

Improve PyTorch test harness (#102) · 2c996359

Kirthi Shankar Sivamani authored Mar 17, 2023



* add layernorm1p fp8 test
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* combine tests for easy maintenance
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* using torch.autocast for AMP and check grad types
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Add test for wgrad accumulation fusion
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* rename file
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Setup numerical tests + SAR
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Add test for full activation recompute
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Add tests for checkpoint load/store
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* TE vs framework numerical tests
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix ci
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* relax thresholds
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


16 Mar, 2023 1 commit

Add a temporary workaround to layernorm ONNX export (#95) · 44d64abc

Neta Zmora authored Mar 16, 2023



* Add a temporary workaround to layernorm export

ORT appears to perform template matching for LN and incorrectly concludes
that it has no kernel for FP32 LN. The workaround adds a fake_zero term,
which is meant to prevent the template matching while keeping the graph
virtually unchanged. This also requires `do_constant_folding=False` in
`torch.onnx.export`.
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

* Adjust test threshold
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

* Opened an ORT bug and added the link for tracking
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

* Fix Python linter errors
Signed-off-by: Neta Zmora <nzmora@nvidia.com>

* Simplify the LN workaround solution (ONNX export)

After discussing https://github.com/microsoft/onnxruntime/issues/15021 with
Microsoft engineers, replaced the LN workaround with a simpler
implementation.

In addition:
* To make test more robust add `allow_cnt_errors` to `validate_result`
* Add more documentation to clarify the purpose and methodology of the
ONNX export tests
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
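
The idea behind `allow_cnt_errors` can be sketched as a tolerance check that permits a bounded number of out-of-tolerance elements. The function below is a hypothetical stand-in; the real `validate_result` signature and comparison rule are not shown in this log, and the absolute-tolerance comparison is an assumption.

```python
def count_mismatches(actual, expected, atol):
    """Count element pairs whose absolute difference exceeds `atol`."""
    return sum(1 for a, e in zip(actual, expected) if abs(a - e) > atol)


def validate(actual, expected, atol=1e-3, allow_cnt_errors=0):
    """Pass if at most `allow_cnt_errors` elements differ by more than `atol`."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return count_mismatches(actual, expected, atol) <= allow_cnt_errors


if __name__ == "__main__":
    expected = [1.0, 2.0, 3.0, 4.0]
    actual = [1.0, 2.0005, 3.5, 4.0]
    print(validate(actual, expected))                      # strict check
    print(validate(actual, expected, allow_cnt_errors=1))  # one outlier tolerated
```

Bounding the count instead of loosening the tolerance keeps the check strict for the bulk of the tensor while tolerating rare outliers.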

* Fix unused import
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Fix unused import
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Fix unused import
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

---------
Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>


11 Mar, 2023 1 commit

deprecate qk layer scaling and fp32 softmax args (#90) · 81429b80

Kirthi Shankar Sivamani authored Mar 11, 2023



* deprecate qk layer scaling and fp32 softmax args
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* apply QK layer scaling for fp16 training
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* address review comments
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


07 Mar, 2023 1 commit

Fix flash attention (#84) · 37a12c4e

Kirthi Shankar Sivamani authored Mar 07, 2023



* ignore self attention mask for causal type
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* further relax checks to run FA, update docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix pytorch softmax path
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* minimum ampere requirement for fa
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


24 Feb, 2023 1 commit

Move TE/PyTorch UT to tests/pytorch/ (#78) · 97b344cd

Jeng Bai-Cheng authored Feb 24, 2023



* move TE/PyTorch UT to tests/pytorch

1. move tests/* files to tests/pytorch/
2. adjust UT paths in qa/L0_unittest/test.sh
Signed-off-by: Ryan Jeng <rjeng@nvidia.com>

* update build.yml
Signed-off-by: Ryan Jeng <rjeng@nvidia.com>

---------
Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
