- 16 Jun, 2023 2 commits
Neta Zmora authored

* Fix softmax ONNX export.
* BF16 is validated using "fake i/o": i.e., instead of using BF16 as input/output, use FP32 input/output and convert to/from BF16 inside the forward method.
* Wrap the softmax symbolic functions with conversions to/from FP32 to reproduce the semantics of TE's softmax (compute is performed at FP32 precision regardless of the input/output data type).
* ONNX export code refactoring: share the compute_in_fp32 function between softmax.py (softmax symbolic functions) and te_onnx_extensions.py (the rest of the symbolic functions); a sketch follows below.
* Fix exports; lint.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
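A minimal sketch of what such a shared FP32-wrapping helper can look like in a torch.onnx symbolic function (the name compute_in_fp32 comes from the commit message; the exact signature and dtype handling in TE may differ):

```python
import torch
from torch.onnx import TensorProtoDataType

def compute_in_fp32(g, inp, fp32_op, *args, **kwargs):
    """Run an ONNX symbolic op in FP32 regardless of the input dtype.

    Casts the input up to FP32, applies the op, then casts the result
    back, mirroring TE kernels that always compute softmax in FP32.
    """
    inp_dtype = inp.type().scalarType()  # e.g. "Half" or "BFloat16"
    x = g.op("Cast", inp, to_i=TensorProtoDataType.FLOAT)
    out = fp32_op(g, x, *args, **kwargs)
    if inp_dtype == "Half":
        out = g.op("Cast", out, to_i=TensorProtoDataType.FLOAT16)
    elif inp_dtype == "BFloat16":
        out = g.op("Cast", out, to_i=TensorProtoDataType.BFLOAT16)
    return out
```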
Kirthi Shankar Sivamani authored

* ReLU ONNX export.
* Add GLU variants.
* Export reglu, geglu, swiglu (see the sketch below).
* Linter check; address review comments.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
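For reference, the GLU variants follow the usual gated-linear-unit convention: split the projection in half and gate one half with the other. A sketch, assuming the split is along the last dimension (TE's exact chunk order and GELU approximation may differ):

```python
import torch
import torch.nn.functional as F

def _glu_variant(x: torch.Tensor, act) -> torch.Tensor:
    # Split the projection into the activated half and the gate.
    a, b = x.chunk(2, dim=-1)
    return act(a) * b

def reglu(x):  return _glu_variant(x, F.relu)
def geglu(x):  return _glu_variant(x, lambda t: F.gelu(t, approximate="tanh"))
def swiglu(x): return _glu_variant(x, F.silu)
```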
- 13 Jun, 2023 1 commit
Przemyslaw Tredak authored

* Added ReLU and GLU variants to common.
* PyTorch changes; PyTorch C++ lint.
* Bug fixes, including a fix for storage errors.
* Compute bgrad.
* Fix numerical tests and ONNX export tests.
* Address review comments.

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 06 Jun, 2023 1 commit
galagam authored

* Add BF16 subgraph tests:
  1. Add normal-mode BF16 tests for all subgraphs.
  2. Add fake-BF16 tests for low-level subgraphs.
  3. Separate I/O serialization from validation.
* ONNX export test, BF16 support part 1: TE inference returns torch.Tensor, to support BF16 outputs, which numpy does not currently support (see the conversion sketch below).
* ONNX export test, BF16 support part 2: separate TE inference from serialization; fix the serialize function to use the full path; use unique filenames for fake BF16 (to avoid overwriting the standard BF16 files); stop overwriting the fake_bf16_io value.
* Export test: slightly increase the tolerance in test_export_gpt_generation; it caused sporadic failures in roughly 1% of all runs.
* Remove the GEMM fake-BF16 export test and the patch that enabled it.
* Address review comments; fix.

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
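Since numpy has no native bfloat16 dtype, validation code along these lines (a sketch, not the test suite's exact helper) upcasts BF16 results before comparing against ONNX Runtime outputs:

```python
import numpy as np
import torch

def to_numpy(t: torch.Tensor) -> np.ndarray:
    # numpy has no bfloat16, so upcast BF16 results to FP32 before
    # moving them to the host for comparison against ORT outputs.
    if t.dtype == torch.bfloat16:
        t = t.to(torch.float32)
    return t.detach().cpu().numpy()
```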
- 31 May, 2023 1 commit
Tim Moon authored

* Refactor Setuptools build system. Successfully launches the CMake install, but installs CMake extensions into a temp dir.
* Debug JAX build: fix the pybind11 import; distinguish between build-time and run-time dependencies.
* Add helper function to determine dependencies.
* Add missing licenses.
* Debug the case where the system CMake is too old.
* Simplify sanity import tests: just importing the modules provides richer error messages.
* Properly install submodules; install the helper library for TensorFlow.
* Update documentation.
* Do not install Ninja by default.
* Include the Git commit hash in the version string (see the sketch below).
* Override build_ext.build_extensions instead of build_ext.run.
* Fix incorrect include path; restore the Ninja dependency and the build_ext.run override.
* Apply review suggestions from @nouiz.
* Disable parallel Ninja jobs in GitHub Actions.
* Properly install the userbuffers lib.
* Tweak install docs (review suggestion from @ksivaman); add examples for specifying the framework in the docs.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
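Embedding the commit hash in the version string is typically done by shelling out to git at build time. A minimal sketch (the base version and the "+sha" local-label format are assumptions, not TE's exact scheme):

```python
import subprocess

def te_version() -> str:
    """Return the package version, appending the current Git commit
    hash as a local version label when available."""
    base = "0.9.0"  # hypothetical base version
    try:
        sha = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        return f"{base}+{sha}"
    except (subprocess.CalledProcessError, FileNotFoundError):
        return base  # not a Git checkout (e.g. building from an sdist)
```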
- 13 May, 2023 1 commit
Neta Zmora authored

* Dynamically generated causal attention mask (for ONNX export). TE's default causal mask is square (seq_len, seq_len) and is dynamically allocated for different sequence sizes; dynamic allocation and dictionary lookups are not supported by ONNX, and the GPT generative phase uses rectangular masks. This commit forces softmax to use `forward_torch_softmax` and to dynamically generate an attention mask when exporting to ONNX. The mask is generated without conditional control flow, by generating a (k_seq_len, k_seq_len) mask and slicing it to (q_seq_len, k_seq_len); see the sketch below. An alternative implementation is to preallocate a mask of shape (max_seq, max_seq) and slice that. It is more performant at the expense of space, but TE has no concept of max_seq.
* Introduce the NVTE_ONNX_KVCACHE_MAX_SEQ_LEN environment variable for efficient text generation in inference. In ONNX inference with KV-cache optimizations for GPT text generation, the attention mask shape can be square (context phase) or rectangular (generation phase). When this variable is set at export time, TE preallocates an upper-triangular (k=1) matrix of the prescribed size and dynamically slices the mask to the required shape. TE models can still be exported to ONNX when NVTE_ONNX_KVCACHE_MAX_SEQ_LEN is not configured, but the attention masking is then always square and not fit for efficient text generation.
* Work around a torch.onnx.export bug that incorrectly folds layer_norm(data, scale=add(gamma, 1)) to layer_norm(data, scale=gamma) when using LN with zero-centered gamma.
* Add a test for te.softmax.FusedScaleMaskSoftmax to test_export_softmax.
* Add test_softmax_mask_fn to verify that TE's default attention mask and the new ONNX-compatible mask produce the same behavior.
* Add test_export_gpt_generation to verify that the ONNX model correctly handles inputs of different shapes and that the attention mask is adjusted on the fly to different sequence lengths.
* Misc. test changes: add a fixture (seed_default_rng) to seed the PRNG for more test stability; add a fixture (set_max_seq_len) to set the maximum sequence length when exporting to ONNX for GPT text generation; add dynamic shapes for ONNX input/output tests; allow validate_result to compare ORT output against precomputed TE outputs.
* Remove mutable default values from a couple of function signatures.
* Add @skip_FP8 to test_export_gpt_generation.
* Update transformer_engine/pytorch/softmax.py; fix a CI error for the softmax export; fix linting errors.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
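A sketch of the control-flow-free rectangular mask described above (names are illustrative, not TE's exact code):

```python
import torch

def onnx_causal_mask(q_seq_len: int, k_seq_len: int,
                     device: torch.device) -> torch.Tensor:
    # Build a square (k, k) upper-triangular mask, then keep only the
    # last q rows: this yields the rectangular (q, k) mask needed in
    # the GPT generation phase without any data-dependent branching,
    # so it traces cleanly through torch.onnx.export.
    mask = torch.ones(k_seq_len, k_seq_len, device=device, dtype=torch.bool)
    mask = torch.triu(mask, diagonal=1)  # True marks masked-out positions
    return mask[k_seq_len - q_seq_len:, :]
```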
- 12 May, 2023 1 commit
Kirthi Shankar Sivamani authored

* LayerNormMLP numeric test.
* DotProductAttention numeric test.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 09 May, 2023 3 commits
galagam authored

* ONNX export: input names fix.
  * Fix discrepancies due to input names not being defined correctly or not being passed to the export.
  * Refactor ORT input-feed creation for simplicity (see the sketch below).
  * Control whether to save test I/O files via an environment variable.
* ONNX export test: minor refactor.

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: galagam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
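Building the ONNX Runtime feed from the session's declared input names, rather than by position, looks roughly like this (a sketch; the test suite's helper may differ):

```python
import numpy as np
import onnxruntime as ort

def make_input_feed(sess: ort.InferenceSession, arrays) -> dict:
    # Key each array by the input name recorded in the ONNX graph, so
    # the feed no longer depends on argument order.
    return {inp.name: arr for inp, arr in zip(sess.get_inputs(), arrays)}

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
feed = make_input_feed(sess, [np.random.rand(4, 16).astype(np.float32)])
outputs = sess.run(None, feed)
```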
Kirthi Shankar Sivamani authored

* Initial refactor; move attention out of transformer.py.
* Fix ONNX export.
* Linting.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Neta Zmora authored

* ONNX export refactoring:
  * Remove infer_ort (to enable more testing).
  * Add BF16 ORT tests for Q/DQ ops and GELU. Use FP32 i/o instead of BF16 (because ORT doesn't support BF16 i/o) and add casts from FP32 to BF16, only for subgraph inputs and outputs. More BF16 testing still needs to be added.
  * GEMM: add a cast after DQ to achieve better performance (matmul at sub-FP32 precisions); fold the bias into the Gemm operation, giving smaller graphs (see the sketch below); wrap GEMM-GELU with FP32 (TE implements GELU in FP32).
  * Enable tests for cross attention (test_export_multihead_attention).
  * Reduce test thresholds for test_export_layernorm_mlp, test_export_layernorm_linear, and test_export_layernorm.
* Loosen MHA export validation thresholds for FP16.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
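Folding the bias into a single ONNX Gemm node inside a symbolic function might look like this (a sketch; the attribute values are illustrative and TE's actual symbolic is more involved):

```python
# Inside a torch.onnx symbolic function: a, b and bias are graph values.
# ONNX Gemm computes alpha * (A @ B) + beta * C, so the bias rides along
# as the C input instead of requiring a separate Add node.
def gemm_with_folded_bias(g, a, b, bias):
    return g.op("Gemm", a, b, bias, alpha_f=1.0, beta_f=1.0, transB_i=1)
```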
- 03 May, 2023 1 commit
galagam authored

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 29 Apr, 2023 1 commit
galagam authored

* Fixes to test_onnx_export when saving input and output tensors: allow saving I/O tensors when onnxruntime inference is skipped; support saving multiple outputs.
* Fix.

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 28 Apr, 2023 1 commit
Neta Zmora authored

* Fix LN ONNX export: when exporting LayerNorm, make sure that the weight and bias inputs have the same type as the LN input.
* Add a regression test.
* Add an environment variable to override the directory of generated test artifacts.
* Fix the envvar; fix linting.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 18 Apr, 2023 1 commit
Tim Moon authored

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 17 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored

* Add tests for CUDA graph capture (see the sketch below).
* Add sanity test and address reviews.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
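The general shape of such a capture test with PyTorch's CUDA graph API (a sketch; the actual tests cover TE modules and FP8-specific setup):

```python
import torch

def capture_and_replay(module: torch.nn.Module, inp: torch.Tensor):
    # Warm up on a side stream (required before capture), then record
    # one forward pass into a CUDA graph and replay it.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            module(inp)
    torch.cuda.current_stream().wait_stream(s)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = module(inp)
    graph.replay()  # reruns the captured kernels on the same buffers
    return static_out
```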
- 13 Apr, 2023 1 commit
Neta Zmora authored

* Fix a model-load exception when the state resides on the GPU. Whenever converting a torch.Tensor to numpy, we need to first migrate the tensor storage to the host CPU (see the sketch below).
* Add a warning not to do constant folding when exporting to ONNX; this is due to a torch.onnx export bug.
* Refactor compare_outputs.
* ONNX export: improve remark text.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
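The failure mode is that .numpy() raises on CUDA storage. A hypothetical sketch of the pattern in a checkpoint-loading path (the helper name and the serialized-blob layout are assumptions, not TE's exact code):

```python
import io
import torch

def decode_extra_state(extra_state: torch.Tensor):
    # After load_state_dict the serialized blob may sit on the GPU;
    # .numpy() on CUDA storage raises, so move it to the host first.
    raw = extra_state.detach().cpu().numpy().tobytes()
    return torch.load(io.BytesIO(raw))  # assumes torch.save produced it
```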
- 07 Apr, 2023 2 commits
ngoyal2707 authored

* Made bias configurable; removed commented lines.
* Updates to transformer_engine/pytorch/jit.py (review iterations).
* Fixed an incorrect call to the fused bias-dropout-add kernel (see the sketch below).
* Separate use_bias args for FC1 and FC2; solves all CI errors.
* JIT fusion improvement.
* Docs.

Signed-off-by: Naman Goyal <naman@fb.com>
Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Naman Goyal <naman@fb.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
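For context, the fused bias-dropout-add pattern that jit.py scripts is commonly written like this (a Megatron-style sketch, assuming TE follows the same shape; not TE's verbatim code):

```python
import torch

def bias_dropout_add(x: torch.Tensor, bias: torch.Tensor,
                     residual: torch.Tensor, prob: float,
                     training: bool) -> torch.Tensor:
    # Bias add, dropout, and residual add fused into one scripted
    # function so the JIT can emit a single fused kernel.
    out = torch.nn.functional.dropout(x + bias, p=prob, training=training)
    return residual + out

bias_dropout_add_fused = torch.jit.script(bias_dropout_add)
```

Making the bias configurable means the fused path is only taken when a bias tensor actually exists; otherwise a bias-free variant is used.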
Kirthi Shankar Sivamani authored

* Small cleanup before starting.
* Conditional dgrad for Linear.
* Add tests and small improvements to LNLinear and LNMLP.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 04 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored

* Add FP8 support for Ada (see the capability-check sketch below).
* Fixes; lint fixes.
* Better message when FP8 is unavailable, both in the main checks and in the ONNX test.
* Address review comments; fix CI.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
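A sketch of what gating FP8 on device capability looks like (Ada is sm_89, Hopper is sm_90; the function name and message text here are illustrative, not TE's exact API):

```python
import torch

def fp8_available():
    # FP8 execution needs Ada (sm_89) or Hopper and newer (sm_90+).
    major, minor = torch.cuda.get_device_capability()
    sm = 10 * major + minor
    if sm == 89 or sm >= 90:
        return True, ""
    return False, ("Device compute capability 8.9 or higher "
                   "required for FP8 execution.")
```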
- 03 Apr, 2023 1 commit
galagam authored

* Bug fix: compute scale_inv when loading a checkpoint (see the sketch below).
* Save the inverse scale in the extra-state tensor, plus minor CR fixes.
* Fix lint.

Signed-off-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Gal Hubara Agam <ghubaraagam@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
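The gist of the fix, in a heavily simplified sketch (the field names are assumptions; TE keeps these in the module's FP8 metadata):

```python
def restore_scale_inv(fp8_meta: dict) -> None:
    # Older checkpoints carry only the FP8 scaling factors; rebuild the
    # cached inverse on load. Newer checkpoints serialize scale_inv in
    # the extra state, so the recompute becomes a fallback.
    if "scale_inv" not in fp8_meta:
        fp8_meta["scale_inv"] = 1.0 / fp8_meta["scale"]
```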
- 30 Mar, 2023 1 commit
Kirthi Shankar Sivamani authored

* Change the FP8 recipe defaults; increase the default amax history length (see the example below).
* Always check the history size.
* Use no amax history for ONNX export; revert the ONNX export test changes; fix indices in the ONNX test.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
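For reference, the amax history length is configured through TE's delayed-scaling recipe. An example along these lines (the specific values here are illustrative; the actual defaults live in transformer_engine.common.recipe):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed scaling with an explicit amax history; "max" takes the largest
# amax in the history window when computing the scaling factor.
fp8_recipe = recipe.DelayedScaling(
    margin=0,
    fp8_format=recipe.Format.HYBRID,  # E4M3 forward, E5M2 backward
    amax_history_len=1024,
    amax_compute_algo="max",
)

model = te.Linear(1024, 1024).cuda()
inp = torch.randn(16, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
```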
- 29 Mar, 2023 1 commit
tcherckez-nvidia authored

Signed-off-by: Tal Cherckez <tcherckez@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 23 Mar, 2023 1 commit
Neta Zmora authored

* Fix GELU ONNX export:
  * Wrap the GELU export with casts to/from FP32 to achieve the same compute precision as TE.
  * Increase the GELU export test thresholds.
  * Change the export to ONNX opset 17, which gives a smaller representation of LN (a single node instead of a subgraph) and removes the need for the LN workaround for ORT.
* Add a docstring to te_onnx_extensions.py::compute_in_fp32.
* Tune the threshold for the GELU ONNX export: ran 8K test instances to verify the threshold, and allow up to 2 coefficients to exceed it, since two wrong coefficients do not constitute a failure.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 18 Mar, 2023 1 commit
Neta Zmora authored

Add an option to serialize test i/o to file:
* Small refactoring of the inferencing code.
* Change the default directory where generated ONNX files are stored; use the temp directory to avoid clogging the file system.
* Add an option to serialize test input/output tensors to a Polygraphy RunResults object.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 17 Mar, 2023 1 commit
Kirthi Shankar Sivamani authored

* Add layernorm1p FP8 test; combine tests for easier maintenance.
* Use torch.autocast for AMP and check gradient dtypes (see the sketch below).
* Add a test for wgrad accumulation fusion; rename the file.
* Set up numerical tests + SAR.
* Add a test for full activation recompute.
* Add tests for checkpoint load/store.
* TE vs. framework numerical tests.
* Fix CI; relax thresholds.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
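The AMP gradient-dtype check boils down to something like this (a sketch with an assumed layer size; the real tests sweep many modules and configurations):

```python
import torch
import transformer_engine.pytorch as te

model = te.Linear(512, 512).cuda()
inp = torch.randn(8, 512, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(inp)
out.float().sum().backward()

# The forward ran in FP16, but master weights are FP32, so their
# gradients must come back as FP32 as well.
assert out.dtype == torch.float16
assert all(p.grad.dtype == torch.float32 for p in model.parameters())
```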
- 16 Mar, 2023 1 commit
Neta Zmora authored

* Add a temporary workaround to the layernorm export. ORT seems to perform template matching for LN and incorrectly concludes that it doesn't have a kernel for FP32 LN. The workaround adds a fake_zero term, which defeats the template matching while keeping the graph virtually unchanged (see the sketch below). This also requires `do_constant_folding=False` in `torch.onnx.export`.
* Adjust the test threshold.
* Opened an ORT bug and added the link for tracking.
* Simplify the LN workaround (ONNX export): after discussing https://github.com/microsoft/onnxruntime/issues/15021 with Microsoft engineers, replaced the LN workaround with a simpler implementation. In addition, add `allow_cnt_errors` to `validate_result` to make the test more robust, and add more documentation to clarify the purpose and methodology of the ONNX export tests.
* Fix Python linter errors; fix unused imports.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
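The fake_zero trick, as a fragment of a hypothetical LayerNorm symbolic function (illustrative only; the simplified follow-up replaced the original version):

```python
import torch

def layernorm_symbolic_fragment(g, inp):
    # Adding a constant zero keeps the math identical but changes the
    # graph shape enough that ORT's LN pattern-matcher leaves the FP32
    # subgraph alone. Requires do_constant_folding=False at export,
    # or the exporter folds the Add away again.
    fake_zero = g.op("Constant",
                     value_t=torch.tensor(0.0, dtype=torch.float32))
    return g.op("Add", inp, fake_zero)
```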
- 11 Mar, 2023 1 commit
Kirthi Shankar Sivamani authored

* Deprecate the QK layer scaling and FP32 softmax args.
* Apply QK layer scaling for FP16 training (see the sketch below).
* Address review comments.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
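QK layer scaling is the Megatron-style trick for FP16: shrink the raw attention scores by the layer number to keep them within FP16 range, then undo the factor inside an FP32 softmax so the result is mathematically unchanged. A sketch (an assumption about the exact form; TE fuses this into its softmax kernels):

```python
import torch

def qk_layer_scaled_softmax(q: torch.Tensor, k: torch.Tensor,
                            layer_number: int) -> torch.Tensor:
    norm = q.shape[-1] ** 0.5
    # Divide the scores by the (1-indexed) layer number so Q.K^T stays
    # inside FP16 range on deep models...
    scores = torch.matmul(q, k.transpose(-2, -1)) / (norm * layer_number)
    # ...then multiply it back inside an FP32 softmax, which has the
    # headroom FP16 lacks.
    probs = torch.softmax(scores.float() * layer_number, dim=-1)
    return probs.to(q.dtype)
```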
- 07 Mar, 2023 1 commit
Kirthi Shankar Sivamani authored

* Ignore the self-attention mask for the causal mask type.
* Further relax the checks for running FlashAttention; update docs.
* Fix the PyTorch softmax path; misc fixes.
* Require a minimum of Ampere for FlashAttention.

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 24 Feb, 2023 1 commit
Jeng Bai-Cheng authored

* Move TE/PyTorch unit tests to tests/pytorch:
  1. Move tests/* files to tests/pytorch/.
  2. Adjust unit-test paths in qa/L0_unittest/test.sh.
* Update build.yml.

Signed-off-by: Ryan Jeng <rjeng@nvidia.com>