Neta Zmora authored
* Add dynamically shaped input mask in test_export_softmax
* Fix test_softmax_mask_fn: use env. var `NVTE_ONNX_KVCACHE_MAX_SEQ_LEN` to control whether the test uses the default mask generation function or dynamic TRILU mask slicing.
* Change core_attention ONNX export test: use "no_mask" as attn mask type when testing `te.attention.DotProductAttention` w/o masking.
* Use ORT CUDA backend by default.

Signed-off-by: Neta Zmora <nzmora@nvidia.com>
a0f44354
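
For readers unfamiliar with the two mask paths that `NVTE_ONNX_KVCACHE_MAX_SEQ_LEN` switches between, here is a minimal pure-Python sketch. It is not the code from this commit. The helper names (`build_causal_mask`, `slice_causal_mask`, `causal_mask_for`), the convention that `True` marks a masked position, and the reading of the variable as a maximum sequence length (unset or 0 selecting the default path) are all assumptions made for illustration.

```python
import os


def build_causal_mask(seq_len):
    """Hypothetical default path: build a square causal mask on the fly.

    True marks a position to mask out (strictly above the diagonal).
    """
    return [[col > row for col in range(seq_len)] for row in range(seq_len)]


def slice_causal_mask(full_mask, seq_q, seq_k):
    """Hypothetical slicing path: cut a (seq_q x seq_k) window out of a
    precomputed square mask instead of regenerating it.

    Keeps the rows of the last seq_q query positions and the first
    seq_k key columns.
    """
    max_len = len(full_mask)
    assert 0 <= seq_k - seq_q, "more queries than keys"
    assert seq_k <= max_len, "sequence exceeds the precomputed mask"
    return [row[:seq_k] for row in full_mask[seq_k - seq_q:seq_k]]


def causal_mask_for(seq_q, seq_k, env=os.environ):
    """Pick a path based on the environment variable (assumed semantics)."""
    max_len = int(env.get("NVTE_ONNX_KVCACHE_MAX_SEQ_LEN", "0"))
    if max_len > 0:
        # Slice a fixed-size mask so the output shape can vary at run time.
        return slice_causal_mask(build_causal_mask(max_len), seq_q, seq_k)
    # Default: generate a mask sized to the current query length.
    return build_causal_mask(seq_q)
```

Under these assumptions both paths give the same mask when `seq_q == seq_k`; the slicing path additionally covers the case of a single new query attending to a longer run of cached keys.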