-
zlsh80826 authored
* Unfused scale+softmax if bias is present Signed-off-by:
Reese Wang <rewang@nvidia.com> * WAR a causal masking + no_bias bug and add the unittests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix the optional args (bias) sharding Signed-off-by:
Reese Wang <rewang@nvidia.com> * Disable fused attn in JAX by default, enable it with NVTE_USE_FUSED_ATTN Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add thread local for the plan cache Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename dbeta to dbias for the readability Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add scaled softmax with dropout test cases Signed-off-by:
Reese Wang <rewang@nvidia.com> * Updated NVTE_FUSED_ATTN variable name Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
69003969