Michael Goldfarb authored
* Use 64-bit offsets for cuDNN 9.5+.
* Align workspace tensors to 16B.
* Fix bug where std::accumulate overflowed on large tensor shapes.
* Only support 64-bit offsets on the arbitrary sequence length fp16 backend.

Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
7b18f235
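The accumulate and alignment items touch two common C++ pitfalls. The sketch below is illustrative only and is not the code changed in this commit. It assumes shapes are stored as `std::vector<int64_t>`, and `numel` and `align_up_16` are hypothetical helper names. `std::accumulate` keeps its running value in the type of its `init` argument. A literal `1` makes that type `int`, so a product such as 65536 × 65536 no longer fits and gets narrowed. Passing `int64_t{1}` keeps the whole product in 64 bits. The alignment helper rounds a byte count up to the next multiple of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: number of elements in a tensor of the given shape.
// The accumulator type is deduced from the init argument, so int64_t{1}
// keeps the running product in 64 bits. With a plain `1` the accumulator
// would be an int, and products above INT32_MAX would be narrowed back
// to int after each step.
int64_t numel(const std::vector<int64_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Hypothetical helper: round a byte count up to the next multiple of 16.
// Adding 15 and then clearing the low 4 bits rounds up without a division.
constexpr std::size_t align_up_16(std::size_t bytes) {
  return (bytes + 15) & ~std::size_t{15};
}

static_assert(align_up_16(17) == 32, "17 bytes round up to 32");
```

With these helpers, a 65536 × 65536 shape yields 2^32 elements instead of a wrapped 32-bit value.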