[PyTorch] Minor fixes for TE 2.2 (#1589)
* skip cuDNN 9.8 for KV caching Signed-off-by:Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert from max_seqlen_kv to max_sequence_length for InferenceParams Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * rename test_paged_attn to test_kv_cache Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove redundant None returns in bwd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add debug flags when no backend is found Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * skip kv_cache_accuracy tests for cuDNN 9.8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * truncate length of cu_seqlens for consistency with q/k/v shape Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add back padding_brcm for fused attn tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * re-enable kv_cache_accuracy test for 9.8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix cuDNN search dir Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixes based on review Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove extra empty line Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Showing
Please register or sign in to comment