[Bugfix] Fix mismatch of shared memory layout and mma atom on Hopper (#224)
* add test for issue 101
* use ss_smem_selector from cutlass
* fix mismatch between smem layout and mma
* only fix for sm90
* Add CUDA requirements to GEMM thread tests
* lint fix
---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>