[Refactor] Merge bulk copy into copy and improve layout inference for bulk copy (#746)
* [Refactor] Merge bulk copy into copy and refactor layout inference for bulk copy * Deleted the `bulk_copy` operator implementation and its header file as it is no longer needed. * Introduced a new function `cuTensorMapType()` to return the data type for CUDA tensor mapping. * Updated related files to reflect these changes, ensuring that the codebase remains clean and maintainable. * lint fix * Fix typos in intrinsic names and remove unused print statement in block_sparse_attn_tilelang.py. Updated references from `ptx_ldmatirx` to `ptx_ldmatrix` across multiple files for consistency. * remove bulk copy * Refactor copy and atomic add operations to support TMA lower configuration - Updated `GetCopyInst` to accept a `disable_tma_lower` parameter, allowing for conditional usage of TMA in bulk load/store operations. - Modified `Lower` method in `Copy` to incorporate the new TMA configuration. - Refactored `AtomicAdd::...
Showing
This diff is collapsed.
Please register or sign in to comment