"git@developer.sourcefind.cn:yangql/composable_kernel.git" did not exist on "0a2657312ec62a65e92a36cebd7d3b2a3c0712e1"
[Dev] Adjust computation logic to avoid precision loss when casting acc_s from float to float16 (#141)

- Remove redundant `acc_s_0` fragment in flash attention kernel
- Simplify memory copy and reduction operations
- Reorder memory copy and scaling steps for improved performance
- Add Hopper-specific synchronization method in CUDA reduce template
- Update reduce operation to use architecture-specific synchronization
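The kernel diff itself is not shown here, so the following is only a minimal sketch of the general effect the title describes, not the actual change. It assumes a score value is either cast to float16 before being scaled or scaled in higher precision and cast once at the end. The helper names `to_half` and `compare_cast_order`, and the sample values, are hypothetical and exist only for illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def compare_cast_order(x: float, scale: float) -> tuple[float, float]:
    """Return (error when casting early, error when casting late).

    Casting early rounds x to float16 and then rounds the scaled product again.
    Casting late scales in higher precision and rounds to float16 only once.
    """
    reference = x * scale                  # higher-precision reference value
    early = to_half(to_half(x) * scale)    # two rounding steps
    late = to_half(x * scale)              # one rounding step
    return abs(early - reference), abs(late - reference)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the kernel.
    for x, scale in [(1000.3, 0.1), (123.456, 0.125), (7.77, 0.3)]:
        err_early, err_late = compare_cast_order(x, scale)
        print(f"x={x}, scale={scale}: early={err_early:.6f}, late={err_late:.6f}")
```

Because the late cast rounds the reference value directly to the nearest float16, its error can never exceed that of the early cast in this sketch. Whether the kernel change follows exactly this ordering is an assumption.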