"...composable_kernel_rocm.git" did not exist on "e90cf8f926069b36a0cd941bcde7965b318f9c41"
Multiple changes to global kernel function.
* StorePartials works on an offset pointer.
* Read flags as a uint32_t value.
* Accumulate partials only if there is more than one cooperating workgroup.
* Wait for the reduction to end only when there is still work to do.
* Fix creation of a/b grid desc in CheckArgument.
* LaunchKernel will use a preprocess lambda to set the flags value to zero.
* Add a condition in IsSupportedArgument to check if xdl is supported.
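Two of the changes above (zeroing the flags in a preprocess step, and accumulating partials only when several workgroups cooperate) can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `Workspace`, `LaunchWithPreprocess` and `ReduceTile` are hypothetical names invented for illustration, the code runs on the CPU, and it does not reproduce the actual kernel or the library's launch API.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical workspace layout; names are illustrative, not taken from the commit.
struct Workspace
{
    std::vector<std::uint32_t> flags; // one completion counter per output tile
    std::vector<float> partials;      // one slot per (tile, workgroup) pair
};

// Host-side stand-in for a launch helper that runs a hook before the kernel.
inline void LaunchWithPreprocess(const std::function<void()>& preprocess,
                                 const std::function<void()>& kernel)
{
    if(preprocess)
        preprocess(); // e.g. reset the flags to zero before every launch
    kernel();
}

// Assumed convention: workgroup 0 of a tile owns the final result and adds the
// partials stored by workgroups 1..wgs_per_tile-1. With a single workgroup
// there is nothing to accumulate, so the partials buffer is never read.
inline float ReduceTile(const Workspace& ws,
                        std::size_t tile,
                        std::size_t wgs_per_tile,
                        float own_result)
{
    if(wgs_per_tile <= 1)
        return own_result;

    float acc = own_result;
    for(std::size_t wg = 1; wg < wgs_per_tile; ++wg)
        acc += ws.partials[tile * wgs_per_tile + wg];
    return acc;
}
```

Resetting the flags in the pre-launch hook means repeated launches, for example during benchmarking, always start from a clean counter state instead of relying on the caller to clear the workspace.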