Multiple changes to global kernel function.
* StorePartials now works on an offset pointer.
* Read flags as a uint32_t value.
* Accumulate partials only if there is more than one cooperating workgroup.
* Wait for the reduction to end only when there is still work to do.
* Fix creation of the A/B grid descriptors in CheckArgument.
* LaunchKernel uses a preprocess lambda to set the flags to zero.
* Add a condition in IsSupportedArgument to check whether XDL is supported.
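The flag/partials protocol behind several of these changes can be sketched on the host in C++. This is a minimal illustration under stated assumptions, not the kernel code: threads stand in for cooperating workgroups, one std::atomic<uint32_t> stands in for a tile's flag, and the names `reduce_tile`, `k_batch` and `workspace` are hypothetical. It shows partials stored through a per-worker offset pointer, the single-workgroup shortcut that skips partials entirely, and a preprocess step that zeroes the flag before launch. It does not model the "wait only when there is still work to do" condition.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical host-side model of a cooperative tile reduction (not the CK API).
// k_batch workers each compute a partial for every element of one output tile.
std::vector<float> reduce_tile(uint32_t k_batch, std::size_t tile_size) {
    std::vector<float> out(tile_size, 0.0f);
    std::vector<float> workspace(static_cast<std::size_t>(k_batch) * tile_size, 0.0f);
    std::atomic<uint32_t> flag{123u}; // deliberately dirty: preprocessing must reset it

    // Analogue of a preprocess lambda run before launch: zero the flag.
    auto preprocess = [&flag] { flag.store(0u, std::memory_order_relaxed); };
    preprocess();

    auto worker = [&](uint32_t w) {
        // Hypothetical partial result of worker w for element i.
        auto partial = [w](std::size_t i) {
            return static_cast<float>(w + 1) * static_cast<float>(i + 1);
        };
        if (k_batch == 1) {
            // Only one cooperating worker: write the result directly, no partials or flag.
            for (std::size_t i = 0; i < tile_size; ++i) out[i] = partial(i);
            return;
        }
        // Store partials through a pointer offset to this worker's own slot.
        float* my_slot = workspace.data() + static_cast<std::size_t>(w) * tile_size;
        for (std::size_t i = 0; i < tile_size; ++i) my_slot[i] = partial(i);
        flag.fetch_add(1u, std::memory_order_release); // publish this worker's partials

        if (w != 0) return; // only worker 0 waits and performs the final reduction
        while (flag.load(std::memory_order_acquire) != k_batch) std::this_thread::yield();
        for (uint32_t src = 0; src < k_batch; ++src) {
            const float* slot = workspace.data() + static_cast<std::size_t>(src) * tile_size;
            for (std::size_t i = 0; i < tile_size; ++i) out[i] += slot[i];
        }
    };

    std::vector<std::thread> threads;
    for (uint32_t w = 0; w < k_batch; ++w) threads.emplace_back(worker, w);
    for (auto& t : threads) t.join();
    return out;
}
```

With `k_batch = 4` and `tile_size = 3`, each element sums four partials, giving `{10, 20, 30}`; with `k_batch = 1` the sole worker writes `{1, 2, 3}` without touching the workspace or the flag. Because every `fetch_add` uses release ordering and the reducer reads the flag with acquire ordering, the reducer is guaranteed to see all partials once the count reaches `k_batch`.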