fix divergence in barriers (#3621)
Without this fix, some work-items in a thread group (an OpenCL work-group) can hit a different number of barriers than the others, which leads to a hang in OpenCL GPU execution.
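To illustrate the invariant at stake, here is a minimal toy model in Python. It is not OpenCL and not the code this commit touches. Each simulated work-item just reports how many barriers it would execute. The function names, the `local_id % 2` condition, and the group size are all hypothetical, chosen only for illustration.

```python
# Toy model only: each "kernel" is a Python function that returns how many
# barriers a work-item with the given local id would execute.
# All names and conditions here are hypothetical.

def barriers_hit_divergent(local_id: int) -> int:
    """Barrier sits inside a branch whose condition differs per work-item."""
    count = 0
    if local_id % 2 == 0:      # non-uniform condition
        count += 1             # stands in for a barrier() call
    return count


def barriers_hit_uniform(local_id: int) -> int:
    """Same branch, but the barrier is hoisted out so every work-item hits it."""
    count = 0
    if local_id % 2 == 0:      # per-item work may still differ here
        pass
    count += 1                 # barrier executed unconditionally
    return count


def barrier_counts_diverge(kernel, group_size: int) -> bool:
    """True if work-items in one group disagree on the number of barriers hit."""
    counts = {kernel(local_id) for local_id in range(group_size)}
    return len(counts) > 1


if __name__ == "__main__":
    print("divergent kernel diverges:", barrier_counts_diverge(barriers_hit_divergent, 8))
    print("uniform kernel diverges:", barrier_counts_diverge(barriers_hit_uniform, 8))
```

On a real device, a group whose barrier counts diverge like this can leave some work-items waiting at a barrier the others never reach, which is the kind of hang described above.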