"...git@developer.sourcefind.cn:OpenDAS/mmdetection3d.git" did not exist on "e63e0473248c57a403b6f1e080f4f57b7f0ead15"
Layernorm welford (#346)
* Add threadwise and blockwise welford * Rename gridwise op, prepare to add welford version * implement welford and integrate welford into layernorm * Take care of tail loop * Fix buf when ThreadSliceK > 1 * Fix bug of merging of two empty set * Rename clip to clamp * 1. Fix type of count 2. Remove useless static_assert * Do not inherit Reduction::Argument * [What] replace __syncthreads() with block_sync_lds() [Why] __syncthreads might wait both lgkmcnt(0) and vmcnt(0) * Add y stride * Rename. DeviceLayernorm -> DeviceLayernormImpl DeviceNormalization2 -> DeviceLayernorm * Move literal ""_uz & ""_zu into namespace 'literals' * Move namespace 'literals' as 'ck::literals' Co-authored-by:Po-Yen, Chen <PoYen.Chen@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
Showing
This diff is collapsed.
Please register or sign in to comment