test/verify/test_conv_group_add.cpp · 97a1ed2de7cd5103aed50b10c634adfe041e7681 · gaoqiong / MIGraphX

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

Compute mean and variance in the same reduction
Set block size to numbers divisible by 32 instead of powers of 2
Global size is also set exactly instead of being divisible by the block size
More exact matching of global/local can help get rid of branching/loops
Reduce vectors first before doing dpp_reduce
Explicitly vectorize array operators since the compiler doesn't always vectorize them
Still uses the old for loop when it's computing at compile time, since neither reinterpret_cast nor all of the vector types are supported there


test_conv_group_add.cpp 2.07 KB
