[DCU] fix 48 FA failures, thread overflow and norm_mlp
Signed-off-by: Wuyufan <Wuyf1@sugon.com>

Fixed the following issues:
1. 48 failing FA unit tests (FA input: [B,S,H,D] reshaped to [blocknums,blocksize,H,D]).
2. A kernel launch error encountered while fixing the FA failures.
3. The norm_mlp issue, worked around for now with rest_rng_state.

See merge request dcutoolkit/deeplearing/TransformerEngine!77

Co-authored-by: Tangao <2205747538@qq.com>
Co-authored-by: wuyufffan <1095978552@qq.com>