* update out product mean test
* polish out product mean test and update linear test
* update test for layernorm, use both kernels
* support test without triton
* polish layernorm
* use current code