    Optimization for gridwise group norm (#453) · 40942b90
    Shaojie WANG authored
    
    
    * use another instance to check the efficiency
    
    * optimize group layer norm
    
    * 1. coalesce load/store data for gridwise layer norm Welford. 2. move a sqrt and division into an outer static loop
    
    * add more instances to layernorm
    
    * add 2 more test cases
    
    * remove ignore in generating tuple of vector
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
test_groupnorm_fp16.cpp 2.27 KB