    MNKO padding support on bmm+masking+scale+softmax+bmm+permute (#425) · ebab84b6
    Shaojie WANG authored
    
    
    * add lower triangle bmm
    
    * init code for tile skipping
    
    * functionality correct with lower triangle mask
    
    * add decoder lower triangular mask calculation
    
    * use 7*13 group
    
    * fix n2 compute error
    
    * attention with lower triangle mask with tile skipping
    
    * add template to distinguish masking kernel
    
    * rename template and remove default template value
    
    * remove lower triangle gemm reference struct
    
    * add some comments on example
    
    * add 10 instances for masking bmm + scale + softmax + bmm + permute kernels
    
    * add test
    
    * add test file
    
    * add gtest for bmm masking scale softmax bmm permute
    
    * clang-format
    
    * fix compile error
    
    * check left bottom corner for tile skipping
    
    * fix error: check left bottom corner for tile skipping
    
    * add k padding
    
    * add test and instance for MNK padding
    
    * pass a mask struct
    
    * fix instances
    
    * delete unused comments
    
    * format

    Co-authored-by: danyao12 <yaodan@dc-smc-13.amd.com>
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
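The lower-triangle mask and the "check left bottom corner for tile skipping" items can be illustrated with a small Python sketch. The sketch rests on two assumptions. First, it uses the causal convention that score element (row, col) is masked when col > row. Second, it assumes the kernel walks the M x N score matrix in m_per_block x n_per_block tiles. The names is_masked, tile_fully_masked and count_skipped_tiles are hypothetical illustrations, not functions from this repository.

The bottom-left corner of a tile has the tile's largest row and smallest column, so it is the element least likely to be masked. If that corner is masked, every element of the tile is masked, and the tile's GEMM work can be skipped.

```python
def is_masked(row: int, col: int) -> bool:
    """Assumed lower-triangle (causal) mask: (row, col) is masked when col > row."""
    return col > row


def tile_fully_masked(m0: int, n0: int, m_per_block: int, n_per_block: int) -> bool:
    """True if every element of the tile whose top-left corner is (m0, n0) is masked.

    Only the bottom-left corner (largest row, smallest column) is checked:
    it is the least-masked element of the tile, so if it is masked, all are.
    """
    bottom_row = m0 + m_per_block - 1
    left_col = n0
    return is_masked(bottom_row, left_col)


def count_skipped_tiles(M: int, N: int, m_per_block: int, n_per_block: int) -> int:
    """Count tiles of an M x N score matrix that are fully masked (skippable)."""
    return sum(
        tile_fully_masked(m0, n0, m_per_block, n_per_block)
        for m0 in range(0, M, m_per_block)
        for n0 in range(0, N, n_per_block)
    )


# A 256 x 256 score matrix in 64 x 64 tiles forms a 4 x 4 grid of tiles;
# the 6 tiles strictly above the diagonal are fully masked.
print(count_skipped_tiles(256, 256, 64, 64))  # 6
```

Checking a single corner keeps the per-tile test to one comparison. When M is padded, the corner row may lie past the real data. This only makes the check more conservative: fewer tiles are skipped. It never skips a tile that still contains unmasked elements.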
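The MNKO padding items can be sketched in Python as well. The sketch assumes the usual naming for this fused kernel: the first GEMM (Q x K^T) is M x N x K, and the second GEMM (softmax result x V) is M x O x N. It also assumes each dimension is padded up to a whole number of tiles. The tile-size parameters and all function names are hypothetical, and the kernel's own padding logic may differ.

The sketch shows two points:

- Zero padding along K leaves the dot products unchanged.
- Padded N columns must be excluded from the softmax. A zero-padded score still contributes exp(0 - max) to the denominator.

```python
import math


def round_up(x: int, multiple: int) -> int:
    """Round x up to the next multiple (padded extent of one dimension)."""
    return (x + multiple - 1) // multiple * multiple


def padded_mnko(M, N, K, O, m_per_block, n_per_block, k_per_block, o_per_block):
    """Pad M, N, K, O to whole tiles (hypothetical tile-size parameters)."""
    return (round_up(M, m_per_block), round_up(N, n_per_block),
            round_up(K, k_per_block), round_up(O, o_per_block))


def dot_k_padded(a, b, k_padded):
    """Dot product over K after zero-padding both operands to length k_padded.

    Padded positions contribute 0 * 0 = 0, so the result is unchanged.
    """
    a_p = list(a) + [0.0] * (k_padded - len(a))
    b_p = list(b) + [0.0] * (k_padded - len(b))
    return sum(x * y for x, y in zip(a_p, b_p))


def softmax_n_padded(scores, n_valid):
    """Row softmax where columns at index >= n_valid are padding.

    Padded columns are forced to probability 0; otherwise each would add
    exp(0 - row_max) to the denominator and distort every probability.
    """
    row_max = max(scores[:n_valid])
    exps = [math.exp(s - row_max) if i < n_valid else 0.0
            for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


print(padded_mnko(120, 120, 40, 40, 128, 128, 32, 64))  # (128, 128, 64, 64)
print(dot_k_padded([1, 2, 3], [4, 5, 6], 8))            # 32.0
print(softmax_n_padded([1.0, 1.0, 0.0, 0.0], 2))        # [0.5, 0.5, 0.0, 0.0]
```

Padding along M and O is handled differently from N. Padded rows of the output (M) and padded columns of the output (O) can simply be discarded when writing results. Padding along N is the case that needs explicit handling inside the softmax.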