    [CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels (#1527) · a1c07e8d
    Po Yen Chen authored
    * Use same layout for o_acc and o tensor
    
    * Use better param names in partitioner
    
    * Remove redundant kargs 'max_seqlen_q'
    
    * Use better param names in splitkv kernel
    
    * Add comment for additional kernel arguments
    
    * Sync empty loop early return logic between pipelines
    
    * Pass more arguments to cmake in scripts
    
    * Align backslashes
    
    * Fix wrong o_acc tensor view strides
    
    * Change o_acc layout if o_perm=0
    
    * Handle whole row masked via attn_bias
    
    * Use vector width = 1 for o_acc
    
    * Use more even split sizes
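The split-KV and combine scheme these bullets refer to can be illustrated with a small sketch. It is not the CK_TILE implementation. It works on a single query row in plain Python, splits the key range at single-key granularity (the real kernels split in tiles), and the names `even_splits`, `split_attention` and `combine` are hypothetical. Each split produces a partial output normalized within the split plus its log-sum-exp (LSE), and the combine step rescales the partials by their LSEs. A split, or a whole row, whose scores are all `-inf` (for example, fully masked via attn_bias) contributes zero weight instead of producing NaN.

```python
import math


def even_splits(n, num_splits):
    """Hypothetical helper: split [0, n) into num_splits contiguous ranges
    whose sizes differ by at most 1 (assumes single-key granularity)."""
    base, rem = divmod(n, num_splits)
    ranges, start = [], 0
    for i in range(num_splits):
        size = base + (1 if i < rem else 0)  # first `rem` splits get one extra key
        ranges.append((start, start + size))
        start += size
    return ranges


def split_attention(scores, values, begin, end):
    """Attention of one query row over keys [begin, end).

    scores: list of float logits, masked entries are -inf
    values: list of value vectors (lists of float)
    Returns (o_acc, lse): output normalized within this split, and its LSE.
    """
    dim = len(values[0])
    s = scores[begin:end]
    m = max(s, default=-math.inf)
    if m == -math.inf:
        # Empty or fully masked split: zero output, LSE of -inf (zero weight later).
        return [0.0] * dim, -math.inf
    w = [math.exp(x - m) for x in s]  # numerically stable exponentials
    l = sum(w)
    o = [sum(w[j] * values[begin + j][d] for j in range(len(w))) / l for d in range(dim)]
    return o, m + math.log(l)


def combine(partials):
    """Merge (o_acc, lse) pairs from all splits into the final output row."""
    dim = len(partials[0][0])
    lse_max = max(lse for _, lse in partials)
    if lse_max == -math.inf:
        # Whole row masked: write zeros instead of NaN.
        return [0.0] * dim
    # exp(-inf - finite) == 0.0, so masked splits drop out automatically.
    scale = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(scale)
    return [
        sum(scale[i] * partials[i][0][d] for i in range(len(partials))) / total
        for d in range(dim)
    ]
```

With this scheme, 10 keys over 4 splits give sizes 3, 3, 2, 2, whereas a naive ceil-division split gives 3, 3, 3, 1. The combined result equals single-pass attention because the final LSE is `lse_max + log(total)`, so each partial is weighted by `exp(lse_i - lse_final)`.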
cmake-ck-release.sh 1.21 KB