    backwards for softcapping (#1033) · 5f1ae4a3
    Phil Wang authored
    * check in two ways of approaching the backward pass for softcapping, both functional
    
    * prepare the softcap switch for the backward pass
    
    * temporary
    
    * clean up to the style Tri prefers
    
    * calculate dtanh while copying from the scores tensor into the dtanh tensor (see the sketch below)
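    
    For context: softcapping squashes raw attention scores through s' = softcap * tanh(s / softcap), so the backward pass must scale each incoming gradient by ds'/ds = 1 - tanh^2(s / softcap). Below is a minimal host-side sketch of fusing that factor into the copy, with plain arrays standing in for the kernel's CuTe tensors; the function name and layout are illustrative, not the commit's exact code.
    
    ```cpp
    #include <cmath>
    #include <cstdio>
    
    // Fuse the dtanh computation into the scores -> dtanh copy.
    // Assumes scores[i] already holds tanh(s_i / softcap) from the
    // forward recomputation, so the factor falls out as 1 - t^2.
    void copy_scores_calculate_dtanh(const float* scores, float* dtanh, int n) {
        for (int i = 0; i < n; ++i) {
            const float t = scores[i];   // t = tanh(s / softcap)
            dtanh[i] = 1.0f - t * t;     // d/ds [softcap * tanh(s / softcap)]
        }
    }
    
    int main() {
        const float softcap = 30.0f;     // arbitrary example cap
        const float raw[3] = {-50.0f, 0.0f, 50.0f};
        float scores[3], dtanh[3];
        for (int i = 0; i < 3; ++i) scores[i] = std::tanh(raw[i] / softcap);
        copy_scores_calculate_dtanh(scores, dtanh, 3);
        for (int i = 0; i < 3; ++i)
            std::printf("s=%6.1f  tanh=%8.5f  dtanh=%8.5f\n", raw[i], scores[i], dtanh[i]);
        return 0;
    }
    ```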
    
    * no ternary operators allowed for constexpr selection, so use a hack found online (illustrated below)
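    
    The underlying C++ constraint: the conditional operator requires its two branches to share a common type, so it cannot choose between objects of different types even when the condition is a compile-time constant. A widely circulated workaround is an `if constexpr` helper whose branches return different types. This is a hedged illustration of the general pattern; the commit's actual helper may differ.
    
    ```cpp
    #include <iostream>
    #include <string>
    
    // Each `if constexpr` branch is discarded at compile time, so the two
    // return statements may have unrelated types -- something the ternary
    // `B ? a : b` cannot express.
    template <bool B, typename T1, typename T2>
    constexpr auto conditional_return(T1 a, T2 b) {
        if constexpr (B) { return a; } else { return b; }
    }
    
    int main() {
        // auto bad = true ? 1 : std::string("two");  // ill-formed: no common type
        auto x = conditional_return<true>(1, std::string("two"));   // deduces int
        auto y = conditional_return<false>(1, std::string("two"));  // deduces std::string
        std::cout << x << " " << y << "\n";
    }
    ```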
    
    * fix maybe_dtanh, restore some files
    
    * restore another file
    
    * move calculate_dtanh to utils and colocate it with apply_softcap (sketched below)
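    
    Colocating the pair keeps the forward transform and its derivative in one place so they can't silently drift apart. A simplified sketch of the two helpers, with scalar floats in place of the kernel's CuTe tensor arguments (signatures are illustrative, not the repository's exact code):
    
    ```cpp
    #include <cmath>
    #include <cstdio>
    
    // Forward: soft-cap a raw score into the open interval (-softcap, softcap).
    inline float apply_softcap(float s, float softcap) {
        return softcap * std::tanh(s / softcap);
    }
    
    // Backward factor for the same transform, defined right next to the
    // forward. Takes the already-computed t = tanh(s / softcap) rather
    // than recomputing the tanh from the raw score.
    inline float calculate_dtanh(float t) {
        return 1.0f - t * t;
    }
    
    int main() {
        const float s = 50.0f, cap = 30.0f;
        std::printf("capped=%g  dtanh=%g\n",
                    apply_softcap(s, cap), calculate_dtanh(std::tanh(s / cap)));
    }
    ```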
    
    * cleanup
    
    * maybe last cleanup
    
    * save for another PR
    
    * remove a stray line
    
    * fix spacing
    
    * fix an issue, and make test_flash_attn.py ready to test the softcapping backward pass (sanity-check sketch below)
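    
    A framework-free sanity check in the spirit of that test extension: compare the analytic backward factor 1 - tanh^2(s / softcap) against a central finite difference of the forward transform. This is illustrative only; the real test_flash_attn.py exercises the CUDA kernels through PyTorch.
    
    ```cpp
    #include <cmath>
    #include <cstdio>
    #include <initializer_list>
    
    float softcap_fwd(float s, float c) { return c * std::tanh(s / c); }
    
    int main() {
        const float c = 30.0f;    // arbitrary example cap
        const float eps = 1e-3f;  // finite-difference step
        for (float s : {-80.0f, -2.5f, 0.0f, 1.0f, 80.0f}) {
            const float t = std::tanh(s / c);
            const float analytic = 1.0f - t * t;  // closed-form derivative
            const float numeric =
                (softcap_fwd(s + eps, c) - softcap_fwd(s - eps, c)) / (2.0f * eps);
            std::printf("s=%7.2f  analytic=%.6f  numeric=%.6f\n", s, analytic, numeric);
        }
        return 0;
    }
    ```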
Changed file: flash_bwd_kernel.h (47.6 KB)