copy over fmha example
remove bwd related commands from cmakelists
remove unused ops in the example
select only bf16/nodropout/nolse/batched
pass validation in the example driver
fork pipeline
add a hardcoded score_mod
fork the kernel
abstract score_mod from a pipeline
unhardcode score_mod and pass it as a cpp expression from codegen
modify host attention impl accounting for score_mod
use custom score for testing
reorder score mod and scale in host verification
use cmakelists as the single source of truth for score_mod function definition
fix numeric mismatches
run clang-format
remove bwd related scripts
edit test and benchmark scripts for the new example
remove readme
remove unused cases from smoke test
re-add group-mode kernels
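The commits above thread a user-supplied score_mod through the kernel and the host reference, then reorder score_mod and scaling in host verification. Below is a minimal Python sketch of such a host reference. It makes three assumptions that the log does not confirm. First, score_mod uses a FlexAttention-style signature `score_mod(score, b, h, q_idx, kv_idx)`. Second, the softmax scale is applied before score_mod; the log does not say which order was finally chosen. Third, the names `reference_attention` and `identity_score_mod` are hypothetical. The sketch covers the forward pass only, with no dropout and no LSE output, and it computes in Python floats rather than bf16.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical FlexAttention-style signature: (score, batch, head, q_idx, kv_idx) -> score.
ScoreMod = Callable[[float, int, int, int, int], float]


def identity_score_mod(score: float, b: int, h: int, q_idx: int, kv_idx: int) -> float:
    """Default score_mod: leaves the scaled score unchanged."""
    return score


def reference_attention(q: Matrix, k: Matrix, v: Matrix, scale: float,
                        score_mod: ScoreMod = identity_score_mod,
                        batch: int = 0, head: int = 0) -> Matrix:
    """Naive single-head forward attention (no dropout, no LSE output)."""
    out: Matrix = []
    for i, q_row in enumerate(q):
        scores = []
        for j, k_row in enumerate(k):
            # Raw dot product, then scale, then score_mod.
            # ASSUMPTION: scale is applied before score_mod.
            s = sum(qa * ka for qa, ka in zip(q_row, k_row)) * scale
            scores.append(score_mod(s, batch, head, i, j))
        # Numerically stable softmax over the modified scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        denom = sum(exps)
        probs = [e / denom for e in exps]
        # Weighted sum of value rows.
        out.append([sum(p * v[j][d] for j, p in enumerate(probs))
                    for d in range(len(v[0]))])
    return out
```

Keeping score_mod a plain callable mirrors the move from a hardcoded expression to one supplied from outside the pipeline. For example, a lambda that returns `-math.inf` when `kv_idx > q_idx` turns the reference into a causal one. Such a lambda would not work on a row whose scores are all `-inf`, because the softmax is then undefined.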
Add pre_softmax functor (#1852)
* Add pre_softmax functor
* remove stray define
* Move op out of pipeline, add it to the reference
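The PR above replaces the expression-based hook with a pre_softmax functor, moves the op out of the pipeline, and applies it in the reference as well. The PR does not show the functor itself. The Python sketch below illustrates the general idea under these assumptions: a stateful callable object carries its own parameters and is applied element-wise to raw scores before a row softmax on the host. The class name `SoftCapPreSoftmax`, the `(score, q_idx, kv_idx)` call signature, the tanh soft-capping it performs, and the helper `reference_probs` are all hypothetical and are not taken from the PR.

```python
import math
from typing import Callable, List


class SoftCapPreSoftmax:
    """HYPOTHETICAL pre_softmax functor: tanh soft-capping with a stored cap."""

    def __init__(self, cap: float) -> None:
        self.cap = cap  # state carried by the functor, unlike a bare expression

    def __call__(self, score: float, q_idx: int, kv_idx: int) -> float:
        # Smoothly bounds every score to the interval [-cap, cap].
        return self.cap * math.tanh(score / self.cap)


def reference_probs(scores: List[List[float]],
                    pre_softmax: Callable[[float, int, int], float]) -> List[List[float]]:
    """Host-side reference: apply the functor element-wise, then a stable row softmax."""
    probs = []
    for i, row in enumerate(scores):
        modded = [pre_softmax(s, i, j) for j, s in enumerate(row)]
        m = max(modded)
        exps = [math.exp(s - m) for s in modded]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs
```

Because the functor is an ordinary object, the same instance can be handed to both the device-side op and the host reference. Sharing one instance keeps the two code paths from drifting apart.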
---------
Co-authored-by: root <root@splinter-126-wr-d1.aus.dcgpu>
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.norep...