copy over fmha example
remove bwd related commands from cmakelists
remove unused ops in the example; select only bf16/nodropout/nolse/batched
pass validation in the example driver
fork pipeline
add a hardcoded score_mod
fork the kernel
abstract score_mod from a pipeline
unhardcode score_mod and pass it as a cpp expression from codegen
modify host attention impl accounting for score_mod
use custom score for testing
reorder score mod and scale in host verification
use cmakelists as the single source of truth for score_mod function definition
fix numeric mismatches
run clang-format
remove bwd related scripts
edit test and benchmark scripts for the new example
remove readme
remove unused cases from smoke test
re-add group-mode kernels
Add pre_softmax fnctor (#1852)
* Add pre_softmax fnctor
* remove stray define:wq
* Move op out of pipeline, adds it to refnc
---------
Co-authored-by: root <root@splinter-126-wr-d1.aus.dcgpu>
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
added flex_attention in Jenkins file
fixing clang
fixing clang
space added
fixed copyright errors
fixed even more clangformat formatting
modified jenkins
fixed typo
added flex attention test for gfx90a and gfx942
fixed typo
fixed example name
fixed example script name
added perf logs for both gpu arch
pipeline fixes for accuracy issues; disable pre-softmax function until its accuracy is fixed
added stash and unstash for perf logs
fixed typo in perf name
print error message
print success message
hardcoded perf files names
flex attention jenkins switch off
flex attention jenkins switch off from settings
fixed typo
add context to score-mod signature