Unverified Commit 4007289a authored by Max Podkorytov's avatar Max Podkorytov
Browse files

copy over fmha example



remove bwd related commands from cmakelists

remove unused ops in the example;

select only bf16/nodropout/nolse/batched

pass validation in the example driver

fork pipeline

add a hardcoded score_mod

fork the kernel

abstract score_mod from a pipeline

unhardcode score_mod and pass it as a cpp expression from codegen

modify host attention impl accounting for score_mod

use custom score for testing

reorder score mod and scale in host verification

use cmakelists as the single source of truth for score_mod function definition

fix numeric mismatches

run clang-format

remove bwd related scripts

edit test and benchmark scripts for the new example

remove readme

remove unused cases from smoke test

re-add group-mode kernels

Add pre_softmax fnctor (#1852)

* Add pre_softmax fnctor

* remove stray define:wq

* Move op out of pipeline, adds it to refnc

---------
Co-authored-by: default avatarroot <root@splinter-126-wr-d1.aus.dcgpu>
Co-authored-by: default avatarMax Podkorytov <4273004+tenpercent@users.noreply.github.com>

added flex_attention in Jenkins file

fixing clang

fixing clang

space added

fixed copyright  errors

fixed even more clangformat

formatting

modified jenkins

fixed typo

added flex attention test for gfx90a and gfx942

fixed typo

fixed example name

fixed example script name

added perf logs for both gpu arch

pipeline fixes for accuracy issues; disable pre-softmax function until its accuracy is fixed

added stash and unstash for perf logs

fixed typo in perf name

print error message

print success  message

hardcoded perf files names

flex attention jenkins switch off

flex attention jenkins switch off from settings

fixed typo

add context to score-mod signature
parent 8086bbe3
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment