1. 11 Feb, 2025 1 commit
    • copy over fmha example · 4007289a
      Max Podkorytov authored
      
      
      remove bwd related commands from cmakelists
      
      remove unused ops in the example;
      
      select only bf16/nodropout/nolse/batched
      
      pass validation in the example driver
      
      fork pipeline
      
      add a hardcoded score_mod
      
      fork the kernel
      
      abstract score_mod from a pipeline
      
      unhardcode score_mod and pass it as a cpp expression from codegen
      
      modify host attention impl accounting for score_mod
      
      use custom score for testing
      
      reorder score mod and scale in host verification
      
      use cmakelists as the single source of truth for score_mod function definition
      
      fix numeric mismatches
      
      run clang-format
      
      remove bwd related scripts
      
      edit test and benchmark scripts for the new example
      
      remove readme
      
      remove unused cases from smoke test
      
      re-add group-mode kernels
      
      Add pre_softmax functor (#1852)
      
      * Add pre_softmax functor
      
      * remove stray define
      
      * Move op out of pipeline, adds it to refnc
      
      ---------
      Co-authored-by: root <root@splinter-126-wr-d1.aus.dcgpu>
      Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
      
      added flex_attention in Jenkins file
      
      fixing clang
      
      fixing clang
      
      space added
      
      fixed copyright errors
      
      fixed even more clangformat
      
      formatting
      
      modified jenkins
      
      fixed typo
      
      added flex attention test for gfx90a and gfx942
      
      fixed typo
      
      fixed example name
      
      fixed example script name
      
      added perf logs for both gpu arch
      
      pipeline fixes for accuracy issues; disable pre-softmax function until its accuracy is fixed
      
      added stash and unstash for perf logs
      
      fixed typo in perf name
      
      print error message
      
      print success message
      
      hardcoded perf files names
      
      flex attention jenkins switch off
      
      flex attention jenkins switch off from settings
      
      fixed typo
      
      add context to score-mod signature