1. 11 Jul, 2024 2 commits
    • Add CK_TILE tests to daily CI builds. (#1381) · 98a01bbc
      Illia Silin authored
      * add ck_tile tests to CI
      
      * build and run ck_tile tests on gfx90a and gfx942 in parallel
      
      * fix groovy syntax
      
      * turn ck_tile tests OFF by default
      
      * skip creating the build folder
      
      * build ck_tile examples with 64 threads
      
      * build ck_tile examples with cmake-ck-dev.sh script
      
      * add video group to docker on mi300
      
      * do not retry to rebuild the early CI stages
      
      * help prevent jenkins false failure
      
      * restore cron trigger
    • [Jenkins] restore cron jobs (#1380) · f914c228
      Illia Silin authored
      * test the cron trigger
      
      * fix the cron jobs
      
      * restore the list of cron jobs
  2. 10 Jul, 2024 2 commits
  3. 09 Jul, 2024 3 commits
  4. 08 Jul, 2024 2 commits
  5. 06 Jul, 2024 1 commit
    • Universal streamk with atomics (#1360) · 75e622f0
      Harisankar Sadasivan authored
      * universal streamk with atomics, with ckprofiler support. grid_size and the streamk strategy are tunable. A grid_size of -1 sets #WGs = maximum occupancy x num_CUs. The implementation supports several streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. A streamk strategy of -1 selects the default streamk policy (4-tile).
      
      * Update README.md
      
      * fixing clang-format issues
      
      * removed conflicts in struct members between streamk and universal streamk
      
      * corrected arg parsing for streamk and universal streamk
      
      * added stream-k policies for 3 tile and 4 tile
      
      * fixed argument type issue with parsing cmd args
      
      * made the changes suggested in PR review: removed comments and corrected copyright
      
      * file permissions updated
      
      * added default value support for grid_size and streamk-policy selection set to -1
      
      * print messages for arguments
      
      * print messages for arguments
      
      * print messages for arguments1
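
The -1 defaults described in the first item of this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: `StreamKPolicy`, `StreamKLaunch` and `resolve_streamk_args` are hypothetical names, and the direct mapping of strategy values 1..4 onto the 1-tile to 4-tile policies is an assumption.

```cpp
#include <stdexcept>

// Hypothetical types for illustration only; these are not CK's actual API.
enum class StreamKPolicy { OneTile = 1, TwoTile = 2, ThreeTile = 3, FourTile = 4 };

struct StreamKLaunch
{
    int num_workgroups;
    StreamKPolicy policy;
};

// Resolve the two tunables described in the commit message:
//   grid_size == -1 -> #WGs = maximum occupancy x num_CUs
//   strategy  == -1 -> default streamk policy (4-tile)
// Assumption: strategy values 1..4 map directly onto the 1- to 4-tile policies.
inline StreamKLaunch resolve_streamk_args(int grid_size, int strategy,
                                          int max_occupancy, int num_cus)
{
    if(grid_size != -1 && grid_size <= 0)
        throw std::invalid_argument("grid_size must be positive or -1");
    if(strategy != -1 && (strategy < 1 || strategy > 4))
        throw std::invalid_argument("strategy must be in 1..4 or -1");

    StreamKLaunch launch{};
    launch.num_workgroups = (grid_size == -1) ? max_occupancy * num_cus : grid_size;
    launch.policy =
        (strategy == -1) ? StreamKPolicy::FourTile : static_cast<StreamKPolicy>(strategy);
    return launch;
}
```

With the illustrative values max_occupancy = 2 and num_cus = 304, `resolve_streamk_args(-1, -1, 2, 304)` gives 608 workgroups and the 4-tile policy, while an explicit `resolve_streamk_args(120, 2, 2, 304)` keeps 120 workgroups and selects the 2-tile policy.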
  6. 04 Jul, 2024 2 commits
  7. 28 Jun, 2024 2 commits
  8. 27 Jun, 2024 5 commits
  9. 26 Jun, 2024 2 commits
    • [CK_TILE] fmha forward split-kv + combine kernels (#1338) · 0cb2e06d
      Po Yen Chen authored
      
      
      * FA fwd dropout
      
      * FA bwd
      
      * epilogue reuse
      
      * CMakeLists update
      
      * [CK_TILE] support alibi (#1269)
      
      * add alibi support
      
      * fix code
      
      * update code based on comment
      
      * Support more hdim
      
      * fix fp8 bias
      
      * support seqlen_k=0 case
      
      * remove unused printf
      
      * fix format
      
      ---------
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
      
      * now fwd/bwd can build
      
      * bwd alibi
      
      * add bwd validation stream_config
      
      * update generated filenames
      
      * update bwd kernel launch
      
      * CK_TILE_HOST_DEVICE in philox
      
      * Transpose -> transpose
      
      * format
      
      * format
      
      * format
      
      * Generate the instance for FA required
      
      * format
      
      * fix error in WarpGemm
      
      * Add num_splits option and dummy split-kv api method
      
      * Generate fmha_fwd_splitkv()
      
      * Add SplitKV kernel codegen logics
      
      * Add SplitKV combine kernel codegen logics
      
      * Fix mismatched return type
      
      * Clean-up code
      
      * Replace sentinel value before storing
      
      * Fix wrong layout of LSE/LSEacc/Oacc
      
      * Format codes
      
      * Fix o_acc memory error
      
      * Fix wrong kBlockSize used in policy
      
      * Reduce # of combine kernels
      
      * Fix split-kv combine kernel name
      
      * Fix wrong LDS indexing logics
      
      * Fix wrong loop counter step logic
      
      * Undo vector size changes
      
      * Remove no-longer used field
      
      * Remove inconsistent comment
      
      * Remove debug statements in example
      
      * Remove more debug statements
      
      * Add constness to local variables
      
      * Clean up generate.py
      
      * Fix unstable clang-format comment
      
      * Remove unused include directive
      
      * Use shorter template parameter name
      
      * Enable non-split-kv blobs
      
      * Update license date
      
      * Print num_splits conditionally
      
      * Undo disabling data types
      
      * Remove unnecessary tile size for fp8
      
      * Fix wrong pipeline args for fp8
      
      * Fix example output format
      
      * Remove more debug code in combine pipeline
      
      * Add stride kernel arguments for LSE/O acc workspace
      
      * Re-order split-kv pipeline call operator arguments
      
      * Pass LSE/O strides in kernel argument
      
      * Re-order pipeline call operator arguments
      
      * Use tensor_descriptor to locate LSEacc elements
      
      * Support providing invalid element for tensor view
      
      * Set invalid element value for LSEacc tensor view
      
      * Remove hand-written store_tile() code
      
      * Remove necessary value-overwrite logic
      
      * Add transposed lds descriptor
      
      * Support load_tile() for tile_window_with_static_lengths<>
      
      * Undo removing necessary value-overwrite logic
      
      * Use read descriptor to locate lds elements
      
      * Simplify pipeline source code
      
      * Add constraint to kMaxSplits
      
      * Default use kMaxSplits=64 in generate.py
      
      * Revert "Add constraint to kMaxSplits"
      
      This reverts commit 0a2132d758042e6fb0292f4e354909b8a4d1c118.
      
      * Revert "Default use kMaxSplits=64 in generate.py"
      
      This reverts commit c7d9c80b77320aec6559222bed7d47adcaefe4e3.
      
      * Decide alignment by the padding parameter
      
      * Remove no-longer used utility functions
      
      * Remove not-working code
      
      * Add comment & remove no-longer used code
      
      * Fix computation errors
      
      * Add heuristic to override num_splits option
      
      * Add constraint to kMaxSplits
      
      * Fix compilation error
      
      * Clean up pipeline code
      
      * Wrap pointer access as lambda function
      
      * Rename confusing methods
      
      * Use kLogMaxSplits as template parameter
      
      * Finish splitkv combine kernel codegen
      
      * Update kMaxSplits limit
      
      * Use smaller kM0 for splitkv combine kernel
      
      * Ignore dropout flag in splitkv pipeline
      
      * Unify flag usage
      
      * Add back flag kStoreLSE
      
      * Merge lambda calls in pipeline
      
      * Fix compilation errors
      
      * Avoid all empty splits
      
      * Always check for empty loop in splitkv pipelines
      
      * Re-order parameters
      
      * Remove redundant p_drop option check
      
      * Add traits/problem for fwd splitkv kernel
      
      * Conditionally enable uneven split boundary checks
      
      * Add comment for the splitkv traits field
      
      * Change even split criteria
      
      * Re-order statements
      
      * Refine occupancy value for hdim=128&256
      
      * Refine occupancy value for hdim=32&64
      
      * Remove redundant kernel argument
      
      * Separate fmha bwd codegen logics
      
      * Separate fmha fwd codegen logics
      
      * Remove redundant direction parameter in fwd&bwd codegen logics
      
      * Support generating multiple APIs for an example
      
      * Make 'api' an alias of the 'direction' option
      
      * Remove choices for the 'direction' option
      
      * Use dictionary to config all the functions
      
      * Move fmha splitkv codegen logics to other file
      
      * Add fwd_splitkv api for tile_example_fmha_fwd
      
      ---------
      
      Co-authored-by: danyao12 <danyao12>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
  10. 25 Jun, 2024 1 commit
    • CK Instance Gen (#1145) · 3e9711f0
      arai713 authored
      
      
      * Format
      
      * Format
      
      * Format
      
      * Remove const
      
      * Use the right template
      
      * Format
      
      * Format
      
      * add row/col instances
      
      * Add missing file
      
      * fixed
      
      * fixing block to etile error
      
      * Format
      
      * Updates
      
      * Format
      
      * fixed rrr layout
      
      * generating a sample JSON file: currently contains includes, prologue/epilogue and instances
      
      * version where the json is passed into the instances to generate a key
      
      * updated run function to just launch kernel
      
      * updated run function: only contains kernel object, json file is updated but still needs to be cleaned up, added front-end API to parse JSON into character buffer
      
      * adding in testing files
      
      * cleaned up comments, still need to work on including header files
      
      * removed unneeded files
      
      * removed/commented out JSON implementation
      
      * added fusion(prologue/epilogue) into instance generation
      
      * working on instance selection
      
      * added instance selection, need to fix instance validation
      
      * removed block2etile map validity check for testing purposes
      
      * test running: failing due to incorrect files/input
      
      * all grid descs/ptrs completed, but device file not found
      
      * Update test and embed modules
      
      * Restore older version
      
      * added convolution operation, written test, debugging generated code for compilation
      
      * attempting to include CK in host directory: _Float16 error
      
      * CK header file issues
      
      * slight fix
      
      * don't crash when hip can't report total memory
      
      * dump generated code to a file
      
      * changing sizes
      
      * creating tensor descriptors using CK methods: set up grid desc manually, also trying to set up an argument pointer - this needs to be fixed
      
      * some fixes to call the device code
      
      * separating test files for conv and gemm
      
      * completed arg ptr, now have linking errors
      
      * clang format fix
      
      * resolved linker issues in conv test
      
      * remove dependency on libutility from ck
      
      * resolved num dim error
      
      * properly passing arg ptr, errors with passing typenames: redefinition/redeclaration
      
      * undo the commenting of device function
      
      * hand created kernel code to find rtc issues
      
      * dump the full src to file
      
      * resolved redeclaration errors, cleaned up errors for Amber's kernel code
      
      * debugging purposes: redeclaration error
      
      * config files
      
      * resolved errors for NumTensor and redeclaration, formatted version.h
      
      * resolved most errors in manually added kernel and my own. error with calling kernel object: overloaded function type
      
      * WIP: close to getting kernel compiled
      
      * WIP: fixing rtc errors
      
      * fixed sequence errors, formatting, still one error with run fcn
      
      * yay: kernel compiles and runs
      
      * updated templated/generated version to run and compile
      
      * minor fixes
      
      * working generated example, resolved memory access error due to padding
      
      * adding in reference kernel, validation failing against reference
      
      * debugging: printing kernel args
      
      * reduced error in results
      
      * debugged reference kernel and output errors, added to generated version, currently debugging prologue function issues
      
      * working validation (using reference convolution) with prologue function for both hard-coded and generated version
      
      * WIP: create an alt version that creates Argument on the device
      
      * wip: added new duplicate files, fixed fusion templating errors from working example, setting up kernel arguments
      
      * wip: making necessary methods device code
      
      * added grid descs, working on grid pointers, errors with stl numerics
      
      * wip: updating kernel args - issue, replacing some std functions
      
      * replaced std::accumulate call with temp hardcoded version
      
      * wip: args causing memory issue
      
      * Construct Argument object inside the kernel and use it to call convolution device function. Code runs and verification passes
      
      * adding object file dump
      
      * temporary hardcoding of grid size, can remove device op inst + arg ptr
      
      * minor fix for grid size
      
      * added modified example where arg ptr is created on the device for generated version as well
      
      * removed device op instance and arg ptr from modified examples
      
      * moving device op file for testing purposes and to properly build CK
      
      * commenting out print-outs
      
      * adjust compiler args to produce a valid ELF file
      
      * temporary removal of validation
      
      * reverting compiler args back for working example
      
      * retrieve necessary arguments from generated template parameters in correct format
      
      * calculating grid size on host-side, still need to clean up process, pass parameters to host functions properly
      
      * scaled up factory functions/wrapper structs to implement host-side launch parameter calculations using CK host side functions - in hard-coded example
      
      * temporary change to generate ELF format binary object file
      
      * removed unnecessary code, added comments
      
      * formatting fix
      
      * cleaned up code, added new tests, restructured library: move helper into CK
      
      * refactored launch parameter calculation to be more concise
      
      * renamed files and variables for more clarity/uniformity
      
      * more code cleaning, removed debug statements
      
      * moved majority of my files into codegen directory, running properly
      
      * updated Embed.cmake(string_view) in codegen directory
      
      * updated host directory to match Embed.cmake as well
      
      * added old tests in
      
      * updated instance generation methods to be more concise
      
      * removed layout from launch parameter calculation
      
      * working test
      
      * fixed issue with verification, all instances working
      
      * updated verification in other tests
      
      * removed duplicate matrix padder file, removed code dumps
      
      * removed old hard-coded tests
      
      * removed old host directory, all files in codegen directory now
      
      * fixed copyright in files
      
      * commenting out validation
      
      * renamed files
      
      * made changes for review: fixed copyright, renamed files for clarity, removed comments, refactored code
      
      * updated headers
      
      * removing duplicate file for fwd conv to gemm, merging with original file
      
      * fix building codegen with clang++ directly
      
      * resolving build error from conv_fwd_to_gemm
      
      * fix for previous error
      
      * renaming tests
      
      * created common test file
      
      * cleaned up code, added comments
      
      * renamed device op
      
      * fixed typos in comments
      
      * removed extra space
      
      * code cleanup: resolving Amber's comments
      
      * removed wrapper struct for matrix padder, fixed template
      
      * cleaned up if statements for better readability
      
      ---------
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: Jing Zhang <jizha@amd.com>
      Co-authored-by: M. Amber Hassaan <amber_474@yahoo.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
  11. 24 Jun, 2024 1 commit
  12. 22 Jun, 2024 1 commit
  13. 21 Jun, 2024 2 commits
  14. 20 Jun, 2024 3 commits
  15. 19 Jun, 2024 2 commits
  16. 18 Jun, 2024 3 commits
  17. 17 Jun, 2024 2 commits
  18. 14 Jun, 2024 1 commit
  19. 13 Jun, 2024 1 commit
  20. 12 Jun, 2024 1 commit
  21. 11 Jun, 2024 1 commit