  1. 26 Oct, 2023 1 commit
  2. 19 Oct, 2023 3 commits
  3. 18 Oct, 2023 1 commit
    • Pre-compute coordinates to speed up store_tile() for TileWindowWithStaticDistribution<> (#12) · 63bc96e3
      Po Yen Chen authored
      
      
      * Extract store_tile() logic as method
      
      * Extract load_tile() logic as method
      
      * Rename type alias
      
      * Extract common logic as traits
      
      * Remove unnecessary access specifier
      
      * Add ComputeMode for TileWindowWithStaticDistribution
      
      * Put field check into Traits
      
      * More definition of Traits types
      
      * Use clearer static_assert() message
      
      * Enable pre-compute coordinates in store_tile()
      
      * Re-format static assert
      
      * Undo changes to the wrong method
      
      * Enable pre-compute coords for store_tile()
      
      * Remove static_vector usage
      
      * Add method to move non-member coordinates
      
      * Force using pre-computed coordinates in Store()
      
      * Fix wrong access for SFC_Ys
      
      * Change comment
      
      * Allow users to hint # access per coord
      
      * Add comment for noting remove data members later
      
      * Unify FIXME comments
      
      * Replace FIXME comments by TODO
      
      * Let user specify HintNumCoords
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor load/store for window
      
      * clean
      
      * clean
      
      * bug fix for window; clean
      
      ---------
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  4. 12 Oct, 2023 2 commits
    • Refactor 1010 (#14) · 7337ec25
      Chao Liu authored
      * refactor
      
      * refactor
      
      * change load_tile, update block gemm
      
      * debug
      
      * clean
      
      * clean
      
      * experiment lod
      
      * workaround spilling issue
      
      * clean
    • slice kv, and use 3d padding LDS layout (#15) · 7b1a0b7f
      carlushuang authored
      * slice kv, and use 3d padding LDS layout
      
      * add missing sync
      
      * put sync in another place
      
      * move sync place
      
      * revert to normal
  5. 06 Oct, 2023 1 commit
    • add tensor slicing API (#7) · 6491acda
      carlushuang authored
      
      
      * add tensor slicing API
      
      * remove redundant ck namespace
      
      * better gemm_gemm interface
      
      * modify gemm_gemm
      
      * add slice_tile api
      
      * fix merge bug
      
      * update to 3d padding, since we no longer need that much LDS size
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      ---------
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  6. 03 Oct, 2023 1 commit
    • Shuffle in thread (#13) · 1cf54e86
      Chao Liu authored
      * adding in-thread shuffle
      
      * update softmax example
      
      * refactor grid gemm
      
      * refactor gemm: layouts
      
      * bug fix
      
      * clean
      
      * clean
  7. 14 Sep, 2023 2 commits
  8. 13 Sep, 2023 1 commit
  9. 07 Sep, 2023 1 commit
  10. 05 Sep, 2023 5 commits
  11. 04 Sep, 2023 1 commit
  12. 31 Aug, 2023 3 commits
    • Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
      zjing14 authored
      
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add an instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de
      rocking authored
      * Add maxpool instances
      
      * Rename index pool to max pool.
      
      * Add maxpool bwd bf16 instances
      
      * Add avg pool bwd instances
      
      * Rename avgpool and maxpool to avg_pool3d and max_pool
      
      * Add bf16 pool fwd instances
      
      * Add max pool bwd to ckProfiler
      
      * Add avg pool3d bwd to ckProfiler
      
      * Add avg pool bwd test
      
      * Fix bug in reference pool fwd (dilation)
      
      * Fix bug in max pool bwd (dilation and initZero)
      
      * Support bf16 compute data type
      
      * Force compute type to be f32, because atomicAdd only supports f32
      
      * Add max pool bwd test
      
      * Rename folder
      
      * Rename pool
      
      * Add max pool bwd client example
      
      * Add avg pool bwd client example
      
      * Add missing workspace
      
      * clang format
      
      * Rename macro
      
      * remove useless header
      
      * remove useless layout
    • fix gemm_streamk example on mi300 (#875) · bf1912ed
      Illia Silin authored
  13. 30 Aug, 2023 1 commit
  14. 29 Aug, 2023 1 commit
  15. 28 Aug, 2023 1 commit
  16. 23 Aug, 2023 4 commits
  17. 22 Aug, 2023 3 commits
  18. 18 Aug, 2023 1 commit
  19. 17 Aug, 2023 1 commit
  20. 14 Aug, 2023 2 commits
    • Bartlomiej Wroblewski's avatar
      d4c84256
    • Refactor pool fwd (#815) · f60f0a5e
      rocking authored
      * Do not hardcode stride
      
      * devicePool2DFwd inherits devicePool3DFwd
      
      * Move instance declaration out of common
      
      * Add dilation
      
      * use the pool3d rank, because pool2d inherits pool3d
      
      * calculate Do Ho Wo for the dilation
      
      * Fix header name
      
      * Modify ckProfiler
      
      * Remove pool2d instance
      
      * Remove pool2d in profiler
      
      * Remove pool2d and add dilation
      
      * In the client example, this commit revises the following:
      1. Add dilation.
      2. Use pool3d to implement pool2d
      
      * Refine naming and IsSupportedArgument()
      
      * Add dilation to maxpool bwd example
      
      * clang format
      
      * 1. Remove useless header
      2. Fix copyright
      3. Refine naming
      
      * Add layout parameter to pool fwd
      
      * clang format
      
      * Fix merge error
      
      * Fix compile error
      
      * Remove layout parameter in derived class
      
      * Refine changelog
      
      * Fix compile error
      
      * Fix compiler error
      
      * Add layout to external api and profiler
  21. 11 Aug, 2023 2 commits
  22. 10 Aug, 2023 2 commits
    • Add the rocm5.7 RC1 compiler and use it for QA builds. (#842) · 6237bd12
      Illia Silin authored
      * add docker for rocm5.7 RC1
      
      * fix rocm5.7 rc1 build
      
      * build QA with rocm5.7 rc1 compiler
    • Average pool backward deviceOP and example (#797) · 578142db
      rocking authored
      * Add avgpool bwd reference code
      
      * Refine naming
      
      * Fix invalid in_element op in ref_conv
      
      * Add example (only reference now)
      
      * Add the full example of avgpool bwd
      
      * Fix copyright
      
      * Imitate MakeDescriptor from transform_conv_bwd_data_to_gemm_v1.hpp
      
      * rename channel to c from k
      
      * Arrange the code
      
      * Imitate the argument from conv bwd
      
      * Implement invoker
      
      * Fix order of parameter in example
      
      * Refactor reference code for different dimension
      
      * Support different stride
      
      * Check if argument is valid
      
      * Fix kernel parameter for NDHWC, fastest dimension C is not reduced
      
      * Add more data type in example
      
      * Fix bug in example
      
      * calculate Do Ho Wo according to the dilation
      
      * Remove useless header
      
      * Add comment in reference code
      
      * Add layout parameter
      
      * Remove layout in derived class
      
      * Refine reference comment