1. 19 Jul, 2024 1 commit
    • ltqin's avatar
      Universal gemm splitk using reduce (with multi-d) (#1341) · c544eb4d
      ltqin authored
      
      
      * init for reduce_threadwise multi_d
      
      * add reduce_threadwise_multi_d
      
      * add reduce_multi_d
      
      * clean
      
      * start add an other splitk device op
      
      * add reduce template parameter to SplitKBatchOffset
      
      * add reduce c matrix
      
      * clean up code
      
      * change example data type to bf16
      
      * add bf16Ai8B example
      
      * remove reduce template parameter
      
      * add splitk atomic status to v4
      
      * example add multi d parameters
      
      * device op add multi-d parameters
      
      * add multi-d to reduce
      
      * fix kbach=1 bug
      
      * change B layout to col in  bf16Ai8B example
      
      * remove float adding struct
      
      * change  multi-d interface
      
      * change file and class name
      
      * remove multi-d of bf16Ai8B example
      
      * change IsReduce function to IsReduceAdd
      
      * change example layout to RRR from RCR
      
      * according layout to set ds stride
      
      * reset parameter layout
      
      * add gemm universal reduce instance
      
      * add reduce factory
      
      * add profile_gemm_universal_reduce
      
      * add reduce to profiler
      
      * fix reduce instance
      
      * fix profiler reduce compiling bug
      
      * format
      
      * format library instance code
      
      * add mem instance for reduce library
      
      * fix call instance names
      
      * add workspace for reduce in ckProfiler
      
      * format
      
      * add mnpading to reduce library instance
      
      * add fp16 instance to reduce of profiler
      
      * change copyright time
      
      * restore profiler cmake file
      
      * add reduce text to instances
      
      * add DsLayout and DsDataType to instances template parameter
      
      * fixed gemm_reduce_multi_d
      
      * add an example without multi_d
      
      * Update common.hpp
      
      * Update gtest.cmake
      
      * Update gemm_xdl_splitk_reduce_bf16.cpp
      
      * clean
      
      * Update gtest.cmake
      
      * format
      
      * fixe api
      
      * format
      
      * default parameter change to RRR
      
      * add vector_len for multi_d
      
      * format
      
      * Update gtest.cmake
      
      * fix bf16A iBB elementwiseop
      
      * add ReduceDataType
      
      * move ReduceDataType to end position
      
      * format
      
      * remove googletest git method  address
      
      * fix copyright time
      
      * update init data
      
      ---------
      Co-authored-by: default avatarroot <jizhan@amd.com>
      Co-authored-by: default avatarletaoqin <letaoqin@amd.com>
      Co-authored-by: default avatarJing Zhang <jizhan@meta.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      c544eb4d
  2. 02 Apr, 2024 1 commit
    • Illia Silin's avatar
      Split the instances by architecture. (#1223) · ae57e593
      Illia Silin authored
      * parse examples inside the add_example_executable function
      
      * fix the example 64 cmake file
      
      * add xdl flag to the gemm_bias_softmax_gemm_permute example
      
      * add filtering of tests based on architecture type
      
      * enable test_grouped_gemm for gfx9 only
      
      * enable test_transpose only for gfx9
      
      * only linnk test_transpose if it gets built
      
      * split the gemm instances by architectures
      
      * split gemm_bilinear,grouped_conv_bwd_weight instances by targets
      
      * split instances by architecture
      
      * split grouped_conv instances by architecture
      
      * fix clang format
      
      * fix the if-else logic in group_conv headers
      
      * small fix for grouped convolution instances
      
      * fix the grouped conv bwd weight dl instances
      
      * fix client examples
      
      * only enable client examples 3 and 4 on gfx9
      
      * set the gfx9 macro
      
      * make sure the architecture macros are set by cmake
      
      * use separate set of xdl/wmma flags for host code
      
      * sinmplify the main cmake file
      
      * add conv_fwd_bf8 instance declaration
      ae57e593
  3. 12 Feb, 2024 1 commit
  4. 07 Feb, 2024 1 commit
  5. 19 Jan, 2024 1 commit
  6. 18 Oct, 2023 1 commit
  7. 21 Sep, 2023 1 commit
    • Illia Silin's avatar
      Refactoring cmake files to build data types separately. (#932) · bba085d2
      Illia Silin authored
      * refactor cmake files for the tests
      
      * refactor cmake files for examples
      
      * fix cmake for gemm example
      
      * fix the cmake file for all examples
      
      * add splitting by data types in gemm_splitk instance header
      
      * rename test to reflect only dl instances are used
      
      * clean up CI workspace, update cmake for instances
      
      * change the jenkinsfile syntax
      
      * build all instances except DL on gfx11
      
      * move workspace cleanup after stages
      
      * clean up workspace after every stage
      
      * isolate data types in grouped_conv_fwd header
      
      * isolate dl instances for grouped_conv2d_fwd
      
      * fix syntax
      
      * fix cmake and batchnorm instances
      
      * fix typo
      
      * fix reduction instances
      
      * fix grouped_conv headers
      
      * fix syntax
      
      * replace parsing logic for instances, replace bfp16 with bf16
      
      * fix the client examples build
      
      * clean up DTYPES from instances cmake files
      
      * update the parsing logic in cmake files
      
      * make an exception for reduction kernels
      
      * update few remaining cmake files to handle DTYPES
      
      * fix syntax
      
      * fix cmake conflicts
      
      * replace f8 with fp8 test name
      
      * resolve conflicts for dpp instances
      bba085d2
  8. 23 Aug, 2023 1 commit
  9. 22 Aug, 2023 1 commit
  10. 07 Aug, 2023 1 commit
    • Illia Silin's avatar
      Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
      08eb1769
  11. 15 Jun, 2023 1 commit
    • Illia Silin's avatar
      Enable gfx941 and gfx942 architectures. (#752) · 027e46ee
      Illia Silin authored
      * enable gfx941/942 targets
      
      * fix clang format
      
      * fix the cmake logic for multiple targets
      
      * fix cmake syntax for looping over targets
      
      * add gfx941/942 support for gemm_xdl instances
      027e46ee
  12. 31 May, 2023 1 commit
  13. 23 May, 2023 1 commit
    • Illia Silin's avatar
      Enable gemm_dl and other kernels on Navi3x. (#714) · d821d1e5
      Illia Silin authored
      * enable dl kernels on navi3
      
      * do not build xdl tests and examples on Navi
      
      * run tests before building everything on jenkins
      
      * disable gemm_bilinear on gfx1030
      
      * add gpu targets to installer on Navi
      
      * put tests in the same order as before
      
      * reduce the number of navi targets in CI
      
      * build CI installed for gfx940 as well
      
      * only build for MI300 during QA runs
      d821d1e5
  14. 11 Nov, 2022 1 commit
    • Po Yen Chen's avatar
      Rangify constructor of HostTensorDescriptor & Tensor<> (#445) · 4a2a56c2
      Po Yen Chen authored
      * Rangify STL algorithms
      
      This commit adapts rangified std::copy(), std::fill() & std::transform()
      
      * Rangify check_err()
      
      By rangifying check_err(), we can not only compare values between
      std::vector<>s, but also compare any ranges which have same value
      type.
      
      * Allow constructing Tensor<> like a HostTensorDescriptor
      
      * Simplify Tensor<> object construction logics
      
      * Remove more unnecessary 'HostTensorDescriptor' objects
      
      * Re-format example code
      
      * Re-write more HostTensorDescriptor ctor call
      4a2a56c2
  15. 13 Oct, 2022 1 commit
    • Adam Osewski's avatar
      Refactor device op implementations into `impl` subdirectory. (#420) · 30480288
      Adam Osewski authored
      
      
      * Move kernel implementation files under impl directory.
      
      * Update examples paths.
      
      * Update device kernel impl include paths.
      
      * Update tensor operation instances include paths.
      
      * Update profiler and tests include paths.
      
      * Clang-format
      
      * Update include paths for batched gemm reduce
      
      * Refactor UnitTest ConvNDBwdWeight.
      
      * Refactor fwd and bwd data convND UT.
      
      * Fix used test macro.
      
      * Fix include path.
      
      * Fix include paths.
      
      * Fix include paths in profiler and tests.
      
      * Fix include paths.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      30480288
  16. 25 Aug, 2022 1 commit
  17. 23 Aug, 2022 1 commit