1. 09 Oct, 2024 1 commit
  2. 07 Oct, 2024 1 commit
  3. 04 Oct, 2024 1 commit
  4. 02 Oct, 2024 1 commit
  5. 25 Sep, 2024 1 commit
  6. 20 Sep, 2024 2 commits
  7. 13 Sep, 2024 1 commit
    • Jun Liu's avatar
      Customize filesystem in CK for legacy systems (#1509) · 81bc1496
      Jun Liu authored
      
      
      * Legacy support: customized filesystem
      
      * Update cmakefile for python alternative path
      
      * fix build issues
      
      * CK has no boost dependency
      
      * More fixes to issues found on legay systems
      
      * fix clang format issue
      
      * Check if blob is correctly generated in cmake
      
      * fix the python issues
      
      * add a compiler flag for codegen when using alternative python
      
      * use target_link_options instead of target_compile_options
      
      ---------
      Co-authored-by: default avatarillsilin <Illia.Silin@amd.com>
      81bc1496
  8. 12 Sep, 2024 1 commit
  9. 11 Sep, 2024 2 commits
  10. 05 Sep, 2024 2 commits
  11. 03 Sep, 2024 1 commit
  12. 02 Sep, 2024 1 commit
  13. 21 Aug, 2024 2 commits
    • Andriy Roshchenko's avatar
      Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473) · c3515f27
      Andriy Roshchenko authored
      * Enable CMakePresets build
      
      * Verify Convolution, Scaling and ReLU algorithms.
      
      * Add tensor element-wise scale and type cast operation.
      
      * Reduction implemented but does not work.
      
      * Exploration of Reduction functionality.
      
      * Completed example for Convolution scaled with ReLu activation and AMAX reduction.
      
      * WIP: Add required instances for convolution.
      
      * WIP: Create client example. Implement convolution stage.
      
      * Add elementwise instances.
      
      * Add elementwise scale + convert example.
      
      * Add reduction instances.
      
      * WIP: Client example for AMAX reduction.
      
      * WIP: Add instances for multistage reduction.
      
      * WIP: Implementation of multistage reduction.
      
      * Refactoring.
      
      * Clean up.
      
      * Add CMakePresets.json
      
      * Guard off FP8 instances when the data type is not available.
      
      * Add example for Scaled FP8 Convolution with AMAX reduction.
      
      * Refactor CombConvScaleRelu instances.
      
      * Add CombConvScale instances.
      
      * Add client example for Scaled FP8 Convolution with AMAX reduction.
      
      * Cleanup.
      c3515f27
    • Rostyslav Geyyer's avatar
      Set RNE fp8 conversion as a default (#1458) · e20f20ef
      Rostyslav Geyyer authored
      * Set RNE fp8 conversion as a default
      
      * Update f8 tests
      
      * Disable failing test on gfx11
      
      * Update bf8 tests
      
      * Add a flag
      
      * Fix the flag
      
      * Raise flag for gfx10 as well
      
      * Temp commit for tolerance testing
      
      * Update tolerances
      e20f20ef
  14. 14 Aug, 2024 1 commit
    • Haocong WANG's avatar
      [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 example if arch not support
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizhan@fb.com>
      Co-authored-by: default avatarJing Zhang <jizhan@meta.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: default avatarillsilin <Illia.Silin@amd.com>
      3049b546
  15. 13 Aug, 2024 1 commit
  16. 10 Aug, 2024 1 commit
  17. 09 Aug, 2024 1 commit
  18. 07 Aug, 2024 1 commit
  19. 06 Aug, 2024 3 commits
  20. 31 Jul, 2024 1 commit
  21. 30 Jul, 2024 1 commit
  22. 25 Jul, 2024 1 commit
  23. 24 Jul, 2024 2 commits
    • Andriy Roshchenko's avatar
      Adding more instances of grouped convolution 3d forward for FP8 with... · 4a8a1bef
      Andriy Roshchenko authored
      Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
      
      * Add CMakePresets configurations.
      
      * Add binary elementwise ConvScaleAdd and an example.
      
      * Numerical verification of results.
      
      Observed significant irregularities in F8 to F32 type conversions:
      ```log
      ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
      ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
      ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
      ```
      
      * Implemented ConvScaleAdd + Example.
      
      * Add ConvScale+Bias Instances
      
      * Add Client Example for ConvScale+Bias
      
      * Fix number of bytes in an example..
      
      * Cleanup.
      4a8a1bef
    • Bartłomiej Kocot's avatar
      Add support for half_t and bfloat to reduction operations (#1395) · ffabd70a
      Bartłomiej Kocot authored
      * Add support for half_t and bfloat to reduction operations
      
      * Fix bhalf convert
      
      * Next fix bf16
      ffabd70a
  24. 22 Jul, 2024 1 commit
  25. 19 Jul, 2024 3 commits
    • Haocong WANG's avatar
      [GEMM] F8 GEMM, performance optimized. (#1384) · 8c90f25b
      Haocong WANG authored
      
      
      * add ab_scale init support
      
      * enabled interwave
      
      * add scale type; update isSupport
      
      * adjust example
      
      * clean
      
      * enable f8 pure gemm rcr ckprofiler
      
      * Add gemm_multiply_multiply instances
      
      * clang format
      
      * Optimize for ScaleBlockMNK=128
      
      * enable abscale f8 gemm ck profiler
      
      * Add pure f8 gemm test suite
      
      * Reverting to the state of project at f60fd77
      
      * update copyright
      
      * clang format
      
      * update copyright
      
      ---------
      Co-authored-by: default avatarroot <jizhan@amd.com>
      8c90f25b
    • ltqin's avatar
      Universal gemm splitk using reduce (with multi-d) (#1341) · c544eb4d
      ltqin authored
      
      
      * init for reduce_threadwise multi_d
      
      * add reduce_threadwise_multi_d
      
      * add reduce_multi_d
      
      * clean
      
      * start add an other splitk device op
      
      * add reduce template parameter to SplitKBatchOffset
      
      * add reduce c matrix
      
      * clean up code
      
      * change example data type to bf16
      
      * add bf16Ai8B example
      
      * remove reduce template parameter
      
      * add splitk atomic status to v4
      
      * example add multi d parameters
      
      * device op add multi-d parameters
      
      * add multi-d to reduce
      
      * fix kbach=1 bug
      
      * change B layout to col in  bf16Ai8B example
      
      * remove float adding struct
      
      * change  multi-d interface
      
      * change file and class name
      
      * remove multi-d of bf16Ai8B example
      
      * change IsReduce function to IsReduceAdd
      
      * change example layout to RRR from RCR
      
      * according layout to set ds stride
      
      * reset parameter layout
      
      * add gemm universal reduce instance
      
      * add reduce factory
      
      * add profile_gemm_universal_reduce
      
      * add reduce to profiler
      
      * fix reduce instance
      
      * fix profiler reduce compiling bug
      
      * format
      
      * format library instance code
      
      * add mem instance for reduce library
      
      * fix call instance names
      
      * add workspace for reduce in ckProfiler
      
      * format
      
      * add mnpading to reduce library instance
      
      * add fp16 instance to reduce of profiler
      
      * change copyright time
      
      * restore profiler cmake file
      
      * add reduce text to instances
      
      * add DsLayout and DsDataType to instances template parameter
      
      * fixed gemm_reduce_multi_d
      
      * add an example without multi_d
      
      * Update common.hpp
      
      * Update gtest.cmake
      
      * Update gemm_xdl_splitk_reduce_bf16.cpp
      
      * clean
      
      * Update gtest.cmake
      
      * format
      
      * fixe api
      
      * format
      
      * default parameter change to RRR
      
      * add vector_len for multi_d
      
      * format
      
      * Update gtest.cmake
      
      * fix bf16A iBB elementwiseop
      
      * add ReduceDataType
      
      * move ReduceDataType to end position
      
      * format
      
      * remove googletest git method  address
      
      * fix copyright time
      
      * update init data
      
      ---------
      Co-authored-by: default avatarroot <jizhan@amd.com>
      Co-authored-by: default avatarletaoqin <letaoqin@amd.com>
      Co-authored-by: default avatarJing Zhang <jizhan@meta.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      c544eb4d
    • Bartłomiej Kocot's avatar
      Refactor transform conv to gemm fwd (#1391) · 70a814f1
      Bartłomiej Kocot authored
      * Refactor transform conv to gemm fwd
      
      * fixes codegen
      
      * wmma fixes
      
      * fix wmma
      
      * Fix copyright
      70a814f1
  26. 17 Jul, 2024 1 commit
  27. 16 Jul, 2024 1 commit
  28. 12 Jul, 2024 1 commit
  29. 06 Jul, 2024 1 commit
    • Harisankar Sadasivan's avatar
      Universal streamk with atomics (#1360) · 75e622f0
      Harisankar Sadasivan authored
      * universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). 
      
      * Update README.md
      
      * fixing clang-format issues
      
      * removed conflicts in struct members between streamk and universal streamk
      
      * corrected arg parsing for streamk and universal streamk
      
      * added stream-k policies for 3 tile and 4 tile
      
      * fixed argument type issue with parsing cmd args
      
      * changes suggested in PR review are made- removing comments and correcting copyright
      
      * file permissions updated
      
      * added default value support for grid_size and streamk-policy selection set to -1
      
      * print messages for arguments
      
      * print messages for arguments
      
      * print messages for arguments1
      75e622f0
  30. 04 Jul, 2024 2 commits