1. 05 Oct, 2023 8 commits
  2. 04 Oct, 2023 3 commits
    • zjing14's avatar
      Grouped conv bwd data with fp16 input and bf8fp8 comp (#962) · 04f93aad
      zjing14 authored
      
      
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Uppdate naming to f8->fp8
      
      * Update naming
      
      * Format
      
      * Update naming (#937)
      
      * Add a client example
      
      * Add computetypes to device and gridwise ops
      
      * Add instances, update instance factory
      
      * Format
      
      * Fix a flag
      
      * Add ckProfiler mode
      
      * Fix typos
      
      * Add an example
      
      * Add bf8 generator
      
      * add bf8 mfma; fixed type_convert for bf8
      
      * move verfication ahead of timing
      
      * Update reference calculation
      
      * Fix reference
      
      * Narrow down float init range
      
      * Fix bf8 bf8 mfma
      
      * Add bf8 @ fp8 mfma
      
      * Update example
      
      * Update instances
      
      * Update profiler api
      
      * Update for compatibility
      
      * Format
      
      * Remove extra example
      
      * Clean up
      
      * workaround convert
      
      * added instance of f16_bf8f8, and client example
      
      * fixed mfma selector
      
      * format
      
      ---------
      Co-authored-by: default avatarRostyslav Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarRostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      04f93aad
    • Rostyslav Geyyer's avatar
      Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6
      Rostyslav Geyyer authored
      
      
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Uppdate naming to f8->fp8
      
      * Update naming
      
      * Format
      
      * Update naming (#937)
      
      * Add a client example
      
      * Add computetypes to device and gridwise ops
      
      * Add instances, update instance factory
      
      * Format
      
      * Fix a flag
      
      * Add ckProfiler mode
      
      * Fix typos
      
      * Add an example
      
      * Add bf8 generator
      
      * add bf8 mfma; fixed type_convert for bf8
      
      * move verfication ahead of timing
      
      * Update reference calculation
      
      * Fix reference
      
      * Narrow down float init range
      
      * Fix bf8 bf8 mfma
      
      * Add bf8 @ fp8 mfma
      
      * Update example
      
      * Update instances
      
      * Update profiler api
      
      * Update for compatibility
      
      * Format
      
      * Remove extra example
      
      * Clean up
      
      * workaround convert
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      42facfc6
    • zjing14's avatar
      3d grouped conv fwd with input/output fp16 and comp fp8 (#931) · e921e1f0
      zjing14 authored
      
      
      * add f8 comp instance
      
      * fixed
      
      * fixed comments
      
      * rename
      
      * fixed dtype
      
      * format
      
      * fixed CI
      
      * fixed ci
      
      * add missing ComputeType
      
      * fixed cit
      
      * fixed
      
      * Update cmake-ck-dev.sh
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      e921e1f0
  3. 03 Oct, 2023 3 commits
  4. 02 Oct, 2023 3 commits
    • Rostyslav Geyyer's avatar
      Add fp8 @ bf8 gemm support and example (#933) · bd09b5c5
      Rostyslav Geyyer authored
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Uppdate naming to f8->fp8
      
      * Update naming
      
      * Format
      bd09b5c5
    • Illia Silin's avatar
      59dbb01f
    • zjing14's avatar
      Contraction multi abd (#957) · 9d58c421
      zjing14 authored
      
      
      * add gridwise_multi_abd
      
      * move element_op into RunRead
      
      * merge element_wise op with data read
      
      * add multiABD example
      
      * allow packed elementwise_op
      
      * changed example
      
      * clean
      
      * clean
      
      * add is_detected
      
      * fix
      
      * minor fix
      
      * add scaleAdd_vec4 example
      
      * init commit for contraction_multi_ABD
      
      * add examples
      
      * add examples of multiA and broadcast
      
      * update example
      
      * fixed comments
      
      * Update cmake-ck-dev.sh
      
      * Update cmake-ck-dev.sh
      
      * Add comments into the example
      
      ---------
      Co-authored-by: default avatarJing Zhang <jizha@amd.com>
      9d58c421
  5. 29 Sep, 2023 3 commits
  6. 28 Sep, 2023 2 commits
  7. 27 Sep, 2023 6 commits
  8. 26 Sep, 2023 4 commits
  9. 23 Sep, 2023 1 commit
  10. 22 Sep, 2023 1 commit
  11. 21 Sep, 2023 1 commit
    • Illia Silin's avatar
      Refactoring cmake files to build data types separately. (#932) · bba085d2
      Illia Silin authored
      * refactor cmake files for the tests
      
      * refactor cmake files for examples
      
      * fix cmake for gemm example
      
      * fix the cmake file for all examples
      
      * add splitting by data types in gemm_splitk instance header
      
      * rename test to reflect only dl instances are used
      
      * clean up CI workspace, update cmake for instances
      
      * change the jenkinsfile syntax
      
      * build all instances except DL on gfx11
      
      * move workspace cleanup after stages
      
      * clean up workspace after every stage
      
      * isolate data types in grouped_conv_fwd header
      
      * isolate dl instances for grouped_conv2d_fwd
      
      * fix syntax
      
      * fix cmake and batchnorm instances
      
      * fix typo
      
      * fix reduction instances
      
      * fix grouped_conv headers
      
      * fix syntax
      
      * replace parsing logic for instances, replace bfp16 with bf16
      
      * fix the client examples build
      
      * clean up DTYPES from instances cmake files
      
      * update the parsing logic in cmake files
      
      * make an exception for reduction kernels
      
      * update few remaining cmake files to handle DTYPES
      
      * fix syntax
      
      * fix cmake conflicts
      
      * replace f8 with fp8 test name
      
      * resolve conflicts for dpp instances
      bba085d2
  12. 20 Sep, 2023 1 commit
  13. 19 Sep, 2023 2 commits
  14. 18 Sep, 2023 2 commits