1. 05 Oct, 2023 2 commits
  2. 04 Oct, 2023 3 commits
    • Grouped conv bwd data with fp16 input and bf8fp8 comp (#962) · 04f93aad
      zjing14 authored
      
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Update naming to f8->fp8
      
      * Update naming
      
      * Format
      
      * Update naming (#937)
      
      * Add a client example
      
      * Add computetypes to device and gridwise ops
      
      * Add instances, update instance factory
      
      * Format
      
      * Fix a flag
      
      * Add ckProfiler mode
      
      * Fix typos
      
      * Add an example
      
      * Add bf8 generator
      
      * add bf8 mfma; fixed type_convert for bf8
      
      * move verification ahead of timing
      
      * Update reference calculation
      
      * Fix reference
      
      * Narrow down float init range
      
      * Fix bf8 bf8 mfma
      
      * Add bf8 @ fp8 mfma
      
      * Update example
      
      * Update instances
      
      * Update profiler api
      
      * Update for compatibility
      
      * Format
      
      * Remove extra example
      
      * Clean up
      
      * workaround convert
      
      * added instance of f16_bf8f8, and client example
      
      * fixed mfma selector
      
      * format
      
      ---------
      Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6
      Rostyslav Geyyer authored
      
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Update naming to f8->fp8
      
      * Update naming
      
      * Format
      
      * Update naming (#937)
      
      * Add a client example
      
      * Add computetypes to device and gridwise ops
      
      * Add instances, update instance factory
      
      * Format
      
      * Fix a flag
      
      * Add ckProfiler mode
      
      * Fix typos
      
      * Add an example
      
      * Add bf8 generator
      
      * add bf8 mfma; fixed type_convert for bf8
      
      * move verification ahead of timing
      
      * Update reference calculation
      
      * Fix reference
      
      * Narrow down float init range
      
      * Fix bf8 bf8 mfma
      
      * Add bf8 @ fp8 mfma
      
      * Update example
      
      * Update instances
      
      * Update profiler api
      
      * Update for compatibility
      
      * Format
      
      * Remove extra example
      
      * Clean up
      
      * workaround convert
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
    • 3d grouped conv fwd with input/output fp16 and comp fp8 (#931) · e921e1f0
      zjing14 authored
      
      * add f8 comp instance
      
      * fixed
      
      * fixed comments
      
      * rename
      
      * fixed dtype
      
      * format
      
      * fixed CI
      
      * fixed ci
      
      * add missing ComputeType
      
      * fixed ci
      
      * fixed
      
      * Update cmake-ck-dev.sh
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
  3. 03 Oct, 2023 3 commits
  4. 02 Oct, 2023 3 commits
    • Add fp8 @ bf8 gemm support and example (#933) · bd09b5c5
      Rostyslav Geyyer authored
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Update naming to f8->fp8
      
      * Update naming
      
      * Format
    • 59dbb01f
      Illia Silin authored
    • Contraction multi abd (#957) · 9d58c421
      zjing14 authored
      
      * add gridwise_multi_abd
      
      * move element_op into RunRead
      
      * merge element_wise op with data read
      
      * add multiABD example
      
      * allow packed elementwise_op
      
      * changed example
      
      * clean
      
      * clean
      
      * add is_detected
      
      * fix
      
      * minor fix
      
      * add scaleAdd_vec4 example
      
      * init commit for contraction_multi_ABD
      
      * add examples
      
      * add examples of multiA and broadcast
      
      * update example
      
      * fixed comments
      
      * Update cmake-ck-dev.sh
      
      * Update cmake-ck-dev.sh
      
      * Add comments into the example
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
  5. 29 Sep, 2023 2 commits
    • 6b5f6473
      Illia Silin authored
    • Add support for mixed precision in contraction scale and bilinear (#936) · f0748506
      Bartlomiej Wroblewski authored
      * Extract common functionality to separate files
      
      * Reference contraction: Remove incorrect consts from type_converts
      
      * Reference contraction: Add missing type_convert for dst value
      
      * Reference contraction: Fix incorrect order of B matrix dimensions
      
      * Add support for mixed precision in contraction scale and bilinear
      
      * Move using statements from instances to a common file
      
      * Move using statements from examples to a common file
      
      * Fix the order of B matrix dimensions across examples and profiler
      
      * Fix the computation of error threshold
      
      * Make ComputeDataType an optional argument
      
      * Include possible DataType -> ComputeDataType casting error in the threshold
      
      * Remove commented code
  6. 28 Sep, 2023 2 commits
  7. 27 Sep, 2023 5 commits
  8. 26 Sep, 2023 4 commits
  9. 23 Sep, 2023 1 commit
  10. 22 Sep, 2023 1 commit
  11. 21 Sep, 2023 1 commit
    • Refactoring cmake files to build data types separately. (#932) · bba085d2
      Illia Silin authored
      * refactor cmake files for the tests
      
      * refactor cmake files for examples
      
      * fix cmake for gemm example
      
      * fix the cmake file for all examples
      
      * add splitting by data types in gemm_splitk instance header
      
      * rename test to reflect only dl instances are used
      
      * clean up CI workspace, update cmake for instances
      
      * change the jenkinsfile syntax
      
      * build all instances except DL on gfx11
      
      * move workspace cleanup after stages
      
      * clean up workspace after every stage
      
      * isolate data types in grouped_conv_fwd header
      
      * isolate dl instances for grouped_conv2d_fwd
      
      * fix syntax
      
      * fix cmake and batchnorm instances
      
      * fix typo
      
      * fix reduction instances
      
      * fix grouped_conv headers
      
      * fix syntax
      
      * replace parsing logic for instances, replace bfp16 with bf16
      
      * fix the client examples build
      
      * clean up DTYPES from instances cmake files
      
      * update the parsing logic in cmake files
      
      * make an exception for reduction kernels
      
      * update few remaining cmake files to handle DTYPES
      
      * fix syntax
      
      * fix cmake conflicts
      
      * replace f8 with fp8 test name
      
      * resolve conflicts for dpp instances
  12. 20 Sep, 2023 1 commit
  13. 19 Sep, 2023 2 commits
  14. 18 Sep, 2023 2 commits
  15. 15 Sep, 2023 2 commits
    • Stylistic improvements for grouped convolution code · bc2d0583
      Bartlomiej Kocot authored
      Remove unnecessary ignoring
      
      Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
    • Add fp16/fp8 support into Grouped gemm FixedNK (#874) · f9d0eddb
      zjing14 authored
      
      * move all arguments into device
      
      * add b2c_tile_map
      
      * add examples
      
      * add SetDeviceKernelArgs
      
      * dedicated fixed_nk solution
      
      * init client api
      
      * add grouped_gemm_bias example
      
      * add an instance
      
      * add instances
      
      * formatting
      
      * fixed cmake
      
      * Update EnableCompilerWarnings.cmake
      
      * Update cmake-ck-dev.sh
      
      * clean; fixed comments
      
      * fixed comment
      
      * add instances for fp32 output
      
      * add instances for fp32 output
      
      * add fp32 out client example
      
      * fixed CI
      
      * init commit for kbatch
      
      * add splitk gridwise
      
      * format
      
      * fixed
      
      * clean deviceop
      
      * clean code
      
      * finish splitk
      
      * fixed instances
      
      * change m_loops to tile_loops
      
      * add setkbatch
      
      * clean code
      
      * add splitK+bias
      
      * add instances
      
      * opt mk_nk instances
      
      * clean examples
      
      * fixed CI
      
      * remove zero
      
      * finished non-zero
      
      * clean
      
      * clean code
      
      * optimized global_barrier
      
      * fixed ci
      
      * fixed CI
      
      * instance and client
      
      * removed AddBias
      
      * format
      
      * fixed CI
      
      * fixed CI
      
      * move 20_grouped_gemm to 21_grouped_gemm
      
      * clean
      
      * formatting
      
      * clean
      
      * clean
      
      * fixed computeType
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
  16. 14 Sep, 2023 1 commit
  17. 13 Sep, 2023 4 commits
  18. 12 Sep, 2023 1 commit
    • Refactor f8_t, add bf8_t (#792) · 62d4af74
      Rostyslav Geyyer authored
      * Refactor f8_t to add bf8_t
      
      * Add check_err impl for f8_t
      
      * Update fp8 test
      
      * Format
      
      * Revert the fix
      
      * Update vector_type implementation
      
      * Add bf8 test
      
      * Add bf8, use BitInt types
      
      * Add bf8 conversion methods
      
      * Update type_convert for fp8/bf8
      
      * Add check_err fp8/bf8 support
      
      * Add subnorm fp8 tests
      
      * Add subnorm bf8 tests
      
      * Fix conversion
      
      * Add bf8 cmake bindings
      
      * Add macros to enable build with disabled fp8/bf8
      
      * Remove is_native method
      
      * Update flag combination for mixed precision instances
      
      * Add more flag checks
      
      * Add another flag to a client example
      
      * Add type traits, decouple f8/bf8 casting
      
      * Clean up
      
      * Decouple fp8 and bf8 flags
      
      * Remove more redundant flags
      
      * Remove leftover comments