1. 29 Feb, 2024 2 commits
  2. 24 Feb, 2024 1 commit
  3. 14 Feb, 2024 1 commit
  4. 24 Jan, 2024 1 commit
    • Fixing most of the cppcheck errors. (#1142) · 180e5720
      Illia Silin authored
      * fix cppcheck errors, first pass
      
      * fix format
      
      * fix returned value in examples
      
      * add macro definitions for cppcheck
      
      * fix the profile_gemm logic
      
      * update the gemm profiler logic
      
      * add more definitions to cppcheck, fix couple more errors
      
      * replace runtime error with message in device function
      
      * fix a couple of int4 issues
      
      * no return for fill function
      
      * fix errors in data_types.hpp
      
      * fix format
      
      * fix few remaining errors
      
      * fix errors in data_types.hpp
      
      * fix last couple of errors in data_types.hpp
  5. 19 Jan, 2024 1 commit
  6. 27 Nov, 2023 1 commit
  7. 25 Nov, 2023 1 commit
    • Add basic support for direct loads from global to LDS (#999) · 627054b9
      Bartlomiej Wroblewski authored
      * Add basic support for direct loads from global to LDS
      
      * Clean the code and comments
      
      * Add support for fp16
      
      * Add comments
      
      * Add check for thread cluster lengths
      
      * Align non-direct-load fp16 example
      
      * Small fixes
      
      * Extend IsSupported to check for supported GPU gens
      
      * Build examples only on the supported HW
      
      * Do not throw when instance not supported in 04 example
      
      * Review: Apply review suggestions
      
      * Review: small fix
      
      * Review: small fix
  8. 17 Nov, 2023 1 commit
  9. 18 Oct, 2023 1 commit
  10. 10 Oct, 2023 1 commit
  11. 02 Oct, 2023 1 commit
    • Add fp8 @ bf8 gemm support and example (#933) · bd09b5c5
      Rostyslav Geyyer authored
      * Add f8 bf8 gemm example
      
      * Add element-wise ops
      
      * Add intrinsics
      
      * Update reference calculation
      
      * Add an additional type option for xdlops gemm
      
      * Fix build process
      
      * Add bf8 to buffer addressing
      
      * Update blockwise op, split typeA and typeB
      
      * Update for compatibility
      
      * Update naming to f8->fp8
      
      * Update naming
      
      * Format
  12. 21 Sep, 2023 1 commit
    • Refactoring cmake files to build data types separately. (#932) · bba085d2
      Illia Silin authored
      * refactor cmake files for the tests
      
      * refactor cmake files for examples
      
      * fix cmake for gemm example
      
      * fix the cmake file for all examples
      
      * add splitting by data types in gemm_splitk instance header
      
      * rename test to reflect only dl instances are used
      
      * clean up CI workspace, update cmake for instances
      
      * change the jenkinsfile syntax
      
      * build all instances except DL on gfx11
      
      * move workspace cleanup after stages
      
      * clean up workspace after every stage
      
      * isolate data types in grouped_conv_fwd header
      
      * isolate dl instances for grouped_conv2d_fwd
      
      * fix syntax
      
      * fix cmake and batchnorm instances
      
      * fix typo
      
      * fix reduction instances
      
      * fix grouped_conv headers
      
      * fix syntax
      
      * replace parsing logic for instances, replace bfp16 with bf16
      
      * fix the client examples build
      
      * clean up DTYPES from instances cmake files
      
      * update the parsing logic in cmake files
      
      * make an exception for reduction kernels
      
      * update few remaining cmake files to handle DTYPES
      
      * fix syntax
      
      * fix cmake conflicts
      
      * replace f8 with fp8 test name
      
      * resolve conflicts for dpp instances
  13. 12 Sep, 2023 1 commit
    • Refactor f8_t, add bf8_t (#792) · 62d4af74
      Rostyslav Geyyer authored
      * Refactor f8_t to add bf8_t
      
      * Add check_err impl for f8_t
      
      * Update fp8 test
      
      * Format
      
      * Revert the fix
      
      * Update vector_type implementation
      
      * Add bf8 test
      
      * Add bf8, use BitInt types
      
      * Add bf8 conversion methods
      
      * Update type_convert for fp8/bf8
      
      * Add check_err fp8/bf8 support
      
      * Add subnorm fp8 tests
      
      * Add subnorm bf8 tests
      
      * Fix conversion
      
      * Add bf8 cmake bindings
      
      * Add macros to enable build with disabled fp8/bf8
      
      * Remove is_native method
      
      * Update flag combination for mixed precision instances
      
      * Add more flag checks
      
      * Add another flag to a client example
      
      * Add type traits, decouple f8/bf8 casting
      
      * Clean up
      
      * Decouple fp8 and bf8 flags
      
      * Remove more redundant flags
      
      * Remove leftover comments
  14. 06 Sep, 2023 1 commit
    • Redesign the DPP8 GEMM kernel to use warp-wise component (#863) · 37a8c1f7
      Bartlomiej Wroblewski authored
      * Redesign the DPP8 GEMM kernel to use warp-wise component
      
      * Review: Improve error messages
      
      * Review: Remove unnecessary empty lines
      
      * Review: Fix M, N per thread names
      
      * Review: Rename mfma_input_type to dpp_input_type
      
      * Review: Fix tensor adaptor; remove unnecessary element
      
      * Review: Remove calls to dpp_gemm's MakeCDescriptor
      
      * Review: Add blockwise doc, change function names to include dimension names
      
      * Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
      
      * Review: Add __restrict__ keywords
      
      * Review: Use MatrixPadder for padding A, B, C matrices
      
      * Review: Remove hardcoded datatypes
      
      * Review: Change names from FloatX to XDataType
      
      * Review: Introduce AK0 and BK0 instead of a single K0
      
      * Review: Remove construction of dpp_datatypes object
      
      * Review: Rename DppInstrRunner to DppLanegroupGemm
  15. 29 Aug, 2023 1 commit
  16. 14 Aug, 2023 1 commit
  17. 09 Aug, 2023 2 commits
  18. 07 Aug, 2023 1 commit
    • Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769
      Illia Silin authored
      * properly split conv_nd_bwd_data instances
      
      * split conv2d_fwd instance data types
      
      * split the gemm, conv2d_fwd and batched_gemm_softmax_gemm
      
      * split the tests by data types where possible
      
      * filter examples by DTYPES
      
      * split few remaining examples by DTYPES
      
      * filter most instances by DTYPES
      
      * add new lines at end of headers, fix grouped_gemm profiler
      
      * fix syntax
      
      * split the ckprofiler instances by DTYPES
      
      * split the conv2d and quantization DL and XDL instances
      
      * fix the splitting of conv2d DL instances
      
      * split softmax and pool_fwd tests for fp16 and fp32 types
      
      * fix syntax
      
      * fix the dl_int8 quantization instances isolation
  19. 26 Jul, 2023 2 commits
    • initial stream-k implementation with example (#699) · e7dca79d
      carlushuang authored
      * initial stream-k implementation with example
      
      * fix unexpected change in err
      
      * improve performance a little bit by reorganizing pipeline.
      
      * improve perf a little bit by swizzling block idx
      
      * add profiler
      
      * update example
      
      * fix spelling
      
      * shrink karg for streamk
      
      * support dynamic buffer using memory coherence glc_slc bit from template
      
      * control memory coherence while constructing dynamic buffer
      
      * update reduction for streamk (not ready yet)
      
      * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
      
      * fix build issue
      
      * fix several bugs
      
      * now result is correct, everything works (but has scratch)
      
      * remove scratch by manually resetting coordinate
      
      * update device code
      
      * fix a bug in final reduce
      
      * fix something in example
      
      * update async memset
      
      * fix enum as camel case
      
      * modify coherence enum name
      
      * clean code and use atomic streamk by default
      
      * remove unused var
      
      * throw exception if there is an empty pointer
      
      * fix format
      
      * fix CI warning
      
      * fix type in init
      
      * modify CI error
      
      * filter out on gfx10+
      
      * restore changed example code
      
      ---------
      Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
    • Disable XDL kernels on unsupported HW; add ck::is_xdl_supported (#768) · ac6d68b3
      Bartłomiej Kocot authored
      * Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765)
      
      * Do not throw an error when GEMM problem is not supported.
      
      ---------
      Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
  20. 18 Jul, 2023 1 commit
    • Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9
      Illia Silin authored
      * allow building CK for specific data types
      
      * add CI build and test stage on Navi3x without some int8 instances
      
      * add missing gemm fp16 instances
      
      * add the changes to the missed cmake file
      
      * add empty lines at end of source files
      
      * Do not build quantization client example on navi3 in CI
      
      * disable batched_gemm_multi_d_int8 instances with DTYPES
      
      * disable device_conv2d_bwd_data_instance with DTYPES
      
      * fix ckprofiler for conv_bwd_data for int8
      
      * properly isolate the conv_bwd_data int8 instances
      
      * remove empty line
  21. 06 Jul, 2023 1 commit
  22. 05 Jul, 2023 1 commit
  23. 31 May, 2023 1 commit
  24. 19 May, 2023 1 commit
  25. 18 May, 2023 1 commit
  26. 10 May, 2023 1 commit
  27. 27 Apr, 2023 1 commit
  28. 20 Apr, 2023 1 commit
  29. 29 Mar, 2023 1 commit
  30. 23 Mar, 2023 2 commits
  31. 09 Mar, 2023 1 commit
  32. 06 Mar, 2023 2 commits
  33. 01 Mar, 2023 1 commit
  34. 28 Feb, 2023 1 commit
  35. 27 Feb, 2023 1 commit