  1. 07 Nov, 2024 1 commit
  2. 05 Nov, 2024 1 commit
  3. 30 Oct, 2024 1 commit
  4. 29 Oct, 2024 1 commit
  5. 26 Oct, 2024 2 commits
  6. 25 Oct, 2024 1 commit
    • Generic threshold calculation (#1546) · 9385caa3
      aledudek authored
      * Calculate generic relative threshold pool3dfwd
      
      * Calculate absolute error threshold pool3d fwd
      
      * Generic threshold calculation take max input for relative error pool3dfwd
      
      * Remove max possible value for error calculation at runtime
      
      * Remove debug print in pool3dfwd
      
      * Pool3d fwd adjusted types in generic threshold calculation
      
      * Generic threshold calculation take into account number of accumulations and accdatatype
      
      * Generic threshold fix final error formula
      
      * Generic threshold calculation - num of accs fix
      
      * Generic threshold calculation - adjust absolute error
      
      * Generic threshold calculation - OutDataType in absolute error
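The commit above mentions a relative threshold that depends on the number of accumulations and the accumulator data type, plus an absolute threshold tied to the output type. The sketch below illustrates that idea only; it is not CK's actual formula. The function names, the mantissa table, and the first-order error model (one output rounding plus one rounding per accumulation) are all assumptions made for illustration.

```python
# Explicit mantissa bits for a few floating-point formats (illustrative table).
MANTISSA_BITS = {"fp32": 23, "fp16": 10, "bf16": 7}


def unit_roundoff(dtype: str) -> float:
    """Worst-case relative error of a single round-to-nearest operation."""
    return 2.0 ** -(MANTISSA_BITS[dtype] + 1)


def relative_threshold(out_dtype: str, acc_dtype: str, num_accumulations: int) -> float:
    """Assumed first-order bound: one rounding to the output type plus
    one rounding in the accumulator type per accumulation."""
    return unit_roundoff(out_dtype) + num_accumulations * unit_roundoff(acc_dtype)


def absolute_threshold(max_abs_value: float, out_dtype: str, acc_dtype: str,
                       num_accumulations: int) -> float:
    """Scale the relative bound by the largest magnitude expected in the result."""
    return max_abs_value * relative_threshold(out_dtype, acc_dtype, num_accumulations)


if __name__ == "__main__":
    # Hypothetical pooling window of 3x3x3 = 27 accumulations, fp16 output, fp32 accumulator.
    print(relative_threshold("fp16", "fp32", 27))
    print(absolute_threshold(8.0, "fp16", "fp32", 27))
```

Under this model the threshold grows linearly with the accumulation count, so a larger pooling window tolerates a proportionally larger error.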
  7. 22 Oct, 2024 1 commit
  8. 14 Oct, 2024 2 commits
  9. 12 Oct, 2024 1 commit
  10. 09 Oct, 2024 1 commit
  11. 07 Oct, 2024 1 commit
  12. 04 Oct, 2024 1 commit
  13. 02 Oct, 2024 1 commit
  14. 25 Sep, 2024 1 commit
  15. 20 Sep, 2024 2 commits
  16. 13 Sep, 2024 1 commit
    • Customize filesystem in CK for legacy systems (#1509) · 81bc1496
      Jun Liu authored
      
      
      * Legacy support: customized filesystem
      
      * Update cmakefile for python alternative path
      
      * fix build issues
      
      * CK has no boost dependency
      
      * More fixes to issues found on legacy systems
      
      * fix clang format issue
      
      * Check if blob is correctly generated in cmake
      
      * fix the python issues
      
      * add a compiler flag for codegen when using alternative python
      
      * use target_link_options instead of target_compile_options
      
      ---------
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  17. 12 Sep, 2024 1 commit
  18. 11 Sep, 2024 2 commits
  19. 05 Sep, 2024 2 commits
  20. 03 Sep, 2024 1 commit
  21. 02 Sep, 2024 1 commit
  22. 21 Aug, 2024 2 commits
    • Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473) · c3515f27
      Andriy Roshchenko authored
      * Enable CMakePresets build
      
      * Verify Convolution, Scaling and ReLU algorithms.
      
      * Add tensor element-wise scale and type cast operation.
      
      * Reduction implemented but does not work.
      
      * Exploration of Reduction functionality.
      
      * Completed example for Convolution scaled with ReLU activation and AMAX reduction.
      
      * WIP: Add required instances for convolution.
      
      * WIP: Create client example. Implement convolution stage.
      
      * Add elementwise instances.
      
      * Add elementwise scale + convert example.
      
      * Add reduction instances.
      
      * WIP: Client example for AMAX reduction.
      
      * WIP: Add instances for multistage reduction.
      
      * WIP: Implementation of multistage reduction.
      
      * Refactoring.
      
      * Clean up.
      
      * Add CMakePresets.json
      
      * Guard off FP8 instances when the data type is not available.
      
      * Add example for Scaled FP8 Convolution with AMAX reduction.
      
      * Refactor CombConvScaleRelu instances.
      
      * Add CombConvScale instances.
      
      * Add client example for Scaled FP8 Convolution with AMAX reduction.
      
      * Cleanup.
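The commit above combines an element-wise scale, a ReLU activation, and an AMAX (maximum absolute value) reduction, including a multistage variant. The sketch below shows what those operations compute on a flat list; it is a plain-Python illustration, not the CK kernels or their API. All function names and the `chunk_size` parameter are hypothetical.

```python
def scale_relu(values, scale):
    """Multiply each element by `scale`, then clamp negatives to zero (ReLU)."""
    return [max(0.0, v * scale) for v in values]


def amax(values):
    """AMAX reduction: the largest absolute value, or 0.0 for an empty input."""
    return max((abs(v) for v in values), default=0.0)


def amax_multistage(values, chunk_size=4):
    """Two-stage AMAX: reduce each chunk to a partial result, then reduce the partials.
    Because max is associative, this matches the single-pass result."""
    partials = [amax(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return amax(partials)


if __name__ == "__main__":
    data = [-3.0, 1.0, 2.0, -7.0, 5.0]
    print(amax(data), amax_multistage(data))
    print(amax(scale_relu(data, 0.5)))
```

Splitting the reduction into stages does not change the result; it only changes how the work could be divided among workers.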
    • Set RNE fp8 conversion as a default (#1458) · e20f20ef
      Rostyslav Geyyer authored
      * Set RNE fp8 conversion as a default
      
      * Update f8 tests
      
      * Disable failing test on gfx11
      
      * Update bf8 tests
      
      * Add a flag
      
      * Fix the flag
      
      * Raise flag for gfx10 as well
      
      * Temp commit for tolerance testing
      
      * Update tolerances
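RNE stands for round-to-nearest-even: a value is rounded to the closest representable number, and exact ties go to the neighbor whose last mantissa bit is zero. The sketch below applies that rule to a reduced mantissa width. It assumes 3 explicit mantissa bits (as in an E4M3-style fp8 format) and deliberately ignores exponent range, subnormals, and saturation, so it is not the CK conversion routine. The function name is hypothetical.

```python
import math


def round_to_mantissa_rne(x: float, mantissa_bits: int = 3) -> float:
    """Round x to `mantissa_bits` explicit mantissa bits with round-to-nearest-even.
    Exponent limits, subnormals, and overflow handling are intentionally omitted."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)               # x == m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (mantissa_bits + 1)   # implicit leading bit plus explicit bits
    # Python's round() sends exact halves to the even integer, which is RNE here.
    return math.ldexp(round(m * steps) / steps, e)


if __name__ == "__main__":
    for value in (145.0, 97.0, 65.0, 136.0, 152.0):
        print(value, round_to_mantissa_rne(value))
```

With 3 mantissa bits the representable values between 128 and 256 are 16 apart, so 136 and 152 are exact ties: 136 rounds down to 128 and 152 rounds up to 160, each landing on the neighbor with an even mantissa.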
  23. 14 Aug, 2024 1 commit
    • [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 example if arch not support
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  24. 13 Aug, 2024 1 commit
  25. 10 Aug, 2024 1 commit
  26. 09 Aug, 2024 1 commit
  27. 07 Aug, 2024 1 commit
  28. 06 Aug, 2024 3 commits
  29. 31 Jul, 2024 1 commit
  30. 30 Jul, 2024 1 commit
  31. 25 Jul, 2024 1 commit
  32. 24 Jul, 2024 1 commit
    • Adding more instances of grouped convolution 3d forward for FP8 with... · 4a8a1bef
      Andriy Roshchenko authored
      Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)
      
      * Add CMakePresets configurations.
      
      * Add binary elementwise ConvScaleAdd and an example.
      
      * Numerical verification of results.
      
      Observed significant irregularities in F8 to F32 type conversions:
      ```log
      ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
      ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
      ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
      ```
      
      * Implemented ConvScaleAdd + Example.
      
      * Add ConvScale+Bias Instances
      
      * Add Client Example for ConvScale+Bias
      
      * Fix number of bytes in an example.
      
      * Cleanup.