1. 09 Feb, 2025 2 commits
  2. 08 Feb, 2025 2 commits
  3. 06 Feb, 2025 1 commit
  4. 04 Feb, 2025 4 commits
  5. 03 Feb, 2025 5 commits
  6. 02 Feb, 2025 3 commits
  7. 31 Jan, 2025 3 commits
  8. 30 Jan, 2025 3 commits
  9. 29 Jan, 2025 1 commit
  10. 26 Jan, 2025 6 commits
  11. 24 Jan, 2025 4 commits
  12. 22 Jan, 2025 1 commit
  13. 13 Jan, 2025 5 commits
    • fix parsing instances for pt inductor (#1796) · c0b90f13
      Max Podkorytov authored
      
      
      add unit test for gen instances for gemms
      
      add unit tests for conv and batched gemms
      
      add unit test for preselected gemm instances
      
      apply ruff lint
      
      add license header for the unit test
      
      add inductor pytest to CI
      
      verbose pip install
      
      switch the directory before installing python packages
      
      move the inductor codegen test
      
      try yet another workdir
      
      Update Jenkinsfile
      
      The directory looks right, fixing pip module not found by invoking pip directly
      
      Update Jenkinsfile
      
      invoke pytest directly since the module is not found
      
      Update Dockerfile
      
      Install setuptools
      
      update package structure
      
      bump setuptools
      
      maybe fix data path for library sources
      
      fix library search path for conv instances
      
      fix path in pyproject definition
      
      compare path used in gen_instances with one in pyproject.toml; fix the difference
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
    • Dev/merge u8w8 (#1774) · 53ab1b90
      feli authored
      
      
      * port tiles from a8w8
      
      * remove files used for debugging
      
      * add instances
      
      * remove all non gemm in cmake
      
      * merge; impl fp16
      
      * recover cmake from develop
      
      * add missed files; fix clang format
      
      ---------
      Co-authored-by: coderfeli <coderfeli@163.com>
    • CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f
      Thomas Ning authored
      * refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
      
      * Finished the 2x2 warp gemm policy and the block selection mechanism
      
      * Clang format
      
      * address poyen's comment
      
      * Address feedback
      
      * Fixed the compilation issue
      
      * Change the function name
    • [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779) · 0b8f117f
      ClementLinCF authored
      * Observed a 2x perf improvement with kBlockSize = 256
      * Using 512 threads may lead to redundant computations
    • Update for fmha_fwd qs_ks_vs pipeline (#1810) · 3d50f57f
      Qianfeng authored
      
      
      * Update for fmha_fwd qs_ks_vs pipeline
      
      * Remove __builtin_amdgcn_sched_barrier(0)
      
      * Move the p_compute-to-p conversion earlier to try to increase VGPR reuse
      
      * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
      
      * Re-add __builtin_amdgcn_sched_barrier(0)
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>