1. 31 Jan, 2025 4 commits
  2. 30 Jan, 2025 1 commit
  3. 29 Jan, 2025 13 commits
  4. 28 Jan, 2025 1 commit
  5. 27 Jan, 2025 2 commits
    • Add OCP FP8 support in CK_TILE (#1829) · 35aebe59
      Andriy Roshchenko authored
      * Add OCP FP8 to CK_TILE
      
      * Validate OCP FP8 in FMHA FWD under VALID=1
    • [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835) · 39dc25a9
      Adam Osewski authored
      * Refactor universal gemm policy.
      
      * Adapt example to refactor changes.
      
      * Introduce static encoding pattern
      
      * Adding shuffled encoding patterns.
      
      * Fix err in reverse tuple.
      
      * Add transpose_tile2d
      
      * Small refactoring + doc
      
      * Enable reading on contiguous dimension in all layouts.
      
      * Transpose A/B register tile if needed for comp v3 pipeline.
      
      * Take contiguous dim size when calculating dram vector load size.
      
      * A/B smem pack size taken from WarpGemm attributes
      
      * Update B LDS layout and setup tile distribution pattern at class level.
      
      * Fix static assert.
      
      * Fix errors in examples.
      
      * Formatting & fix IsTranspose
      
      * Fix VectorSize & refactor.
      
      * Add error logging messages.
      
      * Fix VecLoadSize and TransposeC for mem pipeline.
      
      * Update unit-tests & disable mem pipeline.
      
      * Clang format
      
      * Update include/ck_tile/core/tensor/tile_window.hpp
      Co-authored-by: jakpiase <jakub.piasecki@amd.com>
      
      * Fix compilation and reviewers comments.
      
      * Refactor unit-test. Fallback to non-universal gemm.
      
      Need to use GemmPipelineAGmemBGmemCRegV1 for now,
      since GemmKernel now also supports non-K-major vector reads.
      
      ---------
      Co-authored-by: jakpiase <jakub.piasecki@amd.com>
  6. 24 Jan, 2025 2 commits
  7. 22 Jan, 2025 3 commits
  8. 21 Jan, 2025 2 commits
    • Simplify static_cast if-lands (#1828) · 3db77bc4
      Mateusz Ozga authored
    • CK-Tile Grouped GEMM refactor and post PR fixes (#1756) · 3c93d3c4
      Mateusz Ozga authored
      * Grouped gemm simple code refactor
      
      * Offset invoker
      
      * Invoke generic Run, and replace name of partitioner variable
      
      * Tests fix type
      
      * Removed namespaces
      
      * Add template param to avoid implicit cast
      
      * Remove generic function
      
      * Constant value
      
      * underline enum to int16_t
      
      * Generalize partitioner function
      
      * Remove whitespaces
      
      * Rename function
      
      * Using support
      
      * Clang-format
      
      * Clang-format
      
      * Fn-partitioner description fn
      
      * Typo
      
      * Typo 2
      
      * Better description
      
      * Better description
      
      * Refactor after review
      
      * Use ctr instead of set fn
      
      * Invoke ctr and typo
      
      * Comments
      
      * Remove unnecessary comment
      
      * Review, remove modulo
  9. 20 Jan, 2025 2 commits
  10. 19 Jan, 2025 1 commit
  11. 18 Jan, 2025 1 commit
  12. 17 Jan, 2025 2 commits
  13. 16 Jan, 2025 2 commits
  14. 15 Jan, 2025 3 commits
    • Illia Silin
      8c29e06f
    • Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3
      Bartłomiej Kocot authored
      * Add rounding for float to bf16 conversion
      
      * Add bhalf test
      
      * Add inf test bhalf
      
      * Refactor
      
      * update cmake
      
      * Fixes
    • [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
      ruanjm authored
      * Add shortcut to RMSNorm
      
      * Modify test for adding shortcut for RMSNorm
      
      * Add fused parameter into tests
      
      * 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
      
      * Support various strides and precisions.
      
      * Add support of Epilogue
      
      * Add fuse and epilogue support to rmsnorm ref
      
      * Modify rmsnorm example
      
      * Refactor tests/examples
      
      * Bug fix for newly added tests/examples
      
      * Bug fix for new tests 2
      
      * Modify smoke test scripts
      
      remove dbg code
      
      * Support non-smooth dynamic quant
      
      * Update Rmsnorm2dFwd::GetName()
      
      * rename xscale and prec_sx to smoothscale and prec_sm
      
      Bug fix after rename
      
      Remove files
      
      * change example_rmsnorm2d_fwd.cpp
      
      * update performance calculator
      
      * Fix issue in two-pass when fuse add is enabled
      
      * Remove comment of beta
      
      ---------
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
  15. 13 Jan, 2025 1 commit
    • fix parsing instances for pt inductor (#1796) · c0b90f13
      Max Podkorytov authored
      add unit test for gen instances for gemms
      
      add unit tests for conv and batched gemms
      
      add unit test for preselected gemm instances
      
      apply ruff lint
      
      add license header for the unit test
      
      add inductor pytest to CI
      
      verbose pip install
      
      switch the directory before installing python packages
      
      move the inductor codegen test
      
      try yet another workdir
      
      Update Jenkinsfile
      
      The directory looks right, fixing pip module not found by invoking pip directly
      
      Update Jenkinsfile
      
      invoke pytest directly since the module is not found
      
      Update Dockerfile
      
      Install setuptools
      
      update package structure
      
      bump setuptools
      
      maybe fix data path for library sources
      
      fix library search path for conv instances
      
      fix path in pyproject definition
      
      compare path used in gen_instances with one in pyproject.toml; fix the difference
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>