1. 18 Dec, 2025 1 commit
    • feat(cutedsl): add CuTeDSL backend (#1421) · 7248a810
      Gabriel Wu authored
      
      
      * feat: CuTeDSL backend
      
      * fix: clang-tidy
      
      * fix: clang-format
      
      * fix: ci
      
      * fix: revert example gemm fp8
      
      * fix: remove duplicate code
      
      * fix: switch-case
      
      * fix: fp16 silence
      
      * fix: TVM IR print
      
      * fix: useless tir
      
      * fix: clang-format
      
      * fix: remove tilelang/contrib/cutedsl/.gitignore
      
      * fix: use hexfloat
      
      * fix: gsym guard
      
      * fix: unknown storage sync type
      
      * fix: string literal
      
      * fix: add args guard
      
      * fix: name hint dedup
      
      * fix: better find_kernel_by_pattern
      
      * fix: set libpath for from_database path
      
      * fix: guard buffer.strides
      
      * fix: from guard
      
      * fix: eviction guard
      
      * fix: use thread local tma descs
      
      * fix: ruff
      
      * fix: drop tma_init_cpp
      
      * fix: exc_info
      
      * fix: negative unmatch early return
      
      * fix: rename postproc func and add test
      
      * fix: handle fast math according to pass config
      
      * fix: dyn_sym parse
      
      * fix: wrap_forward
      
      * fix: use tvm_ffi.libinfo instead of cli
      
      * fix: keep signature
      
      * fix: C++ string safety
      
      * fix: mark tma_store_add as unsupported
      
      * fix: tvm version
      
      * resolve ldsm and cpasync issues.
      
      * fix: minor fixes
      
      * fix: parse signature using ast
      
      * fix: guard global_addr
      
      * fix: create tempfile only when necessary
      
      * fix: use logger.exception for exceptions
      
      * fix: guard lib_path and host_func
      
      * fix: remove tma_cpp_init and add timeout for cpp compile
      
      * add timeout for mbarrier_wait.
      
      * fix: _load_kernel_from_disk signature
      
      * resolve codegen issues.
      
      * fix: logger.exception
      
      * add comment for div_by=1
      
      * merge
      
      * fix: reserve cutlass,cute,tl
      
      * fix: guard tma_store
      
      * fix: allow int64 offset in make_tensor_at_offset
      
      * fix: guard barrier
      
      * fix: add comments for div_by=16
      
      * fix: div_by=1 issue
      
      * delete div_by when offset is 0
      
      * use tl.make_tensor when offset is 0
      
      * fix: explicitly check cutedsl target
      
      * fix: use param.torch_dtype()
      
      ---------
      Co-authored-by: yuxic <yuxic@nvidia.com>
      Co-authored-by: Yong <yong@local>
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
  2. 15 Nov, 2025 1 commit
    • [fix] NVRTC execution backend (#1256) · eb415744
      Gabriel Wu authored
      * [fix] NVRTC execution backend
      
      * [fmt] run pre-commit
      
      * [fix] coderabbit reviews
      
      * [test] add cuda-python to test dep
      
      * [fix] coderabbit reviews
      
      * [fix] CUDA 13 compatibility
      
      * [fix] sm90
      
      * [fix] CUDA 13 compatibility
      
      * [fix] pre-commit
      
      * [fix] always use cuda::std::__atomic_ref_impl
      
      * [fix] restore to external API
      
      * Revert "[fix] restore to external API"
      
      This reverts commit 49bd875638fb631d270015f408991d38fd1e9a5d.
      
      * [fmt] use spaces instead of tabs for py codegen
      
      * [fix] im2col API
      
      * [fix] revert atomic.h
      
      * [fix] dynamic shape
      
      * [refactor] extract common utils
      
      * [feat] support L2 persistent map
      
      * [fix] l2 persistent map
      
      * [fix] pre-commit
      
      * [fix] restore _TYPE_MAP
      
      * [fix] pre-commit
      
      * [fix] avoid duplicate TMA descs
      
      * [docs] add docstring
      
      * [fix] coderabbit
      
      * [fix] coderabbit
      
      * [fix] coderabbit
      
      * [fix] coderabbit
  3. 15 Oct, 2025 1 commit
    • [CI][Refactor] Merge test CI workflow files into one (#973) · 8ce27782
      Xuehai Pan authored
      * refactor: merge test CI workflow files into one
      
      * chore: set `UV_INDEX_STRATEGY=unsafe-best-match`
      
      * feat: add AST test with Python 3.8
      
      * feat: implement manual caching mechanism for self-hosted runners
      
      * refactor: simplify cache logic for self-hosted runners
      
      * chore: clear uv cache on failure
      
      * chore: print format.sh output to logs
      
      * chore: improve uv caching
      
      * chore: disable parallel test
      
      * chore: use `PYTHONDEVMODE=1` in CI
      
      * feat: enable coredump generation
      
      * fix: fix perfbench condition
      
      * Revert "feat: enable coredump generation"
      
      This reverts commit c52da65cb572932e09905d08c43a39ec3cf47c54.
      
      * chore: move example CI down
      
      * Revert "chore: move example CI down"
      
      This reverts commit 9d8e65055e01d955c5268a9a6705d270c2de0d57.
      
      * chore: skip example `test_example_mha_sink_bwd_bhsd`
      
      * chore: skip example `test_example_gqa_sink_bwd_bhsd`
      
      * fix: fix example argument passing
      
      * fix: loosen test criteria
      
      * chore: rename `CMAKE_CONFIG...