Commits · 7248a810d97ca8ceb999cc0a9e2bf58adc68f263 · OpenDAS / tilelang

18 Dec, 2025 1 commit

feat(cutedsl): add CuTeDSL backend (#1421) · 7248a810

Gabriel Wu authored Dec 18, 2025



* feat: CuTeDSL backend

* fix: clang-tidy

* fix: clang-format

* fix: ci

* fix: revert example gemm fp8

* fix: remove duplicate code

* fix: switch-case

* fix: fp16 silence

* fix: TVM IR print

* fix: useless tir

* fix: clang-format

* fix: remove tilelang/contrib/cutedsl/.gitignore

* fix: use hexfloat

* fix: gsym guard

* fix: unknown storage sync type

* fix: string literal

* fix: add args guard

* fix: name hint dedup

* fix: better find_kernel_by_pattern

* fix: set libpath for from_database path

* fix: guard buffer.strides

* fix: from guard

* fix: eviction guard

* fix: use thread local tma descs

* fix: ruff

* fix: drop tma_init_cpp

* fix: exc_info

* fix: negative unmatch early return

* fix: rename postproc func and add test

* fix: handle fast math according to pass config

* fix: dyn_sym parse

* fix: wrap_forward

* fix: use tvm_ffi.libinfo instead of cli

* fix: keep signature

* fix: C++ string safety

* fix: mark tma_store_add as unsupported

* fix: tvm version

* resolve ldsm and cpasync issues.

* fix: minor fixes

* fix: parse signature using ast

* fix: guard global_addr

* fix: create tempfile only when necessary

* fix: use logger.exception for exceptions

* fix: guard lib_path and host_func

* fix: remove tma_cpp_init and add timeout for cpp compile

* add timeout for mbarrier_wait.

* fix: _load_kernel_from_disk signature

* resolve codegen issues.

* fix: logger.exception

* add comment for div_by=1

* merge

* fix: reserve cutlass,cute,tl

* fix: guard tma_store

* fix: allow int64 offset in make_tensor_at_offset

* fix: guard barrier

* fix: add comments for div_by=16

* fix: div_by=1 issue

* delete div_by when offset is 0

* use tl.make_tensor when offset is 0

* fix: explicitly check cutedsl target

* fix: use param.torch_dtype()

---------
Co-authored-by: yuxic <yuxic@nvidia.com>
Co-authored-by: Yong <yong@local>
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>


15 Nov, 2025 1 commit

[fix] NVRTC execution backend (#1256) · eb415744

Gabriel Wu authored Nov 15, 2025

* [fix] NVRTC execution backend

* [fmt] run pre-commit

* [fix] coderabbit reviews

* [test] add cuda-python to test dep

* [fix] coderabbit reviews

* [fix] CUDA 13 compatibility

* [fix] sm90

* [fix] CUDA 13 compatibility

* [fix] pre-commit

* [fix] always use cuda::std::__atomic_ref_impl

* [fix] restore to external API

* Revert "[fix] restore to external API"

This reverts commit 49bd875638fb631d270015f408991d38fd1e9a5d.

* [fmt] use spaces instead of tabs for py codegen

* [fix] im2col API

* [fix] revert atomic.h

* [fix] dynamic shape

* [refactor] extract common utils

* [feat] support L2 persistent map

* [fix] l2 persistent map

* [fix] pre-commit

* [fix] restore _TYPE_MAP

* [fix] pre-commit

* [fix] avoid duplicate TMA descs

* [docs] add docstring

* [fix] coderabbit

* [fix] coderabbit

* [fix] coderabbit

* [fix] coderabbit


15 Oct, 2025 1 commit

[CI][Refactor] Merge test CI workflow files into one (#973) · 8ce27782

Xuehai Pan authored Oct 15, 2025

* refactor: merge test CI workflow files into one

* chore: set `UV_INDEX_STRATEGY=unsafe-best-match`

* feat: add AST test with Python 3.8

* feat: implement manual caching mechanism for self-hosted runners

* refactor: simplify cache logic for self-hosted runners

* chore: clear uv cache on failure

* chore: print format.sh output to logs

* chore: improve uv caching

* chore: disable parallel test

* chore: use `PYTHONDEVMODE=1` in CI

* feat: enable coredump generation

* fix: fix perfbench condition

* Revert "feat: enable coredump generation"

This reverts commit c52da65cb572932e09905d08c43a39ec3cf47c54.

* chore: move example CI down

* Revert "chore: move example CI down"

This reverts commit 9d8e65055e01d955c5268a9a6705d270c2de0d57.

* chore: skip example `test_example_mha_sink_bwd_bhsd`

* chore: skip example `test_example_gqa_sink_bwd_bhsd`

* fix: fix example argument passing

* fix: loosen test criteria

* chore: rename `CMAKE_CONFIG...
