Commits · cf6e11c955cfebcb283051084678247ff0e5db7c · OpenDAS / tilelang

18 Dec, 2025 1 commit

feat(cutedsl): add CuTeDSL backend (#1421) · 7248a810

Gabriel Wu authored Dec 18, 2025



* feat: CuTeDSL backend

* fix: clang-tidy

* fix: clang-format

* fix: ci

* fix: revert example gemm fp8

* fix: remove duplicate code

* fix: switch-case

* fix: fp16 silence

* fix: TVM IR print

* fix: useless tir

* fix: clang-format

* fix: remove tilelang/contrib/cutedsl/.gitignore

* fix: use hexfloat

* fix: gsym guard

* fix: unknown storage sync type

* fix: string literal

* fix: add args guard

* fix: name hint dedup

* fix: better find_kernel_by_pattern

* fix: set libpath for from_database path

* fix: guard buffer.strides

* fix: from guard

* fix: eviction guard

* fix: use thread local tma descs

* fix: ruff

* fix: drop tma_init_cpp

* fix: exc_info

* fix: negative unmatch early return

* fix: rename postproc func and add test

* fix: handle fast math according to pass config

* fix: dyn_sym parse

* fix: wrap_forward

* fix: use tvm_ffi.libinfo instead of cli

* fix: keep signature

* fix: C++ string safety

* fix: mark tma_store_add as unsupported

* fix: tvm version

* resolve ldsm and cpasync issues.

* fix: minor fixes

* fix: parse signature using ast

* fix: guard global_addr

* fix: create tempfile only when necessary

* fix: use logger.exception for exceptions

* fix: guard lib_path and host_func

* fix: remove tma_cpp_init and add timeout for cpp compile

* add timeout for mbarrier_wait.

* fix: _load_kernel_from_disk signature

* resolve codegen issues.

* fix: logger.exception

* add comment for div_by=1

* merge

* fix: reserve cutlass,cute,tl

* fix: guard tma_store

* fix: allow int64 offset in make_tensor_at_offset

* fix: guard barrier

* fix: add comments for div_by=16

* fix: div_by=1 issue

* delete div_by when offset is 0

* use tl.make_tensor when offset is 0

* fix: explicitly check cutedsl target

* fix: use param.torch_dtype()

---------
Co-authored-by: yuxic <yuxic@nvidia.com>
Co-authored-by: Yong <yong@local>
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>


15 Nov, 2025 1 commit

[fix] NVRTC execution backend (#1256) · eb415744

Gabriel Wu authored Nov 15, 2025

* [fix] NVRTC execution backend

* [fmt] run pre-commit

* [fix] coderabbit reviews

* [test] add cuda-python to test dep

* [fix] coderabbit reviews

* [fix] CUDA 13 compatibility

* [fix] sm90

* [fix] CUDA 13 compatibility

* [fix] pre-commit

* [fix] always use cuda::std::__atomic_ref_impl

* [fix] restore to external API

* Revert "[fix] restore to external API"

This reverts commit 49bd875638fb631d270015f408991d38fd1e9a5d.

* [fmt] use spaces instead of tabs for py codegen

* [fix] im2col API

* [fix] revert atomic.h

* [fix] dynamic shape

* [refactor] extract common utils

* [feat] support L2 persistent map

* [fix] l2 persistent map

* [fix] pre-commit

* [fix] restore _TYPE_MAP

* [fix] pre-commit

* [fix] avoid duplicate TMA descs

* [docs] add docstring

* [fix] coderabbit

* [fix] coderabbit

* [fix] coderabbit

* [fix] coderabbit


05 Jun, 2025 1 commit

[Enhancement] Add nvrtc execution backend (#461) · 17f7394f

Gabriel Wu authored Jun 05, 2025



* [wip] feat: add nvrtc backend

* [wip] fix: handle out_idx

* [wip] refactor: move lib logic to libgen

* feat: cache for nvrtc backend

* fmt: run format

* fix: handle cuda bindings import error

* fix: handle cuda bindings import error

* fix: handle cuda bindings import error

* fix: handle cuda bindings import error

* fix: get kernel source

* refactor: speedup pyimport

* Improve error handling for missing cuda-python dependency in nvrtc backend. Raise ImportError with detailed installation instructions instead of logging a warning.

* Enhance nvrtc backend error handling by introducing a flag to check for cuda-python availability. Raise ImportError with detailed installation instructions during initialization if the nvrtc backend is unavailable, improving user experience and clarity.

* Update README.md to include recent NVRTC Backend addition, highlighting reduced compilation time for CUDA templates.

* fix tl_templates

* ensure CUDA context

---------
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
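
The availability-flag pattern described in the commit message above (check for cuda-python once, then raise ImportError with installation instructions during initialization) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `is_module_available`, `OptionalBackend`, and the hint text are hypothetical names chosen for the example, and the only assumption about cuda-python is that it is imported as the top-level `cuda` package.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


class OptionalBackend:
    """Hypothetical backend that needs an optional dependency at init time."""

    def __init__(self, module_name: str = "cuda",
                 install_hint: str = "pip install cuda-python"):
        # Fail early with an actionable message instead of logging a warning
        # and crashing later at first use.
        if not is_module_available(module_name):
            raise ImportError(
                f"The '{module_name}' module is required for this backend "
                f"but was not found. Install it with: {install_hint}"
            )
        self.module_name = module_name
```

Checking availability with `importlib.util.find_spec` avoids importing the dependency at module load, so the package stays importable when the optional dependency is absent and the error only appears when the backend is actually constructed.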
