- 15 Nov, 2025 1 commit
-
-
Gabriel Wu authored
* [fix] NVRTC execution backend * [fmt] run pre-commit * [fix] coderabbit reviews * [test] add cuda-python to test dep * [fix] coderabbit reviews * [fix] CUDA 13 compatibility * [fix] sm90 * [fix] CUDA 13 compatibility * [fix] pre-commit * [fix] always use cuda::std::__atomic_ref_impl * [fix] restore to external API * Revert "[fix] restore to external API" This reverts commit 49bd875638fb631d270015f408991d38fd1e9a5d. * [fmt] use space instead tabs for py codegen * [fix] im2col API * [fix] revert atomic.h * [fix] dynamic shape * [refactor] extract common utils * [feat] support L2 persistent map * [fix] l2 persistent map * [fix] pre-commit * [fix] restore _TYPE_MAP * [fix] pre-commit * [fix] avoid duplicate TMA descs * [docs] add docstring * [fix] coderabbit * [fix] coderabbit * [fix] coderabbit * [fix] coderabbit
-
- 05 Jun, 2025 1 commit
-
-
Gabriel Wu authored
* [wip] feat: add nvrtc backend * [wip] fix: handle out_idx * [wip] refactor: move lib logic to libgen * feat: cache for nvrtc backend * fmt: run format * fix: handle cuda bindings import error * fix: handle cuda bindings import error * fix: handle cuda bindings import error * fix: handle cuda bindings import error * fix: get kernel source * refactor: speedup pyimport * Improve error handling for missing cuda-python dependency in nvrtc backend. Raise ImportError with detailed installation instructions instead of logging a warning. * Enhance nvrtc backend error handling by introducing a flag to check for cuda-python availability. Raise ImportError with detailed installation instructions during initialization if the nvrtc backend is unavailable, improving user experience and clarity. * Update README.md to include recent NVRTC Backend addition, highlighting reduced compilation time for CUDA templates. * fix tl_templates * ensure CUDA context --------- Co-authored-by:LeiWang1999 <leiwang1999@outlook.com>
-