[Refactor] Phaseout LLVM Dependency by Making it Optional (#247)

* remove llvm build
* [Refactor] Update kernel compilation and profiling in examples
  - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
  - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
  - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
* lint fix
* License Update
* [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
  - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
  - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
* [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
  - Improved comment alignment and readability in `cuda.h`.
  - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.
* lint fix
* lint fix
* lint fix
* lint fix
* fix
* License update
* [Enhancement] Update JITKernel to use artifact for kernel source
  - Assigned the generated artifact to `self.artifact` for better management.
  - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.
* lint fix
* Add @tilelang.testing.requires_llvm decorator to vectorization tests
* Enhance setup.py and env.py for library management
  - Added functionality to remove original files after copying in CMakeBuild.
  - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
* Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py
* Refactor CMakeBuild file handling in setup.py
  - Added a check to ensure the target library directory exists before copying .so files.
  - Improved the logic for creating the target directory and copying files to enhance robustness.
* bugfix
* Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.
* lint fix
* Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
* lint fix
* Add support for C target in device code generation
  - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
* [Enhancement] Implement auto-clear cache feature based on environment variable
  * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
  * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
  * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
* [Refactor] Update kernel invocation and import paths in tests and cache
  * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
  * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
  * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.
* [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py
  * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
  * Enhanced overall code formatting to align with project standards.
* [Enhancement] Add bfloat16 test case and improve kernel caching logic
  * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
  * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
  * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
  * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
  * Improved code formatting and readability across several files.
* lint fix
* Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
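
The auto-clear cache item in the message above can be pictured with a small sketch. This is not the code from this commit: the helper names `should_clear_cache` and `init_cache_dir`, the set of accepted truthy strings, and the directory layout are all assumptions made for illustration. Only the variable name `TILELANG_CLEAR_CACHE` and the "clear on initialization when set to true" behavior come from the commit message.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Assumption: which strings count as "true" is a guess, not taken from the commit.
TRUTHY = {"1", "true", "yes", "on"}


def should_clear_cache(environ=None) -> bool:
    """Return True when TILELANG_CLEAR_CACHE is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("TILELANG_CLEAR_CACHE", "0").strip().lower() in TRUTHY


def init_cache_dir(cache_dir, environ=None) -> Path:
    """Hypothetical cache initialization: wipe the directory first if requested."""
    cache_dir = Path(cache_dir)
    if should_clear_cache(environ) and cache_dir.exists():
        shutil.rmtree(cache_dir)  # drop every cached kernel artifact
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        cache.mkdir()
        (cache / "kernel.cu").write_text("// cached kernel source")

        init_cache_dir(cache, {"TILELANG_CLEAR_CACHE": "0"})
        print("kept:", (cache / "kernel.cu").exists())

        init_cache_dir(cache, {"TILELANG_CLEAR_CACHE": "true"})
        print("kept:", (cache / "kernel.cu").exists())
```

Passing the environment as a parameter keeps the check testable without mutating `os.environ`; a CI job would simply export the variable before running the test suite.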

Commit f2e99180 authored Mar 20, 2025 by Lei Wang Committed by LeiWang1999 Mar 20, 2025
20 changed files
--- a/src/transform/common/loop_fusion_utils.h
+++ b/src/transform/common/loop_fusion_utils.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file

--- a/src/transform/common/loop_vectorization_utils.h
+++ b/src/transform/common/loop_vectorization_utils.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file

--- a/src/transform/inject_ptx_async_copy.cc
+++ b/src/transform/inject_ptx_async_copy.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file

--- a/src/transform/loop_partition.h
+++ b/src/transform/loop_partition.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file

--- a/src/transform/loop_vectorize.h
+++ b/src/transform/loop_vectorize.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file

--- a/src/transform/simplify.cc
+++ b/src/transform/simplify.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file simplify.cc
 * \brief Remove useless parameters of TL PrimFunc.

--- a/src/transform/thread_partial_sync.cc
+++ b/src/transform/thread_partial_sync.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file thread_storage_sync.cc
 */

--- a/testing/python/autotune/test_tilelang_autotune.py
+++ b/testing/python/autotune/test_tilelang_autotune.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 import itertools
 import logging


--- a/testing/python/cache/test_tilelang_cache_matmul.py
+++ b/testing/python/cache/test_tilelang_cache_matmul.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 from tilelang import tvm as tvm
 import tilelang.testing
 from tilelang import cached

--- a/testing/python/carver/test_tilelang_carver_generate_hints.py
+++ b/testing/python/carver/test_tilelang_carver_generate_hints.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
 import tilelang.testing
 from tilelang import carver
 from tilelang.carver.roller import PrimFuncNode, OutputNode, Edge

--- a/testing/python/carver/test_tilelang_carver_recommend_hints.py
+++ b/testing/python/carver/test_tilelang_carver_recommend_hints.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
 import tilelang.testing
 from tilelang import carver
 from tilelang.carver.arch import auto_infer_current_arch

--- a/testing/python/cpu/test_tilelang_cpu_gemm.py
+++ b/testing/python/cpu/test_tilelang_cpu_gemm.py
@@ -49,9 +49,9 @@ def matmul(M, N, K, block_M, block_N, block_K, dtype="float16", accum_dtype="flo
 def assert_matmul_codegen(M=1024, N=1024, K=1024, block_M=128, block_N=128, block_K=32):
    func = matmul(M, N, K, block_M, block_N, block_K)

-    rt_mod, _ = tilelang.lower(func, target="c")
+    artifact = tilelang.lower(func, target="c")

-    code = rt_mod.imported_modules[0].get_source()
+    code = artifact.kernel_source

    assert code is not None, "Code generation failed"


--- a/testing/python/ir/test_ir_kernel_frame.py
+++ b/testing/python/ir/test_ir_kernel_frame.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 # TODO: implement this test for tilelang/language/kernel.py
--- a/testing/python/issue/test_tilelang_issue_96.py
+++ b/testing/python/issue/test_tilelang_issue_96.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 import tilelang
 import tilelang.testing
 import tilelang.language as T

--- a/testing/python/jit/test_tilelang_jit_callback.py
+++ b/testing/python/jit/test_tilelang_jit_callback.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 from tilelang import tvm as tvm
 import tilelang.testing
 import tilelang

--- a/testing/python/jit/test_tilelang_jit_gemm.py
+++ b/testing/python/jit/test_tilelang_jit_gemm.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 from tilelang import tvm as tvm
 import tilelang.testing
 import tilelang

--- a/testing/python/jit/test_tilelang_jit_gemm_ctypes.py
+++ b/testing/python/jit/test_tilelang_jit_gemm_ctypes.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 from tilelang import tvm as tvm
 import tilelang.language as T
 import tilelang.testing

--- a/testing/python/jit/test_tilelang_jit_gemm_cython.py
+++ b/testing/python/jit/test_tilelang_jit_gemm_cython.py
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT License.
-
 from tilelang import tvm as tvm
 import tilelang.language as T
 import tilelang.testing

--- a/testing/python/kernel/test_tilelang_kernel_bf16_gemm_mma.py
+++ b/testing/python/kernel/test_tilelang_kernel_bf16_gemm_mma.py
@@ -216,7 +216,7 @@ def assert_tl_matmul_correctness(M, N, K, in_dtype, out_dtype, accum_dtype):
 @tilelang.testing.requires_cuda
 @tilelang.testing.requires_cuda_compute_version(8, 0)
 def test_assert_tl_matmul_bfloat16():
-    assert_tl_matmul_correctness(128, 128, 128, "bfloat16", "float32", "float32")
+    assert_tl_matmul_correctness(256, 256, 256, "bfloat16", "float32", "float32")


 if __name__ == "__main__":

--- a/testing/python/kernel/test_tilelang_kernel_dequantize_gemm.py
+++ b/testing/python/kernel/test_tilelang_kernel_dequantize_gemm.py
@@ -602,7 +602,7 @@ def assert_tl_matmul_with_ladder_weight_only_transform_block_reduce_int4_correct
    QLB = ladder_permutate(qB.cpu()).cuda()
    QLB = lop3_permutate(QLB.cpu()).cuda()

-    kernel(A, QLB, C)
+    C = kernel(A, QLB)

    latency = profiler.do_bench()