Commit f2e99180 authored by Lei Wang, committed by LeiWang1999

[Refactor] Phase Out LLVM Dependency by Making It Optional (#247)
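Making LLVM optional means that tests depending on it should be skipped, not failed, when LLVM is absent; the commit adds a `@tilelang.testing.requires_llvm` decorator for that purpose. Its implementation is not shown on this page, so the following is only a minimal Python sketch of such a skip decorator. The probe `llvm_available()` and the `EXAMPLE_LLVM_ENABLED` environment variable are hypothetical stand-ins for however a real build reports LLVM support.

```python
import functools
import os
import unittest


def llvm_available() -> bool:
    # Hypothetical probe: a real build would ask the compiler stack whether
    # LLVM support was compiled in. Here an assumed environment variable,
    # EXAMPLE_LLVM_ENABLED, stands in for that check.
    return os.environ.get("EXAMPLE_LLVM_ENABLED") == "1"


def requires_llvm(func=None, *, check=llvm_available):
    """Skip the decorated test at call time when check() returns False.

    Usable bare (@requires_llvm) or with an explicit probe
    (@requires_llvm(check=...)).
    """

    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if not check():
                # Both unittest and pytest report SkipTest as a skipped test.
                raise unittest.SkipTest("LLVM is not enabled in this build")
            return f(*args, **kwargs)

        return wrapper

    # Support both @requires_llvm and @requires_llvm(...).
    return decorate if func is None else decorate(func)
```

Checking at call time rather than at import time keeps test collection working on machines without LLVM; the test is simply reported as skipped when it runs.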

* remove llvm build

* [Refactor] Update kernel compilation and profiling in examples

- Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
- Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
- Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.

* lint fix

* License Update

* [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files

- Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
- Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.

* [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files

- Improved comment alignment and readability in `cuda.h`.
- Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.

* lint fix

* lint fix

* lint fix

* lint fix

* fix

* License update

* [Enhancement] Update JITKernel to use artifact for kernel source

- Assigned the generated artifact to `self.artifact` for better management.
- Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.

* lint fix

* Add @tilelang.testing.requires_llvm decorator to vectorization tests

* Enhance setup.py and env.py for library management

- Added functionality to remove original files after copying in CMakeBuild.
- Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.

* Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py

* Refactor CMakeBuild file handling in setup.py

- Added a check to ensure the target library directory exists before copying .so files.
- Improved the logic for creating the target directory and copying files to enhance robustness.

* bugfix

* Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update its registration. Add the @tilelang.testing.requires_llvm decorator to multiple tests that require LLVM.

* lint fix

* Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.

* lint fix

* Add support for C target in device code generation

- Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.

* [Enhancement] Implement auto-clear cache feature based on environment variable

* Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
* Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
* Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.

* [Refactor] Update kernel invocation and import paths in tests and cache

* Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
* Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
* Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.

* [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py

* Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
* Enhanced overall code formatting to align with project standards.

* [Enhancement] Add bfloat16 test case and improve kernel caching logic

* Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
* Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
* Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
* Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
* Improved code formatting and readability across several files.

* lint fix

* Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
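The `TILELANG_CLEAR_CACHE` change listed in this commit message clears the kernel cache during cache initialization when the variable is set to true. A minimal Python sketch of that behavior follows; `SimpleKernelCache`, `env_flag`, and the set of accepted "true" spellings are hypothetical and are not TileLang's actual `KernelCache` code.

```python
import os
import shutil
from pathlib import Path

# Which spellings count as "true" is an assumption of this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read an environment variable as a case-insensitive boolean flag."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in _TRUTHY


class SimpleKernelCache:
    """Hypothetical on-disk cache that empties itself on startup when asked."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        if env_flag("TILELANG_CLEAR_CACHE"):
            self.clear()
        # Recreate the (possibly just removed) directory so later writes work.
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def clear(self) -> None:
        # Delete all cached entries; ignore_errors tolerates a missing dir.
        shutil.rmtree(self.cache_dir, ignore_errors=True)
```

Clearing at construction time, as the commit describes, means a CI job only has to export the variable once to guarantee that every test run starts from freshly compiled kernels.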
parent 43bd9d3e
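The `setup.py` changes listed in the commit message (creating the target library directory before copying `.so` files, and removing the originals after copying) can be illustrated with a small standalone function. This is only a sketch: `install_shared_libs` and its parameters are hypothetical, and TileLang's `CMakeBuild` class may structure this differently.

```python
import shutil
from pathlib import Path
from typing import List


def install_shared_libs(build_dir, target_dir, remove_original=True) -> List[Path]:
    """Copy every *.so file from build_dir into target_dir (illustrative sketch)."""
    build_dir, target_dir = Path(build_dir), Path(target_dir)
    # Make sure the destination exists before copying anything into it.
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(build_dir.glob("*.so")):
        dst = target_dir / src.name
        shutil.copy2(src, dst)  # copy file contents and metadata
        if remove_original:
            # Drop the original only after the copy has succeeded.
            src.unlink()
        copied.append(dst)
    return copied
```

Deleting each source file only after its own copy succeeds means a failure partway through leaves the remaining libraries in the build directory instead of losing them.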
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file layout/utils.cc
 * \brief Some arith tools for layout & fragment inference
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file layout/utils.h
 * \brief Some arith tools for layout & fragment inference
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/bulk_copy.h
 * \brief Bulk copy operator.
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/elem.cc
 *
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/elem.h
 * \brief Define elment-wise operators.
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/gemm.cc
 *
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/gemm.h
 * \brief Define gemm operator.
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/op.cc
 *
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/parallel.h
 * \brief Infer layout from ops and parallel for
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/reduce.cc
 *
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/op/reduce.h
 * \brief Define reduce operator.
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/runtime/runtime.h
 * \brief Runtime functions.
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file tl/runtime/runtime.h
 * \brief Runtime functions.
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file target/codegen.cc
 */
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file target/codegen.h
 * \brief Utility to generate code
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file target/codegen.cc
 */
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*!
 * \file target/codegen.h
 * \brief Utility to generate code
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Copyright 1993-2023 NVIDIA Corporation. All rights reserved.
 *
@@ -1150,7 +1148,7 @@ typedef enum CUpointer_attribute_enum {
 from a mempool. Otherwise returns NULL. **/
 CU_POINTER_ATTRIBUTE_MAPPING_SIZE =
 18, /**< Size of the actual underlying mapping that the pointer belongs to
 **/
 CU_POINTER_ATTRIBUTE_MAPPING_BASE_ADDR =
 19, /**< The start address of the mapping that the pointer belongs to **/
 CU_POINTER_ATTRIBUTE_MEMORY_BLOCK_ID =
@@ -2230,12 +2228,14 @@ typedef struct CUgraphEdgeData_st {
 ::CU_GRAPH_KERNEL_NODE_PORT_LAUNCH_ORDER. */
 unsigned char
 to_port; /**< This indicates what portion of the downstream node is
-dependent on the upstream node or portion thereof (indicated
-by \c from_port). The meaning is specific to the node type. A
-value of 0 in all cases means the entirety of the downstream
-node is dependent on the upstream work. <br> Currently no node
-types define non-zero ports. Accordingly, this field must be
-set to zero. */
+dependent on the upstream node or portion thereof
+(indicated by \c from_port). The meaning is
+specific to the node type. A value of 0 in all
+cases means the entirety of the
+downstream node is dependent on the
+upstream work. <br> Currently no
+node types define non-zero ports. Accordingly,
+this field must be set to zero. */
 unsigned char type; /**< This should be populated with a value from
 ::CUgraphDependencyType. (It is typed as char due to
 compiler-specific layout of bitfields.) See
@@ -2495,15 +2495,17 @@ typedef enum CUlaunchAttributeID_enum {
 typedef union CUlaunchAttributeValue_union {
 char pad[64]; /* Pad to 64 bytes */
 CUaccessPolicyWindow
 accessPolicyWindow; /**< Value of launch attribute
 ::CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW. */
 int cooperative; /**< Value of launch attribute
 ::CU_LAUNCH_ATTRIBUTE_COOPERATIVE. Nonzero indicates a
-cooperative kernel (see ::cuLaunchCooperativeKernel). */
-CUsynchronizationPolicy syncPolicy; /**< Value of launch attribute
+cooperative kernel (see
+::cuLaunchCooperativeKernel). */
+CUsynchronizationPolicy
+syncPolicy; /**< Value of launch attribute
 ::CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY.
 ::CUsynchronizationPolicy for work
 queued up in this stream */
 
 /**
 * Value of launch attribute ::CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION that
@@ -2524,8 +2526,8 @@ typedef union CUlaunchAttributeValue_union {
 CUclusterSchedulingPolicy
 clusterSchedulingPolicyPreference; /**< Value of launch attribute
 ::CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE.
-Cluster scheduling policy preference
-for the kernel. */
+Cluster scheduling policy
+preference for the kernel. */
 int programmaticStreamSerializationAllowed; /**< Value of launch attribute
 ::CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION.
 */
@@ -4844,7 +4846,7 @@ typedef struct CUgraphNodeParams_st {
 CUDA_CHILD_GRAPH_NODE_PARAMS graph; /**< Child graph node parameters. */
 CUDA_EVENT_WAIT_NODE_PARAMS eventWait; /**< Event wait node parameters. */
 CUDA_EVENT_RECORD_NODE_PARAMS
 eventRecord; /**< Event record node parameters. */
 CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_v2
 extSemSignal; /**< External semaphore signal node parameters. */
 CUDA_EXT_SEM_WAIT_NODE_PARAMS_v2
@@ -4854,7 +4856,7 @@ typedef struct CUgraphNodeParams_st {
 CUDA_MEM_FREE_NODE_PARAMS free; /**< Memory free node parameters. */
 CUDA_BATCH_MEM_OP_NODE_PARAMS_v2 memOp; /**< MemOp node parameters. */
 CUDA_CONDITIONAL_NODE_PARAMS
 conditional; /**< Conditional node parameters. */
 };
 
 long long reserved2; /**< Reserved bytes. Must be zero. */
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 #include "codegen_cpp.h"
 namespace tvm {
......
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 #include "codegen_cuda.h"
 #include "runtime/cuda/cuda_module.h"
@@ -70,7 +67,7 @@ runtime::Module BuildTileLangCUDA(IRModule mod, Target target) {
 return runtime::CUDAModuleCreate(ptx, fmt, ExtractFuncInfo(mod), code);
 }
-String BuildTLDebug(IRModule mod, Target target) {
+runtime::Module BuildTileLangCUDAWithoutCompile(IRModule mod, Target target) {
 using tvm::runtime::Registry;
 bool output_ssa = false;
 CodeGenTileLangCUDA cg;
@@ -90,13 +87,13 @@ String BuildTLDebug(IRModule mod, Target target) {
 if (const auto *f = Registry::Get("tilelang_callback_cuda_postproc")) {
 code = (*f)(code, target).operator std::string();
 }
-return String(code);
+return runtime::CUDAModuleCreate("ptx", "ptx", ExtractFuncInfo(mod), code);
 }
 TVM_REGISTER_GLOBAL("target.build.tilelang_cuda")
 .set_body_typed(BuildTileLangCUDA);
-TVM_REGISTER_GLOBAL("target.build.tl_debug_codegen")
-.set_body_typed(BuildTLDebug);
+TVM_REGISTER_GLOBAL("target.build.tilelang_cuda_without_compile")
+.set_body_typed(BuildTileLangCUDAWithoutCompile);
 } // namespace codegen
 } // namespace tvm
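The codegen_cuda diff replaces the debug-only `target.build.tl_debug_codegen` entry, which returned the generated source as a `String`, with `target.build.tilelang_cuda_without_compile`, which wraps that source in a `runtime::Module` without compiling it. The commit message describes analogous source-only registrations for HIP (`tilelang_hip_without_compile`) and for the C target (`tilelang_cpp`). The sketch below shows one way to pick the build function per target kind. Only the CUDA name is confirmed by this diff; the `target.build.`-prefixed HIP and C names and the helper `without_compile_builder` are assumptions.

```python
from typing import Dict

# Global build-function names per target kind.
# Confirmed by the diff: "target.build.tilelang_cuda_without_compile".
# Assumed by analogy (not shown in this diff): the HIP and C entries.
_WITHOUT_COMPILE_BUILDERS: Dict[str, str] = {
    "cuda": "target.build.tilelang_cuda_without_compile",
    "hip": "target.build.tilelang_hip_without_compile",
    "c": "target.build.tilelang_cpp",
}


def without_compile_builder(target_kind: str) -> str:
    """Return the registered global name for source-only codegen on target_kind."""
    try:
        return _WITHOUT_COMPILE_BUILDERS[target_kind]
    except KeyError:
        raise ValueError(
            f"no source-only builder for target kind {target_kind!r}"
        ) from None
```

With TVM installed, the returned name could be resolved with `tvm.get_global_func(name)` and called with an `IRModule` and a `Target`, matching the `set_body_typed` registrations in the diff; that step is omitted so the sketch runs without TVM.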