Commit f2e99180 authored by Lei Wang, committed by LeiWang1999

[Refactor] Phaseout LLVM Dependency by Making it Optional (#247)

* remove llvm build

* [Refactor] Update kernel compilation and profiling in examples

- Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
- Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
- Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.

* lint fix

* License Update

* [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files

- Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
- Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.

* [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files

- Improved comment alignment and readability in `cuda.h`.
- Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.

* lint fix

* lint fix

* lint fix

* lint fix

* fix

* License update

* [Enhancement] Update JITKernel to use artifact for kernel source

- Assigned the generated artifact to `self.artifact` for better management.
- Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.

* lint fix

* Add @tilelang.testing.requires_llvm decorator to vectorization tests
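
A decorator of this kind typically skips a test when an optional backend is missing. The sketch below is only an illustration of that pattern and is not the implementation of `tilelang.testing.requires_llvm`; the name `requires_feature` and the `available` callback are hypothetical.

```python
import functools
import unittest


def requires_feature(available, name):
    """Build a decorator that skips the test unless available() returns True.

    Hypothetical helper: `available` is any zero-argument callable that
    reports whether the optional feature (for example LLVM) can be used.
    """

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            # Check at call time so the decision reflects the current build.
            if not available():
                raise unittest.SkipTest(f"{name} is not available")
            return test_func(*args, **kwargs)

        return wrapper

    return decorator
```

Raising `unittest.SkipTest` is reported as a skip by both `unittest` and `pytest`, so a build without the optional feature skips the affected tests rather than failing them.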

* Enhance setup.py and env.py for library management

- Added functionality to remove original files after copying in CMakeBuild.
- Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.

* Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py

* Refactor CMakeBuild file handling in setup.py

- Added a check to ensure the target library directory exists before copying .so files.
- Improved the logic for creating the target directory and copying files to enhance robustness.
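
The directory check and the copy-then-remove behaviour described for `CMakeBuild` can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from `setup.py`; the helper name `move_shared_libs` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def move_shared_libs(build_dir, target_lib_dir):
    """Copy every .so file from build_dir into target_lib_dir, then delete the original.

    Hypothetical helper; returns the names of the files that were moved.
    """
    build_dir = Path(build_dir)
    target_lib_dir = Path(target_lib_dir)
    # Create the target directory first so the copy cannot fail on a missing path.
    target_lib_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for so_file in sorted(build_dir.glob("*.so")):
        destination = target_lib_dir / so_file.name
        shutil.copy2(so_file, destination)
        # Remove the original only after the copy has succeeded.
        so_file.unlink()
        moved.append(destination.name)
    return moved
```

Copying before deleting, instead of renaming, keeps the original in place if the copy raises partway through.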

* bugfix

* Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.

* lint fix

* Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.

* lint fix

* Add support for C target in device code generation

- Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.

* [Enhancement] Implement auto-clear cache feature based on environment variable

- Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
- Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
- Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
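
An environment-controlled cache reset of this kind could look roughly like the sketch below. The function names, and the set of strings treated as "true", are assumptions made for illustration; only the variable name `TILELANG_CLEAR_CACHE` comes from the commit message.

```python
import os
import shutil
from pathlib import Path


def should_clear_cache(environ=os.environ):
    """Return True when TILELANG_CLEAR_CACHE holds a truthy value.

    The accepted spellings below are an assumption, not the project's rule.
    """
    value = environ.get("TILELANG_CLEAR_CACHE", "0")
    return value.strip().lower() in ("1", "true", "yes", "on")


def init_cache_dir(cache_dir, environ=os.environ):
    """Create cache_dir, wiping any existing contents first if clearing is requested."""
    cache_dir = Path(cache_dir)
    if should_clear_cache(environ) and cache_dir.exists():
        shutil.rmtree(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

Passing the environment as a parameter keeps the helper easy to test; a CI job would simply export `TILELANG_CLEAR_CACHE` before running the test suite.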

* [Refactor] Update kernel invocation and import paths in tests and cache

- Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
- Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
- Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.

* [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py

- Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
- Enhanced overall code formatting to align with project standards.

* [Enhancement] Add bfloat16 test case and improve kernel caching logic

- Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
- Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
- Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
- Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
- Improved code formatting and readability across several files.

* lint fix

* Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
parent 43bd9d3e
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file layout/utils.cc
* \brief Some arith tools for layout & fragment inference
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file layout/utils.h
* \brief Some arith tools for layout & fragment inference
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/bulk_copy.h
* \brief Bulk copy operator.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/elem.cc
*
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/elem.h
* \brief Define elment-wise operators.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/gemm.cc
*
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/gemm.h
* \brief Define gemm operator.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/op.cc
*
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/parallel.h
* \brief Infer layout from ops and parallel for
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/reduce.cc
*
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/op/reduce.h
* \brief Define reduce operator.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/runtime/runtime.h
* \brief Runtime functions.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/runtime/runtime.h
* \brief Runtime functions.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file target/codegen.cc
*/
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file target/codegen.h
* \brief Utility to generate code
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file target/codegen.cc
*/
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file target/codegen.h
* \brief Utility to generate code
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*
* Copyright 1993-2023 NVIDIA Corporation. All rights reserved.
*
......@@ -1150,7 +1148,7 @@ typedef enum CUpointer_attribute_enum {
from a mempool. Otherwise returns NULL. **/
CU_POINTER_ATTRIBUTE_MAPPING_SIZE =
18, /**< Size of the actual underlying mapping that the pointer belongs to
-**/
+**/
CU_POINTER_ATTRIBUTE_MAPPING_BASE_ADDR =
19, /**< The start address of the mapping that the pointer belongs to **/
CU_POINTER_ATTRIBUTE_MEMORY_BLOCK_ID =
......@@ -2230,12 +2228,14 @@ typedef struct CUgraphEdgeData_st {
::CU_GRAPH_KERNEL_NODE_PORT_LAUNCH_ORDER. */
unsigned char
to_port; /**< This indicates what portion of the downstream node is
-dependent on the upstream node or portion thereof (indicated
-by \c from_port). The meaning is specific to the node type. A
-value of 0 in all cases means the entirety of the downstream
-node is dependent on the upstream work. <br> Currently no node
-types define non-zero ports. Accordingly, this field must be
-set to zero. */
+dependent on the upstream node or portion thereof
+(indicated by \c from_port). The meaning is
+specific to the node type. A value of 0 in all
+cases means the entirety of the
+downstream node is dependent on the
+upstream work. <br> Currently no
+node types define non-zero ports. Accordingly,
+this field must be set to zero. */
unsigned char type; /**< This should be populated with a value from
::CUgraphDependencyType. (It is typed as char due to
compiler-specific layout of bitfields.) See
......@@ -2495,15 +2495,17 @@ typedef enum CUlaunchAttributeID_enum {
typedef union CUlaunchAttributeValue_union {
char pad[64]; /* Pad to 64 bytes */
CUaccessPolicyWindow
-accessPolicyWindow; /**< Value of launch attribute
-::CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW. */
-int cooperative; /**< Value of launch attribute
-::CU_LAUNCH_ATTRIBUTE_COOPERATIVE. Nonzero indicates a
-cooperative kernel (see ::cuLaunchCooperativeKernel). */
-CUsynchronizationPolicy syncPolicy; /**< Value of launch attribute
-::CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY.
-::CUsynchronizationPolicy for work
-queued up in this stream */
+accessPolicyWindow; /**< Value of launch attribute
+::CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW. */
+int cooperative; /**< Value of launch attribute
+::CU_LAUNCH_ATTRIBUTE_COOPERATIVE. Nonzero indicates a
+cooperative kernel (see
+::cuLaunchCooperativeKernel). */
+CUsynchronizationPolicy
+syncPolicy; /**< Value of launch attribute
+::CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY.
+::CUsynchronizationPolicy for work
+queued up in this stream */
 
/**
* Value of launch attribute ::CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION that
......@@ -2524,8 +2526,8 @@ typedef union CUlaunchAttributeValue_union {
CUclusterSchedulingPolicy
clusterSchedulingPolicyPreference; /**< Value of launch attribute
::CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE.
-Cluster scheduling policy preference
-for the kernel. */
+Cluster scheduling policy
+preference for the kernel. */
int programmaticStreamSerializationAllowed; /**< Value of launch attribute
::CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION.
*/
......@@ -4844,7 +4846,7 @@ typedef struct CUgraphNodeParams_st {
CUDA_CHILD_GRAPH_NODE_PARAMS graph; /**< Child graph node parameters. */
CUDA_EVENT_WAIT_NODE_PARAMS eventWait; /**< Event wait node parameters. */
CUDA_EVENT_RECORD_NODE_PARAMS
-eventRecord; /**< Event record node parameters. */
+eventRecord; /**< Event record node parameters. */
CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_v2
extSemSignal; /**< External semaphore signal node parameters. */
CUDA_EXT_SEM_WAIT_NODE_PARAMS_v2
......@@ -4854,7 +4856,7 @@ typedef struct CUgraphNodeParams_st {
CUDA_MEM_FREE_NODE_PARAMS free; /**< Memory free node parameters. */
CUDA_BATCH_MEM_OP_NODE_PARAMS_v2 memOp; /**< MemOp node parameters. */
CUDA_CONDITIONAL_NODE_PARAMS
-conditional; /**< Conditional node parameters. */
+conditional; /**< Conditional node parameters. */
};
 
long long reserved2; /**< Reserved bytes. Must be zero. */
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#include "codegen_cpp.h"
namespace tvm {
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#include "codegen_cuda.h"
#include "runtime/cuda/cuda_module.h"
......@@ -70,7 +67,7 @@ runtime::Module BuildTileLangCUDA(IRModule mod, Target target) {
return runtime::CUDAModuleCreate(ptx, fmt, ExtractFuncInfo(mod), code);
}
-String BuildTLDebug(IRModule mod, Target target) {
+runtime::Module BuildTileLangCUDAWithoutCompile(IRModule mod, Target target) {
using tvm::runtime::Registry;
bool output_ssa = false;
CodeGenTileLangCUDA cg;
......@@ -90,13 +87,13 @@ String BuildTLDebug(IRModule mod, Target target) {
if (const auto *f = Registry::Get("tilelang_callback_cuda_postproc")) {
code = (*f)(code, target).operator std::string();
}
-return String(code);
+return runtime::CUDAModuleCreate("ptx", "ptx", ExtractFuncInfo(mod), code);
}
TVM_REGISTER_GLOBAL("target.build.tilelang_cuda")
.set_body_typed(BuildTileLangCUDA);
-TVM_REGISTER_GLOBAL("target.build.tl_debug_codegen")
-.set_body_typed(BuildTLDebug);
+TVM_REGISTER_GLOBAL("target.build.tilelang_cuda_without_compile")
+.set_body_typed(BuildTileLangCUDAWithoutCompile);
} // namespace codegen
} // namespace tvm