[Refactor] Phaseout LLVM Dependency by Making it Optional (#247)

* remove llvm build
* [Refactor] Update kernel compilation and profiling in examples
  - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
  - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
  - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
* lint fix
* License Update
* [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
  - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
  - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
* [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
  - Improved comment alignment and readabilit...

Commit f2e99180 authored Mar 20, 2025 by Lei Wang Committed by LeiWang1999 Mar 20, 2025
20 changed files
--- a/src/layout/utils.cc
+++ b/src/layout/utils.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file layout/utils.cc
 * \brief Some arith tools for layout & fragment inference

--- a/src/layout/utils.h
+++ b/src/layout/utils.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file layout/utils.h
 * \brief Some arith tools for layout & fragment inference

--- a/src/op/bulk_copy.h
+++ b/src/op/bulk_copy.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/bulk_copy.h
 * \brief Bulk copy operator.

--- a/src/op/elem.cc
+++ b/src/op/elem.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/elem.cc
 *

--- a/src/op/elem.h
+++ b/src/op/elem.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/elem.h
 * \brief Define elment-wise operators.

--- a/src/op/gemm.cc
+++ b/src/op/gemm.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/gemm.cc
 *

--- a/src/op/gemm.h
+++ b/src/op/gemm.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/gemm.h
 * \brief Define gemm operator.

--- a/src/op/op.cc
+++ b/src/op/op.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/op.cc
 *

--- a/src/op/parallel.h
+++ b/src/op/parallel.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/parallel.h
 * \brief Infer layout from ops and parallel for

--- a/src/op/reduce.cc
+++ b/src/op/reduce.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/reduce.cc
 *

--- a/src/op/reduce.h
+++ b/src/op/reduce.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/op/reduce.h
 * \brief Define reduce operator.

--- a/src/runtime/runtime.cc
+++ b/src/runtime/runtime.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/runtime/runtime.h
 * \brief Runtime functions.

--- a/src/runtime/runtime.h
+++ b/src/runtime/runtime.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file tl/runtime/runtime.h
 * \brief Runtime functions.

--- a/src/target/codegen_cuda.cc
+++ b/src/target/codegen_cuda.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file target/codegen.cc
 */

--- a/src/target/codegen_cuda.h
+++ b/src/target/codegen_cuda.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file target/codegen.h
 * \brief Utility to generate code

--- a/src/target/codegen_hip.cc
+++ b/src/target/codegen_hip.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file target/codegen.cc
 */

--- a/src/target/codegen_hip.h
+++ b/src/target/codegen_hip.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 /*!
 * \file target/codegen.h
 * \brief Utility to generate code

--- a/src/target/cuda.h
+++ b/src/target/cuda.h
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
 /*
 * Copyright 1993-2023 NVIDIA Corporation.  All rights reserved.
 *
@@ -1150,7 +1148,7 @@ typedef enum CUpointer_attribute_enum {
             from a mempool. Otherwise returns NULL. **/
  CU_POINTER_ATTRIBUTE_MAPPING_SIZE =
      18, /**< Size of the actual underlying mapping that the pointer belongs to
-             **/
+           **/
  CU_POINTER_ATTRIBUTE_MAPPING_BASE_ADDR =
      19, /**< The start address of the mapping that the pointer belongs to **/
  CU_POINTER_ATTRIBUTE_MEMORY_BLOCK_ID =
@@ -2230,12 +2228,14 @@ typedef struct CUgraphEdgeData_st {
                    ::CU_GRAPH_KERNEL_NODE_PORT_LAUNCH_ORDER. */
  unsigned char
      to_port;        /**< This indicates what portion of the downstream node is
-                         dependent on        the upstream node or portion thereof (indicated
-                         by \c from_port). The        meaning is specific to the node type. A
-                         value of 0 in all cases means        the entirety of the downstream
-                         node is dependent on the upstream work.        <br>        Currently no node
-                         types define non-zero ports. Accordingly, this field        must be
-                         set to zero. */
+                         dependent on        the upstream node or portion thereof
+                         (indicated        by \c from_port). The        meaning is
+                         specific to        the node type. A        value of 0 in all
+                         cases        means        the        entirety of the
+                         downstream        node        is        dependent on the
+                         upstream        work.        <br>        Currently        no
+                         node        types define        non-zero ports.        Accordingly,
+                         this        field        must be        set to        zero. */
  unsigned char type; /**< This should be populated with a value from
                         ::CUgraphDependencyType. (It is typed as char due to
                         compiler-specific layout of bitfields.) See
@@ -2495,15 +2495,17 @@ typedef enum CUlaunchAttributeID_enum {
 typedef union CUlaunchAttributeValue_union {
  char pad[64]; /* Pad to 64 bytes */
  CUaccessPolicyWindow
-      accessPolicyWindow;             /**< Value of launch attribute
-                                         ::CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW. */
-  int cooperative;                    /**< Value of launch attribute
-                                         ::CU_LAUNCH_ATTRIBUTE_COOPERATIVE. Nonzero indicates a
-                                         cooperative                    kernel (see ::cuLaunchCooperativeKernel). */
-  CUsynchronizationPolicy syncPolicy; /**< Value of launch attribute
-                                         ::CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY.
-                                         ::CUsynchronizationPolicy for work
-                                         queued up in this stream */
+      accessPolicyWindow; /**< Value of launch attribute
+                             ::CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW. */
+  int cooperative;        /**< Value of launch attribute
+                             ::CU_LAUNCH_ATTRIBUTE_COOPERATIVE. Nonzero indicates a
+                             cooperative                    kernel (see
+                             ::cuLaunchCooperativeKernel). */
+  CUsynchronizationPolicy
+      syncPolicy; /**< Value of launch attribute
+                     ::CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY.
+                     ::CUsynchronizationPolicy for work
+                     queued up in this stream */
  
  /**
   *  Value of launch attribute ::CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION that
@@ -2524,8 +2526,8 @@ typedef union CUlaunchAttributeValue_union {
  CUclusterSchedulingPolicy
      clusterSchedulingPolicyPreference;      /**< Value of launch attribute
                                                 ::CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE.
-                                                 Cluster      scheduling policy preference
-                                                 for the kernel. */
+                                                 Cluster      scheduling policy
+                                                 preference      for the kernel. */
  int programmaticStreamSerializationAllowed; /**< Value of launch attribute
                                                ::CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION.
                                              */
@@ -4844,7 +4846,7 @@ typedef struct CUgraphNodeParams_st {
    CUDA_CHILD_GRAPH_NODE_PARAMS graph;    /**< Child graph node parameters. */
    CUDA_EVENT_WAIT_NODE_PARAMS eventWait; /**< Event wait node parameters. */
    CUDA_EVENT_RECORD_NODE_PARAMS
-        eventRecord; /**< Event record node parameters. */
+    eventRecord; /**< Event record node parameters. */
    CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_v2
        extSemSignal; /**< External semaphore signal node parameters. */
    CUDA_EXT_SEM_WAIT_NODE_PARAMS_v2
@@ -4854,7 +4856,7 @@ typedef struct CUgraphNodeParams_st {
    CUDA_MEM_FREE_NODE_PARAMS free; /**< Memory free node parameters. */
    CUDA_BATCH_MEM_OP_NODE_PARAMS_v2 memOp; /**< MemOp node parameters. */
    CUDA_CONDITIONAL_NODE_PARAMS
-        conditional; /**< Conditional node parameters. */
+    conditional; /**< Conditional node parameters. */
  };
  
  long long reserved2; /**< Reserved bytes. Must be zero. */
--- a/src/target/rt_mod_cpp.cc
+++ b/src/target/rt_mod_cpp.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 #include "codegen_cpp.h"

 namespace tvm {

--- a/src/target/rt_mod_cuda.cc
+++ b/src/target/rt_mod_cuda.cc
-// Copyright (c) Microsoft Corporation.
-// Licensed under the MIT License.
-
 #include "codegen_cuda.h"
 #include "runtime/cuda/cuda_module.h"

@@ -70,7 +67,7 @@ runtime::Module BuildTileLangCUDA(IRModule mod, Target target) {
  return runtime::CUDAModuleCreate(ptx, fmt, ExtractFuncInfo(mod), code);
 }

-String BuildTLDebug(IRModule mod, Target target) {
+runtime::Module BuildTileLangCUDAWithoutCompile(IRModule mod, Target target) {
  using tvm::runtime::Registry;
  bool output_ssa = false;
  CodeGenTileLangCUDA cg;
@@ -90,13 +87,13 @@ String BuildTLDebug(IRModule mod, Target target) {
  if (const auto *f = Registry::Get("tilelang_callback_cuda_postproc")) {
    code = (*f)(code, target).operator std::string();
  }
-  return String(code);
+  return runtime::CUDAModuleCreate("ptx", "ptx", ExtractFuncInfo(mod), code);
 }

 TVM_REGISTER_GLOBAL("target.build.tilelang_cuda")
    .set_body_typed(BuildTileLangCUDA);
-TVM_REGISTER_GLOBAL("target.build.tl_debug_codegen")
-    .set_body_typed(BuildTLDebug);
+TVM_REGISTER_GLOBAL("target.build.tilelang_cuda_without_compile")
+    .set_body_typed(BuildTileLangCUDAWithoutCompile);

 } // namespace codegen
 } // namespace tvm
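
Note on the `rt_mod_cuda.cc` change above: the global name `target.build.tl_debug_codegen` is no longer registered, and `target.build.tilelang_cuda_without_compile` takes its place, so a by-name lookup of the old key would presumably come back empty. The sketch below illustrates that effect with a mock registry. It is not TVM's `runtime::Registry`; `MockRegistry`, `RegisterBuilders`, and `Describe` are hypothetical names, and the builders are simplified to return strings rather than `runtime::Module`.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a name-keyed global function registry.
// This is NOT TVM's runtime::Registry; it only mimics lookup-by-name.
class MockRegistry {
 public:
  using Fn = std::function<std::string(const std::string&)>;

  static void Register(const std::string& name, Fn fn) {
    Table()[name] = std::move(fn);
  }

  // Returns nullptr when nothing is registered under `name`.
  static const Fn* Get(const std::string& name) {
    auto it = Table().find(name);
    return it == Table().end() ? nullptr : &it->second;
  }

 private:
  static std::map<std::string, Fn>& Table() {
    static std::map<std::string, Fn> table;
    return table;
  }
};

// Registers only the names present after this commit. The old key
// "target.build.tl_debug_codegen" is intentionally absent.
void RegisterBuilders() {
  MockRegistry::Register(
      "target.build.tilelang_cuda",
      [](const std::string& mod) { return "compiled:" + mod; });
  MockRegistry::Register(
      "target.build.tilelang_cuda_without_compile",
      [](const std::string& mod) { return "source-only:" + mod; });
}

// Looks a builder up by name and falls back to a marker string when the
// name is unknown, so callers can detect a stale key.
std::string Describe(const std::string& name) {
  const MockRegistry::Fn* f = MockRegistry::Get(name);
  return f ? (*f)("mod") : "<not registered>";
}
```

In this mock, after `RegisterBuilders()` runs, `Describe("target.build.tl_debug_codegen")` takes the fallback path, while the two current keys resolve to their builders. Callers that still reference the old name would need to switch to the new one and, per the diff, handle a module return value instead of a string.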