Commit f2e99180 authored by Lei Wang, committed by LeiWang1999

[Refactor] Phase Out LLVM Dependency by Making It Optional (#247)

* remove llvm build

* [Refactor] Update kernel compilation and profiling in examples

- Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
- Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
- Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.

* lint fix

* License Update

* [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files

- Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
- Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.

* [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files

- Improved comment alignment and readability in `cuda.h`.
- Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.

* lint fix

* lint fix

* lint fix

* lint fix

* fix

* License update

* [Enhancement] Update JITKernel to use artifact for kernel source

- Assigned the generated artifact to `self.artifact` for better management.
- Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.

* lint fix

* Add @tilelang.testing.requires_llvm decorator to vectorization tests

* Enhance setup.py and env.py for library management

- Added functionality to remove original files after copying in CMakeBuild.
- Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
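Including an extra directory in `TVM_LIBRARY_PATH` comes down to extending a PATH-style list of directories. The snippet below is a minimal C++ sketch of that idea, not the actual `env.py` code. The helper names `SplitSearchPath` and `AppendSearchPath` are hypothetical. The `:` separator assumes a POSIX system (Windows uses `;`). Appending the new directory rather than prepending it is also an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a PATH-style list on `sep`, skipping empty entries.
std::vector<std::string> SplitSearchPath(const std::string &value,
                                         char sep = ':') {
  std::vector<std::string> parts;
  std::stringstream stream(value);
  std::string item;
  while (std::getline(stream, item, sep)) {
    if (!item.empty()) parts.push_back(item);
  }
  return parts;
}

// Hypothetical helper: append `dir` to a PATH-style list unless it is
// already present, so repeated calls do not keep growing the list.
std::string AppendSearchPath(const std::string &value, const std::string &dir,
                             char sep = ':') {
  for (const auto &entry : SplitSearchPath(value, sep)) {
    if (entry == dir) return value;  // already listed: leave the order untouched
  }
  if (value.empty()) return dir;
  return value + sep + dir;
}
```

The duplicate check makes the update idempotent, so re-importing the module does not append the same build directory twice.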

* Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py

* Refactor CMakeBuild file handling in setup.py

- Added a check to ensure the target library directory exists before copying .so files.
- Improved the logic for creating the target directory and copying files to enhance robustness.

* bugfix

* Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.

* lint fix

* Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
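The "without compilation" path stops after source generation. `BuildTileLangHIPWithoutCompile` in this commit runs the HIP code generator, applies the `tilelang_callback_hip_postproc` callback only if one is registered, and returns the generated source as a `String` rather than a compiled module. The sketch below mimics that optional-hook pattern, with a plain `std::unordered_map` standing in for TVM's `Registry`. `GlobalHooks`, `GetHook`, and `EmitSourceWithoutCompile` are hypothetical names. The callback here takes only the source, whereas the real one also receives the target.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Stand-in for a global function registry (hypothetical; NOT TVM's Registry).
using PostProc = std::function<std::string(const std::string &)>;

std::unordered_map<std::string, PostProc> &GlobalHooks() {
  static std::unordered_map<std::string, PostProc> hooks;
  return hooks;
}

// Returns nullptr when nothing is registered under `name`.
const PostProc *GetHook(const std::string &name) {
  auto &hooks = GlobalHooks();
  auto it = hooks.find(name);
  return it == hooks.end() ? nullptr : &it->second;
}

// Source-only build: take the generated device code, run the optional
// post-processing hook, and return the text. Nothing is compiled here.
std::string EmitSourceWithoutCompile(const std::string &generated) {
  std::string code = generated;
  if (const PostProc *post = GetHook("tilelang_callback_hip_postproc")) {
    code = (*post)(code);
  }
  return code;
}
```

Because a missing hook is simply skipped, the same build entry point works whether or not a Python-side post-processor has been registered.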

* lint fix

* Add support for C target in device code generation

- Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
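The target dispatch in `device_codegen_without_compile` can be pictured as a lookup from a target kind to the name of a registered build function. In this hypothetical sketch, only `target.build.tilelang_hip_without_compile` comes from this commit's diff. The CUDA and C registration names, as well as the target-kind keys, are assumptions made for illustration.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical dispatch table: target kind -> registered build function name.
// Only the "hip" entry's function name comes from this commit; the other
// names and all of the keys are assumptions for illustration.
std::optional<std::string> DeviceBuildFuncName(const std::string &target_kind) {
  static const std::unordered_map<std::string, std::string> kBuilders = {
      {"hip", "target.build.tilelang_hip_without_compile"},
      {"cuda", "target.build.tilelang_cuda_without_compile"},  // assumed name
      {"c", "target.build.tilelang_cpp"},                      // assumed name
  };
  auto it = kBuilders.find(target_kind);
  if (it == kBuilders.end()) return std::nullopt;  // unsupported target kind
  return it->second;
}
```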

* [Enhancement] Implement auto-clear cache feature based on environment variable

* Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
* Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
* Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
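The following is a minimal sketch of the cache-clearing behavior, assuming the cache is a directory on the local filesystem. The commit implements this in the Python cache initialization; this C++17 version only illustrates the logic. Treating `1`, `yes`, and `on` as truthy in addition to `true` (case-insensitively) is an assumption, as are the helper names `EnvFlagEnabled` and `InitCacheDir`.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns true when environment variable `name` holds a truthy value.
// Accepting "1", "true", "yes", "on" (case-insensitive) is an assumption.
bool EnvFlagEnabled(const char *name) {
  const char *raw = std::getenv(name);
  if (raw == nullptr) return false;
  std::string v(raw);
  std::transform(v.begin(), v.end(), v.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return v == "1" || v == "true" || v == "yes" || v == "on";
}

// Hypothetical cache initialization: wipe `cache_dir` when
// TILELANG_CLEAR_CACHE is truthy, then make sure the directory exists.
// Returns true if the cache was cleared.
bool InitCacheDir(const fs::path &cache_dir) {
  bool cleared = false;
  if (EnvFlagEnabled("TILELANG_CLEAR_CACHE")) {
    fs::remove_all(cache_dir);  // deletes the directory and all of its contents
    cleared = true;
  }
  fs::create_directories(cache_dir);
  return cleared;
}
```

Clearing at initialization time gives CI runs a fresh cache without touching developers' local caches, since the variable is only set in the CI workflow.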

* [Refactor] Update kernel invocation and import paths in tests and cache

* Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
* Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
* Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.

* [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py

* Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
* Enhanced overall code formatting to align with project standards.

* [Enhancement] Add bfloat16 test case and improve kernel caching logic

* Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
* Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
* Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
* Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
* Improved code formatting and readability across several files.

* lint fix

* Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
parent 43bd9d3e
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#if defined(__linux__)
#include <sys/stat.h>
#endif
......@@ -14,108 +11,6 @@
namespace tvm {
namespace codegen {
#define HIPRTC_CALL(x) \
\
{ \
\
hiprtcResult result = x; \
\
if (result != HIPRTC_SUCCESS) { \
\
LOG(FATAL) \
<< "HiprtcError: " #x " failed with error: " \
<< hiprtcGetErrorString(result); \
\
\
} \
\
\
}
static std::string FindHIPIncludePath() {
#if defined(_WIN32)
const std::string delimiter = "\\";
#else
const std::string delimiter = "/";
#endif
std::string hip_include_path;
const char *hip_path_env = std::getenv("HIP_PATH");
if (hip_path_env != nullptr) {
hip_include_path += hip_path_env;
hip_include_path += delimiter + "include";
return hip_include_path;
}
#if defined(__linux__)
struct stat st;
hip_include_path = "/opt/rocm/hip/include";
if (stat(hip_include_path.c_str(), &st) == 0) {
return hip_include_path;
}
if (stat("/usr/include/hip/hip_runtime.h", &st) == 0) {
return "/usr/include/hip";
}
#endif
LOG(FATAL) << "Cannot find HIP include path."
<< "HIP_PATH is not set or ROCm is not installed in the default "
"installation path."
<< "In other than linux, it is necessary to set HIP_PATH.";
return hip_include_path;
}
static std::string HIPRTCCompile(const std::string &code,
bool include_path = false) {
std::vector<std::string> compile_params;
std::vector<const char *> param_cstrings{};
hiprtcProgram prog;
std::string cc =
"gfx900"; // Default target architecture (can be changed as needed)
int major, minor;
hipError_t e1 = hipDeviceGetAttribute(
&major, hipDeviceAttributeComputeCapabilityMajor, 0);
hipError_t e2 = hipDeviceGetAttribute(
&minor, hipDeviceAttributeComputeCapabilityMinor, 0);
if (e1 == hipSuccess && e2 == hipSuccess) {
cc = "gfx" + std::to_string(major * 100 + minor * 10);
} else {
LOG(WARNING) << "cannot detect compute capability from your device, "
<< "fall back to gfx900.";
}
compile_params.push_back("--gpu-architecture=" + cc);
if (include_path) {
std::string include_option = "--include-path=" + FindHIPIncludePath();
compile_params.push_back(include_option);
}
for (const auto &string : compile_params) {
param_cstrings.push_back(string.c_str());
}
HIPRTC_CALL(
hiprtcCreateProgram(&prog, code.c_str(), nullptr, 0, nullptr, nullptr));
hiprtcResult compile_res =
hiprtcCompileProgram(prog, param_cstrings.size(), param_cstrings.data());
size_t log_size;
HIPRTC_CALL(hiprtcGetProgramLogSize(prog, &log_size));
std::string log;
log.resize(log_size);
HIPRTC_CALL(hiprtcGetProgramLog(prog, &log[0]));
ICHECK_EQ(compile_res, HIPRTC_SUCCESS) << log;
size_t code_size;
HIPRTC_CALL(hiprtcGetCodeSize(prog, &code_size));
std::string code_out;
code_out.resize(code_size);
HIPRTC_CALL(hiprtcGetCode(prog, &code_out[0]));
HIPRTC_CALL(hiprtcDestroyProgram(&prog));
return code_out;
}
static std::unordered_map<std::string, runtime::FunctionInfo>
ExtractFuncInfo(const IRModule &mod) {
std::unordered_map<std::string, runtime::FunctionInfo> fmap;
......@@ -173,13 +68,36 @@ runtime::Module BuildTileLangHIP(IRModule mod, Target target) {
if (ptx[0] != '/')
fmt = "hsaco";
} else {
ptx = HIPRTCCompile(code, false);
ICHECK(false) << "tilelang_callback_hip_compile is not set";
}
return ROCMModuleCreate(ptx, fmt, ExtractFuncInfo(mod), code, std::string());
}
String BuildTileLangHIPWithoutCompile(IRModule mod, Target target) {
using tvm::runtime::Registry;
bool output_ssa = false;
CodeGenTileLangHIP cg;
cg.Init(output_ssa);
for (auto kv : mod->functions) {
ICHECK(kv.second->IsInstance<PrimFuncNode>())
<< "CodeGenTileLangHIP: Can only take PrimFunc";
auto f = Downcast<PrimFunc>(kv.second);
auto calling_conv = f->GetAttr<Integer>(tvm::attr::kCallingConv);
ICHECK(calling_conv == CallingConv::kDeviceKernelLaunch);
cg.AddFunction(f);
}
std::string code = cg.Finish();
if (const auto *f = Registry::Get("tilelang_callback_hip_postproc")) {
code = (*f)(code, target).operator std::string();
}
return String(code);
}
TVM_REGISTER_GLOBAL("target.build.tilelang_hip")
.set_body_typed(BuildTileLangHIP);
TVM_REGISTER_GLOBAL("target.build.tilelang_hip_without_compile")
.set_body_typed(BuildTileLangHIPWithoutCompile);
} // namespace codegen
} // namespace tvm
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/target/utils.cc
* \brief helper functions for target attributes.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
/*!
* \file tl/target/utils.h
* \brief helper functions for target attributes.
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <cuda_runtime.h>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <cuda.h>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <cute/numeric/numeric_types.hpp>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#if (defined(__CUDA_ARCH_LIST__) && (__CUDA_ARCH_LIST__ >= 900))
#include "gemm_sm90.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <cutlass/cutlass.h>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <cute/algorithm/clear.hpp>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <cute/arch/mma_sm80.hpp>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include <ck_tile/core.hpp>
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
\ No newline at end of file
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
#pragma once
#include "common.h"
......