Unverified commit e9022290, authored by Tim Moon, committed by GitHub

Support for NVRTC kernels (#138)



* Initial implementation of NVRTC infrastructure
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Initial NVRTC impl for transpose

NVRTC gives compilation errors at runtime. Everything else compiles and passes tests as expected.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug NVRTC transpose impl

NVRTC kernel compiles, runs, and passes tests with FP32.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use variadic template for kernel arguments in RTC kernel launch func
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Refactoring

Added utility header for CUDA Runtime API. Optimized concat_strings function.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add helper function for regex substitutions in strings
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add option to disable NVRTC support
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add support for header includes in NVRTC kernels
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Access lazily-initialized CUDA driver lib and add option to specify CUDA header dir
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Configure NVRTC transpose kernel with simple perf model
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Revert change to tests
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Style fixes
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add prime-valued test cases
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix multiple definition error
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Optimize NVRTC transpose kernel for small data sizes
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Mention NVRTC in docs
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add unit tests for NVRTC and string utils
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add comment in install docs about NVRTC

Review suggestion from @nouiz
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug perf model for RTC transpose kernel
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove NVRTC discussion from docs
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Require CUDA headers unless NVRTC is explicitly disabled
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use diagonal coords in transpose kernel to avoid partition camping
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use std::call_once for thread-safety
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Minor fixes
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug CMake error
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove unnecessary call_once
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove diagonal coordinates from transpose kernel
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use size_t indices instead of int
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Review suggestions from @ptrendx

Check build-time CUDA include path for run-time CUDA headers. Handle case where CUDA context is initially uninitialized.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
parent 0d251991
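
The commit message mentions a variadic template for kernel arguments in the RTC kernel launch function. The actual implementation is not shown on this page, so the following is only a minimal sketch of the general technique, assuming the launch goes through a driver-style interface that takes an array of pointers to the arguments (the shape of the `kernelParams` argument of `cuLaunchKernel`). The name `pack_kernel_args` is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not from the commit): collect the address of each
// kernel argument into a fixed-size array of void*, which is the layout a
// driver-style launch call expects for its parameter list.
template <typename... ArgTs>
std::array<void *, sizeof...(ArgTs)> pack_kernel_args(ArgTs &... args) {
  // Arguments are taken by reference, so the stored addresses point at the
  // caller's variables and stay valid as long as those variables are alive.
  return {const_cast<void *>(static_cast<const void *>(&args))...};
}
```

A caller would keep the arguments alive until the launch call returns, then pass `ptrs.data()` as the parameter array. Taking the arguments by reference rather than by value avoids storing pointers to temporaries.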
/*************************************************************************
* Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
*
* See LICENSE for license information.
************************************************************************/
#ifndef TRANSFORMER_ENGINE_COMMON_UTIL_SYSTEM_H_
#define TRANSFORMER_ENGINE_COMMON_UTIL_SYSTEM_H_
#include <string>
#include "../common.h"
namespace transformer_engine {
/*! \brief Get environment variable and convert to type
*
* If the environment variable is unset or empty, a falsy value is
* returned.
*/
template <typename T = std::string>
T getenv(const std::string &variable);
/*! \brief Get environment variable and convert to type */
template <typename T = std::string>
T getenv(const std::string &variable, const T &default_value);
/*! \brief Check if a file exists and can be read */
bool file_exists(const std::string &path);
} // namespace transformer_engine
#endif // TRANSFORMER_ENGINE_COMMON_UTIL_SYSTEM_H_
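
The header above only declares the typed `getenv` helpers; their definitions are not shown on this page. As an illustration of the documented behavior (a falsy value or the supplied default when the variable is unset or empty), here is a minimal sketch built on `std::getenv` and `std::istringstream`. The namespace `sketch` and the name `getenv_as` are hypothetical, and returning the default on a parse failure is an assumption rather than something the header states.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical helper: read an environment variable and convert it to T.
// Returns default_value if the variable is unset or empty. Returning
// default_value on a failed conversion is an assumption of this sketch.
template <typename T = std::string>
T getenv_as(const std::string &variable, const T &default_value = T()) {
  const char *raw = std::getenv(variable.c_str());
  if (raw == nullptr || raw[0] == '\0') {
    return default_value;
  }
  if constexpr (std::is_same_v<T, std::string>) {
    // Keep the full string, including any embedded whitespace.
    return std::string(raw);
  } else {
    std::istringstream stream(raw);
    T value;
    if (!(stream >> value)) {
      return default_value;
    }
    return value;
  }
}

}  // namespace sketch
```

With a value-initialized default, `getenv_as<int>` yields `0` and `getenv_as<std::string>` yields an empty string for an unset variable, which matches the "falsy value" wording in the header comment.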
@@ -9,9 +9,21 @@
#include <cuda_bf16.h>
#include <cuda_fp16.h>
#include <cstdint>
#include <cassert>
#include <cuda_fp8.h>
#if !defined(__CUDACC_RTC__)
#include <cstdint>
#else
// Importing C++ standard headers is a pain with NVRTC
using uint8_t = unsigned char;
using uint16_t = unsigned short int; // NOLINT(*)
using uint32_t = unsigned int;
using uint64_t = unsigned long long int; // NOLINT(*)
static_assert(sizeof(uint8_t) == 1);
static_assert(sizeof(uint16_t) == 2);
static_assert(sizeof(uint32_t) == 4);
static_assert(sizeof(uint64_t) == 8);
#endif
////////////////////////////////////////////////////////////////////////////////////////////////////
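
The hunk above declares the fixed-width integer names by hand when compiling under NVRTC and guards them with `static_assert`. A host-side sketch of the same idea is shown here for illustration: it places the aliases in a hypothetical namespace `rtc_types` (to avoid clashing with `<cstdint>`) and checks them against the standard types. On LP64 Linux, `std::uint64_t` is typically `unsigned long` rather than `unsigned long long`, so the check compares size and signedness instead of requiring identical types.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical namespace for this sketch; the diff defines these names at
// global scope, which is only safe there because <cstdint> is not included.
namespace rtc_types {
using uint8_t = unsigned char;
using uint16_t = unsigned short int;      // NOLINT(*)
using uint32_t = unsigned int;
using uint64_t = unsigned long long int;  // NOLINT(*)
}  // namespace rtc_types

// True if A and B have the same size and are both unsigned integers.
template <typename A, typename B>
constexpr bool same_layout() {
  return sizeof(A) == sizeof(B) && std::is_unsigned<A>::value &&
         std::is_unsigned<B>::value;
}

// Compile-time check that the hand-written aliases agree with <cstdint>.
constexpr bool rtc_aliases_match_cstdint() {
  return CHAR_BIT == 8 &&
         same_layout<rtc_types::uint8_t, std::uint8_t>() &&
         same_layout<rtc_types::uint16_t, std::uint16_t>() &&
         same_layout<rtc_types::uint32_t, std::uint32_t>() &&
         same_layout<rtc_types::uint64_t, std::uint64_t>();
}

static_assert(rtc_aliases_match_cstdint(),
              "hand-written aliases disagree with <cstdint> on this platform");
```

The `static_assert` lines in the diff serve the same purpose inside the NVRTC build: if a target ever used different widths for the built-in types, compilation would fail instead of silently miscomputing.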
@@ -44,7 +56,7 @@ struct Sum {
 template<typename T>
 inline __device__ T warp_shuffle_xor(const T & x, uint32_t idx) {
-    return __shfl_xor_sync(uint32_t(-1), x, idx);
+    return __shfl_xor_sync(static_cast<uint32_t>(-1), x, idx);
 }
 template<>
@@ -54,7 +66,7 @@ inline __device__ float2 warp_shuffle_xor<float2>(const float2 & x, uint32_t idx
 template<typename T>
 inline __device__ T warp_shuffle_down(const T & x, uint32_t idx) {
-    return __shfl_down_sync(uint32_t(-1), x, idx);
+    return __shfl_down_sync(static_cast<uint32_t>(-1), x, idx);
 }
 template<>
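
In the shuffle wrappers above, `static_cast<uint32_t>(-1)` evaluates to `0xFFFFFFFF`, a mask that names all 32 lanes of a warp as participants. To illustrate how an XOR shuffle is typically used, the sketch below models it on the host: each lane reads the value held by lane `lane ^ lane_mask`, and repeating this with offsets 16, 8, 4, 2, 1 leaves every lane with the sum over the warp. This is a plain C++ model with hypothetical names (`shuffle_xor`, `warp_all_reduce_sum`), not device code from the commit.

```cpp
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;

// Converting -1 to an unsigned 32-bit type sets every bit.
constexpr std::uint32_t kFullMask = static_cast<std::uint32_t>(-1);
static_assert(kFullMask == 0xFFFFFFFFu, "full mask covers all 32 lanes");

using Warp = std::array<float, kWarpSize>;

// Host-side model of an XOR shuffle with all lanes participating:
// lane i receives the value held by lane (i ^ lane_mask).
// Assumes lane_mask < kWarpSize.
inline Warp shuffle_xor(const Warp &lanes, std::uint32_t lane_mask) {
  Warp out{};
  for (std::uint32_t lane = 0; lane < kWarpSize; ++lane) {
    out[lane] = lanes[lane ^ lane_mask];
  }
  return out;
}

// Butterfly all-reduce: after log2(32) = 5 exchange steps, every lane
// holds the sum of all 32 inputs.
inline Warp warp_all_reduce_sum(Warp lanes) {
  for (std::uint32_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
    const Warp partner = shuffle_xor(lanes, offset);
    for (int lane = 0; lane < kWarpSize; ++lane) {
      lanes[lane] += partner[lane];
    }
  }
  return lanes;
}
```

A down-shuffle reduction follows the same pattern but reads from lane `i + offset`, so only lane 0 is guaranteed to end up with the full result; that is why code using it usually broadcasts from lane 0 afterwards.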
@@ -605,8 +617,8 @@ inline __device__ void warp_chan_upd_dynamic(T &m_a, T &m2_a, T &n_a, int num_ac
         m2_a = m2_ab;
     }
     // Intra-warp broadcast (only lane 0 has valid stats).
-    m_a = __shfl_sync(uint32_t(-1), m_a, 0);
-    m2_a = __shfl_sync(uint32_t(-1), m2_a, 0);
+    m_a = __shfl_sync(static_cast<uint32_t>(-1), m_a, 0);
+    m2_a = __shfl_sync(static_cast<uint32_t>(-1), m2_a, 0);
 }
 ////////////////////////////////////////////////////////////////////////////////////////////////////
......