Unverified commit 3e9711f0 authored by arai713, committed by GitHub

CK Instance Gen (#1145)



* Format

* Format

* Format

* Remove const

* Use the right template

* Format

* Format

* add row/col instances

* Add missing file

* fixed

* fixing block to etile error

* Format

* Updates

* Format

* fixed rrr layout

* generating a sample JSON file: currently contains includes, prologue/epilogue and instances

* version where the json is passed into the instances to generate a key

* updated run function to just launch kernel

* updated run function: only contains kernel object, json file is updated but still needs to be cleaned up, added front-end API to parse JSON into character buffer

* adding in testing files

* cleaned up comments, still need to work on including header files

* removed unneeded files

* removed/commented out JSON implementation

* added fusion(prologue/epilogue) into instance generation

* working on instance selection

* added instance selection, need to fix instance validation

* removed block2etile map validity check for testing purposes

* test running: failing due to incorrect files/input

* all grid descs/ptrs completed, but device file not found

* Update test and embed modules

* Restore older version

* added convolution operation, written test, debugging generated code for compilation

* attempting to include CK in host directory: _Float16 error

* CK header file issues

* slight fix

* don't crash when hip can't report total memory

* dump generated code to a file

* changing sizes

* creating tensor descriptors using CK methods: set up grid desc manually, also trying to set up an argument pointer - this needs to be fixed

* some fixes to call the device code

* separating test files for conv and gemm

* completed arg ptr, now have linking errors

* clang format fix

* resolved linker issues in conv test

* remove dependency on libutility from ck

* resolved num dim error

* properly passing arg ptr, errors with passing typenames: redefinition/redeclaration

* undo the commenting of device function

* hand created kernel code to find rtc issues

* dump the full src to file

* resolved redeclaration errors, cleaned up errors for Amber's kernel code

* debugging purposes: redeclaration error

* config files

* resolved errors for NumTensor and redeclaration, formatted version.h

* resolved most errors in manually added kernel and my own. error with calling kernel object: overloaded function type

* WIP: close to getting kernel compiled

* WIP: fixing rtc errors

* fixed sequence errors, formatting, still one error with run fcn

* yay: kernel compiles and runs

* updated templated/generated version to run and compile

* minor fixes

* working generated example, resolved memory access error due to padding

* adding in reference kernel, validation failing against reference

* debugging: printing kernel argsz

* reduced error in results

* debugged reference kernel and output errors, added to generated version, currently debugging prologue function issues

* working validation (using reference convolution) with prologue function for both hard-coded and generated version

* WIP: create an alt version that creates Argument on the device

* wip: added new duplicate files, fixed fusion templating errors from working example, setting up kernel arguments

* wip: making necessary methods device code

* added grid descs, working on grid pointers, errors with stl numerics

* wip: updating kernel args - issue, replacing some std functions

* replaced std::accumulate call with temp hardcoded version

* wip: args causing memory issue

* Construct Argument object inside the kernel and use it to call convolution device function. Code runs and verification passes

* adding object file dump

* temporary hardcoding of grid size, can remove device op inst + arg ptr

* minor fix for grid size

* added modified example where arg ptr is created on the device for generated version as well

* removed device op instance and arg ptr from modified examples

* moving device op file for testing purposes and to properly build CK

* commenting out print-outs

* adjust compiler args to produce a valid ELF file

* temporary removal of validation

* reverting compiler args back for working example

* retrieve necessary arguments from generated template parameters in correct format

* calculating grid size on host-side, still need to clean up process, pass parameters to host functions properly

* scaled up factory functions/wrapper structs to implement host-side launch parameter calculations using CK host side functions - in hard-coded example

* temporary change to generate ELF format binary object file

* removed unnecessary code, added comments

* formatting fix

* cleaned up code, added new tests, restructured library: move helper into CK

* refactored launch parameter calculation to be more concise

* renamed files and variables for more clarity/uniformity

* more code cleaning, removed debug statements

* moved majority of my files into codegen directory, running properly

* updated Embed.cmake(string_view) in codegen directory

* updated host directory to match Embed.cmake as well

* added old tests in

* updated instance generation methods to be more concise

* removed layout from launch parameter calculation

* working test

* fixed issue with verification, all instances working

* updated verification in other tests

* removed duplicate matrix padder file, removed code dumps

* removed old hard-coded tests

* removed old host directory, all files in codegen directory now

* fixed copyright in files

* commenting out validation

* renamed files

* made changes for review: fixed copyright, renamed files for clarity, removed comments, refactored code

* updated headers

* removing duplicate file for fwd conv to gemm, merging with original file

* fix building codegen with clang++ directly

* resolving build error from conv_fwd_to_gemm

* fix for previous error

* renaming tests

* created common test file

* cleaned up code, added comments

* renamed device op

* fixed typos in comments

* removed extra space

* code cleanup: resolving Amber's comments

* removed wrapper struct for matrix padder, fixed template

* cleaned up if statements for better readability

---------
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: M. Amber Hassaan <amber_474@yahoo.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
parent cb138394
cmake_minimum_required(VERSION 3.16)
project(composable_kernel_host LANGUAGES CXX HIP)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
@@ -12,6 +12,22 @@ find_package(ROCM)
include(ROCMInstallTargets)
include(ROCMTest)

add_compile_options(-std=c++17)

find_package(hip)

## HIP
set(CMAKE_HIP_PLATFORM amd)
set(CMAKE_HIP_COMPILER ${CMAKE_CXX_COMPILER})
set(CMAKE_HIP_EXTENSIONS ON)
message("CMAKE_HIP_COMPILER: ${CMAKE_HIP_COMPILER}")

# add include directories
include_directories(BEFORE
    ${PROJECT_BINARY_DIR}/include
    ${PROJECT_SOURCE_DIR}/include
    ${PROJECT_SOURCE_DIR}/library/include
    ${HIP_INCLUDE_DIRS}
)

list(APPEND CMAKE_MODULE_PATH ${CK_ROOT}/cmake)
include(Embed)
file(GLOB_RECURSE KERNEL_FILES CONFIGURE_DEPENDS
@@ -20,8 +36,6 @@ message(STATUS "KERNEL_FILES: ${KERNEL_FILES}")
message(STATUS "RELATIVE: ${CK_ROOT}/include")
add_embed_library(ck_headers ${KERNEL_FILES} RELATIVE ${CK_ROOT}/include)

file(GLOB SOURCES CONFIGURE_DEPENDS src/*.cpp)
# TODO: Use object library
add_library(ck_host STATIC ${SOURCES})
......
@@ -5,24 +5,27 @@
#include <unordered_map>
#include <vector>

#include "ck/host/device_gemm_multiple_d/operation.hpp"
#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_op.hpp"
#include "ck/host/stringutils.hpp"

using ck::host::Transform;

struct Emitters
{
    // retrieve the hard-coded instances provided, template them, and then store them in a map
    std::unordered_map<std::string, std::function<std::vector<std::string>()>> m;

    template <class T>
    void Register(const std::string& name, const std::string& prologue, const std::string& epilogue)
    {
        // capture the fusion strings by value so the stored lambda cannot dangle
        // once Register's reference parameters go out of scope
        m[name] = [=] {
            auto configs = T::CreateOperations(prologue, epilogue);
            return Transform(configs, [](const auto& ops) { return ToTuple(ops); });
        };
    }

    // takes in an operation instance and uses it to substitute the correct values into the template
    template <class T>
    static std::string ToTuple(const T& ops)
    {
@@ -31,6 +34,7 @@ struct Emitters
        return "std::tuple<\n" + ck::host::JoinStrings(templates, ",\n") + ">";
    }

    // Join together all the strings in the map
    std::string Emit(const std::string& name) { return ck::host::JoinStrings(m.at(name)(), "\n"); }

    std::vector<std::string> List() const
@@ -43,9 +47,38 @@ int main(int argc, const char* argv[])
{
    std::string prog = argv[0];
    std::vector<std::string> args(argv + 1, argv + argc);

    // Specify problem type and problem size
    ck::host::device_gemm_multiple_d::Problem prob;
    prob.M = 1024;
    prob.N = 1024;
    prob.K = 1024;

    // user-provided fusion
    std::string prologue = "";
    std::string epilogue = R"(
struct Epilogue
{
    __host__ __device__ Epilogue(float alpha, float beta) : alpha_(alpha), beta_(beta){};

    template <typename E, typename D>
    __host__ __device__ constexpr void operator()(E& e, const D& d) const;

    template <>
    __host__ __device__ constexpr void operator()<ck::half_t, ck::half_t>(ck::half_t& e,
                                                                          const ck::half_t& d) const
    {
        e = ck::type_convert<ck::half_t>(alpha_ * e + beta_ * ck::type_convert<float>(d));
    }

    float alpha_;
    float beta_;
};)";

    // Load the operations into the Register
    Emitters e;
    e.Register<ck::host::device_gemm_multiple_d::Operation_Xdl_CShuffle>(
        "DeviceGemmMultipleD_Xdl_CShuffle", prologue, epilogue);

    if(args.empty() or std::any_of(args.begin(), args.end(), [](auto arg) {
           return arg == "-h" or arg == "--help";
@@ -64,6 +97,7 @@ int main(int argc, const char* argv[])
        return 0;
    }

    // print out all the instances for the operation that was chosen at the command line
    for(auto name : args)
        std::cout << e.Emit(name) << std::endl;
......
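For orientation, here is a minimal sketch of a downstream consumer of the Emitters front end; the output file name, alias, and helper function are illustrative assumptions, not part of this commit:

// Hypothetical consumer: write the emitted instance tuple into a header that
// device code can include later. Assumes the Emitters struct defined above.
#include <fstream>
#include <iostream>

void write_instances(Emitters& e)
{
    std::ofstream out("gemm_instances.hpp"); // hypothetical output path
    out << "using gemm_instances = " << e.Emit("DeviceGemmMultipleD_Xdl_CShuffle") << ";\n";
    for(const auto& name : e.List())
        std::cout << "registered: " << name << "\n";
}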
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#pragma once
......
@@ -14,10 +14,15 @@ namespace ck {
namespace host {
namespace device_gemm_multiple_d {

// defines all values needed for an instance of the GEMM operation
struct Operation_Xdl_CShuffle
{
    // returns a vector of instances, only given fusion operators: will use default problem spec
    static std::vector<std::vector<Operation_Xdl_CShuffle>>
    CreateOperations(const std::string& prologue, const std::string& epilogue);
    // returns a vector of instances, given a problem spec and fusion operators
    static std::vector<Operation_Xdl_CShuffle>
    CreateOperations(const Problem& prob, const std::string& prologue, const std::string& epilogue);

    TensorDesc A{};
    TensorDesc B{};
    DataType acc = DataType::Float;
@@ -27,13 +32,21 @@ struct Operation_Xdl_CShuffle
    std::string a_elem_op   = PassThrough;
    std::string b_elem_op   = PassThrough;
    std::string cde_elem_op = Bilinear;
    std::string prologue    = "";
    std::string epilogue    = "";
    std::string gemm_specialization = "ck::tensor_operation::device::GemmSpecialization::Default";

    // tuning parameters
    operation::TileDesc tile_desc{};
    operation::BlockTransferDesc a_block_transfer{};
    operation::BlockTransferDesc b_block_transfer{};
    operation::CShuffleDesc cshuffle{};
    operation::CBlockTransferDesc c_block_transfer{};

    // functions to update fusion operators if provided
    void update_prologue(const std::string& prologue);
    void update_epilogue(const std::string& epilogue);
    /**constexpr**/ bool IsSupported(std::size_t MRaw_, std::size_t NRaw_, std::size_t KRaw_);

    // returns a templated instance
    Solution ToSolution() const;
};
......
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#pragma once
@@ -12,11 +12,14 @@ namespace ck {
namespace host {
namespace device_gemm_multiple_d {

// defines the problem specification for a GEMM operation
struct Problem
{
    // dimensions for GEMM operation
    std::size_t M = 0;
    std::size_t N = 0;
    std::size_t K = 0;
    // layouts for tensors
    bool TransA = false;
    bool TransB = false;
    bool TransE = false;
@@ -29,9 +32,13 @@ struct Problem
    std::string AElementOp   = PassThrough;
    std::string BElementOp   = PassThrough;
    std::string CDEElementOp = PassThrough;

    // returns the correct device op file for the operation
    std::string GetIncludeHeader() const;
    // returns a list of instances based on the problem spec and provided fusion operations
    std::vector<Solution> GetSolutions(const std::string& arch,
                                       const std::string& prologue,
                                       const std::string& epilogue) const;
};

} // namespace device_gemm_multiple_d
......
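A minimal sketch of the problem-driven flow this header exposes — the architecture string and GEMM sizes are assumptions for illustration:

// Hypothetical caller of Problem::GetSolutions, assuming an xdlops-capable
// target; "gfx90a" and the 1024^3 problem are illustrative.
#include "ck/host/device_gemm_multiple_d/problem.hpp"
#include <iostream>

int main()
{
    ck::host::device_gemm_multiple_d::Problem prob;
    prob.M = 1024;
    prob.N = 1024;
    prob.K = 1024;

    // empty prologue/epilogue strings fall back to the default elementwise ops
    auto solutions = prob.GetSolutions("gfx90a", "", "");
    std::cout << solutions.size() << " candidate instances\n";
    return 0;
}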
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.
#pragma once
#include <cstdlib>
#include <vector>
#include <string>
#include "ck/host/types.hpp"
#include "ck/host/operation/gemm.hpp"
#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_problem.hpp"
namespace ck {
namespace host {
namespace conv {
// defines the values needed for an instance of forward convolution and functions to return
// (templated) instances
struct Operation_Conv_Fwd_Xdl_Cshuffle
{
// returns a vector of instances given the fusion operations, uses default values for problem
// spec
static std::vector<Operation_Conv_Fwd_Xdl_Cshuffle>
CreateOperations(const std::string& prologue, const std::string& epilogue);
// returns a vector of instances, provided with a problem spec and fusion operations
static std::vector<Operation_Conv_Fwd_Xdl_Cshuffle> CreateOperations(
const Problem_Conv_Fwd& prob, const std::string& prologue, const std::string& epilogue);
std::size_t NumDim = 0;
TensorDesc A{};
TensorDesc B{};
DataType acc = DataType::Float;
DataType cs_type = DataType::Half;
std::vector<TensorDesc> Ds = {};
TensorDesc E{};
std::string a_elem_op = PassThrough;
std::string b_elem_op = PassThrough;
std::string cde_elem_op = PassThrough;
std::string prologue = "";
std::string epilogue = "";
std::string conv_specialization =
"ck::tensor_operation::device::ConvolutionForwardSpecialization::Default";
std::string gemm_specialization =
"ck::tensor_operation::device::GemmSpecialization::MNKPadding";
// tuning parameters
operation::TileDesc tile_desc{};
operation::BlockTransferDesc a_block_transfer{};
operation::BlockTransferDesc b_block_transfer{};
operation::CShuffleDesc cshuffle{};
operation::CBlockTransferDesc c_block_transfer{};
// functions to update fusion operations if they are provided
void update_prologue(const std::string& prologue);
void update_epilogue(const std::string& epilogue);
// returns a templated instance
Solution ToSolution() const;
};
} // namespace conv
} // namespace host
} // namespace ck
// SPDX-License-Identifier: MIT
// Copyright (c) 2024, Advanced Micro Devices, Inc. All rights reserved.
#pragma once
#include <cstdlib>
#include <vector>
#include <memory>
#include <sstream>
#include <iterator>
#include <numeric>
#include "ck/host/types.hpp"
namespace ck {
namespace host {
namespace conv {
// defines the problem specification for a forward convolution operation
struct Problem_Conv_Fwd
{
std::size_t NumDim = 0;
// size of a forward convolution operation
std::size_t G = 0;
std::size_t N = 0;
std::size_t C = 0;
std::size_t Hi = 0;
std::size_t Wi = 0;
std::size_t Ho = 0;
std::size_t Wo = 0;
std::size_t K = 0;
std::size_t Y = 0;
std::size_t X = 0;
Layout ALayout = Layout::NHWGC;
Layout BLayout = Layout::GKYXC;
Layout ELayout = Layout::NHWGK;
std::vector<Layout> DsLayout = {};
DataType ADataType = DataType::Half;
DataType BDataType = DataType::Half;
DataType EDataType = DataType::Half;
std::vector<DataType> DsDataType = {};
std::string AElementOp = "ck::tensor_operation::element_wise::PassThrough";
std::string BElementOp = "ck::tensor_operation::element_wise::PassThrough";
std::string CDEElementOp = "ck::tensor_operation::element_wise::PassThrough";
// returns the correct device op file for the operation
std::string GetIncludeHeader() const;
// returns a list of instances based on the problem spec and provided fusion operations
std::vector<Solution> GetSolutions(const std::string& arch,
const std::string& prologue,
const std::string& epilogue) const;
};
} // namespace conv
} // namespace host
} // namespace ck
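A hypothetical problem setup for the struct above — a 2D NHWGC forward convolution with illustrative sizes (stride 1 and "same" padding assumed, so Ho/Wo match Hi/Wi); none of these values come from the commit:

#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_problem.hpp"

ck::host::conv::Problem_Conv_Fwd make_sample_problem()
{
    ck::host::conv::Problem_Conv_Fwd prob;
    prob.NumDim = 2;   // 2D convolution
    prob.G      = 1;   // groups
    prob.N      = 128; // batch
    prob.C      = 192; // input channels
    prob.K      = 256; // output channels
    prob.Y      = 3;   // filter height
    prob.X      = 3;   // filter width
    prob.Hi     = 28;
    prob.Wi     = 28;
    prob.Ho     = 28;
    prob.Wo     = 28;
    return prob;
}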
@@ -4,7 +4,6 @@
#pragma once

#include <string>
#include <utility>
#include <unordered_map>
#include <vector>
......
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#pragma once
......
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#pragma once
......
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#pragma once
@@ -12,6 +12,7 @@
namespace ck {
namespace host {

// holds the templated instance; substitutes values into the template from the instances
struct Solution
{
@@ -33,6 +34,7 @@ struct Solution
    std::unordered_map<std::string, std::string> template_values;
};

// supported data types
enum class DataType
{
    Half,
@@ -40,22 +42,28 @@ enum class DataType
    Int8,
    Int32
};

std::string ToString(DataType dt);

// supported layouts: gemm and fwd conv
enum class Layout
{
    Row,
    Column,
    GKYXC,
    GKCYX,
    GNHWK,
    GNHWC,
    NHWGC,
    NHWGK
};

std::string ToString(Layout dl);
Layout ToLayout(bool Trans); // returns the layout for gemm

// supported GEMM types
enum class GemmType
{
    Default
};

std::string ToString(GemmType gt);

struct TensorDesc
......
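The enum-to-string mapping declared here is defined in types.cpp further down; a tiny illustration of the round trip, under the declarations above:

#include "ck/host/types.hpp"
#include <cassert>

int main()
{
    // layout names map to CK tensor-layout type strings
    assert(ck::host::ToString(ck::host::Layout::NHWGC) ==
           "ck::tensor_layout::convolution::NHWGC");
    assert(ck::host::ToLayout(/*Trans=*/true) == ck::host::Layout::Column);
    return 0;
}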
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#pragma once

#include <cstdint>
#include <unordered_set>
#include <numeric>
#include <iterator>

namespace ck {
namespace host {
@@ -12,6 +14,5 @@ namespace host {
std::size_t integer_divide_ceil(std::size_t x, std::size_t y);

const std::unordered_set<std::string>& get_xdlop_archs();

} // namespace host
} // namespace ck
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#include "ck/host/device_gemm_multiple_d/problem.hpp"
#include "ck/host/device_gemm_multiple_d/operation.hpp"
@@ -11,19 +11,24 @@ namespace ck {
namespace host {
namespace device_gemm_multiple_d {

// return the relevant device op file based on the operation
std::string Problem::GetIncludeHeader() const
{
    return "ck/tensor_operation/gpu/device/impl/device_gemm_multiple_d_xdl_cshuffle.hpp";
}

// returns templated instances when provided with a problem specification
std::vector<Solution> Problem::GetSolutions(const std::string& arch,
                                            const std::string& prologue,
                                            const std::string& epilogue) const
{
    if(get_xdlop_archs().count(arch) == 0)
        return {};
    auto ops = ck::host::device_gemm_multiple_d::Operation_Xdl_CShuffle::CreateOperations(
        *this, prologue, epilogue); // obtains vector of instances
    std::vector<Solution> result;
    std::transform(ops.begin(), ops.end(), std::back_inserter(result), [&](const auto& op) {
        return op.ToSolution(); // template instance with correct values
    });
    return result;
}
......
@@ -10,6 +10,7 @@ namespace ck {
namespace host {
namespace device_gemm_multiple_d {

// calculate appropriate Gemm Specialization based on input tensor dimensions
static std::string GetGemmSpec(const std::size_t m,
                               const std::size_t n,
                               const std::size_t k,
@@ -30,9 +31,40 @@ static std::string GetGemmSpec(const std::size_t m,
    return "ck::tensor_operation::device::GemmSpecialization::" + spec + "Padding";
}

// functions to update prologue/epilogue with a user-provided operation
void Operation_Xdl_CShuffle::update_prologue(const std::string& prologue)
{
    if(!prologue.empty())
    {
        this->prologue    = prologue;
        this->cde_elem_op = "CDEElementOp";
    }
    else
    {
        this->prologue = "";
    }
}

void Operation_Xdl_CShuffle::update_epilogue(const std::string& epilogue)
{
    if(!epilogue.empty())
    {
        this->epilogue    = epilogue;
        this->cde_elem_op = "CDEElementOp";
    }
    else
    {
        this->epilogue = "";
    }
}

// accounts for all possible combinations of Row/Col major
static Layout ToLayout(bool Trans) { return Trans ? Layout::Column : Layout::Row; }

// Hard-code tuning parameters in modularized fashion, string them together into a vector of
// instances
std::vector<Operation_Xdl_CShuffle> Operation_Xdl_CShuffle::CreateOperations(
    const Problem& prob, const std::string& prologue, const std::string& epilogue)
{
    std::vector<Operation_Xdl_CShuffle> result;
@@ -155,6 +187,7 @@ std::vector<Operation_Xdl_CShuffle> Operation_Xdl_CShuffle::CreateOperations(con
        // clang-format on
    };

    // choose correct arrangement of tuning parameters based on the layout of each tensor
    const auto a_block_descriptions =
        prob.TransA ? a_block_descriptions_colmajor : a_block_descriptions_rowmajor;
    const auto b_block_descriptions =
@@ -165,6 +198,7 @@ std::vector<Operation_Xdl_CShuffle> Operation_Xdl_CShuffle::CreateOperations(con
    assert(tile_descriptions.size() == cshuffle_descriptions.size());
    assert(tile_descriptions.size() == c_block_descriptions.size());

    // Put all values together into a single operation and store into the result vector
    for(std::size_t i = 0; i < tile_descriptions.size(); i++)
    {
        Operation_Xdl_CShuffle x;
@@ -188,12 +222,17 @@ std::vector<Operation_Xdl_CShuffle> Operation_Xdl_CShuffle::CreateOperations(con
                                             x.tile_desc.m_per_block,
                                             x.tile_desc.n_per_block,
                                             x.tile_desc.k_per_block);
        x.update_prologue(prologue);
        x.update_epilogue(epilogue);
        result.push_back(x);
    }
    return result;
}

// set up instances when not provided with a problem specification: use default operation values
// and all possible layout combinations
std::vector<std::vector<Operation_Xdl_CShuffle>>
Operation_Xdl_CShuffle::CreateOperations(const std::string& prologue, const std::string& epilogue)
{
    std::vector<Problem> problems;
    for(bool TransA : {true, false})
@@ -204,7 +243,8 @@ std::vector<std::vector<Operation_Xdl_CShuffle>> Operation_Xdl_CShuffle::CreateO
            prob.TransB = TransB;
            problems.push_back(prob);
        }
    return Transform(problems,
                     [&](const Problem& p) { return CreateOperations(p, prologue, epilogue); });
}

static const char* const DeviceGemmMultipleD_Xdl_CShuffleTemplate =
@@ -224,9 +264,20 @@ static const char* const DeviceGemmMultipleD_Xdl_CShuffleTemplate =
    "${CDEBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock}, "
    "${CDEBlockTransferScalarPerVector_NPerBlock}>";

// use hardcoded instances from the vector of operations to substitute values into the instance
// template
Solution Operation_Xdl_CShuffle::ToSolution() const
{
    std::unordered_map<std::string, std::string> values = {
        {"name",
         std::to_string(this->tile_desc.block_size) + "_" +
             std::to_string(this->tile_desc.m_per_block) + "_" +
             std::to_string(this->tile_desc.n_per_block) + "_" +
             std::to_string(this->tile_desc.k_per_block) + "_" +
             std::to_string(this->tile_desc.ak1) + "_" + std::to_string(this->tile_desc.bk1) + "_" +
             std::to_string(this->tile_desc.m_per_XDL) + "_" +
             std::to_string(this->tile_desc.n_per_XDL) + "_" +
             std::to_string(this->tile_desc.m_Xdl_per_wave) + "_" +
             std::to_string(this->tile_desc.n_Xdl_per_wave)},
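         // e.g. a 256-thread block with a 128x256x32 tile, AK1 = BK1 = 8, 32x32 XDL
         // and 4x2 waves would produce the key "256_128_256_32_8_8_32_32_4_2"
         // (values illustrative, not from this commit)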
{"LayoutA", ToString(this->A.layout)}, {"LayoutA", ToString(this->A.layout)},
{"LayoutB", ToString(this->B.layout)}, {"LayoutB", ToString(this->B.layout)},
{"LayoutDs", {"LayoutDs",
......
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_problem.hpp"
#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_op.hpp"
#include "ck/host/utils.hpp"
#include <algorithm>
#include <iostream>
namespace ck {
namespace host {
namespace conv {
// return the relevant device op file based on the operation
// NOTE: this is a modified version of the original CK file that calls the kernel from a device
// function and makes the Argument class accessible on the device
std::string Problem_Conv_Fwd::GetIncludeHeader() const
{
return "ck/tensor_operation/gpu/device/impl/"
"codegen_device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp";
}
// return vector of forward convolution instances when provided with a problem instance
std::vector<Solution> Problem_Conv_Fwd::GetSolutions(const std::string& arch,
const std::string& prologue,
const std::string& epilogue) const
{
if(get_xdlop_archs().count(arch) == 0)
return {};
auto ops = ck::host::conv::Operation_Conv_Fwd_Xdl_Cshuffle::CreateOperations(
*this, prologue, epilogue);
std::vector<Solution> result;
std::transform(ops.begin(), ops.end(), std::back_inserter(result), [&](const auto& op) {
return op.ToSolution();
});
return result;
}
} // namespace conv
} // namespace host
} // namespace ck
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_op.hpp"
#include <iostream>
#include "ck/host/stringutils.hpp"
#include "ck/host/utils.hpp"
#include <cassert>
namespace ck {
namespace host {
namespace conv {
// calculate appropriate Gemm Specification based on input tensor dimensions
// NOTE: in CK, MNKPadding is always used for forward convolution
static std::string GetGemmSpec(const std::size_t m,
const std::size_t n,
const std::size_t k,
const std::size_t m_per_block,
const std::size_t n_per_block,
const std::size_t k_per_block)
{
std::string spec = "";
if(integer_divide_ceil(m, m_per_block) * m_per_block - m != 0)
spec += "M";
if(integer_divide_ceil(n, n_per_block) * n_per_block - n != 0)
spec += "N";
if(integer_divide_ceil(k, k_per_block) * k_per_block - k != 0)
spec += "K";
if(spec == "")
return "ck::tensor_operation::device::GemmSpecialization::Default";
return "ck::tensor_operation::device::GemmSpecialization::" + spec + "Padding";
}
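// For illustration (sizes assumed, not from this commit):
//   GetGemmSpec(1024, 1024, 1024, 128, 256, 32) -> "...GemmSpecialization::Default",
//   since every dimension divides evenly into its per-block tile, while
//   GetGemmSpec(1000, 1024, 1024, 128, 256, 32) -> "...GemmSpecialization::MPadding",
//   because only M (1000) leaves a remainder against m_per_block (128).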
// function to update prologue/epilogue with user provided operation
void Operation_Conv_Fwd_Xdl_Cshuffle::update_prologue(const std::string& prologue)
{
if(!prologue.empty())
{
this->prologue = prologue;
this->cde_elem_op = "CDEElementOp";
}
else
{
this->prologue = "";
}
}
void Operation_Conv_Fwd_Xdl_Cshuffle::update_epilogue(const std::string& epilogue)
{
if(!epilogue.empty())
{
this->epilogue = epilogue;
this->cde_elem_op = "CDEElementOp";
}
else
{
this->epilogue = "";
}
}
// Hard-code tuning parameters in modularized fashion, string them together into a vector of
// instances
std::vector<Operation_Conv_Fwd_Xdl_Cshuffle> Operation_Conv_Fwd_Xdl_Cshuffle::CreateOperations(
const Problem_Conv_Fwd& prob, const std::string& prologue, const std::string& epilogue)
{
std::vector<Operation_Conv_Fwd_Xdl_Cshuffle> result;
std::vector<operation::TileDesc> tile_descriptions = {
// clang-format off
// Block| MPer| NPer| KPer| AK1| BK1| MPer| NPer| MXdl| NXdl| NumGemmK|
// Size| Block| Block| Block| | | XDL| XDL| Per| Per| Prefetch|
// | | | | | | | | Wave| Wave| Stage|
// | | | | | | | | | | |
{ 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, 1},
{ 256, 128, 256, 32, 8, 8, 32, 32, 4, 2, 1},
{ 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, 1},
{ 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, 1},
{ 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, 1},
{ 128, 128, 128, 32, 8, 8, 32, 32, 4, 2, 1}
// clang-format on
};
std::vector<operation::BlockTransferDesc> a_block_descriptions = {
// clang-format off
// ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockLds|
// ThreadCluster| ThreadCluster| SrcAccessOrder| SrcVectorDim| SrcScalar| DstScalar| AddExtraM|
// Lengths_K0_M_K1| ArrangeOrder| | | PerVector| PerVector_K1| |
// | | | | | | |
{ S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1},
{ S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1},
{ S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1},
{ S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1},
{ S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1},
{ S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1}
// clang-format on
};
std::vector<operation::BlockTransferDesc> b_block_descriptions = {
// clang-format off
// BBlockTransfer| BBlockTransfer| BBlockTransfer| BlockTransfer| BBlockTransfer| BBlockTransfer| BBlockLds|
// ThreadCluster| ThreadCluster| SrcAccessOrder| SrcVectorDim| SrcScalar| DstScalar| AddExtraN|
// Lengths_K0_N_K1| ArrangeOrder| | | PerVector| PerVector_K1| |
// | | | | | | |
{ S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1},
{ S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1},
{ S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1},
{ S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1},
{ S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1},
{ S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1}
// clang-format on
};
std::vector<operation::CShuffleDesc> cshuffle_descriptions = {
// clang-format off
// CShuffle| CShuffle|
// MXdlPerWave| NXdlPerWave|
// PerShuffle| PerShuffle|
// | |
{ 1, 1},
{ 1, 1},
{ 1, 1},
{ 1, 1},
{ 1, 1},
{ 1, 1}
// clang-format on
};
std::vector<operation::CBlockTransferDesc> c_block_descriptions = {
// clang-format off
// CBlockTransferClusterLengths| CBlockTransfer
// _MBlock_MWaveMPerXdl| ScalarPerVector
// _NBlock_NWaveNPerXdl| _NWaveNPerXdl
// |
{ S<1, 16, 1, 4>, 1},
{ S<1, 32, 1, 8>, 8},
{ S<1, 32, 1, 8>, 8},
{ S<1, 16, 1, 4>, 1},
{ S<1, 32, 1, 8>, 8},
{ S<1, 16, 1, 8>, 8}
// clang-format on
};
assert(tile_descriptions.size() == a_block_descriptions.size());
assert(tile_descriptions.size() == b_block_descriptions.size());
assert(tile_descriptions.size() == cshuffle_descriptions.size());
assert(tile_descriptions.size() == c_block_descriptions.size());
// Put all values together into a single operation and store into the result vector
for(std::size_t i = 0; i < tile_descriptions.size(); i++)
{
Operation_Conv_Fwd_Xdl_Cshuffle x;
x.NumDim = prob.NumDim;
x.tile_desc = tile_descriptions[i];
x.a_block_transfer = a_block_descriptions[i];
x.b_block_transfer = b_block_descriptions[i];
x.cshuffle = cshuffle_descriptions[i];
x.c_block_transfer = c_block_descriptions[i];
x.A = TensorDesc{prob.ADataType, prob.ALayout};
x.B = TensorDesc{prob.BDataType, prob.BLayout};
x.E = TensorDesc{prob.EDataType, prob.ELayout};
x.Ds = Transform(prob.DsLayout, prob.DsDataType, [](auto lo, auto dt) {
return TensorDesc{dt, lo};
});
x.a_elem_op = prob.AElementOp;
x.b_elem_op = prob.BElementOp;
x.cde_elem_op = prob.CDEElementOp;
x.update_prologue(prologue);
x.update_epilogue(epilogue);
result.push_back(x);
}
return result;
}
// set up instances when not provided with a problem specification, use default operation values
std::vector<Operation_Conv_Fwd_Xdl_Cshuffle>
Operation_Conv_Fwd_Xdl_Cshuffle::CreateOperations(const std::string& prologue,
const std::string& epilogue)
{
Problem_Conv_Fwd prob;
return CreateOperations(prob, prologue, epilogue);
}
static const char* const CopyDevice_ConvTemplate =
R"(
${Prologue}
${Epilogue}
using CDEElementOp = Epilogue;
using DeviceConv = ck::tensor_operation::device::CodegenDeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<${NumDim}, ${LayoutA}, ${LayoutB}, ${LayoutDs}, ${LayoutE}, ${ADataType}, ${BDataType}, ${AccDataType}, ${CShuffleDataType}, ${DsDataType}, ${EDataType}, ${AElementwiseOperation}, ${BElementwiseOperation}, ${CDEElementwiseOperation}, ${ConvSpecialization}, ${GemmSpecialization}, ${NumGemmkPrefetchStage}, ${BlockSize}, ${MPerBlock}, ${NPerBlock}, ${KPerBlock}, ${AK1}, ${BK1}, ${MPerXDL}, ${NPerXDL}, ${MXdlPerWave}, ${NXdlPerWave}, ${ABlockTransferThreadClusterLengths_AK0_M_AK1}, ${ABlockTransferThreadClusterArrangeOrder}, ${ABlockTransferSrcAccessOrder}, ${ABlockTransferSrcVectorDim}, ${ABlockTransferSrcScalarPerVector}, ${ABlockTransferDstScalarPerVector_AK1}, ${ABlockLdsExtraM}, ${BBlockTransferThreadClusterLengths_BK0_N_BK1}, ${BBlockTransferThreadClusterArrangeOrder}, ${BBlockTransferSrcAccessOrder}, ${BBlockTransferSrcVectorDim}, ${BBlockTransferSrcScalarPerVector}, ${BBlockTransferDstScalarPerVector_BK1}, ${BBlockLdsExtraN}, ${CShuffleMXdlPerWavePerShuffle}, ${CShuffleNXdlPerWavePerShuffle}, ${CDEBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock}, ${CDEBlockTransferScalarPerVector_NPerBlock}>;
constexpr ck::index_t NumATensor = ck::tensor_operation::device::GetNumABTensors<false, ${ADataType}>();
constexpr ck::index_t NumBTensor = ck::tensor_operation::device::GetNumABTensors<false, ${BDataType}>();
extern "C" __global__ void run_${name}(
const ${ADataType}* in_dev,
const ${BDataType}* wei_dev,
${EDataType}* __restrict__ out_dev,
ck::Array<ck::index_t, ${NumDim} + 3> in_lengths,
ck::Array<ck::index_t, ${NumDim} + 3> in_strides,
ck::Array<ck::index_t, ${NumDim} + 3> wei_lengths,
ck::Array<ck::index_t, ${NumDim} + 3> wei_strides,
ck::Array<ck::index_t, ${NumDim} + 3> out_lengths,
ck::Array<ck::index_t, ${NumDim} + 3> out_strides,
ck::Array<ck::index_t, ${NumDim}> conv_filter_strides,
ck::Array<ck::index_t, ${NumDim}> conv_filter_dilations,
ck::Array<ck::index_t, ${NumDim}> input_left_pads,
ck::Array<ck::index_t, ${NumDim}> input_right_pads,
const ${AElementwiseOperation} a_element_op,
const ${BElementwiseOperation} b_element_op,
const ${CDEElementwiseOperation} cde_element_op
){
auto arg = DeviceConv::Argument(in_dev,
wei_dev,
ck::Array<const void*, 0>{},
out_dev,
in_lengths,
in_strides,
wei_lengths,
wei_strides,
ck::Array<ck::Array<ck::index_t, ${NumDim} + 3>, 0>{},
ck::Array<ck::Array<ck::index_t, ${NumDim} + 3>, 0>{},
out_lengths,
out_strides,
conv_filter_strides,
conv_filter_dilations,
input_left_pads,
input_right_pads,
${AElementwiseOperation}{},
${BElementwiseOperation}{},
${CDEElementwiseOperation}{1.0f, 1.0f});
constexpr ck::LoopScheduler LoopSched = ck::make_default_loop_scheduler();
// GridwiseGemm
using GridwiseGemm = DeviceConv::GridwiseGemm;
static constexpr auto I0 = ck::Number<0>{};
ck::tensor_operation::device::device_grouped_conv_fwd_multiple_abd_xdl_cshuffle<
GridwiseGemm,
const ${ADataType}*,
const ${BDataType}*,
typename GridwiseGemm::DsGridPointer,
${EDataType},
${AElementwiseOperation},
${BElementwiseOperation},
${CDEElementwiseOperation},
DeviceConv::AGridDesc_AK0_M_AK1,
DeviceConv::BGridDesc_BK0_N_BK1,
DeviceConv::DsGridDesc_MBlock_MPerBlock_NBlock_NPerBlock,
DeviceConv::EGridDesc_MBlock_MPerBlock_NBlock_NPerBlock,
DeviceConv::Block2ETileMap,
ck::tensor_operation::device::ComputePtrOffsetOfStridedBatch<NumATensor, NumBTensor, 0>,
ck::integral_constant<bool, true>{},
false,
false>
(
arg.p_as_grid_.At(I0),
arg.p_bs_grid_.At(I0),
arg.p_ds_grid_,
arg.p_e_grid_,
arg.a_element_op_,
arg.b_element_op_,
arg.cde_element_op_,
arg.a_g_n_c_wis_lengths_[0], // Group count
arg.a_grid_desc_ak0_m_ak1_,
arg.b_grid_desc_bk0_n_bk1_,
arg.ds_grid_desc_mblock_mperblock_nblock_nperblock_,
arg.e_grid_desc_mblock_mperblock_nblock_nperblock_,
arg.block_2_etile_map_,
arg.compute_ptr_offset_of_batch_
);
}
)";
// use hardcoded instances from vector of operations to substitute values into instance template
Solution Operation_Conv_Fwd_Xdl_Cshuffle::ToSolution() const
{
std::unordered_map<std::string, std::string> values = {
{"name",
std::to_string(this->tile_desc.block_size) + "_" +
std::to_string(this->tile_desc.m_per_block) + "_" +
std::to_string(this->tile_desc.n_per_block) + "_" +
std::to_string(this->tile_desc.k_per_block) + "_" +
std::to_string(this->tile_desc.ak1) + "_" + std::to_string(this->tile_desc.bk1) + "_" +
std::to_string(this->tile_desc.m_per_XDL) + "_" +
std::to_string(this->tile_desc.n_per_XDL) + "_" +
std::to_string(this->tile_desc.m_Xdl_per_wave) + "_" +
std::to_string(this->tile_desc.n_Xdl_per_wave)},
{"NumDim", std::to_string(this->NumDim)},
{"LayoutA", ToString(this->A.layout)},
{"LayoutB", ToString(this->B.layout)},
{"LayoutDs",
MakeTuple(Transform(this->Ds, [](auto tensor) { return ToString(tensor.layout); }))},
{"LayoutE", ToString(this->E.layout)},
{"ADataType", ToString(this->A.element)},
{"BDataType", ToString(this->B.element)},
{"AccDataType", ToString(this->acc)},
{"ComputeDataType", ToString(this->A.element)},
{"CShuffleDataType", ToString(this->cs_type)},
{"DsDataType",
MakeTuple(Transform(this->Ds, [](auto tensor) { return ToString(tensor.element); }))},
{"EDataType", ToString(this->E.element)},
{"AElementwiseOperation", this->a_elem_op},
{"BElementwiseOperation", this->b_elem_op},
{"CDEElementwiseOperation", this->cde_elem_op},
{"Prologue", this->prologue},
{"Epilogue", this->epilogue},
{"ConvSpecialization", this->conv_specialization},
{"GemmSpecialization", this->gemm_specialization},
{"NumGemmkPrefetchStage", std::to_string(this->tile_desc.num_gemmk_prefetch_stage)},
{"BlockSize", std::to_string(this->tile_desc.block_size)},
{"MPerBlock", std::to_string(this->tile_desc.m_per_block)},
{"NPerBlock", std::to_string(this->tile_desc.n_per_block)},
{"KPerBlock", std::to_string(this->tile_desc.k_per_block)},
{"AK1", std::to_string(this->tile_desc.ak1)},
{"BK1", std::to_string(this->tile_desc.bk1)},
{"MPerXDL", std::to_string(this->tile_desc.m_per_XDL)},
{"NPerXDL", std::to_string(this->tile_desc.n_per_XDL)},
{"MXdlPerWave", std::to_string(this->tile_desc.m_Xdl_per_wave)},
{"NXdlPerWave", std::to_string(this->tile_desc.n_Xdl_per_wave)},
{"ABlockTransferThreadClusterLengths_AK0_M_AK1",
this->a_block_transfer.thread_cluster_length},
{"ABlockTransferThreadClusterArrangeOrder",
this->a_block_transfer.thread_cluster_arrange_order},
{"ABlockTransferSrcAccessOrder", this->a_block_transfer.src_access_order},
{"ABlockTransferSrcVectorDim", std::to_string(this->a_block_transfer.src_vec_dim)},
{"ABlockTransferSrcScalarPerVector",
std::to_string(this->a_block_transfer.src_scalar_per_vector)},
{"ABlockTransferDstScalarPerVector_AK1",
std::to_string(this->a_block_transfer.dst_scalar_per_vector_k1)},
{"ABlockLdsExtraM", std::to_string(this->a_block_transfer.lds_add_extra_dim)},
{"BBlockTransferThreadClusterLengths_BK0_N_BK1",
this->b_block_transfer.thread_cluster_length},
{"BBlockTransferThreadClusterArrangeOrder",
this->b_block_transfer.thread_cluster_arrange_order},
{"BBlockTransferSrcAccessOrder", this->b_block_transfer.src_access_order},
{"BBlockTransferSrcVectorDim", std::to_string(this->b_block_transfer.src_vec_dim)},
{"BBlockTransferSrcScalarPerVector",
std::to_string(this->b_block_transfer.src_scalar_per_vector)},
{"BBlockTransferDstScalarPerVector_BK1",
std::to_string(this->b_block_transfer.dst_scalar_per_vector_k1)},
{"BBlockLdsExtraN", std::to_string(this->b_block_transfer.lds_add_extra_dim)},
{"CShuffleMXdlPerWavePerShuffle",
std::to_string(this->cshuffle.m_Xdl_per_wave_per_shuffle)},
{"CShuffleNXdlPerWavePerShuffle",
std::to_string(this->cshuffle.n_Xdl_per_wave_per_shuffle)},
{"CDEBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock",
this->c_block_transfer.cluster_lengths_m_block_m_wave_m_per_Xdl_n_block_n_wave_n_per_Xdl},
{"CDEBlockTransferScalarPerVector_NPerBlock",
std::to_string(this->c_block_transfer.scalar_per_vector_n_wave_n_per_Xdl)},
};
return Solution{InterpolateString(CopyDevice_ConvTemplate, values), std::move(values)};
}
} // namespace conv
} // namespace host
} // namespace ck
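The generated run_${name} kernel above is meant to be compiled at runtime; the tests in this PR go through the repository's ck_rtc helper, but a plain-hiprtc sketch of the same flow looks like this (kernel name, architecture, launch sizes, and the argument buffer are all placeholders, and error handling is elided):

#include <hip/hip_runtime.h>
#include <hip/hiprtc.h>
#include <string>
#include <vector>

// Hedged sketch: compile one emitted instance with hiprtc and launch it.
void launch_generated_kernel(const std::string& src,         // interpolated source from ToSolution()
                             const std::string& kernel_name, // e.g. "run_256_128_256_..." (assumed)
                             std::vector<void*>& kernel_args)
{
    hiprtcProgram prog;
    hiprtcCreateProgram(&prog, src.c_str(), "conv_instance.cpp", 0, nullptr, nullptr);

    const char* opts[] = {"-std=c++17", "--offload-arch=gfx90a"}; // arch is an assumption
    hiprtcCompileProgram(prog, 2, opts);

    size_t code_size = 0;
    hiprtcGetCodeSize(prog, &code_size);
    std::vector<char> code(code_size);
    hiprtcGetCode(prog, code.data());
    hiprtcDestroyProgram(&prog);

    hipModule_t module;
    hipFunction_t kernel;
    hipModuleLoadData(&module, code.data());
    hipModuleGetFunction(&kernel, module, kernel_name.c_str());

    // The block size matches the instance's BlockSize template value; the grid
    // size would come from the host-side launch-parameter calculation described
    // in the commit log above. Both are placeholders here.
    hipModuleLaunchKernel(kernel,
                          /*gridDimX=*/64, 1, 1,
                          /*blockDimX=*/256, 1, 1,
                          /*sharedMemBytes=*/0, /*stream=*/nullptr,
                          kernel_args.data(), /*extra=*/nullptr);
    hipModuleUnload(module);
}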
@@ -29,12 +29,20 @@ std::string ToString(DataType dt)
    throw std::runtime_error("Incorrect data type");
}

Layout ToLayout(bool Trans) { return Trans ? Layout::Column : Layout::Row; }

std::string ToString(Layout dl)
{
    switch(dl)
    {
    case Layout::Row: return "ck::tensor_layout::gemm::RowMajor";
    case Layout::Column: return "ck::tensor_layout::gemm::ColumnMajor";
    case Layout::GKCYX: return "ck::tensor_layout::convolution::GKCYX";
    case Layout::GKYXC: return "ck::tensor_layout::convolution::GKYXC";
    case Layout::GNHWK: return "ck::tensor_layout::convolution::GNHWK";
    case Layout::GNHWC: return "ck::tensor_layout::convolution::GNHWC";
    case Layout::NHWGC: return "ck::tensor_layout::convolution::NHWGC";
    case Layout::NHWGK: return "ck::tensor_layout::convolution::NHWGK";
    }
    throw std::runtime_error("Incorrect layout");
}
......
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

#include "ck/host/utils.hpp"
......
list(APPEND CMAKE_PREFIX_PATH /opt/rocm)
add_subdirectory(rtc)
file(GLOB TEST_SRCS CONFIGURE_DEPENDS *.cpp)
foreach(TEST_SRC ${TEST_SRCS})
    set_source_files_properties(${TEST_SRC} PROPERTIES LANGUAGE HIP)
    get_filename_component(BASE_NAME ${TEST_SRC} NAME_WE)
    rocm_add_test_executable(test_host_${BASE_NAME} ${TEST_SRC})
    target_link_libraries(test_host_${BASE_NAME} ck_rtc ck_host)
    # target_link_libraries(test_host_${BASE_NAME} ${CK_ROOT}/build/lib/libutility.a)
    target_include_directories(test_host_${BASE_NAME} PUBLIC include())
    target_include_directories(test_host_${BASE_NAME} PUBLIC ${CK_ROOT}/include)
    target_include_directories(test_host_${BASE_NAME} PUBLIC ${CK_ROOT}/library/include)
endforeach()