"...include/git@developer.sourcefind.cn:gaoqiong/migraphx.git" did not exist on "5aa4e6868f0a260603e521c47650c909ad0a2d8f"
Commit 9b0fc65a authored by Alan Turner

Merge remote-tracking branch 'origin/ck-host-lib' into ck-gemm-api

parents 5cef60b8 031ccf5f
@@ -7,17 +7,17 @@ This workflow adds pull requests and issues to a specific GitHub project board w
</p>

## Trigger

The workflow is triggered by the following events:

- A pull request being opened.
- An issue being opened.

## Jobs

The workflow has a single job named `add-to-project`. The following step is executed in this job:

- The `add-to-project` job uses the `actions/add-to-project@v0.4.0` action to add pull requests and issues to a specific project board. The `with` parameters are `project-url` and `github-token`, which specify the URL of the project board and the GitHub token used to authenticate the action.
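The trigger and step described above can be sketched as a workflow file along these lines (a minimal sketch; the project URL and secret name are placeholders, not values from the repository):

```yaml
name: Add to project
on:
  pull_request:
    types: [opened]
  issues:
    types: [opened]

jobs:
  add-to-project:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/add-to-project@v0.4.0
        with:
          # Placeholder board URL; the real one lives in the repository's workflow file.
          project-url: https://github.com/orgs/EXAMPLE-ORG/projects/1
          github-token: ${{ secrets.EXAMPLE_PROJECT_TOKEN }}
```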
For more details, please refer to the [add-to-project.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/add-to-project.yaml) file in the repository.

---

## `benchmark.yaml`
@@ -27,21 +27,21 @@ This workflow runs the `MiGraphX performance benchmarks` and generates reports b
</p>

## Trigger

TODO: Update [benchmarks.yml (archived)](https://github.com/ROCmSoftwarePlatform/actions/blob/main/.github/workflows/benchmarks.yml) link after workflow is updated

- The workflow is triggered manually through the "Run workflow" button in the Actions tab of the repository, and it runs the reusable workflow [benchmarks.yml (archived)](https://github.com/ROCmSoftwarePlatform/actions/blob/main/.github/workflows/benchmarks.yml).

## Input Parameters

The workflow uses the following input parameters:

- `rocm_version`: the version of ROCm to use for running the benchmarks.
- `script_repo`: the repository that contains the benchmark scripts.
- `result_path`: the path where benchmark results will be stored.
- `result_repo`: the repository where the benchmark results will be pushed for comparison.
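A manually triggered workflow with these inputs that dispatches to a reusable workflow might look roughly like the following (a sketch only; the input wiring and the `@main` ref are illustrative assumptions):

```yaml
name: Benchmarks
on:
  workflow_dispatch:
    inputs:
      rocm_version:
        description: ROCm version to use for running the benchmarks
        required: true
      script_repo:
        description: Repository that contains the benchmark scripts
        required: true
      result_path:
        description: Path where benchmark results will be stored
        required: true
      result_repo:
        description: Repository where results are pushed for comparison
        required: true

jobs:
  benchmarks:
    # Calls the archived reusable workflow referenced above.
    uses: ROCmSoftwarePlatform/actions/.github/workflows/benchmarks.yml@main
    with:
      rocm_version: ${{ github.event.inputs.rocm_version }}
    secrets: inherit
```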
For more details, please refer to the [benchmark.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/benchmark.yaml) file in the repository.

---
@@ -52,36 +52,35 @@ Overall, this workflow automates the process of building and testing the AMDMIGr
</p>

## Trigger

The workflow is triggered by the following events:

- A pull request being opened, synchronized or closed.
- On push to the `develop`, `master`, and `release/**` branches.

## Jobs

The following jobs are executed in the workflow:

- `cancel`: This job is responsible for canceling any previous runs of the workflow that may still be running. It runs on an `ubuntu-latest` runner and uses the `styfle/cancel-workflow-action` action.
- `tidy`: It runs on an `ubuntu-20.04` runner and runs `clang-tidy` over the codebase in a Docker container with the MIGraphX build environment.
- `cppcheck`: It runs on an `ubuntu-20.04` runner, performs static analysis on the code in a Docker container, and caches the results for faster subsequent runs.
- `format`: It runs on an `ubuntu-20.04` runner and includes steps for freeing up disk space, caching Docker layers, and checking code formatting.
- `pyflakes`: It runs on an `ubuntu-20.04` runner and runs the Pyflakes static analysis tool to detect and report Python code issues.
- `licensing`: It runs on an `ubuntu-20.04` runner and includes steps to free up space, check out the code, set up Python, and run a license check using a Python script.

We have two jobs with multiple matrix configurations, both running on an `ubuntu-20.04` runner. Right now only `linux` works with all three configurations (debug, release, codecov); `linux-fpga` works only with debug.

- `linux`: this job runs continuous integration tests for AMDMIGraphX on a Linux operating system. It tests a variety of build configurations to ensure code quality and compatibility.
- `linux-fpga`: this job builds and tests AMDMIGraphX on a Linux operating system with support for FPGA acceleration. It includes additional steps to verify FPGA functionality and performance.
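The matrix split described above — `linux` on three configurations, `linux-fpga` on debug only — could be expressed with a strategy matrix along these lines (a sketch; the build steps are placeholders, not the repository's actual commands):

```yaml
jobs:
  linux:
    runs-on: ubuntu-20.04
    strategy:
      matrix:
        configuration: [debug, release, codecov]
    steps:
      - uses: actions/checkout@v3
      # Placeholder: build and test for ${{ matrix.configuration }}

  linux-fpga:
    runs-on: ubuntu-20.04
    strategy:
      matrix:
        configuration: [debug]
    steps:
      - uses: actions/checkout@v3
      # Placeholder: FPGA-enabled build and test
```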
For more details, please refer to the [ci.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/ci.yaml) file in the repository.

---
## `clean-closed-pr-caches.yaml`

@@ -91,17 +90,17 @@ This workflow has purpose to clean up any cached data related to the pull reques
</p>

## Trigger

The workflow is triggered by the following events:

- A pull request being closed.

## Jobs

The workflow has a single job named `cleanup`. The following steps are executed in this job:

- `Check out code`: this step checks out the codebase from the repository.
- `Cleanup`: this step performs the actual cache cleanup using a series of commands.
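One common way to implement such a cleanup step is with the `gh actions-cache` extension; the sketch below is an assumption about the general shape, not the repository's actual commands:

```yaml
name: Clean closed PR caches
on:
  pull_request:
    types: [closed]

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v3
      - name: Cleanup
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Illustrative: delete every cache scoped to the closed PR's merge ref.
          gh extension install actions/gh-actions-cache
          REF="refs/pull/${{ github.event.pull_request.number }}/merge"
          for key in $(gh actions-cache list -R ${{ github.repository }} -B "$REF" | cut -f1); do
            gh actions-cache delete "$key" -R ${{ github.repository }} -B "$REF" --confirm
          done
```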
For more details, please refer to the [clean-closed-pr-caches.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/clean-closed-pr-caches.yaml) file in the repository.

---
@@ -112,22 +111,22 @@ This workflow generates a report of the MiGraphX benchmark results between two d
</p>

## Trigger

- The workflow is triggered manually through the "Run workflow" button in the Actions tab of the repository, and it runs the reusable workflow [history.yml](https://github.com/ROCmSoftwarePlatform/migraphx-benchmark/blob/main/.github/workflows/history.yml).

## Input Parameters

The workflow requires the following inputs:

- `start_date`: Start date for results analysis.
- `end_date`: End date for results analysis.
- `history_repo`: Repository for history results between dates.
- `benchmark_utils_repo`: Repository where benchmark utils are stored.
- `organization`: Organization that determines the location of files.
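The inputs above would be declared under `workflow_dispatch` and forwarded to the reusable workflow, roughly like this (a sketch; the forwarding and `@main` ref are assumptions):

```yaml
on:
  workflow_dispatch:
    inputs:
      start_date:
        description: Start date for results analysis
      end_date:
        description: End date for results analysis
      history_repo:
        description: Repository for history results between dates
      benchmark_utils_repo:
        description: Repository where benchmark utils are stored
      organization:
        description: Organization that determines the location of files

jobs:
  history:
    uses: ROCmSoftwarePlatform/migraphx-benchmark/.github/workflows/history.yml@main
    with:
      start_date: ${{ github.event.inputs.start_date }}
      end_date: ${{ github.event.inputs.end_date }}
    secrets: inherit
```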
For more details, please refer to the [history.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/history.yaml) file in the repository.

---
## `performance.yaml`

@@ -137,32 +136,32 @@ This workflow runs performance tests on the MIGraphX repository and generates a
</p>

## Trigger

The workflow runs the reusable workflow [perf-test.yml](https://github.com/ROCmSoftwarePlatform/migraphx-benchmark/blob/main/.github/workflows/perf-test.yml) on the following events:

- Pull requests opened, synchronized or closed on the `develop` branch.
- Schedule: runs every day of the week from Monday to Saturday at 6:00 AM.
- Manual trigger through the "Run workflow" button in the Actions tab of the repository.

## Input Parameters

The workflow requires the following inputs:

- `rocm_release`: ROCm version to use for the performance tests.
- `performance_reports_repo`: Repository where the performance reports are stored.
- `benchmark_utils_repo`: Repository where the benchmark utilities are stored.
- `organization`: Organization that determines the location of files.
- `result_number`: Number of most recent results to include (last N results).
- `model_timeout`: If a model in the performance test script exceeds this threshold, it is skipped.
- `flags`: Command line arguments to be passed to the performance test script. Default is `-r`.
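The three triggers listed above map onto the standard `on:` events; in cron syntax, "Monday to Saturday at 6:00 AM" is `0 6 * * 1-6`. A sketch of the trigger block (the default value shown for `flags` comes from the text; the rest of the layout is illustrative):

```yaml
on:
  pull_request:
    branches: [develop]
    types: [opened, synchronize, closed]
  schedule:
    - cron: '0 6 * * 1-6'   # Monday through Saturday at 6:00 AM
  workflow_dispatch:
    inputs:
      rocm_release:
        description: ROCm version to use for the performance tests
      flags:
        description: Command line arguments for the performance test script
        default: '-r'
```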
For more details, please refer to the [performance.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/performance.yaml) file in the repository.

---
## `rocm-image-release.yaml`

@@ -172,26 +171,26 @@ This workflow builds a Docker image for a specified ROCm release version and pus
</p>

## Trigger

- The workflow is triggered manually through the "Run workflow" button in the Actions tab of the repository, and it runs the reusable workflow [rocm-release.yml](https://github.com/ROCmSoftwarePlatform/migraphx-benchmark/blob/main/.github/workflows/rocm-release.yml).

## Input Parameters

The workflow requires the following inputs:

- `rocm_release`: ROCm release version to build the Docker image for.
- `benchmark_utils_repo`: Repository where benchmark utils are stored.
- `base_image`: Base image for the ROCm Docker build.
- `docker_image`: Docker image name for the ROCm Docker build.
- `build_navi`: Build number for the Navi architecture.
- `organization`: The organization name used to determine the location of files.
- `overwrite`: Whether to overwrite the Docker image if it already exists.
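A manually dispatched image-release workflow with these inputs could be declared roughly as follows (a sketch; input types and the `@main` ref are assumptions):

```yaml
on:
  workflow_dispatch:
    inputs:
      rocm_release:
        description: ROCm release version to build the Docker image for
      base_image:
        description: Base image for the ROCm Docker build
      docker_image:
        description: Docker image name for the ROCm Docker build
      build_navi:
        description: Build number for the Navi architecture
      overwrite:
        description: Overwrite the Docker image if it already exists
        type: boolean
        default: false

jobs:
  release:
    uses: ROCmSoftwarePlatform/migraphx-benchmark/.github/workflows/rocm-release.yml@main
    secrets: inherit
```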
For more details, please refer to the [rocm-image-release.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/rocm-image-release.yaml) file in the repository.

---
@@ -202,24 +201,24 @@ This workflow updates a file with the latest commit hash then creates a pull req
</p>

## Trigger

The workflow is triggered by the following events:

- Schedule: runs every week on Friday at 05:07 PM.

## Jobs

The workflow has a single job named `Update and create pull request`. The following steps are executed in this job:

- `get_date`: this step sets an environment variable to the current date in the format 'YYYY-MM-DD'.
- `extract_sha1`: this step fetches the latest SHA1 commit hash of the HEAD branch of the `microsoft/onnxruntime` repository and sets it as an environment variable.
- `echo_sha1`: this step prints the SHA1 commit hash set in step `extract_sha1`.
- `actions/checkout@v3`: this step checks out the codebase from the repository.
- `update_file`: this step updates a file in the repository with the SHA1 commit hash fetched in step `extract_sha1`.
- `Make changes to pull request`: this step uses the `peter-evans/create-pull-request` action to create a pull request.
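The steps above can be sketched as follows ("Friday at 05:07 PM" is `7 17 * * 5` in cron syntax; the target file path and PR title are placeholders, not taken from the repository):

```yaml
on:
  schedule:
    - cron: '7 17 * * 5'   # Fridays at 05:07 PM

jobs:
  sync:
    name: Update and create pull request
    runs-on: ubuntu-latest
    steps:
      - name: get_date
        run: echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
      - name: extract_sha1
        # Fetch the SHA1 of onnxruntime's HEAD without cloning the repository.
        run: echo "SHA1=$(git ls-remote https://github.com/microsoft/onnxruntime HEAD | cut -f1)" >> $GITHUB_ENV
      - name: echo_sha1
        run: echo "${{ env.SHA1 }}"
      - uses: actions/checkout@v3
      - name: update_file
        # Placeholder path; the real workflow writes the hash into its pinned-version file.
        run: echo "${{ env.SHA1 }}" > onnxruntime-version.txt
      - name: Make changes to pull request
        uses: peter-evans/create-pull-request@v4
        with:
          title: "Sync onnxruntime to ${{ env.SHA1 }}"
```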
For more details, please refer to the [sync-onnxrt-main.yaml](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/.github/workflows/sync-onnxrt-main.yaml) file in the repository.

---
@@ -28,4 +28,4 @@ ROCmSoftwarePlatform/half@rocm-5.4.2
 pybind/pybind11@d159a563383d10c821ba7b2a71905d1207db6de4 --build
 msgpack/msgpack-c@cpp-3.3.0 -DMSGPACK_BUILD_TESTS=Off
 sqlite3@3.17 -DCMAKE_POSITION_INDEPENDENT_CODE=On
-ROCmSoftwarePlatform/composable_kernel@bef0cb20dba0d9b315df46899310478a81c21852 -X header
+#ROCmSoftwarePlatform/composable_kernel@bef0cb20dba0d9b315df46899310478a81c21852 -X header
@@ -46,6 +46,7 @@ def shape_type_wrap(p):
 def auto_handle(*args, **kwargs):
     def with_handle(f):
         return api.handle('migraphx_' + f.__name__, 'migraphx::' + f.__name__,
                           *args, **kwargs)(f)
@@ -213,28 +213,22 @@ void from_value_impl(rank<6>, const value& v, optional<T>& x)
     x = from_value<T>(v);
 }
-template <class T, MIGRAPHX_REQUIRES(std::is_arithmetic<T>{})>
+template <class T, MIGRAPHX_REQUIRES(std::is_arithmetic<T>{} or std::is_enum<T>{})>
 void from_value_impl(rank<7>, const value& v, T& x)
 {
     x = v.to<T>();
 }
-template <class T, MIGRAPHX_REQUIRES(std::is_enum<T>{})>
-void from_value_impl(rank<8>, const value& v, T& x)
-{
-    x = v.to<T>();
-}
-inline void from_value_impl(rank<9>, const value& v, std::string& x) { x = v.to<std::string>(); }
+inline void from_value_impl(rank<8>, const value& v, std::string& x) { x = v.to<std::string>(); }
 template <class T>
-auto from_value_impl(rank<10>, const value& v, T& x) -> decltype(x.from_value(v), void())
+auto from_value_impl(rank<9>, const value& v, T& x) -> decltype(x.from_value(v), void())
 {
     x.from_value(v);
 }
 template <class T>
-auto from_value_impl(rank<11>, const value& v, T& x) -> decltype(migraphx_from_value(v, x), void())
+auto from_value_impl(rank<10>, const value& v, T& x) -> decltype(migraphx_from_value(v, x), void())
 {
     migraphx_from_value(v, x);
 }
@@ -250,7 +244,7 @@ value to_value(const T& x)
 template <class T>
 void from_value(const value& v, T& x)
 {
-    detail::from_value_impl(rank<11>{}, v, x);
+    detail::from_value_impl(rank<10>{}, v, x);
 }
 } // namespace MIGRAPHX_INLINE_NS
@@ -243,8 +243,10 @@ else()
     message(STATUS "MIOpen does not have find mode api")
 endif()
+find_package(composable_kernel 1.0.0 COMPONENTS jit_library REQUIRED)
 target_link_libraries(migraphx_gpu PUBLIC migraphx MIOpen roc::rocblas)
-target_link_libraries(migraphx_gpu PRIVATE migraphx_device migraphx_kernels)
+target_link_libraries(migraphx_gpu PRIVATE migraphx_device migraphx_kernels composable_kernel::jit_library)
 add_subdirectory(driver)
 add_subdirectory(hiprtc)
@@ -112,7 +112,8 @@ const std::vector<std::string>& compiler_warnings()
     "-Wno-sign-compare",
     "-Wno-unused-command-line-argument",
     "-Wno-weak-vtables",
-    "-Wno-c99-extensions"};
+    "-Wno-c99-extensions",
+    "-Wno-global-constructors"};
     return warnings;
 }
@@ -171,6 +172,19 @@ operation compile_hip_code_object(const std::string& content, hip_compile_option
         auto path = fs::path{"migraphx"} / "kernels" / name;
         return src_file{path, c};
     });
+    if(not options.embedded_headers.empty())
+    {
+        std::transform(options.embedded_headers.begin(),
+                       options.embedded_headers.end(),
+                       std::back_inserter(srcs),
+                       [](auto&& p) {
+                           auto&& name = p.first;
+                           auto&& c    = p.second;
+                           auto path   = fs::path{"migraphx"} / "kernels" / name;
+                           return src_file{path, c};
+                       });
+    }
     srcs.push_back(src_file{fs::path{"main.cpp"},
                             std::make_pair(content.data(), content.data() + content.size())});
     auto args_hpp =
@@ -42,6 +42,7 @@ struct hip_compile_options
     std::string kernel_name        = "kernel";
     std::string params             = "";
     std::vector<shape> virtual_inputs = {};
+    std::unordered_map<std::string, std::pair<const char*, const char*>> embedded_headers;
     /**
      * @brief Set the launch parameters but allow v to override the values
@@ -38,8 +38,7 @@
 #include <migraphx/env.hpp>
 #include <migraphx/file_buffer.hpp>
-const std::vector<std::string>&
-get_instance(std::size_t i, const std::function<bool(const std::vector<std::string>&)>& pred);
+#include "ck/include/device_gemm_multiple_d.hpp"
 namespace migraphx {
 inline namespace MIGRAPHX_INLINE_NS {
@@ -58,6 +57,7 @@ static const char* const ck_gemm_kernel = R"__migraphx__(
 #include <args.hpp>
 #include <migraphx/kernels/ck_gemm.hpp>
 #include <migraphx/kernels/pointwise.hpp>
+#include <migraphx/kernels/${include}>
 namespace migraphx {
@@ -68,7 +68,7 @@ extern "C" {
 __global__ void ${kernel}(${params})
 {
     transform_args(make_tensors(), rotate_last())(${args})([](auto... xs) {
-        ck_gemm<CK_DeviceGemmMultipleD<${instance}>, ${blocks_per_batch}>(xs...);
+        ck_gemm<${solution}, ${blocks_per_batch}>(xs...);
     });
 }
@@ -78,88 +78,6 @@ __global__ void ${kernel}(${params})
 )__migraphx__";
-static std::size_t int_div_ceil(std::size_t x, std::size_t y) { return (x + y - 1) / y; }
-struct instance
-{
-    std::vector<std::string> params;
-    static const std::size_t block_size_index = 15;
-    std::size_t int_at(std::size_t i) const { return std::stoull(params[i]); }
-    std::size_t get_block_size() const { return int_at(block_size_index); }
-    std::size_t get_pb(std::size_t i) const
-    {
-        assert(i < 4);
-        return int_at(block_size_index + 1 + i);
-    }
-    std::array<std::size_t, 3> get_pad(const std::array<std::size_t, 3>& config) const
-    {
-        std::array<std::size_t, 3> result{};
-        for(auto i : range(config.size()))
-        {
-            result[i] = int_div_ceil(config[i], get_pb(i)) * get_pb(i) - config[i];
-        }
-        return result;
-    }
-    std::size_t get_grid_size(const std::array<std::size_t, 3>& config) const
-    {
-        return int_div_ceil(config[0], get_pb(0)) * int_div_ceil(config[1], get_pb(1));
-    }
-    void set_ds_layout(const std::string& s)
-    {
-        assert(params[2] == "ck::Tuple<>");
-        params[2] = s;
-    }
-    void set_ds_type(const std::string& s)
-    {
-        assert(params[8] == "ck::Tuple<>");
-        params[8] = s;
-    }
-    void set_e_type(const std::string& s)
-    {
-        // assert(params[9] == "ck::Tuple<>");
-        params[9] = s;
-    }
-    void set_ds_op(const std::string& s)
-    {
-        assert(params[12] == "ck_passthrough");
-        params[12] = s;
-    }
-    void set_gemm(const std::string& s)
-    {
-        assert(params[13] == "ck::tensor_operation::device::GemmSpecialization::Default");
-        params[13] = s;
-    }
-    void set_a_scalar_per_vec(const std::string& s)
-    {
-        params[block_size_index + 14] = s;
-        params[block_size_index + 15] = s;
-    }
-    void set_b_scalar_per_vec(const std::string& s)
-    {
-        params[block_size_index + 20] = s;
-        params[block_size_index + 21] = s;
-    }
-    void set_c_scalar_per_vec(const std::string& s)
-    {
-        params[params.size() - 3] = s;
-    }
-    std::string str() const { return join_strings(params, ","); }
-};
 static bool transposed_matrix(const shape& s) { return s.strides().back() != 1; }
 template <class F, class Action>
@@ -309,9 +227,10 @@ struct ck_gemm_compiler : compiler<ck_gemm_compiler>
     operation compile_op(context& /* ctx */, const std::vector<shape>& inputs, const value& v) const
     {
         auto a_shape = inputs[0];
         auto b_shape = inputs[1];
         auto c_shape = inputs.back();
+        auto tuning_value = get_tuning_for({a_shape, b_shape, c_shape});
         auto rank      = a_shape.lens().size();
         auto b_strides = b_shape.strides();
...@@ -322,55 +241,61 @@ struct ck_gemm_compiler : compiler<ck_gemm_compiler> ...@@ -322,55 +241,61 @@ struct ck_gemm_compiler : compiler<ck_gemm_compiler>
m = can_fold_batch ? m * batch_count : m; m = can_fold_batch ? m * batch_count : m;
auto n = c_shape.lens().back(); auto n = c_shape.lens().back();
auto k = a_shape.lens().back(); auto k = a_shape.lens().back();
std::array<char, 3> keys{'M', 'N', 'K'};
std::array<std::size_t, 3> config{m, n, k}; const bool transA = transposed_matrix(a_shape);
auto tuning_val = v.get("tuning_val", get_tuning_for({a_shape, b_shape, c_shape.with_type(a_shape.type())})); const bool transB = transposed_matrix(b_shape);
auto ip = instance{get_instance(tuning_val, [&](const auto& x) -> bool { const bool transE = transposed_matrix(c_shape);
return get_layout(a_shape) == x[0] and get_layout(b_shape) == x[1] and const auto a_type = get_type(a_shape);
get_layout(c_shape) == x[3] and get_type(a_shape) == x[4] and const auto b_type = get_type(b_shape);
get_type(b_shape) == x[5]; const auto e_type = get_type(c_shape);
})}; std::vector<bool> ds_layout;
std::transform(inputs.begin() + 2,
inputs.end() - 1,
std::back_inserter(ds_layout),
[](const auto& i) { return transposed_matrix(i); });
std::vector<std::string> ds_type;
std::transform(inputs.begin() + 2,
inputs.end() - 1,
std::back_inserter(ds_type),
[](const auto& i) { return get_type(i); });
std::string ck_passthrough = "ck_passthrough";
std::string cde_op = ck_passthrough;
    assert(inputs.size() < 4 or v.contains("post"));
    if(v.contains("post"))
    {
        cde_op = v.at("post").to<std::string>();
    }
    auto problem = ck::tensor_operation::device::device_gemm_multiple_d::Problem{
        static_cast<ck::index_t>(m),
        static_cast<ck::index_t>(n),
        static_cast<ck::index_t>(k),
        transA,
        transB,
        transE,
        ds_layout,
        a_type,
        b_type,
        e_type,
        ds_type,
        ck_passthrough,
        ck_passthrough,
        cde_op};
    const auto include_header   = problem.GetIncludeHeader();
    const auto ck_headers       = problem.GetHeaders();
    const auto solutions        = problem.GetSolutions("gfx90a");
    const auto solution         = solutions.at(tuning_value);
    const auto template_str     = solution.template_str;
    const auto blocks_per_batch = solution.grid_size;
    const auto block_size       = solution.block_size;
    hip_compile_options options;
    options.embedded_headers = ck_headers;
    auto grid_size = can_fold_batch ? blocks_per_batch : batch_count * blocks_per_batch;
    options.set_launch_params(v, grid_size * block_size, block_size);
    options.inputs = inputs;
    options.output = c_shape;
@@ -389,7 +314,8 @@ struct ck_gemm_compiler : compiler<ck_gemm_compiler>
    options.params += " -DMIGRAPHX_CK_CHECK=1";
    auto src = interpolate_string(ck_gemm_kernel,
                                  {{"solution", template_str},
                                   {"include", include_header},
                                   {"params", enum_params(inputs.size(), "void * private_p")},
                                   {"args", enum_params(inputs.size(), "private_p")},
                                   {"blocks_per_batch", to_string(blocks_per_batch)},
...
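The kernel source above is produced by substituting the selected solution string into a kernel template. A minimal sketch of that kind of `${...}` substitution, using Python's `string.Template` (the template text and names here are illustrative, not the actual MIGraphX `interpolate_string` API or the real `ck_gemm_kernel` template):

```python
import string

# Hypothetical kernel template with ${...} placeholders, in the style of
# the ck_gemm_kernel template substituted above (not the real template).
kernel_template = "launch<${solution}>(${args});"

src = string.Template(kernel_template).substitute(
    solution="CK_DeviceGemmMultipleD<...>",
    args="a, b, c")

print(src)  # launch<CK_DeviceGemmMultipleD<...>>(a, b, c);
```

The compiler pass does the same thing at a larger scale: every placeholder is filled from values computed from the selected tuning solution.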
@@ -29,7 +29,6 @@
#include <migraphx/kernels/integral_constant.hpp>
#include <migraphx/kernels/tensor_view.hpp>
#include <migraphx/kernels/ck.hpp>
#include <migraphx/kernels/gemm_batcher.hpp>
namespace migraphx {
@@ -48,52 +47,15 @@ using ck_transposeb = decltype(make_shape(ck_transposeb_dims(get_shape_c<Tensor>
template <class G, class E, class A, class B, class... Ds>
__device__ void ck_gemm_matrix(E e, A a, B b, Ds... ds)
{
    constexpr auto desc = G::make_descriptor(to_ck_tensor<A>(),
                                             to_ck_tensor<ck_transposeb<B>>(),
                                             ck::make_tuple(to_ck_tensor<Ds>()...),
                                             to_ck_tensor<E>());
    G::Run(desc,
           to_ck_const_pointer(a.data()),
           to_ck_const_pointer(b.data()),
           ck::make_tuple(to_ck_const_pointer(ds.data())...),
           to_ck_pointer(e.data()));
}
template <class G, index_int BlocksPerBatch, class... Ts>
...
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#ifndef MIGRAPHX_GUARD_KERNELS_CK_INCLUDES_HPP
#define MIGRAPHX_GUARD_KERNELS_CK_INCLUDES_HPP
#include <migraphx/kernels/index.hpp>
#include <migraphx/kernels/algorithm.hpp>
#include <migraphx/kernels/integral_constant.hpp>
#include <migraphx/kernels/tensor_view.hpp>
#include <ck/utility/common_header.hpp>
#include <ck/tensor_description/tensor_descriptor.hpp>
#include <ck/tensor_description/tensor_descriptor_helper.hpp>
#include <ck/tensor_operation/gpu/device/tensor_layout.hpp>
#include <ck/tensor_operation/gpu/device/device_gemm.hpp>
#include <ck/tensor_operation/gpu/device/gemm_specialization.hpp>
#include <ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v1.hpp>
#include <ck/tensor_operation/gpu/device/matrix_padder.hpp>
#include <ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_d_xdl_cshuffle.hpp>
namespace migraphx {
template <ck::index_t MPerBlock, ck::index_t NPerBlock, typename CGridDesc_M_N>
struct BlockToCTileMap_M00_N0_M01Adapt
{
static constexpr auto I0 = ck::Number<0>{};
static constexpr auto I1 = ck::Number<1>{};
static constexpr auto I2 = ck::Number<2>{};
static constexpr auto I3 = ck::Number<3>{};
__host__ __device__ constexpr BlockToCTileMap_M00_N0_M01Adapt() = default;
__host__
__device__ constexpr BlockToCTileMap_M00_N0_M01Adapt(const CGridDesc_M_N& c_grid_desc_m_n,
ck::index_t M01 = 8)
: M01_(M01), c_grid_desc_m_n_(c_grid_desc_m_n)
{
}
__host__ __device__ constexpr ck::index_t
CalculateGridSize(const CGridDesc_M_N& c_grid_desc_m_n) const
{
const auto M0 = ck::math::integer_divide_ceil(c_grid_desc_m_n.GetLength(I0), MPerBlock);
const auto N0 = ck::math::integer_divide_ceil(c_grid_desc_m_n.GetLength(I1), NPerBlock);
const ck::index_t grid_size = M0 * N0;
return grid_size;
}
template <typename TopIdx>
__host__ __device__ constexpr auto CalculateBottomIndex(const TopIdx& idx_top) const
{
auto block_1d_id = idx_top[I0];
const auto M0 = ck::math::integer_divide_ceil(c_grid_desc_m_n_.GetLength(I0), MPerBlock);
const auto N0 = ck::math::integer_divide_ceil(c_grid_desc_m_n_.GetLength(I1), NPerBlock);
block_1d_id = block_1d_id % (M0 * N0); // swallow batch index
ck::index_t idx_N0 = block_1d_id % N0;
ck::index_t idx_M0 = block_1d_id / N0;
const auto M01_adapt = (idx_M0 < M0 - M0 % M01_) ? M01_ : M0 % M01_;
ck::index_t idx_M00 = idx_M0 / M01_;
ck::index_t idx_M01 = idx_M0 % M01_;
ck::index_t idx_N0_M01_local = idx_N0 + idx_M01 * N0;
return ck::make_tuple(idx_N0_M01_local % M01_adapt + idx_M00 * M01_,
idx_N0_M01_local / M01_adapt);
}
template <typename CTileIdx, typename CTileDim>
__host__ __device__ bool constexpr ValidCTileIndex(const CTileIdx& /* c_tile_idx */,
const CTileDim& /* c_tile_dim */) const
{
return true; // always valid provided that user gets grid size from CalculateGridSize()
}
__host__ __device__ constexpr bool
CheckValidity(const CGridDesc_M_N& /* c_grid_desc_m_n */) const
{
return true;
}
private:
ck::index_t M01_;
CGridDesc_M_N c_grid_desc_m_n_;
};
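The index arithmetic in `CalculateBottomIndex` above groups `M01` consecutive tile rows so that neighbouring thread blocks work on nearby output tiles. The remapping can be checked on the host; this Python sketch mirrors the same arithmetic and is purely illustrative, not part of the library:

```python
def calculate_bottom_index(block_1d_id, M0, N0, M01=8):
    """Map a linear block id to an (m0, n0) output-tile index, grouping
    M01 consecutive tile rows (mirrors BlockToCTileMap_M00_N0_M01Adapt)."""
    block_1d_id %= M0 * N0  # swallow batch index
    idx_N0 = block_1d_id % N0
    idx_M0 = block_1d_id // N0
    # The last, possibly partial, group of rows uses a smaller adapt factor
    M01_adapt = M01 if idx_M0 < M0 - M0 % M01 else M0 % M01
    idx_M00 = idx_M0 // M01
    idx_M01 = idx_M0 % M01
    local = idx_N0 + idx_M01 * N0
    return (local % M01_adapt + idx_M00 * M01, local // M01_adapt)
```

A quick host-side check confirms the mapping is a permutation of the tile grid: every block id in `[0, M0 * N0)` lands on a distinct in-range `(m0, n0)` pair, which is why `ValidCTileIndex` can always return true when the grid size comes from `CalculateGridSize`.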
template <typename ALayout,
typename BLayout,
typename DsLayout,
typename ELayout,
typename ADataType,
typename BDataType,
typename AccDataType,
typename CShuffleDataType,
typename DsDataType,
typename EDataType,
typename AElementwiseOperation,
typename BElementwiseOperation,
typename CDEElementwiseOperation,
ck::tensor_operation::device::GemmSpecialization GemmSpec,
ck::index_t NumGemmKPrefetchStage,
ck::index_t BlockSize,
ck::index_t MPerBlock,
ck::index_t NPerBlock,
ck::index_t KPerBlock,
ck::index_t AK1,
ck::index_t BK1,
ck::index_t MPerXDL,
ck::index_t NPerXDL,
ck::index_t MXdlPerWave,
ck::index_t NXdlPerWave,
typename ABlockTransferThreadClusterLengths_AK0_M_AK1,
typename ABlockTransferThreadClusterArrangeOrder,
typename ABlockTransferSrcAccessOrder,
ck::index_t ABlockTransferSrcVectorDim,
ck::index_t ABlockTransferSrcScalarPerVector,
ck::index_t ABlockTransferDstScalarPerVector_AK1,
ck::index_t ABlockLdsExtraM,
typename BBlockTransferThreadClusterLengths_BK0_N_BK1,
typename BBlockTransferThreadClusterArrangeOrder,
typename BBlockTransferSrcAccessOrder,
ck::index_t BBlockTransferSrcVectorDim,
ck::index_t BBlockTransferSrcScalarPerVector,
ck::index_t BBlockTransferDstScalarPerVector_BK1,
ck::index_t BBlockLdsExtraN,
ck::index_t CShuffleMXdlPerWavePerShuffle,
ck::index_t CShuffleNXdlPerWavePerShuffle,
typename CDEBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock,
ck::index_t CDEBlockTransferScalarPerVector_NPerBlock,
ck::LoopScheduler LoopSched = ck::make_default_loop_scheduler(),
ck::PipelineVersion PipelineVer = ck::PipelineVersion::v1>
struct CK_DeviceGemmMultipleD
{
static constexpr auto I0 = ck::Number<0>{};
static constexpr auto I1 = ck::Number<1>{};
// static constexpr auto I2 = ck::Number<2>{};
// static constexpr auto I3 = ck::Number<3>{};
// static constexpr auto I4 = ck::Number<4>{};
// static constexpr auto I5 = ck::Number<5>{};
// static constexpr auto I6 = ck::Number<6>{};
// static constexpr auto I7 = ck::Number<7>{};
ck::tensor_operation::device::MatrixPadder<GemmSpec, ck::index_t, ck::index_t, ck::index_t>
matrix_padder{MPerBlock, NPerBlock, KPerBlock};
// GridwiseGemm
using GridwiseGemm = ck::GridwiseGemmMultipleD_xdl_cshuffle<
ADataType, // TODO: distinguish A/B datatype
AccDataType,
CShuffleDataType,
DsDataType,
EDataType,
AElementwiseOperation,
BElementwiseOperation,
CDEElementwiseOperation,
ck::InMemoryDataOperationEnum::Set,
NumGemmKPrefetchStage,
BlockSize,
MPerBlock,
NPerBlock,
KPerBlock,
AK1,
BK1,
MPerXDL,
NPerXDL,
MXdlPerWave,
NXdlPerWave,
ABlockTransferThreadClusterLengths_AK0_M_AK1,
ABlockTransferThreadClusterArrangeOrder,
ABlockTransferSrcAccessOrder,
ABlockTransferSrcVectorDim,
ABlockTransferSrcScalarPerVector,
ABlockTransferDstScalarPerVector_AK1,
false,
ABlockLdsExtraM,
BBlockTransferThreadClusterLengths_BK0_N_BK1,
BBlockTransferThreadClusterArrangeOrder,
BBlockTransferSrcAccessOrder,
BBlockTransferSrcVectorDim,
BBlockTransferSrcScalarPerVector,
BBlockTransferDstScalarPerVector_BK1,
false,
BBlockLdsExtraN,
CShuffleMXdlPerWavePerShuffle,
CShuffleNXdlPerWavePerShuffle,
CDEBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock,
CDEBlockTransferScalarPerVector_NPerBlock,
LoopSched,
PipelineVer>;
// return block_id to E matrix tile idx (m0, n0) mapping
template <class EGridDesc_M_N>
__device__ static constexpr auto
MakeDefaultBlock2ETileMap(const EGridDesc_M_N& e_grid_desc_m_n_)
{
return BlockToCTileMap_M00_N0_M01Adapt<MPerBlock, NPerBlock, EGridDesc_M_N>(
e_grid_desc_m_n_);
}
// block_id to matrix tile idx (m0, n0) mapping are controlled by {M01, N01}
template <typename AGridDesc_M_K,
typename BGridDesc_N_K,
typename DsGridDesc_M_N,
typename EGridDesc_M_N,
typename Block2ETileMap>
static constexpr bool CheckValidity(const AGridDesc_M_K& a_grid_desc_m_k,
const BGridDesc_N_K& b_grid_desc_n_k,
const DsGridDesc_M_N& ds_grid_desc_m_n,
const EGridDesc_M_N& e_grid_desc_m_n,
const Block2ETileMap& block_2_etile_map)
{
const auto M = a_grid_desc_m_k.GetLength(I0);
const auto N = b_grid_desc_n_k.GetLength(I0);
const auto K = a_grid_desc_m_k.GetLength(I1);
// check consistency of desc
MIGRAPHX_CHECK(M == e_grid_desc_m_n.GetLength(I0) && N == e_grid_desc_m_n.GetLength(I1));
// check tile size
MIGRAPHX_CHECK(M % MPerBlock == 0 && N % NPerBlock == 0 && K % KPerBlock == 0);
// check block-to-E-tile
MIGRAPHX_CHECK(block_2_etile_map.CheckValidity(e_grid_desc_m_n));
return GridwiseGemm::CheckValidity(
a_grid_desc_m_k, b_grid_desc_n_k, ds_grid_desc_m_n, e_grid_desc_m_n, block_2_etile_map);
}
AElementwiseOperation a_element_op{};
BElementwiseOperation b_element_op{};
CDEElementwiseOperation cde_element_op{};
};
} // namespace migraphx
#endif
@@ -161,8 +161,7 @@ MIGRAPHX_DEVICE_MATH_HALF(fmod, ::fmod)
// Map math functions to hip half2 functions
// The half2 type is defined in include/hip/amd_detail/hip_fp16_gcc.h and is 2 16-bit floats
// packed into a 32-bit number. See include/hip/amd_detail/hip_fp16_math_fwd.h for the HIP names
// Most but not all of these math ops have operators of the same names.
MIGRAPHX_DEVICE_MATH_HALF2(abs, ::__habs2)
MIGRAPHX_DEVICE_MATH_HALF2(ceil, ::h2ceil)
MIGRAPHX_DEVICE_MATH_HALF2(cos, ::h2cos)
@@ -189,7 +188,8 @@ MIGRAPHX_DEVICE_MATH_BINARY_FOR(float, max, ::max)
MIGRAPHX_DEVICE_MATH_BINARY_FOR(float, min, ::min)
MIGRAPHX_DEVICE_MATH_BINARY_FOR(double, max, ::max)
MIGRAPHX_DEVICE_MATH_BINARY_FOR(double, min, ::min)
// Add overloads for half that call the float version; this should use "hmax" and "hmin" once
// the perf CI docker is upgraded to rocm-5.5
MIGRAPHX_DEVICE_MATH_BINARY_FOR(migraphx::half, max, ::fmaxf)
MIGRAPHX_DEVICE_MATH_BINARY_FOR(migraphx::half, min, ::fminf)
...
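The comment above describes `half2` as two 16-bit floats packed into one 32-bit word. That layout can be sketched on the host with Python's `struct` module (`'e'` is IEEE binary16); this illustrates only the packing, not the HIP intrinsics themselves:

```python
import struct

def pack_half2(lo: float, hi: float) -> int:
    """Pack two fp16 values into one 32-bit word (lo in the low 16 bits)."""
    (word,) = struct.unpack('<I', struct.pack('<ee', lo, hi))
    return word

def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two fp16 halves."""
    return struct.unpack('<ee', struct.pack('<I', word))

w = pack_half2(1.0, -2.0)
print(hex(w))  # 0xc0003c00: low half 0x3c00 (1.0), high half 0xc000 (-2.0)
```

The `h2*` functions listed above operate on both packed halves at once, which is why mapping a scalar math name to its `half2` counterpart doubles throughput for half-precision element-wise ops.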
@@ -164,10 +164,10 @@ std::string mlir_print(F f, T x)
    return ss.str();
}
bool has_xdlops(const std::string& target_arch)
{
    const auto device_name = trim(split_string(target_arch, ':').front());
    return (starts_with(device_name, "gfx9") and device_name >= "gfx908");
}
struct mlir_program
@@ -560,9 +560,7 @@ struct mlir_program
            pp =
                problem_params{ins->get_operator(), to_shapes(ins->inputs()), ins->get_shape()};
            // check if HW supports xdlops
            if(has_xdlops(target_arch))
                ops.add_attributes({{"xdlopsV2", true}});
        }
...
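The same architecture gate appears in both the MLIR and rocBLAS changes: instead of a hard-coded set, any gfx9 device at or above gfx908 qualifies. Because gfx9 device names share the `gfx9` prefix and length, a plain lexicographic comparison is sufficient. A host-side sketch of the check (the helper name mirrors the C++ one but is purely illustrative):

```python
def has_xdlops(target_arch: str) -> bool:
    # Strip any feature suffix, e.g. "gfx90a:sramecc+:xnack-" -> "gfx90a"
    device_name = target_arch.split(':')[0].strip()
    # gfx9 names share a common prefix and length, so a lexicographic
    # comparison against "gfx908" selects gfx908 and newer gfx9 parts
    # (note that 'a' > '8', so "gfx90a" passes).
    return device_name.startswith("gfx9") and device_name >= "gfx908"
```

The prefix test also rejects gfx10/gfx11 parts, whose names would otherwise compare greater than `gfx908` under a naive string comparison.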
@@ -47,16 +47,10 @@ rocblas_handle_ptr create_rocblas_handle_ptr(hipStream_t s)
    return rb;
}
bool get_compute_fp32_flag()
{
    const auto device_name = trim(split_string(get_device_name(), ':').front());
    return (starts_with(device_name, "gfx9") and device_name >= "gfx908");
}
bool get_int8_x4_format(context& ctx)
...
#include "verify_program.hpp"
#include <migraphx/program.hpp>
#include <migraphx/generate.hpp>
#include <migraphx/op/max.hpp>
#include <migraphx/op/min.hpp>
template <class Op, migraphx::shape::type_t T>
struct test_min_max : verify_program<test_min_max<Op, T>>
{
migraphx::program create_program() const
{
migraphx::program p;
auto* mm = p.get_main_module();
migraphx::shape s{T, {128}};
auto x = mm->add_parameter("x", s);
auto y = mm->add_parameter("y", s);
mm->add_instruction(Op{}, x, y);
return p;
}
};
template struct test_min_max<migraphx::op::max, migraphx::shape::float_type>;
template struct test_min_max<migraphx::op::max, migraphx::shape::half_type>;
template struct test_min_max<migraphx::op::max, migraphx::shape::double_type>;
template struct test_min_max<migraphx::op::min, migraphx::shape::float_type>;
template struct test_min_max<migraphx::op::min, migraphx::shape::half_type>;
template struct test_min_max<migraphx::op::min, migraphx::shape::double_type>;
@@ -50,6 +50,7 @@ class Template(string.Template):
class Type:
    def __init__(self, name: str) -> None:
        self.name = name.strip()
@@ -144,6 +145,7 @@ extern "C" ${error_type} ${name}(${params})
class CFunction:
    def __init__(self, name: str) -> None:
        self.name = name
        self.params: List[str] = []
@@ -188,12 +190,14 @@ class CFunction:
class BadParam:
    def __init__(self, cond: str, msg: str) -> None:
        self.cond = cond
        self.msg = msg


class Parameter:
    def __init__(self,
                 name: str,
                 type: str,
@@ -250,7 +254,8 @@ class Parameter:
                 size=self.size_name,
                 result=result or '')

    def add_param(self,
                  t: Union[str, Type],
                  name: Optional[str] = None) -> None:
        if not isinstance(t, str):
            t = t.str()
@@ -409,6 +414,7 @@ def to_template_vars(params: List[Union[Any, Parameter]]) -> str:
class Function:
    def __init__(self,
                 name: str,
                 params: Optional[List[Parameter]] = None,
@@ -545,6 +551,7 @@ cpp_class_constructor_template = Template('''
class CPPMember:
    def __init__(self,
                 name: str,
                 function: Function,
@@ -621,6 +628,7 @@ class CPPMember:
class CPPClass:
    def __init__(self, name: str, ctype: str) -> None:
        self.name = name
        self.ctype = ctype
@@ -677,6 +685,7 @@ def add_function(name: str, *args, **kwargs) -> Function:
def once(f: Callable) -> Any:
    @wraps(f)
    def decorated(*args, **kwargs):
        if not decorated.has_run:
@@ -722,6 +731,7 @@ c_type_map: Dict[str, Type] = {}
def cwrap(name: str, c_type: Optional[str] = None) -> Callable:
    def with_cwrap(f):
        type_map[name] = f
        if c_type:
@@ -1015,6 +1025,7 @@ def string_c_wrap(p: Parameter) -> None:
class Handle:
    def __init__(self, name: str, ctype: str, cpptype: str, **kwargs) -> None:
        self.name = name
        self.ctype = ctype
@@ -1140,6 +1151,7 @@ def generate_virtual_impl(f: Function, fname: str) -> str:
class Interface(Handle):
    def __init__(self, name: str, ctype: str, cpptype: str) -> None:
        super().__init__(name, ctype, cpptype, skip_def=True)
        self.ifunctions: List[Function] = []
@@ -1234,6 +1246,7 @@ def handle(ctype: str,
           cpptype: str,
           name: Optional[str] = None,
           ref: Optional[bool] = None) -> Callable:
    def with_handle(f):
        n = name or f.__name__
        h = Handle(n, ctype, cpptype, ref=ref)
@@ -1249,8 +1262,10 @@ def handle(ctype: str,
    return with_handle


def interface(ctype: str,
              cpptype: str,
              name: Optional[str] = None) -> Callable:
    def with_interface(f):
        n = name or f.__name__
        h = Interface(n, ctype, cpptype)
...
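Among the reformatted api.py pieces is the `once` decorator, which stores a flag as an attribute on the wrapper so the wrapped function body runs exactly once. A self-contained sketch of that pattern, simplified from the generator script above:

```python
from functools import wraps

def once(f):
    """Run f only on the first call; subsequent calls are no-ops."""
    @wraps(f)
    def decorated(*args, **kwargs):
        if not decorated.has_run:
            decorated.has_run = True
            return f(*args, **kwargs)
    decorated.has_run = False
    return decorated

calls = []

@once
def setup():
    calls.append("ran")

setup()
setup()
print(calls)  # ['ran']
```

Keeping the flag on the wrapper (rather than in a global) means each decorated function gets its own independent run-once state.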