gaoqiong / composable_kernel_ROCM
Commit 7b7a3978, authored Oct 02, 2023 by Jun Liu
Merge branch 'amd-develop' into amd-master
Parents: b24d93a1, 7e8230da
Changes 185
Showing 20 changed files with 1113 additions and 821 deletions
CHANGELOG.md +40 -30
CMakeLists.txt +2 -2
Dockerfile +1 -1
Jenkinsfile +2 -2
example/01_gemm/CMakeLists.txt +11 -4
example/01_gemm/gemm_xdl_fp16_fp8.cpp +0 -0
example/01_gemm/gemm_xdl_fp8.cpp +0 -0
example/01_gemm/gemm_xdl_fp8_bf8.cpp +49 -0
example/26_contraction/CMakeLists.txt +48 -0
example/26_contraction/common_instances.hpp +183 -0
example/26_contraction/contraction_bilinear_xdl_bf16_compute_fp32.cpp +86 -0
example/26_contraction/contraction_bilinear_xdl_fp16_compute_fp32.cpp +86 -0
example/26_contraction/contraction_bilinear_xdl_fp32.cpp +59 -266
example/26_contraction/contraction_bilinear_xdl_fp32_compute_bf16.cpp +86 -0
example/26_contraction/contraction_bilinear_xdl_fp32_compute_fp16.cpp +86 -0
example/26_contraction/contraction_bilinear_xdl_fp64.cpp +59 -266
example/26_contraction/contraction_bilinear_xdl_fp64_compute_fp32.cpp +86 -0
example/26_contraction/contraction_scale_xdl_bf16_compute_fp32.cpp +85 -0
example/26_contraction/contraction_scale_xdl_fp16_compute_fp32.cpp +85 -0
example/26_contraction/contraction_scale_xdl_fp32.cpp +59 -250
CHANGELOG.md
-# Change Log for Composable Kernel
+# Changelog for Composable Kernel
 Full documentation for Composable Kernel is not yet available.
 ## (Unreleased) CK for ROCm 6.0.0
-### Fixed
+### Fixes
 - Fixed a hazard associated with inline v_dot (#808)
 - Fixed two bugs in grouped convolution backward data without K padding (#848 #876)
 ### Optimizations
 None
-### Added
-- Added image to column (#867) and column to image kernels (#930).
+### Additions
+- Added an image to a column kernel (#867)
+- Added a column to an image kernel (#930)
 - Support for 3D grouped convolution forward on RDNA 3 GPUs (#935)
 - Grouped convolution support for small K and C (#822 #879 #897)
 - Support for NHWGC (2D and 3D) grouped convolution backward weight (#769 #804)
 - Support for bf16/f32/f16 and NHWGC (2D and 3d) grouped convolution backward data (#757 #799)
 - Support for Batched Gemm DL (#732)
-### Changed
+### Changes
 - Changed the grouped convolution API to maintain consistency with other convolution kernels (#817)
-## CK 0.2.0 for ROCm 5.7.0
+## CK 0.2.0 for ROCm 5.5.0
-### Fixed
-- Fixed a bug in 6-dimensional kernels (#555).
-- Fixed grouped ConvBwdWeight test case failure (#524).
+### Fixes
+- Fixed a bug in 6-dimensional kernels (#555)
+- Fixed a test case failure with grouped convolution backward weight (#524)
 ### Optimizations
-- Improve proformance of normalization kernel
-### Added
-- Added new cmake flag "DL_KERNELS" must be set to "ON" in order to build the gemm_dl and batched_gemm_multi_d_dl instances.
-- Added new cmake flag "DTYPES" which could be set to any subset of "fp64;fp32;fp16;fp8;bf16;int8" to build instance of select data types.
-- Added new cmake flag "INSTANCES_ONLY" which will only build CK library and instances without the tests, examples, or profiler.
-- Added new feature: if GPU_TARGETS is not set on cmake command line, CK will be built for all targets supported by compiler.
-- Added support on MI300A/MI300X.
-- Added support on NAVI3x.
-- Added user tutorial (#563).
-- Added more instances for irregular GEMM sizes (#560).
-- Added inter-wave consumer-producer programming model for GEMM kernels (#310).
-- Added multi-D GEMM client APIs (#534).
-- Added multi-embeddings support (#542).
-- Added Navi3x blockwise GEMM and real GEMM support (#541).
-- Added Navi grouped ConvBwdWeight support (#505).
-- Added MaxPool, AvgPool forward (#815).
-- Added MaxPool backward (#750).
-### Changed
+- Improved the performance of the normalization kernel
+### Additions
+- New CMake flags:
+  - "DL_KERNELS"-- Must be set to "ON" in order to build the gemm_dl and batched_gemm_multi_d_dl instances
+  - "DTYPES" -- Can be set to any subset of "fp64;fp32;fp16;fp8;bf16;int8" to build an instance of the specified data types
+  - "INSTANCES_ONLY" -- Only builds CK library and instances without tests, examples, or profiler
+- New feature: if GPU_TARGETS is not set in the CMake command line, CK will be built for all targets supported by the compiler
+- Support for MI300A/MI300X
+- Support for AMD RDNA 3
+- New user tutorial (#563)
+- Additional instances for irregular GEMM sizes (#560)
+- New inter-wave consumer-producer programming model for GEMM kernels (#310)
+- GEMM with support multiple elementwise fusions (multi-D) (#534)
+- Multi-embeddings support (#542)
+- AMD RDNA 3 blockwise GEMM and real GEMM support (#541)
+- AMD RDNA grouped convolution backward weight support (#505)
+- MaxPool and AvgPool forward (#815); MaxPool backward (#750)
+### Changes
 None
CMakeLists.txt
@@ -106,7 +106,7 @@ message("checking which targets are supported")
 #Setting GPU_TARGETS on command line will override this list
 if(NOT PROFILER_ONLY)
     rocm_check_target_ids(DEFAULT_GPU_TARGETS
-        TARGETS "gfx900;gfx906;gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102")
+        TARGETS "gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102")
 else()
     add_definitions(-DPROFILER_ONLY)
     set(GPU_TARGETS "" CACHE STRING "" FORCE)
@@ -114,7 +114,7 @@ else()
         message(FATAL_ERROR "For PROFILE_ONLY build, please do not set GPU_TARGETS, use GPU_ARCH = gfx90, gfx94, gfx10, or gfx11")
     endif()
     if(GPU_ARCH MATCHES "gfx90")
-        rocm_check_target_ids(DEFAULT_GPU_TARGETS TARGETS "gfx900;gfx906;gfx908;gfx90a")
+        rocm_check_target_ids(DEFAULT_GPU_TARGETS TARGETS "gfx908;gfx90a")
     elseif(GPU_ARCH MATCHES "gfx94")
         rocm_check_target_ids(DEFAULT_GPU_TARGETS TARGETS "gfx940;gfx941;gfx942")
     elseif(GPU_ARCH MATCHES "gfx10")
...
Dockerfile
 FROM ubuntu:20.04
 ARG DEBIAN_FRONTEND=noninteractive
-ARG ROCMVERSION=5.6
+ARG ROCMVERSION=5.7
 ARG compiler_version=""
 ARG compiler_commit=""
...
Jenkinsfile
@@ -713,8 +713,8 @@ pipeline {
     }
     agent{ label rocmnode("gfx908 || gfx90a") }
     environment{
-        setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941" """
-        execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && cmake -D CMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" -DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941" -D CMAKE_CXX_COMPILER="${build_compiler()}" .. && make -j """
+        setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941;gfx942" """
+        execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && cmake -D CMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" -DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941;gfx942" -D CMAKE_CXX_COMPILER="${build_compiler()}" .. && make -j """
     }
     steps{
         Build_CK_and_Reboot(setup_args: setup_args, config_targets: "install", no_reboot: true, build_type: 'Release', execute_cmd: execute_args, prefixpath: '/usr/local')
...
example/01_gemm/CMakeLists.txt
@@ -67,13 +67,20 @@ add_example_executable(example_gemm_xdl_streamk gemm_xdl_streamk.cpp)
 if(GPU_TARGETS MATCHES "gfx940" OR GPU_TARGETS MATCHES "gfx941" OR GPU_TARGETS MATCHES "gfx942")
-    add_example_executable(example_gemm_xdl_f8 gemm_xdl_f8.cpp)
+    add_example_executable(example_gemm_xdl_fp8 gemm_xdl_fp8.cpp)
     if(result EQUAL 0)
-        add_dependencies(example_gemm_xdl example_gemm_xdl_f8)
+        add_dependencies(example_gemm_xdl example_gemm_xdl_fp8)
     endif()
 endif()
-add_example_executable(example_gemm_xdl_fp16_f8 gemm_xdl_fp16_f8.cpp)
+if(GPU_TARGETS MATCHES "gfx940" OR GPU_TARGETS MATCHES "gfx941" OR GPU_TARGETS MATCHES "gfx942")
+    add_example_executable(example_gemm_xdl_fp8_bf8 gemm_xdl_fp8_bf8.cpp)
+    if(result EQUAL 0)
+        add_dependencies(example_gemm_xdl example_gemm_xdl_fp8_bf8)
+    endif()
+endif()
+add_example_executable(example_gemm_xdl_fp16_fp8 gemm_xdl_fp16_fp8.cpp)
 if(result EQUAL 0)
-    add_dependencies(example_gemm_xdl example_gemm_xdl_fp16_f8)
+    add_dependencies(example_gemm_xdl example_gemm_xdl_fp16_fp8)
 endif()
example/01_gemm/gemm_xdl_fp16_f8.cpp → example/01_gemm/gemm_xdl_fp16_fp8.cpp
File moved
example/01_gemm/gemm_xdl_f8.cpp → example/01_gemm/gemm_xdl_fp8.cpp
File moved
example/01_gemm/gemm_xdl_fp8_bf8.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "common.hpp"
#include "ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle.hpp"

using ADataType        = ck::f8_t;
using BDataType        = ck::bf8_t;
using CDataType        = ck::f8_t;
using AccDataType      = float;
using CShuffleDataType = ck::f8_t;

using ALayout = Row;
using BLayout = Col;
using CLayout = Row;

using AElementOp = PassThrough;
using BElementOp = PassThrough;
using CElementOp = PassThrough;

static constexpr auto GemmDefault = ck::tensor_operation::device::GemmSpecialization::Default;

static constexpr auto LoopSched   = ck::make_default_loop_scheduler();
static constexpr auto PipelineVer = ck::PipelineVersion::v1;
using ComputeTypeA                = ck::f8_t;
using ComputeTypeB                = ck::bf8_t;

// clang-format off
using DeviceGemmInstance = ck::tensor_operation::device::DeviceGemm_Xdl_CShuffle
// ######| ALayout| BLayout| CLayout| AData| BData| CData| AccData| CShuffle| A| B| C| GEMM| NumGemmK| Block| MPer| NPer| KPer| AK1| BK1| MPer| NPer| MXdl| NXdl| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockTransfer| ABlockLds| BBlockTransfer| BBlockTransfer| BBlockTransfer| BlockTransfer| BBlockTransfer| BBlockTransfer| BBlockLds| CShuffle| CShuffle| CBlockTransferClusterLengths| CBlockTransfer|
// ######| | | | Type| Type| Type| Type| DataType| Elementwise| Elementwise| Elementwise| Spacialization| Prefetch| Size| Block| Block| Block| | | XDL| XDL| Per| Per| ThreadCluster| ThreadCluster| SrcAccessOrder| SrcVectorDim| SrcScalar| DstScalar| AddExtraM| ThreadCluster| ThreadCluster| SrcAccessOrder| SrcVectorDim| SrcScalar| DstScalar| AddExtraN| MXdlPerWave| NXdlPerWave| _MBlock_MWaveMPerXdl| ScalarPerVector|
// ######| | | | | | | | | Operation| Operation| Operation| | Stage| | | | | | | | | Wave| Wave| Lengths_K0_M_K1| ArrangeOrder| | | PerVector| PerVector_K1| | Lengths_K0_N_K1| ArrangeOrder| | | PerVector| PerVector_K1| | PerShuffle| PerShuffle| _NBlock_NWaveNPerXdl| _NWaveNPerXdl|
// ######| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
         < ALayout, BLayout, CLayout, ADataType, BDataType, CDataType, AccDataType, CShuffleDataType, AElementOp, BElementOp, CElementOp, GemmDefault, 1, 256, 256, 128, 64, 16, 16, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, 1, 1, S<1, 64, 1, 4>, 16, LoopSched, PipelineVer, ComputeTypeA, ComputeTypeB>;
// clang-format on

using ReferenceGemmInstance = ck::tensor_operation::host::ReferenceGemm<ADataType,
                                                                        BDataType,
                                                                        CDataType,
                                                                        AccDataType,
                                                                        AElementOp,
                                                                        BElementOp,
                                                                        CElementOp,
                                                                        ComputeTypeA,
                                                                        ComputeTypeB>;

#include "run_gemm_example.inc"

int main(int argc, char* argv[]) { return !run_gemm_example(argc, argv); }
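
The new example combines a row-major A, a column-major B, and a row-major C, with accumulation in `float`. The host-side sketch below only illustrates the indexing that this layout choice implies. It is not CK's `ReferenceGemm`: the function name `reference_gemm` and its template parameters are hypothetical, and in the usage shown plain `float` stands in for the 8-bit storage types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference GEMM: C = A * B.
// A is M x K in row-major order, B is K x N in column-major order,
// C is M x N in row-major order (mirroring ALayout = Row, BLayout = Col,
// CLayout = Row). AccT plays the role of AccDataType.
template <typename AT, typename BT, typename CT, typename AccT = float>
std::vector<CT> reference_gemm(const std::vector<AT>& a,
                               const std::vector<BT>& b,
                               std::size_t M,
                               std::size_t N,
                               std::size_t K)
{
    assert(a.size() == M * K && b.size() == K * N);

    std::vector<CT> c(M * N);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            AccT acc = 0;
            for(std::size_t k = 0; k < K; ++k)
            {
                // A(m, k) lives at m * K + k; B(k, n) lives at n * K + k.
                acc += static_cast<AccT>(a[m * K + k]) * static_cast<AccT>(b[n * K + k]);
            }
            // Convert the wide accumulator to the output type only once.
            c[m * N + n] = static_cast<CT>(acc);
        }
    }
    return c;
}
```

Because B is column-major, both inner reads walk contiguous memory along K, which is one common reason to pair a row-major A with a column-major B.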
example/26_contraction/CMakeLists.txt
add_custom_target(example_contraction)
add_custom_target(example_contraction_scale)
add_custom_target(example_contraction_bilinear)

# FP32
add_example_executable(example_contraction_bilinear_xdl_fp32 contraction_bilinear_xdl_fp32.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_fp32)

add_example_executable(example_contraction_scale_xdl_fp32 contraction_scale_xdl_fp32.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_fp32)

add_example_executable(example_contraction_bilinear_xdl_fp32_compute_bf16 contraction_bilinear_xdl_fp32_compute_bf16.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_fp32_compute_bf16)

add_example_executable(example_contraction_scale_xdl_fp32_compute_bf16 contraction_scale_xdl_fp32_compute_bf16.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_fp32_compute_bf16)

add_example_executable(example_contraction_bilinear_xdl_fp32_compute_fp16 contraction_bilinear_xdl_fp32_compute_fp16.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_fp32_compute_fp16)

add_example_executable(example_contraction_scale_xdl_fp32_compute_fp16 contraction_scale_xdl_fp32_compute_fp16.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_fp32_compute_fp16)

# FP64
add_example_executable(example_contraction_bilinear_xdl_fp64 contraction_bilinear_xdl_fp64.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_fp64)

add_example_executable(example_contraction_scale_xdl_fp64 contraction_scale_xdl_fp64.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_fp64)

add_example_executable(example_contraction_bilinear_xdl_fp64_compute_fp32 contraction_bilinear_xdl_fp64_compute_fp32.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_fp64_compute_fp32)

add_example_executable(example_contraction_scale_xdl_fp64_compute_fp32 contraction_scale_xdl_fp64_compute_fp32.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_fp64_compute_fp32)

# FP16
add_example_executable(example_contraction_bilinear_xdl_fp16_compute_fp32 contraction_bilinear_xdl_fp16_compute_fp32.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_fp16_compute_fp32)

add_example_executable(example_contraction_scale_xdl_fp16_compute_fp32 contraction_scale_xdl_fp16_compute_fp32.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_fp16_compute_fp32)

# BF16
add_example_executable(example_contraction_bilinear_xdl_bf16_compute_fp32 contraction_bilinear_xdl_bf16_compute_fp32.cpp)
add_dependencies(example_contraction_bilinear example_contraction_bilinear_xdl_bf16_compute_fp32)

add_example_executable(example_contraction_scale_xdl_bf16_compute_fp32 contraction_scale_xdl_bf16_compute_fp32.cpp)
add_dependencies(example_contraction_scale example_contraction_scale_xdl_bf16_compute_fp32)

add_dependencies(example_contraction example_contraction_scale)
add_dependencies(example_contraction example_contraction_bilinear)
example/26_contraction/common_instances.hpp
0 → 100644
This diff is collapsed.
example/26_contraction/contraction_bilinear_xdl_bf16_compute_fp32.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = BF16;
using BDataType        = BF16;
using AccDataType      = F32;
using CShuffleDataType = BF16;
using DDataType        = BF16;
using DsDataType       = ck::Tuple<DDataType>;
using EDataType        = BF16;
using ComputeDataType  = F32;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Bilinear;

using DeviceOpInstanceKKNN = DeviceOpInstanceKK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNNN = DeviceOpInstanceKN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKNN = DeviceOpInstanceMK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNNN = DeviceOpInstanceMN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKNN;

#include "run_contraction_bilinear_example.inc"

int main(int argc, char* argv[]) { return run_contraction_bilinear_example(argc, argv); }
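
The bilinear examples configure a contraction with two M, two N, and two K dimensions and a `Bilinear` epilogue. As a rough host-side illustration, the sketch below computes E = alpha * (A contracted with B over K) + beta * D. It assumes packed row-major tensors with shapes A[M0, M1, K0, K1], B[N0, N1, K0, K1], and D/E[M0, M1, N0, N1], so that each index pair flattens to a single index. The names `ContractionDims` and `bilinear_contraction` are hypothetical and are not part of CK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape description for NumDimM = NumDimN = NumDimK = 2.
struct ContractionDims
{
    std::size_t M0, M1, N0, N1, K0, K1;
};

// E[m0,m1,n0,n1] = alpha * sum_{k0,k1} A[m0,m1,k0,k1] * B[n0,n1,k0,k1]
//                  + beta * D[m0,m1,n0,n1]
// Assumes packed row-major storage, so (m0, m1) flattens to m = m0 * M1 + m1,
// and likewise for n and k.
std::vector<float> bilinear_contraction(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        const std::vector<float>& d,
                                        const ContractionDims& s,
                                        float alpha,
                                        float beta)
{
    const std::size_t M = s.M0 * s.M1;
    const std::size_t N = s.N0 * s.N1;
    const std::size_t K = s.K0 * s.K1;
    assert(a.size() == M * K && b.size() == N * K && d.size() == M * N);

    std::vector<float> e(M * N);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f; // accumulator, in the role of AccDataType
            for(std::size_t k = 0; k < K; ++k)
            {
                acc += a[m * K + k] * b[n * K + k];
            }
            // Bilinear epilogue: blend the contraction result with D.
            e[m * N + n] = alpha * acc + beta * d[m * N + n];
        }
    }
    return e;
}
```

Under the packed-layout assumption the contraction is equivalent to a GEMM with M = M0 * M1, N = N0 * N1, and K = K0 * K1. Strided layouts would need explicit per-dimension strides instead of the flattened indices used here.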
example/26_contraction/contraction_bilinear_xdl_fp16_compute_fp32.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = F16;
using BDataType        = F16;
using AccDataType      = F32;
using CShuffleDataType = F16;
using DDataType        = F16;
using DsDataType       = ck::Tuple<DDataType>;
using EDataType        = F16;
using ComputeDataType  = F32;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Bilinear;

using DeviceOpInstanceKKNN = DeviceOpInstanceKK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNNN = DeviceOpInstanceKN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKNN = DeviceOpInstanceMK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNNN = DeviceOpInstanceMN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKNN;

#include "run_contraction_bilinear_example.inc"

int main(int argc, char* argv[]) { return run_contraction_bilinear_example(argc, argv); }
example/26_contraction/contraction_bilinear_xdl_fp32.cpp
This diff is collapsed.
example/26_contraction/contraction_bilinear_xdl_fp32_compute_bf16.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = F32;
using BDataType        = F32;
using AccDataType      = F32;
using CShuffleDataType = F32;
using DDataType        = F32;
using DsDataType       = ck::Tuple<DDataType>;
using EDataType        = F32;
using ComputeDataType  = BF16;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Bilinear;

using DeviceOpInstanceKKNN = DeviceOpInstanceKK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNNN = DeviceOpInstanceKN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKNN = DeviceOpInstanceMK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNNN = DeviceOpInstanceMN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKNN;

#include "run_contraction_bilinear_example.inc"

int main(int argc, char* argv[]) { return run_contraction_bilinear_example(argc, argv); }
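
In this variant the tensors are stored as F32 while `ComputeDataType` is BF16. A plausible reading, stated here as an assumption rather than as CK's documented behavior, is that operands are narrowed to bf16 before being multiplied and that products are still accumulated in F32. The sketch below emulates that effect on the host. The helper names `round_to_bf16` and `dot_compute_bf16` are hypothetical, round-to-nearest-even is an assumed rounding mode, and NaN/Inf handling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Round an fp32 value to the nearest bf16 value (ties to even) and return it
// widened back to fp32. bf16 keeps the upper 16 bits of the fp32 encoding.
// Assumption: finite inputs only; NaN and Inf are not treated specially.
float round_to_bf16(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));

    const std::uint32_t lsb = (bits >> 16) & 1u; // lowest bit that survives
    bits += 0x7FFFu + lsb;                       // ties round to the even neighbor
    bits &= 0xFFFF0000u;                         // drop the 16 discarded bits

    float y;
    std::memcpy(&y, &bits, sizeof(y));
    return y;
}

// Dot product that narrows each operand to bf16 before multiplying,
// while accumulating in fp32 (in the role of AccDataType = F32).
float dot_compute_bf16(const float* a, const float* b, std::size_t k)
{
    assert(a != nullptr && b != nullptr);
    float acc = 0.0f;
    for(std::size_t i = 0; i < k; ++i)
    {
        acc += round_to_bf16(a[i]) * round_to_bf16(b[i]);
    }
    return acc;
}
```

For example, 1.00390625 (1 + 2^-8) lies exactly halfway between two bf16 values and rounds to 1.0 under this scheme, so a reduced compute type can change results relative to a pure F32 computation even though inputs and outputs remain F32.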
example/26_contraction/contraction_bilinear_xdl_fp32_compute_fp16.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = F32;
using BDataType        = F32;
using AccDataType      = F32;
using CShuffleDataType = F32;
using DDataType        = F32;
using DsDataType       = ck::Tuple<DDataType>;
using EDataType        = F32;
using ComputeDataType  = F16;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Bilinear;

using DeviceOpInstanceKKNN = DeviceOpInstanceKK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNNN = DeviceOpInstanceKN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKNN = DeviceOpInstanceMK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNNN = DeviceOpInstanceMN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKNN;

#include "run_contraction_bilinear_example.inc"

int main(int argc, char* argv[]) { return run_contraction_bilinear_example(argc, argv); }
example/26_contraction/contraction_bilinear_xdl_fp64.cpp
This diff is collapsed.
example/26_contraction/contraction_bilinear_xdl_fp64_compute_fp32.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = F64;
using BDataType        = F64;
using AccDataType      = F32;
using CShuffleDataType = F64;
using DDataType        = F64;
using DsDataType       = ck::Tuple<DDataType>;
using EDataType        = F64;
using ComputeDataType  = F32;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Bilinear;

using DeviceOpInstanceKKNN = DeviceOpInstanceKK_FP64<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNNN = DeviceOpInstanceKN_FP64<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKNN = DeviceOpInstanceMK_FP64<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNNN = DeviceOpInstanceMN_FP64<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKNN;

#include "run_contraction_bilinear_example.inc"

int main(int argc, char* argv[]) { return run_contraction_bilinear_example(argc, argv); }
example/26_contraction/contraction_scale_xdl_bf16_compute_fp32.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = BF16;
using BDataType        = BF16;
using AccDataType      = F32;
using CShuffleDataType = BF16;
using DsDataType       = ck::Tuple<>;
using EDataType        = BF16;
using ComputeDataType  = F32;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Scale;

using DeviceOpInstanceKKN = DeviceOpInstanceKK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNN = DeviceOpInstanceKN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKN = DeviceOpInstanceMK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNN = DeviceOpInstanceMN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKN;

#include "run_contraction_scale_example.inc"

int main(int argc, char* argv[]) { return run_contraction_scale_example(argc, argv); }
example/26_contraction/contraction_scale_xdl_fp16_compute_fp32.cpp
0 → 100644
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
#include "ck/ck.hpp"
#include "ck/tensor_operation/gpu/element/element_wise_operation.hpp"
#include "common_instances.hpp"

using ADataType        = F16;
using BDataType        = F16;
using AccDataType      = F32;
using CShuffleDataType = F16;
using DsDataType       = ck::Tuple<>;
using EDataType        = F16;
using ComputeDataType  = F32;

static constexpr ck::index_t NumDimM = 2;
static constexpr ck::index_t NumDimN = 2;
static constexpr ck::index_t NumDimK = 2;

using AElementOp   = ck::tensor_operation::element_wise::PassThrough;
using BElementOp   = ck::tensor_operation::element_wise::PassThrough;
using CDEElementOp = ck::tensor_operation::element_wise::Scale;

using DeviceOpInstanceKKN = DeviceOpInstanceKK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceKNN = DeviceOpInstanceKN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMKN = DeviceOpInstanceMK_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstanceMNN = DeviceOpInstanceMN_Generic<
    NumDimM, NumDimN, NumDimK,
    ADataType, BDataType, AccDataType, CShuffleDataType,
    DsDataType, EDataType, ComputeDataType,
    AElementOp, BElementOp, CDEElementOp>;

using DeviceOpInstance = DeviceOpInstanceKKN;

#include "run_contraction_scale_example.inc"

int main(int argc, char* argv[]) { return run_contraction_scale_example(argc, argv); }
example/26_contraction/contraction_scale_xdl_fp32.cpp
This diff is collapsed.