Commit 55a89c74 authored by Jun Liu's avatar Jun Liu
Browse files

Merge branch 'develop' into amd-develop

parents 0dacd895 dcedf363
blank_issues_enabled: true
name: Issue Report
description: File a report for ROCm related issues on Linux and Windows. For issues pertaining to documentation or non-bug related, please open a blank issue located below.
title: "[Issue]: "
body:
- type: markdown
attributes:
value: |
Thank you for taking the time to fill out this report!
You can acquire your OS, CPU, GPU (for filling out this report) with the following commands:
Linux:
echo "OS:" && cat /etc/os-release | grep -E "^(NAME=|VERSION=)";
echo "CPU: " && cat /proc/cpuinfo | grep "model name" | sort --unique;
echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
Windows:
(Get-WmiObject Win32_OperatingSystem).Version
(Get-WmiObject win32_Processor).Name
(Get-WmiObject win32_VideoController).Name
- type: textarea
attributes:
label: Problem Description
description: Describe the issue you encountered.
validations:
required: true
- type: input
attributes:
label: Operating System
description: What is the name and version number of the OS?
placeholder: "e.g. Ubuntu 22.04.3 LTS (Jammy Jellyfish)"
validations:
required: true
- type: input
attributes:
label: CPU
description: What CPU did you encounter the issue on?
placeholder: "e.g. AMD Ryzen 9 5900HX with Radeon Graphics"
validations:
required: true
- type: dropdown
attributes:
label: GPU
description: What GPU(s) did you encounter the issue on (you can select multiple GPUs from the list)
multiple: true
options:
- AMD Instinct MI300X
- AMD Instinct MI300A
- AMD Instinct MI300
- AMD Instinct MI250X
- AMD Instinct MI250
- AMD Instinct MI210
- AMD Instinct MI100
- AMD Instinct MI50
- AMD Instinct MI25
- AMD Radeon Pro V620
- AMD Radeon Pro VII
- AMD Radeon RX 7900 XTX
- AMD Radeon VII
- AMD Radeon Pro W7900
- AMD Radeon Pro W7800
- AMD Radeon Pro W6800
- AMD Radeon Pro W6600
- AMD Radeon Pro W5500
- AMD Radeon RX 7900 XT
- AMD Radeon RX 7600
- AMD Radeon RX 6950 XT
- AMD Radeon RX 6900 XT
- AMD Radeon RX 6800 XT
- AMD Radeon RX 6800
- AMD Radeon RX 6750
- AMD Radeon RX 6700 XT
- AMD Radeon RX 6700
- AMD Radeon RX 6650 XT
- AMD Radeon RX 6600 XT
- AMD Radeon RX 6600
- Other
validations:
required: true
- type: input
attributes:
label: Other
description: If you selected Other, please specify
- type: dropdown
attributes:
label: ROCm Version
description: What version(s) of ROCm did you encounter the issue on?
multiple: true
options:
- ROCm 6.0.0
- ROCm 5.7.1
- ROCm 5.7.0
- ROCm 5.6.1
- ROCm 5.6.0
- ROCm 5.5.1
- ROCm 5.5.0
validations:
required: true
- type: dropdown
attributes:
label: ROCm Component
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
multiple: true
options:
- Other
- AMD Common Language Runtime
- AMD MIGraphX
- AMD System Management Interface
- amdgpu KCL/autoconf
- amdgpu Kernel-mode GPU Driver
- amdgpu-install
- AOMP
- AOMP Extras
- AqlProfile
- build-infra
- chelsio
- clang-ocl
- Composable Kernel
- dkms
- docker / ROCm-docker
- flang
- gpuburn
- half
- HIP
- HIP Examples
- hipBLAS
- hipBLASLt
- HIPCC
- hipCUB
- hip-examples-private
- hipFFT
- hipfort
- HIPIFY
- hipRAND
- hipSOLVER
- hipSPARSE
- hipSPARSELt
- hipTensor
- hip-tests
- HSA Runtime
- infrastructure
- jenkins-utils
- libdrm
- Linux BPI packaging framework
- llvm-project
- Mesa
- meta
- MIOpen
- MIVisionX
- ml-framework-ci
- MLSEQA_TestRepo
- OpenCL API C++ Bindings
- OpenCL API Headers
- OpenCL Conformance Test Suite
- OpenCL ICD Loader
- perftest-p2p
- prototype
- RCCL
- rccl-rdma-sharp-plugins
- rocALUTION
- rocBLAS
- ROCdbgapi
- ROCdebug-agent
- rocFFT
- ROCgdb
- ROCK
- ROCm Documentation/Website
- ROCm Data Center Tool
- ROCm Examples
- ROCm for Windows
- ROCm Performance Primitives
- ROCm System Management Interface Library
- ROCm Thrust
- ROCm Validation Suite
- rocm_bandwidth_test
- rocm-cmake
- rocm-core
- rocm-docs-core
- rocminfo
- rocMLIR
- rocmtools
- rocPRIM
- rocprofiler
- rocRAND
- ROCR-Runtime
- rocSOLVER
- rocSPARSE
- roctracer
- ROCT-Thunk-Interface
- rocWMMA
- Tensile
- umr
- ibv_rc_pingpong-amd
- mellanox
- mpitest
- Pytorch
- Tensorflow
- APEX
- torchvision
- Magma
- type: textarea
attributes:
label: Steps to Reproduce
description: (Optional) Detailed steps to reproduce the issue.
validations:
required: false
- type: textarea
attributes:
label: (Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
description: The output of rocminfo --support could help to better address the problem.
validations:
required: false
- type: textarea
attributes:
label: Additional Information
description: (Optional) Any additional information that is relevant, e.g. relevant environment variables, dockerfiles, log files, dmesg output (on Linux), etc.
validations:
required: false
......@@ -10,3 +10,9 @@ updates:
open-pull-requests-limit: 10
schedule:
interval: "daily"
labels:
- "documentation"
- "dependencies"
- "ci:docs-only"
reviewers:
- "samjwu"
......@@ -54,5 +54,4 @@ _images/
_static/
_templates/
_toc.yml
docBin/
_doxygen/
......@@ -3,11 +3,6 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.8"
sphinx:
configuration: docs/conf.py
......@@ -16,3 +11,8 @@ formats: [htmlzip, pdf, epub]
python:
install:
- requirements: docs/sphinx/requirements.txt
build:
os: ubuntu-22.04
tools:
python: "3.8"
......@@ -19,6 +19,7 @@ None
- Support for NHWGC (2D and 3D) grouped convolution backward weight (#769 #804)
- Support for bf16/f32/f16 and NHWGC (2D and 3D) grouped convolution backward data (#757 #799)
- Support for Batched Gemm DL (#732)
- Introduce wrapper sublibrary (limited functionality). (#1071, #1098)
### Changes
- Changed the grouped convolution API to maintain consistency with other convolution kernels (#817)
......
......@@ -59,9 +59,9 @@ authors:
family-names: Zhou
- given-names: Jianfeng
family-names: Yan
repository-code: 'https://github.com/ROCmSoftwarePlatform/composable_kernel'
repository-code: 'https://github.com/ROCm/composable_kernel'
abstract: Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel progarmming languages, like HIP C++.
keywords:
- 'CK, Composable Kernel, Tensor Coordinate Transformation'
license: MIT
license-url: https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/7fc3ed761aa35709d87c8fbbe41dd368648b3541/LICENSE
license-url: https://github.com/ROCm/composable_kernel/blob/7fc3ed761aa35709d87c8fbbe41dd368648b3541/LICENSE
......@@ -146,6 +146,33 @@ if(${hip_VERSION_FLAT} GREATER 500723302)
add_compile_options(-fno-offload-uniform-block)
endif()
#
# Seperate linking jobs from compiling
# Too many concurrent linking jobs can break the build
# Copied from LLVM
set(CK_PARALLEL_LINK_JOBS "" CACHE STRING
"Define the maximum number of concurrent link jobs (Ninja only).")
if(CMAKE_GENERATOR MATCHES "Ninja")
if(CK_PARALLEL_LINK_JOBS)
set_property(GLOBAL APPEND PROPERTY JOB_POOLS link_job_pool=${CK_PARALLEL_LINK_JOBS})
set(CMAKE_JOB_POOL_LINK link_job_pool)
endif()
elseif(CK_PARALLEL_LINK_JOBS)
message(WARNING "Job pooling is only available with Ninja generators.")
endif()
# Similar for compiling
set(CK_PARALLEL_COMPILE_JOBS "" CACHE STRING
"Define the maximum number of concurrent compile jobs (Ninja only).")
if(CMAKE_GENERATOR MATCHES "Ninja")
if(CK_PARALLEL_COMPILE_JOBS)
set_property(GLOBAL APPEND PROPERTY JOB_POOLS compile_job_pool=${CK_PARALLEL_COMPILE_JOBS})
set(CMAKE_JOB_POOL_COMPILE compile_job_pool)
endif()
elseif(CK_PARALLEL_COMPILE_JOBS)
message(WARNING "Job pooling is only available with Ninja generators.")
endif()
option(USE_BITINT_EXTENSION_INT4, "Whether to enable clang's BitInt extension to provide int4 data type." OFF)
option(USE_OPT_NAVI3X, "Whether to enable LDS cumode and Wavefront32 mode for NAVI3X silicons." OFF)
......
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
ARG ROCMVERSION=5.7
ARG ROCMVERSION=6.0
ARG compiler_version=""
ARG compiler_commit=""
......@@ -16,8 +16,8 @@ RUN apt-get install -y --allow-unauthenticated apt-utils wget gnupg2 curl
ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn
RUN curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor -o /etc/apt/trusted.gpg.d/rocm-keyring.gpg
RUN wget https://repo.radeon.com/amdgpu-install/5.7/ubuntu/focal/amdgpu-install_5.7.50700-1_all.deb --no-check-certificate
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ./amdgpu-install_5.7.50700-1_all.deb
RUN wget https://repo.radeon.com/amdgpu-install/6.0/ubuntu/focal/amdgpu-install_6.0.60000-1_all.deb --no-check-certificate
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ./amdgpu-install_6.0.60000-1_all.deb
RUN wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - && \
sh -c "echo deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] $DEB_ROCM_REPO focal main > /etc/apt/sources.list.d/rocm.list" && \
......
......@@ -33,7 +33,7 @@ def runShell(String command){
def getDockerImageName(){
def img
if (params.ROCMVERSION != "6.0"){
if (params.ROCMVERSION != "6.1"){
if (params.COMPILER_VERSION == "") {
img = "${env.CK_DOCKERHUB}:ck_ub20.04_rocm${params.ROCMVERSION}"
}
......@@ -302,7 +302,7 @@ def buildHipClangJob(Map conf=[:]){
def retimage
(retimage, image) = getDockerImage(conf)
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCmSoftwarePlatform', repo: 'composable_kernel') {
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCm', repo: 'composable_kernel') {
withDockerContainer(image: image, args: dockerOpts + ' -v=/var/jenkins/:/var/jenkins') {
timeout(time: 5, unit: 'HOURS')
{
......@@ -355,7 +355,7 @@ def runCKProfiler(Map conf=[:]){
def variant = env.STAGE_NAME
def retimage
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCmSoftwarePlatform', repo: 'composable_kernel') {
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCm', repo: 'composable_kernel') {
try {
(retimage, image) = getDockerImage(conf)
withDockerContainer(image: image, args: dockerOpts) {
......@@ -487,7 +487,7 @@ def Build_CK(Map conf=[:]){
def retimage
def navi_node = 0
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCmSoftwarePlatform', repo: 'composable_kernel') {
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCm', repo: 'composable_kernel') {
try {
(retimage, image) = getDockerImage(conf)
withDockerContainer(image: image, args: dockerOpts) {
......@@ -553,7 +553,7 @@ def Build_CK(Map conf=[:]){
sh """#!/bin/bash
rm -rf "${params.hipTensor_branch}".zip
rm -rf hipTensor-"${params.hipTensor_branch}"
wget https://github.com/ROCmSoftwarePlatform/hipTensor/archive/refs/heads/"${params.hipTensor_branch}".zip
wget https://github.com/ROCm/hipTensor/archive/refs/heads/"${params.hipTensor_branch}".zip
unzip -o "${params.hipTensor_branch}".zip
"""
dir("hipTensor-${params.hipTensor_branch}"){
......@@ -605,7 +605,7 @@ def process_results(Map conf=[:]){
def variant = env.STAGE_NAME
def retimage
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCmSoftwarePlatform', repo: 'composable_kernel') {
gitStatusWrapper(credentialsId: "${status_wrapper_creds}", gitHubContext: "Jenkins - ${variant}", account: 'ROCm', repo: 'composable_kernel') {
try {
(retimage, image) = getDockerImage(conf)
}
......@@ -655,8 +655,8 @@ def process_results(Map conf=[:]){
}
//launch develop branch daily at 23:00 UT in FULL_QA mode and at 19:00 UT with latest staging compiler version
CRON_SETTINGS = BRANCH_NAME == "develop" ? '''0 23 * * * % RUN_FULL_QA=true;ROCMVERSION=5.7;COMPILER_VERSION=
0 21 * * * % ROCMVERSION=5.7;COMPILER_VERSION=;COMPILER_COMMIT=
CRON_SETTINGS = BRANCH_NAME == "develop" ? '''0 23 * * * % RUN_FULL_QA=true;ROCMVERSION=6.0;COMPILER_VERSION=
0 21 * * * % ROCMVERSION=6.0;COMPILER_VERSION=;COMPILER_COMMIT=
0 19 * * * % BUILD_DOCKER=true;DL_KERNELS=true;COMPILER_VERSION=amd-stg-open;COMPILER_COMMIT=;USE_SCCACHE=false
0 17 * * * % BUILD_DOCKER=true;DL_KERNELS=true;COMPILER_VERSION=amd-mainline-open;COMPILER_COMMIT=;USE_SCCACHE=false''' : ""
......@@ -675,8 +675,8 @@ pipeline {
description: "Force building docker image (default: false), set to true if docker image needs to be updated.")
string(
name: 'ROCMVERSION',
defaultValue: '5.7',
description: 'Specify which ROCM version to use: 5.7 (default).')
defaultValue: '6.0',
description: 'Specify which ROCM version to use: 6.0 (default).')
string(
name: 'COMPILER_VERSION',
defaultValue: '',
......@@ -703,8 +703,8 @@ pipeline {
description: "Use the CK build to verify hipTensor build and tests (default: ON)")
string(
name: 'hipTensor_branch',
defaultValue: 'develop',
description: 'Specify which branch of hipTensor to use (default: develop)')
defaultValue: 'mainline',
description: 'Specify which branch of hipTensor to use (default: mainline)')
booleanParam(
name: "USE_SCCACHE",
defaultValue: true,
......@@ -768,8 +768,15 @@ pipeline {
}
agent{ label rocmnode("gfx908 || gfx90a") }
environment{
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941;gfx942" -DCMAKE_EXE_LINKER_FLAGS=" -L ${env.WORKSPACE}/script -T hip_fatbin_insert " """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && cmake -D CMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" -DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941;gfx942" -D CMAKE_CXX_COMPILER="${build_compiler()}" .. && make -j """
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install \
-DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941;gfx942" \
-DCMAKE_EXE_LINKER_FLAGS=" -L ${env.WORKSPACE}/script -T hip_fatbin_insert " \
-DCMAKE_CXX_FLAGS=" -O3 " """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && \
cmake -DCMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" \
-DGPU_TARGETS="gfx908;gfx90a;gfx940;gfx941;gfx942" \
-DCMAKE_CXX_COMPILER="${build_compiler()}" \
-DCMAKE_CXX_FLAGS=" -O3 " .. && make -j """
}
steps{
Build_CK_and_Reboot(setup_args: setup_args, config_targets: "install", no_reboot:true, build_type: 'Release', execute_cmd: execute_args, prefixpath: '/usr/local')
......@@ -784,8 +791,12 @@ pipeline {
}
agent{ label rocmnode("gfx908 || gfx90a") }
environment{
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx908;gfx90a" """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && cmake -D CMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" -DGPU_TARGETS="gfx908;gfx90a" -D CMAKE_CXX_COMPILER="${build_compiler()}" .. && make -j """
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx908;gfx90a" -DCMAKE_CXX_FLAGS=" -O3 " """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && \
cmake -DCMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" \
-DGPU_TARGETS="gfx908;gfx90a" \
-DCMAKE_CXX_COMPILER="${build_compiler()}" \
-DCMAKE_CXX_FLAGS=" -O3 " .. && make -j """
}
steps{
Build_CK_and_Reboot(setup_args: setup_args, config_targets: "install", no_reboot:true, build_type: 'Release', execute_cmd: execute_args, prefixpath: '/usr/local')
......@@ -800,8 +811,12 @@ pipeline {
}
agent{ label rocmnode("navi21") }
environment{
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx1030" -DDL_KERNELS=ON """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && cmake -D CMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" -DGPU_TARGETS="gfx1030" -D CMAKE_CXX_COMPILER="${build_compiler()}" .. && make -j """
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx1030" -DDL_KERNELS=ON -DCMAKE_CXX_FLAGS=" -O3 " """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && \
cmake -DCMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" \
-DGPU_TARGETS="gfx1030" \
-DCMAKE_CXX_COMPILER="${build_compiler()}" \
-DCMAKE_CXX_FLAGS=" -O3 " .. && make -j """
}
steps{
Build_CK_and_Reboot(setup_args: setup_args, config_targets: "install", no_reboot:true, build_type: 'Release', execute_cmd: execute_args, prefixpath: '/usr/local')
......@@ -816,8 +831,12 @@ pipeline {
}
agent{ label rocmnode("navi32") }
environment{
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx1101" -DDL_KERNELS=ON """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && cmake -D CMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" -DGPU_TARGETS="gfx1101" -DDL_KERNELS=ON -D CMAKE_CXX_COMPILER="${build_compiler()}" .. && make -j """
setup_args = """ -DCMAKE_INSTALL_PREFIX=../install -DGPU_TARGETS="gfx1101" -DDL_KERNELS=ON -DCMAKE_CXX_FLAGS=" -O3 " """
execute_args = """ cd ../client_example && rm -rf build && mkdir build && cd build && \
cmake -DCMAKE_PREFIX_PATH="${env.WORKSPACE}/install;/opt/rocm" \
-DGPU_TARGETS="gfx1101" \
-DCMAKE_CXX_COMPILER="${build_compiler()}" \
-DCMAKE_CXX_FLAGS=" -O3 " .. && make -j """
}
steps{
Build_CK_and_Reboot(setup_args: setup_args, config_targets: "install", no_reboot:true, build_type: 'Release', execute_cmd: execute_args, prefixpath: '/usr/local')
......
......@@ -32,7 +32,6 @@ python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
You can find a list of our developers and contributors on our [Contributors](/CONTRIBUTORS.md) page.
page.
```note
If you use CK, cite us as follows:
......@@ -71,7 +70,7 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
3. Clone CK source code from the GitHub repository and start the build:
```bash
git clone https://github.com/ROCmSoftwarePlatform/composable_kernel.git && \
git clone https://github.com/ROCm/composable_kernel.git && \
cd composable_kernel && \
mkdir build && \
cd build
......
......@@ -16,6 +16,7 @@
using InLayout = ck::tensor_layout::convolution::NDHWGC;
using WeiLayout = ck::tensor_layout::convolution::GKZYXC;
using OutLayout = ck::tensor_layout::convolution::NDHWGK;
using BiasLayout = ck::tensor_layout::convolution::G_K;
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
using ScaleAddScaleAddRelu = ck::tensor_operation::element_wise::ScaleAddScaleAddRelu;
......@@ -64,6 +65,9 @@ int execute_conv_fwd_scaleadd_scaleadd_relu()
std::array<ck::index_t, 6> out_lengths{G, N, K, Do, Ho, Wo};
std::array<ck::index_t, 6> out_strides{
K, Do * Ho * Wo * G * K, 1, Ho * Wo * G * K, Wo * G * K, G * K};
// Logical broadcast bias (we have to pass bias lengths in the same format as output - GNKDHW)
std::array<ck::index_t, 6> bias_lengths{G, 1, K, 1, 1, 1};
std::array<ck::index_t, 6> bias_strides{K, 0, 1, 0, 0, 0};
std::array<ck::index_t, NumDimSpatial> filter_strides{1, 1, 1};
std::array<ck::index_t, NumDimSpatial> filter_dilations{1, 1, 1};
......@@ -74,13 +78,13 @@ int execute_conv_fwd_scaleadd_scaleadd_relu()
SimpleDeviceMem wei(sizeof(WeiDataType) * G * K * Z * Y * X * C);
SimpleDeviceMem out(sizeof(OutDataType) * N * Do * Ho * Wo * G * K);
SimpleDeviceMem d0(sizeof(std::tuple_element_t<0, DDataTypes>) * N * Do * Ho * Wo * G * K);
SimpleDeviceMem d1(sizeof(std::tuple_element_t<1, DDataTypes>) * N * Do * Ho * Wo * G * K);
SimpleDeviceMem d1(sizeof(std::tuple_element_t<1, DDataTypes>) * G * K);
using DeviceOp = ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<
NumDimSpatial,
InLayout,
WeiLayout,
ck::Tuple<OutLayout, OutLayout>,
ck::Tuple<OutLayout, BiasLayout>,
OutLayout,
InDataType,
WeiDataType,
......@@ -117,8 +121,8 @@ int execute_conv_fwd_scaleadd_scaleadd_relu()
in_strides,
wei_lengths,
wei_strides,
{out_lengths, out_lengths},
{out_strides, out_strides},
{out_lengths, bias_lengths},
{out_strides, bias_strides},
out_lengths,
out_strides,
filter_strides,
......@@ -187,8 +191,8 @@ int execute_conv_fwd_scaleadd_scaleadd_relu()
in_strides,
wei_lengths,
wei_strides,
{out_lengths, out_lengths},
{out_strides, out_strides},
{out_lengths, bias_lengths},
{out_strides, bias_strides},
out_lengths,
out_strides,
filter_strides,
......
add_executable(client_tensor_transform tensor_transform.cpp)
target_link_libraries(client_tensor_transform PRIVATE composable_kernel::device_other_operations)
add_executable(client_tensor_transform_using_wrapper tensor_transform_using_wrapper.cpp)
target_link_libraries(client_tensor_transform_using_wrapper PRIVATE composable_kernel::device_other_operations)
......@@ -9,7 +9,7 @@
#include "ck/utility/tuple.hpp"
#include "ck/utility/sequence.hpp"
#include "tensor_transform_wrapper.hpp"
#include "ck/wrapper/layout.hpp"
using DataType = int;
......@@ -17,7 +17,7 @@ template <typename Layout>
void Print1d(const Layout& layout)
{
std::cout << "Print1d" << std::endl;
for(ck::index_t w = 0; w < ck::tensor_transform_wrapper::size(layout); w++)
for(ck::index_t w = 0; w < ck::wrapper::size(layout); w++)
{
std::cout << layout(ck::make_tuple(w)) << " ";
}
......@@ -28,9 +28,9 @@ template <typename Layout>
void Print2d(const Layout& layout)
{
std::cout << "Print2d" << std::endl;
for(ck::index_t h = 0; h < ck::tensor_transform_wrapper::size<0>(layout); h++)
for(ck::index_t h = 0; h < ck::wrapper::size<0>(layout); h++)
{
for(ck::index_t w = 0; w < ck::tensor_transform_wrapper::size<1>(layout); w++)
for(ck::index_t w = 0; w < ck::wrapper::size<1>(layout); w++)
{
std::cout << layout(ck::make_tuple(h, w)) << " ";
}
......@@ -43,15 +43,11 @@ template <typename Layout>
void Print3dCustom(const Layout& layout)
{
std::cout << "Print3dCustom" << std::endl;
for(ck::index_t d = 0;
d < ck::tensor_transform_wrapper::size<0>(ck::tensor_transform_wrapper::get<0>(layout));
d++)
for(ck::index_t d = 0; d < ck::wrapper::size<0>(ck::wrapper::get<0>(layout)); d++)
{
for(ck::index_t h = 0;
h < ck::tensor_transform_wrapper::size<1>(ck::tensor_transform_wrapper::get<0>(layout));
h++)
for(ck::index_t h = 0; h < ck::wrapper::size<1>(ck::wrapper::get<0>(layout)); h++)
{
for(ck::index_t w = 0; w < ck::tensor_transform_wrapper::size<1>(layout); w++)
for(ck::index_t w = 0; w < ck::wrapper::size<1>(layout); w++)
{
std::cout << layout(ck::make_tuple(ck::make_tuple(d, h), w)) << " ";
}
......@@ -68,7 +64,7 @@ int main()
// Basic descriptor 0, 1, 2, ... 30, 31 (compile-time descriptor)
// (dims:4,8 strides:1,4)
const auto shape_4x8 = ck::make_tuple(ck::Number<4>{}, ck::Number<8>{});
const auto layout_4x8_s1x4 = ck::tensor_transform_wrapper::make_layout(shape_4x8);
const auto layout_4x8_s1x4 = ck::wrapper::make_layout(shape_4x8);
std::cout << "dims:4,8 strides:1,4" << std::endl;
Print2d(layout_4x8_s1x4);
using Cord1x1Type = ck::Tuple<ck::Number<1>, ck::Number<1>>;
......@@ -79,8 +75,7 @@ int main()
// dims:4,(2,4) strides:2,(1,8)
const auto shape_4x2x4 = ck::make_tuple(4, ck::make_tuple(2, 4));
const auto strides_s2x1x8 = ck::make_tuple(2, ck::make_tuple(1, 8));
const auto layout_4x2x4_s2x1x8 =
ck::tensor_transform_wrapper::make_layout(shape_4x2x4, strides_s2x1x8);
const auto layout_4x2x4_s2x1x8 = ck::wrapper::make_layout(shape_4x2x4, strides_s2x1x8);
std::cout << "dims:4,(2,4) strides:2,(1,8)" << std::endl;
Print2d(layout_4x2x4_s2x1x8);
......@@ -92,7 +87,7 @@ int main()
const auto strides_s1x4x2x8 = ck::make_tuple(ck::make_tuple(ck::Number<1>{}, ck::Number<4>{}),
ck::make_tuple(ck::Number<2>{}, ck::Number<8>{}));
static const auto layout_2x2x2x4_s1x4x2x8 =
ck::tensor_transform_wrapper::make_layout(shape_2x2x2x4, strides_s1x4x2x8);
ck::wrapper::make_layout(shape_2x2x2x4, strides_s1x4x2x8);
std::cout << "dims:(2,2),(2,4) strides:(1,4),(2,8)" << std::endl;
Print2d(layout_2x2x2x4_s1x4x2x8);
......@@ -108,7 +103,7 @@ int main()
ck::make_tuple(ck::make_tuple(ck::Number<1>{}, ck::Number<4>{}), ck::Number<2>{}),
ck::Number<8>{});
static const auto layout_2x2x2x4_s1x4x2x8_nested =
ck::tensor_transform_wrapper::make_layout(shape_2x2x2x4_nested, strides_s1x4x2x8_nested);
ck::wrapper::make_layout(shape_2x2x2x4_nested, strides_s1x4x2x8_nested);
std::cout << "dims:((2,2),2),4 strides:((1,4),2),8" << std::endl;
Print1d(layout_2x2x2x4_s1x4x2x8_nested);
......
......@@ -4,23 +4,34 @@
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import subprocess
import re
from rocm_docs import ROCmDocs
html_theme_options = {"flavor": "list"}
name = "Composable Kernel"
get_version = r'sed -n -e "s/^rocm_setup_version(.* \([0-9\.]\{1,\}\).*/\1/p" ../CMakeLists.txt'
version = subprocess.getoutput(get_version)
if len(version) > 0:
name = f"{name} {version}"
with open('../CMakeLists.txt', encoding='utf-8') as f:
match = re.search(r'.*set\(version ([0-9.]+)[^0-9.]+', f.read())
if not match:
raise ValueError("VERSION not found!")
version_number = match[1]
left_nav_title = f"Composable Kernel {version_number} Documentation"
# for PDF output on Read the Docs
project = "Composable Kernel Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs(f"{name} Documentation")
docs_core.run_doxygen(doxygen_root="doxygen", doxygen_path="doxygen/docBin/xml")
docs_core = ROCmDocs(left_nav_title)
docs_core.run_doxygen(doxygen_root="doxygen", doxygen_path="doxygen/xml")
docs_core.setup()
external_projects_current_project = "composable_kernel"
mathjax3_config = {
'tex': {
'macros': {
......
......@@ -58,7 +58,7 @@ PROJECT_LOGO =
# entered, it will be relative to the location where doxygen was started. If
# left blank the current directory will be used.
OUTPUT_DIRECTORY = docBin
OUTPUT_DIRECTORY = .
# If the CREATE_SUBDIRS tag is set to YES then doxygen will create 4096 sub-
# directories (in 2 levels) under the output directory of each output format and
......@@ -778,7 +778,9 @@ WARN_LOGFILE =
INPUT = ../../include/ck/tensor_operation/gpu/grid \
../../include/ck/tensor_operation/gpu/block \
../../include/ck/tensor_operation/gpu/thread \
../../library/include/ck/library/utility
../../library/include/ck/library/utility \
../../include/ck/wrapper
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
......
......@@ -34,6 +34,7 @@ Current CK library are structured into 4 layers:
* "Templated Tile Operators" layer
* "Templated Kernel and Invoker" layer
* "Instantiated Kernel and Invoker" layer
* "Wrapper for tensor transform operations"
* "Client API" layer
.. image:: data/ck_layer.png
......@@ -50,6 +51,7 @@ The following is a list of CK documents in the suggested reading order:
tutorial_hello_world
dockerhub
wrapper
Supported_Primitives_Guide
API_Reference_Guide
Contributors_Guide
......@@ -5,6 +5,6 @@ defaults:
maxdepth: 6
root: index
subtrees:
- caption: About
- caption: About
entries:
- file: license
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment