Initial

Commit ca34d4d2 authored Jun 02, 2026 by yanjl1
20 changed files
--- a/README.md
+++ b/README.md
+# hygon_samples
+
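+This project provides usage examples for the hipDNN (HIP Deep Neural Network) frontend API. They cover common deep learning operators, fused operators, and PyTorch integration on Hygon DCU (Deep Computing Unit) hardware.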
+
+## Requirements
+
+- **DTK version**: ≥ 25.04.2 (26.04 recommended)
+- **Supported architectures**: `gfx906`, `gfx926`, `gfx928`, `gfx936`, `gfx938`, `gfx92a`
+- **Dependencies**: `hipdnn` (Python/C++), `hip::host`, `hipdnn_frontend`, `PyTorch` (Python samples)
+
+Load the DTK environment before building or running any sample:
+
+```bash
+source /data/dtk-26.04/env.sh
+```
+
+> If it is not loaded, configuring the C++ build fails with `Must be source dtk/env.sh`, because `ROCM_PATH` is not set.
+
+## Directory layout
+
+```
+.
+├── cpp/          # C++ samples (hipDNN Frontend C++ API)
+│   ├── CMakeLists.txt
+│   ├── utils.hpp              # Error-checking macros (HIP_CHECK / HIPDNN_CHECK / HIPDNN_FE_CHECK)
+│   ├── build/                 # Build output directory
+│   ├── convolution/           # Convolution forward / backward data / backward weight
+│   ├── conv_fusion/           # Convolution fusions: bias + ReLU/Swish/PReLU/Add, etc.
+│   ├── conv_depthtospace_fusion/   # Convolution + DepthToSpace fusion
+│   ├── concat_conv_fusion/    # Concat + convolution fusion
+│   ├── matmul/                # Matrix multiplication
+│   ├── matmul_fusion/         # MatMul + bias + activation
+│   ├── batchnorm/             # BatchNorm inference / training / backward
+│   ├── layernorm/             # LayerNorm
+│   ├── groupnorm/             # GroupNorm
+│   ├── instancenorm/          # InstanceNorm
+│   ├── rmsnorm/               # RMSNorm
+│   ├── sdpa/                  # Scaled Dot-Product Attention
+│   ├── rope/                  # RoPE (rotary position embedding)
+│   ├── deformconvolution/     # Deformable convolution
+│   ├── deformattention/       # Deformable attention
+│   ├── adamw/                 # AdamW optimizer
+│   ├── softmax/               # Softmax
+│   ├── reduction/             # Reduce / Pointwise+Reduce
+│   ├── transpose/             # Transpose
+│   ├── pointwise/             # Element-wise binary operations
+│   ├── ctc_loss/              # CTC Loss
+│   ├── kthvalue/              # Top-K / KthValue
+│   ├── multi_margin_loss/     # MultiMarginLoss
+│   ├── soft_margin_loss/      # SoftMarginLoss
+│   ├── block_scale/           # Block quantization / dequantization
+│   └── ...
+├── python/       # Python samples (hipdnn Python API + PyTorch)
+│   ├── convolution/
+│   ├── conv_fusion/
+│   ├── matmul/
+│   ├── sdpa/
+│   ├── batchnorm/
+│   ├── layernorm/
+│   ├── groupnorm/
+│   ├── adamw/
+│   ├── torch_wrapper/         # PyTorch module wrappers (e.g. TorchPReLU)
+│   └── ...
+└── CLAUDE.md     # Development guide for this project
+```
+
+## Building the C++ samples
+
+```bash
+mkdir -p cpp/build
+cd cpp/build
+cmake -G Ninja ..
+ninja
+```
+
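+After the build, the executables are in `cpp/build/bin/` (set by `CMAKE_RUNTIME_OUTPUT_DIRECTORY`). To build a single sample, pass its target name to `ninja`: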
+
+```bash
+ninja conv_forward
+ninja sdpa_inference
+```
+
+> Some samples are commented out in `CMakeLists.txt` (e.g. `bn_finalize`, `block_scale_quantize`, `slice`, `rng`). To enable one, uncomment its `add_hipdnn_sample(...)` line.
+
+## Running the samples
+
+**C++ samples:**
+
+```bash
+./cpp/build/bin/conv_forward
+./cpp/build/bin/softmax
+./cpp/build/bin/sdpa_inference
+```
+
+**Python samples:**
+
+```bash
+cd python/softmax
+python softmax.py
+```
+
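+The Python samples use `import hipdnn` and `import torch`, and their tensors must be created with `device="cuda"` (in PyTorch builds for DTK, the `cuda` device refers to the DCU).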
+
+Before running them, install the hipdnn Python wheel (with the DTK environment loaded):
+
+```bash
+pip install ${ROCM_PATH}/share/hipdnn/wheels/hipdnn-*.whl
+```
+
+## Samples by category
+
+| Category | C++ path | Python path | Description |
+|------|----------|-------------|------|
+| Convolution | `convolution/`, `conv_fusion/`, `conv_depthtospace_fusion/`, `concat_conv_fusion/` | `convolution/`, `conv_fusion/`, `conv_depthtospace_fusion/`, `concat_conv_fusion/` | Forward, backward data, weight gradient; fusions with bias/activation (ReLU/Swish/PReLU), INT8, DepthToSpace |
+| Matrix multiplication | `matmul/`, `matmul_fusion/` | `matmul/`, `matmul_fusion/` | MatMul, MatMul + bias + activation |
+| Normalization | `batchnorm/`, `layernorm/`, `groupnorm/`, `instancenorm/`, `rmsnorm/` | `batchnorm/`, `layernorm/`, `groupnorm/`, `instancenorm/`, `rmsnorm/` | Inference, training, backward |
+| Attention | `sdpa/`, `rope/`, `deformattention/` | `sdpa/`, `rope/`, `deformattention/` | SDPA, RoPE, deformable attention |
+| Optimizers | `adamw/` | `adamw/` | AdamW, Transformer-style AdamW |
+| Fused operators | `fusion/`, `conv_bn_fusion/` | `fusion/`, `conv_bn_fusion/` | add + layernorm, groupnorm + swish, pointwise + conv + genstats, scale/bias fusion |
+| Quantization | `block_scale/`, `conv_fusion/Int8*` | `block_scale/`, `conv_fusion/convint8_*` | INT8 convolution, block quantization/dequantization |
+| PyTorch wrappers | n/a | `torch_wrapper/` | Module-level wrappers such as `hipdnn.TorchPReLU()` |
+| Other | `softmax/`, `reduction/`, `transpose/`, `pointwise/`, `ctc_loss/`, `kthvalue/`, etc. | `softmax/`, `reduction/`, `transpose/`, `pointwise/`, `ctc_loss/`, `kthvalue/`, etc. | Common operators and loss functions |
+
+## Quick start
+
+1. Load the DTK environment:
+   ```bash
+   source /data/dtk-26.04/env.sh
+   ```
+
+2. Build and run a C++ sample:
+   ```bash
+   mkdir -p cpp/build && cd cpp/build && cmake -G Ninja .. && ninja
+   ./bin/conv_forward
+   ```
+
+3. Run a Python sample:
+   ```bash
+   cd python/softmax
+   python softmax.py
+   ```
+
+## Troubleshooting
+
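+| Symptom | Cause | Fix |
+|------|------|----------|
+| `Must be source dtk/env.sh` | `ROCM_PATH` is not set | Run `source /data/dtk-26.04/env.sh` first |
+| `hipdnn` module not found | hipDNN is not available in the current Python environment | Make sure the DTK environment is loaded and the hipdnn wheel is installed, or that `hipdnn` is on `PYTHONPATH` |
+| CMake cannot find `hipdnn_frontend` | hipDNN is not installed, or the environment is not loaded | Check that `${ROCM_PATH}/lib/cmake/hipdnn/` exists |
+| CUDA-related errors | PyTorch tensors are not on the GPU (DCU) | Create tensors with `device="cuda"` |
+| Compiler warnings treated as errors | `CMakeLists.txt` compiles with `-Werror` | Fix the warnings in the code, or temporarily remove `-Werror` from `HIPDNN_WARNING_COMPILE_OPTIONS` in `CMakeLists.txt` |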
+
+## Data types and layouts
+
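+- **Default data type**: `float` (C++) / `torch.float32` (Python). Some samples use FP16 instead, e.g. the C++ AdamW and BatchNorm backward samples.
+- **FP16**: `hipdnn_data_sdk::types::half` / `torch.float16`.
+- **INT8**: `int8_t` / `torch.int8`, stored in the blocked **NCHWc32** layout (`vector_count=32`). In the INT8 samples, quantization and dequantization are expressed as explicit SUB/MUL/DIV/ADD pointwise nodes.
+- **Layout**: NCHW by default. Some samples use channels-last (NHWC) strides instead, e.g. some convolution samples (`torch.channels_last` in Python) and the C++ BatchNorm backward samples (strides `{c * h * w, 1, c * w, c}`). INT8 uses NCHWc32.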
+
+## License
+
+Source files are licensed under the MIT License (SPDX-License-Identifier: MIT).
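The layout and INT8 notes in the README's data-type section can be made concrete with a few host-side helpers. This is a sketch under stated assumptions, not hipDNN code: dims are passed in (N, C, H, W) order as in the samples' `set_dim`/`set_stride` calls, NCHWc32 is assumed to be stored as `[N][ceil(C/32)][H][W][32]` with channels padded to a multiple of 32, and the quantize/dequantize arithmetic (round to nearest, saturate to `[-128, 127]`) is an assumption about how the samples' DIV/ADD and SUB/MUL chains behave. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Dims are given in (N, C, H, W) order, as in the samples' set_dim() calls.
using Strides4 = std::array<std::int64_t, 4>;

// Packed NCHW: W varies fastest.
Strides4 nchwStrides(std::int64_t /*n*/, std::int64_t c, std::int64_t h, std::int64_t w)
{
    return {c * h * w, h * w, w, 1};
}

// Channels-last (NHWC) with dims still listed as N, C, H, W: C varies fastest.
Strides4 nhwcStrides(std::int64_t /*n*/, std::int64_t c, std::int64_t h, std::int64_t w)
{
    return {c * h * w, 1, w * c, c};
}

// Offset of element (n, c, h, w) in an NCHWc32 buffer stored as
// [N][ceil(C / 32)][H][W][32] (assumed layout; padded channels are never read here).
std::int64_t nchwc32Offset(std::int64_t n, std::int64_t c, std::int64_t h, std::int64_t w,
                           std::int64_t channels, std::int64_t height, std::int64_t width)
{
    assert(c >= 0 && c < channels);
    const std::int64_t vec = 32;
    const std::int64_t channelBlocks = (channels + vec - 1) / vec;
    return (((n * channelBlocks + c / vec) * height + h) * width + w) * vec + c % vec;
}

// Affine INT8 quantization: DIV by scale, ADD zero point, then round and saturate.
std::int8_t quantize(float x, float scale, std::int32_t zeroPoint)
{
    const long q = std::lround(x / scale) + zeroPoint;
    const long clamped = q < -128 ? -128 : (q > 127 ? 127 : q);
    return static_cast<std::int8_t>(clamped);
}

// Dequantization: SUB zero point, then MUL by scale.
float dequantize(std::int8_t q, float scale, std::int32_t zeroPoint)
{
    return static_cast<float>(q - zeroPoint) * scale;
}
```

For the 16x16x16x16 BatchNorm backward tensors, `nhwcStrides(16, 16, 16, 16)` yields `{4096, 1, 256, 16}`, the same values as the samples' `{c * h * w, 1, c * w, c}`.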
--- a/cpp/.omc/sessions/eafc2bdd-f5a2-481a-9368-0fc8c1ceb3a0.json
+++ b/cpp/.omc/sessions/eafc2bdd-f5a2-481a-9368-0fc8c1ceb3a0.json
+{
+  "session_id": "eafc2bdd-f5a2-481a-9368-0fc8c1ceb3a0",
+  "ended_at": "2026-06-02T10:11:57.127Z",
+  "reason": "prompt_input_exit",
+  "agents_spawned": 0,
+  "agents_completed": 0,
+  "modes_used": []
+}
\ No newline at end of file
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
+# Copyright © Advanced Micro Devices, Inc., or its affiliates.
+# SPDX-License-Identifier:  MIT
+
+cmake_minimum_required(VERSION 3.25.2)
+
+# Enable PIC/PIE to ensure compatibility with the plugin loader system (dlopen). This prevents
+# potential Thread Local Storage (TLS) model mismatches between the executable and dynamically
+# loaded backend plugins.
+set(CMAKE_POSITION_INDEPENDENT_CODE ON)
+
+add_compile_definitions(__HIP_PLATFORM_AMD__)
+
+if(DEFINED ENV{ROCM_PATH})
+    set(ROCM_PATH "$ENV{ROCM_PATH}")
+else()
+    message(FATAL_ERROR "Must be source dtk/env.sh")
+endif()
+
+project(hipdnn_samples VERSION 0.1.0 LANGUAGES C CXX)
+include(GNUInstallDirs)
+set(CMAKE_CXX_STANDARD 17)
+
+find_package(hip REQUIRED)
+find_package(Threads REQUIRED)
+
+if(NOT TARGET hipdnn_frontend)
+    find_package(hipdnn_frontend CONFIG REQUIRED)
+endif()
+
+include_directories(${CMAKE_CURRENT_LIST_DIR})
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
+
+list(PREPEND HIPDNN_WARNING_COMPILE_OPTIONS
+    -Werror                                  # Treat all warnings as errors
+    -Wall                                    # Enable most common warnings
+    -Wextra                                  # Enable additional warnings not covered by -Wall
+    -Wpedantic                               # Enforce strict ISO C++ compliance
+    -Wshadow                                 # Warn about variable shadowing
+    -Wnon-virtual-dtor                       # Warn if a class with virtual functions has a non-virtual destructor
+    -Wold-style-cast                         # Warn about C-style casts
+    -Wcast-align                             # Warn about potential performance issues with misaligned casts
+    -Woverloaded-virtual                     # Warn if a base class function is hidden by a derived class function with the same name
+    -Wconversion                             # Warn about implicit type conversions that may alter a value
+    -Wsign-conversion                        # Warn about implicit conversions between signed and unsigned types
+    -Wnull-dereference                       # Warn about dereferencing null pointers
+    -Wdouble-promotion                       # Warn when a float is implicitly promoted to a double
+    -Wformat=2                               # Enable stricter format string checks
+    -Winit-self                              # Warn about variables initialized with itself
+    -Wunreachable-code                       # Warn about unreachable code
+    -Wno-return-type                         # Suppress -Wreturn-type; required to build with DTK 25.04.2
+    -Wswitch-default                         # Warn if a switch statement does not have a default case
+)
+
+function(add_hipdnn_sample NAME SOURCE)
+    add_executable(${NAME} ${SOURCE})
+    target_compile_options(${NAME} PRIVATE ${HIPDNN_WARNING_COMPILE_OPTIONS})
+    target_link_libraries(${NAME} PRIVATE hip::host Threads::Threads hipdnn_frontend)
+endfunction()
+
+add_hipdnn_sample(bn_inference batchnorm/BnInference.cpp)
+#add_hipdnn_sample(bn_finalize batchnorm/BnFinalize.cpp)
+add_hipdnn_sample(bn_training batchnorm/BnTraining.cpp)
+add_hipdnn_sample(bn_backward batchnorm/BnBackward.cpp)
+#add_hipdnn_sample(bn_backward_weight batchnorm/BnBackwardWeight.cpp)
+add_hipdnn_sample(conv_forward convolution/ConvForward.cpp)
+add_hipdnn_sample(conv_backward convolution/ConvBackward.cpp)
+add_hipdnn_sample(conv_wrw convolution/ConvBackwardWeight.cpp)
+add_hipdnn_sample(conv_bias_prelu conv_fusion/ConvBiasPrelu.cpp)
+add_hipdnn_sample(conv_bias_prelu_add conv_fusion/ConvBiasPreluAdd.cpp)
+add_hipdnn_sample(conv_bias_swish_add conv_fusion/ConvBiasSwishAdd.cpp)
+add_hipdnn_sample(conv_bias_swish conv_fusion/ConvBiasSwish.cpp)
+add_hipdnn_sample(conv_bias_relu conv_fusion/ConvBiasRelu.cpp)
+add_hipdnn_sample(conv_bias_add conv_fusion/ConvBiasAdd.cpp)
+add_hipdnn_sample(conv_bias conv_fusion/ConvBias.cpp)
+add_hipdnn_sample(conv_bias_add_relu conv_fusion/ConvBiasAddRelu.cpp)
+add_hipdnn_sample(convbwd_bias_relu conv_fusion/ConvbwdBiasRelu.cpp)
+add_hipdnn_sample(convint8_bias conv_fusion/Int8ConvBias.cpp)
+add_hipdnn_sample(convint8_bias_add conv_fusion/Int8ConvBiasAdd.cpp)
+add_hipdnn_sample(convint8_bias_add_relu conv_fusion/Int8ConvBiasAddRelu.cpp)
+add_hipdnn_sample(convint8_bias_relu conv_fusion/Int8ConvBiasRelu.cpp)
+add_hipdnn_sample(convint8_bias_relu_add conv_fusion/Int8ConvBiasReluAdd.cpp)
+add_hipdnn_sample(convfp16_bias_relu conv_fusion/Fp16ConvBiasRelu.cpp)
+add_hipdnn_sample(ln_inference layernorm/LnInference.cpp)
+#add_hipdnn_sample(ln_backward layernorm/LnBackward.cpp)
+add_hipdnn_sample(rms_forward rmsnorm/RmsnormForward.cpp)
+add_hipdnn_sample(deform_conv_fprop deformconvolution/DeformConvForward.cpp)
+add_hipdnn_sample(deform_conv_dgrad deformconvolution/DeformConvBackward.cpp)
+add_hipdnn_sample(deform_conv_wgrad deformconvolution/DeformConvBackwardWeight.cpp)
+add_hipdnn_sample(gn_training groupnorm/GNTraining.cpp)
+add_hipdnn_sample(gn_inference groupnorm/GNInference.cpp)
+add_hipdnn_sample(gn_backward groupnorm/GNBackward.cpp)
+add_hipdnn_sample(add_layernorm fusion/AddLayernorm.cpp)
+add_hipdnn_sample(gn_swish fusion/GroupnormSwish.cpp)
+add_hipdnn_sample(sdpa_inference sdpa/SDPAInference.cpp)
+add_hipdnn_sample(reduction reduction/Reduction.cpp)
+add_hipdnn_sample(reluBwd_reduction reduction/PointwiseReduction.cpp)
+add_hipdnn_sample(transpose transpose/Transpose.cpp)
+#add_hipdnn_sample(genstats genstats/Genstats.cpp)
+add_hipdnn_sample(reshape_transpose fusion/ReshapeTranspose.cpp)
+#add_hipdnn_sample(resample resample/Resample.cpp)
+add_hipdnn_sample(deform_attn_fprop deformattention/DeformAttnForward.cpp)
+add_hipdnn_sample(deform_attn_dgrad deformattention/DeformAttnBackward.cpp)
+add_hipdnn_sample(instancenorm_inference instancenorm/InstancenormInference.cpp)
+add_hipdnn_sample(instancenorm_backward instancenorm/InstancenormBackward.cpp)
+add_hipdnn_sample(instancenorm_training instancenorm/InstancenormTraining.cpp)
+#add_hipdnn_sample(block_scale_dequantize block_scale/BlockScaleDequantize.cpp)
+#add_hipdnn_sample(block_scale_quantize block_scale/BlockScaleQuantize.cpp)
+#add_hipdnn_sample(slice slice/Slice.cpp)
+#add_hipdnn_sample(rng rng/Rng.cpp)
+add_hipdnn_sample(adamw adamw/Adamw.cpp)
+add_hipdnn_sample(transformer_adamw adamw/TransformerAdamw.cpp)
+add_hipdnn_sample(concatenate concatenate/Concatenate.cpp)
+# add_hipdnn_sample(pw_conv_genstats fusion/PointwiseConvGenstats.cpp)
+add_hipdnn_sample(concat_conv concat_conv_fusion/ConcatConv.cpp)
+add_hipdnn_sample(concat_conv_bias concat_conv_fusion/ConcatConvBias.cpp)
+add_hipdnn_sample(concat_conv_bias_add concat_conv_fusion/ConcatConvBiasAdd.cpp)
+add_hipdnn_sample(concat_conv_bias_leakyRelu concat_conv_fusion/ConcatConvBiasLeakyRelu.cpp)
+add_hipdnn_sample(concat_conv_bias_leakyRelu_add concat_conv_fusion/ConcatConvBiasLeakyReluAdd.cpp)
+add_hipdnn_sample(conv_bias_depthToSpace conv_depthtospace_fusion/ConvBiasDepthToSpace.cpp)
+add_hipdnn_sample(conv_bias_depthToSpace_add conv_depthtospace_fusion/ConvBiasDepthToSpaceAdd.cpp)
+add_hipdnn_sample(conv_bias_add_depthToSpace conv_depthtospace_fusion/ConvBiasAddDepthToSpace.cpp)
+add_hipdnn_sample(conv_bias_depthToSpace_clippedRelu conv_depthtospace_fusion/ConvBiasDepthToSpaceClippedRelu.cpp)
+add_hipdnn_sample(conv_bias_depthToSpace_clippedRelu_add conv_depthtospace_fusion/ConvBiasDepthToSpaceClippedReluAdd.cpp)
+add_hipdnn_sample(conv_depthToSpace conv_depthtospace_fusion/ConvDepthToSpace.cpp)
+add_hipdnn_sample(matmul matmul/Matmul.cpp)
+add_hipdnn_sample(matmul_bias matmul_fusion/MatmulBias.cpp)
+add_hipdnn_sample(matmul_bias_swish matmul_fusion/MatmulBiasSwish.cpp)
+add_hipdnn_sample(rope_forward rope/RopeForward.cpp)
+add_hipdnn_sample(rope_backward rope/RopeBackward.cpp)
+add_hipdnn_sample(pointwise_binary pointwise/BinaryPointwise.cpp)
+add_hipdnn_sample(softmax softmax/Softmax.cpp)
+add_hipdnn_sample(ctc_loss ctc_loss/CtcLoss.cpp)
+add_hipdnn_sample(kthvalue2d kthvalue/Kthvalue2D.cpp)
+add_hipdnn_sample(kthvalue4d kthvalue/Kthvalue4D.cpp)
+add_hipdnn_sample(multi_margin_loss multi_margin_loss/MultiMarginLoss.cpp)
+add_hipdnn_sample(soft_margin_loss soft_margin_loss/SoftMarginLossForward.cpp)
+add_hipdnn_sample(soft_margin_loss_backward soft_margin_loss/SoftMarginLossBackward.cpp)
+add_hipdnn_sample(getitem_indices_backward getitem_backward/GetitemBackwardIndices.cpp)
+add_hipdnn_sample(getitem_slice_backward getitem_backward/GetitemBackwardSlice.cpp)
+add_hipdnn_sample(scale_bias_relu_conv_genstats conv_bn_fusion/ScaleBiasReluConvGenstats.cpp)
+add_hipdnn_sample(scale_bias_relu_convwrw conv_bn_fusion/ScaleBiasReluConvwrw.cpp)
+add_hipdnn_sample(mul_mul_add_add conv_bn_fusion/MulMulAddAdd.cpp)
+add_hipdnn_sample(sub_mul_mul_add_convbwd_relubwd_bnwrw conv_bn_fusion/SubMulMulAddConvbwdRelubwdBnwrw.cpp)
+add_hipdnn_sample(conv_genstats conv_bn_fusion/ConvGenstats.cpp)
+add_hipdnn_sample(scale_bias conv_bn_fusion/ScaleBias.cpp)
\ No newline at end of file
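Because every sample is compiled with `-Werror` together with `-Wconversion`, `-Wsign-conversion`, `-Wold-style-cast`, `-Wdouble-promotion`, and `-Wswitch-default`, code added to a sample has to make conversions explicit; this is why the samples write `static_cast<size_t>(workspaceSize)`. The sketch below shows a few warning-clean patterns; the helper names and the `Layout` enum are hypothetical illustrations, not part of the samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// -Wsign-conversion / -Wold-style-cast: a signed size (such as the int64_t
// filled in by get_workspace_size()) needs an explicit static_cast; both
// `std::size_t s = bytes;` and `(std::size_t)bytes` would fail under -Werror.
std::size_t workspaceBytes(std::int64_t bytes)
{
    assert(bytes >= 0 && "workspace size must be non-negative");
    return static_cast<std::size_t>(bytes);
}

// -Wdouble-promotion: `x * 0.5` would promote x to double; a float literal
// keeps the arithmetic in float.
float halve(float x)
{
    return x * 0.5f;
}

// -Wswitch-default: every switch needs a default label, even when all
// enumerators are handled.
enum class Layout
{
    Nchw,
    Nhwc
};

const char* layoutName(Layout layout)
{
    switch(layout)
    {
    case Layout::Nchw: return "NCHW";
    case Layout::Nhwc: return "NHWC";
    default: return "unknown";
    }
}
```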
--- a/cpp/adamw/Adamw.cpp
+++ b/cpp/adamw/Adamw.cpp
+#include <cstdint>
+#include <iostream>
+#include <memory>
+#include <tuple>
+#include <unordered_map>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // Tensor dimensions
+    const int64_t n = 2; // Batch size
+    const int64_t c = 3; // Number of channels
+    const int64_t h = 4; // Height
+    const int64_t w = 5; // Width
+
+    auto buildAdamwGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        graph->set_name("adamw_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT); // FP16 I/O, FP32 compute
+
+        auto params = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("params")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto grads = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("grads")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto expAvgs = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("exp_avgs")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto expAvgSqs = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("exp_avg_sqs")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto maxExpAvgSqs = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("max_exp_avg_sqs")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto adamwAttributes = hipdnn_frontend::graph::AdamwAttributes()
+                                   .set_name("adamw_node")
+                                   .set_transformeradamw(false)
+                                   .set_max_exp_avg_sqs(maxExpAvgSqs);
+
+        graph->adamw(params, grads, expAvgs, expAvgSqs, adamwAttributes);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, params, grads, expAvgs, expAvgSqs, maxExpAvgSqs);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, params, grads, expAvgs, expAvgSqs, maxExpAvgSqs] = buildAdamwGraph(handle);
+
+    // Allocate DCU memory
+    hipdnn_data_sdk::utilities::Tensor<InputType> paramsTensor(params->get_dim(),
+                                                               params->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> gradsTensor(grads->get_dim(),
+                                                              grads->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> expAvgsTensor(expAvgs->get_dim(),
+                                                                expAvgs->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> expAvgSqsTensor(expAvgSqs->get_dim(),
+                                                                  expAvgSqs->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> maxExpAvgSqsTensor(maxExpAvgSqs->get_dim(),
+                                                                     maxExpAvgSqs->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[params->get_uid()] = paramsTensor.memory().deviceData();
+    variantPack[grads->get_uid()] = gradsTensor.memory().deviceData();
+    variantPack[expAvgs->get_uid()] = expAvgsTensor.memory().deviceData();
+    variantPack[expAvgSqs->get_uid()] = expAvgSqsTensor.memory().deviceData();
+    variantPack[maxExpAvgSqs->get_uid()] = maxExpAvgSqsTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Adamw graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
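The AdamW sample does not set a learning rate or decay values explicitly, so checking its output needs a host-side reference. The sketch below follows the `torch.optim.AdamW` update rule (decoupled weight decay, bias correction), plus the AMSGrad variant that a `max_exp_avg_sqs` tensor usually signals. Whether hipDNN's `adamw` node computes exactly this, and which defaults it uses, are assumptions; `AdamwHyper` and `adamwStep` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hyperparameters are illustrative assumptions; hipDNN's defaults may differ.
struct AdamwHyper
{
    float lr = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float eps = 1e-8f;
    float weightDecay = 1e-2f;
};

// One AdamW step in the torch.optim.AdamW formulation. If maxExpAvgSqs is
// non-null (AMSGrad), the running maximum of exp_avg_sq is used in the denominator.
void adamwStep(std::vector<float>& params, const std::vector<float>& grads,
               std::vector<float>& expAvgs, std::vector<float>& expAvgSqs,
               std::vector<float>* maxExpAvgSqs, const AdamwHyper& hp, int step)
{
    assert(grads.size() == params.size() && expAvgs.size() == params.size()
           && expAvgSqs.size() == params.size());
    const float biasCorrection1 = 1.0f - std::pow(hp.beta1, static_cast<float>(step));
    const float biasCorrection2 = 1.0f - std::pow(hp.beta2, static_cast<float>(step));
    const float stepSize = hp.lr / biasCorrection1;
    const float bc2Sqrt = std::sqrt(biasCorrection2);

    for(std::size_t i = 0; i < params.size(); ++i)
    {
        params[i] *= 1.0f - hp.lr * hp.weightDecay; // decoupled weight decay
        expAvgs[i] = hp.beta1 * expAvgs[i] + (1.0f - hp.beta1) * grads[i]; // first moment
        expAvgSqs[i] = hp.beta2 * expAvgSqs[i] + (1.0f - hp.beta2) * grads[i] * grads[i];
        float secondMoment = expAvgSqs[i];
        if(maxExpAvgSqs != nullptr)
        {
            (*maxExpAvgSqs)[i] = std::max((*maxExpAvgSqs)[i], expAvgSqs[i]);
            secondMoment = (*maxExpAvgSqs)[i];
        }
        const float denom = std::sqrt(secondMoment) / bc2Sqrt + hp.eps;
        params[i] -= stepSize * expAvgs[i] / denom;
    }
}
```

With `lr = 0.1`, `weightDecay = 0.01`, a parameter of 1, a gradient of 0.5, and zeroed moments, the first step moves the parameter to about 0.899: decay gives 0.999, and the bias-corrected update is 0.05 / 0.5 = 0.1.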
--- a/cpp/adamw/TransformerAdamw.cpp
+++ b/cpp/adamw/TransformerAdamw.cpp
+#include <cstdint>
+#include <iostream>
+#include <memory>
+#include <tuple>
+#include <unordered_map>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // Tensor dimensions
+    const int64_t n = 2; // Batch size
+    const int64_t c = 3; // Number of channels
+    const int64_t h = 4; // Height
+    const int64_t w = 5; // Width
+
+    auto buildTransformerAdamwGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        graph->set_name("transformer_adamw_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT); // FP16 I/O, FP32 compute
+
+        auto params = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("params")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto grads = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("grads")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto expAvgs = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("exp_avgs")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto expAvgSqs = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("exp_avg_sqs")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1}));
+
+        auto adamwAttributes = hipdnn_frontend::graph::AdamwAttributes()
+                                   .set_name("transformer_adamw_node")
+                                   .set_correct_bias(false)
+                                   .set_transformeradamw(true);
+
+        graph->adamw(params, grads, expAvgs, expAvgSqs, adamwAttributes);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, params, grads, expAvgs, expAvgSqs);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, params, grads, expAvgs, expAvgSqs] = buildTransformerAdamwGraph(handle);
+
+    // Allocate DCU memory
+    hipdnn_data_sdk::utilities::Tensor<InputType> paramsTensor(params->get_dim(),
+                                                               params->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> gradsTensor(grads->get_dim(),
+                                                              grads->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> expAvgsTensor(expAvgs->get_dim(),
+                                                                expAvgs->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> expAvgSqsTensor(expAvgSqs->get_dim(),
+                                                                  expAvgSqs->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[params->get_uid()] = paramsTensor.memory().deviceData();
+    variantPack[grads->get_uid()] = gradsTensor.memory().deviceData();
+    variantPack[expAvgs->get_uid()] = expAvgsTensor.memory().deviceData();
+    variantPack[expAvgSqs->get_uid()] = expAvgSqsTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "TransformerAdamw graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
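`set_transformeradamw(true)` combined with `set_correct_bias(false)` resembles the Hugging Face `transformers` flavor of AdamW, where bias correction is optional and folded into the step size, and weight decay is applied to the parameters after the Adam update. The sketch below implements that formulation on the host, assuming (the source does not say) that this is what hipDNN's transformer mode computes; `TransformerAdamwHyper`, `transformerAdamwStep`, and the default values are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative hyperparameters; correctBias is the assumed counterpart of set_correct_bias().
struct TransformerAdamwHyper
{
    float lr = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float eps = 1e-6f;
    float weightDecay = 0.0f;
    bool correctBias = true;
};

// One step of the transformers-style AdamW update.
void transformerAdamwStep(std::vector<float>& params, const std::vector<float>& grads,
                          std::vector<float>& expAvgs, std::vector<float>& expAvgSqs,
                          const TransformerAdamwHyper& hp, int step)
{
    assert(grads.size() == params.size() && expAvgs.size() == params.size()
           && expAvgSqs.size() == params.size());
    float stepSize = hp.lr;
    if(hp.correctBias)
    {
        // Bias correction is folded into the step size instead of the moments.
        const float bc1 = 1.0f - std::pow(hp.beta1, static_cast<float>(step));
        const float bc2 = 1.0f - std::pow(hp.beta2, static_cast<float>(step));
        stepSize = stepSize * std::sqrt(bc2) / bc1;
    }
    for(std::size_t i = 0; i < params.size(); ++i)
    {
        expAvgs[i] = hp.beta1 * expAvgs[i] + (1.0f - hp.beta1) * grads[i];
        expAvgSqs[i] = hp.beta2 * expAvgSqs[i] + (1.0f - hp.beta2) * grads[i] * grads[i];
        const float denom = std::sqrt(expAvgSqs[i]) + hp.eps;
        params[i] -= stepSize * expAvgs[i] / denom;
        params[i] -= params[i] * hp.lr * hp.weightDecay; // decay applied after the update
    }
}
```

The ordering differs from the `torch.optim.AdamW` rule, which decays the parameters before the update and always applies bias correction. Without bias correction, the first step is much larger: with `lr = 0.1`, `eps = 0`, and a gradient of 0.5, the parameter moves by `0.1 * sqrt(10)` (about 0.316) instead of 0.1.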
--- a/cpp/batchnorm/BnBackward.cpp
+++ b/cpp/batchnorm/BnBackward.cpp
+#include <cstdint>
+#include <iostream>
+#include <memory>
+#include <tuple>
+#include <unordered_map>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // Tensor dimensions (listed in NCHW order; the strides below are NHWC)
+    const int64_t n = 16; // Batch size
+    const int64_t c = 16; // Number of channels
+    const int64_t h = 16; // Height
+    const int64_t w = 16; // Width
+
+    auto buildBnBackwardGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("bn_backward_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto dy = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("dy")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto scale = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("scale")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto savedMean = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("save_mean")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto savedInvVariance = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("save_inv_variance")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto bnBwdAttributes = hipdnn_frontend::graph::BatchnormBackwardAttributes()
+                                   .set_name("bn_backward_node")
+                                   .set_saved_mean_and_inv_variance(savedMean, savedInvVariance);
+
+        auto [dx, dscale, dbias] = graph->batchnorm_backward(dy, x, scale, bnBwdAttributes);
+        dx->set_output(true);
+        dscale->set_output(true);
+        dbias->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, dy, x, scale, savedMean, savedInvVariance, dx, dscale, dbias);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, dy, x, scale, savedMean, savedInvVariance, dx, dscale, dbias]
+        = buildBnBackwardGraph(handle);
+
+    // Allocate DCU memory
+    hipdnn_data_sdk::utilities::Tensor<InputType> dyTensor(dy->get_dim(), dy->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> savedMeanTensor(savedMean->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> savedInvVarTensor(savedInvVariance->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> dxTensor(dx->get_dim(), dx->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> dscaleTensor(dscale->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> dbiasTensor(dbias->get_dim());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[dy->get_uid()] = dyTensor.memory().deviceData();
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[savedMean->get_uid()] = savedMeanTensor.memory().deviceData();
+    variantPack[savedInvVariance->get_uid()] = savedInvVarTensor.memory().deviceData();
+    variantPack[dx->get_uid()] = dxTensor.memory().deviceData();
+    variantPack[dscale->get_uid()] = dscaleTensor.memory().deviceData();
+    variantPack[dbias->get_uid()] = dbiasTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Batch normalization backward graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
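To check `bn_backward` results on the host, the standard batch-normalization backward formulas can be written as a small CPU reference. With $\hat{x} = (x - \mu)\cdot\text{invStd}$ and $M = N \cdot H \cdot W$ elements per channel: $\text{dbias} = \sum dy$, $\text{dscale} = \sum dy\,\hat{x}$, and $dx = \text{scale}\cdot\text{invStd}/M \cdot (M\,dy - \text{dbias} - \hat{x}\,\text{dscale})$. This sketch assumes `save_inv_variance` holds $1/\sqrt{\sigma^2 + \epsilon}$ per channel and uses `float` rather than the sample's `half`. Element `i` of the NHWC buffer belongs to channel `i % C`, matching the sample's `{c * h * w, 1, c * w, c}` strides. `BnGrads` and `batchnormBackwardNhwc` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct BnGrads
{
    std::vector<float> dx;
    std::vector<float> dscale;
    std::vector<float> dbias;
};

// Reference BN backward for NHWC-packed data: element i belongs to channel i % C.
BnGrads batchnormBackwardNhwc(const std::vector<float>& dy, const std::vector<float>& x,
                              const std::vector<float>& scale, const std::vector<float>& mean,
                              const std::vector<float>& invStd, std::size_t C)
{
    assert(C > 0 && x.size() % C == 0 && dy.size() == x.size());
    const std::size_t total = x.size();
    const float m = static_cast<float>(total / C); // N * H * W elements per channel

    BnGrads g{std::vector<float>(total), std::vector<float>(C, 0.0f), std::vector<float>(C, 0.0f)};
    // Pass 1: per-channel reductions dbias = sum(dy), dscale = sum(dy * xhat).
    for(std::size_t i = 0; i < total; ++i)
    {
        const std::size_t ch = i % C;
        const float xhat = (x[i] - mean[ch]) * invStd[ch];
        g.dbias[ch] += dy[i];
        g.dscale[ch] += dy[i] * xhat;
    }
    // Pass 2: input gradient.
    for(std::size_t i = 0; i < total; ++i)
    {
        const std::size_t ch = i % C;
        const float xhat = (x[i] - mean[ch]) * invStd[ch];
        g.dx[i] = scale[ch] * invStd[ch] / m * (m * dy[i] - g.dbias[ch] - xhat * g.dscale[ch]);
    }
    return g;
}
```

When the mean and inverse standard deviation come from the batch itself (with $\epsilon = 0$), each channel's `dx` sums to zero and is orthogonal to $\hat{x}$, which gives a quick sanity check on device results.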
--- a/cpp/batchnorm/BnBackwardWeight.cpp
+++ b/cpp/batchnorm/BnBackwardWeight.cpp
+#include <cstdint>
+#include <iostream>
+#include <memory>
+#include <tuple>
+#include <unordered_map>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = float;
+
+    // Tensor dimensions (listed in NCHW order; the strides below are NHWC)
+    const int64_t n = 16; // Batch size
+    const int64_t c = 16; // Number of channels
+    const int64_t h = 16; // Height
+    const int64_t w = 16; // Width
+
+    auto buildBnBackwarWeightdGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("bn_backward_weight_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto dy = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("dy")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto scale = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("scale")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto savedMean = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("save_mean")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto savedInvVariance = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("save_inv_variance")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto bnBwdWeightAttributes
+            = hipdnn_frontend::graph::BatchnormBackwardWeightAttributes().set_name(
+                "bn_backward_weight_node");
+
+        auto [dscale, dbias, eqScaleDy, eqScaleX, eqBias]
+            = graph->dbn_weight(dy, x, savedMean, savedInvVariance, scale, bnBwdWeightAttributes);
+        dscale->set_output(true);
+        dbias->set_output(true);
+        eqScaleDy->set_output(true);
+        eqScaleX->set_output(true);
+        eqBias->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph,
+                               dy,
+                               x,
+                               scale,
+                               savedMean,
+                               savedInvVariance,
+                               dscale,
+                               dbias,
+                               eqScaleDy,
+                               eqScaleX,
+                               eqBias);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph,
+          dy,
+          x,
+          scale,
+          savedMean,
+          savedInvVariance,
+          dscale,
+          dbias,
+          eqScaleDy,
+          eqScaleX,
+          eqBias]
+        = buildBnBackwarWeightdGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> dyTensor(dy->get_dim(), dy->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> savedMeanTensor(savedMean->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> savedInvVarTensor(savedInvVariance->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> dscaleTensor(dscale->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> dbiasTensor(dbias->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> eqScaleDyTensor(eqScaleDy->get_dim(),
+                                                                  eqScaleDy->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> eqScaleXTensor(eqScaleX->get_dim(),
+                                                                 eqScaleX->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> eqBiasTensor(eqBias->get_dim(),
+                                                               eqBias->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+
+    variantPack[dy->get_uid()] = dyTensor.memory().deviceData();
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[savedMean->get_uid()] = savedMeanTensor.memory().deviceData();
+    variantPack[savedInvVariance->get_uid()] = savedInvVarTensor.memory().deviceData();
+    variantPack[dscale->get_uid()] = dscaleTensor.memory().deviceData();
+    variantPack[dbias->get_uid()] = dbiasTensor.memory().deviceData();
+    variantPack[eqScaleDy->get_uid()] = eqScaleDyTensor.memory().deviceData();
+    variantPack[eqScaleX->get_uid()] = eqScaleXTensor.memory().deviceData();
+    variantPack[eqBias->get_uid()] = eqBiasTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Batch normalization backward weight graph execution complete.\n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
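The backward-weight sample binds buffers and executes the graph, but it never fills the inputs or checks the five outputs. For host-side validation, a CPU reference of the same step can help. The sketch below is an assumption, not hipDNN's implementation. It handles one channel at a time, with that channel's M = N*H*W values gathered into contiguous `float` vectors. It also assumes that `eqScaleDy`, `eqScaleX` and `eqBias` are the coefficients of the standard batch-norm input gradient, so that dx = eqScaleDy*dy + eqScaleX*x + eqBias. `BnBwdWeightRef` and `bnBackwardWeightRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel results of a batch-norm backward-weight step (CPU reference).
// Assumption: the eq* fields factor the standard BN input gradient as
//   dx = eqScaleDy * dy + eqScaleX * x + eqBias.
struct BnBwdWeightRef
{
    float dscale;    // sum(dy * xhat)
    float dbias;     // sum(dy)
    float eqScaleDy; // coefficient of dy in dx
    float eqScaleX;  // coefficient of x in dx
    float eqBias;    // constant term in dx
};

// dy and x hold the M = N*H*W values of a single channel.
BnBwdWeightRef bnBackwardWeightRef(const std::vector<float>& dy,
                                   const std::vector<float>& x,
                                   float mean,
                                   float invVar,
                                   float scale)
{
    assert(dy.size() == x.size() && !x.empty());
    const float m = static_cast<float>(x.size());

    BnBwdWeightRef r{0.f, 0.f, 0.f, 0.f, 0.f};
    for(std::size_t i = 0; i < x.size(); ++i)
    {
        const float xhat = (x[i] - mean) * invVar; // normalized input
        r.dscale += dy[i] * xhat;
        r.dbias += dy[i];
    }

    // Standard BN input gradient:
    //   dx = scale * invVar / M * (M * dy - dbias - xhat * dscale)
    // expanded into terms in dy, in x, and a constant.
    r.eqScaleDy = scale * invVar;
    r.eqScaleX = -scale * invVar * invVar * r.dscale / m;
    r.eqBias = scale * invVar * (invVar * mean * r.dscale - r.dbias) / m;
    return r;
}
```

Because the device tensors in the sample use `half` I/O, any comparison against these values needs a tolerance that accounts for half-precision rounding.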
--- a/cpp/batchnorm/BnFinalize.cpp
+++ b/cpp/batchnorm/BnFinalize.cpp
+#include <iostream>
+#include <unordered_map>
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_frontend.hpp>
+
+#include "hipdnn_data_sdk/utilities/Workspace.hpp"
+#include "utils.hpp"
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    const int64_t n = 1; // Batch size
+    // Input
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 1; // Height
+    const int64_t w = 1; // Width
+
+    auto buildBnFinalizeGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("bn_finalize_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto sum = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("sum")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto sqSum = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("sq_sum")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto scale = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("scale")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto prevRunningMean = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("prev_running_mean")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto prevRunningVar = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("prev_running_variance")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto momentum = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("momentum")
+                .set_dim({1, 1, 1, 1})
+                .set_stride({1, 1, 1, 1}));
+
+        auto epsilon = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("epsilon")
+                .set_dim({1, 1, 1, 1})
+                .set_stride({1, 1, 1, 1}));
+
+        auto accumCount = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("accum_count")
+                .set_dim({1, 1, 1, 1})
+                .set_stride({1, 1, 1, 1}));
+
+        epsilon->set_value(1e-5);
+        momentum->set_value(0.001f);
+        accumCount->set_value(static_cast<int32_t>(n * h * w));
+
+        auto bnFinalizeAttributes
+            = hipdnn_frontend::graph::BatchnormFinalizeAttributes()
+                  .set_name("bn_finalize_node")
+                  .set_previous_running_stats(prevRunningMean, prevRunningVar, momentum);
+
+        auto [eqScale, eqBias, mean, invVariance, nextRunningMean, nextRunningVar]
+            = graph->bn_finalize(
+                sum, sqSum, scale, bias, epsilon, accumCount, bnFinalizeAttributes);
+        eqScale->set_output(true);
+        eqBias->set_output(true);
+        mean->set_output(true);
+        invVariance->set_output(true);
+        nextRunningMean->set_output(true);
+        nextRunningVar->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph,
+                               sum,
+                               sqSum,
+                               scale,
+                               bias,
+                               epsilon,
+                               prevRunningMean,
+                               prevRunningVar,
+                               momentum,
+                               accumCount,
+                               eqScale,
+                               eqBias,
+                               mean,
+                               invVariance,
+                               nextRunningMean,
+                               nextRunningVar);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph,
+          sum,
+          sqSum,
+          scale,
+          bias,
+          epsilon,
+          prevRunningMean,
+          prevRunningVar,
+          momentum,
+          accumCount,
+          eqScale,
+          eqBias,
+          mean,
+          invVariance,
+          nextRunningMean,
+          nextRunningVar]
+        = buildBnFinalizeGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> sumTensor(sum->get_dim(), sum->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> sqSumTensor(sqSum->get_dim(),
+                                                              sqSum->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim(),
+                                                              scale->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim(), bias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> prevMeanTensor(prevRunningMean->get_dim(),
+                                                                 prevRunningMean->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> prevVarTensor(prevRunningVar->get_dim(),
+                                                                prevRunningVar->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> momentumTensor(momentum->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> epsilonTensor(epsilon->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> accumCountTensor(accumCount->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> eqScaleTensor(eqScale->get_dim(),
+                                                                eqScale->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> eqBiasTensor(eqBias->get_dim(),
+                                                               eqBias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> nextMeanTensor(nextRunningMean->get_dim(),
+                                                                 nextRunningMean->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> nextVarTensor(nextRunningVar->get_dim(),
+                                                                nextRunningVar->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> meanTensor(mean->get_dim(), mean->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> invVarTensor(invVariance->get_dim(),
+                                                               invVariance->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+
+    variantPack[sum->get_uid()] = sumTensor.memory().deviceData();
+    variantPack[sqSum->get_uid()] = sqSumTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[prevRunningMean->get_uid()] = prevMeanTensor.memory().deviceData();
+    variantPack[prevRunningVar->get_uid()] = prevVarTensor.memory().deviceData();
+    variantPack[momentum->get_uid()] = momentumTensor.memory().deviceData();
+    variantPack[epsilon->get_uid()] = epsilonTensor.memory().deviceData();
+    variantPack[accumCount->get_uid()] = accumCountTensor.memory().deviceData();
+    variantPack[eqScale->get_uid()] = eqScaleTensor.memory().deviceData();
+    variantPack[eqBias->get_uid()] = eqBiasTensor.memory().deviceData();
+    variantPack[nextRunningMean->get_uid()] = nextMeanTensor.memory().deviceData();
+    variantPack[nextRunningVar->get_uid()] = nextVarTensor.memory().deviceData();
+    variantPack[mean->get_uid()] = meanTensor.memory().deviceData();
+    variantPack[invVariance->get_uid()] = invVarTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Batch normalization finalize graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
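BnFinalize turns per-channel partial sums (`sum`, `sq_sum`) and an element count into normalization statistics and updated running statistics. Below is a CPU sketch of that arithmetic. It makes two assumptions about hipDNN's semantics: the variance is biased (E[x^2] - E[x]^2), and the running statistics are updated as next = (1 - momentum) * prev + momentum * batch. Note that PyTorch, for example, uses the unbiased variance when it updates `running_var`. `BnFinalizeRef` and `bnFinalizeRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Per-channel outputs of a batch-norm finalize step (CPU reference).
struct BnFinalizeRef
{
    float mean;
    float invVariance; // 1 / sqrt(variance + epsilon)
    float eqScale;     // y = eqScale * x + eqBias
    float eqBias;
    float nextRunningMean;
    float nextRunningVar;
};

// sum and sqSum are sum(x) and sum(x * x) over accumCount elements of one channel.
// Assumes biased variance and next = (1 - momentum) * prev + momentum * batch.
BnFinalizeRef bnFinalizeRef(float sum, float sqSum, float scale, float bias,
                            float prevRunningMean, float prevRunningVar,
                            float momentum, float epsilon, int accumCount)
{
    assert(accumCount > 0);
    const float count = static_cast<float>(accumCount);

    BnFinalizeRef r{};
    r.mean = sum / count;
    const float variance = sqSum / count - r.mean * r.mean; // E[x^2] - E[x]^2
    r.invVariance = 1.0f / std::sqrt(variance + epsilon);

    // Fold normalization and the affine transform into a single scale and bias.
    r.eqScale = scale * r.invVariance;
    r.eqBias = bias - r.mean * r.eqScale;

    r.nextRunningMean = (1.0f - momentum) * prevRunningMean + momentum * r.mean;
    r.nextRunningVar = (1.0f - momentum) * prevRunningVar + momentum * variance;
    return r;
}
```

The folded `eqScale`/`eqBias` pair lets a later pointwise step apply batch norm as a single multiply-add per element.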
--- a/cpp/batchnorm/BnInference.cpp
+++ b/cpp/batchnorm/BnInference.cpp
+#include <iostream>
+#include <unordered_map>
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_frontend.hpp>
+
+#include "hipdnn_data_sdk/utilities/Workspace.hpp"
+#include "utils.hpp"
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    const int64_t n = 16; // Batch size
+    // Input
+    const int64_t c = 16; // Number of channels
+    const int64_t h = 16; // Height
+    const int64_t w = 16; // Width
+
+    auto buildBnInferenceGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("bn_inference_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto scale = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("scale")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto mean = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("mean")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto variance = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("variance")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto epsilon = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("epsilon")
+                .set_dim({1, 1, 1, 1})
+                .set_stride({1, 1, 1, 1})
+                .set_value(1e-5));
+
+        auto bnInferenceAttributes
+            = hipdnn_frontend::graph::BatchnormInferenceAttributesVarianceExt().set_name(
+                "bn_inference_node");
+
+        auto y = graph->batchnorm_inference_variance_ext(
+            x, mean, variance, scale, bias, epsilon, bnInferenceAttributes);
+        y->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x, scale, bias, mean, variance, epsilon, y);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x, scale, bias, mean, variance, epsilon, y] = buildBnInferenceGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> meanTensor(mean->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> varianceTensor(variance->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> epsilonTensor(epsilon->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[mean->get_uid()] = meanTensor.memory().deviceData();
+    variantPack[variance->get_uid()] = varianceTensor.memory().deviceData();
+    variantPack[epsilon->get_uid()] = epsilonTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Batch normalization inference graph execution complete.\n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
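All of these samples describe NCHW-indexed tensors stored channels-last through the stride `{c * h * w, 1, c * w, c}`. The sketch below shows how those strides address an element, and it applies the inference formula y = scale * (x - mean) / sqrt(variance + epsilon) + bias per channel. It is a host-side reference for checking results, not hipDNN code. `nhwcStrides` and `bnInferenceRef` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Strides for an NCHW-indexed tensor stored channels-last (NHWC),
// matching {c * h * w, 1, c * w, c} used by the samples.
std::vector<int64_t> nhwcStrides(int64_t n, int64_t c, int64_t h, int64_t w)
{
    (void)n; // the batch count does not affect the strides
    return {c * h * w, 1, c * w, c};
}

// CPU reference: y = scale * (x - mean) / sqrt(variance + epsilon) + bias,
// where scale, bias, mean and variance each hold one value per channel.
std::vector<float> bnInferenceRef(const std::vector<float>& x,
                                  int64_t n, int64_t c, int64_t h, int64_t w,
                                  const std::vector<float>& scale,
                                  const std::vector<float>& bias,
                                  const std::vector<float>& mean,
                                  const std::vector<float>& variance,
                                  float epsilon)
{
    assert(static_cast<int64_t>(x.size()) == n * c * h * w);
    const std::vector<int64_t> st = nhwcStrides(n, c, h, w);
    std::vector<float> y(x.size());
    for(int64_t in = 0; in < n; ++in)
        for(int64_t ic = 0; ic < c; ++ic)
            for(int64_t ih = 0; ih < h; ++ih)
                for(int64_t iw = 0; iw < w; ++iw)
                {
                    // Linear offset = dot(index, stride).
                    const int64_t off = in * st[0] + ic * st[1] + ih * st[2] + iw * st[3];
                    const float invStd = 1.0f / std::sqrt(variance[ic] + epsilon);
                    y[off] = scale[ic] * (x[off] - mean[ic]) * invStd + bias[ic];
                }
    return y;
}
```

With stride 1 on the channel dimension, all channels of one pixel are adjacent in memory, which is the defining property of the NHWC layout.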
--- a/cpp/batchnorm/BnTraining.cpp
+++ b/cpp/batchnorm/BnTraining.cpp
+#include <iostream>
+#include <unordered_map>
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_frontend.hpp>
+
+#include "hipdnn_data_sdk/utilities/Workspace.hpp"
+#include "utils.hpp"
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    const int64_t n = 16; // Batch size
+    // Input
+    const int64_t c = 16; // Number of channels
+    const int64_t h = 16; // Height
+    const int64_t w = 16; // Width
+
+    auto buildBnTrainingGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("bn_training_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto scale = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("scale")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto prevRunningMean = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("prev_running_mean")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto prevRunningVar = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("prev_running_variance")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
+        auto momentum = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("momentum")
+                .set_dim({1, 1, 1, 1})
+                .set_stride({1, 1, 1, 1}));
+
+        auto epsilon = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("epsilon")
+                .set_dim({1, 1, 1, 1})
+                .set_stride({1, 1, 1, 1}));
+
+        epsilon->set_value(1e-5);
+        momentum->set_value(0.1);
+
+        auto bnTrainingAttributes
+            = hipdnn_frontend::graph::BatchnormAttributes()
+                  .set_name("bn_training_node")
+                  .set_epsilon(epsilon)
+                  .set_previous_running_stats(prevRunningMean, prevRunningVar, momentum);
+
+        auto [y, savedMean, savedInvVariance, nextRunningMean, nextRunningVar]
+            = graph->batchnorm(x, scale, bias, bnTrainingAttributes);
+        y->set_output(true);
+        nextRunningMean->set_output(true);
+        nextRunningVar->set_output(true);
+        savedMean->set_output(true);
+        savedInvVariance->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph,
+                               x,
+                               scale,
+                               bias,
+                               prevRunningMean,
+                               prevRunningVar,
+                               momentum,
+                               epsilon,
+                               y,
+                               savedMean,
+                               savedInvVariance,
+                               nextRunningMean,
+                               nextRunningVar);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph,
+          x,
+          scale,
+          bias,
+          prevRunningMean,
+          prevRunningVar,
+          momentum,
+          epsilon,
+          y,
+          savedMean,
+          savedInvVariance,
+          nextRunningMean,
+          nextRunningVar]
+        = buildBnTrainingGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> prevMeanTensor(prevRunningMean->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> prevVarTensor(prevRunningVar->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> momentumTensor(momentum->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> epsilonTensor(epsilon->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> savedMeanTensor(savedMean->get_dim());
+    hipdnn_data_sdk::utilities::Tensor<InputType> savedInvVarTensor(savedInvVariance->get_dim());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[prevRunningMean->get_uid()] = prevMeanTensor.memory().deviceData();
+    variantPack[prevRunningVar->get_uid()] = prevVarTensor.memory().deviceData();
+    variantPack[momentum->get_uid()] = momentumTensor.memory().deviceData();
+    variantPack[epsilon->get_uid()] = epsilonTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    // hipDNN describes the running statistics before and after the update as two separate
+    // tensors, whereas MIOpen updates a single buffer in place. To bridge this difference, the
+    // "next" running-statistics tensors are bound to the same device buffers as the "prev"
+    // ones, and the plugin layer passes that single address to MIOpen.
+    variantPack[nextRunningMean->get_uid()] = prevMeanTensor.memory().deviceData();
+    variantPack[nextRunningVar->get_uid()] = prevVarTensor.memory().deviceData();
+    variantPack[savedMean->get_uid()] = savedMeanTensor.memory().deviceData();
+    variantPack[savedInvVariance->get_uid()] = savedInvVarTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Batch normalization training graph execution complete.\n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
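Because BnTraining binds the "prev" and "next" running statistics to the same buffers, the update behaves as an in-place operation. The single-channel CPU sketch below mirrors this by updating `runningMean` and `runningVar` through references. It also produces `y`, the saved mean, and the saved inverse standard deviation. The sketch assumes a biased batch variance and the convention next = (1 - momentum) * prev + momentum * batch; neither is confirmed by the sample. `bnTrainingRef` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-channel CPU reference for batch-norm training forward.
// runningMean / runningVar are updated in place, mirroring the sample, where the
// "prev" and "next" running-stat tensors share one device buffer.
// Assumes biased batch variance and next = (1 - momentum) * prev + momentum * batch.
void bnTrainingRef(const std::vector<float>& x, float scale, float bias, float epsilon,
                   float momentum, float& runningMean, float& runningVar,
                   std::vector<float>& y, float& savedMean, float& savedInvVariance)
{
    assert(!x.empty());
    const float m = static_cast<float>(x.size());

    float sum = 0.0f;
    for(float v : x)
        sum += v;
    savedMean = sum / m;

    float sqDev = 0.0f; // sum of squared deviations from the batch mean
    for(float v : x)
        sqDev += (v - savedMean) * (v - savedMean);
    const float variance = sqDev / m;
    savedInvVariance = 1.0f / std::sqrt(variance + epsilon);

    y.resize(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
        y[i] = scale * (x[i] - savedMean) * savedInvVariance + bias;

    // In-place update: the same storage holds the previous and the next statistics.
    runningMean = (1.0f - momentum) * runningMean + momentum * savedMean;
    runningVar = (1.0f - momentum) * runningVar + momentum * variance;
}
```

The saved mean and inverse standard deviation are exactly the inputs that the backward-weight graph consumes, so they should be kept for the backward pass.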
--- a/cpp/block_scale/BlockScaleDequantize.cpp
+++ b/cpp/block_scale/BlockScaleDequantize.cpp
+#include <iostream>
+#include <unordered_map>
+#include <vector>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    const int64_t n = 2; // Batch size
+    // Input
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 32; // Height
+    const int64_t w = 32; // Width
+    std::vector<int32_t> blockSize = {1, 32};
+    const int64_t scaleW = w / blockSize[1];
+
+    auto buildBlockScaleDequantizeGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("block_scale_dequantize_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto scale = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("scale")
+                .set_dim({n, c, h, scaleW})
+                .set_stride({c * h * scaleW, 1, c * scaleW, c}));
+
+        auto blockScaleDequantizeAttributes
+            = hipdnn_frontend::graph::BlockScaleDequantizeAttributes()
+                  .set_name("block_scale_dequantize_node")
+                  .set_block_size(blockSize)
+                  .set_is_negative_scale(true);
+
+        auto y = graph->block_scale_dequantize(x, scale, blockScaleDequantizeAttributes);
+        y->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x, scale, y);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x, scale, y] = buildBlockScaleDequantizeGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim(),
+                                                              scale->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "\nBlockScaleDequantize graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
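With `blockSize = {1, 32}`, each run of 32 values along W shares a single scale. This is why the scale tensor has width `w / blockSize[1]`. The host-side sketch below illustrates that index mapping in the samples' NHWC layout. It assumes that dequantization is y = x * scale, and it ignores `set_is_negative_scale`, whose semantics the sample does not show. `blockDequantizeRef` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Block-wise dequantization reference over the H and W dims of an NCHW-indexed
// tensor stored channels-last (NHWC). Each (blockH x blockW) tile of one channel
// shares one scale. Assumption: y = x * scale[tile].
std::vector<float> blockDequantizeRef(const std::vector<float>& x,
                                      const std::vector<float>& scale,
                                      int64_t n, int64_t c, int64_t h, int64_t w,
                                      int64_t blockH, int64_t blockW)
{
    assert(h % blockH == 0 && w % blockW == 0);
    const int64_t sh = h / blockH; // scale tensor height
    const int64_t sw = w / blockW; // scale tensor width
    assert(static_cast<int64_t>(x.size()) == n * c * h * w);
    assert(static_cast<int64_t>(scale.size()) == n * c * sh * sw);

    std::vector<float> y(x.size());
    for(int64_t in = 0; in < n; ++in)
        for(int64_t ic = 0; ic < c; ++ic)
            for(int64_t ih = 0; ih < h; ++ih)
                for(int64_t iw = 0; iw < w; ++iw)
                {
                    // NHWC strides: {c*h*w, 1, c*w, c} for x, {c*sh*sw, 1, c*sw, c} for scale.
                    const int64_t xOff = in * c * h * w + ih * c * w + iw * c + ic;
                    const int64_t sOff
                        = in * c * sh * sw + (ih / blockH) * c * sw + (iw / blockW) * c + ic;
                    y[xOff] = x[xOff] * scale[sOff];
                }
    return y;
}
```

In the sample, `blockH = 1` and `blockW = 32`, so every row of 32 pixels in a channel is rescaled by one value.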
--- a/cpp/block_scale/BlockScaleQuantize.cpp
+++ b/cpp/block_scale/BlockScaleQuantize.cpp
+#include <iostream>
+#include <unordered_map>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    const int64_t n = 2; // Batch size
+    // Input
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 32; // Height
+    const int64_t w = 32; // Width
+    const int32_t blockSize = 1;
+
+    auto buildBlockScaleQuantizeGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("block_scale_quantize_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto blockScaleQuantizeAttributes = hipdnn_frontend::graph::BlockScaleQuantizeAttributes()
+                                                .set_name("block_scale_quantize_node")
+                                                .set_block_size(blockSize);
+
+        auto [y, scale] = graph->block_scale_quantize(x, blockScaleQuantizeAttributes);
+        y->set_output(true);
+        scale->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x, y, scale);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x, y, scale] = buildBlockScaleQuantizeGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> scaleTensor(scale->get_dim(),
+                                                              scale->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[scale->get_uid()] = scaleTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "\nBlockScaleQuantize graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
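The quantize sample leaves several details unspecified: which quantized type BlockScaleQuantize emits, how the scales are computed, and how `set_block_size(1)` maps onto tensor dimensions. For comparison only, here is a generic symmetric abs-max block quantization to `int8`. It is a common scheme, but it may differ from hipDNN's. `BlockQuantized` and `blockQuantizeRef` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of a symmetric abs-max block quantization (illustrative only).
struct BlockQuantized
{
    std::vector<std::int8_t> q; // quantized values
    std::vector<float> scale;   // one scale per block
};

// Every run of `blockSize` consecutive values shares scale = max|x| / 127,
// and q = round(x / scale) clamped to [-127, 127].
BlockQuantized blockQuantizeRef(const std::vector<float>& x, std::size_t blockSize)
{
    assert(blockSize > 0 && x.size() % blockSize == 0);
    BlockQuantized out;
    out.q.resize(x.size());
    out.scale.resize(x.size() / blockSize);

    for(std::size_t b = 0; b < out.scale.size(); ++b)
    {
        float amax = 0.0f;
        for(std::size_t i = b * blockSize; i < (b + 1) * blockSize; ++i)
            amax = std::max(amax, std::fabs(x[i]));

        const float s = amax > 0.0f ? amax / 127.0f : 1.0f; // avoid dividing by zero
        out.scale[b] = s;
        for(std::size_t i = b * blockSize; i < (b + 1) * blockSize; ++i)
        {
            const float r = std::round(x[i] / s);
            out.q[i] = static_cast<std::int8_t>(std::clamp(r, -127.0f, 127.0f));
        }
    }
    return out;
}
```

Multiplying each `q[i]` by its block's scale reconstructs `x[i]` to within half a quantization step. Smaller blocks track local magnitudes more closely, at the cost of storing more scales.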
--- a/cpp/concat_conv_fusion/ConcatConv.cpp
+++ b/cpp/concat_conv_fusion/ConcatConv.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // params
+    const int64_t n = 1; // Batch size
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 128; // Height
+    const int64_t w = 128; // Width
+    const int64_t k = 32; // Number of filters
+    const int64_t r = 2; // Filter height
+    const int64_t s = 2; // Filter width
+
+    const int64_t axis = 1;
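+
+    // Shape sanity check (a sketch, assuming the standard convolution output formula
+    // out = (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1 with the
+    // padding {1, 1}, stride {1, 1} and dilation {1, 1} used below). Concatenating
+    // x1 and x2 along axis 1 gives 2 * c input channels; a 2x2 filter with padding 1
+    // grows each spatial dimension by one, so y is expected to be {n, k, h + 1, w + 1}.
+    static_assert((h + 2 * 1 - 1 * (r - 1) - 1) / 1 + 1 == h + 1, "conv output height");
+    static_assert((w + 2 * 1 - 1 * (s - 1) - 1) / 1 + 1 == w + 1, "conv output width");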
+
+    // create graph
+    auto buildConcatConvGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        const auto inputType = hipdnn_frontend::getDataTypeEnumFromType<InputType>();
+        graph->set_name("concat_conv_graph")
+            .set_io_data_type(inputType)
+            .set_intermediate_data_type(inputType)
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        // create concat
+        auto x1 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x1")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto x2 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x2")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto concatenateAttributes = hipdnn_frontend::graph::ConcatenateAttributes().set_axis(axis);
+        auto concatOutput = graph->concatenate({x1, x2}, concatenateAttributes);
+
+        // create conv
+        const int64_t c2 = c * 2;
+        auto filter = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("filter")
+                .set_dim({k, c2, r, s})
+                .set_stride({c2 * r * s, 1, c2 * s, c2}));
+        auto convFpropAttributes = hipdnn_frontend::graph::ConvFpropAttributes()
+                                       .set_name("conv_fprop_node")
+                                       .set_padding({1, 1})
+                                       .set_stride({1, 1})
+                                       .set_dilation({1, 1});
+        auto y = graph->conv_fprop(concatOutput, filter, convFpropAttributes);
+        y->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x1, x2, filter, y);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x1, x2, filter, y] = buildConcatConvGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> x1Tensor(x1->get_dim(), x1->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> x2Tensor(x2->get_dim(), x2->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> filterTensor(filter->get_dim(),
+                                                               filter->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x1->get_uid()] = x1Tensor.memory().deviceData();
+    variantPack[x2->get_uid()] = x2Tensor.memory().deviceData();
+    variantPack[filter->get_uid()] = filterTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "ConcatConv graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+
+    return 0;
+}
--- a/cpp/concat_conv_fusion/ConcatConvBias.cpp
+++ b/cpp/concat_conv_fusion/ConcatConvBias.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // params
+    const int64_t n = 1; // Batch size
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 128; // Height
+    const int64_t w = 128; // Width
+    const int64_t k = 32; // Number of filters
+    const int64_t r = 2; // Filter height
+    const int64_t s = 2; // Filter width
+
+    const int64_t axis = 1;
+
+    // create graph
+    auto buildConcatConvGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        const auto inputType = hipdnn_frontend::getDataTypeEnumFromType<InputType>();
+        graph->set_name("concat_conv_pointwise_graph")
+            .set_io_data_type(inputType)
+            .set_intermediate_data_type(inputType)
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        // create concat
+        auto x1 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x1")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto x2 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x2")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto concatenateAttributes = hipdnn_frontend::graph::ConcatenateAttributes().set_axis(axis);
+        auto concatOutput = graph->concatenate({x1, x2}, concatenateAttributes);
+
+        // create conv
+        const int64_t c2 = c * 2;
+        auto filter = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("filter")
+                .set_dim({k, c2, r, s})
+                .set_stride({c2 * r * s, 1, c2 * s, c2}));
+        auto convFpropAttributes = hipdnn_frontend::graph::ConvFpropAttributes()
+                                       .set_name("conv_fprop_node")
+                                       .set_padding({1, 1})
+                                       .set_stride({1, 1})
+                                       .set_dilation({1, 1});
+        auto y = graph->conv_fprop(concatOutput, filter, convFpropAttributes);
+
+        // create bias
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, k, 1, 1})
+                .set_stride({k, 1, k, k}));
+        auto biasAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                  .set_name("bias_node")
+                                  .set_mode(hipdnn_frontend::PointwiseMode_t::ADD);
+        auto biasOutput = graph->pointwise(y, bias, biasAttributes);
+        biasOutput->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x1, x2, filter, bias, biasOutput);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x1, x2, filter, bias, y] = buildConcatConvGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> x1Tensor(x1->get_dim(), x1->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> x2Tensor(x2->get_dim(), x2->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> filterTensor(filter->get_dim(),
+                                                               filter->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim(), bias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x1->get_uid()] = x1Tensor.memory().deviceData();
+    variantPack[x2->get_uid()] = x2Tensor.memory().deviceData();
+    variantPack[filter->get_uid()] = filterTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "ConcatConvPointwise graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+
+    return 0;
+}
--- a/cpp/concat_conv_fusion/ConcatConvBiasAdd.cpp
+++ b/cpp/concat_conv_fusion/ConcatConvBiasAdd.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // params
+    const int64_t n = 1; // Batch size
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 128; // Height
+    const int64_t w = 128; // Width
+    const int64_t k = 32; // Number of filters
+    const int64_t r = 3; // Filter height
+    const int64_t s = 3; // Filter width
+
+    const int64_t axis = 1;
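+
+    // Shape sanity check (a sketch, assuming the standard convolution output formula
+    // out = (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1 with the
+    // padding {1, 1}, stride {1, 1} and dilation {1, 1} used below). A 3x3 filter
+    // with padding 1 preserves the spatial size, which is what allows the residual
+    // "add" tensor below to be declared with dims {n, k, h, w}.
+    static_assert((h + 2 * 1 - 1 * (r - 1) - 1) / 1 + 1 == h, "conv output height must match add");
+    static_assert((w + 2 * 1 - 1 * (s - 1) - 1) / 1 + 1 == w, "conv output width must match add");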
+
+    // create graph
+    auto buildConcatConvGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        const auto inputType = hipdnn_frontend::getDataTypeEnumFromType<InputType>();
+        graph->set_name("concat_conv_pointwise_graph")
+            .set_io_data_type(inputType)
+            .set_intermediate_data_type(inputType)
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        // create concat
+        auto x1 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x1")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto x2 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x2")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto concatenateAttributes = hipdnn_frontend::graph::ConcatenateAttributes().set_axis(axis);
+        auto concatOutput = graph->concatenate({x1, x2}, concatenateAttributes);
+
+        // create conv
+        const int64_t c2 = c * 2;
+        auto filter = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("filter")
+                .set_dim({k, c2, r, s})
+                .set_stride({c2 * r * s, 1, c2 * s, c2}));
+        auto convFpropAttributes = hipdnn_frontend::graph::ConvFpropAttributes()
+                                       .set_name("conv_fprop_node")
+                                       .set_padding({1, 1})
+                                       .set_stride({1, 1})
+                                       .set_dilation({1, 1});
+        auto y = graph->conv_fprop(concatOutput, filter, convFpropAttributes);
+
+        // create bias
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, k, 1, 1})
+                .set_stride({k, 1, k, k}));
+        auto biasAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                  .set_name("bias_node")
+                                  .set_mode(hipdnn_frontend::PointwiseMode_t::ADD);
+        auto biasOutput = graph->pointwise(y, bias, biasAttributes);
+
+        // create add
+        auto add = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("add")
+                .set_dim({n, k, h, w})
+                .set_stride({k * h * w, 1, k * w, k}));
+        auto addAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                 .set_name("add_node")
+                                 .set_mode(hipdnn_frontend::PointwiseMode_t::ADD);
+        auto output = graph->pointwise(biasOutput, add, addAttributes);
+        output->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x1, x2, filter, bias, add, output);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x1, x2, filter, bias, add, y] = buildConcatConvGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> x1Tensor(x1->get_dim(), x1->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> x2Tensor(x2->get_dim(), x2->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> filterTensor(filter->get_dim(),
+                                                               filter->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim(), bias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> addTensor(add->get_dim(), add->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x1->get_uid()] = x1Tensor.memory().deviceData();
+    variantPack[x2->get_uid()] = x2Tensor.memory().deviceData();
+    variantPack[filter->get_uid()] = filterTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[add->get_uid()] = addTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "ConcatConvPointwise graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+
+    return 0;
+}
--- a/cpp/concat_conv_fusion/ConcatConvBiasLeakyRelu.cpp
+++ b/cpp/concat_conv_fusion/ConcatConvBiasLeakyRelu.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // params
+    const int64_t n = 1; // Batch size
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 128; // Height
+    const int64_t w = 128; // Width
+    const int64_t k = 32; // Number of filters
+    const int64_t r = 2; // Filter height
+    const int64_t s = 2; // Filter width
+
+    const int64_t axis = 1;
+
+    // create graph
+    auto buildConcatConvGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        const auto inputType = hipdnn_frontend::getDataTypeEnumFromType<InputType>();
+        graph->set_name("concat_conv_pointwise_graph")
+            .set_io_data_type(inputType)
+            .set_intermediate_data_type(inputType)
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        // create concat
+        auto x1 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x1")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto x2 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x2")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto concatenateAttributes = hipdnn_frontend::graph::ConcatenateAttributes().set_axis(axis);
+        auto concatOutput = graph->concatenate({x1, x2}, concatenateAttributes);
+
+        // create conv
+        const int64_t c2 = c * 2;
+        auto filter = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("filter")
+                .set_dim({k, c2, r, s})
+                .set_stride({c2 * r * s, 1, c2 * s, c2}));
+        auto convFpropAttributes = hipdnn_frontend::graph::ConvFpropAttributes()
+                                       .set_name("conv_fprop_node")
+                                       .set_padding({1, 1})
+                                       .set_stride({1, 1})
+                                       .set_dilation({1, 1});
+        auto y = graph->conv_fprop(concatOutput, filter, convFpropAttributes);
+
+        // create bias
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, k, 1, 1})
+                .set_stride({k, 1, k, k}));
+        auto biasAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                  .set_name("bias_node")
+                                  .set_mode(hipdnn_frontend::PointwiseMode_t::ADD);
+        auto biasOutput = graph->pointwise(y, bias, biasAttributes);
+
+        // create leaky relu
+        auto reluAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                  .set_name("relu_node")
+                                  .set_mode(hipdnn_frontend::PointwiseMode_t::RELU_FWD)
+                                  .set_relu_lower_clip_slope(0.1f);
+        auto reluOutput = graph->pointwise(biasOutput, reluAttributes);
+        reluOutput->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x1, x2, filter, bias, reluOutput);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x1, x2, filter, bias, y] = buildConcatConvGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> x1Tensor(x1->get_dim(), x1->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> x2Tensor(x2->get_dim(), x2->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> filterTensor(filter->get_dim(),
+                                                               filter->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim(), bias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x1->get_uid()] = x1Tensor.memory().deviceData();
+    variantPack[x2->get_uid()] = x2Tensor.memory().deviceData();
+    variantPack[filter->get_uid()] = filterTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "ConcatConvPointwise graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+
+    return 0;
+}
--- a/cpp/concat_conv_fusion/ConcatConvBiasLeakyReluAdd.cpp
+++ b/cpp/concat_conv_fusion/ConcatConvBiasLeakyReluAdd.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // params
+    const int64_t n = 1; // Batch size
+    const int64_t c = 32; // Number of channels
+    const int64_t h = 128; // Height
+    const int64_t w = 128; // Width
+    const int64_t k = 32; // Number of filters
+    const int64_t r = 3; // Filter height
+    const int64_t s = 3; // Filter width
+
+    const int64_t axis = 1;
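+
+    // Shape sanity check (a sketch, assuming the standard convolution output formula
+    // out = (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1 with the
+    // padding {1, 1}, stride {1, 1} and dilation {1, 1} used below). The 3x3 filter
+    // with padding 1 keeps the output at h x w, so the leaky ReLU output and the
+    // residual "add" tensor {n, k, h, w} have matching shapes.
+    static_assert((h + 2 * 1 - 1 * (r - 1) - 1) / 1 + 1 == h, "conv output height must match add");
+    static_assert((w + 2 * 1 - 1 * (s - 1) - 1) / 1 + 1 == w, "conv output width must match add");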
+
+    // create graph
+    auto buildConcatConvGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        const auto inputType = hipdnn_frontend::getDataTypeEnumFromType<InputType>();
+        graph->set_name("concat_conv_pointwise_graph")
+            .set_io_data_type(inputType)
+            .set_intermediate_data_type(inputType)
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        // create concat
+        auto x1 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x1")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto x2 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x2")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c})
+                .set_data_type(inputType));
+        auto concatenateAttributes = hipdnn_frontend::graph::ConcatenateAttributes().set_axis(axis);
+        auto concatOutput = graph->concatenate({x1, x2}, concatenateAttributes);
+
+        // create conv
+        const int64_t c2 = c * 2;
+        auto filter = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("filter")
+                .set_dim({k, c2, r, s})
+                .set_stride({c2 * r * s, 1, c2 * s, c2}));
+        auto convFpropAttributes = hipdnn_frontend::graph::ConvFpropAttributes()
+                                       .set_name("conv_fprop_node")
+                                       .set_padding({1, 1})
+                                       .set_stride({1, 1})
+                                       .set_dilation({1, 1});
+        auto y = graph->conv_fprop(concatOutput, filter, convFpropAttributes);
+
+        // create bias
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, k, 1, 1})
+                .set_stride({k, 1, k, k}));
+        auto biasAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                  .set_name("bias_node")
+                                  .set_mode(hipdnn_frontend::PointwiseMode_t::ADD);
+        auto biasOutput = graph->pointwise(y, bias, biasAttributes);
+
+        // create leaky relu
+        auto reluAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                  .set_name("relu_node")
+                                  .set_mode(hipdnn_frontend::PointwiseMode_t::RELU_FWD)
+                                  .set_relu_lower_clip_slope(0.1f);
+        auto reluOutput = graph->pointwise(biasOutput, reluAttributes);
+
+        // create add
+        auto add = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("add")
+                .set_dim({n, k, h, w})
+                .set_stride({k * h * w, 1, k * w, k}));
+        auto addAttributes = hipdnn_frontend::graph::PointwiseAttributes()
+                                 .set_name("add_node")
+                                 .set_mode(hipdnn_frontend::PointwiseMode_t::ADD);
+        auto output = graph->pointwise(reluOutput, add, addAttributes);
+        output->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x1, x2, filter, bias, add, output);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x1, x2, filter, bias, add, y] = buildConcatConvGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> x1Tensor(x1->get_dim(), x1->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> x2Tensor(x2->get_dim(), x2->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> filterTensor(filter->get_dim(),
+                                                               filter->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim(), bias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> addTensor(add->get_dim(), add->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x1->get_uid()] = x1Tensor.memory().deviceData();
+    variantPack[x2->get_uid()] = x2Tensor.memory().deviceData();
+    variantPack[filter->get_uid()] = filterTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[add->get_uid()] = addTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "ConcatConvPointwise graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+
+    return 0;
+}
--- a/cpp/concatenate/Concatenate.cpp
+++ b/cpp/concatenate/Concatenate.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include "utils.hpp"
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+int main()
+{
+    using InputType = hipdnn_data_sdk::types::half;
+
+    // params
+    const int64_t n = 1; // Batch size
+    const int64_t c = 16; // Number of channels
+    const int64_t h = 16; // Height
+    const int64_t w = 16; // Width
+    const int64_t axis = 0;
+
+    // create graph
+    auto buildConcatenateGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+
+        const auto inputType = hipdnn_frontend::getDataTypeEnumFromType<InputType>();
+        graph->set_name("ConcatenateGraph")
+            .set_io_data_type(inputType)
+            .set_intermediate_data_type(inputType)
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
+        auto x1 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x1")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1})
+                .set_data_type(inputType));
+
+        auto x2 = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x2")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, h * w, w, 1})
+                .set_data_type(inputType));
+
+        auto concatenateAttributes = hipdnn_frontend::graph::ConcatenateAttributes().set_axis(axis);
+
+        auto y = graph->concatenate({x1, x2}, concatenateAttributes);
+        y->set_output(true);
+
+        // build graph
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x1, x2, y);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed. \n";
+        return 1;
+    }
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x1, x2, y] = buildConcatenateGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> x1Tensor(x1->get_dim(), x1->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> x2Tensor(x2->get_dim(), x2->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x1->get_uid()] = x1Tensor.memory().deviceData();
+    variantPack[x2->get_uid()] = x2Tensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "Concatenate graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+
+    return 0;
+}
--- a/cpp/conv_bn_fusion/ConvGenstats.cpp
+++ b/cpp/conv_bn_fusion/ConvGenstats.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+#include "../utils.hpp"
+
+int main()
+{
+    using InputType = float;
+
+    const int64_t n = 4;
+    const int64_t c = 64;
+    const int64_t h = 16;
+    const int64_t w = 16;
+    const int64_t k = 32;
+    const int64_t r = 3;
+    const int64_t s = 3;
+
+    auto buildConvGenstatsGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("conv_genstats_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
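+        // Dimensions are listed in NCHW order; the strides lay the data out
+        // channels-last (NHWC), so the channel stride is 1.
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(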
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        // The filter uses the matching channels-last layout (KRSC).
+        auto filter = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("filter")
+                .set_dim({k, c, r, s})
+                .set_stride({c * r * s, 1, c * s, c}));
+
+        auto convAttrs = hipdnn_frontend::graph::ConvFpropAttributes()
+                             .set_name("conv_fprop_node")
+                             .set_padding({1, 1})
+                             .set_stride({1, 1})
+                             .set_dilation({1, 1});
+        auto y = graph->conv_fprop(x, filter, convAttrs);
+
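+        // genstats reduces the convolution output y over N, H and W and returns
+        // the per-channel sum and sum of squares, i.e. the statistics a
+        // following BatchNorm needs to compute mean and variance.
+        auto genstatsAttrs = hipdnn_frontend::graph::GenstatsAttributes().set_name("genstats_node");
+        auto [sum, sqSum] = graph->genstats(y, genstatsAttrs);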
+
+        y->set_output(true);
+        sum->set_output(true);
+        sqSum->set_output(true);
+
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, x, filter, y, sum, sqSum);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, x, filter, y, sum, sqSum] = buildConvGenstatsGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> filterTensor(filter->get_dim(),
+                                                               filter->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> sumTensor(sum->get_dim(), sum->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> sqSumTensor(sqSum->get_dim(),
+                                                              sqSum->get_stride());
+
+    // Bind each graph tensor UID to the device buffer that backs it.
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[filter->get_uid()] = filterTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+    variantPack[sum->get_uid()] = sumTensor.memory().deviceData();
+    variantPack[sqSum->get_uid()] = sqSumTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "ConvGenstats graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}
--- a/cpp/conv_bn_fusion/MulMulAddAdd.cpp
+++ b/cpp/conv_bn_fusion/MulMulAddAdd.cpp
+// Copyright © Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier:  MIT
+
+#include <iostream>
+#include <memory>
+#include <tuple>
+#include <unordered_map>
+
+#include <hipdnn_data_sdk/utilities/Tensor.hpp>
+#include <hipdnn_data_sdk/utilities/Workspace.hpp>
+#include <hipdnn_frontend.hpp>
+
+#include "../utils.hpp"
+
+int main()
+{
+    using InputType = float;
+
+    const int64_t n = 1;
+    const int64_t c = 4;
+    const int64_t h = 32;
+    const int64_t w = 32;
+
+    auto buildMulMulAddAddGraph = [=](hipdnnHandle_t handle) {
+        auto graph = std::make_shared<hipdnn_frontend::graph::Graph>();
+        graph->set_name("mul_mul_add_add_graph")
+            .set_io_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_intermediate_data_type(hipdnn_frontend::getDataTypeEnumFromType<InputType>())
+            .set_compute_data_type(hipdnn_frontend::DataType::FLOAT);
+
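+        // a, b and bias are per-channel tensors (1 x C x 1 x 1) that broadcast
+        // over the full-size inputs x and y (N x C x H x W). All tensors use a
+        // channels-last (NHWC) stride layout.
+        auto a = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(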
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("a")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+        auto x = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("x")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto b = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("b")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+        auto y = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("y")
+                .set_dim({n, c, h, w})
+                .set_stride({c * h * w, 1, c * w, c}));
+
+        auto bias = std::make_shared<hipdnn_frontend::graph::TensorAttributes>(
+            hipdnn_frontend::graph::Tensor_attributes()
+                .set_name("bias")
+                .set_dim({1, c, 1, 1})
+                .set_stride({c, 1, c, c}));
+
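+        // The graph computes z = a * x + b * y + bias as four pointwise nodes.
+        // Only z is marked as a graph output, so the intermediate results
+        // (mulOut0, mulOut1, addOut0) are not bound in the variant pack.
+        auto mulAttrs0 = hipdnn_frontend::graph::PointwiseAttributes()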
+                             .set_name("mul0_node")
+                             .set_mode(hipdnn_frontend::PointwiseMode::MUL);
+        auto mulOut0 = graph->pointwise(a, x, mulAttrs0);
+
+        auto mulAttrs1 = hipdnn_frontend::graph::PointwiseAttributes()
+                             .set_name("mul1_node")
+                             .set_mode(hipdnn_frontend::PointwiseMode::MUL);
+        auto mulOut1 = graph->pointwise(b, y, mulAttrs1);
+
+        auto addAttrs0 = hipdnn_frontend::graph::PointwiseAttributes()
+                             .set_name("add0_node")
+                             .set_mode(hipdnn_frontend::PointwiseMode::ADD);
+        auto addOut0 = graph->pointwise(mulOut0, mulOut1, addAttrs0);
+
+        auto addAttrs1 = hipdnn_frontend::graph::PointwiseAttributes()
+                             .set_name("add1_node")
+                             .set_mode(hipdnn_frontend::PointwiseMode::ADD);
+        auto z = graph->pointwise(addOut0, bias, addAttrs1);
+        z->set_output(true);
+
+        HIPDNN_FE_CHECK(graph->build(handle));
+
+        return std::make_tuple(graph, a, x, b, y, bias, z);
+    };
+
+    auto backend = hipdnn_frontend::detail::hipdnnBackend();
+    if(!backend)
+    {
+        std::cout << "Create backend failed.\n";
+        return 1;
+    }
+
+    hipdnnHandle_t handle;
+    HIPDNN_CHECK(backend->create(&handle));
+
+    auto [graph, a, x, b, y, bias, z] = buildMulMulAddAddGraph(handle);
+
+    hipdnn_data_sdk::utilities::Tensor<InputType> aTensor(a->get_dim(), a->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> xTensor(x->get_dim(), x->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> bTensor(b->get_dim(), b->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> yTensor(y->get_dim(), y->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> biasTensor(bias->get_dim(), bias->get_stride());
+    hipdnn_data_sdk::utilities::Tensor<InputType> zTensor(z->get_dim(), z->get_stride());
+
+    std::unordered_map<int64_t, void*> variantPack;
+    variantPack[a->get_uid()] = aTensor.memory().deviceData();
+    variantPack[x->get_uid()] = xTensor.memory().deviceData();
+    variantPack[b->get_uid()] = bTensor.memory().deviceData();
+    variantPack[y->get_uid()] = yTensor.memory().deviceData();
+    variantPack[bias->get_uid()] = biasTensor.memory().deviceData();
+    variantPack[z->get_uid()] = zTensor.memory().deviceData();
+
+    int64_t workspaceSize = 0;
+    HIPDNN_FE_CHECK(graph->get_workspace_size(workspaceSize));
+    const hipdnn_data_sdk::utilities::Workspace workspace(static_cast<size_t>(workspaceSize));
+
+    HIPDNN_FE_CHECK(graph->execute(handle, variantPack, workspace.get()));
+
+    std::cout << "MulMulAddAdd graph execution complete. \n";
+
+    HIPDNN_CHECK(backend->destroy(handle));
+    return 0;
+}