# samples

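This project provides usage examples for the hipDNN (HIP Deep Neural Network) frontend API. They cover deep learning operators and fused operators commonly used on Hygon DCU (Deep Computing Unit) hardware, as well as PyTorch integration.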

## Requirements

- **DTK version**: ≥ 26.04.2
- **CMake version**: ≥ 3.25.2
- **Supported architectures**: `gfx906`, `gfx926`, `gfx928`, `gfx936`, `gfx938`, `gfx92a`
- **Dependencies**: `hipdnn` (Python/C++), `hip::host`, `hipdnn_frontend`, `PyTorch` (Python examples)
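
To check a machine against these requirements before building, a small Python sketch like the one below can help. The names `meets_minimum` and `is_supported_arch` are hypothetical helpers, not part of hipDNN or DTK. The sketch assumes purely numeric dotted version strings and an architecture name in the usual `gfx928:sramecc+:xnack-` form, where feature flags follow the first colon.

```python
# Requirements as listed in this README; the helpers below are hypothetical.
SUPPORTED_ARCHS = {"gfx906", "gfx926", "gfx928", "gfx936", "gfx938", "gfx92a"}
MIN_DTK = (26, 4, 2)
MIN_CMAKE = (3, 25, 2)


def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '26.04.2' into (26, 4, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Component-wise comparison; missing trailing components count as 0."""
    parts = parse_version(version)
    width = max(len(parts), len(minimum))
    padded = parts + (0,) * (width - len(parts))
    target = minimum + (0,) * (width - len(minimum))
    return padded >= target


def is_supported_arch(arch_name: str) -> bool:
    """Accept 'gfx928' as well as 'gfx928:sramecc+:xnack-' (flags are ignored)."""
    base = arch_name.split(":", 1)[0].strip().lower()
    return base in SUPPORTED_ARCHS
```

The CMake version can be read from `cmake --version`, and the DTK version appears in the install directory name (for example `/opt/dtk-26.04.2`). The architecture name is reported by tools such as `rocminfo`, if your DTK ships it.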

Load the DTK environment before building or running anything:

```bash
source /opt/dtk-26.04.2/env.sh
```

> If the environment is not loaded, the C++ build fails with `Must be source dtk/env.sh` (`ROCM_PATH` is not set).
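
Because `env.sh` only affects the shell that sources it, tools launched from elsewhere (an IDE, a Python driver script) may not see `ROCM_PATH`. One workaround, sketched here under the assumption that `bash` and GNU `env` are available, is to source the script in a bash subprocess and copy the resulting environment back. `load_env_script` is a hypothetical helper, not part of DTK.

```python
import os
import subprocess


def load_env_script(script: str) -> dict:
    """Source `script` in a bash subprocess and return the resulting environment."""
    # "$1" is the script path passed as an argument (avoids shell-quoting issues);
    # `env -0` separates entries with NUL bytes so multi-line values survive.
    result = subprocess.run(
        ["bash", "-c", 'source "$1" >/dev/null && env -0', "bash", script],
        check=True,
        capture_output=True,
    )
    env = {}
    for entry in result.stdout.split(b"\0"):
        if entry:
            key, _, value = entry.partition(b"=")
            env[os.fsdecode(key)] = os.fsdecode(value)
    return env
```

Calling `os.environ.update(load_env_script("/opt/dtk-26.04.2/env.sh"))` would then make `ROCM_PATH` visible to any subprocess started from that Python process.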

## Directory layout

```
.
├── cpp/          # C++ examples (hipDNN Frontend C++ API)
│   ├── CMakeLists.txt
│   ├── utils.hpp              # Error-checking macros (HIP_CHECK / HIPDNN_CHECK / HIPDNN_FE_CHECK)
│   ├── build/                 # Build output directory
│   ├── convolution/           # Convolution forward / backward / weight update
│   ├── conv_fusion/           # Convolution fusion: bias + ReLU/Swish/PReLU/Add, etc.
│   ├── conv_depthtospace_fusion/   # Convolution + DepthToSpace fusion
│   ├── concat_conv_fusion/    # Concat + convolution fusion
│   ├── matmul/                # Matrix multiplication
│   ├── matmul_fusion/         # MatMul + bias + activation
│   ├── batchnorm/             # BatchNorm inference / training / backward
│   ├── layernorm/             # LayerNorm
│   ├── groupnorm/             # GroupNorm
│   ├── instancenorm/          # InstanceNorm
│   ├── rmsnorm/               # RMSNorm
│   ├── sdpa/                  # Scaled Dot-Product Attention
│   ├── rope/                  # RoPE (rotary position embedding)
│   ├── deformconvolution/     # Deformable convolution
│   ├── deformattention/       # Deformable attention
│   ├── adamw/                 # AdamW optimizer
│   ├── softmax/               # Softmax
│   ├── reduction/             # Reduce / Pointwise+Reduce
│   ├── transpose/             # Transpose
│   ├── pointwise/             # Element-wise binary operations
│   ├── ctc_loss/              # CTC Loss
│   ├── kthvalue/              # Top-K / KthValue
│   ├── multi_margin_loss/     # MultiMarginLoss
│   ├── soft_margin_loss/      # SoftMarginLoss
│   ├── block_scale/           # Block quantization / dequantization
│   └── ...
└── python/       # Python examples (hipdnn Python API + PyTorch)
    ├── convolution/
    ├── conv_fusion/
    ├── matmul/
    ├── sdpa/
    ├── batchnorm/
    ├── layernorm/
    ├── groupnorm/
    ├── adamw/
    ├── torch_wrapper/         # PyTorch module wrappers (e.g. TorchPReLU)
    └── ...
```

## Building the C++ examples

```bash
cd cpp/build
cmake ..
make
```

After the build finishes, the executables are in `cpp/build/bin/`. To build a single example:

```bash
make conv_forward
make sdpa_inference
```

> Some examples are commented out in `CMakeLists.txt` (e.g. `bn_finalize`, `block_scale_quantize`, `slice`, `rng`). To enable one, uncomment its `add_hipdnn_sample(...)` line.
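
To list which samples are currently disabled, or to re-enable one without editing by hand, a sketch like the following scans the text of `CMakeLists.txt`. It assumes each sample is registered on its own line as `add_hipdnn_sample(<target> ...)` with the target name as the first argument, and that disabled samples are commented out with a leading `#`; check the actual file before relying on it. Both function names are hypothetical.

```python
import re

# Assumption: disabled samples look like "# add_hipdnn_sample(<target> ...)",
# with the target name as the first argument.
_DISABLED = re.compile(
    r"^[ \t]*#[ \t]*add_hipdnn_sample\(\s*([A-Za-z0-9_]+)", re.MULTILINE
)


def disabled_samples(cmake_text: str) -> list:
    """Return target names whose add_hipdnn_sample(...) call is commented out."""
    return _DISABLED.findall(cmake_text)


def enable_sample(cmake_text: str, name: str) -> str:
    """Drop the leading '#' from one target's line, keeping its indentation."""
    pattern = re.compile(
        r"^([ \t]*)#[ \t]*(add_hipdnn_sample\(\s*" + re.escape(name) + r"\b)",
        re.MULTILINE,
    )
    return pattern.sub(r"\1\2", cmake_text)
```

For example, `enable_sample(Path("cpp/CMakeLists.txt").read_text(), "slice")` returns the updated text, which can be written back before rerunning `cmake ..`.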

## Running the examples

**C++ examples:**

```bash
./cpp/build/bin/conv_forward
./cpp/build/bin/softmax
./cpp/build/bin/sdpa_inference
```
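
To run every built C++ sample in one go and collect the results, a sketch such as the following can be used. It assumes each sample reports failure through a non-zero exit code; `list_samples` and `run_samples` are hypothetical helpers, not part of the repository.

```python
import os
import subprocess
from pathlib import Path


def list_samples(bin_dir: str) -> list:
    """Executable regular files in bin_dir, sorted by name."""
    return sorted(
        p for p in Path(bin_dir).iterdir() if p.is_file() and os.access(p, os.X_OK)
    )


def run_samples(bin_dir: str, timeout: float = 300.0) -> dict:
    """Run each sample once; map file name -> exit code (None on timeout)."""
    results = {}
    for exe in list_samples(bin_dir):
        try:
            proc = subprocess.run([str(exe)], capture_output=True, timeout=timeout)
            results[exe.name] = proc.returncode
        except subprocess.TimeoutExpired:
            results[exe.name] = None
    return results
```

`run_samples("cpp/build/bin")` returns a dict keyed by executable name; any non-zero value, or `None` for a timeout, marks a sample worth rerunning by hand to see its output.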

**Python examples:**

```bash
cd python/softmax
python softmax.py
```

The Python examples use `import hipdnn` and `import torch`, and tensors must be created on `device="cuda"`.

Before running them, install the hipdnn Python wheel (with the DTK environment already loaded):

```bash
pip install ${ROCM_PATH}/share/hipdnn/wheels/hipdnn-*.whl
```
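
If the wheels directory contains more than one build (for example one per Python version), the shell glob above expands to all of them. A hedged Python sketch for picking the wheel that matches the running interpreter follows. It assumes the filenames follow the standard `name-version[-build]-pythontag-abitag-platform.whl` layout; both helper names are hypothetical.

```python
import glob
import os
import sys


def find_hipdnn_wheels(rocm_path=None) -> list:
    """All hipdnn-*.whl files under ${ROCM_PATH}/share/hipdnn/wheels, sorted by name."""
    rocm_path = rocm_path or os.environ.get("ROCM_PATH")
    if not rocm_path:
        raise RuntimeError("ROCM_PATH is not set; source the DTK env.sh first")
    pattern = os.path.join(rocm_path, "share", "hipdnn", "wheels", "hipdnn-*.whl")
    return sorted(glob.glob(pattern))


def wheels_for_python(wheels, tag=None) -> list:
    """Keep wheels whose python tag matches this interpreter (e.g. 'cp310') or 'py3'."""
    tag = tag or f"cp{sys.version_info.major}{sys.version_info.minor}"
    selected = []
    for path in wheels:
        # name-version[-build]-pythontag-abitag-platform.whl
        fields = os.path.basename(path)[: -len(".whl")].split("-")
        python_tags = fields[-3].split(".")  # tag sets may be dotted, e.g. 'py2.py3'
        if tag in python_tags or "py3" in python_tags:
            selected.append(path)
    return selected
```

The selected path can then be installed with `subprocess.run([sys.executable, "-m", "pip", "install", path], check=True)`, which guarantees the wheel goes into the same interpreter that will run the examples.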

## Examples by category

| Category | C++ path | Python path | Description |
|------|----------|-------------|------|
| Convolution | `convolution/`, `conv_fusion/`, `conv_depthtospace_fusion/`, `concat_conv_fusion/` | `convolution/`, `conv_fusion/`, `conv_depthtospace_fusion/`, `concat_conv_fusion/` | Forward, backward, weight gradient; fusions with bias, activations (ReLU/Swish/PReLU), INT8, DepthToSpace |
| Matrix multiplication | `matmul/`, `matmul_fusion/` | `matmul/`, `matmul_fusion/` | MatMul, MatMul + bias + activation |
| Normalization | `batchnorm/`, `layernorm/`, `groupnorm/`, `instancenorm/`, `rmsnorm/` | `batchnorm/`, `layernorm/`, `groupnorm/`, `instancenorm/`, `rmsnorm/` | Inference, training, backward |
| Attention | `sdpa/`, `rope/`, `deformattention/` | `sdpa/`, `rope/`, `deformattention/` | SDPA, RoPE, deformable attention |
| Optimizer | `adamw/` | `adamw/` | AdamW, AdamW with a Transformer schedule |
| Fused operators | `fusion/`, `conv_bn_fusion/` | `fusion/`, `conv_bn_fusion/` | add+layernorm, groupnorm+swish, pointwise+conv+genstats, scale/bias fusion |
| Quantization | `block_scale/`, `conv_fusion/Int8*` | `block_scale/`, `conv_fusion/convint8_*` | INT8 convolution, block quantization/dequantization |
| PyTorch wrappers | — | `torch_wrapper/` | Module-level wrappers such as `hipdnn.TorchPReLU()` |
| Other | `softmax/`, `reduction/`, `transpose/`, `pointwise/`, `ctc_loss/`, `kthvalue/`, etc. | `softmax/`, `reduction/`, `transpose/`, `pointwise/`, `ctc_loss/`, `kthvalue/`, etc. | Common operators and losses |

## Quick start

1. Load the DTK environment:
   ```bash
   source /opt/dtk-26.04.2/env.sh
   ```

2. Build the C++ examples and run one:
   ```bash
   cd cpp/build && cmake .. && make
   ./bin/conv_forward
   ```

3. Run a Python example:
   ```bash
   cd python/softmax
   python softmax.py
   ```

## Troubleshooting

| Symptom | Cause | Fix |
|------|------|----------|
| `Must be source dtk/env.sh` | `ROCM_PATH` is not set | Run `source /opt/dtk-26.04.2/env.sh` first |
| CMake cannot find `hipdnn_frontend` | hipDNN is not installed, or the environment is not loaded | Check that `${ROCM_PATH}/lib/cmake/hipdnn/` exists |
| CUDA-related errors | PyTorch tensors are not on the GPU | Make sure tensors are created with `device="cuda"` |
| Compiler warnings treated as errors | CMake enables `-Werror` | Fix the warnings in the code, or temporarily remove `-Werror` from `CMakeLists.txt` |
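
The first two rows of the table can be checked automatically. The sketch below is a hypothetical helper, not shipped with the samples. It only tests that `ROCM_PATH` is set and that `${ROCM_PATH}/lib/cmake/hipdnn/` exists, and returns the first problem found as a readable message.

```python
import os
from pathlib import Path


def diagnose(env=None):
    """Return the first problem found (as a message), or None if both checks pass."""
    env = os.environ if env is None else env
    rocm_path = env.get("ROCM_PATH", "").strip()
    if not rocm_path:
        return "ROCM_PATH is not set: run `source /opt/dtk-26.04.2/env.sh` first"
    cmake_dir = Path(rocm_path) / "lib" / "cmake" / "hipdnn"
    if not cmake_dir.is_dir():
        return f"{cmake_dir} not found: hipDNN is not installed in this DTK"
    return None
```

A `None` result only means these two checks passed; it does not verify that the hipDNN libraries themselves load at run time.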