change

Commit 5e75fc76 authored Mar 04, 2026 by wangkx1

Showing with 63 additions and 0 deletions

README.md README.md +63 -0

--- a/README.md
+++ b/README.md
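+This project has already been converted to HIP; no further HIP conversion is needed.
+
+The HIP conversion is based on version 2401.2. Together with FSL 6.0.7.19, it runs correctly on DCU.
+
+Some notes from the HIP conversion are shared here:
+
+1. The conversion and the adaptation of the FSL project were done with DTK-25.04.1: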
+
+```bash
+hipconvertinplace-perl.sh fsl-eddy/cuda
+```
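+After conversion, the original code in the cuda folder is saved in the same directory as org-name.h.prehip or org-name.cu.prehip.
+
+Because the code is compiled with HIP, every .cu suffix is changed to .hip or .cpp.
+
+2. In the Makefile, every .cu suffix is likewise changed to .hip or .cpp: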
+
+cudabuild/cuda${CUDA_VER}/%.o: cuda/%.cu ==>  cudabuild/cuda${CUDA_VER}/%.o: cuda/%.cpp
+
+The original nvcc compiler is replaced with hipcc.
+
+3. Fixes for compilation problems in the code:
+
+- CublasHandleManager.h
+
+```cpp
+#if !defined(ROCM_SYMLINK_HIPBLAS_H)
+#error hipblas.h must be included at the very top of any file including CublasHandleManager.h
+#endif
+
+// The guard macro checked here was changed from CUBLAS_V2_H_ to ROCM_SYMLINK_HIPBLAS_H
+```
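The guard check above relies on the included header defining a known macro. A minimal host-only sketch of the same pattern follows; `DEMO_HEADER_GUARD` and `guard_seen` are hypothetical names used only for illustration and do not come from hipBLAS or FSL.

```cpp
#include <cassert>

// Stand-in for the macro that a real header would define when it is included.
// DEMO_HEADER_GUARD is a hypothetical name used only in this sketch.
#define DEMO_HEADER_GUARD

// Same shape as the check in CublasHandleManager.h: stop compilation
// if the expected header has not been included before this point.
#if !defined(DEMO_HEADER_GUARD)
#error the header defining DEMO_HEADER_GUARD must be included first
#endif

// Reports whether the guard macro was visible at this point in the file.
constexpr bool guard_seen() {
#if defined(DEMO_HEADER_GUARD)
  return true;
#else
  return false;
#endif
}

static_assert(guard_seen(), "guard macro should be visible here");
```

If the macro name tested by `#if !defined(...)` does not match what the included header actually defines, the `#error` fires even though the header is present, which is presumably why the macro name had to change after the conversion.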
+
+- Added file: math_constants.h
+
+- Fixed the unsupported min call in EddyMatrixKernels.cpp
+
+```cpp
+__global__ void QR(// Input
+		   const float  *K,     // Row-first matrices to decompose
+		   unsigned int m,      // Number of rows of K
+		   unsigned int n,      // Number of columns of K
+		   unsigned int nmat,   // Number of matrices
+		   // Output
+		   float        *Qt,    // nmat mxm Q matrices
+		   float        *R)     // nmat mxn R matrices
+{
+  extern __shared__ float scratch[];
+
+  if (blockIdx.x < nmat && threadIdx.x < m) {
+    unsigned int id = threadIdx.x;
+    // unsigned int ntpm = min(m,blockDim.x); // Number of threads per matrix
+    unsigned int ntpm = (m < blockDim.x) ? m : blockDim.x;
+    float *v = scratch;
+    float *w = &scratch[m];
+    const float *lK = &K[blockIdx.x*m*n];
+    float *lQt = &Qt[blockIdx.x*m*m];
+    float *lR = &R[blockIdx.x*m*n];
+    qr_single(lK,m,n,v,w,id,ntpm,lQt,lR);
+  }
+  return;
+}
+```
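The ternary that replaces `min` in the kernel can be checked on the host without a GPU. The sketch below wraps the same expression in a hypothetical helper, `threads_per_matrix`, which is not part of the FSL sources; `block_dim_x` stands in for `blockDim.x`.

```cpp
#include <cassert>

// Hypothetical helper mirroring the replacement line in QR:
//   unsigned int ntpm = (m < blockDim.x) ? m : blockDim.x;
// It returns the smaller of the two unsigned values.
inline unsigned int threads_per_matrix(unsigned int m, unsigned int block_dim_x) {
  return (m < block_dim_x) ? m : block_dim_x;
}
```

Writing the comparison out by hand avoids depending on which `min` overloads the device compiler provides for the type of `blockDim.x`, at the cost of a slightly longer line.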
\ No newline at end of file