Commit 5e75fc76 authored by wangkx1

change

parent b9e18ed6
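This project has already been converted to HIP, so no further HIP conversion is needed.
The conversion is based on version 2401.2. Combined with FSL 6.0.7.19, it runs correctly on DCU.
Below are some notes from the HIP conversion: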
1. The conversion and the adaptation of the FSL project were done with DTK-25.04.1:
```bash
hipconvertinplace-perl.sh fsl-eddy/cuda
```
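After conversion, the original code in the `cuda` folder is kept in the same directory as `<original-name>.h.prehip` / `<original-name>.cu.prehip`.
Because the code is compiled with HIP, every `.cu` suffix must be changed to `.hip` or `.cpp`.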
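The renaming in step 1 can be scripted. Below is a minimal C++17 sketch, assuming the converted sources sit directly in `fsl-eddy/cuda` and that `.cpp` is the chosen suffix; the functions `is_cu_source` and `rename_cu_to_cpp` are hypothetical, not part of FSL or DTK. It deliberately skips the `*.prehip` backups left by the conversion script.

```cpp
#include <cstddef>
#include <filesystem>
#include <string_view>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true for converted CUDA sources ("*.cu"); false for the
// "*.cu.prehip" backups, "*.cuh"/"*.h" headers and everything else.
constexpr bool is_cu_source(std::string_view name) {
    constexpr std::string_view ext = ".cu";
    return name.size() > ext.size() &&
           name.substr(name.size() - ext.size()) == ext;
}

// Hypothetical helper: renames every "*.cu" file directly inside `dir` to
// "*.cpp" (".hip" would work the same way). Returns the number of files renamed.
// Throws std::filesystem::filesystem_error if the directory or a rename fails.
std::size_t rename_cu_to_cpp(const fs::path& dir) {
    std::vector<fs::path> sources;
    // Collect first, then rename, so the directory is not modified mid-iteration.
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() &&
            is_cu_source(entry.path().filename().string())) {
            sources.push_back(entry.path());
        }
    }
    for (const auto& src : sources) {
        fs::path dst = src;
        dst.replace_extension(".cpp");
        fs::rename(src, dst);
    }
    return sources.size();
}
```

Calling `rename_cu_to_cpp("fsl-eddy/cuda")` once after `hipconvertinplace-perl.sh` would turn a file `X.cu` into `X.cpp` while leaving `X.cu.prehip` untouched, since its last extension is `.prehip`.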
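2. In the Makefile, likewise change every `.cu` suffix to `.hip` or `.cpp`. For example, the rule `cudabuild/cuda${CUDA_VER}/%.o: cuda/%.cu` becomes `cudabuild/cuda${CUDA_VER}/%.o: cuda/%.cpp`.
Also replace the original `nvcc` compiler with `hipcc`.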
3. Fixes for compilation errors in the code:
- `CublasHandleManager.h`: change the guard macro from `CUBLAS_V2_H_` to `ROCM_SYMLINK_HIPBLAS_H`:
```cpp
#if !defined(ROCM_SYMLINK_HIPBLAS_H)
#error hipblas.h must be included at the very top of any file including CublasHandleManager.h
#endif
```
- Add the file `math_constants.h`.
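The contents of the added `math_constants.h` are not shown here. A minimal sketch, assuming the converted kernels only need a few of CUDA's `CUDART_*` constants, might look like the following. Which names are required and the guard macro `FSL_EDDY_MATH_CONSTANTS_H` are assumptions; check the `CUDART_*` identifiers the sources actually reference and compare the values with the CUDA header being replaced.

```cpp
// math_constants.h: hypothetical minimal stand-in for CUDA's header of the same name.
#ifndef FSL_EDDY_MATH_CONSTANTS_H
#define FSL_EDDY_MATH_CONSTANTS_H

#include <cmath>  // HUGE_VALF, HUGE_VAL, NAN

// Single precision
#define CUDART_PI_F   3.141592654f
#define CUDART_INF_F  HUGE_VALF  // +infinity (float)
#define CUDART_NAN_F  NAN        // quiet NaN (float)

// Double precision
#define CUDART_PI     3.1415926535897931e+0
#define CUDART_INF    HUGE_VAL   // +infinity (double)

#endif  // FSL_EDDY_MATH_CONSTANTS_H
```

Defining only the constants that are actually referenced keeps the header small and makes any missing name show up as a compile error rather than a silently wrong value.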
- `EddyMatrixKernels.cpp`: `min` is not supported here, so it is replaced with an explicit comparison:
```cpp
__global__ void QR(// Input
                   const float *K,     // Row-first matrices to decompose
                   unsigned int m,     // Number of rows of K
                   unsigned int n,     // Number of columns of K
                   unsigned int nmat,  // Number of matrices
                   // Output
                   float *Qt,          // nmat mxm Q matrices
                   float *R)           // nmat mxn R matrices
{
  extern __shared__ float scratch[];
  if (blockIdx.x < nmat && threadIdx.x < m) {
    unsigned int id = threadIdx.x;
    // unsigned int ntpm = min(m,blockDim.x); // Number of threads per matrix
    // min() failed to compile after conversion; use an explicit comparison instead
    unsigned int ntpm = (m < blockDim.x) ? m : blockDim.x; // Number of threads per matrix
    float *v = scratch;
    float *w = &scratch[m];
    const float *lK = &K[blockIdx.x*m*n];
    float *lQt = &Qt[blockIdx.x*m*m];
    float *lR = &R[blockIdx.x*m*n];
    qr_single(lK,m,n,v,w,id,ntpm,lQt,lR);
  }
  return;
}
```
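If `min` is needed in more than one kernel, the ternary workaround can be wrapped in a small helper. The sketch below defines a hypothetical `safe_min`, not part of HIP or FSL; the empty `__host__`/`__device__` definitions are only there so it also compiles as ordinary C++ outside hipcc/nvcc.

```cpp
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>  // provides __host__ / __device__ under hipcc
#elif !defined(__CUDACC__)
// Plain C++ build (e.g. host-side tests): the qualifiers do not exist, so make them empty.
#define __host__
#define __device__
#endif

// Hypothetical helper replacing min(m, blockDim.x) in kernels such as QR:
// returns the smaller of two unsigned values using an explicit comparison.
__host__ __device__ constexpr unsigned int safe_min(unsigned int a, unsigned int b)
{
    return (a < b) ? a : b;
}
```

Taking plain `unsigned int` parameters rather than a template means no type deduction is involved, so any argument convertible to `unsigned int` is accepted. Inside `QR` the line would then read `unsigned int ntpm = safe_min(m, blockDim.x);`.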