Commit 7dd5c009 authored by wangsen

install fpgemm

parent 227f16ef
[2025-12-23 11:33:01,663 module.py:579 INFO] - * mlcr get,dataset,squad,language-processing
[2025-12-23 11:33:01,685 module.py:579 INFO] - * mlcr get,sys-utils-mlc
[2025-12-23 11:33:01,707 module.py:579 INFO] - * mlcr detect,os
[2025-12-23 11:33:01,707 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/detect-os_f5e42c55/mlc-cached-state.json
[2025-12-23 11:33:01,715 customize.py:86 INFO] -
[2025-12-23 11:33:01,715 customize.py:87 INFO] - ***********************************************************************
[2025-12-23 11:33:01,715 customize.py:89 INFO] - This script will attempt to install minimal system dependencies for MLC.
[2025-12-23 11:33:01,715 customize.py:91 INFO] - Note that you may be asked for your SUDO password ...
[2025-12-23 11:33:01,715 customize.py:92 INFO] - ***********************************************************************
[2025-12-23 11:33:01,715 module.py:5403 INFO] - ! cd /root/MLC/repos/local/cache/get-sys-utils-mlc_17b756ec
[2025-12-23 11:33:01,715 module.py:5404 INFO] - ! call /root/MLC/repos/mlcommons@mlperf-automations/script/get-sys-utils-mlc/run-ubuntu.sh from tmp-run.sh
[2025-12-23 13:30:33,955 module.py:2170 INFO] - - cache UID: 17b756ec0a4c4a7f
[2025-12-23 13:30:33,975 module.py:579 INFO] - * mlcr download-and-extract,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,011 module.py:579 INFO] - * mlcr download,file,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,033 module.py:579 INFO] - * mlcr detect,os
[2025-12-23 13:30:34,033 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/detect-os_f5e42c55/mlc-cached-state.json
[2025-12-23 13:30:34,041 customize.py:78 INFO] -
[2025-12-23 13:30:34,041 customize.py:79 INFO] - Downloading from https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,041 customize.py:204 INFO] - wget -nc --tries=3 -O 'dev-v1.1.json' https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,041 module.py:5403 INFO] - ! cd /root
[2025-12-23 13:30:34,041 module.py:5404 INFO] - ! call /root/MLC/repos/mlcommons@mlperf-automations/script/download-file/run.sh from tmp-run.sh
[2025-12-23 13:35:36,829 module.py:579 INFO] - * mlcr get,dataset,squad,language-processing
[2025-12-23 13:35:36,845 module.py:579 INFO] - * mlcr get,sys-utils-mlc
[2025-12-23 13:35:36,845 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/get-sys-utils-mlc_17b756ec/mlc-cached-state.json
[2025-12-23 13:35:36,866 module.py:579 INFO] - * mlcr download-and-extract,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,893 module.py:579 INFO] - * mlcr download,file,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,906 module.py:579 INFO] - * mlcr detect,os
[2025-12-23 13:35:36,906 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/detect-os_f5e42c55/mlc-cached-state.json
[2025-12-23 13:35:36,914 customize.py:78 INFO] -
[2025-12-23 13:35:36,914 customize.py:79 INFO] - Downloading from https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,914 customize.py:204 INFO] - wget -nc --tries=3 -O 'dev-v1.1.json' https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,915 module.py:5403 INFO] - ! cd /root
[2025-12-23 13:35:36,915 module.py:5404 INFO] - ! call /root/MLC/repos/mlcommons@mlperf-automations/script/download-file/run.sh from tmp-run.sh
[2025-12-23 13:41:49,374 module.py:5549 INFO] - ! call "postprocess" from /root/MLC/repos/mlcommons@mlperf-automations/script/download-file/customize.py
[2025-12-23 13:41:49,392 module.py:2170 INFO] - - cache UID: f947e0e1c67046d8
[2025-12-23 13:41:49,393 module.py:5549 INFO] - ! call "postprocess" from /root/MLC/repos/mlcommons@mlperf-automations/script/download-and-extract/customize.py
[2025-12-23 13:41:49,410 module.py:2170 INFO] - - cache UID: 2dd2ae98e6c04853
[2025-12-23 13:41:49,410 module.py:5549 INFO] - ! call "postprocess" from /root/MLC/repos/mlcommons@mlperf-automations/script/get-dataset-squad/customize.py
[2025-12-23 13:41:49,427 module.py:2170 INFO] - - cache UID: 3e5d8df6a7e047bd
[2025-12-23 13:41:49,427 module.py:2201 INFO] - {
"return": 0,
"env": {
"MLC_DATASET_SQUAD_VAL_PATH": "/root/dev-v1.1.json",
"MLC_DATASET_SQUAD_PATH": "/root",
"MLC_DATASET_PATH": "/root"
},
"new_env": {
"MLC_DATASET_SQUAD_VAL_PATH": "/root/dev-v1.1.json",
"MLC_DATASET_SQUAD_PATH": "/root",
"MLC_DATASET_PATH": "/root"
},
"state": {},
"new_state": {},
"deps": [
"get,sys-utils-mlc",
"download-and-extract,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json"
]
}
[2025-12-23 13:41:49,427 module.py:2245 INFO] - Path to SQUAD dataset: /root/dev-v1.1.json
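This `README.md` targets the **Hygon DCU (DTK 25.04.2)** environment. It combines the Docker image details and build steps with the key patches needed for the build, including the **oneTBB path fix** and the **HIP type adaptation**.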
---
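# FBGEMM_GPU Installation Guide (DTK 25.04.2)
This guide describes how to build and install `fbgemm_gpu` on Hygon accelerator cards, based on **DTK 25.04.2** and **PyTorch 2.5.1/2.7**.
## 1. Start the Environment
Start a container from the PyTorch image that includes DTK 25.04.2: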
```bash
# Pull the image
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10
# Start the container
docker run -it \
--network=host \
--ipc=host \
--shm-size=16G \
--device=/dev/kfd \
--device=/dev/mkfd \
--device=/dev/dri \
-v /opt/hyhal:/opt/hyhal \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10
```
## 2. Get the Source
Clone and initialize the `FBGEMM` repository (recommended version: v1.3.0):
```bash
git clone https://github.com/pytorch/FBGEMM.git --branch v1.3.0
cd FBGEMM/fbgemm_gpu
git submodule update --init --recursive
```
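## 3. Key Patches and Fixes (Required)
Before building, apply the following manual fixes for the DTK 25.04.2 type definitions and the TBB library layout:
### a. System environment and code cleanup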
```bash
# Create the system library symlink
ln -s /lib/x86_64-linux-gnu/librt.so.1 /usr/lib/x86_64-linux-gnu/librt.so
```
### b. Remove the `CUDA_KERNEL_ASSERT2` assertion checks
![cuda_kernel_assert2](./imgs/cuda_kernel_assert2.png)
### c. Remove the `TORCH_CHECK` calls in `./include/fbgemm_gpu/utils/kernel_launcher.cuh`
![torch_check](./imgs/torch_check.png)
## 4. Build and Install
```bash
# 1. Clean cached build output
rm -rf _skbuild build
# 2. Set environment variables
export PYTORCH_ROCM_ARCH="gfx936"
export ROCM_PATH=/opt/dtk
export BUILD_VERSION="1.3.0"
export TORCH_PATH="/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch"
# 3. Run the install command
# Note: --no-as-needed forces linking against TBB, and the RPATH is embedded in the library to avoid runtime load errors
python setup.py install \
-DFBGEMM_BUILD_TARGET=default \
-DFBGEMM_BUILD_VARIANT=rocm \
-DCMAKE_PREFIX_PATH="${TORCH_PATH};/opt/dtk" \
-DAMDGPU_TARGETS="${PYTORCH_ROCM_ARCH}" \
-DTORCH_DIR="${TORCH_PATH}" \
-DCMAKE_C_FLAGS="-Wno-return-type -Wno-ignored-attributes -O1" \
-DCMAKE_CXX_FLAGS="-Wno-return-type -Wno-ignored-attributes -O1 -DhipDeviceProp_t=hipDeviceProp_t_v2" \
-DCMAKE_HIP_FLAGS="-Wno-return-type -O1 -DhipDeviceProp_t=hipDeviceProp_t_v2" \
-DUSE_AVX512=on \
-DCOPY_VISIBLE_LIBRARIES=ON \
-DCMAKE_SHARED_LINKER_FLAGS="-Wl,--no-as-needed -ltbb -Wl,-rpath,/usr/lib/x86_64-linux-gnu" \
-DTBB_INCLUDE_DIR=/usr/include
```
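Because the build can take a long time, it may be worth confirming first that the directories referenced by the variables above exist inside the container. The following is a minimal sketch under that assumption; `missing_paths` is a hypothetical helper written for this guide, not part of FBGEMM or DTK, and the default paths are simply the ones used in step 2 of the build commands.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Defaults mirror the values exported in this guide; override them
    # through the environment if your container layout differs.
    required = [
        os.environ.get("ROCM_PATH", "/opt/dtk"),
        os.environ.get(
            "TORCH_PATH",
            "/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch",
        ),
        "/usr/include",  # value passed as TBB_INCLUDE_DIR
    ]
    missing = missing_paths(required)
    if missing:
        print("Missing paths, the build will likely fail:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All required paths are present.")
```

Run it before `python setup.py install`; any path it reports should be corrected in the exported variables first.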
## 5. Verify the Installation
After installation, verify the import in Python:
```python
import torch
import fbgemm_gpu
print("FBGEMM_GPU imported successfully!")
```
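If the plain import fails, it helps to know whether the package is absent or whether it is installed but fails while loading (for example, because a shared library cannot be found). The sketch below separates the two cases using only the standard library; `module_available` and `import_report` are hypothetical helper names introduced here, not part of `fbgemm_gpu`.

```python
import importlib
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be located."""
    return importlib.util.find_spec(name) is not None


def import_report(names):
    """Try to import each module and map its name to a status string."""
    report = {}
    for name in names:
        if not module_available(name):
            report[name] = "not installed"
            continue
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except Exception as exc:  # e.g. OSError from a missing shared library
            report[name] = f"import failed: {exc}"
    return report


if __name__ == "__main__":
    for name, status in import_report(["torch", "fbgemm_gpu"]).items():
        print(f"{name}: {status}")
```

A status of `not installed` points at the install step, while `import failed` points at the linking or RPATH settings used during the build.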