2 DCU基础环境完整教程-Centos7.6:
2.1 非root用户安装注意事项:
确保非root用户已加入
video组,以便能够使用DCU。# 对于有sudo权限的非root用户 sudo usermod -aG video $USER # 对于无sudo权限的用户,由root执行 usermod -aG video <userid>
2.2. 操作系统设置:
确保启动项中不包含nomodeset选项,如果内核以nomodeset选项启动,则驱动可能无法成功加载。
需要保证系统纯净,命令行运行
lsmod | grep amdgpu为空关闭 selinux(可选)
修改
/etc/selinux/config,设置SELINUX=disabled关闭 firewalld(可选)
systemctl stop firewalld systemctl disable firewalld
2.3. 关闭内核自动更新:
编辑
/etc/yum.conf,在[main]部分添加:exclude=kernel* exclude=centos-release*
2.4. 更新yum源:
替换为中科大源,针对CentOS 7.6的示例, (注意使用双引号):
sed -e "s|^mirrorlist=|#mirrorlist=|g" -e "s|^#baseurl=http://mirror.centos.org/centos/\$releasever|baseurl=https://mirrors.ustc.edu.cn/centos-vault/$minorver|g" -i.bak /etc/yum.repos.d/CentOS-*.repo
替换
CentOS-CR.repo并且enable(安装python3需要)sed -i "s|^baseurl=http://mirror.centos.org/centos/\$releasever|baseurl=https://mirrors.ustc.edu.cn/centos-vault/$minorver|g;s|enabled=0|enabled=1|g" /etc/yum.repos.d/CentOS-CR.repo
对 Centos7 配置 SCLo 源(安装devtoolset需要):
编辑
/etc/yum.repos.d/CentOS-SCLo.repo配置文件;vi /etc/yum.repos.d/CentOS-SCLo.repo在
/etc/yum.repos.d/CentOS-SCLo.repo写入以下内容,Esc + :wq保存退出[centos-sclo-sclo] name=CentOS-7 - SCLo sclo baseurl=https://mirrors.ustc.edu.cn/centos/7/sclo/$basearch/sclo/ gpgcheck=0 enabled=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo [centos-sclo-rh] name=CentOS-7 - SCLo rh baseurl=https://mirrors.ustc.edu.cn/centos/7/sclo/$basearch/rh/ gpgcheck=0 enabled=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo
对 Centos7 配置 epel 源(安装cmake3需要)
编辑
/etc/yum.repos.d/epel-7.repo配置文件;vi /etc/yum.repos.d/epel-7.repo在/etc/yum.repos.d/epel-7.repo写入以下内容,
Esc + :wq保存退出;[epel] name=Extra Packages for Enterprise Linux 7 - $basearch baseurl=http://mirrors.aliyun.com/epel/7/$basearch failovermethod=priority enabled=1 gpgcheck=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 [epel-debuginfo] name=Extra Packages for Enterprise Linux 7 - $basearch – Debug baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug failovermethod=priority enabled=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 gpgcheck=0 [epel-source] name=Extra Packages for Enterprise Linux 7 - $basearch – Source baseurl=http://mirrors.aliyun.com/epel/7/SRPMS failovermethod=priority enabled=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 gpgcheck=0
更新cache
yum clean all yum makecache
2.5. 安装相关依赖:
联网执行以下命令安装必要的依赖包:
#安装 DCU 加速卡驱动所需的依赖包命令 yum install -y gcc gcc-c++ rpm-build autoconf kernel-devel-$(uname -r) kernel-headers-$(uname -r) #安装cmake3 yum install -y cmake3 ln -s /usr/bin/cmake3 /usr/bin/cmake #安装 DTK 开发环境所需的依赖包命令 yum install -y centos-release-scl yum install -y gcc gcc-c++ gcc-gfortran elfutils elfutils-devel make rpm-build devtoolset-7 yum install -y libbabeltrace-devel libbabeltrace pciutils-devel libpciaccess-devel yum install -y numactl-devel elfutils-libelf-devel mesa-libGL-devel yum install -y epel-release cmake3 pciutils-libs pciutils-devel yum install -y perl-File-Which perl-File-BaseDir perl-File-Copy-Recursive perl-File-Listing java-1.8.0-openjdk yum install -y git python python-pip python-devel python-wheel python3 python3-pip python3-devel python3-wheel sqlite-devel libibverbs yum install -y redhat-lsb-core gettext gettext-devel protobuf yum install -y perl-Digest perl-Digest-MD5 perl-Data-Dumper vim-common curl libcurl libcurl-devel yum install -y doxygen graphviz texlive texlive-xtab texlive-multirow texlive-sectsty texlive-tocloft yum install -y texlive-adjustbox deltarpm tcl automake
离线安装依赖:
依赖离线包:rpm_DTK2404_centos7.6_3.10.0-957.tar
链接:https://pan.baidu.com/s/1jnWfddL4lHWVQb3btD34Iw?pwd=wj6q
提取码:wj6q
下载好导入服务器,进行解压:vi /etc/yum.repos.d/localyum.repo,输入如下内容,请根据解压的路径,修改baseurl的内容,下面配置是在root路径下解压的依赖包;[local-repo] name=local-repo baseurl=file:///root/centos7-dtk24.04 enabled=1 gpgcheck=0
开始安装离线依赖包
sudo yum clean all #以安装cmake3为例 yum --disablerepo="*" --enablerepo="local-repo" install cmake3
2.6. 校验系统配置:
| 设备名称 | 设备码 |
|---|---|
| Z100L | 1d94:55b7 |
| K100 | 1d94:62b7 |
| K100_AI | 1d94:6210 |
查看DCU设备
# Z100L
root@sugontest79:/mnt#lspci -nn |grep -i 55b7
------------------------------------------------------------------------------------------------------
07:00.0 Display controller [0380]: Chengdu Haiguang IC Design Co., Ltd. ZIFANG [1d94:55b7] (rev 01)
0a:00.0 Display controller [0380]: Chengdu Haiguang IC Design Co., Ltd. ZIFANG [1d94:55b7] (rev 01)
------------------------------------------------------------------------------------------------------
# K100
root@sugontest79:/mnt#lspci -nn |grep -i 62b7
------------------------------------------------------------------------------------------------------
07:00.0 Co-processor [0b40]: Chengdu Haiguang IC Design Co., Ltd. KONGMING [1d94:62b7] (rev 01)
0a:00.0 Co-processor [0b40]: Chengdu Haiguang IC Design Co., Ltd. KONGMING [1d94:62b7] (rev 01)
------------------------------------------------------------------------------------------------------
# K100-AI
root@sugontest79:/mnt#lspci -nn |grep -i 6210
------------------------------------------------------------------------------------------------------
07:00.0 Co-processor [0b40]: Chengdu Haiguang IC Design Co., Ltd. KONGMING [1d94:6210] (rev 01)
0a:00.0 Co-processor [0b40]: Chengdu Haiguang IC Design Co., Ltd. KONGMING [1d94:6210] (rev 01)
------------------------------------------------------------------------------------------------------
注:输出信息可能和截图不完全一致。
配置环境变量
创建文件 `/etc/profile.d/devtoolset-7.sh`,内容如下:
`source /opt/rh/devtoolset-7/enable`
退出当前登录会话重新登录,或者执行source /etc/profile.d/devtoolset-7.sh。
2.7. 安装驱动:
注意:
DTK和rock驱动有对应关系,可参考dcu-环境安装手册,推荐安装最新的使用
-安装驱动之前需要安装基础包,包括cmake、gcc等多种基础依赖包,请先参考DCU环境安装手册完成基础环境包的安装。
驱动下载地址: https://cancon.hpccube.com:65024/6/main → latest 驱动→ rock-xxx-xxx.aio.run
安装 DCU 加速卡驱动
chmod 755 rock-5.7.1-6.2.13-V1.0.1a.aio.run ./rock-5.7.1-6.2.13-V1.0.1a.aio.run
如果安装过程更新了
vbios, 则需要重启机器reboot
查看验证是否安装成功
# 出现如下类似结果则安装成功 [root@b04r3n02 ~]# lsmod | grep hydcu hydcu 1435342 0 hydcu_sched 34432 1 hydcu hyttm 61919 1 hydcu hykcl 46567 3 hydcu_sched,hydcu,hyttm hy_extra 32140 3 hydcu_sched,hydcu,hykcl amd_iommu_v2 18821 1 hydcu drm_kms_helper 179394 3 ast,hydcu,hykcl drm 429744 8 ast,ttm,hydcu,hykcl,hyttm,drm_kms_helper
卸载驱动步骤:
如遇到异常情况或需要更新版本,先执行驱动卸载。
rpm -qa | grep rock #查询安装的驱动版本 rmmod hydcu rpm -e rock-5.7.1-6.2.18-1.x86_64
2.8. 安装DTK:
DTK下载地址: https://cancon.hpccube.com:65024/1/main → latest → 对应的操作系统 → DTK-version-OS-version-x86_64.tar.gz
安装
# 解压安装 tar xvf DTK-24.04.1-CentOS7.6-x86_64.tar.gz -C /opt # 创建软连接 ln -s /opt/dtk-24.04.1 /opt/dtk
设置 DTK 环境变量
DTK 压缩文件中提供了设置环境变量脚本 env.sh。可以通过 source /opt/dtk/env.sh 的方式临时加载环境变量。为避免多次配置,常用以下方式加载环境变量
echo "source /opt/dtk/env.sh">> ~/.bashrc # 激活环境变量 source ~/.bashrc
验证 DCU 环境
# 查看并执行 hy-smi 或者 rocm-smi 指令查看 dcu 基本信息 [root@h01r4n04~]# rocm-smi # 出现如下内容, 则安装成功 ===================System Management Interface ================= ========================================================== DCU Temp AvgPwr Fan Perf PwrCap VRAM% DCU% 0 50.0c 55.0W 0.0% auto 450.0W 0% 0% 1 50.0c 58.0W 0.0% auto 450.0W 0% 0% 2 49.0c 58.0W 0.0% auto 450.0W 0% 0% 3 49.0c 55.0W 0.0% auto 450.0W 0% 0% ========================================================== ======================End of SMI Log========================
2.9. 验证安装结果:
使用
rocminfo命令检查ROCm系统状态
终端输入如下内容:
rocminfo | grep gfx # 其中Z100/Z100L为gfx906,K100为gfx926,K100_AI为gfx928; 有输出即说明驱动和DTK安装成功 ------------------------------------------------------------------------------------------------------------ Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack- Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
运行
hy-smi或rocm-smi来监控DCU的状态和性能指标;# 查看并执行 hy-smi 或者 rocm-smi 指令查看 dcu 基本信息 [root@h01r4n04~]# rocm-smi # 出现如下内容, 则安装成功 ============================ System Management Interface ============================= ====================================================================================== DCU Temp AvgPwr Perf PwrCap VRAM% DCU% Mode 0 42.0C 39.0W auto 280.0W 0% 0% Normal 1 41.0C 39.0W auto 280.0W 0% 0% Normal 2 41.0C 36.0W auto 280.0W 0% 0% Normal 3 40.0C 38.0W auto 280.0W 0% 0% Normal 4 40.0C 39.0W auto 280.0W 0% 0% Normal 5 41.0C 41.0W auto 280.0W 0% 0% Normal 6 42.0C 37.0W auto 280.0W 0% 0% Normal 7 41.0C 36.0W auto 280.0W 0% 0% Normal ====================================================================================== =================================== End of SMI Log ===================================