Commit fdcfde81 authored by songlinfeng
Update README.md

parent 717e8cd7
@@ -82,7 +82,140 @@ c-3000.com/hcu=hcu-73873c7a6eb02041
c-3000.com/hcu=hcu-73873c7a6eb040a1
```
### File read/write permission restrictions under Docker rootless
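When a non-root user mounts a directory with -v, the directory keeps its original permissions, so write (w) permission cannot be added to a read-only (ro) directory. Running the command below requires root privileges.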
```sh
$ dtk-ctk rootless --runtime=docker
```
### DCU Tracker
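DCU Tracker monitors DCU usage by containers started with --gpus or -e DTK_VISIBLE_DEVICES, and can also set each DCU to exclusive or shared access. DCU Tracker is disabled by default; users can enable it from the dtk-ctk command line.
DCU Tracker provides commands that control container access to DCUs. Each DCU can be set to shared or exclusive.
- shared: the DCU can be used by multiple containers at the same time. This is the default.
- exclusive: the DCU can be used by only one container at a time.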
```sh
$ dtk-ctk dcu-tracker -h
NAME:
C-3000 DTK Container Toolkit CLI dcu-tracker - DCU Tracker related commands
USAGE:
dtk-ctk dcu-tracker [dcu-ids] [accessibility]
Arguments:
dcu-ids Comma-separated list of DCU IDs (comma separated list, range operator, all)
accessibility Must be either 'exclusive' or 'shared'
Examples:
dtk-ctk dcu-tracker 0,1,2 exclusive
dtk-ctk dcu-tracker 0,1-2 shared
dtk-ctk dcu-tracker all shared
OR
dtk-ctk dcu-tracker [command] [options]
COMMANDS:
disable Disable the DCU Tracker
enable Enable the DCU Tracker
reset Reset the DCU Tracker
status Show Status of DCUs
help, h Shows a list of commands or help for one command
OPTIONS:
--help, -h show help
```
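The dcu-ids argument accepts a comma-separated list, ranges, or all. As a rough illustration of how such a value could be read, the sketch below expands it into individual IDs. The function name expand_dcu_ids and its TOTAL parameter are hypothetical and are not part of dtk-ctk; ranges are assumed to be inclusive, and the input is not validated.

```sh
# Hypothetical helper (not part of dtk-ctk): expand a dcu-ids value
# into one DCU ID per line.
# Usage: expand_dcu_ids SPEC TOTAL   (TOTAL = number of DCUs on the node)
expand_dcu_ids() {
    spec=$1
    total=$2
    # "all" expands to 0 .. TOTAL-1.
    if [ "$spec" = "all" ]; then
        i=0
        while [ "$i" -lt "$total" ]; do
            echo "$i"
            i=$((i + 1))
        done
        return 0
    fi
    # Split the value on commas into the positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $spec
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*)
                # "A-B" is assumed to mean every ID from A to B inclusive.
                start=${part%-*}
                end=${part#*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    echo "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                echo "$part"
                ;;
        esac
    done
}
```

For example, expand_dcu_ids 0,1-2 3 prints 0, 1 and 2 on separate lines, which matches the IDs addressed by the second example in the help text.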
### Using DCU Tracker
Use rocm-smi to list the DCUs on the node.
```sh
$ rocm-smi
========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 52.0c 56.0W 600Mhz 1000Mhz 0% auto 400.0W 0% 0%
1 57.0c 48.0W 600Mhz 1000Mhz 0% auto 400.0W 0% 0%
2 52.0c 66.0W 600Mhz 1000Mhz 0% auto 400.0W 0% 0%
====================================================================================
=============================== End of ROCm SMI Log ================================
```
- Check the DCU Tracker status
If DCU Tracker is enabled, every DCU is given shared access by default.
```sh
$ dtk-ctk dcu-tracker status
------------------------------------------------------------------------------------------------------------------------
GPU Id UUID Accessibility Container Ids
------------------------------------------------------------------------------------------------------------------------
0 0x73873C7A6EB02041 Shared None
1 0x73873C7A6EB008A1 Shared None
2 0x73873C7A6EB040A1 Shared None
```
If DCU Tracker is not enabled, the command reports that instead.
```sh
$ dtk-ctk dcu-tracker status
DCU Tracker is disabled
```
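A script can tell the two outputs apart by looking for the strings shown above. The following is a minimal sketch; tracker_state is a hypothetical helper name, and treating the Accessibility column header as the sign of an enabled tracker is an assumption based only on the sample output in this README.

```sh
# Hypothetical helper: read `dtk-ctk dcu-tracker status` output on stdin
# and print "disabled", "enabled", or "unknown".
tracker_state() {
    out=$(cat)
    case $out in
        # Message printed when the tracker is off (taken from the README).
        *"DCU Tracker is disabled"*) echo disabled ;;
        # Assumption: the status table header only appears when enabled.
        *"Accessibility"*) echo enabled ;;
        # Anything else (empty output, errors) is reported as unknown.
        *) echo unknown ;;
    esac
}
```

It could be used as dtk-ctk dcu-tracker status | tracker_state. The unknown case keeps an error or an empty result from being mistaken for an enabled tracker.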
- Enable DCU Tracker
```sh
$ dtk-ctk dcu-tracker status
DCU Tracker is disabled
$ dtk-ctk dcu-tracker enable
DCU Tracker has been enabled
$ dtk-ctk dcu-tracker enable
DCU Tracker is already enabled
```
- Disable DCU Tracker
```sh
$ dtk-ctk dcu-tracker disable
DCU Tracker has been disabled
```
- Set DCU access permissions
When DCU Tracker is enabled, it automatically records which DCUs each container uses when the container starts.
```sh
$ docker run --name slf_dmps -e DTK_VISIBLE_DEVICES=0,1 -it a4dd5be0ca23
$ docker run --name slf_dmp -e DTK_VISIBLE_DEVICES=0,1 -it a4dd5be0ca23
$ dtk-ctk dcu-tracker status
------------------------------------------------------------------------------------------------------------------------
GPU Id UUID Accessibility Container Ids
------------------------------------------------------------------------------------------------------------------------
0 0x73873C7A6EB02041 Shared 3d07700f961485c678999ea1a0fecaaf0b54f3be51f4a1e9a2f1ae61032b276d
dc3c3153ab2e1cde5013a5e5d116cf467894949d1ef4b29ba8caa23a40f66d8d
1 0x73873C7A6EB008A1 Shared 3d07700f961485c678999ea1a0fecaaf0b54f3be51f4a1e9a2f1ae61032b276d
dc3c3153ab2e1cde5013a5e5d116cf467894949d1ef4b29ba8caa23a40f66d8d
2 0x73873C7A6EB040A1 Shared None
$ docker rm slf_dmp
$ dtk-ctk dcu-tracker status
------------------------------------------------------------------------------------------------------------------------
GPU Id UUID Accessibility Container Ids
------------------------------------------------------------------------------------------------------------------------
0 0x73873C7A6EB02041 Shared dc3c3153ab2e1cde5013a5e5d116cf467894949d1ef4b29ba8caa23a40f66d8d
1 0x73873C7A6EB008A1 Shared dc3c3153ab2e1cde5013a5e5d116cf467894949d1ef4b29ba8caa23a40f66d8d
2 0x73873C7A6EB040A1 Shared None
```
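In the status table, a DCU used by several containers lists the extra container IDs on continuation lines. As a sketch of how that layout could be summarized, the hypothetical helper below counts containers per DCU. It assumes the column order shown above and that every continuation line holds a single 64-character lowercase hex container ID.

```sh
# Hypothetical helper: read `dtk-ctk dcu-tracker status` output on stdin
# and print "<GPU Id> <container count>" for each DCU row.
count_containers_per_dcu() {
    awk '
        # A DCU row starts with a numeric GPU Id; field 4 is a
        # container ID or the word "None".
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            if (seen) print id, n
            seen = 1
            id = $1
            n = ($4 == "None") ? 0 : 1
            next
        }
        # A line with only a 64-character hex string is assumed to be
        # another container ID for the previous DCU row.
        seen && NF == 1 && length($1) == 64 && $1 ~ /^[0-9a-f]+$/ { n++ }
        END { if (seen) print id, n }
    '
}
```

Run against the first status table above, this would report two containers each for DCUs 0 and 1 and none for DCU 2.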
- Set a DCU to exclusive
```sh
$ dtk-ctk dcu-tracker 1 exclusive
DCUs [1] have been made exclusive
$ dtk-ctk dcu-tracker status
------------------------------------------------------------------------------------------------------------------------
GPU Id UUID Accessibility Container Ids
------------------------------------------------------------------------------------------------------------------------
0 0x73873C7A6EB02041 Shared dc3c3153ab2e1cde5013a5e5d116cf467894949d1ef4b29ba8caa23a40f66d8d
1 0x73873C7A6EB008A1 Exclusive dc3c3153ab2e1cde5013a5e5d116cf467894949d1ef4b29ba8caa23a40f66d8d
2 0x73873C7A6EB040A1 Shared None
$ docker run --name slf_dmp -e DTK_VISIBLE_DEVICES=0,1 -it a4dd5be0ca23
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: failed to create DTK Container Runtime: failed to construct OCI spec modifier: failed to reserve DCUs: DCUs [1] are exclusive and already in use: unknown.
```
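The failure above can be anticipated by checking the status table before starting a container. The sketch below is one possible pre-check; blocked_dcus is a hypothetical helper, not a dtk-ctk command, and it assumes the column order of the sample output and a plain comma-separated request without ranges.

```sh
# Hypothetical helper: read `dtk-ctk dcu-tracker status` output on stdin
# and print the requested DCU IDs that are exclusive and already in use.
# Usage: blocked_dcus REQUEST   (REQUEST = comma-separated DCU IDs)
blocked_dcus() {
    request=$1
    awk -v req="$request" '
        BEGIN {
            # Build a lookup table of the requested IDs.
            n = split(req, ids, ",")
            for (i = 1; i <= n; i++) want[ids[i]] = 1
        }
        # Field 3 is Accessibility; field 4 is a container ID or "None".
        $1 ~ /^[0-9]+$/ && $3 == "Exclusive" && $4 != "None" && ($1 in want) {
            print $1
        }
    '
}
```

For instance, dtk-ctk dcu-tracker status | blocked_dcus 0,1 would print 1 for the table above, signalling that a container requesting DTK_VISIBLE_DEVICES=0,1 would be rejected.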