Latest tested image: torch2.4.1-py3.10-dtk25.04-beta-das-alpha

This image ships with transformer_engine 1.8.

Clone this project with git.

Start the container:
```bash
docker run -it \
    --shm-size=32G \
    --device=/dev/kfd \
    --device=/dev/mkfd \
    --device=/dev/dri \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --ulimit memlock=-1:-1 \
    --ipc=host \
    --network=host \
    --group-add video \
    --privileged \
    --name nemo_dtk25.4 \
    -v /opt/hyhal:/opt/hyhal \
    -v /path/to/data/:/data \
    -v /path/to/workspace/:/workspace \
    ce83b4a462d9 \
    /bin/bash
```
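
Before starting the container, it can help to confirm that the device nodes and host paths passed to `docker run` exist on the host. The following is a minimal sketch, not part of this project; the helper name `check_paths` is hypothetical, and `/path/to/data/` and `/path/to/workspace/` are left out because they are placeholders.

```bash
# check_paths: hypothetical helper. Prints each path that does not exist
# and returns non-zero if at least one path is missing.
check_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Device nodes and the host directory taken from the docker run command above.
check_paths /dev/kfd /dev/mkfd /dev/dri /opt/hyhal \
    || echo "fix the paths listed above before starting the container"
```

If nothing is printed, all four paths are present.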

Install dependencies:
```bash
cd nemo_dtk25-2.0.0.rc0.beta
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple

cd .. && cd Megatron-LM-core_r0.7.0.beta
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
```
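
After installation, a quick import check can catch a broken install before a long run. This is a sketch under assumptions: the helper `check_import` is hypothetical, and the module names (`torch`, `transformer_engine`, `nemo`, `megatron.core`) are the usual import names for these packages rather than something this README states.

```bash
# check_import: hypothetical helper. Tries to import one Python module
# using the interpreter named in $PYTHON (default: python).
check_import() {
    if "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "FAILED: $1"
        return 1
    fi
}

# Module names are assumptions based on the packages installed above.
for mod in torch transformer_engine nemo megatron.core; do
    check_import "$mod" || true
done
```

Any `FAILED` line points at the package to reinstall.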

Run the fine-tuning script:
Single node, eight cards: `bash K100AI_finetune.sh >& K100AI_finetune.log`
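
The `>&` in the command is bash shorthand that sends both stdout and stderr to the log file. The sketch below shows the equivalent portable form, using a stand-in function (`fake_job`, hypothetical) in place of the real script so it can be tried without a GPU.

```bash
# fake_job: hypothetical stand-in for K100AI_finetune.sh. Writes one line
# to stdout and one line to stderr.
fake_job() {
    echo "step 1 done"
    echo "warning: example" >&2
}

log=$(mktemp)
# Same effect as `fake_job >& "$log"` in bash, and also valid in POSIX sh.
fake_job > "$log" 2>&1
cat "$log"
```

For the real run, appending `&` to the command puts it in the background, and `tail -f K100AI_finetune.log` then follows the log as it grows.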
