Switch to the dtk24.04.1 image

Commit bd4dbb83 authored Aug 15, 2024 by dcuai

Showing with 14 additions and 9 deletions

README.md +14 -9

--- a/README.md
+++ b/README.md
@@ -53,12 +53,12 @@ LLaMA 2 is the next generation of LLaMA, with a commercially friendly license. LLaMA 2 
 ### Docker (Option 1)
 Running in Docker is recommended. Pull the provided image:
 ```
-docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py37-latest
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
 ```
 Start and enter the container, then install the dependencies that the image does not include:
 ```
-docker run -dit --network=host --name=llama-tencentpretrain --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py37-latest
+docker run -dit --network=host --name=llama-tencentpretrain --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v /opt/hyhal:/opt/hyhal:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
 docker exec -it llama-tencentpretrain /bin/bash
 pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
 ```
@@ -66,18 +66,22 @@ pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trus
 ```
 docker build -t llama:latest .
-docker run -dit --network=host --name=llama-tencentpretrain --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 llama:latest
+docker run -dit --network=host --name=llama-tencentpretrain --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v /opt/hyhal:/opt/hyhal:ro llama:latest
 docker exec -it llama-tencentpretrain /bin/bash
 ``` 
 ### Conda (Option 3)
 1. Create a conda virtual environment:
 ```
-conda create -n chatglm python=3.7
+conda create -n llama-tencentpretrain python=3.10
 ```
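 2. The toolkits and deep learning libraries this project needs for DCU cards can all be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.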
- [DTK 23.04](https://cancon.hpccube.com:65024/1/main/DTK-23.04.1)
- [Pytorch 1.13.1](https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.04)
- [Deepspeed 0.9.2](https://cancon.hpccube.com:65024/4/main/deepspeed/dtk23.04)
+```
+DTK software stack: dtk24.04.1
+python: python3.10
+torch: 2.1.0
+torchvision: 0.16.0
+deepspeed: 0.12.3
+```
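    Tips: the versions of the DTK driver, python, deepspeed, and the other tools listed above must match one another exactly.
@@ -208,7 +212,7 @@ For inference with TencentPretrain-format models, see [llama_inference_pytorch](https://deve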
 手臂的英文是“arm”。
 ```
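-## Accuracy
+### Accuracy
 - Using the public instruction dataset [alpaca_gpt4_data_zh.json](https://huggingface.co/datasets/shibing624/alpaca-zh), we ran instruction fine-tuning experiments on the 7B and 13B base models of the Chinese-localized ChineseLLaMA. The training loss is shown below: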
 <div align="center">
 <figure class="half">
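@@ -241,7 +245,8 @@ For inference with TencentPretrain-format models, see [llama_inference_pytorch](https://deve
 The pretrained weights used in this project can be downloaded from the fast download channel: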
-[llama-7b-hf](http://113.200.138.88:18080/aimodels/llama-7b-hf)
+* [llama-7b-hf](http://113.200.138.88:18080/aimodels/llama-7b-hf)
+* [llama-13b-hf](http://113.200.138.88:18080/aimodels/llama-13b-hf)
 ## Source Repository and Issue Feedback
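
The Tips line in the Conda hunk requires the tool versions to match exactly. The following is a minimal sketch of how that could be checked inside the environment, not part of this commit. The names `EXPECTED`, `matches`, and `check_versions` are hypothetical, and the handling of a `+<build tag>` suffix is an assumption about how vendor wheels may be versioned. The DTK version itself is not checked here.

```python
from importlib import metadata

# Versions pinned by the README change in this commit (Conda section).
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "deepspeed": "0.12.3",
}


def matches(installed: str, expected: str) -> bool:
    """Return True if `installed` equals `expected`, ignoring a local build suffix.

    Assumption: vendor builds may append a suffix such as "+<build tag>",
    so "2.1.0+abc" is accepted for "2.1.0" while "2.1.01" is not.
    """
    base = installed.split("+", 1)[0]
    return base == expected


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return a list of (package, expected, found) tuples for each mismatch.

    `found` is None when the package is not installed.
    """
    problems = []
    for name, want in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or not matches(found, want):
            problems.append((name, want, found))
    return problems


if __name__ == "__main__":
    import sys

    # The README pins python3.10 for the conda environment.
    if sys.version_info[:2] != (3, 10):
        print(f"python: expected 3.10, found {sys.version_info.major}.{sys.version_info.minor}")
    for name, want, found in check_versions():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

Running the script inside the container or the conda environment prints one line per mismatch and nothing when every pinned package matches. The `lookup` parameter exists only so the check can be exercised without the packages installed.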