# Model name (same as the upstream model; prefer the model series name; in special cases, name it after a specific model size)
## Paper
[Paper title goes here](paper-URL-goes-here)

**If there is no paper, write `N/A`.**

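## Model Overview
Briefly describe the model architecture, based on the paper or the upstream model's own description. Include a diagram of the model structure or algorithm if one exists; otherwise omit the image.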

<div align=center>
    <img src="./doc/xxxx.png"/>
</div>

## Environment Dependencies
- List the base environment requirements; fill in according to the actual situation.

| Software | Version |
| :------: | :------: |
| DTK | xxx |
| Python | xx |
| Transformers | xx |
| vLLM | xx |
| SGLang | xx |
| PaddlePaddle | xx |
| DeepSpeed | xx |

Recommended images:
> If `vLLM` and `SGLang` require different images, list them separately, for example:

- **For vLLM inference, use:** xxxxxx (image address, e.g. image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10)
- **For SGLang inference, use:** xxxxxx (image address, e.g. harbor.sourcefind.cn:5443/dcu/admin/base/custom:sglang-0.5.10-glm5-0416)

> If a single image is sufficient, there is no need to list them separately, for example:

xxxxxx (image address, e.g. image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10; use this form when there is only one image)

- Adjust the mount path `-v`, `{docker_name}`, and `{docker_image_name}` to match the actual model.
- The example below starts a `vLLM` image; if you use `SGLang`, replace the image address accordingly (this sentence can be omitted if only one of the `vLLM` or `SGLang` frameworks is used for inference).

```bash
# Template: replace {docker_name}, {docker_image_name}, and the -v mount path before running.
docker run -it \
    --shm-size 256g \
    --network=host \
    --name {docker_name} \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    {docker_image_name} bash

# Example (this is what is shown on modelzoo: the command above with
# {docker_image_name} and {docker_name} filled in for the actual model)
docker run -it \
    --shm-size 256g \
    --network=host \
    --name qwen3 \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10 bash
```
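The `docker run` command above passes several host paths through `--device` and `-v`. Before starting the container, it can help to confirm that they exist on the host. The helper below is a minimal sketch and not part of the template: `check_host_paths` is a hypothetical name, and the paths in the usage comment are copied from the command above.

```bash
# Report which of the given host paths exist.
# Returns non-zero if at least one path is missing.
check_host_paths() {
    missing=0
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "ok       $path"
        else
            echo "missing  $path"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (run on the host, not inside the container):
# check_host_paths /dev/kfd /dev/dri /dev/mkfd /opt/hyhal
```

A missing path only indicates that the host setup should be checked; what needs to be installed depends on the actual machine.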
More images are available for download from [光源](https://sourcefind.cn/#/service-list).

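The special deep learning libraries required for DCU accelerator cards in this project can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community; install the remaining packages according to requirements.txt: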
```
pip install -r requirements.txt
```
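After installation, the installed package versions can be compared with the versions listed in the environment table. The following is a minimal sketch under stated assumptions: `pkg_version` is a hypothetical helper, and the pip distribution names in the loop are assumptions that may differ for DCU-specific builds. DTK is not a pip package, so it is not covered here.

```bash
# Print "<name>: <version>" for a pip distribution, or "<name>: not installed".
pkg_version() {
    version=$(python3 -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}')
    echo "$1: ${version:-not installed}"
}

# Distribution names below are assumptions; adjust them to the project.
for pkg in transformers vllm sglang paddlepaddle deepspeed; do
    pkg_version "$pkg"
done
```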

## Dataset
[Public dataset name](official-download-URL-of-the-public-dataset) (very small files can be bundled into the project instead)

Describe how to use the data preprocessing script here:
```bash
python xxx.py
```
A mini dataset for trial training is already included in the project. The training data directory structure is shown below; prepare the full dataset for regular training using the same structure:
```
dataset
├── filename_1
│   ├── xxx.png
│   ├── xxx.png
│   └── ...
└── filename_2
    ├── xxx.png
    ├── xxx.png
    └── ...
```
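The layout above can be created and checked with a short script. This is a minimal sketch, not part of the template: `make_dataset_skeleton` and `count_images` are hypothetical helper names, and `filename_1` / `filename_2` are the placeholder directory names from the tree above.

```bash
# Create the expected directory layout under a root directory (default: ./dataset).
make_dataset_skeleton() {
    root="${1:-dataset}"
    mkdir -p "$root/filename_1" "$root/filename_2"
}

# Print the number of .png files in each sample directory,
# so that an empty or incomplete directory stands out.
count_images() {
    root="${1:-dataset}"
    for dir in "$root"/*/; do
        n=$(find "$dir" -maxdepth 1 -type f -name '*.png' | wc -l | tr -d ' ')
        echo "$dir: $n"
    done
}

# Usage:
# make_dataset_skeleton ./dataset
# count_images ./dataset
```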
**If there is no dataset, write `N/A`.**

## Training
1. Fill in the `Single-node Training` and `Multi-node Training` methods as applicable.
2. If there is no training script, write `N/A` and delete the `Single-node Training` and `Multi-node Training` sections below.

### Single-node Training
```bash
sh xxx.sh  # or: python xxx.py
```

### Multi-node Training
```bash
sh xxx.sh  # or: python xxx.py
```

## Inference
1. Provide at least one inference framework: `Transformers`, `vLLM`, `SGLang`, or any other inference framework.
2. Choose between the `Single-node Inference` and `Multi-node Inference` sections according to the model size.

### Transformers
#### Single-node Inference
```bash
sh xxx.sh  # or: python xxx.py
```

#### Multi-node Inference
```bash
sh xxx.sh  # or: python xxx.py
```

### vLLM
#### Single-node Inference
```bash
sh xxx.sh  # or: python xxx.py
```

#### Multi-node Inference
```bash
sh xxx.sh  # or: python xxx.py
```

### SGLang
#### Single-node Inference
```bash
sh xxx.sh  # or: python xxx.py
```

#### Multi-node Inference
```bash
sh xxx.sh  # or: python xxx.py
```
...

## Results
Place the algorithm's test result images here (including input and output).

<div align=center>
    <img src="./doc/xxx.png"/>
</div>

### Accuracy
Test data: [test data](link); accelerator card used: xxx.

Fill in the table according to the test results:

| xxx | xxx | xxx | xxx | xxx |
| :------: | :------: | :------: | :------: | :------: |
| xxx | xxx | xxx | xxx | xxx |
| xxx | xxx | xxx | xxx | xxx |

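If resource limits make the comparison above infeasible, at least make sure that the Chinese and English test cases produce normal output on the DCU, and write:
`DCU accuracy is consistent with GPU; inference framework: XXX (the inference framework used for testing).`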

## Pretrained Weights
> `BW1100` has a different memory capacity and therefore needs a different number of cards; it can be listed in a separate row.

| Model name | Weight size | DCU model | Minimum number of cards | Download link |
| :-----: | :----------: | :----------: | :---------------------: | :----------: |
| Qwen3 | 4B | K100AI, BW1000, ... | 1 | Provide the official download address of the public pretrained weights (required), in the format `[Hugging Face](link)` or `[Modelscope](link)`; for example: [Hugging Face](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |

## Source Repository and Issue Feedback
- Fill in this project's GitLab address here

## References
- Fill in the source GitHub address here (so that users can view the original GitHub issues)
- Fill in reference projects or tutorial URLs here
- ......

Additional notes:
For model.properties (required), LICENSE (required), CONTRIBUTORS, the model icon (required), and other information, refer to [`ModelZooStd.md`](./ModelZooStd.md).
Each model must keep the original project's README.md; simply rename it to README_origin.md.
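
The rename described above can be scripted so that an existing README_origin.md is never overwritten. This is a minimal sketch: `keep_origin_readme` is a hypothetical helper name, and it assumes the upstream README.md sits at the top of the given project directory.

```bash
# Rename the upstream README.md to README_origin.md inside a project directory.
# Fails without changing anything if README.md is missing or
# README_origin.md already exists.
keep_origin_readme() {
    dir="${1:-.}"
    if [ ! -f "$dir/README.md" ]; then
        echo "no README.md in $dir" >&2
        return 1
    fi
    if [ -e "$dir/README_origin.md" ]; then
        echo "README_origin.md already exists in $dir" >&2
        return 1
    fi
    mv "$dir/README.md" "$dir/README_origin.md"
}

# Usage:
# keep_origin_readme /path/to/project
```

Inside a Git working tree, `git mv README.md README_origin.md` achieves the same rename and stages it in one step.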