# Qwen-7B

## Paper

Qwen-VL is obtained by adding a visual encoder to Qwen-7B. Paper title and link:

`Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities`

https://arxiv.org/pdf/2308.12966.pdf

## Model Structure

![qwen](qwen.jpg)

```
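Tongyi Qianwen-7B (Qwen-7B) is the 7-billion-parameter model in the Tongyi Qianwen large model series developed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model trained on ultra-large-scale pretraining data. The pretraining data is diverse and broad in coverage, including large amounts of web text, professional books, code, and more.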
```

## Algorithm Principle

![qwen](qwen.png)

```
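Model architecture: Qwen-7B is built on an architecture similar to LLaMA. The main differences from the standard Transformer are: 1) untied embeddings, 2) rotary positional embeddings, 3) no biases except for QKV in attention, 4) RMSNorm instead of LayerNorm, 5) SwiGLU instead of ReLU, and 6) flash attention to accelerate training. The model has 32 layers, an embedding dimension of 4096, and 32 attention heads.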
```
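
As a rough illustration of items 4 and 5 above, the following sketch implements RMSNorm and a bias-free SwiGLU feed-forward layer in plain Python. It is not Qwen-7B's actual code: the `eps` value, the function names, and the toy weights are assumptions chosen for readability, and only the sizes 4096 and 32 come from the description.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps is an assumed value
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x)), with no bias terms."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Sizes quoted in the text: a 4096-dim embedding split over 32 attention heads.
head_dim = 4096 // 32

# Toy example: 4-dim input, 2-dim hidden layer (illustrative weights only).
x = [1.0, -2.0, 3.0, -4.0]
normed = rms_norm(x, weight=[1.0] * 4)
w_gate = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
w_up = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out = swiglu_ffn(normed, w_gate, w_up, w_down)
print(head_dim, len(out))
```

After normalization the mean of the squared entries is close to 1, which is the property RMSNorm provides without subtracting the mean as LayerNorm does.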

## Environment Setup

### Docker (Option 1)

Running in Docker is recommended. A Docker image can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/main-page):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py39-latest

docker run -dit --network=host --name=qwen_pytorch --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1  image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py39-latest
docker exec -it qwen_pytorch /bin/bash
pip install -r requirements.txt  -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
```

### Dockerfile (Option 2)

```
docker build -t qwen:latest .
docker run -dit --network=host --name=qwen_pytorch --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 qwen:latest
docker exec -it qwen_pytorch /bin/bash
pip install -r requirements.txt  -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
```

### conda (Option 3)

```
conda create -n qwen python=3.9
conda activate qwen
pip install -r requirements.txt  -i https://mirrors.aliyun.com/pypi/simple/  --trusted-host mirrors.aliyun.com
```

[torch1.13-dtk23.04](https://cancon.hpccube.com:65024/directlink/4/pytorch/dtk23.04/torch-1.13.1+git55d300e.abi0.dtk2304-cp39-cp39-manylinux2014_x86_64.whl)

[deepspeed0.9.2-dtk23.04](https://cancon.hpccube.com:65024/directlink/4/deepspeed/dtk23.04/deepspeed-0.9.2+git25d5540.abi0.dtk2304.torch1.13.1-cp39-cp39-manylinux2014_x86_64.whl)

Tips: the versions of the DTK driver, Python, DeepSpeed, and the other tools listed above must match one another exactly.
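
A minimal sketch of how the version pairing could be checked from inside the environment. The expected versions are read off the wheel file names above. The helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and ignoring the local build suffix after `+` is an assumption about how the DTK wheels report their version.

```python
from importlib import metadata

# Versions taken from the wheel names above; the suffix after "+" is ignored.
EXPECTED = {"torch": "1.13.1", "deepspeed": "0.9.2"}

def base_version(version):
    """Drop a local build suffix such as '+git55d300e.abi0.dtk2304'."""
    return version.split("+", 1)[0]

def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (found, wanted)} for packages that are missing or differ."""
    problems = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            problems[name] = (found, wanted)
    return problems

def installed_versions(names):
    """Look up installed distribution versions; missing packages map to None."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result

if __name__ == "__main__":
    mismatches = find_mismatches(installed_versions(EXPECTED))
    for name, (found, wanted) in mismatches.items():
        print(f"{name}: found {found}, expected {wanted}")
```

Running the script prints one line per package that is missing or has a different base version, and prints nothing when both match.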

### Note

Because the highest DeepSpeed version currently available for DTK is 0.9.2, you need to enter the virtual environment and modify some version checks.

```
# In the virtual environment's python/site-packages, comment out the following version checks
# File: site-packages/accelerate/accelerator.py (around lines 287-290)

#if not is_deepspeed_available():
#    raise ImportError("DeepSpeed is not installed => run `pip install deepspeed` or build it from source.")
#if compare_versions("deepspeed", "<", "0.9.3"):
#    raise ImportError("DeepSpeed version must be >= 0.9.3. Please update DeepSpeed.")
 
# File: site-packages/transformers/utils/versions.py (around lines 43-46)

#if not ops[op](version.parse(got_ver), version.parse(want_ver)):
#    raise ImportError(
#        f"{requirement} is required for a normal functioning of this module, but found {pkg}=={got_ver}.{hint}"
#    )
```
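
To find where these two files live in the active environment, a small lookup like the one below may help. It is only a sketch: `package_dir` and `TARGETS` are hypothetical names, and the script prints paths without modifying anything.

```python
import importlib.util
from pathlib import Path

def package_dir(name):
    """Return the directory of an installed top-level package, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent

# Relative paths of the files named in the note above.
TARGETS = {
    "accelerate": "accelerator.py",
    "transformers": "utils/versions.py",
}

for package, relative in TARGETS.items():
    root = package_dir(package)
    if root is None:
        print(f"{package}: not installed in this environment")
    else:
        print(root / relative)
```

Run it with the same interpreter that will launch training, so the printed paths belong to the environment being patched.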

The matching versions of apex, torch, and deepspeed must be downloaded from the [developer community](https://cancon.hpccube.com:65024/4/main/).

## Dataset

```
The alpaca_gpt4_zh dataset is used. It is already included in the data directory as alpaca_gpt4_data_zh.json.
```

```
# Dataset directory tree
data
├── alpaca_gpt4_data_en.json
└── alpaca_gpt4_data_zh.json
```
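
Before training, the dataset file can be sanity-checked with a short script such as the following sketch. The field names `instruction`, `input`, and `output` follow the common Alpaca layout and are an assumption here, since this README does not describe the schema. `load_alpaca` and `invalid_records` are hypothetical helper names.

```python
import json
from pathlib import Path

# Field names assumed from the common Alpaca layout; not confirmed by this README.
REQUIRED_KEYS = ("instruction", "input", "output")

def load_alpaca(path):
    """Load an Alpaca-style JSON file and return its list of records."""
    with Path(path).open(encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records

def invalid_records(records, required=REQUIRED_KEYS):
    """Return the indices of records that lack any required key."""
    return [
        index
        for index, record in enumerate(records)
        if not isinstance(record, dict) or any(key not in record for key in required)
    ]

if __name__ == "__main__":
    path = Path("data/alpaca_gpt4_data_zh.json")
    if path.exists():
        data = load_alpaca(path)
        print(len(data), "records,", len(invalid_records(data)), "with missing fields")
```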

### Model Download

Download from SCNet: [http://113.200.138.88:18080/aimodels/Qwen-7B-Chat](http://113.200.138.88:18080/aimodels/Qwen-7B-Chat)

## Training

### Single Node

```
bash run-node.sh
```

### Multiple Nodes

```
# Edit the node names, the virtual environment to load, the model path, and so on; update hostfile to list the nodes you are using
sh mpirun-nodes.sh
```

## Results

![tuili](tuili.png)

### Accuracy

ZeRO-3 training on the Wuzhen cluster, using two nodes and eight cards:

|          Training progress          |  Loss  |
| :---------------------------------: | :----: |
|      1.44 epochs (8780 steps)       | 1.3917 |
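
To put the loss figure in more familiar terms, it can be converted to perplexity. The sketch below assumes the reported loss is a mean token-level cross-entropy in nats, which the table does not state. The steps-per-epoch figure is likewise only approximate because 1.44 is a rounded value.

```python
import math

loss = 1.3917    # final training loss from the table
steps = 8780     # training steps completed at that point
epochs = 1.44    # epochs completed at that point

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats (assumed).
perplexity = math.exp(loss)

# Rough number of steps in one epoch, derived from the rounded epoch count.
steps_per_epoch = steps / epochs

print(f"perplexity ~ {perplexity:.2f}")
print(f"steps per epoch ~ {steps_per_epoch:.0f}")
```

Under that assumption the loss corresponds to a perplexity of about 4.02, with roughly 6097 steps per epoch.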

## Application Scenarios

### Algorithm Category

`Dialogue question answering`

### Popular Application Industries

`Scientific research, education, government, finance`

## Source Repository and Issue Feedback

https://developer.hpccube.com/codes/modelzoo/qwen-torch

## References

https://github.com/hiyouga/LLaMA-Efficient-Tuning/tree/main