# QwQ-32B
Leaner in parameters with no loss in performance: with roughly 1/21 of the parameters, it rivals the 671B-parameter DeepSeek R1 at only about 1/10 of the cost.

## Paper
`None`

## Model Architecture
QwQ-32B uses the standard decoder-only Transformer architecture.
<div align=center>
    <img src="./doc/qwen.png"/>
</div>

## Algorithm Principles
A strong base model combined with large-scale reinforcement learning yields strong reasoning ability, and this has become an effective new direction for training large language models. Beyond its core reasoning capability, QwQ-32B also integrates agent-related abilities, enabling it to think critically while using tools and to adjust its reasoning process based on environmental feedback.

The authors have not yet disclosed which reinforcement learning algorithm was used. If it is GRPO, the principle is as follows:

Core idea: GRPO achieves more stable policy updates through a reverse KL-divergence constraint. Unlike TRPO's hard constraint, it uses a soft constraint, which preserves training stability while avoiding costly second-order optimization; the coefficient β dynamically balances exploration and exploitation.
<div align=center>
    <img src="./doc/algorithm.png"/>
</div>

<div align=center>
    <img src="./doc/GRPO.png"/>
</div>

GRPO workflow:
<div align=center>
    <img src="./doc/GRPO_flow.png"/>
</div>
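
Since QwQ-32B's training code and exact RL recipe are not public, the snippet below is only a minimal, sequence-level sketch of a GRPO-style objective: rewards are normalized within a group of sampled completions to form advantages, a clipped policy ratio is applied, and a soft KL penalty (weighted by β) pulls the policy toward a frozen reference model. The function name `grpo_loss` and the tensor shapes are illustrative assumptions, not the actual QwQ implementation.

```python
import torch

def grpo_loss(logprobs, ref_logprobs, old_logprobs, rewards, beta=0.04, eps=0.2):
    """Illustrative GRPO-style loss for one prompt with a group of G sampled completions.

    logprobs, ref_logprobs, old_logprobs: (G,) summed token log-probs of each completion
        under the current policy, the frozen reference policy, and the sampling (old) policy
    rewards: (G,) scalar reward per completion
    """
    # Group-relative advantage: normalize rewards within the group (no value network needed)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # PPO-style clipped ratio against the behavior (old) policy
    ratio = torch.exp(logprobs - old_logprobs)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    policy_term = torch.min(ratio * adv, clipped * adv)

    # Soft KL penalty toward the frozen reference policy (k3 estimator)
    log_ratio_ref = ref_logprobs - logprobs
    kl = torch.exp(log_ratio_ref) - log_ratio_ref - 1.0

    # Maximize the clipped policy term, penalize KL -> negate to obtain a loss to minimize
    return -(policy_term - beta * kl).mean()
```

In the full formulation, GRPO applies this objective per token over each completion; the sketch collapses it to a single sequence-level term for readability.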


## Environment Setup
```
mv QwQ-32B_pytorch QwQ-32B # strip the framework-name suffix
```

### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10
# Replace <your IMAGE ID> with the image ID of the docker image pulled above; for this image it is dee41741fb40
docker run -it --shm-size=64G --network host -v $PWD/QwQ-32B:/home/QwQ-32B -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name qwq <your IMAGE ID> bash
cd /home/QwQ-32B
pip install -r requirements.txt
```
### Dockerfile (Option 2)
```
cd /home/QwQ-32B/docker
docker build --no-cache -t qwq:latest .
docker run --shm-size=64G --network host --name qwq -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../QwQ-32B:/home/QwQ-32B -it qwq bash
# If installing the environment via the Dockerfile takes too long, comment out the pip install steps inside it and install the Python libraries after the container starts: pip install -r requirements.txt.
```
### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the 光合 (SourceFind) developer community:
- https://developer.sourcefind.cn/tool/
```
DTK driver: dtk2504
python:python3.10
torch:2.4.1
torchvision:0.19.1
triton:3.0.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.4.0
transformers:4.48.0
```

`Tips: The DTK driver, python, torch, and other DCU-related tool versions above must correspond to each other exactly.`

2. Install the remaining, non-DCU-specific libraries according to requirements.txt:
```
cd /home/QwQ-32B
pip install -r requirements.txt
```
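
As a quick sanity check (not part of the original repository), you can print the installed versions and compare them against the list above; the set of packages imported below is an assumption covering the key DCU-related libraries.

```python
# Assumed sanity-check script: verify that the installed versions match the list above.
import torch
import torchvision
import transformers
import vllm
import deepspeed

print("torch:", torch.__version__)                # expected 2.4.1
print("torchvision:", torchvision.__version__)    # expected 0.19.1
print("transformers:", transformers.__version__)  # expected 4.48.0
print("vllm:", vllm.__version__)                  # expected 0.6.2
print("deepspeed:", deepspeed.__version__)        # expected 0.14.2
```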

For K100AI cards, you can also refer to [`README_k100ai`](./README_k100ai.md) from the source project.

## Dataset
`None`

## Training


## Inference
Directory structure of the pretrained weights:
```
/home/QwQ-32B
    └── Qwen/QwQ-32B
``` 

### Single node, multiple cards
```
# Option 1: transformers inference
python infer.py

# Option 2: vLLM inference
python infer_vllm.py # defaults to 4 compute cards: tensor_parallel_size=4
```
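
The repository's `infer_vllm.py` is not reproduced here; the following is a minimal sketch, assuming a standard vLLM offline-inference flow with `tensor_parallel_size=4` and the weight path from the directory structure above. The sampling parameters and raw-prompt usage are illustrative assumptions; a real script would typically apply the model's chat template via the tokenizer.

```python
from vllm import LLM, SamplingParams

# Assumed local path to the downloaded weights (see the directory structure above)
model_path = "/home/QwQ-32B/Qwen/QwQ-32B"

# Spread the 32B model across 4 compute cards
llm = LLM(model=model_path, tensor_parallel_size=4, trust_remote_code=True)

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompt = 'How many r\'s are in the word "strawberry"'
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```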

## Result

`Input:`
```
prompt: "How many r's are in the word \"strawberry\""
```

`Output:`
```
# vllm
Generated text: 'Okay, let\'s see... The user is asking how many times the letter \'r\' appears in the word "strawberry". Hmm, I need to make sure I get this right. First, I should probably write down the word and look at each letter one by one.\n\nSo, the word is S-T-R-A-W-B-E-R-R-Y. Let me break it down:\n\n1. S\n2. T\n3. R\n4. A\n5. W\n6. B\n7. E\n8. R\n9. R\n10. Y\n\nWait, let me count again to be sure. Sometimes when letters are close together, like the Rs here, it\'s easy to miscount. Starting over:\n\nFirst letter: S (no)\nSecond: T (no)\nThird: R (yes, that\'s one)\nFourth: A (no)\nFifth: W (no)\nSixth: B (no)\nSeventh: E (no)\nEighth: R (second one)\nNinth: R (third one)\nTenth: Y (no)\n\nSo, total Rs are at positions 3, 8, and 9. That makes three Rs. Wait, but sometimes people might pronounce it differently or maybe spell it differently? Let me check the spelling again. Strawberry is spelled S-T-R-A-W-B-E-R-R-Y. Yeah, that\'s correct. So there are three Rs. But hold on, maybe I missed an R somewhere else? Let me go through each letter once more slowly:\n\nS (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). Yep, that\'s three Rs. The first R is after the T, then two Rs towards the end. So the answer should be three. But I remember sometimes people might confuse it with "strawbery" without the second R, but no, the correct spelling has two Rs at the end. So yeah, three Rs in total.\n</think>\n\nThe word "strawberry" contains **3** instances of the letter \'r\'. Here\'s the breakdown:\n\n1. **R** at the 3rd position  \n2. **R** at the 8th position  \n3. **R** at the 9th position  \n\nSo, the final count is **3 r\'s**.'
```

### Accuracy
DCU accuracy is consistent with GPU; inference framework: pytorch.

## Application Scenarios
### Algorithm Category
`Conversational Q&A`
### Key Application Industries
`Manufacturing, media, finance, energy, healthcare, smart home, education`
## Pretrained Weights
ModelScope download link: [QwQ-32B](https://www.modelscope.cn/models/Qwen/QwQ-32B)

Fast download center for pretrained weights: [SCNet AIModels](https://www.scnet.cn/ui/aihub/models). The pretrained weights used in this project can also be downloaded through the fast channel: [QwQ-32B](https://www.scnet.cn/ui/aihub/models/bal043/QwQ-32B)
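
If you prefer to fetch the weights programmatically from ModelScope, a minimal sketch using the ModelScope SDK's `snapshot_download` is shown below; the `cache_dir` value is an assumption chosen so that the weights land in the `/home/QwQ-32B/Qwen/QwQ-32B` layout expected by the Inference section.

```python
from modelscope import snapshot_download

# Download the QwQ-32B weights into the layout expected by the inference scripts (assumed path)
model_dir = snapshot_download("Qwen/QwQ-32B", cache_dir="/home/QwQ-32B")
print("Weights downloaded to:", model_dir)
```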
## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/QwQ-32B_pytorch.git
## References
- https://www.modelscope.cn/models/Qwen/QwQ-32B