Commit b9a3367a authored by zhangwq5's avatar zhangwq5
# Contributors
This file contains the list of everyone who contributed to the repository
<br>
<table>
<tr><th>Contributors1</th><th>Contributors2</th></tr>
<tr>
<td><img src="xxx1">
<br>
<a href="xxx1">xxx1</a></td>
<td><img src="xxx2">
<br>
<a href="xxx2">xxx2</a></td>
</tr>
</table>
<br>
### Thanks to everyone who helped in building this Repository :)
Copyright 2018-2020 Open-MMLab. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2018-2020 Open-MMLab.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Qwen3-30B-A3B_vllm
## Paper
`Qwen3 Technical Report`
- https://arxiv.org/abs/2505.09388
This repository covers the Qwen3-30B-A3B variant.
## Model Architecture
Qwen3-30B-A3B (Qwen/Qwen3-30B-A3B-Instruct-2507) offers:
- Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use.
- Substantially broader coverage of long-tail knowledge across multiple languages.
- Markedly better alignment with user preferences in subjective and open-ended tasks, yielding more helpful responses and higher-quality text generation.
- Enhanced 256K long-context understanding.
<div align=center>
<img src="./doc/Qwen3-30B-A3B-Instruct-2507.jpeg"/>
</div>
## Environment Setup
### Hardware Requirements
DCU model: K100_AI; nodes: 1; cards: 2.
### Docker (Option 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.8.5-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250724
docker run -it --name {docker_name} --device=/dev/kfd --privileged --network=host --device=/dev/dri --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /public/LLM-Models:/home/LLM-Models:ro -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --group-add video --shm-size 64G {imageID} bash
cd /your_code_path/qwen3-30b-a3b_vllm
```
### Dockerfile (Option 2)
How to use the provided Dockerfile:
```bash
cd docker
docker build --no-cache -t qwen3-30b-a3b:latest .
docker run -it --name {docker_name} --device=/dev/kfd --privileged --network=host --device=/dev/dri --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /public/LLM-Models:/home/LLM-Models:ro -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --group-add video --shm-size 64G {imageID} bash
cd /your_code_path/qwen3-30b-a3b_vllm
```
### Anaconda (Option 3)
The specialized deep-learning libraries this project requires for DCU cards can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community.
```bash
DTK: 25.04
python: 3.10
vllm: 0.8.5
torch: 2.4.1+das.opt1.dtk25041
```
`Tips: the versions of the DTK driver, torch, and the other DCU-related tools above must match one another exactly.`
Other non-deep-learning libraries can be installed as follows:
```bash
pip install transformers==4.51.1
```
## Dataset
None at present.
## Training
None at present.
## Inference
### Inference of Qwen3-30B-A3B with vLLM
```bash
## In BF16 the Qwen3-30B-A3B model weights alone are about 61 GB, so inference requires at least two cards
export HIP_VISIBLE_DEVICES=6,7
## Model path argument
python ./infer/infer_vllm.py --model /your_path/Qwen3-30B-A3B --tensor-parallel-size 2
```
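As a rough sanity check on the memory claim above (a sketch with a hypothetical helper, not vendor guidance — it ignores activations and the KV cache, which add to the real footprint), the ~61 GB BF16 figure and the per-card share under `--tensor-parallel-size 2` work out as:

```python
# Back-of-the-envelope estimate of BF16 weight memory (hypothetical helper).
def bf16_weight_gb(num_params_billion: float) -> float:
    # BF16 stores each parameter in 2 bytes; report decimal GB.
    return num_params_billion * 1e9 * 2 / 1e9

total_gb = bf16_weight_gb(30.5)   # Qwen3-30B-A3B has ~30.5B total parameters
per_card_gb = total_gb / 2        # tensor parallelism shards weights over 2 DCUs
print(f"weights ≈ {total_gb:.0f} GB total, ≈ {per_card_gb:.1f} GB per card")
```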
## Result
```
Original Input Prompt (if available):
'介绍一下北京.'
Generated text (full output):
'<think>\n好的,用户让我介绍一下北京。首先,我需要确定用户的需求是什么。可能他们计划去旅游,或者需要写一篇关于北京的文章,或者只是对北京感兴趣。不管怎样,我需要提供全面而简洁的信息。\n\n接下来,我应该考虑北京的主要特点。作为中国的首都,北京有重要的政治地位,比如中南海和人民大会堂。然后是历史文化方面,北京有众多的古迹,比如故宫、长城、颐和园,这些都是必提的。还有现代元素,比如CBD、中关村,显示北京的现代化发展。\n\n然后,用户可能还想知道北京的地理位置、气候、交通、美食等。比如,北京属于温带季风气候,四季分明,可能需要提到季节性的旅游建议。交通方面,地铁系统很发达,还有首都国际机场。美食的话,烤鸭、炸酱面、豆汁儿这些特色食物应该提到。\n\n另外,北京作为国际大都市,可能有国际学校、外国使馆,以及举办过奥运会,这些也是亮点。可能需要提到北京的教育和科技资源,比如清华、北大,以及中关村的科技企业。\n\n还要注意用户可能的深层需求。比如,如果他们计划旅游,可能需要推荐景点和最佳旅游时间。如果是学生,可能对教育机构感兴趣。如果是商务人士,可能关注经济和交通。\n\n需要确保信息准确,比如北京的面积、人口数据,以及历史沿革,比如作为多个朝代的都城。同时,避免过时的信息,比如最新的发展情况,比如北京冬奥会的影响。\n\n最后,结构要清晰,分点介绍,但不要太生硬。语言要口语化,自然流畅,避免使用专业术语过多,让不同背景的用户都能理解。可能需要检查是否有遗漏的重要信息,比如北京的空气质量或环保措施,但可能用户更关注旅游和文化方面,所以可以简要提及。\n\n总结下来,我需要涵盖政治、历史、文化、现代发展、地理、气候、交通、美食等方面,确保全面且重点突出,同时保持回答的易读性和实用性。\n</think>\n\n北京是中国的首都,也是世界著名古都和国际化大都市,拥有深厚的历史文化底蕴与现代化的城市风貌。以下是对北京的简要介绍:\n\n---\n\n### **1. 历史与文化**\n- **古都底蕴**:北京已有3000多年建城史,曾是元、明、清等朝代的都城,是中华文明的重要发源地之一。故宫、天坛、颐和园、长城等世界文化遗产,见证了其作为“帝王之都”的辉煌。\n- **文化中心**:北京是全国文化、教育、科技中心,拥有众多高校(如清华大学、北京大学)、博物馆(如国家博物馆、首都博物馆)和艺术机构,也是京剧、相声等传统文化的发源地。\n\n---\n\n### **2. 地理与气候**\n- **地理位置**:位于中国华北平原北端,背靠燕山,毗邻河北、天津,是连接华北与东北、西北的重要枢纽。\n- **气候特点**:属温带季风气候,四季分明,夏季炎热多雨,冬季寒冷干燥,春秋季短暂且多风沙。\n\n---\n\n### **3. 现代都市风貌**\n- **政治与经济**:作为中国的政治中心,中南海、人民大会堂等标志性建筑坐落于此;同时是经济、金融、科技高地,中关村聚集了众多科技企业,是“中国硅谷”。\n- **交通网络**:拥有发达的地铁系统(中国最密集的轨道交通之一)和首都国际机场,是全国铁路、航空枢纽。\n\n---\n\n### **4. 旅游景点**\n- **世界遗产**:长城(八达岭、慕田峪段)、故宫、颐和园、天坛、周口店北京人遗址等。\n- **现代地标**:国家体育场(鸟巢)、国家大剧院、央视大楼、三里屯、798艺术区等。\n- **自然风光**:香山红叶、十三陵水库、密云水库等。\n\n---\n\n### **5. 美食与生活**\n- **特色美食**:北京烤鸭(全聚德)、炸酱面、豆汁儿、卤煮、驴打滚等,小吃街如南锣鼓巷、簋街充满烟火气。\n- **生活节奏**:既有老北京的胡同文化(如南锣鼓巷、烟袋斜街),也有现代化的商圈(如国贸、金融街)。\n\n---\n\n### **6. 国际化与多元**\n- **国际交流**:北京是众多国际组织和外国使馆的所在地,也是2008年夏季奥运会和2022年冬季奥运会的举办城市。\n- **多元文化**:汇聚了来自世界各地的移民和留学生,形成了开放包容的城市氛围。\n\n---\n\n### **7. 挑战与机遇**\n- **环境问题**:曾面临雾霾等挑战,近年来通过治理空气质量、推广绿色能源等措施逐步改善。\n- **城市发展**:正通过“京津冀协同发展”战略,推动区域一体化,提升国际影响力。\n\n---\n\n北京是一座将历史与现代、传统与创新完美融合的城市,无论是探索古迹、感受文化,还是体验都市活力,都能找到独特的魅力。如果你有机会到访,不妨从故宫、长城开始,再深入胡同巷陌,感受这座城市的温度与故事。'
================================================================================
Logprobs per generated token:
Step 0:
- Generated Token: 151667 ('<think>')
- Top Logprobs:
- Rank 1: Token 151667 ('<think>') -> Logprob: -0.0000
- Rank 2: Token 32501 ('yped') -> Logprob: -16.6875
- Rank 3: Token 81218 (' zlib') -> Logprob: -17.5000
- Rank 4: Token 77899 (':len') -> Logprob: -17.9375
- Rank 5: Token 99048 (' zf') -> Logprob: -18.4375
- Rank 6: Token 117865 ('具体内容') -> Logprob: -18.5000
- Rank 7: Token 198 (' ') -> Logprob: -18.5625
- Rank 8: Token 18945 ('α') -> Logprob: -18.5625
- Rank 9: Token 67085 ('[param') -> Logprob: -19.0000
- Rank 10: Token 75025 ('yms') -> Logprob: -19.0000
...
...
Successfully wrote the logprob of each generated token to file: ...
```
### Accuracy
```
# Run infer_vllm.py on the DCU and on a GPU to obtain the logprob data for each, then paste both sets into acc.py and run it
python ./infer/acc.py
```
Result
```
Qwen3-30B-A3B accuracy: 0.002905419914469576
```
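The accuracy figure is the mean absolute difference between the rank-1 logprobs collected on the DCU and on the GPU (acc.py computes it with `np.mean(np.abs(logprobs_1 - logprobs_2))`). A minimal pure-Python sketch of the same metric, with illustrative values rather than a real run:

```python
# Mean absolute difference between two logprob lists (same metric as acc.py).
def mean_abs_diff(a, b):
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

dcu_logprobs = [-0.0025, -0.2021, -0.1487]   # illustrative rank-1 logprobs, DCU run
gpu_logprobs = [-0.0019, -0.2526, -0.1344]   # the same tokens from the GPU run
print(mean_abs_diff(dcu_logprobs, gpu_logprobs))
```

A value near zero indicates the two backends assign nearly identical probabilities to the generated tokens.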
### Inference of Qwen3-30B-A3B-Instruct-2507 with vLLM
```bash
## Qwen3-30B-A3B-Instruct-2507 requires at least two cards for inference
export HIP_VISIBLE_DEVICES=6,7
## Model path argument
python ./infer/infer_vllm.py --model /your_path/Qwen3-30B-A3B-Instruct-2507 --tensor-parallel-size 2
```
## Result
```
Original Input Prompt (if available):
'介绍一下北京.'
Generated text (full output):
'北京,简称“京”,是中国的首都,也是中华人民共和国的中央人民政府所在地,是全国的政治、文化、教育和国际交往中心。它位于中国华北平原的北部,地处燕山山脉与华北平原的交汇地带,地理坐标为北纬39°54′,东经116°23′,总面积约16,410平方公里。\n\n### 历史与文化\n北京拥有超过3000年的建城史和800多年的建都史,是中国历史上多个朝代的都城。自元朝起,北京成为全国的政治中心,明清两代在此建都,留下了大量珍贵的历史文化遗产。北京是世界著名的历史文化名城,拥有众多世界文化遗产,如:\n\n- **故宫**(紫禁城):明清两代的皇家宫殿,是世界上现存规模最大、保存最完整的古代宫殿建筑群。\n- **天坛**:明清皇帝祭天祈谷的场所,建筑布局严谨,象征“天圆地方”。\n- **颐和园**:中国现存规模最大、保存最完整的皇家园林,融合了自然景观与人工建筑。\n- **八达岭长城**:万里长城的代表段落,是世界文化遗产之一,也是中外游客必访之地。\n- **圆明园遗址**:曾被誉为“万园之园”,虽在第二次鸦片战争中被焚毁,但遗址仍具重要历史价值。\n- **天安门广场**:世界上最大的城市广场之一,是北京的象征性地标,也是国家举行重大庆典和政治活动的场所。\n\n### 城市风貌与现代发展\n北京是一座传统与现代交融的城市。在保留古都风貌的同时,也展现出高度现代化的城市面貌:\n\n- **城市布局**:以中轴线为核心,呈对称布局,从永定门到钟鼓楼,贯穿城市南北,体现了中国古代城市规划的智慧。\n- **现代地标**:国家大剧院(“蛋”)、中央电视台总部大楼(“大裤衩”)、北京国贸大厦、北京SKP等现代建筑彰显了城市的国际化形象。\n- **交通系统**:拥有发达的轨道交通网络,北京地铁是全球运营里程最长的城市地铁系统之一,覆盖全市主要区域。\n\n### 教育与科技\n北京是中国高等教育和科研的中心,拥有众多顶尖高校和研究机构,如:\n\n- 清华大学\n- 北京大学\n- 中国科学院\n- 中国工程院\n\n这些机构在科技、工程、医学、人文等领域具有国际影响力。\n\n### 旅游与美食\n北京是国内外游客向往的旅游目的地,每年吸引数千万游客。除了上述名胜古迹,还有:\n\n- **胡同与四合院**:如南锣鼓巷、什刹海,是体验老北京生活文化的窗口。\n- **北京烤鸭**:享誉世界的特色美食,以全聚德、便宜坊为代表。\n- **豆汁儿、焦圈、炸酱面、艾窝窝**等传统小吃也极具地方特色。\n\n### 环境与生态\n近年来,北京大力推进生态文明建设,实施“蓝天保卫战”,空气质量持续改善。城市绿化覆盖率不断提高,拥有奥林匹克森林公园、北京植物园、香山公园等大型生态空间。\n\n### 总结\n北京是一座集历史厚重感与现代活力于一体的城市,既是中华文明的重要象征,也是中国走向世界的重要窗口。无论你是追寻历史足迹,还是感受现代都市魅力,北京都能为你带来深刻而难忘的体验。'
================================================================================
Logprobs per generated token:
Step 0:
- Generated Token: 68990 ('北京')
- Top Logprobs:
- Rank 1: Token 68990 ('北京') -> Logprob: -0.0019
- Rank 2: Token 103942 ('当然') -> Logprob: -6.2519
- Rank 3: Token 104554 ('北京市') -> Logprob: -11.3769
- Rank 4: Token 99692 ('好的') -> Logprob: -13.5019
- Rank 5: Token 108386 ('你好') -> Logprob: -13.5019
- Rank 6: Token 111308 ('您好') -> Logprob: -14.1269
- Rank 7: Token 106287 ('嗯') -> Logprob: -15.2519
- Rank 8: Token 106114 ('首都') -> Logprob: -16.8769
- Rank 9: Token 110488 ('北京时间') -> Logprob: -16.8769
- Rank 10: Token 334 ('**') -> Logprob: -17.3769
...
...
Successfully wrote the logprob of each generated token to file: ...
```
### Accuracy
```
# Run infer_vllm.py on the DCU and on a GPU to obtain the logprob data for each, then paste both sets into acc.py and run it
python ./infer/acc.py
```
Result
```
Qwen3-30B-A3B-Instruct-2507 accuracy: 0.006542379854522551
```
The DCU and GPU accuracy results are consistent; inference framework: vLLM.
## Application Scenarios
### Algorithm Category
`Dialogue`
### Key Application Industries
`Finance, education, government, research, manufacturing, energy, transportation`
## Pretrained Weights
- [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)
- [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)
## Source Repository & Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/granite-speech_pytorch
## References
- https://github.com/ibm-granite/granite-speech-models
FROM image.sourcefind.cn:5000/dcu/admin/base/vllm:0.8.5-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250724
[
-0.002492894185706973,
-0.20206475257873535,
-0.14872165024280548,
-3.6954811548639555e-06,
0.0,
-2.3841855067985307e-07,
-0.038103267550468445,
-0.0006967739318497479,
-6.0794889577664435e-05,
-3.099436753473128e-06
]
[
-0.001943962648510933,
-0.25255143642425537,
-0.1344442367553711,
-2.9802276912960224e-06,
0.0,
-2.3841855067985307e-07,
-0.03809638321399689,
-0.0007833749987185001,
-7.64102369430475e-05,
-4.0531076592742465e-06
]
[
-2.3841855067985307e-07,
-2.753696753643453e-05,
-0.0630415603518486,
-3.3378546504536644e-06,
-0.13829903304576874,
-0.018993528559803963,
-0.006734886672347784,
-2.3841855067985307e-07,
-0.038042906671762466,
-0.008138706907629967
]
[
-5.960462772236497e-07,
-1.8954096958623268e-05,
-0.06287578493356705,
-2.50339189733495e-06,
-0.12281982600688934,
-0.014945676550269127,
-0.006732518319040537,
-1.1920928955078125e-07,
-0.029751574620604515,
-0.0070809368044137955
]
# acc.py: paste the rank-1 logprobs from the DCU run into logprobs_1 and those
# from the GPU run into logprobs_2, then run to get the mean absolute difference.
import numpy as np

logprobs_1 = np.array([
-0.002492894185706973,
-0.20206475257873535,
-0.14872165024280548,
-3.6954811548639555e-06,
0.0,
-2.3841855067985307e-07,
-0.038103267550468445,
-0.0006967739318497479,
-6.0794889577664435e-05,
-3.099436753473128e-06
])
logprobs_2 = np.array([
-0.001943962648510933,
-0.25255143642425537,
-0.1344442367553711,
-2.9802276912960224e-06,
0.0,
-2.3841855067985307e-07,
-0.03809638321399689,
-0.0007833749987185001,
-7.64102369430475e-05,
-4.0531076592742465e-06
])
print(np.mean(np.abs(logprobs_1 - logprobs_2)))
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import json

from vllm import LLM, EngineArgs, SamplingParams
from vllm.utils import FlexibleArgumentParser


def create_parser():
    parser = FlexibleArgumentParser()
    # Add engine args
    EngineArgs.add_cli_args(parser)
    parser.set_defaults(model="Qwen/Qwen3-30B-A3B")
    # Add sampling params
    sampling_group = parser.add_argument_group("Sampling parameters")
    sampling_group.add_argument("--max-tokens", type=int, default=8192,
                                help="Maximum number of tokens to generate in a single response.")
    sampling_group.add_argument("--temperature", type=float, default=0.0,
                                help="Temperature for sampling. Higher values make output more random.")
    sampling_group.add_argument("--top-p", type=float, default=1.0,
                                help="Top-p sampling probability. Only tokens with cumulative probability below top_p are considered.")
    sampling_group.add_argument("--top-k", type=int, default=1,
                                help="Top-k sampling. -1 means no top-k.")
    # Add example params
    parser.add_argument("--chat-template-path", type=str,
                        help="Path to a custom chat template file (Jinja format).")
    return parser


def main(args: dict):
    # Pop arguments not used by LLM
    max_tokens = args.pop("max_tokens")
    temperature = args.pop("temperature")
    top_p = args.pop("top_p")
    top_k = args.pop("top_k")
    chat_template_path = args.pop("chat_template_path")

    # Create an LLM
    llm = LLM(**args)

    # Create a sampling params object; logprobs=10 returns the top-10
    # candidates per generated token.
    sampling_params = SamplingParams(
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        logprobs=10,
    )

    # A chat template can be optionally supplied.
    # If not, the model will use its default chat template.
    chat_template = None
    if chat_template_path is not None:
        with open(chat_template_path) as f:
            chat_template = f.read()
        print(f"Loaded custom chat template from: {chat_template_path}")

    # Define the single conversation for demonstration
    single_conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "介绍一下北京."},
    ]
    outputs = llm.chat(single_conversation, sampling_params, use_tqdm=False, chat_template=chat_template)
    print(f"Original Input Prompt (if available):\n{single_conversation[1]['content']!r}\n")

    first_10_logprobs_to_save = []
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"Generated text (full output):\n{generated_text!r}")
        print("=" * 80)

        logprobs_per_step = output.outputs[0].logprobs
        if logprobs_per_step is None:
            print("Logprobs not returned. Check your SamplingParams.")
            continue

        print("\nLogprobs per generated token:")
        for step_idx, step_logprobs_dict in enumerate(logprobs_per_step[:10]):
            # Find the token that was actually generated (rank 1)
            generated_token_info = None
            for token_id, logprob_obj in step_logprobs_dict.items():
                if logprob_obj.rank == 1:
                    generated_token_info = (token_id, logprob_obj.decoded_token)
                    break
            if generated_token_info:
                token_id, token_text = generated_token_info
                print(f"  Step {step_idx}:")
                print(f"    - Generated Token: {token_id} ('{token_text}')")
            else:
                print(f"  Step {step_idx}: (Could not find rank-1 token)")
                continue

            sorted_logprobs = sorted(step_logprobs_dict.values(), key=lambda x: x.rank)
            print("    - Top Logprobs:")
            for logprob_obj in sorted_logprobs:
                token_id = next(tid for tid, lp in step_logprobs_dict.items() if lp is logprob_obj)
                token_text = logprob_obj.decoded_token
                logprob_value = logprob_obj.logprob
                rank = logprob_obj.rank
                print(f"      - Rank {rank}: Token {token_id} ('{token_text}') -> Logprob: {logprob_value:.4f}")
                if rank == 1:
                    first_10_logprobs_to_save.append(logprob_value)

    output_filename = './Qwen3-30B-A3B_logprobs_K100AI_fp16.json'
    with open(output_filename, 'w') as f:
        json.dump(first_10_logprobs_to_save, f, indent=2)
    print(f"Successfully wrote the logprob of each generated token to file: {output_filename}")


if __name__ == "__main__":
    parser = create_parser()
    args: dict = vars(parser.parse_args())
    main(args)
# Unique model identifier
modelCode=1695
# Model name
modelName=qwen3-30b-a3b_vllm
# Model description
modelDescription=qwen3-30b-a3b is a new non-thinking mode model that activates only 3B parameters yet delivers performance rivaling top closed-source models such as Gemini 2.5-Flash (non-thinking) and GPT-4o.
# Application scenarios
appScenario=Inference, dialogue QA, manufacturing, media, finance, energy, healthcare
# Framework type
frameType=vllm