Commit b9a3367a authored by zhangwq5's avatar zhangwq5
# Contributors
This file contains the list of everyone who contributed to the repository
<br>
<table>
<tr><th>Contributors1</th><th>Contributors2</th></tr>
<tr>
<td><img src="xxx1">
<br>
<a href="xxx1">xxx1</a></td>
<td><img src="xxx2">
<br>
<a href="xxx2">xxx2</a></td>
</tr>
</table>
<br>
### Thanks to everyone who helped in building this Repository :)
Copyright 2018-2020 Open-MMLab. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2018-2020 Open-MMLab.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Qwen3-30B-A3B_vllm
## Paper
`Qwen3 Technical Report`
- https://arxiv.org/abs/2505.09388
This repository covers the Qwen3-30B-A3B variant.
## Model Architecture
Qwen3-30B-A3B (Qwen/Qwen3-30B-A3B-Instruct-2507) offers:
- Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use.
- Substantially broader coverage of long-tail knowledge across multiple languages.
- Markedly better alignment with user preferences in subjective and open-ended tasks, yielding more helpful responses and higher-quality text generation.
- Enhanced 256K long-context understanding.
<div align=center>
<img src="./doc/Qwen3-30B-A3B-Instruct-2507.jpeg"/>
</div>
## Environment Setup
### Hardware Requirements
DCU model: K100_AI; nodes: 1; cards: 2.
### Docker (Option 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.8.5-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250724
docker run -it --name {docker_name} --device=/dev/kfd --privileged --network=host --device=/dev/dri --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /public/LLM-Models:/home/LLM-Models:ro -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --group-add video --shm-size 64G {imageID} bash
cd /your_code_path/qwen3-30b-a3b_vllm
```
### Dockerfile (Option 2)
How to use the provided Dockerfile:
```bash
cd docker
docker build --no-cache -t qwen3-30b-a3b:latest .
docker run -it --name {docker_name} --device=/dev/kfd --privileged --network=host --device=/dev/dri --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /public/LLM-Models:/home/LLM-Models:ro -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --group-add video --shm-size 64G {imageID} bash
cd /your_code_path/qwen3-30b-a3b_vllm
```
### Anaconda (Option 3)
The specialized deep-learning libraries this project requires for DCU cards can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community.
```bash
DTK: 25.04
python: 3.10
vllm: 0.8.5
torch: 2.4.1+das.opt1.dtk25041
```
`Tips: the versions of the DTK driver, torch, and the other DCU-related tools above must match one another exactly.`
Other non-deep-learning libraries can be installed as follows:
```bash
pip install transformers==4.51.1
```
## Dataset
None at present.
## Training
None at present.
## Inference
### Inference of Qwen3-30B-A3B with vLLM
```bash
## In BF16 the Qwen3-30B-A3B model weights alone are about 61 GB, so inference requires at least two cards
export HIP_VISIBLE_DEVICES=6,7
## Model path argument
python ./infer/infer_vllm.py --model /your_path/Qwen3-30B-A3B --tensor-parallel-size 2
```
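As a rough sanity check on the memory claim above (a sketch with a hypothetical helper, not vendor guidance — it ignores activations and the KV cache, which add to the real footprint), the ~61 GB BF16 figure and the per-card share under `--tensor-parallel-size 2` work out as:

```python
# Back-of-the-envelope estimate of BF16 weight memory (hypothetical helper).
def bf16_weight_gb(num_params_billion: float) -> float:
    # BF16 stores each parameter in 2 bytes; report decimal GB.
    return num_params_billion * 1e9 * 2 / 1e9

total_gb = bf16_weight_gb(30.5)   # Qwen3-30B-A3B has ~30.5B total parameters
per_card_gb = total_gb / 2        # tensor parallelism shards weights over 2 DCUs
print(f"weights ≈ {total_gb:.0f} GB total, ≈ {per_card_gb:.1f} GB per card")
```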
## Result
```
Original Input Prompt (if available):
'介绍一下北京.'
Generated text (full output):
'<think>\n好的,用户让我介绍一下北京。首先,我需要确定用户的需求是什么。可能他们计划去旅游,或者需要写一篇关于北京的文章,或者只是对北京感兴趣。不管怎样,我需要提供全面而简洁的信息。\n\n接下来,我应该考虑北京的主要特点。作为中国的首都,北京有重要的政治地位,比如中南海和人民大会堂。然后是历史文化方面,北京有众多的古迹,比如故宫、长城、颐和园,这些都是必提的。还有现代元素,比如CBD、中关村,显示北京的现代化发展。\n\n然后,用户可能还想知道北京的地理位置、气候、交通、美食等。比如,北京属于温带季风气候,四季分明,可能需要提到季节性的旅游建议。交通方面,地铁系统很发达,还有首都国际机场。美食的话,烤鸭、炸酱面、豆汁儿这些特色食物应该提到。\n\n另外,北京作为国际大都市,可能有国际学校、外国使馆,以及举办过奥运会,这些也是亮点。可能需要提到北京的教育和科技资源,比如清华、北大,以及中关村的科技企业。\n\n还要注意用户可能的深层需求。比如,如果他们计划旅游,可能需要推荐景点和最佳旅游时间。如果是学生,可能对教育机构感兴趣。如果是商务人士,可能关注经济和交通。\n\n需要确保信息准确,比如北京的面积、人口数据,以及历史沿革,比如作为多个朝代的都城。同时,避免过时的信息,比如最新的发展情况,比如北京冬奥会的影响。\n\n最后,结构要清晰,分点介绍,但不要太生硬。语言要口语化,自然流畅,避免使用专业术语过多,让不同背景的用户都能理解。可能需要检查是否有遗漏的重要信息,比如北京的空气质量或环保措施,但可能用户更关注旅游和文化方面,所以可以简要提及。\n\n总结下来,我需要涵盖政治、历史、文化、现代发展、地理、气候、交通、美食等方面,确保全面且重点突出,同时保持回答的易读性和实用性。\n</think>\n\n北京是中国的首都,也是世界著名古都和国际化大都市,拥有深厚的历史文化底蕴与现代化的城市风貌。以下是对北京的简要介绍:\n\n---\n\n### **1. 历史与文化**\n- **古都底蕴**:北京已有3000多年建城史,曾是元、明、清等朝代的都城,是中华文明的重要发源地之一。故宫、天坛、颐和园、长城等世界文化遗产,见证了其作为“帝王之都”的辉煌。\n- **文化中心**:北京是全国文化、教育、科技中心,拥有众多高校(如清华大学、北京大学)、博物馆(如国家博物馆、首都博物馆)和艺术机构,也是京剧、相声等传统文化的发源地。\n\n---\n\n### **2. 地理与气候**\n- **地理位置**:位于中国华北平原北端,背靠燕山,毗邻河北、天津,是连接华北与东北、西北的重要枢纽。\n- **气候特点**:属温带季风气候,四季分明,夏季炎热多雨,冬季寒冷干燥,春秋季短暂且多风沙。\n\n---\n\n### **3. 现代都市风貌**\n- **政治与经济**:作为中国的政治中心,中南海、人民大会堂等标志性建筑坐落于此;同时是经济、金融、科技高地,中关村聚集了众多科技企业,是“中国硅谷”。\n- **交通网络**:拥有发达的地铁系统(中国最密集的轨道交通之一)和首都国际机场,是全国铁路、航空枢纽。\n\n---\n\n### **4. 旅游景点**\n- **世界遗产**:长城(八达岭、慕田峪段)、故宫、颐和园、天坛、周口店北京人遗址等。\n- **现代地标**:国家体育场(鸟巢)、国家大剧院、央视大楼、三里屯、798艺术区等。\n- **自然风光**:香山红叶、十三陵水库、密云水库等。\n\n---\n\n### **5. 美食与生活**\n- **特色美食**:北京烤鸭(全聚德)、炸酱面、豆汁儿、卤煮、驴打滚等,小吃街如南锣鼓巷、簋街充满烟火气。\n- **生活节奏**:既有老北京的胡同文化(如南锣鼓巷、烟袋斜街),也有现代化的商圈(如国贸、金融街)。\n\n---\n\n### **6. 国际化与多元**\n- **国际交流**:北京是众多国际组织和外国使馆的所在地,也是2008年夏季奥运会和2022年冬季奥运会的举办城市。\n- **多元文化**:汇聚了来自世界各地的移民和留学生,形成了开放包容的城市氛围。\n\n---\n\n### **7. 挑战与机遇**\n- **环境问题**:曾面临雾霾等挑战,近年来通过治理空气质量、推广绿色能源等措施逐步改善。\n- **城市发展**:正通过“京津冀协同发展”战略,推动区域一体化,提升国际影响力。\n\n---\n\n北京是一座将历史与现代、传统与创新完美融合的城市,无论是探索古迹、感受文化,还是体验都市活力,都能找到独特的魅力。如果你有机会到访,不妨从故宫、长城开始,再深入胡同巷陌,感受这座城市的温度与故事。'
================================================================================
Logprobs per generated token:
Step 0:
- Generated Token: 151667 ('<think>')
- Top Logprobs:
- Rank 1: Token 151667 ('<think>') -> Logprob: -0.0000
- Rank 2: Token 32501 ('yped') -> Logprob: -16.6875
- Rank 3: Token 81218 (' zlib') -> Logprob: -17.5000
- Rank 4: Token 77899 (':len') -> Logprob: -17.9375
- Rank 5: Token 99048 (' zf') -> Logprob: -18.4375
- Rank 6: Token 117865 ('具体内容') -> Logprob: -18.5000
- Rank 7: Token 198 (' ') -> Logprob: -18.5625
- Rank 8: Token 18945 ('α') -> Logprob: -18.5625
- Rank 9: Token 67085 ('[param') -> Logprob: -19.0000
- Rank 10: Token 75025 ('yms') -> Logprob: -19.0000
...
...
Successfully wrote the logprob of each generated token to file: ...
```
### Accuracy
```
# Run infer_vllm.py on the DCU and on a GPU to obtain the logprob data for each, then paste both sets into acc.py and run it
python ./infer/acc.py
```
Result
```
Qwen3-30B-A3B accuracy: 0.002905419914469576
```
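The accuracy figure is the mean absolute difference between the rank-1 logprobs collected on the DCU and on the GPU (acc.py computes it with `np.mean(np.abs(logprobs_1 - logprobs_2))`). A minimal pure-Python sketch of the same metric, with illustrative values rather than a real run:

```python
# Mean absolute difference between two logprob lists (same metric as acc.py).
def mean_abs_diff(a, b):
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

dcu_logprobs = [-0.0025, -0.2021, -0.1487]   # illustrative rank-1 logprobs, DCU run
gpu_logprobs = [-0.0019, -0.2526, -0.1344]   # the same tokens from the GPU run
print(mean_abs_diff(dcu_logprobs, gpu_logprobs))
```

A value near zero indicates the two backends assign nearly identical probabilities to the generated tokens.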
### Inference of Qwen3-30B-A3B-Instruct-2507 with vLLM
```bash
## Qwen3-30B-A3B-Instruct-2507 requires at least two cards for inference
export HIP_VISIBLE_DEVICES=6,7
## Model path argument
python ./infer/infer_vllm.py --model /your_path/Qwen3-30B-A3B-Instruct-2507 --tensor-parallel-size 2
```
## Result
```
Original Input Prompt (if available):
'介绍一下北京.'
Generated text (full output):
'北京,简称“京”,是中国的首都,也是中华人民共和国的中央人民政府所在地,是全国的政治、文化、教育和国际交往中心。它位于中国华北平原的北部,地处燕山山脉与华北平原的交汇地带,地理坐标为北纬39°54′,东经116°23′,总面积约16,410平方公里。\n\n### 历史与文化\n北京拥有超过3000年的建城史和800多年的建都史,是中国历史上多个朝代的都城。自元朝起,北京成为全国的政治中心,明清两代在此建都,留下了大量珍贵的历史文化遗产。北京是世界著名的历史文化名城,拥有众多世界文化遗产,如:\n\n- **故宫**(紫禁城):明清两代的皇家宫殿,是世界上现存规模最大、保存最完整的古代宫殿建筑群。\n- **天坛**:明清皇帝祭天祈谷的场所,建筑布局严谨,象征“天圆地方”。\n- **颐和园**:中国现存规模最大、保存最完整的皇家园林,融合了自然景观与人工建筑。\n- **八达岭长城**:万里长城的代表段落,是世界文化遗产之一,也是中外游客必访之地。\n- **圆明园遗址**:曾被誉为“万园之园”,虽在第二次鸦片战争中被焚毁,但遗址仍具重要历史价值。\n- **天安门广场**:世界上最大的城市广场之一,是北京的象征性地标,也是国家举行重大庆典和政治活动的场所。\n\n### 城市风貌与现代发展\n北京是一座传统与现代交融的城市。在保留古都风貌的同时,也展现出高度现代化的城市面貌:\n\n- **城市布局**:以中轴线为核心,呈对称布局,从永定门到钟鼓楼,贯穿城市南北,体现了中国古代城市规划的智慧。\n- **现代地标**:国家大剧院(“蛋”)、中央电视台总部大楼(“大裤衩”)、北京国贸大厦、北京SKP等现代建筑彰显了城市的国际化形象。\n- **交通系统**:拥有发达的轨道交通网络,北京地铁是全球运营里程最长的城市地铁系统之一,覆盖全市主要区域。\n\n### 教育与科技\n北京是中国高等教育和科研的中心,拥有众多顶尖高校和研究机构,如:\n\n- 清华大学\n- 北京大学\n- 中国科学院\n- 中国工程院\n\n这些机构在科技、工程、医学、人文等领域具有国际影响力。\n\n### 旅游与美食\n北京是国内外游客向往的旅游目的地,每年吸引数千万游客。除了上述名胜古迹,还有:\n\n- **胡同与四合院**:如南锣鼓巷、什刹海,是体验老北京生活文化的窗口。\n- **北京烤鸭**:享誉世界的特色美食,以全聚德、便宜坊为代表。\n- **豆汁儿、焦圈、炸酱面、艾窝窝**等传统小吃也极具地方特色。\n\n### 环境与生态\n近年来,北京大力推进生态文明建设,实施“蓝天保卫战”,空气质量持续改善。城市绿化覆盖率不断提高,拥有奥林匹克森林公园、北京植物园、香山公园等大型生态空间。\n\n### 总结\n北京是一座集历史厚重感与现代活力于一体的城市,既是中华文明的重要象征,也是中国走向世界的重要窗口。无论你是追寻历史足迹,还是感受现代都市魅力,北京都能为你带来深刻而难忘的体验。'
================================================================================
Logprobs per generated token:
Step 0:
- Generated Token: 68990 ('北京')
- Top Logprobs:
- Rank 1: Token 68990 ('北京') -> Logprob: -0.0019
- Rank 2: Token 103942 ('当然') -> Logprob: -6.2519
- Rank 3: Token 104554 ('北京市') -> Logprob: -11.3769
- Rank 4: Token 99692 ('好的') -> Logprob: -13.5019
- Rank 5: Token 108386 ('你好') -> Logprob: -13.5019
- Rank 6: Token 111308 ('您好') -> Logprob: -14.1269
- Rank 7: Token 106287 ('嗯') -> Logprob: -15.2519
- Rank 8: Token 106114 ('首都') -> Logprob: -16.8769
- Rank 9: Token 110488 ('北京时间') -> Logprob: -16.8769
- Rank 10: Token 334 ('**') -> Logprob: -17.3769
...
...
Successfully wrote the logprob of each generated token to file: ...
```
### Accuracy
```
# Run infer_vllm.py on the DCU and on a GPU to obtain the logprob data for each, then paste both sets into acc.py and run it
python ./infer/acc.py
```
Result
```
Qwen3-30B-A3B-Instruct-2507 accuracy: 0.006542379854522551
```
The DCU and GPU accuracy results are consistent; inference framework: vLLM.
## Application Scenarios
### Algorithm Category
`Dialogue`
### Key Application Industries
`Finance, education, government, research, manufacturing, energy, transportation`
## Pretrained Weights
- [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)
- [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)
## Source Repository & Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/granite-speech_pytorch
## References
- https://github.com/ibm-granite/granite-speech-models
FROM image.sourcefind.cn:5000/dcu/admin/base/vllm:0.8.5-ubuntu22.04-dtk25.04.1-rc5-das1.6-py3.10-20250724
[
-0.002492894185706973,
-0.20206475257873535,
-0.14872165024280548,
-3.6954811548639555e-06,
0.0,
-2.3841855067985307e-07,
-0.038103267550468445,
-0.0006967739318497479,
-6.0794889577664435e-05,
-3.099436753473128e-06
]
[
-0.001943962648510933,
-0.25255143642425537,
-0.1344442367553711,
-2.9802276912960224e-06,
0.0,
-2.3841855067985307e-07,
-0.03809638321399689,
-0.0007833749987185001,
-7.64102369430475e-05,
-4.0531076592742465e-06
]
[
-2.3841855067985307e-07,
-2.753696753643453e-05,
-0.0630415603518486,
-3.3378546504536644e-06,
-0.13829903304576874,
-0.018993528559803963,
-0.006734886672347784,
-2.3841855067985307e-07,
-0.038042906671762466,
-0.008138706907629967
]
[
-5.960462772236497e-07,
-1.8954096958623268e-05,
-0.06287578493356705,
-2.50339189733495e-06,
-0.12281982600688934,
-0.014945676550269127,
-0.006732518319040537,
-1.1920928955078125e-07,
-0.029751574620604515,
-0.0070809368044137955
]
# acc.py: paste the rank-1 logprobs from the DCU run into logprobs_1 and those
# from the GPU run into logprobs_2, then run to get the mean absolute difference.
import numpy as np

logprobs_1 = np.array([
-0.002492894185706973,
-0.20206475257873535,
-0.14872165024280548,
-3.6954811548639555e-06,
0.0,
-2.3841855067985307e-07,
-0.038103267550468445,
-0.0006967739318497479,
-6.0794889577664435e-05,
-3.099436753473128e-06
])
logprobs_2 = np.array([
-0.001943962648510933,
-0.25255143642425537,
-0.1344442367553711,
-2.9802276912960224e-06,
0.0,
-2.3841855067985307e-07,
-0.03809638321399689,
-0.0007833749987185001,
-7.64102369430475e-05,
-4.0531076592742465e-06
])
print(np.mean(np.abs(logprobs_1 - logprobs_2)))
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import json

from vllm import LLM, EngineArgs, SamplingParams
from vllm.utils import FlexibleArgumentParser


def create_parser():
    parser = FlexibleArgumentParser()
    # Add engine args
    EngineArgs.add_cli_args(parser)
    parser.set_defaults(model="Qwen/Qwen3-30B-A3B")
    # Add sampling params
    sampling_group = parser.add_argument_group("Sampling parameters")
    sampling_group.add_argument("--max-tokens", type=int, default=8192,
                                help="Maximum number of tokens to generate in a single response.")
    sampling_group.add_argument("--temperature", type=float, default=0.0,
                                help="Temperature for sampling. Higher values make output more random.")
    sampling_group.add_argument("--top-p", type=float, default=1.0,
                                help="Top-p sampling probability. Only tokens with cumulative probability below top_p are considered.")
    sampling_group.add_argument("--top-k", type=int, default=1,
                                help="Top-k sampling. -1 means no top-k.")
    # Add example params
    parser.add_argument("--chat-template-path", type=str,
                        help="Path to a custom chat template file (Jinja format).")
    return parser


def main(args: dict):
    # Pop arguments not used by LLM
    max_tokens = args.pop("max_tokens")
    temperature = args.pop("temperature")
    top_p = args.pop("top_p")
    top_k = args.pop("top_k")
    chat_template_path = args.pop("chat_template_path")

    # Create an LLM
    llm = LLM(**args)

    # Create a sampling params object; logprobs=10 returns the top-10
    # candidates per generated token.
    sampling_params = SamplingParams(
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        logprobs=10,
    )

    # A chat template can be optionally supplied.
    # If not, the model will use its default chat template.
    chat_template = None
    if chat_template_path is not None:
        with open(chat_template_path) as f:
            chat_template = f.read()
        print(f"Loaded custom chat template from: {chat_template_path}")

    # Define the single conversation for demonstration
    single_conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "介绍一下北京."},
    ]
    outputs = llm.chat(single_conversation, sampling_params, use_tqdm=False, chat_template=chat_template)
    print(f"Original Input Prompt (if available):\n{single_conversation[1]['content']!r}\n")

    first_10_logprobs_to_save = []
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"Generated text (full output):\n{generated_text!r}")
        print("=" * 80)

        logprobs_per_step = output.outputs[0].logprobs
        if logprobs_per_step is None:
            print("Logprobs not returned. Check your SamplingParams.")
            continue

        print("\nLogprobs per generated token:")
        for step_idx, step_logprobs_dict in enumerate(logprobs_per_step[:10]):
            # Find the token that was actually generated (rank 1)
            generated_token_info = None
            for token_id, logprob_obj in step_logprobs_dict.items():
                if logprob_obj.rank == 1:
                    generated_token_info = (token_id, logprob_obj.decoded_token)
                    break
            if generated_token_info:
                token_id, token_text = generated_token_info
                print(f"  Step {step_idx}:")
                print(f"    - Generated Token: {token_id} ('{token_text}')")
            else:
                print(f"  Step {step_idx}: (Could not find rank-1 token)")
                continue

            sorted_logprobs = sorted(step_logprobs_dict.values(), key=lambda x: x.rank)
            print("    - Top Logprobs:")
            for logprob_obj in sorted_logprobs:
                token_id = next(tid for tid, lp in step_logprobs_dict.items() if lp is logprob_obj)
                token_text = logprob_obj.decoded_token
                logprob_value = logprob_obj.logprob
                rank = logprob_obj.rank
                print(f"      - Rank {rank}: Token {token_id} ('{token_text}') -> Logprob: {logprob_value:.4f}")
                if rank == 1:
                    first_10_logprobs_to_save.append(logprob_value)

    output_filename = './Qwen3-30B-A3B_logprobs_K100AI_fp16.json'
    with open(output_filename, 'w') as f:
        json.dump(first_10_logprobs_to_save, f, indent=2)
    print(f"Successfully wrote the logprob of each generated token to file: {output_filename}")


if __name__ == "__main__":
    parser = create_parser()
    args: dict = vars(parser.parse_args())
    main(args)
# Unique model identifier
modelCode=1695
# Model name
modelName=qwen3-30b-a3b_vllm
# Model description
modelDescription=qwen3-30b-a3b is a new non-thinking mode model that activates only 3B parameters yet delivers performance rivaling top closed-source models such as Gemini 2.5-Flash (non-thinking) and GPT-4o.
# Application scenarios
appScenario=Inference, dialogue QA, manufacturing, media, finance, energy, healthcare
# Framework type
frameType=vllm