add

e702e403 · zhangwq5 · dd8051d0
Commit e702e403 authored Jul 10, 2025 by zhangwq5
13 changed files
--- a/Contributors.md
+++ b/Contributors.md
+# Contributors
+This file lists everyone who has contributed to the repository.
+
+<br>
+<table>
+  <tr><th>Contributors1</th><th>Contributors2</th></tr>
+  <tr>
+    <td><img src="xxx1">
+    <br>
+    <a href="xxx1">xxx1</a></td>
+    <td><img src="xxx2">
+    <br>
+    <a href="xxx2">xxx2</a></td>
+  </tr>
+</table>
+<br>
+
+### Thanks to everyone who helped build this repository :)
--- a/LICENSE
+++ b/LICENSE
+Copyright 2018-2020 Open-MMLab. All rights reserved.
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright 2018-2020 Open-MMLab.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/README.md
+++ b/README.md
 # Granite-Speech_pytorch
+## Paper
+`Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities`
+- https://arxiv.org/abs/2505.08699

-Granite-speech-3.3-8b is a compact and efficient speech-language model designed for automatic speech recognition (ASR) and automatic speech translation (AST).
\ No newline at end of file
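+## Model Architecture
+Granite-speech uses a three-part modular architecture: a Conformer acoustic encoder, a Q-former multimodal adapter, and a Granite text large language model (LLM) adapted with LoRA. This design keeps the audio and text processing paths decoupled while still fusing them.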
+
+<div align=center>
+    <img src="./doc/gs.png"/>
+</div>
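A minimal NumPy sketch of how a window-level, Q-former-style adapter can shorten the encoder output: a small set of learnable query vectors cross-attends to the frames inside each fixed-size window, so every window of `window` frames becomes `n_q` tokens, which are then projected to the LLM embedding width. This is a single-head, single-layer simplification with random weights; the function name `windowed_qformer`, the window size (15), the query count (3) and all dimensions are illustrative assumptions, not values taken from this repository or the Granite-speech checkpoint.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_qformer(frames, queries, w_k, w_v, w_out, window):
    """Hypothetical sketch: downsample encoder frames with per-window queries.

    frames:   (T, d) encoder outputs
    queries:  (n_q, d) learnable queries shared by all windows
    w_k, w_v: (d, d) key/value projections
    w_out:    (d, d_llm) projection into the LLM embedding space
    Returns:  (ceil(T / window) * n_q, d_llm)
    """
    t, d = frames.shape
    pad = (-t) % window
    # Zero-pad so T is a multiple of the window (a real model would mask these).
    frames = np.concatenate([frames, np.zeros((pad, d))], axis=0)
    blocks = frames.reshape(-1, window, d)             # (n_blocks, window, d)
    keys = blocks @ w_k
    values = blocks @ w_v
    # Each query attends only to the frames inside its own window.
    scores = np.einsum("qd,bwd->bqw", queries, keys) / np.sqrt(d)
    attended = np.einsum("bqw,bwd->bqd", softmax(scores), values)
    return attended.reshape(-1, d) @ w_out


rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 64))                # 100 encoder frames
out = windowed_qformer(
    frames,
    rng.standard_normal((3, 64)),                      # 3 queries per window
    rng.standard_normal((64, 64)),
    rng.standard_normal((64, 64)),
    rng.standard_normal((64, 128)),                    # project to width 128
    window=15,
)
print(out.shape)  # (21, 128)
```

With 100 frames, a window of 15 and 3 queries, the input is padded to 105 frames (7 windows) and comes out as 21 tokens, roughly a 5x reduction in sequence length before the LLM sees it.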
+
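+## Algorithm
+Granite-speech uses the Q-former adapter to efficiently downsample the high-dimensional audio sequence produced by the Conformer encoder and project it into the same semantic space as the text embeddings. It then applies LoRA to lightly fine-tune the LLM so that it can understand and process these fused multimodal acoustic features without degrading its original text capabilities.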
+
+<div align=center>
+    <img src="./doc/qformer.png"/>
+</div>
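The LoRA half of the recipe can be illustrated with a frozen weight `W` plus a trainable low-rank product `A @ B` scaled by `alpha / r`. This NumPy sketch (the class name `LoRALinear`, rank 4 and alpha 8 are hypothetical) shows the two properties that make LoRA attractive here: with `B` zero-initialized the adapted layer starts out identical to the frozen one, and the update can be folded back into `W` after training. It does not reproduce which Granite layers actually carry the speech LoRA.

```python
import numpy as np


class LoRALinear:
    """Hypothetical sketch: frozen linear layer plus a low-rank update.

    y = x @ W + (alpha / r) * (x @ A @ B)
    Only A (d_in x r) and B (r x d_out) would be trained; W stays frozen.
    """

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_in, d_out = weight.shape
        self.weight = weight                               # frozen base weight
        self.a = rng.standard_normal((d_in, rank)) * 0.01  # small random init
        self.b = np.zeros((rank, d_out))                   # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight + self.scale * (x @ self.a @ self.b)

    def merged_weight(self) -> np.ndarray:
        # Fold the low-rank update into W for inference.
        return self.weight + self.scale * (self.a @ self.b)


rng = np.random.default_rng(1)
base = rng.standard_normal((16, 8))
layer = LoRALinear(base, rank=4, alpha=8.0)
x = rng.standard_normal((2, 16))
# With B = 0 the adapted layer reproduces the frozen layer exactly.
print(np.allclose(layer(x), x @ base))  # True
```

For a `d_in x d_out` layer LoRA trains `r * (d_in + d_out)` parameters instead of `d_in * d_out`; `infer_vllm.py` sets `max_lora_rank=64`, the largest adapter rank its vLLM engine will accept.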
+
+## Environment Setup
+### Hardware Requirements
+DCU model: K100_AI; nodes: 1; cards: 1.
+### Docker (Option 1)
+```bash
+docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.5-ubuntu22.04-dtk25.04-rc7-das1.5-py3.10-20250612-fixpy-rocblas0611-rc2
+
+docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
+
+cd /your_code_path/granite-speech_pytorch
+pip install "transformers>=4.53.1"
+```
+### Dockerfile (Option 2)
+Build the image from the provided Dockerfile, then start a container from it:
+```bash
+cd docker
+docker build --no-cache -t granite-speech:latest .
+docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro granite-speech:latest bash
+
+cd /your_code_path/granite-speech_pytorch
+pip install "transformers>=4.53.1"
+```
+### Anaconda (Option 3)
+The DCU-specific deep learning libraries required by this project can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community.
+```text
+DTK: 25.04
+python: 3.10
+vllm: 0.8.5
+torch: 2.4.1+das.opt2.dtk2504
+deepspeed: 0.14.2+das.opt2.dtk2504
+```
+`Tip: the versions of the DTK driver, Python, vLLM, torch, deepspeed and other DCU-related tools listed above must match each other exactly.`
+
+Install the remaining (non-deep-learning) libraries as follows:
+```bash
+pip install "transformers>=4.53.1"
+```
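An unquoted `pip install transformers>=4.53.1` is parsed by the shell as `pip install transformers` with output redirected to a file, so an older `transformers` that is already installed would be left in place. A small stdlib-only sketch for confirming the installed version afterwards; the helper names are hypothetical, and it compares only the numeric release segment (so `2.4.1+das.opt2.dtk2504` counts as `2.4.1` and pre-release tags are ignored). `packaging.version.Version` gives full PEP 440 semantics if `packaging` is available.

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Numeric release segment, e.g. "2.4.1+das.opt2.dtk2504" -> (2, 4, 1)."""
    public = version.split("+", 1)[0]          # drop the local version label
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    # Pad with zeros so "4.54" compares like "4.54.0".
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    try:
        found = metadata.version("transformers")
        status = "OK" if meets_minimum(found, "4.53.1") else "too old"
        print(f"transformers {found}: {status}")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
```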
+## Dataset
+Not available yet.
+## Training
+Not available yet.
+## Inference
+### Inference with vLLM
+```bash
+## Set the following environment variables
+export HF_ENDPOINT=https://hf-mirror.com
+export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/torchaudio.libs:$LD_LIBRARY_PATH
+## Pass the model directory via --model-name
+python ./infer/infer_vllm.py --model-type granite_speech --model-name /your_path/granite-speech-3.3-8b
+```
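For Granite-speech, `infer_vllm.py` does not use the tokenizer's chat template; it hard-codes the prompt string in `run_granite_speech`, with one `<|audio|>` placeholder per audio clip. A sketch that rebuilds the same string (the function name is hypothetical), which can help when changing the question or the number of clips; the system text and dates are copied from the script and are not known to be required by the model.

```python
def build_granite_speech_prompt(question: str, audio_count: int = 1) -> str:
    """Hypothetical helper reproducing the prompt string from run_granite_speech."""
    system = (
        "Knowledge Cutoff Date: April 2024.\n"
        "Today's Date: December 19, 2024.\n"
        "You are Granite, developed by IBM. You are a helpful AI assistant"
    )
    # One placeholder per audio clip, placed directly before the question.
    audio = "<|audio|>" * audio_count
    return (
        f"<|start_of_role|>system<|end_of_role|>{system}<|end_of_text|>\n"
        f"<|start_of_role|>user<|end_of_role|>{audio}{question}<|end_of_text|>\n"
        "<|start_of_role|>assistant<|end_of_role|>"
    )


print(build_granite_speech_prompt("What is recited in the audio?"))
```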
+
+## Results
+```
+--- Prompt 1 ---
+Generated Text: the first words i spoke in the original phonograph a little piece of practical poetry mary had a little lamb its fleece was white as snow and everywhere that mary went the lamb was sure to go
+
+Logprobs per generated token:
+  Step 0:
+    - Generated Token: 1382 ('the')
+    - Top Logprobs:
+        - Rank 1: Token 1382 ('the') -> Logprob: -0.1331
+        - Rank 2: Token 37711 ('these') -> Logprob: -3.5237
+        - Rank 3: Token 31181 ('they') -> Logprob: -5.1253
+        - Rank 4: Token 1772 ('my') -> Logprob: -5.1800
+        - Rank 5: Token 292 ('he') -> Logprob: -5.4612
+        - Rank 6: Token 2232 ('first') -> Logprob: -5.7268
+        - Rank 7: Token 91 ('i') -> Logprob: -5.7503
+        - Rank 8: Token 266 ('in') -> Logprob: -5.9378
+        - Rank 9: Token 83 ('a') -> Logprob: -5.9378
+        - Rank 10: Token 7020 ('here') -> Logprob: -6.0159
+  Step 1:
+    ...
+    ...
+
+Successfully wrote the logprob of each generated token to file: ...
+```
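The logprobs printed above are natural logarithms of token probabilities, so `exp(logprob)` recovers the probability: the rank-1 logprob of -0.1331 at step 0 corresponds to roughly 0.875. A small sketch (hypothetical helper names) for converting the values and for summarizing a whole generation as perplexity, the exponential of the mean negative logprob:

```python
import math


def token_probabilities(logprobs: list[float]) -> list[float]:
    # vLLM logprobs are natural logs, so exp() undoes them.
    return [math.exp(lp) for lp in logprobs]


def perplexity(logprobs: list[float]) -> float:
    # exp of the mean negative logprob over the generated tokens.
    return math.exp(-sum(logprobs) / len(logprobs))


# Rank 1-3 logprobs from Step 0 of the output above ('the', 'these', 'they').
step0 = [-0.1331, -3.5237, -5.1253]
probs = token_probabilities(step0)
print(f"{probs[0]:.3f}")  # 0.875
```

Applied to one prompt's list from the saved JSON file (for example `perplexity(json.load(f)[0])`), the same function gives a single number per run that is easy to compare across hardware.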
+
+### Accuracy
+```
+# Run infer_vllm.py on the DCU and on the GPU separately to get each one's logprob data
+python ./infer/calc_mae.py
+```
+Result:
+```
+0.00040159359081176795
+```
+
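+The DCU and GPU results agree: the mean absolute difference between the per-token logprobs of the K100_AI (DCU) and A800 (GPU) fp16 runs is about 4.0e-4. Inference framework: vLLM.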
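`calc_mae.py` compares hard-coded copies of the two runs. A sketch of the same comparison that reads the JSON files written by `infer_vllm.py` directly, so no values have to be copied by hand, and that also reports the largest per-token difference. The script name `compare_logprobs.py` and the helper names are hypothetical; it compares a single prompt and assumes both runs generated the same tokens (equal lengths).

```python
import json
import sys

import numpy as np


def load_logprobs(path: str, prompt_index: int = 0) -> np.ndarray:
    # infer_vllm.py writes a list holding one list of per-token logprobs per prompt.
    with open(path) as f:
        data = json.load(f)
    return np.asarray(data[prompt_index], dtype=np.float64)


def compare(gpu: np.ndarray, dcu: np.ndarray) -> dict:
    # Only meaningful if both runs generated the same token sequence.
    if gpu.shape != dcu.shape:
        raise ValueError(f"length mismatch: {gpu.shape} vs {dcu.shape}")
    diff = np.abs(gpu - dcu)
    return {
        "mae": float(diff.mean()),
        "max_abs": float(diff.max()),
        "worst_step": int(diff.argmax()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_logprobs.py <gpu.json> <dcu.json>
    print(compare(load_logprobs(sys.argv[1]), load_logprobs(sys.argv[2])))
```

For example: `python compare_logprobs.py infer/generated_token_logprobs_A800_fp16.json infer/generated_token_logprobs_K100_AI_fp16.json`.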
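+## Application Scenarios
+### Algorithm Category
+`Speech dialogue`
+### Key Application Industries
+`Finance, education, government, scientific research, manufacturing, energy, transportation`
+## Pretrained Weights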
+- [ibm-granite/granite-speech-3.3-8b](https://hf-mirror.com/ibm-granite/granite-speech-3.3-8b)
+- [ibm-granite/granite-speech-3.3-2b](https://hf-mirror.com/ibm-granite/granite-speech-3.3-2b)
+
+## Source Repository and Issue Feedback
+- https://developer.sourcefind.cn/codes/modelzoo/granite-speech_pytorch
+## References
+- https://github.com/ibm-granite/granite-speech-models
\ No newline at end of file
--- a/doc/gs.png
+++ b/doc/gs.png
--- a/doc/qformer.png
+++ b/doc/qformer.png
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
+FROM image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.5-ubuntu22.04-dtk25.04-rc7-das1.5-py3.10-20250612-fixpy-rocblas0611-rc2
\ No newline at end of file
--- a/icon.png
+++ b/icon.png
--- a/infer/calc_mae.py
+++ b/infer/calc_mae.py
+import numpy as np
+
+logprobs_1 = np.array([
+    -0.13309252262115479, -0.08196669071912766, -0.00731302984058857,
+    -0.018770331516861916, -0.02480202354490757, -0.00014423283573705703,
+    -0.042519960552453995, -0.0190336462110281, -0.11823561042547226,
+    -0.051496025174856186, -0.0029785337392240763, -0.0808056890964508,
+    -0.33003905415534973, -0.016798585653305054, -0.03529466316103935,
+    -0.0021007629111409187, -0.15580753982067108, -0.002179034985601902,
+    -0.0022494508884847164, -0.017301112413406372, -0.028195271268486977,
+    -0.004367456305772066, -0.004129336215555668, -0.00812652800232172,
+    -0.01322639174759388, -0.4532623887062073, -0.05896913260221481,
+    -0.015254967845976353, -0.0002047805901383981, -0.07973029464483261,
+    -0.009050771594047546, -0.0008920027757994831, -0.003724900772795081,
+    -0.024257060140371323, -0.03248818591237068, -0.056766025722026825,
+    -0.0011604249011725187, -0.0001736728590913117, -0.002415122464299202,
+    -0.021963220089673996, -0.0010249129263684154, -0.06032927706837654,
+    -0.0007666985620744526, -0.003093697363510728, -0.0011801904765889049,
+    -0.01496575865894556
+])
+
+logprobs_2 = np.array([
+    -0.13293597102165222, -0.08299195766448975, -0.007307467982172966,
+    -0.018774425610899925, -0.024925051257014275, -0.00014530555927194655,
+    -0.04304695874452591, -0.018735699355602264, -0.11716093868017197,
+    -0.05228014662861824, -0.002978414995595813, -0.08092431724071503,
+    -0.32985153794288635, -0.016861414536833763, -0.0352136455476284,
+    -0.0020737587474286556, -0.15646468102931976, -0.00218593399040401,
+    -0.0022474287543445826, -0.017639895901083946, -0.02812526933848858,
+    -0.004360453691333532, -0.004184419754892588, -0.008132558315992355,
+    -0.013352379202842712, -0.4424692392349243, -0.0590624064207077,
+    -0.015369313769042492, -0.00020454222976695746, -0.0797300711274147,
+    -0.009092000313103199, -0.0008904544520191848, -0.0037233568727970123,
+    -0.024085894227027893, -0.03258546069264412, -0.05590718612074852,
+    -0.0011379203060641885, -0.00017343449871987104, -0.0023765910882502794,
+    -0.02211015112698078, -0.0010122896637767553, -0.060326918959617615,
+    -0.000770391256082803, -0.003151928074657917, -0.0011779282940551639,
+    -0.015233483165502548
+])
+
+print(np.mean(np.abs(logprobs_1 - logprobs_2)))
\ No newline at end of file
--- a/infer/generated_token_logprobs_A800_fp16.json
+++ b/infer/generated_token_logprobs_A800_fp16.json
+[
+  [
+    -0.13309252262115479,
+    -0.08196669071912766,
+    -0.00731302984058857,
+    -0.018770331516861916,
+    -0.02480202354490757,
+    -0.00014423283573705703,
+    -0.042519960552453995,
+    -0.0190336462110281,
+    -0.11823561042547226,
+    -0.051496025174856186,
+    -0.0029785337392240763,
+    -0.0808056890964508,
+    -0.33003905415534973,
+    -0.016798585653305054,
+    -0.03529466316103935,
+    -0.0021007629111409187,
+    -0.15580753982067108,
+    -0.002179034985601902,
+    -0.0022494508884847164,
+    -0.017301112413406372,
+    -0.028195271268486977,
+    -0.004367456305772066,
+    -0.004129336215555668,
+    -0.00812652800232172,
+    -0.01322639174759388,
+    -0.4532623887062073,
+    -0.05896913260221481,
+    -0.015254967845976353,
+    -0.0002047805901383981,
+    -0.07973029464483261,
+    -0.009050771594047546,
+    -0.0008920027757994831,
+    -0.003724900772795081,
+    -0.024257060140371323,
+    -0.03248818591237068,
+    -0.056766025722026825,
+    -0.0011604249011725187,
+    -0.0001736728590913117,
+    -0.002415122464299202,
+    -0.021963220089673996,
+    -0.0010249129263684154,
+    -0.06032927706837654,
+    -0.0007666985620744526,
+    -0.003093697363510728,
+    -0.0011801904765889049,
+    -0.01496575865894556
+  ]
+]
\ No newline at end of file
--- a/infer/generated_token_logprobs_K100_AI_fp16.json
+++ b/infer/generated_token_logprobs_K100_AI_fp16.json
+[
+  [
+    -0.13293597102165222,
+    -0.08299195766448975,
+    -0.007307467982172966,
+    -0.018774425610899925,
+    -0.024925051257014275,
+    -0.00014530555927194655,
+    -0.04304695874452591,
+    -0.018735699355602264,
+    -0.11716093868017197,
+    -0.05228014662861824,
+    -0.002978414995595813,
+    -0.08092431724071503,
+    -0.32985153794288635,
+    -0.016861414536833763,
+    -0.0352136455476284,
+    -0.0020737587474286556,
+    -0.15646468102931976,
+    -0.00218593399040401,
+    -0.0022474287543445826,
+    -0.017639895901083946,
+    -0.02812526933848858,
+    -0.004360453691333532,
+    -0.004184419754892588,
+    -0.008132558315992355,
+    -0.013352379202842712,
+    -0.4424692392349243,
+    -0.0590624064207077,
+    -0.015369313769042492,
+    -0.00020454222976695746,
+    -0.0797300711274147,
+    -0.009092000313103199,
+    -0.0008904544520191848,
+    -0.0037233568727970123,
+    -0.024085894227027893,
+    -0.03258546069264412,
+    -0.05590718612074852,
+    -0.0011379203060641885,
+    -0.00017343449871987104,
+    -0.0023765910882502794,
+    -0.02211015112698078,
+    -0.0010122896637767553,
+    -0.060326918959617615,
+    -0.000770391256082803,
+    -0.003151928074657917,
+    -0.0011779282940551639,
+    -0.015233483165502548
+  ]
+]
\ No newline at end of file
--- a/infer/infer_vllm.py
+++ b/infer/infer_vllm.py
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+"""
+This example shows how to use vLLM for running offline inference
+with the correct prompt format on audio language models.
+
+For most models, the prompt format should follow corresponding examples
+on HuggingFace model repository.
+"""
+
+import os
+from dataclasses import asdict
+from typing import NamedTuple, Optional
+
+from huggingface_hub import snapshot_download
+from transformers import AutoTokenizer
+
+from vllm import LLM, EngineArgs, SamplingParams
+from vllm.assets.audio import AudioAsset
+from vllm.lora.request import LoRARequest
+from vllm.utils import FlexibleArgumentParser
+
+audio_assets = [AudioAsset("mary_had_lamb"), AudioAsset("winning_call")]
+question_per_audio_count = {
+    0: "What is 1+1?",
+    1: "What is recited in the audio?",
+    2: "What sport and what nursery rhyme are referenced?",
+}
+
+
+class ModelRequestData(NamedTuple):
+    engine_args: EngineArgs
+    prompt: str
+    stop_token_ids: Optional[list[int]] = None
+    lora_requests: Optional[list[LoRARequest]] = None
+
+
+# NOTE: The default `max_num_seqs` and `max_model_len` may result in OOM on
+# lower-end GPUs.
+# Unless specified, these settings have been tested to work on a single L4.
+
+
+# Granite Speech
+def run_granite_speech(model_name: str, question: str, audio_count: int) -> ModelRequestData:
+    # NOTE - the settings in this example are somewhat different from what is
+    # optimal for granite speech, and it is generally recommended to use beam
+    # search. Check the model README for suggested settings.
+    # https://huggingface.co/ibm-granite/granite-speech-3.3-8b
+
+    engine_args = EngineArgs(
+        dtype="float16",
+        model=model_name,
+        trust_remote_code=True,
+        max_model_len=2048,
+        max_num_seqs=2,
+        enable_lora=True,
+        max_lora_rank=64,
+        limit_mm_per_prompt={"audio": audio_count},
+    )
+
+    # The model has an audio-specific lora directly in its model dir;
+    # it should be enabled whenever you pass audio inputs to the model.
+    speech_lora_path = model_name
+    audio_placeholder = "<|audio|>" * audio_count
+    prompts = f"<|start_of_role|>system<|end_of_role|>Knowledge Cutoff Date: April 2024.\nToday's Date: December 19, 2024.\nYou are Granite, developed by IBM. You are a helpful AI assistant<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>{audio_placeholder}{question}<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>"  # noqa: E501
+
+    return ModelRequestData(
+        engine_args=engine_args,
+        prompt=prompts,
+        lora_requests=[LoRARequest("speech", 1, speech_lora_path)],
+    )
+
+
+# Ultravox 0.5-1B
+def run_ultravox(model_name: str, question: str, audio_count: int) -> ModelRequestData:
+
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    messages = [{"role": "user", "content": "<|audio|>\n" * audio_count + question}]
+    prompt = tokenizer.apply_chat_template(
+        messages, tokenize=False, add_generation_prompt=True
+    )
+
+    engine_args = EngineArgs(
+        model=model_name,
+        max_model_len=4096,
+        max_num_seqs=5,
+        trust_remote_code=True,
+        limit_mm_per_prompt={"audio": audio_count},
+    )
+
+    return ModelRequestData(
+        engine_args=engine_args,
+        prompt=prompt,
+    )
+
+
+model_example_map = {
+    "granite_speech": run_granite_speech,
+    "ultravox": run_ultravox
+}
+
+
+def parse_args():
+    parser = FlexibleArgumentParser(
+        description="Demo on using vLLM for offline inference with "
+        "audio language models"
+    )
+    parser.add_argument(
+        "--model-type",
+        "-m",
+        type=str,
+        default="ultravox",
+        choices=model_example_map.keys(),
+        help='Huggingface "model_type".',
+    )
+    parser.add_argument(
+        "--model-name", type=str, default=None, help="Path to the model directory."
+    )
+    parser.add_argument(
+        "--num-prompts", type=int, default=1, help="Number of prompts to run."
+    )
+    parser.add_argument(
+        "--num-audios",
+        type=int,
+        default=1,
+        choices=[0, 1, 2],
+        help="Number of audio items per prompt.",
+    )
+    parser.add_argument(
+        "--seed",
+        type=int,
+        default=2,
+        help="Set the seed when initializing `vllm.LLM`.",
+    )
+
+    return parser.parse_args()
+
+
+def main(args):
+    model = args.model_type
+    if model not in model_example_map:
+        raise ValueError(f"Model type {model} is not supported.")
+
+    audio_count = args.num_audios
+    req_data = model_example_map[model](
+        args.model_name, question_per_audio_count[audio_count], audio_count
+    )
+
+    # Disable other modalities to save memory
+    default_limits = {"image": 0, "video": 0, "audio": 0}
+    req_data.engine_args.limit_mm_per_prompt = default_limits | dict(
+        req_data.engine_args.limit_mm_per_prompt or {}
+    )
+
+    engine_args = asdict(req_data.engine_args) | {"seed": args.seed, "gpu_memory_utilization": 0.9}
+    llm = LLM(**engine_args)
+
+    sampling_params = SamplingParams(
+        temperature=0.0,  # greedy decoding
+        max_tokens=64,
+        stop_token_ids=req_data.stop_token_ids,
+        logprobs=10,  # also return the top-10 logprobs at every step
+    )
+
+    mm_data = {}
+    if audio_count > 0:
+        mm_data = {
+            "audio": [
+                asset.audio_and_sample_rate for asset in audio_assets[:audio_count]
+            ]
+        }
+
+    assert args.num_prompts > 0
+    inputs = {"prompt": req_data.prompt, "multi_modal_data": mm_data}
+    if args.num_prompts > 1:
+        inputs = [inputs] * args.num_prompts
+
+    lora_request = (
+        req_data.lora_requests * args.num_prompts if req_data.lora_requests else None
+    )
+
+    outputs = llm.generate(
+        inputs,
+        sampling_params=sampling_params,
+        lora_request=lora_request,
+    )
+
+    for i, o in enumerate(outputs):
+        print(f"--- Prompt {i+1} ---")
+        generated_text = o.outputs[0].text
+        print(f"Generated Text: {generated_text}")
+        
+        logprobs_per_step = o.outputs[0].logprobs
+        
+        if logprobs_per_step is None:
+            print("Logprobs not returned. Check your SamplingParams.")
+            continue
+
+        print("\nLogprobs per generated token:")
+        for step_idx, step_logprobs_dict in enumerate(logprobs_per_step):
+            
+            generated_token_info = None
+            for token_id, logprob_obj in step_logprobs_dict.items():
+                if logprob_obj.rank == 1:
+                    generated_token_info = (token_id, logprob_obj.decoded_token)
+                    break 
+            
+            if generated_token_info:
+                token_id, token_text = generated_token_info
+                print(f"  Step {step_idx}:")
+                print(f"    - Generated Token: {token_id} ('{token_text}')")
+            else:
+                print(f"  Step {step_idx}: (Could not find rank-1 token)")
+                continue
+
+            # Sort the candidates by rank; iterating over items keeps each token id
+            # next to its Logprob, so no reverse lookup of the id is needed.
+            sorted_items = sorted(step_logprobs_dict.items(), key=lambda kv: kv[1].rank)
+
+            print("    - Top Logprobs:")
+            for token_id, logprob_obj in sorted_items:
+                token_text = logprob_obj.decoded_token
+                logprob_value = logprob_obj.logprob
+                rank = logprob_obj.rank
+
+                print(f"        - Rank {rank}: Token {token_id} ('{token_text}') -> Logprob: {logprob_value:.4f}")
+
+
+    import json
+
+    serializable_data_all_prompts = []
+
+    for o in outputs:
+        logprobs_per_step = o.outputs[0].logprobs
+        
+        generated_token_logprobs = []
+        
+        if logprobs_per_step:
+            for step_logprobs_dict in logprobs_per_step:
+
+                found_token = False
+                for token_id, logprob_obj in step_logprobs_dict.items():
+                    if logprob_obj.rank == 1:
+                        generated_token_logprobs.append(logprob_obj.logprob)
+                        found_token = True
+                        break  
+                
+                if not found_token:
+                    generated_token_logprobs.append(None) 
+
+        serializable_data_all_prompts.append(generated_token_logprobs)
+
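+    # NOTE: the file name is hardcoded for the A800 (GPU) run; change the suffix
+    # (e.g. to K100_AI) when running on the DCU so the name matches the hardware.
+    output_filename = './generated_token_logprobs_A800_fp16.json'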
+    with open(output_filename, 'w') as f:
+        json.dump(serializable_data_all_prompts, f, indent=2) 
+
+    print(f"Successfully wrote the logprob of each generated token to file: {output_filename}")
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    main(args)
\ No newline at end of file
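`infer_vllm.py` treats the rank-1 entry of each step's logprob dict as the generated token. That holds under the greedy decoding it uses (`temperature=0.0`), but with sampling the generated token need not be rank 1. A sketch of a more direct lookup that pairs each step with the id from the output's `token_ids` (vLLM includes the sampled token's logprob alongside the top-k candidates). To keep it runnable without vLLM, `FakeLogprob` is a hypothetical stand-in exposing only the fields the script reads (`logprob`, `rank`, `decoded_token`), and the token ids and values are toy data, not model output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeLogprob:
    # Hypothetical stand-in with the fields the script reads from vLLM's Logprob.
    logprob: float
    rank: Optional[int] = None
    decoded_token: Optional[str] = None


def sampled_token_logprobs(token_ids, logprobs_per_step):
    """Logprob of the token actually generated at each step (None if missing).

    token_ids: generated ids, e.g. o.outputs[0].token_ids
    logprobs_per_step: one {token_id: Logprob} dict per step, e.g. o.outputs[0].logprobs
    """
    out = []
    for token_id, step in zip(token_ids, logprobs_per_step):
        entry = step.get(token_id)
        out.append(entry.logprob if entry is not None else None)
    return out


# Toy data: two decoding steps.
steps = [
    {1: FakeLogprob(-0.1, 1, "a"), 2: FakeLogprob(-2.5, 2, "b")},
    {3: FakeLogprob(-0.3, 1, "c")},
]
print(sampled_token_logprobs([1, 3], steps))  # [-0.1, -0.3]
```

In the script this would be called as `sampled_token_logprobs(o.outputs[0].token_ids, o.outputs[0].logprobs)`, replacing the `found_token` loop.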
--- a/model.properties
+++ b/model.properties
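+# Unique model identifier
+modelCode = 1665
+# Model name
+modelName=granite-speech_pytorch
+# Model description
+modelDescription=Granite-speech-3.3-8b is a compact and efficient speech-language model designed for automatic speech recognition (ASR) and automatic speech translation (AST).
+# Application scenarios
+appScenario=Inference,conversational QA,manufacturing,broadcast media,finance,energy,healthcare
+# Framework type
+frameType=pytorch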
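`model.properties` is a flat `key=value` file with `#` comments, and the spacing around `=` is not consistent (`modelCode = 1665` vs `modelName=...`). How the hosting platform parses it is not documented here; a minimal, hypothetical reader that tolerates both forms might look like this. It deliberately skips the escapes, `:` separators and line continuations of the full Java `.properties` format.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Hypothetical minimal key=value reader for files like model.properties."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' or '!' comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Strip the optional spaces around '='.
            props[key.strip()] = value.strip()
    return props


sample = "# Unique model identifier\nmodelCode = 1665\nmodelName=granite-speech_pytorch\n"
print(parse_properties(sample))
# {'modelCode': '1665', 'modelName': 'granite-speech_pytorch'}
```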
--- a/requirement.txt
+++ b/requirement.txt
+transformers>=4.53.1
\ No newline at end of file