First add

Commit a3a02796 authored Jan 14, 2026 by chenych
17 changed files
--- a/LICENSE
+++ b/LICENSE
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/README.md
+++ b/README.md
+# Fun-ASR-Nano
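+## Paper
+[Fun-ASR Technical Report](https://arxiv.org/abs/2509.12508)
+## Model overview
+Fun-ASR is an end-to-end speech recognition large model from Tongyi Lab. It is trained on tens of millions of hours of real speech data and offers strong contextual understanding and industry adaptability. It supports low-latency real-time transcription and covers 31 languages. It performs well in vertical domains such as education and finance, accurately recognizes technical terms and industry-specific expressions, and effectively addresses problems such as "hallucinated" output and language confusion, with the goal of "hearing clearly, understanding the meaning, and writing accurately".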
+<div align=center>
+    <img src="./doc/funasr-v2.png"/>
+</div>
+## Environment dependencies
+- Basic environment requirements (adjust to your actual setup):
+| Software | Version |
+| :------: | :------: |
+| DTK | 25.04.2 |
+| python | 3.10.12 |
+| transformers | 4.51.0 |
+| fastpt | 2.1.1+das.dtk25042 |
+| torch | 2.5.1+das.opt1.dtk25042 |
+| torchaudio | 2.5.1+das.opt1.dtk25042 |
+Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-tx-1226-das1.7-py3.10-20251226
+- Adjust the `-v` mount paths to match where your model and data actually live.
+```bash
+docker run -it \
+    --shm-size 60g \
+    --network=host \
+    --name fun-asr-nano \
+    --privileged \
+    --device=/dev/kfd \
+    --device=/dev/dri \
+    --device=/dev/mkfd \
+    --group-add video \
+    --cap-add=SYS_PTRACE \
+    --security-opt seccomp=unconfined \
+    -u root \
+    -v /opt/hyhal/:/opt/hyhal/:ro \
+    -v /path/your_code_data/:/path/your_code_data/ \
+    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-tx-1226-das1.7-py3.10-20251226 bash
+```
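+More images are available for download from [光源](https://sourcefind.cn/#/service-list).
+The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community; install the remaining packages from requirements.txt: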
+```bash
+pip install -r requirements.txt
+source fastpt -E  # environment required by torchaudio; skipping it raises OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory
+```
+## Dataset
+`None yet`
+## Training
+`None yet`
+## Inference
+### transformers
+#### Single-node inference
+```bash
+# Inference through funasr
+python demo1.py
+# Direct inference
+python demo2.py
+```
+## Results
+<div align=center>
+    <img src="./doc/results.png"/>
+</div>
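+### Accuracy
+`DCU accuracy matches GPU accuracy; inference framework: pytorch.`
+## Pretrained weights
+| Model name | Weight size | DCU model | Minimum number of cards | Download link |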
+|:-----:|:----------:|:----------:|:---------------------:|:----------:|
+| Fun-ASR-Nano-2512 | 800M | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) |
+## Source repository and issue feedback
+- https://developer.sourcefind.cn/codes/modelzoo/fun-asr-nano_pytorch
+## References
+- https://github.com/FunAudioLLM/Fun-ASR
--- a/__init__.py
+++ b/__init__.py
--- a/ctc.py
+++ b/ctc.py
+import torch
+import torch.nn.functional as F
+class CTC(torch.nn.Module):
+    """CTC module.
+    Args:
+        odim: dimension of outputs
+        encoder_output_size: number of encoder projection units
+        dropout_rate: dropout rate (0.0 ~ 1.0)
+        reduce: reduce the CTC loss into a scalar
+        blank_id: index of the CTC blank label
+    """
+    def __init__(
+        self,
+        odim: int,
+        encoder_output_size: int,
+        dropout_rate: float = 0.0,
+        reduce: bool = True,
+        blank_id: int = 0,
+        **kwargs,
+    ):
+        super().__init__()
+        eprojs = encoder_output_size
+        self.dropout_rate = dropout_rate
+        self.ctc_lo = torch.nn.Linear(eprojs, odim)
+        self.blank_id = blank_id
+        self.ctc_loss = torch.nn.CTCLoss(reduction="none", blank=blank_id)
+        self.reduce = reduce
+    def softmax(self, hs_pad):
+        """softmax of frame activations
+        Args:
+            Tensor hs_pad: 3d tensor (B, Tmax, eprojs)
+        Returns:
+            torch.Tensor: softmax applied 3d tensor (B, Tmax, odim)
+        """
+        return F.softmax(self.ctc_lo(hs_pad), dim=2)
+    def log_softmax(self, hs_pad):
+        """log_softmax of frame activations
+        Args:
+            Tensor hs_pad: 3d tensor (B, Tmax, eprojs)
+        Returns:
+            torch.Tensor: log softmax applied 3d tensor (B, Tmax, odim)
+        """
+        return F.log_softmax(self.ctc_lo(hs_pad), dim=2)
+    def argmax(self, hs_pad):
+        """argmax of frame activations
+        Args:
+            torch.Tensor hs_pad: 3d tensor (B, Tmax, eprojs)
+        Returns:
+            torch.Tensor: argmax applied 2d tensor (B, Tmax)
+        """
+        return torch.argmax(self.ctc_lo(hs_pad), dim=2)
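Review note on `CTC.argmax`: it returns one label index per frame, with shape (B, Tmax). Turning those frame labels into a label sequence usually takes the standard CTC greedy collapse: merge consecutive repeats, then drop blanks. The sketch below shows that step in plain Python for a single utterance, assuming the per-frame IDs have already been converted to a list. `ctc_greedy_collapse` is a hypothetical helper and is not part of this commit.

```python
from itertools import groupby
from typing import List


def ctc_greedy_collapse(frame_ids: List[int], blank_id: int = 0) -> List[int]:
    """Merge consecutive repeated labels, then remove blank labels."""
    # groupby yields one key per run of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_ids)]
    return [label for label in merged if label != blank_id]


if __name__ == "__main__":
    # The blank between the two 5s keeps them as separate labels.
    print(ctc_greedy_collapse([0, 5, 5, 0, 5, 7, 7, 0]))
```

The order of the two steps matters: removing blanks first would merge labels that the blank was meant to keep apart.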
--- a/decode.py
+++ b/decode.py
+import os
+import hydra
+import torch
+from omegaconf import DictConfig, ListConfig, OmegaConf
+@hydra.main(config_name=None, version_base=None)
+def main_hydra(cfg: DictConfig):
+    def to_plain_list(cfg_item):
+        if isinstance(cfg_item, ListConfig):
+            return OmegaConf.to_container(cfg_item, resolve=True)
+        elif isinstance(cfg_item, DictConfig):
+            return {k: to_plain_list(v) for k, v in cfg_item.items()}
+        else:
+            return cfg_item
+    kwargs = to_plain_list(cfg)
+    model_dir = kwargs.get("model_dir", "FunAudioLLM/Fun-ASR-Nano-2512")
+    scp_file = kwargs["scp_file"]
+    output_file = kwargs["output_file"]
+    device = (
+        "cuda:0"
+        if torch.cuda.is_available()
+        else "mps"
+        if torch.backends.mps.is_available()
+        else "cpu"
+    )
+    from funasr import AutoModel
+    model = AutoModel(
+        model=model_dir,
+        trust_remote_code=True,
+        vad_model="fsmn-vad",
+        vad_kwargs={"max_single_segment_time": 30000},
+        remote_code="./model.py",
+        device=device,
+    )
+    output_dir = os.path.dirname(output_file)
+    if output_dir and not os.path.exists(output_dir):
+        os.makedirs(output_dir, exist_ok=True)
+    with open(scp_file, "r", encoding="utf-8") as f1:
+        with open(output_file, "w", encoding="utf-8") as f2:
+            for line in f1:
+                line = line.strip()
+                if not line:
+                    continue
+                parts = line.split(maxsplit=1)
+                if len(parts) == 2:
+                    text = model.generate(input=[parts[1]], cache={}, batch_size=1)[0]["text"]
+                    f2.write(f"{parts[0]}\t{text}\n")
+if __name__ == "__main__":
+    main_hydra()
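Review note on the input format: `decode.py` reads each scp line as an utterance ID, a run of whitespace, and an audio path, and writes `utt_id<TAB>text` for every line that splits into exactly two parts. The sketch below isolates only the parsing step so its edge cases are easy to see. `parse_scp_line` is a hypothetical helper and the sample paths are illustrative, not taken from the commit.

```python
from typing import Optional, Tuple


def parse_scp_line(line: str) -> Optional[Tuple[str, str]]:
    """Split 'utt_id path' on the first whitespace run; return None if malformed."""
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


if __name__ == "__main__":
    # Hypothetical sample lines: a path with a space, a blank line, an ID without a path.
    for raw in ["utt001 /data/audio/a b.wav\n", "\n", "utt002\n"]:
        print(parse_scp_line(raw))
```

Because the split uses `maxsplit=1`, spaces inside the path survive. Lines with only an ID are skipped without any message, which is also what the committed loop does.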
--- a/demo1.py
+++ b/demo1.py
+from funasr import AutoModel
+def main():
+    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
+    device = "cuda:0"
+    model = AutoModel(
+        model=model_dir,
+        trust_remote_code=True,
+        remote_code="./model.py",
+        device=device,
+        hub="ms"
+    )
+    wav_path = f"{model.model_path}/example/zh.mp3"
+    res = model.generate(
+        input=[wav_path],
+        cache={},
+        batch_size=1,
+        hotwords=["开放时间"],
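+        # Languages for Fun-ASR-Nano-2512: 中文 (Chinese), 英文 (English), 日文 (Japanese)
+        # Languages for Fun-ASR-MLT-Nano-2512:
+        # 中文 (Chinese), 英文 (English), 粤语 (Cantonese), 日文 (Japanese), 韩文 (Korean),
+        # 越南语 (Vietnamese), 印尼语 (Indonesian), 泰语 (Thai), 马来语 (Malay), 菲律宾语 (Filipino),
+        # 阿拉伯语 (Arabic), 印地语 (Hindi), 保加利亚语 (Bulgarian), 克罗地亚语 (Croatian),
+        # 捷克语 (Czech), 丹麦语 (Danish), 荷兰语 (Dutch), 爱沙尼亚语 (Estonian), 芬兰语 (Finnish),
+        # 希腊语 (Greek), 匈牙利语 (Hungarian), 爱尔兰语 (Irish), 拉脱维亚语 (Latvian),
+        # 立陶宛语 (Lithuanian), 马耳他语 (Maltese), 波兰语 (Polish), 葡萄牙语 (Portuguese),
+        # 罗马尼亚语 (Romanian), 斯洛伐克语 (Slovak), 斯洛文尼亚语 (Slovenian), 瑞典语 (Swedish)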
+        language="中文",
+        itn=True,  # or False
+    )
+    text = res[0]["text"]
+    print(text)
+    model = AutoModel(
+        model=model_dir,
+        trust_remote_code=True,
+        vad_model="fsmn-vad",
+        vad_kwargs={"max_single_segment_time": 30000},
+        remote_code="./model.py",
+        device=device,
+    )
+    res = model.generate(input=[wav_path], cache={}, batch_size=1)
+    text = res[0]["text"]
+    print(text)
+if __name__ == "__main__":
+    main()
--- a/demo2.py
+++ b/demo2.py
+from model import FunASRNano
+def main():
+    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
+    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device="cuda:0")
+    m.eval()
+    wav_path = f"{model_dir}/example/zh.mp3"
+    res = m.inference(data_in=[wav_path], **kwargs)
+    text = res[0][0]["text"]
+    print(text)
+if __name__ == "__main__":
+    main()
--- a/doc/funasr-v2.png
+++ b/doc/funasr-v2.png
--- a/doc/results.png
+++ b/doc/results.png
--- a/icon.png
+++ b/icon.png
--- a/model.properties
+++ b/model.properties
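+# Unique model identifier
+modelCode=1953
+# Model name
+modelName=Fun-ASR-Nano_pytorch
+# Model description (an end-to-end speech recognition large model from Tongyi Lab)
+modelDescription=通义实验室推出的一款端到端语音识别大模型。
+# Process type (推理 = inference)
+processType=推理
+# Algorithm category (语音识别 = speech recognition)
+appCategory=语音识别
+# Framework type
+frameType=pytorch
+# Accelerator card type
+accelerateType=BW1000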
--- a/model.py
+++ b/model.py
--- a/requirements.txt
+++ b/requirements.txt
+transformers==4.51.0
+funasr>=1.3.0
+zhconv
+whisper_normalizer
+pyopenjtalk-plus
+compute-wer
--- a/tools/cn_tn.py
+++ b/tools/cn_tn.py
--- a/tools/format5res.py
+++ b/tools/format5res.py
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+# Author: Mengze Chen
+import re
+import sys
+def scoreformat(name, line, flag=1):
+    newline = ""
+    for i in range(0, len(line)):
+        curr = line[i]
+        currEn = False
+        if curr == "":
+            continue
+        if (
+            (curr >= "\u0041" and curr <= "\u005a")  # eng
+            or (curr >= "\u0061" and curr <= "\u007a")  # eng
+            or (curr >= "\u0000" and curr <= "\u007f")  # de fr es it
+            or (curr >= "\u0400" and curr <= "\u04ff")  # ru
+            or (curr >= "\u0100" and curr <= "\u017f")  # latin1
+            or (curr >= "\u0080" and curr <= "\u00ff")  # latin2
+            or curr == "'"
+        ) and (curr < "\u0030" or curr > "\u0039"):
+            currEn = True
+        if i == 0:
+            newline = newline + curr
+        else:
+            if lastEn == True and currEn == True:
+                newline = newline + curr
+            else:
+                newline = newline + " " + curr
+        if flag == -1:
+            lastEn = False
+        else:
+            lastEn = currEn
+    ret = re.sub("[ ]{1,}", " ", newline)
+    ret = ret
+    if name == "":
+        ret = ret
+    else:
+        if flag <= 0:
+            ret = ret + " " + "(" + name + ")"
+        else:
+            ret = name + "\t" + ret
+    return ret
+def recoformat(line):
+    newline = ""
+    en_flag = 0  # 0: no-english   1 : english   2: former
+    for i in range(0, len(line)):
+        word = line[i]
+        if ord(word) == 32:
+            if en_flag == 0:
+                continue
+            else:
+                en_flag = 0
+                newline += " "
+        if (word >= "\u4e00" and word <= "\u9fa5") or (
+            word >= "\u0030" and word <= "\u0039"
+        ):
+            if en_flag == 1:
+                newline += " " + word
+            else:
+                newline += word
+            en_flag = 0
+        elif (
+            (word >= "\u0041" and word <= "\u005a")  # eng
+            or (word >= "\u0061" and word <= "\u007a")  # eng
+            or (word >= "\u0000" and word <= "\u007f")  # de fr es it
+            or (word >= "\u0400" and word <= "\u04ff")  # ru
+            or (word >= "\u0100" and word <= "\u017f")  # latin1
+            or (word >= "\u0080" and word <= "\u00ff")  # latin2
+            or word == "'"
+        ):
+            if en_flag == 0:
+                newline += " " + ("" if (word == "'") else word)
+            else:
+                newline += word
+            en_flag = 1
+        else:
+            newline += " " + word
+    newline = newline
+    newline = re.sub("[ ]{1,}", " ", newline)
+    newline = newline
+    return newline
+def numbersingle(line):
+    # Index i holds the Chinese numeral for digit i; index 10 is the decimal point.
+    chnu = ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九", "点"]
+    newline = ""
+    for id in range(len(line)):
+        if re.findall(r"\.", line[id]):
+            if re.findall(r"\.\s*$", line[id]):
+                newline += "."
+            else:
+                newline += chnu[10]
+        elif re.search(r"0", line[id]):
+            if id > 0 and id < len(line) - 1:
+                if (
+                    re.search(r"\d", line[id - 1])
+                    and (not re.search(r"\d", line[id + 1]))
+                    and (not re.search(r"0", line[id - 1]))
+                ):
+                    if (
+                        id > 2
+                        and len(line) > 2
+                        and (not re.search(r"\d", line[id - 1]))
+                    ):
+                        newline = newline[:-1]
+                        newline += chnu[int(line[id - 1])] + "十"
+                    else:
+                        newline += chnu[int(line[id])]
+                else:
+                    newline += chnu[int(line[id])]
+            else:
+                newline += chnu[int(line[id])]
+        elif re.search(r"\d", line[id]):
+            newline += chnu[int(line[id])]
+        else:
+            newline += line[id]
+    return newline
+def ch_number2digit(line):
+    number_flag = 0
+    zero_flag = 0
+    bits = {
+        "零": "1",
+        "十": "2",
+        "百": "3",
+        "千": "4",
+        "万": "5",
+        "十万": "6",
+        "百万": "7",
+        "千万": "8",
+    }
+    chsh = {
+        "一": "1",
+        "二": "2",
+        "三": "3",
+        "四": "4",
+        "五": "5",
+        "六": "6",
+        "七": "7",
+        "八": "8",
+        "九": "9",
+        "两": "2",
+        "幺": "1",
+    }
+    unit = {"里": "1", "克": "1", "米": "1"}
+    newline = ""
+    digit = []
+    bit = []
+    onebit = ""
+    for i in range(len(line)):
+        if ord(line[i]) == 32:
+            newline += " "
+            continue
+        if line[i] in chsh:
+            number_flag = 1
+            if line[i] == "两":
+                if (i == len(line) - 1) or (
+                    (line[i + 1] not in chsh.keys())
+                    and (line[i + 1] not in bits.keys())
+                ):
+                    number_flag = -1
+            if number_flag == 1:
+                digit.append(chsh[line[i]])
+        elif "十" == line[i] and number_flag == 0:
+            number_flag = 2
+            digit.append("1")
+            bit.append(line[i])
+        elif "十" == line[i] and number_flag == 3:
+            digit.append("1")
+            bit.append(line[i])
+        elif ("零" == line[i]) and (number_flag == 0 or number_flag == 1):
+            digit.append("0")
+        elif ("零" == line[i]) and number_flag == 3:
+            zero_flag = 1
+        elif number_flag == 1 and line[i] in bits:
+            number_flag = 3
+            if line[i] == "千":
+                if i < len(line) - 1:
+                    if line[i + 1] in unit:
+                        number_flag = -1
+            if number_flag == 3:
+                onebit = line[i]
+                bit.append(onebit)
+        elif number_flag == 3 and line[i] in bits:
+            onebit = bit[-1] + line[i]
+            if onebit in bits:
+                bit[-1] = onebit
+            else:
+                number_flag = -2
+        else:
+            number_flag = -1
+        if len(digit) > 0 and number_flag == -1:
+            number_flag = -2
+        if i == (len(line) - 1) and number_flag >= 0:
+            number_flag = -1
+        if number_flag < 0:
+            newdigit = ""
+            if len(digit) > 0:  # and (len(digit) == len(bit))):
+                if (
+                    len(bit) == 1
+                    and zero_flag == 0
+                    and bit[0] == "百"
+                    and len(bit) != len(digit)
+                ):
+                    bit.append("十")
+                if len(digit) == (len(bit) + 1):
+                    bit.append("零")
+                if len(digit) == len(bit):
+                    for m in range(len(digit))[-1::-1]:
+                        if int(bits[bit[m]]) == int(len(newdigit) + 1):
+                            newdigit += digit[m]
+                        else:
+                            nu = int(bits[bit[m]]) - len(newdigit) - 1
+                            for n in range(nu):
+                                newdigit += "0"
+                            newdigit += digit[m]
+                    for z in range(len(newdigit))[-1::-1]:
+                        newline += newdigit[z]
+                else:
+                    newline += "".join(digit)
+                bit = []
+                digit = []
+                zero_flag = 0
+            else:
+                newline += line[i]
+            if number_flag == -2:
+                newline += line[i]
+            number_flag = 0
+    return newline
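Review note on `ch_number2digit`: the function rewrites Chinese numerals as Arabic digits by pairing each digit character with a place-value character. The sketch below shows only that place-value idea in a much smaller form. It is an assumption-laden simplification, not a replacement: it handles values up to the thousands, ignores 万 and larger units, and has none of the unit exceptions (such as 千 followed by 米) that the committed function checks. `cn_to_int` and both tables are hypothetical names.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def cn_to_int(numeral: str) -> int:
    """Convert a simple Chinese numeral (below 10000) to an int."""
    total = 0
    current = 0  # digit waiting for its place-value character
    for ch in numeral:
        if ch in DIGITS:
            current = DIGITS[ch]
        elif ch in UNITS:
            # A bare unit such as the leading 十 in 十二 counts as 1 x unit.
            total += (current or 1) * UNITS[ch]
            current = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + current


if __name__ == "__main__":
    for text in ["三百二十五", "十二", "一千零五", "两千"]:
        print(text, cn_to_int(text))
```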
+def special(line):
+    newline = ""
+    angel = 0  # set to 1 once a degree sign has been seen; avoids an unbound local below
+    for e in range(len(line)):
+        if ord(line[e]) == 247:
+            newline += "除以"
+        elif ord(line[e]) == 215:
+            newline += "乘以"
+        elif ord(line[e]) == 61:
+            newline += "等于"
+        elif ord(line[e]) == 43:
+            newline += "加"
+        elif ord(line[e]) == 45:
+            newline += "负"
+        elif ord(line[e]) == 8451:
+            newline += "摄氏度"
+        elif ord(line[e]) == 13217:
+            newline += "平方米"
+        elif ord(line[e]) == 8240 or ord(line[e]) == 65130:
+            newline += "%"
+        elif ord(line[e]) == 46:
+            newline += "点"
+        elif ord(line[e]) == 176:
+            newline += "度"
+            angel = 1
+        elif ord(line[e]) == 8242 and angel == 1:
+            newline += "分"
+        else:
+            newline += line[e]
+    return newline
+def all_convert(content):
+    content = recoformat(content)
+    content = numbersingle(content)
+    content = ch_number2digit(content)
+    content = special(content)
+    content = scoreformat("", content)
+    return content
+if __name__ == "__main__":
+    if len(sys.argv[1:]) < 1:
+        sys.stderr.write("Usage:\n format5res.py reco.result [flag]\n")
+        sys.stderr.write(" reco.result:   id<tab>recoresult\n")
+        sys.exit(1)
+    f = open(sys.argv[1], encoding="utf-8")
+    flag = 0
+    if len(sys.argv[1:]) > 1:
+        flag = int(sys.argv[2])
+    for line in f.readlines():
+        if not line:
+            continue
+        line = line.rstrip()
+        tmp = line.split("\t")
+        if len(tmp) < 2:
+            tmp = line.split(",")
+            if len(tmp) < 2:
+                tmp = line.split(" ", 1)
+                if len(tmp) < 2:
+                    name = tmp[0]
+                    content = ""
+                    print(content)
+                    continue
+        name = tmp[0]
+        content = tmp[1]
+        name = re.sub(r"\.pcm", "", name)
+        name = re.sub(r"\.wav", "", name)
+        content = recoformat(content)
+        content = numbersingle(content)
+        content = ch_number2digit(content)
+        content = special(content)
+        content = scoreformat(name, content, flag)
+        print(content)
+    f.close()
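Review note on the formatting helpers: taken together, `recoformat` and `scoreformat` put spaces around each CJK character and each digit while keeping runs of Latin letters together, so scoring is per character for Chinese and per word for English. The sketch below approximates that behaviour with one regular expression. It is a rough illustration only: it ignores the Cyrillic and accented Latin ranges, the numeral conversion, and the `flag` handling in the committed code. `split_for_scoring` is a hypothetical name.

```python
import re

# Latin word (letters and apostrophes) | single digit | single CJK character | any other non-space character
_TOKEN = re.compile(r"[A-Za-z']+|\d|[\u4e00-\u9fa5]|\S")


def split_for_scoring(text: str) -> str:
    """Return text with scoring tokens separated by single spaces."""
    return " ".join(_TOKEN.findall(text))


if __name__ == "__main__":
    print(split_for_scoring("我用iPhone拍了3张照片"))
```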
--- a/tools/scp2jsonl.py
+++ b/tools/scp2jsonl.py
+import hydra
+import json
+import os
+import threading
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from io import BytesIO
+from typing import Dict, Optional, Tuple
+from urllib.request import urlopen
+import soundfile as sf
+from modelscope import AutoTokenizer
+from tqdm import tqdm
+from omegaconf import DictConfig, OmegaConf, ListConfig
+class LineProcessor:
+    def __init__(self, tokenizer):
+        self.tokenizer = tokenizer
+        self.lock = threading.Lock()
+    def process_line(self, line_pair: Tuple[str, str]) -> Optional[Dict]:
+        line1, line2 = line_pair
+        line1, line2 = line1.strip(), line2.strip()
+        if not line1 or not line2:
+            return None
+        parts1, parts2 = line1.split(maxsplit=1), line2.split(maxsplit=1)
+        if len(parts1) != 2 or len(parts2) != 2:
+            return None
+        utt1, utt2 = parts1[0], parts2[0]
+        wav_path, text = parts1[1], parts2[1]
+        if utt1 != utt2:
+            return {"error": f"UTT mismatch: {utt1} vs {utt2}"}
+        try:
+            if wav_path.startswith("http"):
+                response = urlopen(wav_path)
+                if response.status != 200:
+                    return {"error": f"WAV not found: {wav_path}"}
+                audio_file = BytesIO(response.read())
+                duration = sf.info(audio_file).duration
+            else:
+                if not os.path.exists(wav_path):
+                    return {"error": f"WAV not found: {wav_path}"}
+                duration = sf.info(wav_path).duration
+            data = {
+                "messages": [
+                    {"role": "system", "content": "You are a helpful assistant."},
+                    {
+                        "role": "user",
+                        "content": f"语音转写：<|startofspeech|>!{wav_path}<|endofspeech|>",
+                    },
+                    {"role": "assistant", "content": text},
+                ],
+                "speech_length": int((duration * 1000 - 25) // 10 + 1),
+                "text_length": len(self.tokenizer.tokenize(text)),
+            }
+            return {"success": data, "utt": utt1}
+        except Exception as e:
+            return {"error": f"Error processing {wav_path}: {str(e)}"}
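Review note on `speech_length`: the expression `int((duration * 1000 - 25) // 10 + 1)` has the shape of the usual frame count for a 25 ms analysis window and a 10 ms hop. That reading is an assumption, since the commit does not name the two constants. The sketch below restates the formula with named parameters. `num_frames`, `win_ms`, and `hop_ms` are hypothetical names.

```python
def num_frames(duration_s: float, win_ms: int = 25, hop_ms: int = 10) -> int:
    """Frames that fit in the audio: one full window, plus one per further hop."""
    return int((duration_s * 1000 - win_ms) // hop_ms + 1)


if __name__ == "__main__":
    # 1.0 s of audio: (1000 - 25) // 10 + 1 = 97 + 1 = 98 frames.
    print(num_frames(1.0))
```

Audio shorter than the window gives zero or a negative value with this formula, so very short clips may be worth filtering before the JSONL file is written.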
+@hydra.main(config_name=None, version_base=None)
+def main_hydra(cfg: DictConfig):
+    def to_plain_list(cfg_item):
+        if isinstance(cfg_item, ListConfig):
+            return OmegaConf.to_container(cfg_item, resolve=True)
+        elif isinstance(cfg_item, DictConfig):
+            return {k: to_plain_list(v) for k, v in cfg_item.items()}
+        else:
+            return cfg_item
+    kwargs = to_plain_list(cfg)
+    scp_file = kwargs["scp_file"]
+    transcript_file = kwargs["transcript_file"]
+    max_workers = kwargs.get("max_workers", os.cpu_count())
+    jsonl_file = kwargs["jsonl_file"]
+    with open(scp_file, "r", encoding="utf-8") as f1, open(transcript_file, "r", encoding="utf-8") as f2:
+        scp_lines = f1.readlines()
+        transcript_lines = f2.readlines()
+    if len(scp_lines) != len(transcript_lines):
+        print(
+            f"Warning: Line count mismatch - scp: {len(scp_lines)}, transcript: {len(transcript_lines)}"
+        )
+    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
+    processor = LineProcessor(tokenizer)
+    data_pairs = list(zip(scp_lines, transcript_lines))
+    processed_count = 0
+    failed_count = 0
+    error_messages = []
+    with tqdm(total=len(data_pairs), desc="Processing") as pbar:
+        with ThreadPoolExecutor(max_workers=max_workers) as executor:
+            with open(jsonl_file, "w", encoding="utf-8") as f_out:
+                futures = {
+                    executor.submit(processor.process_line, pair): i
+                    for i, pair in enumerate(data_pairs)
+                }
+                for future in as_completed(futures):
+                    result = future.result()
+                    if result and "success" in result:
+                        with processor.lock:
+                            json.dump(result["success"], f_out, ensure_ascii=False)
+                            f_out.write("\n")
+                        processed_count += 1
+                    elif result and "error" in result:
+                        failed_count += 1
+                        error_messages.append(result["error"])
+                    pbar.update(1)
+                    pbar.set_postfix(
+                        {"processed": processed_count, "failed": failed_count}
+                    )
+    print("\nProcessing completed:")
+    print(f"  Total lines: {len(data_pairs)}")
+    print(f"  Successfully processed: {processed_count}")
+    print(f"  Failed: {failed_count}")
+    if error_messages:
+        # Show at most 10 errors, then summarize how many were omitted.
+        header = "Sample errors" if len(error_messages) <= 10 else "First 10 errors"
+        print(f"\n{header}:")
+        for error in error_messages[:10]:
+            print(f"  - {error}")
+        if len(error_messages) > 10:
+            print(f"  ... and {len(error_messages) - 10} more errors")
+if __name__ == "__main__":
+    main_hydra()
--- a/tools/whisper_mix_normalize.py
+++ b/tools/whisper_mix_normalize.py
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+# Author: Mengze Chen
+import re
+import sys
+import cn_tn as cn_tn
+import format5res as cn_itn
+import pyopenjtalk
+import zhconv
+from whisper_normalizer.basic import BasicTextNormalizer
+from whisper_normalizer.english import EnglishTextNormalizer
+basic_normalizer = BasicTextNormalizer()
+english_normalizer = EnglishTextNormalizer()
+def is_only_chinese_and_english(s):
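+    # Pattern for strings made up only of Chinese characters, ASCII letters, digits, and common punctuation.
+    pattern = r"^[\u4e00-\u9fa5A-Za-z0-9,\.!\?:;，。！？：；、%\'\s\-\~]+$"
+    # Match the whole string against the pattern.
+    return re.match(pattern, s) is not None
+def is_only_english(s):
+    # Pattern for strings made up only of ASCII letters, digits, and common punctuation (no CJK ideographs).
+    pattern = r"^[A-Za-z0-9,\.!\?:;，。！？：；、%\'\s\-\~]+$"
+    # Match the whole string against the pattern.
+    return re.match(pattern, s) is not None
+def is_number(s):
+    # Pattern for strings made up only of digits, punctuation, and whitespace (no letters).
+    pattern = r"^[0-9,\.!\?:;，。！？：；、%\'\s]+$"
+    # Match the whole string against the pattern.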
+    return re.match(pattern, s) is not None
+def safe_ja_g2p(text, kana=True, max_length=100):
+    if len(text) > max_length:
+        # If the text is too long, process it in chunks of max_length characters.
+        parts = []
+        for i in range(0, len(text), max_length):
+            part = text[i:i+max_length]
+            try:
+                converted = pyopenjtalk.g2p(part, kana=kana)
+                parts.append(converted)
+            except Exception:
+                parts.append(part)  # If conversion fails, keep the original chunk.
+        return ' '.join(parts)
+    else:
+        try:
+            return pyopenjtalk.g2p(text, kana=kana)
+        except Exception:
+            return text  # If conversion fails, return the original text.
+def normalize_text(srcfn, dstfn, kana=False):
+    with open(srcfn, "r", encoding="utf-8") as f_read, open(dstfn, "w", encoding="utf-8") as f_write:
+        all_lines = f_read.readlines()
+        for line in all_lines:
+            line = line.strip()
+            line_arr = line.split(maxsplit=1)
+            if len(line_arr) < 1:
+                continue
+            if len(line_arr) == 1:
+                line_arr.append("")
+            key = line_arr[0]
+            line_arr[1] = re.sub(r"=", " ", line_arr[1])
+            line_arr[1] = re.sub(r"\(", " ", line_arr[1])
+            line_arr[1] = re.sub(r"\)", " ", line_arr[1])
+            # From Chongjia Ni
+            if kana:
+                line_arr[1] = safe_ja_g2p(line_arr[1], kana=True, max_length=100)
+            line_arr = f"{key}\t{line_arr[1]}".split()
+            conts = []
+            language_bak = ""
+            part = []
+            for i in range(1, len(line_arr)):
+                out_part = ""
+                chn_eng_bool = is_only_chinese_and_english(line_arr[i])
+                eng_bool = is_only_english(line_arr[i])
+                num_bool = is_number(line_arr[i])
+                if eng_bool and not num_bool:
+                    language = "en"
+                elif chn_eng_bool:
+                    language = "chn_en"
+                else:
+                    language = "not_chn_en"
+                if language == language_bak or language_bak == "":
+                    part.append(line_arr[i])
+                    language_bak = language
+                else:
+                    if language_bak == "en":
+                        out_part1 = english_normalizer(" ".join(part))
+                        out_part = cn_itn.scoreformat("", out_part1)
+                    elif language_bak == "chn_en":
+                        out_part1 = english_normalizer(" ".join(part))
+                        out_part2 = cn_tn.normalize_nsw(out_part1)
+                        out_part3 = cn_itn.all_convert(out_part2)
+                        out_part = zhconv.convert(out_part3, "zh-cn")
+                    else:
+                        out_part1 = basic_normalizer(" ".join(part))
+                        out_part2 = cn_tn.normalize_nsw(out_part1)
+                        out_part3 = cn_itn.all_convert(out_part2)
+                        out_part = zhconv.convert(out_part3, "zh-cn")
+                    conts.append(out_part)
+                    language_bak = language
+                    part = []
+                    part.append(line_arr[i])
+                if i == len(line_arr) - 1:
+                    if language == "en":
+                        out_part1 = english_normalizer(" ".join(part))
+                        out_part = cn_itn.scoreformat("", out_part1)
+                    elif language == "chn_en":
+                        out_part1 = english_normalizer(" ".join(part))
+                        out_part2 = cn_tn.normalize_nsw(out_part1)
+                        out_part3 = cn_itn.all_convert(out_part2)
+                        out_part = zhconv.convert(out_part3, "zh-cn")
+                    else:
+                        out_part1 = basic_normalizer(" ".join(part))
+                        out_part2 = cn_tn.normalize_nsw(out_part1)
+                        out_part3 = cn_itn.all_convert(out_part2)
+                        out_part = zhconv.convert(out_part3, "zh-cn")
+                    conts.append(out_part)
+            f_write.write("{0}\t{1}\n".format(key, " ".join(conts).strip()))
+if __name__ == "__main__":
+    srcfn = sys.argv[1]
+    dstfn = sys.argv[2]
+    # Any third command-line argument enables kana conversion.
+    normalize_text(srcfn, dstfn, len(sys.argv) > 3)