README.md

# NVIDIA-Nemotron-3-Super-120B-A12B-BF16

## 论文

[NVIDIA Nemotron-3 Series Technical Report](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf)

## 模型简介

Nemotron-3-Super-120B-A12B-BF16 是由英伟达 (NVIDIA) 训练的大语言模型 (LLM)，旨在提供强大的智能体 (Agentic)、推理及对话能力。该模型针对协作智能体和高负载工作场景（如 IT 工单自动化）进行了深度优化。与该系列的其他模型类似，它在响应用户查询或任务时，会采取“先生成推理轨迹 (Reasoning Trace)，后给出最终回复”的模式。此外，模型的推理能力可以通过聊天模板中的标志位 (Flag) 进行灵活配置。

在架构方面，该模型采用了混合潜变量混合专家 (Latent Mixture-of-Experts, LatentMoE) 架构，通过交替堆叠 Mamba-2 层、MoE 层以及精选的注意力 (Attention) 层实现。与 Nano 版本不同，Super 模型引入了多 Token 预测 (Multi-Token Prediction, MTP) 层，从而在提升文本生成质量的同时显著加快了生成速度。为了最大化计算效率，该模型在训练过程中采用了 NVFP4 量化技术。

该模型拥有 12B 激活参数，总参数量达 120B。目前支持包括英语、法语、德语、意大利语、日语、西班牙语和中文在内的多种语言。该模型已具备商用能力。

## 环境依赖

|   **软件**   |                      **版本**                      |
| :----------: | :------------------------------------------------: |
|     DTK      |                       26.04                        |
|    python    |                      3.10.12                       |
| transformers |                     5.2.0.dev0                     |
|     vllm     |           0.15.1+das.opt1.alpha.dtk2604            |
|    triton    | 3.3.0+das.opt2.dtk2604.torch291.20260210.g1329924c |
|    torch     |     22.9.0+das.opt1.dtk2604.20260206.g275d08c2     |
|    numpy     |                       1.26.1                       |

当前仅支持定制镜像: `harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220`

- 挂载地址`-v` 根据实际模型情况修改

```
docker run -it --shm-size 200g \
                --network=host \
                --name nemotron \
                --privileged \
                --device=/dev/kfd \
                --device=/dev/dri \
                --device=/dev/mkfd \
                --group-add video \
                --cap-add=SYS_PTRACE \
                --security-opt seccomp=unconfined \
                -u root \
                -v /opt/hyhal/:/opt/hyhal/:ro \
                harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220 bash
```
更多镜像可前往[光源](https://sourcefind.cn/#/service-list)下载使用。

关于本项目 DCU 显卡所需的特殊深度学习库，numpy、vllm 库需要替换安装：

```
pip uninstall vllm
pip uninstall numpy
pip install vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl
pip install numpy==1.26.1
```

## 数据集

暂无

## 训练

暂无

## 推理

### vllm

#### 单机推理

```
## serve启动
export VLLM_USE_NN=0
export VLLM_ENABLE_MOE_FUSED_GATE=0

vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
    --served-model-name nemotron \
    --dtype bfloat16 \
    --trust-remote-code \
    --mamba-ssm-cache-dtype float32 \
    --tensor-parallel-size 8 \
    --host 0.0.0.0 \
    --port 8000 \
    --tool-call-parser qwen3_coder \
    --enable-auto-tool-choice \
    --reasoning-parser super_v3 \
    --reasoning-parser-plugin super_v3_reasoning_parser.py

## client访问
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nemotron",
    "messages": [
      {"role": "user", "content": "帮我查下北京天气，顺便把结果翻译成英文。"},
      {"role": "assistant", "tool_calls": [{"id": "chatcmpl-tool-a3ba5e50a56e4f3b", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}}]},
      {"role": "tool", "tool_call_id": "chatcmpl-tool-a3ba5e50a56e4f3b", "content": "{\"weather\": \"晴朗\", \"temperature\": \"25度\"}"}
    ],
    "tools": [
      {"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}},
      {"type": "function", "function": {"name": "translate", "parameters": {"type": "object", "properties": {"text": {"type": "string"}, "target_lang": {"type": "string"}}}}}
    ]
  }'
```


## 效果展示
<div align=center>
    <img src="./doc/1.png"/>
</div>

### 精度

DCU 与 GPU 精度一致，推理框架：vllm。

## 预训练权重

| **模型名称**                    | **权重大小** | **DCU型号**   | **最低卡数需求** | **下载地址**                                                 |
| :-----------------------------: | :----------: | :-----------: | :--------------: | :----------------------------------------------------------: |
| Nemotron-3-Super-120B-A12B      | 120B         | BW1000 | 8                | [Hugging Face](https://www.google.com/search?q=https://huggingface.co/nvidia/nemotron-3-super-120b) |

## 源码仓库及问题反馈

- [https://developer.sourcefind.cn/codes/modelzoo/nemotron3_vllm](https://developer.sourcefind.cn/codes/modelzoo/nvidia-nemotron-3-super-120b-a12b-bf16_vllm)

## 参考资料

- [https://github.com/NVIDIA-NeMo/Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)