# NVIDIA-Nemotron-3-Super

## Paper

[NVIDIA Nemotron-3 Series Technical Report](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf)

## Model Overview

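Nemotron-3-Super is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like the other models in the series, it responds to a user query or task by first generating a reasoning trace and then giving the final response. The model's reasoning behavior can be configured through a flag in the chat template.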
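
A minimal sketch of how such a flag might be passed per request through the vLLM OpenAI-compatible server. The flag name `enable_thinking` is an assumption and should be confirmed against the chat template shipped with the weights; `build_payload` and `send_chat` are hypothetical helper names, while the served model name `nemotron` and the address `localhost:8000` come from the inference section of this README.

```shell
# Build a chat-completions request body with the reasoning flag set.
# NOTE: "enable_thinking" is an assumed flag name; confirm it against the
# chat template that ships with the model weights.
build_payload() {
  local thinking="$1"   # "true" or "false"
  local prompt="$2"     # plain text only: no JSON escaping is performed
  printf '{"model": "nemotron", "messages": [{"role": "user", "content": "%s"}], "chat_template_kwargs": {"enable_thinking": %s}}\n' \
    "$prompt" "$thinking"
}

# Send the request to a running vLLM server (defined here, not called).
send_chat() {
  build_payload "$1" "$2" | curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @-
}
```

For example, `send_chat false "Hello"` would request a reply without a reasoning trace, provided the template honors the assumed flag.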

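Architecturally, the model uses a hybrid Latent Mixture-of-Experts (LatentMoE) design that interleaves Mamba-2 layers, MoE layers, and a small number of selected attention layers. Unlike the Nano version, the Super model adds Multi-Token Prediction (MTP) layers, which improve text generation quality and substantially speed up generation. To maximize compute efficiency, the model was trained with NVFP4 quantization.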

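The model has 12B active parameters out of 120B total parameters. It currently supports multiple languages, including English, French, German, Italian, Japanese, Spanish, and Chinese. The model is ready for commercial use.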
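
As a rough sizing check on these figures, the sketch below estimates the BF16 weight footprint. It assumes 2 bytes per parameter, decimal gigabytes, and an even split across 8 cards (the tensor-parallel size used in the serve command in this README). It ignores KV cache, Mamba state, and activations, so real memory use will be higher.

```shell
total_params_b=120    # total parameters, in billions (from the text above)
active_params_b=12    # active parameters per token, in billions
bytes_per_param=2     # assumption: BF16 stores each parameter in 2 bytes
num_cards=8           # tensor-parallel size used in the serve command

# Billions of parameters times bytes per parameter gives decimal gigabytes.
weights_gb=$(( total_params_b * bytes_per_param ))
per_card_gb=$(( weights_gb / num_cards ))
active_pct=$(( 100 * active_params_b / total_params_b ))

echo "BF16 weights: ${weights_gb} GB total, ${per_card_gb} GB per card"
echo "Parameters active per token: ${active_pct}%"
```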

## Environment Dependencies

| **Software** |                    **Version**                     |
| :----------: | :------------------------------------------------: |
|     DTK      |                       26.04                        |
|    python    |                      3.10.12                       |
| transformers |                     5.2.0.dev0                     |
|     vllm     |           0.15.1+das.opt1.alpha.dtk2604            |
|    triton    | 3.3.0+das.opt2.dtk2604.torch291.20260210.g1329924c |
|    torch     |     22.9.0+das.opt1.dtk2604.20260206.g275d08c2     |
|    numpy     |                       1.26.1                       |

Only the following image is currently supported: `harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220`

- Adjust the `-v` mount paths to match where your model is actually stored.

```
docker run -it --shm-size 200g \
                --network=host \
                --name nemotron \
                --privileged \
                --device=/dev/kfd \
                --device=/dev/dri \
                --device=/dev/mkfd \
                --group-add video \
                --cap-add=SYS_PTRACE \
                --security-opt seccomp=unconfined \
                -u root \
                -v /opt/hyhal/:/opt/hyhal/:ro \
                harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220 bash
```
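
Once inside the container, a quick check that the paths passed to `docker run` above are actually visible can save debugging time later. The sketch below is only a sanity check on path existence, not a driver test; `missing_devices` is a hypothetical helper name.

```shell
# Print every path from the argument list that does not exist.
missing_devices() {
  for dev in "$@"; do
    [ -e "$dev" ] || echo "$dev"
  done
}

# Paths taken from the docker run command above.
missing=$(missing_devices /dev/kfd /dev/dri /dev/mkfd /opt/hyhal)
if [ -n "$missing" ]; then
  echo "Not visible in this environment:"
  echo "$missing"
else
  echo "All expected device nodes and mounts are present."
fi
```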
More images are available for download from [SourceFind (光源)](https://sourcefind.cn/#/service-list).

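This project depends on deep learning libraries built specifically for DCU accelerators, so the stock numpy and vllm packages must be replaced: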

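```
# Remove the stock packages (-y skips the confirmation prompt)
pip uninstall -y vllm
pip uninstall -y numpy
# Install the DCU build of vllm from a local wheel, then pin numpy
pip install vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl
pip install numpy==1.26.1
```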
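
After the replacement, it may be worth confirming that the installed versions match the dependency table. The sketch below compares the `Version:` field reported by `pip show` with the expected strings; `check_version` and `installed_version` are hypothetical helper names.

```shell
# Compare an installed version string against the expected one.
check_version() {
  local name="$1"
  local expected="$2"
  local actual="$3"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $name $actual"
  else
    echo "FAIL $name: expected $expected, found ${actual:-not installed}"
    return 1
  fi
}

# Read the installed version of a pip package (empty if it is missing).
installed_version() {
  pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
}

# Expected versions come from the dependency table in this README.
check_version numpy "1.26.1" "$(installed_version numpy)" || true
check_version vllm "0.15.1+das.opt1.alpha.dtk2604" "$(installed_version vllm)" || true
```

The `|| true` guards keep the script running so that every mismatch is reported in a single pass.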

## Dataset

Not available yet.

## Training

Not available yet.

## Inference

### vllm

#### Single-Node Inference

```
## Start the server
export VLLM_USE_NN=0
export VLLM_ENABLE_MOE_FUSED_GATE=0

vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
    --served-model-name nemotron \
    --dtype bfloat16 \
    --trust-remote-code \
    --mamba-ssm-cache-dtype float32 \
    --tensor-parallel-size 8 \
    --tool-call-parser qwen3_coder \
    --enable-auto-tool-choice \
    --reasoning-parser super_v3 \
    --reasoning-parser-plugin nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/super_v3_reasoning_parser.py

## Client request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nemotron",
    "messages": [
      {"role": "user", "content": "帮我查下北京天气,顺便把结果翻译成英文。"},
      {"role": "assistant", "tool_calls": [{"id": "chatcmpl-tool-a3ba5e50a56e4f3b", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}}]},
      {"role": "tool", "tool_call_id": "chatcmpl-tool-a3ba5e50a56e4f3b", "content": "{\"weather\": \"晴朗\", \"temperature\": \"25度\"}"}
    ],
    "tools": [
      {"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}},
      {"type": "function", "function": {"name": "translate", "parameters": {"type": "object", "properties": {"text": {"type": "string"}, "target_lang": {"type": "string"}}}}}
    ]
  }'
```
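
Loading a 120B model can take a while, so the client request above will fail if it is sent too early. A minimal sketch of a readiness wait, assuming the server listens on the default `localhost:8000` and exposes vLLM's `/health` endpoint; `wait_for_server` is a hypothetical helper name.

```shell
# Poll a URL until it answers with an HTTP success code,
# or give up once the timeout (in seconds) has elapsed.
wait_for_server() {
  local url="$1"
  local timeout="${2:-600}"
  local waited=0
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "server ready: $url"
}
```

For example, `wait_for_server http://localhost:8000/health 900 && curl -s http://localhost:8000/v1/models` waits up to 15 minutes and then lists the served model names.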


## Results
<div align=center>
    <img src="./doc/1.png"/>
</div>

### Accuracy

Accuracy on DCU matches accuracy on GPU. Inference framework: vllm.

## Pretrained Weights

| **Model Name**                         | **Weight Size** | **DCU Model** | **Minimum Cards Required** | **Download Link**                                            |
| :------------------------------------: | :-------------: | :-----------: | :------------------------: | :----------------------------------------------------------: |
| NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 120B            | BW1000        | 8                          | [Hugging Face](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) |

## Source Repository and Issue Feedback

- [https://developer.sourcefind.cn/codes/modelzoo/nvidia-nemotron-3-super_vllm](https://developer.sourcefind.cn/codes/modelzoo/nvidia-nemotron-3-super_vllm)

## References

- [https://github.com/NVIDIA-NeMo/Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)