Commit 2a934cec authored by raojy

first

parent 4b618aa3
# SenseNova-U1
## Paper
[SenseNova-U1](https://arxiv.org/abs/2605.12500)
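## Model overview
A 16-billion-parameter MoE (mixture-of-experts) unified diffusion large language model released by Inclusion AI. It is built on a masked-token-prediction paradigm that covers both multimodal understanding and generation. The SigLIP-VQ visual tokenizer provides efficient visual encoding, and a distilled diffusion decoder produces high-resolution images in only 8 steps. Supported tasks include text-to-image generation, image-text understanding, instruction-based image editing, and generation with reasoning. The SPRINT inference acceleration scheme is included to speed up inference. The model is released under the Apache 2.0 license, and loading the full model weights is sufficient to run all of these multimodal tasks.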
<div align=center>
<img src="./doc/1.png"/>
</div>
## Environment dependencies
| Software | Version |
| :------: |:-----------------------------------------:|
| DTK | 26.04 |
| Python | 3.11.9 |
| Transformers | 4.57.1 |
| Torch | 2.5.1+das.opt1.dtk2604 |
| Flash_attn | 2.8.3+das.opt1.dtk2604.torch251 |
Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-nova
```bash
docker run -it \
--shm-size 256g \
--network=host \
--name nova \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mkfd \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-nova bash
```
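More images are available for download from [光源](https://sourcefind.cn/#/service-list).
The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.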
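As a quick sanity check inside the container, the versions in the table above can be compared against what is installed. The following is a minimal sketch, not part of the repository: the distribution names (`transformers`, `torch`, `flash_attn`) are assumptions about how the packages are registered in the image, and the helper names are hypothetical.

```python
from importlib import metadata

# Expected versions copied from the dependency table. The distribution
# names are assumptions; adjust them if the image registers them differently.
EXPECTED = {
    "transformers": "4.57.1",
    "torch": "2.5.1+das.opt1.dtk2604",
    "flash_attn": "2.8.3+das.opt1.dtk2604.torch251",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        report[name] = (want, found, found == want)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} [{status}]")
```

A mismatch is not necessarily an error, but it is a useful first thing to rule out when results differ from the ones reported here.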
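## Pretrained weights
**Choose the model to download according to the `Supported DCU models` column. FP8 models are supported only on BW1100/BW1101; do not use them on other models!**
| Model name | Weight size | Data type | Supported DCU models | Minimum number of cards | Download link |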
|:------:|:----:|:----:|:----------:|:------:|:---------------------:|
| SenseNova-U1-8B-MoT | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-U1-8B-MoT) |
## Dataset
Not available yet
## Training
Not available yet
## Inference
### Transformers
#### Single-node inference
##### BF16
##### Visual understanding
```
python examples/vqa/inference.py --model_path sensenova/SenseNova-U1-8B-MoT --image examples/vqa/data/images/menu.jpg --question "My friend and I are dining together tonight. Looking at this menu, can you recommend a good combination of dishes for 2 people? We want a balanced meal — a mix of mains and maybe a starter or dessert. Budget-conscious but want to try the highlights." --output outputs/answer.txt --max_new_tokens 8192 --do_sample --temperature 0.6 --top_p 0.95 --top_k 20 --repetition_penalty 1.05 --profile
```
##### Text-to-image
```
python examples/t2i/inference.py --model_path sensenova/SenseNova-U1-8B-MoT --prompt "这张信息图的标题是“SenseNova-U1”,采用现代极简科技矩阵风格。整体布局为水平三列网格结构,背景是带有极浅银灰色细密点阵的哑光纯白高级纸张纹理,画面长宽比为16:9。\n\n排版采用严谨的视觉层级:主标题使用粗体无衬线黑体字,正文使用清晰的现代等宽字体。配色方案极其克制,以纯白色为底,深炭黑为主视觉文字和边框,浅石板灰用于背景色块和次要信息区分,图标采用精致的银灰色线框绘制。\n\n在画面正上方居中位置,使用醒目的深炭黑粗体字排布着大标题“SenseNova-U1”。标题正下方是浅石板灰色的等宽字体副标题“新一代端到端统一多模态大模型家族”。\n\n画面主体分为左、中、右三个相等的垂直信息区块,区块之间通过充足的负空间进行物理隔离。\n\n左侧区块的主题是概述。顶部有一个银灰色线框绘制的、由放大镜和齿轮交织的图标,旁边是粗体小标题“Overview”。该区块内从上到下垂直排列着三个要点:第一个要点旁边是一个代表文档与照片重叠的极简图标,紧跟着文字“多模态模型家族,统一文本/图像理解和生成”。向下是由两个相连的同心圆组成的架构图标,配有文字“基于NEO-Unify架构(端到端统一理解和生成)”。最下方是一个带有斜线划掉的眼睛和漏斗形状的图标,明确指示文本“无需视觉编码器(VE)和变分自编码器(VAE)”。\n\n中间区块展示模型矩阵。顶部是一个包含两个分支节点的树状网络图标,旁边是粗体小标题“两个模型规格”。区块内分为上下两个包裹在浅石板灰色极细边框内的卡片。上方的卡片内画着一个代表高密度的实心几何立方体图标,大字标注“SenseNova-U1-8B-MoT”,下方是等宽字体说明“8B MoT 密集主干模型”。下方的卡片内画着一个带有闪电符号的网状发光大脑图标,大字标注“SenseNova-U1-A3B-MoT”,下方是等宽字体说明“A3B MoT 混合专家(MoE)主干模型”。在这两个独立卡片的正下方,左侧放置一个笑脸轮廓图标搭配文字“将在HF等平台公开”,右侧放置一个带有折角的书面报告图标搭配文字“将发布技术报告”。\n\n右侧区块呈现核心优势。顶部是一个代表巅峰的上升阶梯折线图图标,旁边是粗体小标题“Highlights”。该区块内部垂直分布着四个带有浅石板灰底色的长方形色块,每个色块内部左侧对应一个具体的图标,右侧为文字。第一个色块内是一个无缝相连的莫比乌斯环图标,配文“原生统一架构,无VE和VAE”。第二个色块内是一个顶端带有星星的奖杯图标,配文“单一统一模型在理解和生成任务上均达到SOTA性能”。第三个色块内是代表文本行与拍立得照片交替穿插的图标,配文“强大的原生交错推理能力(模型原生生成图像进行推理)”。最后一个色块内是一个被切分出一小块的硬币与详细饼状图结合的图标,配文“能生成复杂信息图表,性价比出色”。" --width 2720 --height 1536 --cfg_scale 4.0 --cfg_norm none --timestep_shift 3.0 --num_steps 50 --output output.png --profile
```
##### Image editing
```
python examples/editing/inference.py --model_path sensenova/SenseNova-U1-8B-MoT --prompt "Change the animal's fur color to a darker shade." --image examples/editing/data/images/1.webp --cfg_scale 4.0 --img_cfg_scale 1.0 --cfg_norm none --timestep_shift 3.0 --num_steps 50 --output output_edited.png --profile --compare
```
##### Interleaved image-text generation
```
python examples/interleave/inference.py --model_path sensenova/SenseNova-U1-8B-MoT --prompt "I want to learn how to cook tomato and egg stir-fry. Please give me a beginner-friendly illustrated tutorial." --resolution "16:9" --output_dir outputs/interleave/ --stem demo --profile
```
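When the text-to-image command is launched from a script rather than typed by hand, building the argument list in Python avoids shell-quoting problems with long prompts. The sketch below only assembles the flags already shown for `examples/t2i/inference.py`; the helper name `build_t2i_command` and its defaults are hypothetical, and the lower step count in the usage example is just an assumption for a quick trial run.

```python
import shlex
import sys


def build_t2i_command(prompt, output="output.png", width=2720, height=1536, num_steps=50):
    """Assemble the argv list for examples/t2i/inference.py from the flags used in this README."""
    return [
        sys.executable,
        "examples/t2i/inference.py",
        "--model_path", "sensenova/SenseNova-U1-8B-MoT",
        "--prompt", prompt,
        "--width", str(width),
        "--height", str(height),
        "--cfg_scale", "4.0",
        "--cfg_norm", "none",
        "--timestep_shift", "3.0",
        "--num_steps", str(num_steps),
        "--output", output,
    ]


if __name__ == "__main__":
    cmd = build_t2i_command("a red bicycle leaning against a brick wall", num_steps=2)
    # Print a copy-pasteable shell line. To actually run it from the
    # repository root, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run` keeps the prompt as a single argument, so quotes and newlines inside it need no escaping.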
## Results
<div align=center>
<img src="./doc/ou0.png"/>
</div>
<div align=center>
<img src="./doc/output1.png"/>
</div>
<div align=center>
<img src="./doc/33.png"/>
</div>
### Accuracy
Accuracy on DCU is consistent with GPU. Inference framework: PyTorch.
## Source repository and issue feedback
- https://developer.sourcefind.cn/codes/modelzoo/sensenova-u1
## References
- https://github.com/OpenSenseNova/SenseNova-U1
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.14.4
    hooks:
      - id: ruff
        name: ruff check (import sorting)
        args: ["--select", "I", "--fix", "--exit-non-zero-on-fix"]
      - id: ruff-format
        name: ruff format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: pretty-format-json
        name: format ComfyUI example workflows
        args: ["--autofix", "--indent=2", "--no-sort-keys", "--no-ensure-ascii"]
        files: ^apps/comfyui/example_workflows/.*\.json$
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Runs in the published mirror repo (OpenSenseNova/ComfyUI-SenseNova-U1) after
# the monorepo's `publish-comfyui.yml` pushes a fresh subtree + tag here.
# It does not run in the SenseNova-U1 monorepo because the trigger tag pattern
# `v*.*.*` is rewritten from the monorepo's `comfyui-v*.*.*` tags during sync.
name: Publish to Comfy registry
on:
  workflow_dispatch:
  push:
    tags:
      - "v*.*.*"
permissions:
  contents: read
  issues: write
jobs:
  publish-node:
    name: Publish Custom Node to registry
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Publish Custom Node
        uses: Comfy-Org/publish-node-action@v1
        with:
          personal_access_token: ${{ secrets.REGISTRY_ACCESS_TOKEN }}
# SenseNova-U1 for ComfyUI
ComfyUI custom nodes for SenseNova-U1 API and local inference.
> Source of truth lives in [`OpenSenseNova/SenseNova-U1`](https://github.com/OpenSenseNova/SenseNova-U1)
> under `apps/comfyui/`. The standalone repo
> [`OpenSenseNova/ComfyUI-SenseNova-U1`](https://github.com/OpenSenseNova/ComfyUI-SenseNova-U1)
> is a read-only publish mirror used by Comfy Registry; please open PRs against
> the monorepo.
> Requires a ComfyUI build that ships the v3 node API (`comfy_api.latest`). The nodes are registered through `comfy_entrypoint()`; older ComfyUI installs that only support the v1 `NODE_CLASS_MAPPINGS` registration will not load them.
## Nodes
- `SenseNova Image Generate`: calls the U1-Fast image API.
- `SenseNova Chat`, `SenseNova Vision URL`, `SenseNova Vision Image`: utility API nodes.
- `SenseNova Prompt Builder`: rewrites raw ideas into image-generation prompts.
- `SenseNova U1 Local Loader`: loads a local or HuggingFace SenseNova-U1 checkpoint.
- `SenseNova U1 Local Text to Image`: runs local `t2i_generate`.
- `SenseNova U1 Local Image Edit`: runs local `it2i_generate`.
- `SenseNova U1 Local Interleave`: runs local `interleave_gen`.
- `SenseNova Interleave Preview`: renders ordered interleaved text / image results.
## Install
### Recommended (end users): ComfyUI Manager / Comfy Registry
Search for **SenseNova-U1** in ComfyUI Manager, or:
```bash
comfy node install ComfyUI-SenseNova-U1
```
This pulls the latest published release from https://registry.comfy.org and
installs the declared dependencies (including the `sensenova-u1` Python package
needed for local inference) into ComfyUI's Python environment automatically.
Restart ComfyUI afterwards.
### Developer install (from the SenseNova-U1 monorepo)
If you're hacking on the nodes alongside the model source:
```bash
python apps/comfyui/install.py --comfyui /path/to/ComfyUI
python -m pip install -r apps/comfyui/requirements.txt --no-deps # skip the git-URL line
python -m pip install -e . # install sensenova-u1 from src/
```
`install.py` symlinks (or copies, with `--copy`) `apps/comfyui/` into
`<ComfyUI>/custom_nodes/ComfyUI-SenseNova-U1`. Restart ComfyUI after installation.
**Source path auto-discovery is location-bound to the symlink.** In the
default symlink mode, `local_pipeline.default_source_path()` resolves
`__file__` through the symlink and uses `<repo>/src/` if it sees the file
sitting under `apps/comfyui/` — no `SENSENOVA_U1_SRC` needed. If you move,
rename, or delete the monorepo checkout, the link breaks; re-run
`install.py` to recreate it. With `--copy`, the files no longer point back
to the repo, so set `SENSENOVA_U1_SRC=/path/to/SenseNova-U1/src` (or fill
the loader node's `sensenova_u1_src` input) yourself.
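The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `local_pipeline`: `discover_source_path` is a hypothetical stand-in for `default_source_path()`, and the exact checks the real function performs may differ.

```python
import os
from pathlib import Path


def discover_source_path(node_file, env=None):
    """Guess the `src/` directory for a node file; return None when it cannot be determined.

    Hypothetical stand-in for `local_pipeline.default_source_path()`.
    """
    env = os.environ if env is None else env
    real = Path(node_file).resolve()  # follows the custom_nodes symlink, if any
    pkg_dir = real.parent
    # Symlink mode: the resolved file sits under <repo>/apps/comfyui/.
    if pkg_dir.name == "comfyui" and pkg_dir.parent.name == "apps":
        candidate = pkg_dir.parent.parent / "src"
        if candidate.is_dir():
            return candidate
    # Copy mode (or a broken link): fall back to the explicit override.
    override = env.get("SENSENOVA_U1_SRC")
    return Path(override) if override else None
```

Calling `discover_source_path(__file__)` from a symlinked node file would return `<repo>/src`, while the same call from a copied install returns the `SENSENOVA_U1_SRC` value or `None`.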
## Workflows
Example workflows live in `example_workflows/`. Each links to a screenshot of the loaded graph in `docs/`:
| Workflow | Description | Preview |
| --- | --- | --- |
| `api_u1_fast_t2i.json` | API U1-Fast text-to-image | ![api_u1_fast_t2i](docs/api_u1_fast_t2i.jpg) |
| `local_t2i.json` | Local SenseNova-U1 text-to-image | ![t2i](docs/t2i.jpg) |
| `local_editing.json` | Local SenseNova-U1 image editing | ![editing](docs/editing.jpg) |
| `local_interleave.json` | Local SenseNova-U1 interleaved generation | ![interleave](docs/interleave.jpg) |
Drag a workflow JSON into ComfyUI, then update `model_path`, `device`, `device_map`, and prompt
settings as needed. For a smoke test, set `num_steps` to `1` or `2` before returning to the
recommended `50`.
## API Environment
API nodes read credentials from environment variables or `.env`:
```bash
export SN_API_KEY="your-api-token"
export SN_BASE_URL="https://token.sensenova.cn/v1"
```
Tokens are not exposed as node inputs, so they are not saved into ComfyUI workflows.
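For readers wiring up their own scripts, here is a minimal sketch of reading the same two variables with a `.env` fallback. It is not the repository's `load_config`: the function names are hypothetical, and both the precedence (real environment variables override `.env`) and the default base URL are assumptions made for the example.

```python
import os
from pathlib import Path


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments and allowing an `export ` prefix."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_credentials(env=None, dotenv_path=".env"):
    """Return (api_key, base_url). Assumption: environment variables win over `.env` entries."""
    env = dict(os.environ if env is None else env)
    path = Path(dotenv_path)
    file_values = parse_dotenv(path.read_text(encoding="utf-8")) if path.is_file() else {}
    merged = {**file_values, **env}
    api_key = merged.get("SN_API_KEY", "")
    if not api_key:
        raise RuntimeError("SN_API_KEY is not set in the environment or .env")
    # Assumed default, taken from the example export line.
    base_url = merged.get("SN_BASE_URL", "https://token.sensenova.cn/v1")
    return api_key, base_url
```

Keeping the token out of function arguments that end up in saved files mirrors the design choice described for the nodes.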
## GGUF Quantized Checkpoints
The `SenseNova U1 Local Loader` exposes an optional `gguf_checkpoint` dropdown
populated from `<comfyui>/models/gguf/` and the stock ComfyUI
`<comfyui>/models/diffusion_models/` folder (the default location used by
ComfyUI-GGUF style distributions). When a file is selected, weights are loaded
through `diffusers`' GGUF quantizer (dequantizing `nn.Linear` -> `GGUFLinear`)
instead of safetensors; config and tokenizer still come from `model_path`. The
default empty selection keeps the safetensors path.
Drop your `.gguf` file into either folder and restart ComfyUI to refresh the
dropdown.
Requirements: install the `gguf` extra in the ComfyUI Python environment, e.g.
```bash
python -m pip install -e ".[gguf]" # from this repo, or
python -m pip install "gguf>=0.10.0" "diffusers>=0.30.0"
```
`gguf_checkpoint` cannot be combined with a non-`none` `device_map` — pick one.
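The mutual-exclusion rule can be expressed as a small pre-flight check. This is a sketch only: `validate_loader_options` is a hypothetical helper, and treating an empty string or the literal `"none"` as "not set" is an assumption about how the loader node encodes its inputs.

```python
def validate_loader_options(gguf_checkpoint, device_map):
    """Return which weight path would be used, or raise if the options conflict.

    Hypothetical helper; an empty selection and the string "none" are
    assumed to mean "not set".
    """
    uses_gguf = bool(gguf_checkpoint and gguf_checkpoint.strip())
    uses_device_map = bool(device_map) and device_map.strip().lower() != "none"
    if uses_gguf and uses_device_map:
        raise ValueError(
            "gguf_checkpoint cannot be combined with a non-'none' device_map; pick one"
        )
    return "gguf" if uses_gguf else "safetensors"
```

Running a check like this before loading fails fast, instead of after a large checkpoint has been partly read.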
## Notes On Samplers
Local U1 generation uses the sampling loop implemented by `t2i_generate`, `it2i_generate`, and
`interleave_gen`. It does not directly plug into ComfyUI's `KSampler` / latent model interface.
You can still reuse ComfyUI image IO and post-processing nodes around these U1 nodes.
# Register `<comfyui>/models/gguf` as a model folder before nodes.py is imported,
# so SenseNovaU1LocalLoader's `gguf_checkpoint` dropdown can be populated via
# folder_paths.get_filename_list("gguf"). Tolerant when folder_paths isn't
# importable (e.g. running tests outside ComfyUI).
try:
    import os as _os

    import folder_paths as _folder_paths

    _gguf_dir = _os.path.join(_folder_paths.models_dir, "gguf")
    _existing = _folder_paths.folder_names_and_paths.get("gguf")
    if _existing is None:
        _folder_paths.folder_names_and_paths["gguf"] = ([_gguf_dir], {".gguf"})
    else:
        _paths, _exts = _existing
        if _gguf_dir not in _paths:
            _paths.append(_gguf_dir)
        _exts.add(".gguf")
except Exception:  # pragma: no cover - non-ComfyUI env or registration race
    pass
try:
    from .nodes import comfy_entrypoint
except ImportError:  # pragma: no cover - supports direct pytest collection
    from nodes import comfy_entrypoint
# ComfyUI auto-loads every JS file under this directory as a frontend extension.
# Used to render `ui.text` produced by SenseNovaInterleavePreview, which the
# stock frontend does not display on the node itself.
WEB_DIRECTORY = "./web"
__all__ = ["comfy_entrypoint", "WEB_DIRECTORY"]
from __future__ import annotations
import json
import time
from dataclasses import dataclass
from typing import Any
import httpx
try:
    from .config import SenseNovaConfig, load_config
    from .image_utils import MAX_IMAGE_BYTES, is_http_url, is_supported_vision_image_url
except ImportError:  # pragma: no cover - supports direct test imports
    from config import SenseNovaConfig, load_config
    from image_utils import MAX_IMAGE_BYTES, is_http_url, is_supported_vision_image_url
CHAT_MODELS = ("sensenova-6.7-flash-lite", "deepseek-v4")
VISION_MODELS = ("sensenova-6.7-flash-lite",)
IMAGE_MODELS = ("sensenova-u1-fast",)
IMAGE_SIZES = (
    "2752x1536",
    "1536x2752",
    "2048x2048",
    "2496x1664",
    "1664x2496",
    "2368x1760",
    "1760x2368",
    "2272x1824",
    "1824x2272",
    "3072x1376",
    "1344x3136",
)
IMAGE_SIZE_OPTIONS = (
    "2752x1536|16:9",
    "1536x2752|9:16",
    "2048x2048|1:1",
    "2496x1664|3:2",
    "1664x2496|2:3",
    "2368x1760|4:3",
    "1760x2368|3:4",
    "2272x1824|5:4",
    "1824x2272|4:5",
    "3072x1376|21:9",
    "1344x3136|9:21",
)


@dataclass(frozen=True)
class ChatResult:
    text: str
    usage: dict[str, Any]
    raw: dict[str, Any]


@dataclass(frozen=True)
class ImageGenerationResult:
    image_base64: str
    image_url: str
    image_bytes: bytes
    raw: dict[str, Any]


class SenseNovaClient:
    def __init__(self, config: SenseNovaConfig):
        self.config = config

    @classmethod
    def from_env(cls) -> SenseNovaClient:
        return cls(load_config())

    def chat(
        self,
        *,
        text: str,
        system_prompt: str,
        model: str,
        temperature: float,
        top_p: float,
        max_tokens: int,
        timeout: int,
    ) -> ChatResult:
        if model not in CHAT_MODELS:
            raise RuntimeError(f"Unsupported chat model: {model}")
        if not text.strip():
            raise RuntimeError("Chat text cannot be empty.")
        payload: dict[str, Any] = {
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
            "stream": False,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
        }
        raw = self._post_json("/chat/completions", payload, timeout=timeout)
        return ChatResult(text=_extract_chat_text(raw), usage=raw.get("usage", {}), raw=raw)

    def vision_chat(
        self,
        *,
        image_url: str,
        prompt: str,
        system_prompt: str,
        model: str,
        temperature: float,
        top_p: float,
        max_tokens: int,
        timeout: int,
    ) -> ChatResult:
        if model not in VISION_MODELS:
            raise RuntimeError(f"Unsupported vision model: {model}")
        if not prompt.strip():
            raise RuntimeError("Vision prompt cannot be empty.")
        if not is_supported_vision_image_url(image_url):
            raise RuntimeError("Vision image URL must be http(s) or a base64 image data URL.")
        payload: dict[str, Any] = {
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": image_url}},
                    ],
                },
            ],
            "stream": False,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
        }
        raw = self._post_json("/chat/completions", payload, timeout=timeout)
        return ChatResult(text=_extract_chat_text(raw), usage=raw.get("usage", {}), raw=raw)

    def generate_image(
        self,
        *,
        prompt: str,
        model: str,
        size: str,
        timeout: int,
    ) -> ImageGenerationResult:
        if model not in IMAGE_MODELS:
            raise RuntimeError(f"Unsupported image model: {model}")
        normalized_size = normalize_image_size(size)
        if normalized_size not in IMAGE_SIZES:
            raise RuntimeError(f"Unsupported image size: {size}")
        if not prompt.strip():
            raise RuntimeError("Image prompt cannot be empty.")
        payload: dict[str, Any] = {
            "model": model,
            "prompt": prompt,
            "size": normalized_size,
            "n": 1,
        }
        raw = self._post_json("/images/generations", payload, timeout=timeout)
        image_base64, image_url = _extract_image_payload(raw)
        image_bytes = b""
        if image_base64:
            import base64

            try:
                from .image_utils import strip_data_url
            except ImportError:  # pragma: no cover - supports direct test imports
                from image_utils import strip_data_url
            image_bytes = base64.b64decode(strip_data_url(image_base64), validate=True)
        elif image_url:
            image_bytes = self.download_image(image_url, timeout=timeout)
        else:
            raise RuntimeError("Image response did not contain b64_json, base64, or url.")
        return ImageGenerationResult(
            image_base64=image_base64,
            image_url=image_url,
            image_bytes=image_bytes,
            raw=raw,
        )

    def download_image(self, url: str, *, timeout: int) -> bytes:
        if not is_http_url(url):
            raise RuntimeError("Image URL must use http or https.")
        try:
            with (
                httpx.Client(timeout=timeout, follow_redirects=True) as client,
                client.stream("GET", url) as response,
            ):
                response.raise_for_status()
                chunks: list[bytes] = []
                total = 0
                for chunk in response.iter_bytes():
                    total += len(chunk)
                    if total > MAX_IMAGE_BYTES:
                        raise RuntimeError("Downloaded image is larger than 50MB.")
                    chunks.append(chunk)
                return b"".join(chunks)
        except httpx.HTTPStatusError as exc:
            status_code = exc.response.status_code
            raise RuntimeError(f"Image download failed with HTTP {status_code}.") from exc
        except httpx.HTTPError as exc:
            raise RuntimeError(f"Image download failed: {exc.__class__.__name__}.") from exc

    def _post_json(self, path: str, payload: dict[str, Any], *, timeout: int) -> dict[str, Any]:
        url = f"{self.config.base_url}{path}"
        headers = {
            "Authorization": f"Bearer {self.config.api_key}",
            "Content-Type": "application/json",
        }
        last_error: Exception | None = None
        for attempt in range(3):
            try:
                with httpx.Client(timeout=timeout) as client:
                    response = client.post(url, headers=headers, json=payload)
                if response.status_code in {429, 500, 502, 503, 504} and attempt < 2:
                    time.sleep(2**attempt)
                    continue
                response.raise_for_status()
                return response.json()
            except httpx.HTTPStatusError as exc:
                status_code = exc.response.status_code
                if status_code in {429, 500, 502, 503, 504} and attempt < 2:
                    time.sleep(2**attempt)
                    last_error = exc
                    continue
                raise RuntimeError(_format_api_error(exc.response, self.config.api_key)) from exc
            except httpx.HTTPError as exc:
                if attempt < 2:
                    time.sleep(2**attempt)
                    last_error = exc
                    continue
                raise RuntimeError(f"SenseNova request failed: {exc.__class__.__name__}.") from exc
            except json.JSONDecodeError as exc:
                raise RuntimeError("SenseNova response was not valid JSON.") from exc
        raise RuntimeError(f"SenseNova request failed: {last_error.__class__.__name__}.")


def _extract_chat_text(raw: dict[str, Any]) -> str:
    try:
        return raw["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise RuntimeError("Chat response did not contain choices[0].message.content.") from exc


def normalize_image_size(size: str) -> str:
    return size.split("|", 1)[0].strip()


def _extract_image_payload(raw: dict[str, Any]) -> tuple[str, str]:
    try:
        first = raw["data"][0]
    except (KeyError, IndexError, TypeError) as exc:
        raise RuntimeError("Image response did not contain data[0].") from exc
    if not isinstance(first, dict):
        raise RuntimeError("Image response data[0] was not an object.")
    image_base64 = first.get("b64_json") or first.get("base64") or first.get("image_base64") or ""
    image_url = first.get("url") or ""
    return str(image_base64), str(image_url)


def _format_api_error(response: httpx.Response, api_key: str = "") -> str:
    message = ""
    try:
        body = response.json()
        message = body.get("error", {}).get("message") or body.get("message") or ""
    except Exception:
        message = response.text[:500]
    if message:
        return f"SenseNova API error HTTP {response.status_code}: {_redact(message, api_key)}"
    return f"SenseNova API error HTTP {response.status_code}."


def _redact(value: str, api_key: str = "") -> str:
    redacted = value.replace("Bearer ", "Bearer [REDACTED] ")
    if api_key:
        redacted = redacted.replace(api_key, "[REDACTED]")
    return redacted
from __future__ import annotations
import os
from dataclasses import dataclass
from dotenv import load_dotenv
DEFAULT_BASE_URL = "https://token.sensenova.cn/v1"
API_KEY_ENV = "SN_API_KEY"
BASE_URL_ENV = "SN_BASE_URL"


@dataclass(frozen=True)
class SenseNovaConfig:
    api_key: str
    base_url: str = DEFAULT_BASE_URL


def load_config(*, load_env_file: bool = True) -> SenseNovaConfig:
    if load_env_file:
        load_dotenv()
    api_key = os.getenv(API_KEY_ENV, "").strip()
    if not api_key:
        raise RuntimeError(f"Missing {API_KEY_ENV}. Set it in your environment or in a local .env file.")
    base_url = os.getenv(BASE_URL_ENV, DEFAULT_BASE_URL).strip().rstrip("/")
    if not base_url:
        base_url = DEFAULT_BASE_URL
    return SenseNovaConfig(api_key=api_key, base_url=base_url)
{
"id": "03c13bce-fb68-4284-bd5a-88aed2db6cec",
"revision": 0,
"last_node_id": 8,
"last_link_id": 7,
"nodes": [
{
"id": 3,
"type": "SenseNovaImageGenerate",
"pos": [
224.37622016118027,
254.75802785642676
],
"size": [
400,
240
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "prompt",
"type": "STRING",
"widget": {
"name": "prompt"
},
"link": 6
}
],
"outputs": [
{
"name": "images",
"type": "IMAGE",
"links": [
7
]
},
{
"name": "image_base64",
"type": "STRING",
"links": null
},
{
"name": "image_url",
"type": "STRING",
"links": null
},
{
"name": "raw_json",
"type": "STRING",
"links": []
},
{
"name": "image_info",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaImageGenerate"
},
"widgets_values": [
"",
"sensenova-u1-fast",
"2752x1536|16:9",
300
]
},
{
"id": 8,
"type": "PreviewImage",
"pos": [
663.0133442095159,
255.6695527433532
],
"size": [
140,
246
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 7
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 7,
"type": "SenseNovaPromptBuilder",
"pos": [
-201.9358334262903,
356.0050478984727
],
"size": [
400,
302
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "prompt",
"type": "STRING",
"links": [
6
]
},
{
"name": "usage_json",
"type": "STRING",
"links": null
},
{
"name": "raw_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaPromptBuilder"
},
"widgets_values": [
"如何养猫咪",
"You are a world-renowned \"Senior Visual Information Architect\" and \"AI Image Prompt Engineering Expert.\" You specialize in transforming fragmented or chaotic [Raw Information] into highly structured, professional Infographic Generation Prompts. Your work is defined by rigorous visual logic, precise spatial organization, and a dense amount of useful information.\n\n# Task\nReconstruct the user's [Raw Information] into a comprehensive visual synthesis prompt. Your objective is to guide an image generation model to render an information-dense infographic with advanced typography, vivid visual style, and clear structure based only on the user's text.\n\n# Step-by-Step Methodology\n1. Content Expansion and Textualization: Analyze the [Raw Information] to extract its core intent.\n - Detailing: Extract every entity, number, color, date, and phrase from the [Raw Information]. Do not omit provided facts.\n - Categorization: Define sub-categories with distinct visual markers.\n - Density Enrichment: If the input is brief, add professional annotations, sub-headings, body text, key insights, or practical notes that fit the topic.\n2. Adaptive Structural Analysis:\n - User-Defined Priority: If the user provides layout instructions, strictly follow them.\n - Logic-Driven Inference: If no layout is specified, infer whether the content is chronological, hierarchical, process-oriented, comparative, or modular, then choose the best spatial architecture.\n3. Style Tonal Setting: If no style is provided, assign an aesthetic that complements the topic, such as modern editorial infographic, technical blueprint, clean SaaS dashboard, or hand-drawn knowledge poster.\n4. Data Preservation and Encoding: Preserve all numbers, dates, colors, and proper nouns exactly. Convert them into explicit visual labels, charts, or callouts within the prompt.\n5. Language Parity: Detect the language of the [Raw Information] and use that language for the entire output. 
If the input is Chinese, output Chinese. If the input is English, output English. Do not mix languages.\n\n# Strict Constraints\n1. Do not include introductory, summary, or meta-commentary text. Start directly with the final visual prompt.\n2. Every piece of text intended to appear in the image must be enclosed in quotation marks.\n3. Do not use quotation marks for style descriptions, layout descriptions, colors, or non-textual elements.\n4. Describe the layout, reading order, background texture, visual hierarchy, typography, color palette, icons, charts, and callouts explicitly.\n5. Describe every icon semantically. Do not write generic phrases like \"an icon\" without specifying its visual content.\n6. Minimize arrows unless the user asks for them. Prefer alignment, grouping, proximity, and numbered steps to show relationships.\n7. Do not use hexadecimal color codes. Use descriptive color names.\n8. Do not invent factual numbers, dates, names, or claims that conflict with the user's input.\n\n# Output\nReturn only the final image generation prompt. The prompt should be directly usable as input to an image generation node.",
"sensenova-6.7-flash-lite",
0.3,
1,
2048,
120
]
}
],
"links": [
[
6,
7,
0,
3,
0,
"STRING"
],
[
7,
3,
0,
8,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"workflowRendererVersion": "LG",
"ds": {
"scale": 1.8954988944389861,
"offset": [
329.6590232185344,
-134.93740305896515
]
},
"frontendVersion": "1.39.19"
},
"version": 0.4
}
{
"id": "8a36c97b-03db-45f4-9154-0e59b7f61366",
"revision": 0,
"last_node_id": 6,
"last_link_id": 4,
"nodes": [
{
"id": 1,
"type": "SenseNovaU1LocalLoader",
"pos": [
-622.3850180121219,
-19.147734766664918
],
"size": [
504,
432
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "u1_model",
"type": "SENSENOVA_U1_LOCAL_MODEL",
"links": [
1
]
},
{
"name": "model_info_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaU1LocalLoader"
},
"widgets_values": [
"sensenova/SenseNova-U1-8B-MoT",
"",
"cuda",
"bfloat16",
"auto",
"none",
"",
"full",
""
]
},
{
"id": 2,
"type": "LoadImage",
"pos": [
-620.497673660328,
480.7228565428773
],
"size": [
501.1875,
646.90625
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
2
]
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"1.webp",
"image"
]
},
{
"id": 3,
"type": "SenseNovaU1LocalImageEdit",
"pos": [
1.2299845460901224,
146.87837744855761
],
"size": [
528,
639.65625
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "u1_model",
"type": "SENSENOVA_U1_LOCAL_MODEL",
"link": 1
},
{
"name": "image",
"type": "IMAGE",
"link": 2
}
],
"outputs": [
{
"name": "images",
"type": "IMAGE",
"links": [
3
]
},
{
"name": "text",
"type": "STRING",
"links": null
},
{
"name": "think_text",
"type": "STRING",
"links": [
4
]
},
{
"name": "metadata_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaU1LocalImageEdit"
},
"widgets_values": [
"Change the jacket of the person on the left to bright yellow.",
true,
2048,
2048,
4.194304,
4,
1,
"none",
3,
0,
1,
50,
1,
42,
"fixed",
false
]
},
{
"id": 4,
"type": "PreviewImage",
"pos": [
663.2786584907477,
-37.692689060801655
],
"size": [
464.25,
746.078125
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 3
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 6,
"type": "PreviewAny",
"pos": [
666.6133339501529,
809.4684559313681
],
"size": [
464.015625,
208.015625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 4
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewAny"
},
"widgets_values": [
null,
null,
null
]
}
],
"links": [
[
1,
1,
0,
3,
0,
"SENSENOVA_U1_LOCAL_MODEL"
],
[
2,
2,
0,
3,
1,
"IMAGE"
],
[
3,
3,
0,
4,
0,
"IMAGE"
],
[
4,
3,
2,
6,
0,
"STRING"
]
],
"groups": [],
"config": {},
"extra": {
"workflowRendererVersion": "Vue",
"ds": {
"scale": 0.7149221062580672,
"offset": [
1225.7407146168464,
100.87314475139422
]
},
"frontendVersion": "1.39.19",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
{
"id": "a92af27a-0106-4c6f-9d1c-f9783b652f44",
"revision": 0,
"last_node_id": 7,
"last_link_id": 5,
"nodes": [
{
"id": 1,
"type": "SenseNovaU1LocalLoader",
"pos": [
-586.4999290408795,
148.5000079099019
],
"size": [
500.234375,
390.625
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "u1_model",
"type": "SENSENOVA_U1_LOCAL_MODEL",
"links": [
1
]
},
{
"name": "model_info_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaU1LocalLoader"
},
"widgets_values": [
"sensenova/SenseNova-U1-8B-MoT-Infographic",
"",
"cuda",
"bfloat16",
"auto",
"none",
"",
"full",
""
]
},
{
"id": 2,
"type": "SenseNovaU1LocalTextToImage",
"pos": [
-19.2498760630192,
149.75002977542277
],
"size": [
565.953125,
887.203125
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "u1_model",
"type": "SENSENOVA_U1_LOCAL_MODEL",
"link": 1
},
{
"name": "prompt",
"type": "STRING",
"widget": {
"name": "prompt"
},
"link": 5
}
],
"outputs": [
{
"name": "images",
"type": "IMAGE",
"links": [
2
]
},
{
"name": "text",
"type": "STRING",
"links": null
},
{
"name": "think_text",
"type": "STRING",
"links": [
3
]
},
{
"name": "metadata_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaU1LocalTextToImage"
},
"widgets_values": [
"这张信息图的标题是“SenseNova-U1”,采用现代极简科技矩阵风格。整体布局为水平三列网格结构,背景是带有极浅银灰色细密点阵的哑光纯白高级纸张纹理,画面长宽比为16:9。\\n\\n排版采用严谨的视觉层级:主标题使用粗体无衬线黑体字,正文使用清晰的现代等宽字体。配色方案极其克制,以纯白色为底,深炭黑为主视觉文字和边框,浅石板灰用于背景色块和次要信息区分,图标采用精致的银灰色线框绘制。\\n\\n在画面正上方居中位置,使用醒目的深炭黑粗体字排布着大标题“SenseNova-U1”。标题正下方是浅石板灰色的等宽字体副标题“新一代端到端统一多模态大模型家族”。\\n\\n画面主体分为左、中、右三个相等的垂直信息区块,区块之间通过充足的负空间进行物理隔离。\\n\\n左侧区块的主题是概述。顶部有一个银灰色线框绘制的、由放大镜和齿轮交织的图标,旁边是粗体小标题“Overview”。该区块内从上到下垂直排列着三个要点:第一个要点旁边是一个代表文档与照片重叠的极简图标,紧跟着文字“多模态模型家族,统一文本/图像理解和生成”。向下是由两个相连的同心圆组成的架构图标,配有文字“基于NEO-Unify架构(端到端统一理解和生成)”。最下方是一个带有斜线划掉的眼睛和漏斗形状的图标,明确指示文本“无需视觉编码器(VE)和变分自编码器(VAE)”。\\n\\n中间区块展示模型矩阵。顶部是一个包含两个分支节点的树状网络图标,旁边是粗体小标题“两个模型规格”。区块内分为上下两个包裹在浅石板灰色极细边框内的卡片。上方的卡片内画着一个代表高密度的实心几何立方体图标,大字标注“SenseNova-U1-8B-MoT”,下方是等宽字体说明“8B MoT 密集主干模型”。下方的卡片内画着一个带有闪电符号的网状发光大脑图标,大字标注“SenseNova-U1-A3B-MoT”,下方是等宽字体说明“A3B MoT 混合专家(MoE)主干模型”。在这两个独立卡片的正下方,左侧放置一个笑脸轮廓图标搭配文字“将在HF等平台公开”,右侧放置一个带有折角的书面报告图标搭配文字“将发布技术报告”。\\n\\n右侧区块呈现核心优势。顶部是一个代表巅峰的上升阶梯折线图图标,旁边是粗体小标题“Highlights”。该区块内部垂直分布着四个带有浅石板灰底色的长方形色块,每个色块内部左侧对应一个具体的图标,右侧为文字。第一个色块内是一个无缝相连的莫比乌斯环图标,配文“原生统一架构,无VE和VAE”。第二个色块内是一个顶端带有星星的奖杯图标,配文“单一统一模型在理解和生成任务上均达到SOTA性能”。第三个色块内是代表文本行与拍立得照片交替穿插的图标,配文“强大的原生交错推理能力(模型原生生成图像进行推理)”。最后一个色块内是一个被切分出一小块的硬币与详细饼状图结合的图标,配文“能生成复杂信息图表,性价比出色”。",
"2720x1536|16:9",
4,
"none",
3,
0,
1,
50,
1,
42,
false,
false
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
612.0374341866545,
153.07410621275454
],
"size": [
614.578125,
393.609375
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 2
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 4,
"type": "PreviewAny",
"pos": [
614.1411713052378,
610.8904862658945
],
"size": [
606.28125,
415.96875
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "source",
"type": "*",
"link": 3
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewAny"
},
"widgets_values": [
null,
null,
null
]
},
{
"id": 6,
"type": "Note",
"pos": [
-874.1990665007161,
624.8566777241045
],
"size": [
238.546875,
133.046875
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"This is a prompt enhancement module; you can turn it off if you don't need it."
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 7,
"type": "SenseNovaPromptBuilder",
"pos": [
-585.0759757790407,
605.0166969060281
],
"size": [
504.09375,
412.65625
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "prompt",
"type": "STRING",
"links": null
},
{
"name": "usage_json",
"type": "STRING",
"links": [
5
]
},
{
"name": "raw_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaPromptBuilder"
},
"widgets_values": [
"生成一张教育预防电信诈骗的信息图",
"You are a world-renowned \"Senior Visual Information Architect\" and \"AI Image Prompt Engineering Expert.\" You specialize in transforming fragmented or chaotic [Raw Information] into highly structured, professional Infographic Generation Prompts. Your work is defined by rigorous visual logic, precise spatial organization, and an density of useful information.\n\n# Task\nReconstruct the user’s [Raw Information] into a comprehensive visual synthesis prompt (approx. 400-600 words). Your objective is to guide large image models (e.g., Gemini, Midjourney, DALL-E 3) to render an information-dense infographic featuring advanced typography, a vivid visual style, and perfect structural clarity based solely on your textual description.\n\n# Step-by-Step Methodology\n1. **Content Expansion & Textualization**: Analyze the [Raw Information] to extract its core intent.\n - Detailing: Extract every entity, number, color, and phrase from the [Raw Information]. Do not summarize.\n - Categorization: Define sub-categories with distinct visual markers.\n - Density Enrichment: If the input is brief, supplement it with professional annotations, sub-headings, body text and \"Pro-tips\" or \"Key Insights\" related to the topic to maximize the \"information load\".\n2. **Adaptive Structural Analysis**:\n - User-Defined Priority: First, check if the user has provided specific layout instructions (e.g., \"three-column grid,\" \"horizontal timeline\"). If present, strictly follow these instructions.\n - Logic-Driven Inference: If no layout is specified, analyze the [Raw Information] for its underlying logic (chronological, hierarchical, process-oriented, or comparative) and design a spatial architecture that best serves that logic.\n3. **Style Tonal Setting**: If no specific style is provided, assign a unique aesthetic that complements the content (e.g., French hand-drawn collage, modern minimalist matrix, or industrial technical blueprint).\n4. 
**Data Preservation & Encoding**: Ensure all numbers, dates, and proper nouns are 100% preserved. Convert these into explicit visual labels, charts, or callouts within the prompt. Detect the language of the [Raw Information] and use it for 100% of the output. If input is Chinese, output Chinese. If input is English, output English. No mixing.\n\n\n# Strict Constraints\n1. **Strict Language Parity**: Maintain absolute language consistency. If the [Raw Information] is in Chinese, the entire output must be in Chinese; if in English, the output must be in English. No code-switching.\n2. **Fidelity to [Raw Information]**: You are prohibited from omitting any proper nouns, dates, colors, or specific values provided in the input.\n3. **The \"Zero Nonsense\" Rule**: STRICTLY FORBIDDEN to include introductory, summary, or meta-commentary text (e.g., \"Here is the refined prompt...\"). Do not explain design choices or justify element omissions (e.g., do not mention \"implied flow\"). Start the response immediately with the visual description.\n4. **Visual Precision:\n - Textures: Mandatorily describe background textures (e.g., off-white aged paper, light gray grid, or black halftone shadows).\n - Typography: Explicitly specify font styles for different hierarchies (e.g., bold serif for titles, condensed mono-space for technical data).\n5. **Text Rendering Protocol**:\n - Quotes for Content: Every piece of text intended to appear in the image MUST be enclosed in quotes.\n - No Quotes for Style: NEVER use quotation marks for descriptions of [Style Description], [Layout Structure], colors or any non-textual elements.\n6. **Relational Arrow Logic**: Minimize the use of arrows. Rely on spatial proximity or alignment to imply connectivity. If arrows are requested, avoid generic orientations like \"horizontal.\" Instead, specify their precise starting point and target destination.\n7. 
**Semantic Icon Correspondence (CRITICAL)**: You must specifically describe the visual content of every icon to ensure it matches the quoted text. (e.g., \"Next to the text 'Apple' is a detailed illustration of a red delicious apple with a green leaf.\") Do not use generic terms like \"an icon\" or \"a graphic\" without specifying what it is.\n8. **No Hexadecimal Codes**: Never use codes like #xxxx. Use descriptive color names (e.g., sage green, deep navy blue, terracotta).\n\n# Output Format (If the [Raw Information] is in Chinese, please translate the following content into Chinese. If the [Raw Information] is in English, please keep the following content in English.)\nThe theme of the infographic is [Subject Name] (or 此信息图的主题是: [Subject Name]), [Style Description]. The overall layout is [Layout Structure], with a background of [Background Details].\nProvide a smooth and fluent description of the prompts for generating professional infographics. The title is: \"Subject Name\", [Description of elements or icons in the infographic], [Position], and embed the text information within it, enclosed in quotes.\n\n---\nPlease receive the user's [Raw Information] and directly output the restructured professional image generation prompt:",
"sensenova-6.7-flash-lite",
0.3,
1,
4096,
120
]
}
],
"links": [
[
1,
1,
0,
2,
0,
"SENSENOVA_U1_LOCAL_MODEL"
],
[
2,
2,
0,
3,
0,
"IMAGE"
],
[
3,
2,
2,
4,
0,
"STRING"
],
[
5,
7,
1,
2,
1,
"STRING"
]
],
"groups": [],
"config": {},
"extra": {
"workflowRendererVersion": "Vue",
"ds": {
"scale": 0.6412973759090729,
"offset": [
1055.6855096974812,
19.214707810300403
]
},
"frontendVersion": "1.39.19",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
{
"id": "be8d8cc8-cbee-4189-9901-f3562f2b9815",
"revision": 0,
"last_node_id": 6,
"last_link_id": 8,
"nodes": [
{
"id": 1,
"type": "SenseNovaU1LocalLoader",
"pos": [
-699.5435454242499,
144.33814729971365
],
"size": [
504,
432
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "u1_model",
"type": "SENSENOVA_U1_LOCAL_MODEL",
"links": [
5
]
},
{
"name": "model_info_json",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaU1LocalLoader"
},
"widgets_values": [
"sensenova/SenseNova-U1-8B-MoT",
"",
"cuda",
"bfloat16",
"auto",
"none",
"",
"full",
""
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1558.6051808986454,
127.84686237113888
],
"size": [
1080,
684
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 6
}
],
"outputs": [],
"properties": {
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
},
{
"id": 4,
"type": "SenseNovaInterleavePreview",
"pos": [
743.748594409936,
132.9936483478975
],
"size": [
702.609375,
2427.171875
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "interleave_result",
"type": "SENSENOVA_INTERLEAVE_RESULT",
"link": 7
},
{
"name": "images",
"shape": 7,
"type": "IMAGE",
"link": 8
}
],
"outputs": [
{
"name": "markdown",
"type": "STRING",
"links": null
}
],
"properties": {
"Node name for S&R": "SenseNovaInterleavePreview"
},
"widgets_values": [
false,
""
]
},
{
"id": 5,
"type": "SenseNovaU1LocalInterleave",
"pos": [
-73.00482369366728,
142.500843346499
],
"size": [
689.09375,
955.40625
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [
{
"name": "u1_model",
"type": "SENSENOVA_U1_LOCAL_MODEL",
"link": 5
},
{
"name": "image",
"shape": 7,
"type": "IMAGE",
"link": null
}
],
"outputs": [
{
"name": "images",
"type": "IMAGE",
"links": [
6,
8
]
},
{
"name": "text",
"type": "STRING",
"links": null
},
{
"name": "think_text",
"type": "STRING",
"links": null
},
{
"name": "metadata_json",
"type": "STRING",
"links": null
},
{
"name": "interleave_result",
"type": "SENSENOVA_INTERLEAVE_RESULT",
"links": [
7
]
}
],
"properties": {
"Node name for S&R": "SenseNovaU1LocalInterleave"
},
"widgets_values": [
"讲一下经典童话《卖火柴的小女孩》,但这次请给出一个温暖的平行宇宙改编版图文绘本。在最后一次擦亮火柴时,出现的不是幻象,而是一只拥有魔法的驯鹿,它载着小女孩飞向了有糖果和壁炉的城堡",
"2048x1152|16:9",
"You are a multimodal assistant capable of reasoning with both text and images. You support two modes:\n\nThink Mode: When reasoning is needed, you MUST start with a <think></think> block and place all reasoning inside it. You MUST interleave text with generated images using tags like <image1>, <image2>. Images can ONLY be generated between <think> and </think>, and may be referenced in the final answer.\n\nNon-Think Mode: When no reasoning is needed, directly provide the answer without reasoning. Do not use tags like <image1>, <image2>; present any images naturally alongside the text.\n\nAfter the think block, always provide a concise, user-facing final answer. The answer may include text, images, or both. Match the user's language in both reasoning and the final answer.",
4,
1,
3,
0,
1,
50,
42,
"fixed",
false
]
}
],
"links": [
[
5,
1,
0,
5,
0,
"SENSENOVA_U1_LOCAL_MODEL"
],
[
6,
5,
0,
3,
0,
"IMAGE"
],
[
7,
5,
4,
4,
0,
"SENSENOVA_INTERLEAVE_RESULT"
],
[
8,
5,
0,
4,
1,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"workflowRendererVersion": "Vue",
"ds": {
"scale": 0.626887968379132,
"offset": [
1496.3740592420436,
92.9135509708664
]
},
"frontendVersion": "1.39.19",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}