# SenseNova-SI
## Paper
[SenseNova-SI](https://arxiv.org/abs/2511.13719)
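## Model Overview
SenseNova-SI is an open-source family of multimodal spatial-intelligence models, designed to close the gap that conventional multimodal models show in 3D spatial perception and geometric reasoning. The models are built on three base models (InternVL3, Qwen3-VL, and BAGEL) and come in common parameter sizes such as 2B and 8B. Among the releases, the 1.3 series has the strongest overall spatial capability and reaches SOTA among open-source models of the same scale on several benchmarks; version 1.4 strengthens object localization and depth estimation; version 1.5 is strongest at solving solid-geometry problems. The models handle spatial tasks such as orientation judgment and 3D analysis, lead open-source models of comparable size in overall performance, and match mainstream closed-source models on some capabilities. The entire series is open source, supports single-image and multi-image input, and ships with complete inference and fine-tuning recipes.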
## Environment Dependencies
| Software | Version |
| :------: |:-----------------------------------------:|
| DTK | 26.04 |
| Python | 3.11.9 |
| Transformers | 4.57.1 |
| Torch | 2.5.1+das.opt1.dtk2604 |
| Flash_attn | 2.8.3+das.opt1.dtk2604.torch251 |
Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova
```bash
docker run -it \
--shm-size 256g \
--network=host \
--name nova \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mkfd \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova bash
```
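More images are available for download from [光源](https://sourcefind.cn/#/service-list).
The special deep learning libraries that this project requires on DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.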
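
After starting the container, it can help to confirm that the installed packages match the table above. The following is a minimal sketch, not part of the project: the distribution names (`transformers`, `torch`, `flash_attn`) are assumptions about how the packages are registered in the image, and the helper names are hypothetical.

```python
from importlib import metadata

# Expected versions copied from the dependency table above.
# The distribution names are assumptions (for example, the Flash_attn row
# is assumed to be installed under the name "flash_attn").
EXPECTED = {
    "transformers": "4.57.1",
    "torch": "2.5.1+das.opt1.dtk2604",
    "flash_attn": "2.8.3+das.opt1.dtk2604.torch251",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Compare expected versions against the environment.

    Returns {name: (expected, found)} for every package that does not match.
    """
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found}")
    if not problems:
        print("All checked packages match the table.")
```

The check is an exact string comparison, so a different local build suffix is reported as a mismatch even when the base version is the same.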
## Pretrained Weights
| Model | Weight size | Data type | Supported DCU models | Minimum cards required | Download |
|:------:|:----:|:----:|:----------:|:------:|:---------------------:|
| SenseNova-SI-1.1-InternVL3-8B | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-InternVL3-8B) |
| SenseNova-SI-1.1-BAGEL-7B-MoT | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT) |
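
The weights can also be fetched programmatically. The sketch below assumes the `modelscope` Python package is installed and uses its `snapshot_download` function; the repository IDs come from the download links in the table, while the short keys, the `download_weights` helper, and the `./weights` cache directory are illustrative names.

```python
# Repository IDs taken from the download links in the table above.
# The short dictionary keys are illustrative, not official names.
MODEL_IDS = {
    "internvl3-8b": "SenseNova/SenseNova-SI-1.1-InternVL3-8B",
    "bagel-7b-mot": "SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT",
}


def download_weights(key, cache_dir="./weights"):
    """Download one checkpoint and return its local directory.

    Raises KeyError if `key` is not one of the entries in MODEL_IDS.
    """
    model_id = MODEL_IDS[key]
    # Imported lazily so this module loads even without modelscope installed.
    from modelscope import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)
```

Calling `download_weights("internvl3-8b")` downloads the checkpoint and returns a local directory, which can presumably be passed as `--model_path` to the example scripts in place of the hub ID.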
## Dataset
None yet.
## Training
None yet.
## Inference
### Transformers
#### Single-node inference
##### Example for BAGEL generation
```bash
cd sensenova-si
python example_bagel.py \
--model_path sensenova/SenseNova-SI-1.1-BAGEL-7B-MoT \
--prompt "A chubby cat made of 3D point clouds, stretching its body, translucent with a soft glow." \
--mode generate
```
##### Example 1
```bash
python example.py \
--image_paths examples/Q1_1.png \
--question "Question: Consider the real-world 3D locations of the objects. Which is closer to the sink, the toilet paper or the towel?\nOptions: \nA. toilet paper\nB. towel\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
##### Example 2
```bash
python example.py \
--image_paths examples/Q2_1.png examples/Q2_2.png \
--question "If the landscape painting is on the east side of the bedroom, where is the window located in the bedroom?\nOptions: A. North side, B. South side, C. West side, D. East side\nAnswer with the option's letter from the given choices directly. Enclose the option's letter within \`\`." \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
##### Example 3
```bash
python example.py \
--image_paths examples/Q3_1.png examples/Q3_2.png examples/Q3_3.png \
--question "The robot is making tea. What is the order in which the pictures were taken?" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
##### Example 4
```bash
python example.py \
--image_paths examples/Q4.png \
--question "Please provide the bounding box coordinate of the region this sentence describes: [blue shirt lady]" \
--model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```
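
To run several of these prompts in one go, the commands above can be assembled from Python. This is a minimal sketch under stated assumptions: the flags `--image_paths`, `--question`, and `--model_path` are the ones used in the examples above, while `build_command`, `run_all`, and `CASES` are hypothetical names introduced here for illustration.

```python
import shlex
import subprocess
import sys


def build_command(image_paths, question, model_path, script="example.py"):
    """Build the argument list for one example.py invocation."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    return [
        sys.executable, script,
        "--image_paths", *image_paths,
        "--question", question,
        "--model_path", model_path,
    ]


# Cases mirror Example 3 and Example 4 above.
CASES = [
    {
        "image_paths": ["examples/Q3_1.png", "examples/Q3_2.png", "examples/Q3_3.png"],
        "question": "The robot is making tea. What is the order in which the pictures were taken?",
        "model_path": "sensenova/SenseNova-SI-1.3-InternVL3-8B",
    },
    {
        "image_paths": ["examples/Q4.png"],
        "question": "Please provide the bounding box coordinate of the region this sentence describes: [blue shirt lady]",
        "model_path": "sensenova/SenseNova-SI-1.4-InternVL3-8B",
    },
]


def run_all(cases, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for case in cases:
        cmd = build_command(**case)
        if dry_run:
            # shlex.join quotes each argument so the line can be pasted into a shell.
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

`run_all(CASES)` only prints the shell-quoted commands; `run_all(CASES, dry_run=False)` executes them one after another from the `sensenova-si` directory and stops at the first failure. Passing arguments as a list avoids shell quoting issues with characters such as brackets or backticks in the question text.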
## Results
### Accuracy
Accuracy on DCU is consistent with GPU. Inference framework: PyTorch.
## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/sensenova-si
## References
- https://github.com/OpenSenseNova/SenseNova-SI