
# SenseNova-SI
## Paper
[SenseNova-SI](https://arxiv.org/abs/2511.13719)

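## Model Overview
SenseNova-SI is an open-source family of multimodal spatial-intelligence models that aims to close the gap traditional multimodal models show in 3D spatial perception and geometric reasoning. The models are built on three base models (InternVL3, Qwen3-VL, and BAGEL) and come in common parameter sizes such as 2B and 8B. Within the family, the 1.3 series has the strongest overall spatial ability and reaches SOTA among open-source models of the same scale on several benchmarks; version 1.4 strengthens object localization and depth estimation; and version 1.5 is strongest at solid-geometry problems. The models handle spatial tasks such as orientation judgment and 3D analysis, lead open-source models of comparable size overall, and match mainstream closed-source models on some capabilities. The whole series is open source, accepts single-image and multi-image input, and ships with complete inference and fine-tuning recipes.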


<div align=center>
    <img src="./doc/1.png"/>
</div>

## Environment Dependencies
| Software |                  Version                  |
| :------: |:-----------------------------------------:|
| DTK |                   26.04                   |
| Python |                  3.11.9                  |
| Transformers |            4.57.1               |
| Torch |   2.5.1+das.opt1.dtk2604   |
| Flash_attn |   2.8.3+das.opt1.dtk2604.torch251   |
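
To confirm that an environment matches the table above, the installed package versions can be compared against the expected ones. The following is a minimal sketch and not part of the repository: the `EXPECTED` mapping copies the table, the distribution names (`transformers`, `torch`, `flash_attn`) are assumptions about how the packages are registered in the image, and `installed_version` and `find_mismatches` are hypothetical helpers.

```python
from importlib import metadata

# Expected versions copied from the table above. The distribution names are
# assumptions about how the packages are registered in the image.
EXPECTED = {
    "transformers": "4.57.1",
    "torch": "2.5.1+das.opt1.dtk2604",
    "flash_attn": "2.8.3+das.opt1.dtk2604.torch251",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

An empty result means the three Python packages match the table. DTK and the Python interpreter version are not covered by this check.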

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova

```bash
docker run -it \
    --shm-size 256g \
    --network=host \
    --name nova \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova bash
```
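
The command above passes three device nodes and mounts `/opt/hyhal/` from the host, so it can help to check that these paths exist before starting the container. This is a minimal sketch under that assumption. `REQUIRED_PATHS` and `missing_paths` are hypothetical names, and the list contains only the paths that appear in the command.

```python
import os

# Paths taken from the docker run command above: the three device nodes and
# the read-only hyhal mount.
REQUIRED_PATHS = ["/dev/kfd", "/dev/dri", "/dev/mkfd", "/opt/hyhal/"]


def missing_paths(paths, exists=os.path.exists):
    """Return the subset of paths that are absent on the host."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = missing_paths(REQUIRED_PATHS)
    if missing:
        print("Missing on host:", ", ".join(missing))
    else:
        print("All device nodes and mounts referenced by the command exist.")
```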
More images are available for download from [SourceFind (光源)](https://sourcefind.cn/#/service-list).

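The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.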


## Pretrained Weights
| Model Name | Weight Size | Data Type | Supported DCU Models | Minimum Number of Cards | Download Link |
|:------:|:----:|:----:|:----------:|:------:|:---------------------:|
| SenseNova-SI-1.1-InternVL3-8B | 8B | BF16 | BW1000 |   1   | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-InternVL3-8B) |
| SenseNova-SI-1.1-BAGEL-7B-MoT | 8B | BF16 | BW1000 |   1   | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT) |
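
The checkpoints in the table can also be fetched programmatically. The sketch below assumes the `modelscope` Python package is installed and uses its `snapshot_download` function. The short keys in `MODEL_IDS`, the `./weights` cache directory, and the helper names are hypothetical choices for illustration. The model IDs are copied from the download links above.

```python
# Model IDs copied from the download links in the table above.
# The short dictionary keys are hypothetical labels.
MODEL_IDS = {
    "internvl3-8b": "SenseNova/SenseNova-SI-1.1-InternVL3-8B",
    "bagel-7b-mot": "SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT",
}


def model_url(key):
    """Return the Modelscope page for a checkpoint."""
    return "https://modelscope.cn/models/" + MODEL_IDS[key]


def download_weights(key, cache_dir="./weights"):
    """Download one checkpoint and return the local directory.

    Requires `pip install modelscope`; the import is lazy so the mapping
    above can be used without the package.
    """
    from modelscope import snapshot_download

    return snapshot_download(MODEL_IDS[key], cache_dir=cache_dir)
```

Calling `download_weights("internvl3-8b")` returns a local directory. Passing that directory as `--model_path` in the inference commands is a plausible next step, but whether the example scripts accept a local path is an assumption not confirmed here.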

## Dataset
None yet.

## Training
None yet.

## Inference
### Transformers
#### Single-Node Inference
##### Example for BAGEL generation
```
cd sensenova-si
python example_bagel.py \
  --model_path sensenova/SenseNova-SI-1.1-BAGEL-7B-MoT \
  --prompt "A chubby cat made of 3D point clouds, stretching its body, translucent with a soft glow." \
  --mode generate
```
##### Example 1
```
python example.py \
  --image_paths examples/Q1_1.png \
  --question "Question: Consider the real-world 3D locations of the objects. Which is closer to the sink, the toilet paper or the towel?\nOptions: \nA. toilet paper\nB. towel\nGive me the answer letter directly. The best answer is:" \
  --model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
##### Example 2
```
python example.py \
  --image_paths examples/Q2_1.png examples/Q2_2.png \
  --question "If the landscape painting is on the east side of the bedroom, where is the window located in the bedroom?\nOptions: A. North side, B. South side, C. West side, D. East side\nAnswer with the option's letter from the given choices directly. Enclose the option's letter within \`\`." \
  --model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
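
Long prompts are easy to misquote on the command line. For example, backticks inside a double-quoted shell string trigger command substitution unless they are escaped. One way around this is to launch the script from Python with an argument list, which bypasses the shell entirely. The sketch below uses only the flags shown in the commands above (`--image_paths`, `--question`, `--model_path`). `build_command` and `run_example` are hypothetical helpers, and whether `example.py` expects real newlines or the literal two-character sequence `\n` in the question is not stated here, so the string is passed through unchanged.

```python
import subprocess
import sys


def build_command(image_paths, question, model_path, script="example.py"):
    """Build the argument list for example.py using the flags shown above."""
    return [
        sys.executable, script,
        "--image_paths", *image_paths,
        "--question", question,
        "--model_path", model_path,
    ]


def run_example(image_paths, question, model_path):
    """Run example.py without a shell, so no character needs escaping."""
    cmd = build_command(image_paths, question, model_path)
    return subprocess.run(cmd, check=True)
```

With this approach the question text, including any backticks or quotes, reaches the script exactly as written in the Python string.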
##### Example 3
```
python example.py \
  --image_paths examples/Q3_1.png examples/Q3_2.png examples/Q3_3.png \
  --question "The robot is making tea. What is the order in which the pictures were taken?" \
  --model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
##### Example 4
```
python example.py \
  --image_paths examples/Q4.png \
  --question "Please provide the bounding box coordinate of the region this sentence describes: <ref>blue shirt lady</ref>" \
  --model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```
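
The grounding prompt in Example 4 asks for a bounding box, which usually has to be parsed from the returned text. The sketch below rests on an assumption that is not confirmed by this document: that the model follows the InternVL convention of answering with `<box>[[x1, y1, x2, y2]]</box>`, with coordinates normalized to a 0..1000 grid. `parse_boxes` and `to_pixels` are hypothetical helpers. Check the actual output of `example.py` before relying on either the pattern or the scale.

```python
import re

# Assumed output shape: <box>[[x1, y1, x2, y2]]</box>, coordinates in 0..1000.
BOX_PATTERN = re.compile(
    r"<box>\s*\[\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\]\s*</box>"
)


def parse_boxes(text):
    """Extract every (x1, y1, x2, y2) tuple found in the model output."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_PATTERN.finditer(text)]


def to_pixels(box, width, height, scale=1000):
    """Convert a box from the assumed 0..scale grid to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 * width / scale),
        round(y1 * height / scale),
        round(x2 * width / scale),
        round(y2 * height / scale),
    )
```

`to_pixels` takes the image width and height, so the result can be drawn directly on the original image.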


## Results
<div align=center>
    <img src="./doc/2.jpg"/>
</div>

<div align=center>
    <img src="./doc/3.png"/>
</div>

<div align=center>
    <img src="./doc/4.png"/>
</div>


<div align=center>
    <img src="./doc/5.png"/>
</div>

### Accuracy
Accuracy on DCU is consistent with GPU. Inference framework: PyTorch.

## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/sensenova-si

## References
- https://github.com/OpenSenseNova/SenseNova-SI