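# SenseNova-SI

## Paper

[SenseNova-SI](https://arxiv.org/abs/2511.13719)

## Model Overview

SenseNova-SI is an open-source family of multimodal spatial-intelligence models that targets the weaknesses of conventional multimodal models in 3D spatial perception and geometric reasoning. The models are built on three base models (InternVL3, Qwen3-VL, and BAGEL) and come in common parameter sizes such as 2B and 8B. Among the releases, the 1.3 series has the strongest overall spatial ability and reaches SOTA among open-source models of the same scale on several benchmarks; version 1.4 strengthens object localization and depth estimation; and version 1.5 is strongest at solid-geometry problems. The models handle spatial tasks such as orientation judgment and 3D analysis, lead open-source models of comparable size overall, and match mainstream closed-source models on some capabilities. The whole series is open source, accepts single-image and multi-image input, and ships with complete inference and fine-tuning recipes.
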
## Environment

| Software | Version |
| :------: |:-----------------------------------------:|
| DTK | 26.04 |
| Python | 3.11.9 |
| Transformers | 4.57.1 |
| Torch | 2.5.1+das.opt1.dtk2604 |
| Flash_attn | 2.8.3+das.opt1.dtk2604.torch251 |

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova

```bash
docker run -it \
    --shm-size 256g \
    --network=host \
    --name nova \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova bash
```

More images can be downloaded from [光源](https://sourcefind.cn/#/service-list).

The DCU-specific deep learning libraries this project requires can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.

## Pretrained Weights

| Model name | Weight size | Data type | Supported DCU models | Minimum number of cards | Download |
|:------:|:----:|:----:|:----------:|:------:|:---------------------:|
| SenseNova-SI-1.4-InternVL3-8B | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-InternVL3-8B) |
| SenseNova-SI-1.1-BAGEL-7B-MoT | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT) |

## Datasets

Not available yet.

## Training

Not available yet.

## Inference

### Transformers

#### Single-node inference

##### Example for BAGEL generation

```
cd sensenova-si
python example_bagel.py \
  --model_path sensenova/SenseNova-SI-1.1-BAGEL-7B-MoT \
  --prompt "A chubby cat made of 3D point clouds, stretching its body, translucent with a soft glow." \
  --mode generate
```

##### Example 1

```
python example.py \
  --image_paths examples/Q1_1.png \
  --question "Question: Consider the real-world 3D locations of the objects. Which is closer to the sink, the toilet paper or the towel?\nOptions: \nA. toilet paper\nB. towel\nGive me the answer letter directly. 
The best answer is:" \
  --model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```

##### Example 2

```
python example.py \
  --image_paths examples/Q2_1.png examples/Q2_2.png \
  --question "If the landscape painting is on the east side of the bedroom, where is the window located in the bedroom?\nOptions: A. North side, B. South side, C. West side, D. East side\nAnswer with the option's letter from the given choices directly. Enclose the option's letter within \`\`." \
  --model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```

##### Example 3

```
python example.py \
  --image_paths examples/Q3_1.png examples/Q3_2.png examples/Q3_3.png \
  --question "The robot is making tea. What is the order in which the pictures were taken?" \
  --model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```

##### Example 4

```
python example.py \
  --image_paths examples/Q4.png \
  --question "Please provide the bounding box coordinate of the region this sentence describes: blue shirt lady" \
  --model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```

## Results
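
The multiple-choice prompts in Examples 1 and 2 ask for a single option letter, so replies can be checked automatically, for instance when comparing runs on different hardware. The following is a minimal sketch under stated assumptions: `build_command` and `extract_option_letter` are hypothetical helpers that are not part of the repository, the flags are the ones shown in the examples above, and the reply format is assumed to be free text containing one option letter. Passing the question as a list element also sidesteps shell quoting of characters such as backticks.

```python
import re
import shlex


def build_command(image_paths, question, model_path, script="example.py"):
    """Return the argv list for one run of example.py (hypothetical helper).

    Only the flags shown in the examples above are used.
    """
    return [
        "python", script,
        "--image_paths", *image_paths,
        "--question", question,
        "--model_path", model_path,
    ]


def extract_option_letter(reply, options="ABCD"):
    """Heuristically pull the chosen option letter out of a reply, or None."""
    # Prefer a letter wrapped in backticks, as Example 2 requests.
    match = re.search(rf"`([{options}])`", reply)
    if match:
        return match.group(1)
    # Otherwise take the first standalone uppercase option letter.
    match = re.search(rf"\b([{options}])\b", reply)
    return match.group(1) if match else None


if __name__ == "__main__":
    cmd = build_command(
        ["examples/Q2_1.png", "examples/Q2_2.png"],
        "Enclose the option's letter within ``.",
        "sensenova/SenseNova-SI-1.5-InternVL3-8B",
    )
    # Print a shell-safe rendering of the command; it is not executed here.
    print(shlex.join(cmd))
```

The fallback pattern is only a heuristic: a reply that begins with the English article "A" would be read as option A, so the backtick-wrapped form is the more reliable one to request.
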
### Accuracy

Accuracy on DCU is consistent with GPU. Inference framework: PyTorch.

## Source Repository and Issue Feedback

- https://developer.sourcefind.cn/codes/modelzoo/sensenova-si

## References

- https://github.com/OpenSenseNova/SenseNova-SI