# SenseNova-SI
## Paper
[SenseNova-SI](https://arxiv.org/abs/2511.13719)

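## Model Overview
SenseNova-SI is a family of open-source multimodal spatial-intelligence models designed to close the gap that conventional multimodal models show in 3D spatial perception and geometric reasoning. The models are built on three base models (InternVL3, Qwen3-VL, and BAGEL) and come in common parameter sizes such as 2B and 8B. The 1.3 series has the strongest overall spatial ability and reaches state of the art among open-source models of the same scale on several benchmarks. Version 1.4 strengthens object grounding and depth estimation, and version 1.5 is strong at solving solid-geometry problems. The models handle spatial tasks such as orientation judgment and 3D scene parsing; overall they lead open-source models of comparable size, and on some capabilities they match mainstream closed-source models. The whole family is open source, supports single-image and multi-image input, and ships with complete inference and fine-tuning recipes.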



<div align=center>
    <img src="./doc/1.png"/>
</div>

## Environment Dependencies
| Software | Version |
| :------: |:-----------------------------------------:|
| DTK |                   26.04                   |
| Python |                  3.11.9                  |
| Transformers |            4.57.1               |
| Torch |   2.5.1+das.opt1.dtk2604   |
| Flash_attn |   2.8.3+das.opt1.dtk2604.torch251   |

Recommended image: `harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova`

```bash
docker run -it \
    --shm-size 256g \
    --network=host \
    --name nova \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova bash
```
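More images are available for download from [光源 (sourcefind.cn)](https://sourcefind.cn/#/service-list).

The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.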
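
Once inside the container, it can help to confirm that the installed packages match the dependency table. The following is a minimal sketch, not part of the project: the version strings are copied from the table, while the pip distribution names (`transformers`, `torch`, `flash_attn`) and the helper names are assumptions made for illustration.

```python
from importlib import metadata

# Versions copied from the dependency table.
# The distribution names are assumptions about how the packages are registered with pip.
EXPECTED = {
    "transformers": "4.57.1",
    "torch": "2.5.1+das.opt1.dtk2604",
    "flash_attn": "2.8.3+das.opt1.dtk2604.torch251",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        report[name] = (want, found, found == want)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:<14} expected {want:<34} found {found!s:<34} {status}")
```

The `lookup` parameter exists only so the comparison can be exercised without the real packages installed; DTK and Python versions are not covered by this check.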


## Pretrained Weights
| Model | Weight size | Data type | Supported DCU models | Minimum cards required | Download |
|:------:|:----:|:----:|:----------:|:------:|:---------------------:|
| SenseNova-SI-1.4-InternVL3-8B | 8B | BF16 | BW1000 |   1   | [ModelScope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.4-InternVL3-8B) |
| SenseNova-SI-1.1-BAGEL-7B-MoT | 7B | BF16 | BW1000 |   1   | [ModelScope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT) |
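
The weights can also be fetched programmatically rather than through the web page. The sketch below assumes the `modelscope` Python package is installed and uses its `snapshot_download` function; the model IDs are taken from the links in the table, and the helper names `resolve_model_id` and `download_weights` are hypothetical.

```python
# Model IDs taken from the ModelScope links in the weights table.
MODELSCOPE_IDS = {
    "SenseNova-SI-1.4-InternVL3-8B": "SenseNova/SenseNova-SI-1.4-InternVL3-8B",
    "SenseNova-SI-1.1-BAGEL-7B-MoT": "SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT",
}


def resolve_model_id(name):
    """Look up the ModelScope ID for a model name listed in the table."""
    try:
        return MODELSCOPE_IDS[name]
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None


def download_weights(name, cache_dir=None):
    """Download a model snapshot and return the local directory path.

    The import is deferred so the lookup helper works without `modelscope`.
    """
    from modelscope import snapshot_download

    return snapshot_download(resolve_model_id(name), cache_dir=cache_dir)
```

Calling `download_weights("SenseNova-SI-1.4-InternVL3-8B")` returns a local directory. Whether the example scripts accept such a local directory for `--model_path` is an assumption that is not confirmed here.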

## Dataset
Not available yet.

## Training
Not available yet.

## Inference
### Transformers
#### Single-node inference
##### BAGEL image generation example
```
cd SenseNova-SI
python example_bagel.py \
  --model_path sensenova/SenseNova-SI-1.1-BAGEL-7B-MoT \
  --prompt "A chubby cat made of 3D point clouds, stretching its body, translucent with a soft glow." \
  --mode generate
```
##### 3D spatial position reasoning example
```
python example.py \
  --image_paths examples/Q1_1.png \
  --question "Question: Consider the real-world 3D locations of the objects. Which is closer to the sink, the toilet paper or the towel?\nOptions: \nA. toilet paper\nB. towel\nGive me the answer letter directly. The best answer is:" \
  --model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```
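
When many questions need to be run, the same call can be assembled from Python. This is a rough sketch using only the flags shown in the command above (`--image_paths`, `--question`, `--model_path`); the helper names are hypothetical. One assumption to note: the question is joined with real newline characters, whereas the shell command passes a literal backslash-n inside double quotes, and the source does not say which form `example.py` expects.

```python
import shlex


def build_mcq_question(stem, options):
    """Format a multiple-choice question in the style of the command above.

    Assumption: parts are separated by real newline characters.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [f"Question: {stem}", "Options: "]
    lines += [f"{letters[i]}. {text}" for i, text in enumerate(options)]
    lines.append("Give me the answer letter directly. The best answer is:")
    return "\n".join(lines)


def build_command(image_paths, question, model_path, script="example.py"):
    """Return the argv list for one example.py call, using only the flags shown above."""
    return [
        "python", script,
        "--image_paths", *image_paths,
        "--question", question,
        "--model_path", model_path,
    ]


if __name__ == "__main__":
    question = build_mcq_question(
        "Consider the real-world 3D locations of the objects. "
        "Which is closer to the sink, the toilet paper or the towel?",
        ["toilet paper", "towel"],
    )
    cmd = build_command(
        ["examples/Q1_1.png"], question, "sensenova/SenseNova-SI-1.4-InternVL3-8B"
    )
    # Print a shell-quoted form for inspection.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting entirely, which sidesteps escaping problems with quotes and backticks in the question text.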
##### Spatial orientation reasoning example
```
python example.py \
  --image_paths examples/Q2_1.png examples/Q2_2.png \
  --question "If the landscape painting is on the east side of the bedroom, where is the window located in the bedroom?\nOptions: A. North side, B. South side, C. West side, D. East side\nAnswer with the option's letter from the given choices directly. Enclose the option's letter within \`\`." \
  --model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```
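
The prompt above asks the model to enclose the option letter within backticks, so a script can pull the letter out of the reply. The sketch below shows one way to do that; the exact shape of the model's reply is not documented here, so the pattern (a single letter A to D between backticks) and the function name `extract_choice` are assumptions.

```python
import re

# A single capital letter A-D wrapped in backticks, as the prompt requests.
_CHOICE = re.compile(r"`\s*([A-D])\s*`")


def extract_choice(response):
    """Return the option letter found between backticks, or None if absent."""
    match = _CHOICE.search(response)
    return match.group(1) if match else None
```

Returning `None` instead of guessing keeps unparseable replies visible, so they can be inspected by hand.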
##### Temporal frame ordering QA example
```
python example.py \
  --image_paths examples/Q3_1.png examples/Q3_2.png examples/Q3_3.png \
  --question "The robot is making tea. What is the order in which the pictures were taken?" \
  --model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```
##### Visual grounding example
```
python example.py \
  --image_paths examples/Q4.png \
  --question "Please provide the bounding box coordinate of the region this sentence describes: <ref>blue shirt lady</ref>" \
  --model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
```
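
The source does not show what the model returns for this grounding prompt. As an illustration only, the sketch below assumes the reply contains a bracketed `[x1, y1, x2, y2]` list with coordinates normalized to a 0 to 1000 range, a convention used by InternVL-style models; both the reply format and the scale are assumptions, and the helper names are hypothetical.

```python
import re

# Assumed reply shape: four numbers inside square brackets, e.g. "[[100, 200, 500, 1000]]".
_NUM = r"(-?\d+(?:\.\d+)?)"
_BOX = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def parse_boxes(response):
    """Return every [x1, y1, x2, y2] group found in the reply as float tuples."""
    return [tuple(float(v) for v in m.groups()) for m in _BOX.finditer(response)]


def to_pixels(box, width, height, scale=1000.0):
    """Map a box from an assumed 0..scale normalized range to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 / scale * width),
        round(y1 / scale * height),
        round(x2 / scale * width),
        round(y2 / scale * height),
    )
```

If the model turns out to emit pixel coordinates directly, `to_pixels` is unnecessary and the parsed values can be used as they are.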

## Results
<div align=center>
    <img src="./doc/2.jpg"/>
</div>

<div align=center>
    <img src="./doc/3.png"/>
</div>

<div align=center>
    <img src="./doc/4.png"/>
</div>


<div align=center>
    <img src="./doc/5.png"/>
</div>

### Accuracy
Accuracy on DCU is consistent with accuracy on GPU. Inference framework: PyTorch.

## Source Repository and Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/sensenova-si

## References
- https://github.com/OpenSenseNova/SenseNova-SI
