Commit 876a36a4 authored by raojy's avatar raojy
Browse files

first

parent eda2afb8
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# SenseNova-SI # SenseNova-SI
## 论文
[SenseNova-SI](https://arxiv.org/abs/2511.13719)
## 模型简介
SenseNova-SI 是开源多模态空间智能模型系列,旨在补齐传统多模态模型在三维空间感知与几何推理上的不足,该模型基于 InternVL3、Qwen3-VL、BAGEL 三大基座打造,拥有 2B、8B 等主流参数量版本,其中 1.3 系列综合空间能力最优,多项基准达成同规模开源模型 SOTA,1.4 版本强化目标定位与深度估计,1.5 版本擅长立体几何解答;它可胜任方位判断、三维解析等各类空间任务,整体性能领先同量级开源模型,部分能力比肩主流闭源模型,且全系开源,支持单图与多图输入,并配套完整的推理和微调方案。
<div align=center>
<img src="./doc/1.png"/>
</div>
## 环境依赖
| 软件 | 版本 |
| :------: |:-----------------------------------------:|
| DTK | 26.04 |
| Python | 3.11.9 |
| Transformers | 4.57.1 |
| Torch | 2.5.1+das.opt1.dtk2604 |
| Flash_attn | 2.8.3+das.opt1.dtk2604.torch251 |
推荐使用镜像: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova
```bash
docker run -it \
--shm-size 256g \
--network=host \
--name nova \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--device=/dev/mkfd \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm011-ubuntu22.04-dtk26.04-SenseNova bash
```
更多镜像可前往[光源](https://sourcefind.cn/#/service-list)下载使用。
关于本项目DCU显卡所需的特殊深度学习库可从[光合](https://developer.sourcefind.cn/tool/)开发者社区下载安装。
## 预训练权重
| 模型名称 | 权重大小 | 数据类型 |支持的DCU型号 | 最低卡数需求 | 下载地址 |
|:------:|:----:|:----:|:----------:|:------:|:---------------------:|
| SenseNova-SI-1.1-InternVL3-8B | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-InternVL3-8B) |
| SenseNova-SI-1.1-BAGEL-7B-MoT | 8B | BF16 | BW1000 | 1 | [Modelscope](https://modelscope.cn/models/SenseNova/SenseNova-SI-1.1-BAGEL-7B-MoT) |
## 数据集
暂无
## 训练
暂无
## 推理
### Transformers
#### 单机推理
##### Example for BAGEL generation
```
cd sensenova-si
python example_bagel.py \
--model_path sensenova/SenseNova-SI-1.1-BAGEL-7B-MoT \
--prompt "A chubby cat made of 3D point clouds, stretching its body, translucent with a soft glow." \
--mode generate
```
##### Example 1
```
python example.py \
--image_paths examples/Q1_1.png \
--question "Question: Consider the real-world 3D locations of the objects. Which is closer to the sink, the toilet paper or the towel?\nOptions: \nA. toilet paper\nB. towel\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
##### Example 2
```
python example.py \
--image_paths examples/Q2_1.png examples/Q2_2.png \
--question "If the landscape painting is on the east side of the bedroom, where is the window located in the bedroom?\nOptions: A. North side, B. South side, C. West side, D. East side\nAnswer with the option's letter from the given choices directly. Enclose the option's letter within ``." \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
##### Example 3
```
python example.py \
--image_paths examples/Q3_1.png examples/Q3_2.png examples/Q3_3.png \
--question "The robot is making tea. What is the order in which the pictures were taken?" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
##### Example 4
python example.py \
--image_paths examples/Q4.png \
--question "Please provide the bounding box coordinate of the region this sentence describes: <ref>blue shirt lady</ref>" \
--model_path sensenova/SenseNova-SI-1.4-InternVL3-8B
## 效果展示
<div align=center>
<img src="./doc/2.jpg"/>
</div>
<div align=center>
<img src="./doc/3.png"/>
</div>
<div align=center>
<img src="./doc/4.png"/>
</div>
<div align=center>
<img src="./doc/5.png"/>
</div>
### 精度
DCU与GPU精度一致,推理框架:pytorch。
## 源码仓库及问题反馈
- https://developer.sourcefind.cn/codes/modelzoo/sensenova-si
## 参考资料
- https://github.com/OpenSenseNova/SenseNova-SI
# File created using '.gitignore Generator' for Visual Studio Code: https://bit.ly/vscode-gig
# Created by https://www.toptal.com/developers/gitignore/api/visualstudiocode,linux,python
# Edit at https://www.toptal.com/developers/gitignore?templates=visualstudiocode,linux,python
### Linux ###
*~
# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*
# KDE directory preferences
.directory
# Linux trash folder which might appear on any partition or disk
.Trash-*
# .nfs files are created when an open file is removed but is still being accessed
.nfs*
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
### Python Patch ###
# Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration
poetry.toml
# ruff
.ruff_cache/
# LSP config files
pyrightconfig.json
### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
# Local History for Visual Studio Code
.history/
# Built Visual Studio Code Extensions
*.vsix
### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide
# End of https://www.toptal.com/developers/gitignore/api/visualstudiocode,linux,python
# Custom rules (everything added below won't be overriden by 'Generate .gitignore File' if you use 'Update' option)
### Examples ###
examples/*.jsonl
examples/*.png
### Training data and results ###
training/pretrained_models/
training/data/
training/results/
\ No newline at end of file
[submodule "training/lmms-engine"]
path = training/lmms-engine
url = https://github.com/EvolvingLMMs-Lab/lmms-engine
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.14.4
hooks:
# Run the linter.
- id: ruff-check
args: [ --fix, --select, I ]
# Run the formatter.
- id: ruff-format
\ No newline at end of file
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This diff is collapsed.
This diff is collapsed.
default_generation_config:
do_sample: False
max_new_tokens: 8192
top_p: 1.0
temperature: 0.0
repetition_penalty: 1
num_beams: 1
\ No newline at end of file
# More Examples
This document lists more examples beyond those in the main [README](../../README.md). To run all of them in one go, use [examples/examples.jsonl](../../examples/examples.jsonl) with the `--jsonl_path` option (see the README section [Test Multiple Questions in a Single Run](../../README.md#test-multiple-questions-in-a-single-run)).
---
#### Example 8
This example is from [MindCube](https://github.com/mll-lab-nu/MindCube):
```bash
python example.py \
--image_paths examples/Q8_1.jpg examples/Q8_2.jpg examples/Q8_3.jpg examples/Q8_4.jpg \
--question "Based on these four images (image 1, 2, 3, and 4) showing the pink bottle from different viewpoints (front, left, back, and right), with each camera aligned with room walls and partially capturing the surroundings: From the viewpoint presented in image 4, what is to the left of the pink bottle?\nOptions: A. Pink plush toy and headboard B. Window and blue curtain C. Closet and door D. White wall\nAnswer with the option's letter from the given choices directly." \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 8</strong></summary>
<p><strong>Q: </strong>Based on these four images (image 1, 2, 3, and 4) showing the pink bottle from different viewpoints (front, left, back, and right), with each camera aligned with room walls and partially capturing the surroundings: From the viewpoint presented in image 4, what is to the left of the pink bottle?\nOptions: A. Pink plush toy and headboard B. Window and blue curtain C. Closet and door D. White wall\nAnswer with the option's letter from the given choices directly.</p>
<table>
<tr>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_1.jpg" alt="Image 1" width="100%">
</td>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_2.jpg" alt="Image 2" width="100%">
</td>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_3.jpg" alt="Image 3" width="100%">
</td>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_4.jpg" alt="Image 4" width="100%">
</td>
</tr>
</table>
<p><strong>GT: C</strong></p>
</details>
---
#### Example 9
This example is from [SITE-Bench](https://github.com/wenqi-wang20/SITE-Bench):
```bash
python example.py \
--image_paths examples/Q9.jpg \
--question "Question: Consider the real-world 3D locations and orientations of the objects. Which side of the bus in the center is facing the bus stop?\nOptions: \nA. front\nB. left\nC. back\nD. right\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 9</strong></summary>
<p><strong>Q: </strong>Question: Consider the real-world 3D locations and orientations of the objects. Which side of the bus in the center is facing the bus stop?\nOptions: \nA. front\nB. left\nC. back\nD. right\nGive me the answer letter directly. The best answer is:</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q9.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: D</strong></p>
</details>
---
#### Example 10
This example is from [SITE-Bench](https://github.com/wenqi-wang20/SITE-Bench):
```bash
python example.py \
--image_paths examples/Q10.jpg \
--question "Question: Consider the real-world 3D orientations of the objects. Are the arrow on street sign and the taxi facing same or similar directions, or very different directions?\nOptions: \nA. same or similar directions\nB. very different directions\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 10</strong></summary>
<p><strong>Q: </strong>Question: Consider the real-world 3D orientations of the objects. Are the arrow on street sign and the taxi facing same or similar directions, or very different directions? Options: A. same or similar directions, B. very different directions. Give me the answer letter directly. The best answer is:</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q10.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: A</strong></p>
</details>
---
#### Example 11
This example is from [SITE-Bench](https://github.com/wenqi-wang20/SITE-Bench):
```bash
python example.py \
--image_paths examples/Q11.jpg \
--question "Question: What shape are all the men standing in?\nOptions: A. circle B. rectangle C. triangle D. square\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 11</strong></summary>
<p><strong>Q: </strong>Question: What shape are all the men standing in?\nOptions: A. circle B. rectangle C. triangle D. square\nGive me the answer letter directly. The best answer is:</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q11.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: A</strong></p>
</details>
---
#### Example 12
This example is from [ViewSpatial-Bench](https://github.com/ZJU-REAL/ViewSpatial-Bench):
```bash
python example.py \
--image_paths examples/Q12.jpg \
--question "From the perspective of this man who doesn't wear glasses, where is the man wearing glasses located beside him?\nOptions: A. left B. back-right C. front D. right\nAnswer with the option's letter from the given choices directly." \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 12</strong></summary>
<p><strong>Q: </strong>From the perspective of this man who doesn't wear glasses, where is the man wearing glasses located beside him? Options: A. left, B. back-right, C. front, D. right. Answer with the option's letter from the given choices directly.</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q12.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: A</strong></p>
</details>
---
#### Example 13
This example is from [MMSI-Bench](https://github.com/InternRobotics/MMSI-Bench) and test the model's capability in open-ended short-answer questions:
```bash
python example.py \
--image_paths examples/Q13_1.png examples/Q13_2.png \
--question "The iMac is in the northern part of the room. In which direction is the area where students do their homework?" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 13</strong></summary>
<p><strong>Q: </strong>The iMac is in the northern part of the room. In which direction is the area where students do their homework?</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q13_1.png" alt="First image" width="100%">
</td>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q13_2.png" alt="Second image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: Northwest corner</strong></p>
</details>
---
#### Example 14
This example is from [MMSI-Bench](https://github.com/InternRobotics/MMSI-Bench) and test the model's capability in open-ended short-answer questions:
```bash
python example.py \
--image_paths examples/Q14_1.png examples/Q14_2.png \
--question "How many building models are captured in total in these two pictures?" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>Details of Example 14</strong></summary>
<p><strong>Q: </strong>How many building models are captured in total in these two pictures?</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q14_1.png" alt="First image" width="100%">
</td>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q14_2.png" alt="Second image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: 4</strong></p>
</details>
---
#### Example 15
This example demonstrates the model's capability in **solid geometry(Three views)**:
```bash
python example.py \
--image_paths examples/Q15.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 5 -->
<details open>
<summary><strong>Details of Example 15</strong></summary>
<p><strong>Q:</strong> Enclose your thinking process in &lt;think> &lt;/think> tags and your final answer in &lt;answer> &lt;/answer></p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q15.png" alt="First image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: B</strong></p>
</details>
---
#### Example 16
This example demonstrates the model's capability in **solid geometry(Three views)**:
```bash
python example.py \
--image_paths examples/Q16.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 6 -->
<details open>
<summary><strong>Details of Example 16</strong></summary>
<p><strong>Q:</strong> Enclose your thinking process in &lt;think> &lt;/think> tags and your final answer in &lt;answer> &lt;/answer></p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q16.png" alt="First image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: C</strong></p>
</details>
---
#### Example 17
This example demonstrates the model's capability in **solid geometry(3D graphic reasoning)**:
```bash
python example.py \
--image_paths examples/Q17.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 7 -->
<details open>
<summary><strong>Details of Example 17</strong></summary>
<p><strong>Q:</strong> Enclose your thinking process in &lt;think> &lt;/think> tags and your final answer in &lt;answer> &lt;/answer></p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q17.png" alt="First image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: C</strong></p>
</details>
---
#### Example 18
This example demonstrates the model's capability in **solid geometry(Three views)**:
```bash
python example.py \
--image_paths examples/Q18.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 3 -->
<details open>
<summary><strong>Details of Example 18</strong></summary>
<p><strong>Q:</strong> Enclose your thinking process in &lt;think> &lt;/think> tags and your final answer in &lt;answer> &lt;/answer></p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q18.png" alt="First image" width="100%">
</td>
</tr>
</table>
<p><strong>GT: A</strong></p>
</details>
\ No newline at end of file
# 更多示例
本文档展示了 [README](../../README_CN.md) 之外的更多示例。若需一次性运行全部示例,可使用 [examples/examples.jsonl](../../examples/examples.jsonl) 并配合 `--jsonl_path` 参数(参见 README 中[「一次测试多个问题」](../../README_CN.md#一次测试多个问题)小节)。
---
#### 示例8
该例题源自 [MindCube](https://github.com/mll-lab-nu/MindCube)
```bash
python example.py \
--image_paths examples/Q8_1.jpg examples/Q8_2.jpg examples/Q8_3.jpg examples/Q8_4.jpg \
--question "Based on these four images (image 1, 2, 3, and 4) showing the pink bottle from different viewpoints (front, left, back, and right), with each camera aligned with room walls and partially capturing the surroundings: From the viewpoint presented in image 4, what is to the left of the pink bottle?\nOptions: A. Pink plush toy and headboard B. Window and blue curtain C. Closet and door D. White wall\nAnswer with the option's letter from the given choices directly." \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例8详情</strong></summary>
<p><strong>Q: </strong>Based on these four images (image 1, 2, 3, and 4) showing the pink bottle from different viewpoints (front, left, back, and right), with each camera aligned with room walls and partially capturing the surroundings: From the viewpoint presented in image 4, what is to the left of the pink bottle?\nOptions: A. Pink plush toy and headboard B. Window and blue curtain C. Closet and door D. White wall\nAnswer with the option's letter from the given choices directly.</p>
<table>
<tr>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_1.jpg" alt="Image 1" width="100%">
</td>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_2.jpg" alt="Image 2" width="100%">
</td>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_3.jpg" alt="Image 3" width="100%">
</td>
<td align="center" width="25%" style="padding:4px;">
<img src="../../examples/Q8_4.jpg" alt="Image 4" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: C</strong></p>
</details>
---
#### 示例9
该例题源自 [SITE-Bench](https://github.com/wenqi-wang20/SITE-Bench)
```bash
python example.py \
--image_paths examples/Q9.jpg \
--question "Question: Consider the real-world 3D locations and orientations of the objects. Which side of the bus in the center is facing the bus stop?\nOptions: \nA. front\nB. left\nC. back\nD. right\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例9详情</strong></summary>
<p><strong>Q: </strong>Question: Consider the real-world 3D locations and orientations of the objects. Which side of the bus in the center is facing the bus stop?\nOptions: \nA. front\nB. left\nC. back\nD. right\nGive me the answer letter directly. The best answer is:</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q9.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: D</strong></p>
</details>
---
#### 示例10
该例题源自 [SITE-Bench](https://github.com/wenqi-wang20/SITE-Bench):
```bash
python example.py \
--image_paths examples/Q10.jpg \
--question "Question: Consider the real-world 3D orientations of the objects. Are the arrow on street sign and the taxi facing same or similar directions, or very different directions?\nOptions: \nA. same or similar directions\nB. very different directions\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例10详情</strong></summary>
<p><strong>Q: </strong>Question: Consider the real-world 3D orientations of the objects. Are the arrow on street sign and the taxi facing same or similar directions, or very different directions? Options: A. same or similar directions, B. very different directions. Give me the answer letter directly. The best answer is:</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q10.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: A</strong></p>
</details>
---
#### 示例11
该例题源自 [SITE-Bench](https://github.com/wenqi-wang20/SITE-Bench):
```bash
python example.py \
--image_paths examples/Q11.jpg \
--question "Question: What shape are all the men standing in?\nOptions: A. circle B. rectangle C. triangle D. square\nGive me the answer letter directly. The best answer is:" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例11详情</strong></summary>
<p><strong>Q: </strong>Question: What shape are all the men standing in?\nOptions: A. circle B. rectangle C. triangle D. square\nGive me the answer letter directly. The best answer is:</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q11.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: A</strong></p>
</details>
---
#### 示例12
该例题源自 [ViewSpatial-Bench](https://github.com/ZJU-REAL/ViewSpatial-Bench)
```bash
python example.py \
--image_paths examples/Q12.jpg \
--question "From the perspective of this man who doesn't wear glasses, where is the man wearing glasses located beside him?\nOptions: A. left B. back-right C. front D. right\nAnswer with the option's letter from the given choices directly." \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例12详情</strong></summary>
<p><strong>Q: </strong>From the perspective of this man who doesn't wear glasses, where is the man wearing glasses located beside him? Options: A. left, B. back-right, C. front, D. right. Answer with the option's letter from the given choices directly.</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q12.jpg" alt="Image" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: A</strong></p>
</details>
---
#### 示例13
该例题源自 [MMSI-Bench](https://github.com/InternRobotics/MMSI-Bench),测试模型在开放式简答题上的能力:
```bash
python example.py \
--image_paths examples/Q13_1.png examples/Q13_2.png \
--question "The iMac is in the northern part of the room. In which direction is the area where students do their homework?" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例13详情</strong></summary>
<p><strong>Q: </strong>The iMac is in the northern part of the room. In which direction is the area where students do their homework?</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q13_1.png" alt="First image" width="100%">
</td>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q13_2.png" alt="Second image" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: Northwest corner</strong></p>
</details>
---
#### 示例14
该例题源自 [MMSI-Bench](https://github.com/InternRobotics/MMSI-Bench),测试模型在开放式简答题上的能力:
```bash
python example.py \
--image_paths examples/Q14_1.png examples/Q14_2.png \
--question "How many building models are captured in total in these two pictures?" \
--model_path sensenova/SenseNova-SI-1.3-InternVL3-8B
```
<details open>
<summary><strong>示例14详情</strong></summary>
<p><strong>Q: </strong>How many building models are captured in total in these two pictures?</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q14_1.png" alt="First image" width="100%">
</td>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q14_2.png" alt="Second image" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案: 4</strong></p>
</details>
---
#### 示例 15
此示例展示模型的 **立体几何(三视图)** 能力:
```bash
python example.py \
--image_paths examples/Q15.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 15 -->
<details open>
<summary><strong>示例 15 详情</strong></summary>
<p><strong>问题:</strong>请将你的思考过程放在&lt;think> &lt;/think>标签内,并将你的最终答案放在&lt;answer> &lt;/answer>标签内。</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q15.png" alt="第一张图片" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案:B</strong></p>
</details>
---
#### 示例 16
此示例展示模型的 **立体几何(三视图)** 能力:
```bash
python example.py \
--image_paths examples/Q16.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 6 -->
<details open>
<summary><strong>示例 16 详情</strong></summary>
<p><strong>问题:</strong>请将你的思考过程放在&lt;think> &lt;/think>标签内,并将你的最终答案放在&lt;answer> &lt;/answer>标签内。</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q16.png" alt="第一张图片" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案:C</strong></p>
</details>
---
#### 示例 17
此示例展示模型的 **立体几何(3D图形推理)** 能力:
```bash
python example.py \
--image_paths examples/Q17.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 7 -->
<details open>
<summary><strong>示例 17 详情</strong></summary>
<p><strong>问题:</strong>请将你的思考过程放在&lt;think> &lt;/think>标签内,并将你的最终答案放在&lt;answer> &lt;/answer>标签内。</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q17.png" alt="第一张图片" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案:C</strong></p>
</details>
---
#### 示例 18
此示例展示模型的 **立体几何(三视图)** 能力:
```bash
python example.py \
--image_paths examples/Q18.png \
--question "请将你的思考过程放在<think></think>标签内,并将你的最终答案放在<answer></answer>标签内。" \
--model_path sensenova/SenseNova-SI-1.5-InternVL3-8B
```
<!-- Example 8 -->
<details open>
<summary><strong>示例 18 详情</strong></summary>
<p><strong>问题:</strong>请将你的思考过程放在&lt;think> &lt;/think>标签内,并将你的最终答案放在&lt;answer> &lt;/answer>标签内。</p>
<table>
<tr>
<td align="center" width="50%" style="padding:4px;">
<img src="../../examples/Q18.png" alt="第一张图片" width="100%">
</td>
</tr>
</table>
<p><strong>正确答案:A</strong></p>
</details>
\ No newline at end of file
import argparse
import json
import torch
from sensenova_si import get_model
def set_seed(seed=42):
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
if __name__ == "__main__":
set_seed()
parser = argparse.ArgumentParser(
description="Examples for SenseNova-SI single-run MCQ"
)
parser.add_argument(
"--model_path",
type=str,
default="sensenova/SenseNova-SI-1.3-InternVL3-8B",
help="Model path",
)
parser.add_argument(
"--image_paths",
type=str,
nargs="+",
default=[],
help="Path to image files, can specify multiple",
)
parser.add_argument(
"--question",
type=str,
default="Please describe the image in detail.",
help="Question to ask the model",
)
parser.add_argument(
"--jsonl_path",
type=str,
default=None,
help="Path to jsonl file containing examples",
)
parser.add_argument(
"--model_type",
type=str,
default="auto",
choices=["qwen", "internvl", "auto"],
help="Model type",
)
args = parser.parse_args()
model_path = args.model_path
print(f"Model path: {model_path}")
model = get_model(model_path, model_type=args.model_type)
if args.jsonl_path:
with open(args.jsonl_path, "r") as f:
for line in f:
entry = json.loads(line.strip())
image_paths = entry.get("image", [])
conversations = entry.get("conversations", [])
if conversations:
question = conversations[0].get("value", "")
else:
question = ""
id_ = entry.get("id", "")
gt = entry.get("GT", "")
if not image_paths or not question:
print(f"Skipping invalid entry id {id_}")
continue
print(f"Processing question id: {id_}")
response = model.generate(question, images=image_paths)
print(f"User: {question}")
print(f"Assistant: {response}")
print(f"Ground Truth: {gt}")
print("-" * 50)
else:
question = args.question
response = model.generate(question, images=args.image_paths)
print(f"User: {question}")
print(f"Assistant: {response}")
import argparse
import torch
from sensenova_si import SenseNovaSIBagelModel
def set_seed(seed=42):
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
def main():
parser = argparse.ArgumentParser(
description="BAGEL image generation example - generate image from text prompt"
)
parser.add_argument(
"--model_path",
type=str,
default="sensenova/SenseNova-SI-1.1-BAGEL-7B-MoT",
help="BAGEL model path",
)
parser.add_argument(
"--prompt",
type=str,
default="A chubby cat made of 3D point clouds, stretching its body, translucent with a soft glow.",
help="Text prompt used to generate an image",
)
parser.add_argument(
"--mode",
type=str,
default="generate",
choices=["generate", "think_generate"],
help="BAGEL mode: generate or think_generate",
)
parser.add_argument(
"--out_img_dir",
type=str,
default="./output_images/test_bagel/",
help="Directory to save generated images",
)
parser.add_argument(
"--dtype",
type=str,
default="bf16",
choices=["bf16"],
help="Model precision type",
)
args = parser.parse_args()
# Set random seed for reproducibility
set_seed()
print(f"Model path: {args.model_path}")
print(f"Mode: {args.mode}")
print(f"Prompt: {args.prompt}")
print("-" * 50)
# Initialize BAGEL model with generate mode
print("Loading model...")
model = SenseNovaSIBagelModel(
model_path=args.model_path,
mode=args.mode,
out_img_dir=args.out_img_dir,
dtype=args.dtype,
)
print("Generating image...")
# Call generate with the prompt; images not needed for generate mode
generated_image_path = model.generate(question=args.prompt, images=None)
print("-" * 50)
print("Done!")
print(f"Image saved to: {generated_image_path}")
if __name__ == "__main__":
main()
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment