# Ovis-Image
## Paper
[Ovis-Image](https://arxiv.org/abs/2511.22982)

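## Model Overview
Built on Ovis-U1, Ovis-Image is a 7B text-to-image model optimized specifically for high-quality text rendering and designed to run efficiently under tight compute constraints.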

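- **Strong text rendering at a compact 7B scale:** Ovis-Image is a 7B text-to-image model whose text rendering quality is comparable to larger 20B-class systems such as Qwen-Image, and it is competitive with leading closed-source models such as GPT4o in text-centric scenarios, while remaining small enough to run on widely available hardware.
- **High fidelity on text-heavy, layout-sensitive prompts:** The model excels on prompts that require tight alignment between linguistic content and rendered typography (for example posters, banners, logos, UI mockups, and infographics). It produces legible, correctly spelled, and semantically consistent text across a variety of fonts, sizes, and aspect ratios without compromising overall visual quality.
- **Efficiency and deployability:** With its 7B parameter budget and streamlined architecture, Ovis-Image runs on a single high-end GPU, supports low-latency interactive use, and scales to batch production serving, bringing near-frontier text rendering to applications where models with tens of billions of parameters are impractical.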

<div align=center>
    <img src="./doc/model.png"/>
</div>


## Environment Requirements
```bash
# Additionally install diffusers from source
pip install git+https://github.com/huggingface/diffusers
```


| Software | Version |
| :------: | :------: |
| DTK | 25.04.2 |
| python | 3.10.12 |
| transformers | >=4.57.1 |
| vllm |  0.9.2+das.opt1.dtk25042 |
| torch | 2.5.1+das.opt1.dtk25042 |
| triton | 3.1+das.opt1.3c5d12d.dtk25041 |
| flash_attn | 2.6.1+das.opt1.dtk2504 |
| flash_mla | 1.0.0+das.opt1.dtk25042 |
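
To confirm that a container matches the table above, a small check like the following may help. It is a minimal sketch, not part of the project: it assumes the package names in the table are also the installed distribution names, and it treats every listed version as a minimum, ignoring local build tags such as `+das.opt1.dtk25042`.

```python
# Sketch: compare installed package versions against the releases in the table above.
# Assumption: table names equal the installed distribution names.
import re
from importlib import metadata

# Release numbers copied from the table, with local build tags dropped.
REQUIRED = {
    "transformers": "4.57.1",
    "vllm": "0.9.2",
    "torch": "2.5.1",
    "triton": "3.1",
    "flash_attn": "2.6.1",
    "flash_mla": "1.0.0",
}


def release_tuple(version: str) -> tuple:
    """Return the numeric release part, e.g. '2.5.1+das.opt1' -> (2, 5, 1)."""
    public = version.split("+", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", public)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def meets_minimum(installed: str, required: str) -> bool:
    """True if the installed release is at least the required release."""
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that (3, 1) compares like (3, 1, 0)
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(required=REQUIRED) -> dict:
    """Map each package to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```

A package reported as `missing` may simply be installed under a different distribution name, so treat the output as a hint rather than a verdict.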

Recommended image:
- Adjust the `-v` mount paths to match where your code and model are actually stored

```bash
docker run -it --shm-size 60g --network=host --name ovis-image --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro -v /path/your_code_path/:/path/your_code_path/ image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10 bash
```
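More images are available for download from [光源](https://sourcefind.cn/#/service-list).

The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.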
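
Once inside the container, it can be useful to confirm that PyTorch sees the accelerator before loading the model. The sketch below assumes the DCU is exposed through the `torch.cuda` interface, which is what `pipe.to("cuda")` in the inference example implies; `device_report` is a hypothetical helper name, not something shipped with this project.

```python
def device_report() -> dict:
    """Collect basic facts about the PyTorch device visible in this environment."""
    report = {"torch_installed": False, "device_available": False, "devices": []}
    try:
        import torch  # imported lazily so the sketch also runs where torch is absent
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    if torch.cuda.is_available():
        report["device_available"] = True
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


print(device_report())
```

If `device_available` is `False`, check the `--device` flags and the `/opt/hyhal` mount in the `docker run` command first.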

## Dataset
Not available yet.

## Training
Not available yet.

## Inference

### pytorch
#### Single-node inference

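```python
# See run.sh for a ready-made launch script
import torch
from diffusers import OvisImagePipeline

# Replace the path below with the local directory that holds your Ovis-Image-7B weights
pipe = OvisImagePipeline.from_pretrained("/home/dengjb/download/AIDC-AI/Ovis-Image-7B/", torch_dtype=torch.bfloat16)
pipe.to("cuda")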
prompt = "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail."
image = pipe(prompt, negative_prompt="", num_inference_steps=50, guidance_scale=5.0).images[0]
image.save("ovis_image.png")
```
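
To avoid editing the script for every prompt, the same call can be wrapped in a small command-line interface. This is a sketch under assumptions: the flag names and the `parse_args` and `generate` helpers are hypothetical, and the pipeline call simply mirrors the arguments used in the example above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; defaults mirror the example above."""
    parser = argparse.ArgumentParser(description="Generate one image with Ovis-Image")
    parser.add_argument("--model", required=True, help="Local path to the Ovis-Image-7B weights")
    parser.add_argument("--prompt", required=True, help="Text prompt to render")
    parser.add_argument("--negative-prompt", default="")
    parser.add_argument("--steps", type=int, default=50)
    parser.add_argument("--guidance-scale", type=float, default=5.0)
    parser.add_argument("--output", default="ovis_image.png")
    return parser.parse_args(argv)


def generate(args):
    """Load the pipeline, run one generation, and save the image."""
    # Heavy imports stay inside the function so argument parsing works without them.
    import torch
    from diffusers import OvisImagePipeline

    pipe = OvisImagePipeline.from_pretrained(args.model, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    image = pipe(
        args.prompt,
        negative_prompt=args.negative_prompt,
        num_inference_steps=args.steps,
        guidance_scale=args.guidance_scale,
    ).images[0]
    image.save(args.output)
    return args.output
```

For example, `generate(parse_args(["--model", "/path/to/Ovis-Image-7B", "--prompt", "A poster that says OVIS-IMAGE"]))` would write `ovis_image.png` to the current directory, with `/path/to/Ovis-Image-7B` standing in for your own weights directory.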

## Results
<div align=center>
    <img src="./doc/example.png"/>
</div>

### Accuracy
Accuracy on DCU is consistent with GPU. Inference framework: pytorch.

## Pretrained Weights
| Model | Model Size | DCU Model | Minimum Number of Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| Ovis-Image-7B | 7B | BW1000 | 1 | [modelscope](https://modelscope.cn/models/AIDC-AI/Ovis-Image-7B) |
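
The weights can also be fetched programmatically. The sketch below assumes the `modelscope` Python package is installed (`pip install modelscope`) and uses its `snapshot_download` function; `download_weights` and the `./models` cache directory are hypothetical choices for illustration.

```python
MODEL_ID = "AIDC-AI/Ovis-Image-7B"  # repository id from the table above


def download_weights(cache_dir: str = "./models") -> str:
    """Download the weights from ModelScope and return the local model directory."""
    # Imported lazily; requires the modelscope package.
    from modelscope import snapshot_download

    return snapshot_download(MODEL_ID, cache_dir=cache_dir)
```

The returned directory is what you would pass to `OvisImagePipeline.from_pretrained` in the inference example.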

## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/ovis-image_pytorch

## References
- https://github.com/AIDC-AI/Ovis-Image/tree/main