Commit 1034e764 authored by dengjb

update
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# OLMo-3
## Paper
None yet.
## Model Introduction
We introduce Olmo 3, a new family of 7B and 32B models with Instruct and Think variants. Long chains of thought improve performance on reasoning tasks such as math and coding.
Olmo is a series of open language models designed to advance the science of language models. The models are pretrained on the Dolma 3 dataset and post-trained on the Dolci datasets. All code, checkpoints, logs (coming soon), and associated training details will be released.
The core models in this release are:
**OLMo 3 Base:** Pretrained on 5.9T tokens, with the data distribution optimized through a distinctive token-constrained mixing and quality-aware upsampling strategy; incorporates large-scale scientific PDF data processed with olmOCR; and reinforced for code, math, and QA during a 100B-token midtraining stage.
**OLMo 3 Think:** The flagship reasoning model of OLMo 3, post-trained with a three-stage SFT -> DPO -> RLVR (Reinforcement Learning with Verifiable Rewards) recipe. The report details how preference data is constructed via "Delta Learning", as well as the algorithmic and infrastructure improvements in the OlmoRL framework (e.g., removing the KL-divergence term and introducing a token-level loss; see the sketch after this list).
**Full-stack data release:** The pretraining data Dolma 3 Mix, the midtraining data Dolmino Mix, the long-context data Longmino Mix, and the post-training Dolci series are all released.
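To make the loss change concrete, the sketch below contrasts a per-sequence-mean policy-gradient loss with a token-level loss in which every generated token carries equal weight across the batch. This is a schematic illustration under our own naming, not the OlmoRL implementation; `logp`, `adv`, and `mask` are hypothetical tensors.
```python
import torch

def sequence_mean_pg_loss(logp, adv, mask):
    # logp: (B, T) log-probs of sampled tokens; adv: (B,) per-sequence advantages;
    # mask: (B, T) with 1 on generated tokens. Averaging inside each sequence first
    # makes short and long responses contribute equally to the batch loss.
    per_seq = (logp * mask).sum(-1) / mask.sum(-1)
    return -(adv * per_seq).mean()

def token_level_pg_loss(logp, adv, mask):
    # Normalizing by the total token count instead gives every generated token
    # equal weight, so long reasoning traces are not down-weighted.
    per_tok = logp * adv.unsqueeze(-1) * mask
    return -per_tok.sum() / mask.sum()
```
The table below compares Olmo 3 Think 32B across its post-training stages (SFT, DPO, final) with contemporary open models.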
| Benchmark | Olmo 3 Think 32B SFT | Olmo 3 Think 32B DPO | Olmo 3 Think 32B | Qwen 3 32B | Qwen 3 VL 32B Thinking | Qwen 2.5 32B | Gemma 3 27B Instruct | Gemma 2 27B Instruct | Olmo 2 32B Instruct | DeepSeek-R1-Distill-Qwen-32B |
|-----------|-----------------------|-----------------------|-------------------|-------------|-------------------------|---------------|------------------------|------------------------|---------------------------|---------------------------------|
| **Math** | | | | | | | | | | |
| MATH | 95.6 | 95.9 | 96.1 | 95.4 | 96.7 | 80.2 | 87.4 | 51.5 | 49.2 | 92.6 |
| AIME 2024 | 73.5 | 76.0 | 76.8 | 80.8 | 86.3 | 15.7 | 28.9 | 4.7 | 4.6 | 70.3 |
| AIME 2025 | 66.2 | 70.7 | 72.5 | 70.9 | 78.8 | 13.4 | 22.9 | 0.9 | 0.9 | 56.3 |
| OMEGA | 43.1 | 45.2 | 50.8 | 47.7 | 50.8 | 19.2 | 24.0 | 9.1 | 9.8 | 38.9 |
| **Reasoning** | | | | | | | | | | |
| BigBenchHard | 88.8 | 89.1 | 89.8 | 90.6 | 91.1 | 80.9 | 82.4 | 66.0 | 65.6 | 89.7 |
| ZebraLogic | 70.5 | 74.5 | 76.0 | 88.3 | 96.1 | 24.1 | 24.8 | 17.2 | 13.3 | 69.4 |
| AGI Eval English | 85.9 | 87.8 | 88.2 | 90.0 | 92.2 | 78.9 | 76.9 | 70.9 | 68.4 | 88.1 |
| **Coding** | | | | | | | | | | |
| HumanEvalPlus | 90.0 | 91.6 | 91.4 | 91.2 | 90.6 | 82.6 | 79.2 | 67.5 | 44.4 | 92.3 |
| MBPP+ | 66.7 | 67.2 | 68.0 | 70.6 | 66.2 | 66.6 | 65.7 | 61.2 | 49.0 | 70.1 |
| LiveCodeBench v3 | 75.8 | 81.9 | 83.5 | 90.2 | 84.8 | 49.9 | 39.0 | 28.7 | 10.6 | 79.5 |
| **IF** | | | | | | | | | | |
| IFEval | 83.9 | 80.6 | 89.0 | 86.5 | 85.5 | 81.9 | 85.4 | 62.1 | 85.8 | 78.7 |
| IFBench | 37.0 | 34.4 | 47.6 | 37.3 | 55.1 | 36.7 | 31.3 | 27.8 | 36.4 | 23.8 |
| **Knowledge & QA** | | | | | | | | | | |
| MMLU | 85.3 | 85.2 | 85.4 | 88.8 | 90.1 | 84.6 | 74.6 | 76.1 | 77.1 | 88.0 |
| PopQA | 33.1 | 37.0 | 31.9 | 30.7 | 32.2 | 28.0 | 30.2 | 30.4 | 37.2 | 26.7 |
| GPQA | 55.7 | 57.6 | 58.1 | 67.3 | 67.4 | 44.6 | 45.0 | 39.9 | 36.4 | 61.8 |
| **Chat** | | | | | | | | | | |
| AlpacaEval 2 LC | 69.1 | 78.6 | 74.2 | 75.6 | 80.9 | 81.9 | 65.5 | 39.8 | 38.0 | 26.2 |
| **Safety** | 64.8 | 65.3 | 68.8 | 69.0 | 82.7 | 81.9 | 68.6 | 74.3 | 83.8 | 63.6 |
## Environment Dependencies
| Software | Version |
| :------: | :------: |
| DTK | 25.04.2 |
| python | 3.10.12 |
| transformers | >=4.57.1 |
| vllm | 0.9.2+das.opt1.dtk25042 |
| torch | 2.5.1+das.opt1.dtk25042 |
| triton | 3.1+das.opt1.3c5d12d.dtk25041 |
| flash_attn | 2.6.1+das.opt1.dtk2504 |
| flash_mla | 1.0.0+das.opt1.dtk25042 |
Currently, only the following image is supported:
- Adjust the mount paths (`-v`) to match where your model and code actually live.
```bash
docker run -it --shm-size 60g --network=host --name olmo-3 --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro -v /path/your_code_path/:/path/your_code_path/ image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10 bash
```
More images are available for download from [光源](https://sourcefind.cn/#/service-list).
The special deep-learning libraries required for the DCU cards used in this project can be downloaded from the [光合](https://developer.sourcefind.cn/tool/) developer community.
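Once inside the container, a quick sanity check (a minimal sketch; on DCU the ROCm/HIP stack reports through the standard `torch.cuda` interface) confirms that the DAS-patched builds from the table above are the ones actually loaded:
```python
import torch
import transformers

# Versions should match the dependency table (e.g., 2.5.1+das.opt1.dtk25042).
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
# On DCU, the HIP backend is exposed through the torch.cuda API.
print("device available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
```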
## Dataset
None yet.
## Training
None yet.
## Inference
### pytorch
#### Single-node Inference
See the run.sh script for reference.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from a local checkpoint directory.
olmo = AutoModelForCausalLM.from_pretrained("/path/to/allenai/Olmo-3-7B-Think")
tokenizer = AutoTokenizer.from_pretrained("/path/to/allenai/Olmo-3-7B-Think")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# Optional: move inputs and model to the GPU (the DCU is addressed as 'cuda').
inputs = {k: v.to('cuda') for k, v in inputs.items()}
olmo = olmo.to('cuda')
response = olmo.generate(**inputs, max_new_tokens=2048, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```
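The container image also ships vLLM 0.9.2 (see the dependency table above). A minimal offline-inference sketch, assuming the same local weight path as above, might look like this; it is not part of the shipped scripts:
```python
from vllm import LLM, SamplingParams

# Hypothetical local path; point this at the downloaded checkpoint directory.
llm = LLM(model="/path/to/allenai/Olmo-3-7B-Think")
params = SamplingParams(temperature=0.8, top_k=50, top_p=0.95, max_tokens=2048)
outputs = llm.generate(["Language modeling is "], params)
print(outputs[0].outputs[0].text)
```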
## Example Output
<div align=center>
<img src="./doc/example.png"/>
</div>
### Accuracy
DCU accuracy is consistent with GPU accuracy; inference framework: pytorch.
## Pretrained Weights
| Model | Size | DCU Model | Min. Cards Required | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| Olmo-3-7B-Think | 7B | BW1000 | 1 | [Download](https://modelscope.cn/models/allenai/Olmo-3-7B-Think) |
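The weights are hosted on ModelScope; a minimal download sketch (assuming the `modelscope` Python package is installed) is:
```python
from modelscope import snapshot_download

# Downloads the checkpoint and returns the local cache directory,
# which can then be passed to from_pretrained() in the inference example above.
local_dir = snapshot_download("allenai/Olmo-3-7B-Think")
print(local_dir)
```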
## Source Repository & Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/olmo3_pytorch
## References
- https://github.com/allenai/OLMo
<!-- <p align="center">
<img src=https://cdn-uploads.huggingface.co/production/uploads/637aebed7ce76c3b834cea37/3IK823BZ8w-mz_QfeYkDn.png width="30%"/>
</p> -->
<p align="center">
<img src="docs/imgs/ovis_image_title.png" width="40%">
</p>
<!-- <h1 align="center">
Ovis-Image
</h1> -->
<p align="center">
<a href="https://arxiv.org/abs/2511.22982"><img src="https://img.shields.io/badge/arXiv_paper-2511.22982-b31b1b.svg" alt="arxiv"></a>
<a href="https://github.com/AIDC-AI/Ovis-Image/blob/main/docs/Ovis_Image_Technical_Report.pdf"><img src="https://img.shields.io/badge/Paper-PDF-b31b1b" alt="paper"></a>
<a href="https://github.com/AIDC-AI/Ovis-Image"><img src="https://img.shields.io/badge/GitHub-AIDC--AI/Ovis--Image-blue?style=flat&logo=github" alt="code"></a>
<a href="https://huggingface.co/spaces/AIDC-AI/Ovis-Image-7B"><img src="https://img.shields.io/badge/🎨_HF_Spaces-AIDC--AI/Ovis--Image--7B-lightblack" alt="demo"></a>
<a href="https://huggingface.co/AIDC-AI/Ovis-Image-7B"><img src="https://img.shields.io/badge/🤗_Model-AIDC--AI/Ovis--Image--7B-yellow" alt="model"></a>
</p>
Built upon [Ovis-U1](https://github.com/AIDC-AI/Ovis-U1), Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints.
<p align="center">
<img src="docs/imgs/ovis_image_arch.png" width="95%">
<br>
<em>The overall architecture of Ovis-Image (cf. Fig.2 in our report).</em>
</p>
## 🏆 Highlights
* **Strong text rendering at a compact 7B scale**: Ovis-Image is a 7B text-to-image model that delivers text rendering quality comparable to much larger 20B-class systems such as Qwen-Image and competitive with leading closed-source models like GPT4o in text-centric scenarios, while remaining small enough to run on widely accessible hardware.
* **High fidelity on text-heavy, layout-sensitive prompts**: The model excels on prompts that demand tight alignment between linguistic content and rendered typography (e.g., posters, banners, logos, UI mockups, infographics), producing legible, correctly spelled, and semantically consistent text across diverse fonts, sizes, and aspect ratios without compromising overall visual quality.
* **Efficiency and deployability**: With its 7B parameter budget and streamlined architecture, Ovis-Image fits on a single high-end GPU with moderate memory, supports low-latency interactive use, and scales to batch production serving, bringing near–frontier text rendering to applications where tens-of-billions–parameter models are impractical.
## ✨ Showcase
Here are some examples demonstrating the capabilities of Ovis-Image.
<figure>
<img src="docs/imgs/ovis_image_case.png" alt="Ovis-Image examples">
<figcaption style="text-align: center;"></figcaption>
</figure>
## 🚀 News
- [2025/12/3] 🔥 Ovis-Image has been merged into [`diffusers`](https://github.com/huggingface/diffusers/pull/12740)!
- [2025/12/2] 🔥 Ovis-Image has been merged into [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI/pull/11030)!
- [2025/11/29] 🔥 Announcing Ovis-Image ([Model](https://huggingface.co/AIDC-AI/Ovis-Image-7B))!
## 🛠️ Inference
### Inference with Diffusers
First, install the `diffusers` library with support for Ovis-Image.
```bash
# pip install git+https://github.com/DoctorKey/diffusers.git@ovis-image
pip install git+https://github.com/huggingface/diffusers
```
Next, use the `OvisImagePipeline` to generate the image.
```python
import torch
from diffusers import OvisImagePipeline
pipe = OvisImagePipeline.from_pretrained("AIDC-AI/Ovis-Image-7B", torch_dtype=torch.bfloat16)
pipe.to("cuda")
prompt = "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail."
image = pipe(prompt, negative_prompt="", num_inference_steps=50, guidance_scale=5.0).images[0]
image.save("ovis_image.png")
```
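On GPUs with less free memory, diffusers' generic CPU-offload utility can be used in place of `pipe.to("cuda")`; whether `OvisImagePipeline` supports it is our assumption, not something the report confirms:
```python
# Optional: lower peak VRAM by offloading idle submodules to the CPU.
# Call this instead of pipe.to("cuda"); requires the `accelerate` package.
pipe.enable_model_cpu_offload()
```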
### Inference with PyTorch
Ovis-Image has been tested with Python 3.10, Torch 2.6.0, and Transformers 4.57.1. For a full list of package dependencies, please see `requirements.txt`.
```bash
git clone git@github.com:AIDC-AI/Ovis-Image.git
conda create -n ovis-image python=3.10 -y
conda activate ovis-image
cd Ovis-Image
pip install -r requirements.txt
pip install -e .
```
For text-to-image, please run
```bash
python ovis_image/test.py \
--model_path AIDC-AI/Ovis-Image-7B/ovis_image.safetensors \
--vae_path AIDC-AI/Ovis-Image-7B/ae.safetensors \
--ovis_path AIDC-AI/Ovis-Image-7B/Ovis2.5-2B \
--image_size 1024 \
--denoising_steps 50 \
--cfg_scale 5.0 \
--prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail."
```
Alternatively, you can try Ovis-Image directly in your browser on [![Hugging Face Space](https://img.shields.io/badge/🎨_HF_Spaces-AIDC--AI/Ovis--Image--7B-lightblack)](https://huggingface.co/spaces/AIDC-AI/Ovis-Image-7B)
## 📊 Performance
**Evaluation of text rendering ability on CVTG-2K.**
| Model | #Params. | WA (2 regions) | WA (3 regions) | WA (4 regions) | WA (5 regions) | WA (average) | NED↑ | CLIPScore↑ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Seedream 3.0 | - | 0.6282 | 0.5962 | 0.6043 | 0.5610 | 0.5924 | 0.8537 | 0.7821 |
| GPT4o | - | 0.8779 | 0.8659 | 0.8731 | 0.8218 | 0.8569 | 0.9478 | 0.7982 |
| SD3.5 Large | 11B+8B | 0.7293 | 0.6825 | 0.6574 | 0.5940 | 0.6548 | 0.8470 | 0.7797 |
| RAG-Diffusion | 11B+12B | 0.4388 | 0.3316 | 0.2116 | 0.1910 | 0.2648 | 0.4498 | 0.7797 |
| FLUX.1-dev | 11B+12B | 0.6089 | 0.5531 | 0.4661 | 0.4316 | 0.4965 | 0.6879 | 0.7401 |
| TextCrafter | 11B+12B | 0.7628 | 0.7628 | 0.7406 | 0.6977 | 0.7370 | 0.8679 | 0.7868 |
| Qwen-Image | 7B+20B | 0.8370 | 0.8364 | 0.8313 | 0.8158 | 0.8288 | 0.9116 | 0.8017 |
| Ovis-Image | 2B+7B | **0.9248** | **0.9239** | **0.9180** | **0.9166** | **0.9200** | **0.9695** | **0.8368** |
**Evaluation of text rendering ability on LongText-Bench.**
| Model | #Params. | LongText-Bench-EN | LongText-Bench-ZH |
| :--- | :---: | :---: | :---: |
| Kolors 2.0 | - | 0.258 | 0.329 |
| GPT4o | - | **0.956** | 0.619 |
| Seedream 3.0 | - | 0.896 | 0.878 |
| OmniGen2 | 3B+4B | 0.561 | 0.059 |
| Janus-Pro | 7B | 0.019 | 0.006 |
| BLIP3-o | 7B+1B | 0.021 | 0.018 |
| FLUX.1-dev | 11B+12B | 0.607 | 0.005 |
| BAGEL | 7B+7B | 0.373 | 0.310 |
| HiDream-I1-Full | 11B+17B | 0.543 | 0.024 |
| Qwen-Image | 7B+20B | 0.943 | 0.946 |
| Ovis-Image | 2B+7B | 0.922 | **0.964** |
**Evaluation of text-to-image generation ability on DPG-Bench.**
| Model | #Params. | Global | Entity | Attribute | Relation | Other | Overall |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Seedream 3.0 | - | **94.31** | **92.65** | 91.36 | 92.78 | 88.24 | 88.27 |
| GPT4o | - | 88.89 | 88.94 | 89.84 | 92.63 | 90.96 | 85.15 |
| Ovis-U1 | 2B+1B | 82.37 | 90.08 | 88.68 | 93.35 | 85.20 | 83.72 |
| OmniGen2 | 3B+4B | 88.81 | 88.83 | 90.18 | 89.37 | 90.27 | 83.57 |
| Janus-Pro | 7B | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 | 84.19 |
| BAGEL | 7B+7B | 88.94 | 90.37 | 91.29 | 90.82 | 88.67 | 85.07 |
| HiDream-I1-Full | 11B+17B | 76.44 | 90.22 | 89.48 | 93.74 | 91.83 | 85.89 |
| UniWorld-V1 | 7B+12B | 83.64 | 88.39 | 88.44 | 89.27 | 87.22 | 81.38 |
| Qwen-Image | 7B+20B | 91.32 | 91.56 | **92.02** | **94.31** | **92.73** | **88.32** |
| Ovis-Image | 2B+7B | 82.37 | 92.38 | 90.42 | 93.98 | 91.20 | 86.59 |
**Evaluation of text-to-image generation ability on GenEval.**
| Model | #Params. | Single object | Two object | Counting | Colors | Position | Attribute binding | Overall |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Seedream 3.0 | - | 0.99 | 0.96 | **0.91** | **0.93** | 0.47 | **0.80** | 0.84 |
| GPT4o | - | 0.99 | 0.92 | 0.85 | 0.92 | 0.75 | 0.61 | 0.84 |
| Ovis-U1 | 2B+1B | 0.98 | **0.98** | 0.90 | 0.92 | **0.79** | 0.75 | **0.89** |
| OmniGen2 | 3B+4B | **1.00** | 0.95 | 0.64 | 0.88 | 0.55 | 0.76 | 0.80 |
| Janus-Pro | 7B | 0.99 | 0.89 | 0.59 | 0.90 | **0.79** | 0.66 | 0.80 |
| BAGEL | 7B+7B | 0.99 | 0.94 | 0.81 | 0.88 | 0.64 | 0.63 | 0.82 |
| HiDream-I1-Full | 11B+17B | 1.00 | **0.98** | 0.79 | 0.91 | 0.60 | 0.72 | 0.83 |
| UniWorld-V1 | 7B+12B | 0.99 | 0.93 | 0.79 | 0.89 | 0.49 | 0.70 | 0.80 |
| Qwen-Image | 7B+20B | 0.99 | 0.92 | 0.89 | 0.88 | 0.76 | 0.77 | 0.87 |
| Ovis-Image | 2B+7B | **1.00** | 0.97 | 0.76 | 0.86 | 0.67 | **0.80** | 0.84 |
**Evaluation of text-to-image generation ability on OneIG-EN.**
| Model | #Params. | Alignment | Text | Reasoning | Style | Diversity | Overall |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Kolors 2.0 | - | 0.820 | 0.427 | 0.262 | 0.360 | 0.300 | 0.434 |
| Imagen4 | - | 0.857 | 0.805 | 0.338 | 0.377 | 0.199 | 0.515 |
| Seedream 3.0 | - | 0.818 | 0.865 | 0.275 | 0.413 | 0.277 | 0.530 |
| GPT4o | - | 0.851 | 0.857 | **0.345** | **0.462** | 0.151 | 0.533 |
| Ovis-U1 | 2B+1B | 0.816 | 0.034 | 0.226 | 0.443 | 0.191 | 0.342 |
| CogView4 | 6B | 0.786 | 0.641 | 0.246 | 0.353 | 0.205 | 0.446 |
| Janus-Pro | 7B | 0.553 | 0.001 | 0.139 | 0.276 | **0.365** | 0.267 |
| OmniGen2 | 3B+4B | 0.804 | 0.680 | 0.271 | 0.377 | 0.242 | 0.475 |
| BLIP3-o | 7B+1B | 0.711 | 0.013 | 0.223 | 0.361 | 0.229 | 0.307 |
| FLUX.1-dev | 11B+12B | 0.786 | 0.523 | 0.253 | 0.368 | 0.238 | 0.434 |
| BAGEL | 7B+7B | 0.769 | 0.244 | 0.173 | 0.367 | 0.251 | 0.361 |
| BAGEL+CoT | 7B+7B | 0.793 | 0.020 | 0.206 | 0.390 | 0.209 | 0.324 |
| HiDream-I1-Full | 11B+17B | 0.829 | 0.707 | 0.317 | 0.347 | 0.186 | 0.477 |
| HunyuanImage-2.1 | 7B+17B | 0.835 | 0.816 | 0.299 | 0.355 | 0.127 | 0.486 |
| Qwen-Image | 7B+20B | **0.882** | 0.891 | 0.306 | 0.418 | 0.197 | **0.539** |
| Ovis-Image | 2B+7B | 0.858 | **0.914** | 0.308 | 0.386 | 0.186 | 0.530 |
**Evaluation of text-to-image generation ability on OneIG-ZH.**
| Model | #Params. | Alignment | Text | Reasoning | Style | Diversity | Overall |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Kolors 2.0 | - | 0.738 | 0.502 | 0.226 | 0.331 | 0.333 | 0.426 |
| Seedream 3.0 | - | 0.793 | 0.928 | 0.281 | 0.397 | 0.243 | 0.528 |
| GPT4o | - | 0.812 | 0.650 | **0.300** | **0.449** | 0.159 | 0.474 |
| CogView4 | 6B | 0.700 | 0.193 | 0.236 | 0.348 | 0.214 | 0.338 |
| Janus-Pro | 7B | 0.324 | 0.148 | 0.104 | 0.264 | **0.358** | 0.240 |
| BLIP3-o | 7B+1B | 0.608 | 0.092 | 0.213 | 0.369 | 0.233 | 0.303 |
| BAGEL | 7B+7B | 0.672 | 0.365 | 0.186 | 0.357 | 0.268 | 0.370 |
| BAGEL+CoT | 7B+7B | 0.719 | 0.127 | 0.219 | 0.385 | 0.197 | 0.329 |
| HiDream-I1-Full | 11B+17B | 0.620 | 0.205 | 0.256 | 0.304 | 0.300 | 0.337 |
| HunyuanImage-2.1 | 7B+17B | 0.775 | 0.896 | 0.271 | 0.348 | 0.114 | 0.481 |
| Qwen-Image | 7B+20B | **0.825** | **0.963** | 0.267 | 0.405 | 0.279 | **0.548** |
| Ovis-Image | 2B+7B | 0.805 | 0.961 | 0.273 | 0.368 | 0.198 | 0.521 |
## 📚 Citation
If you find Ovis-Image useful for your research or applications, please cite our technical report:
```bibtex
@article{wang2025ovis_image,
title={Ovis-Image Technical Report},
author={Wang, Guo-Hua and Cao, Liangfu and Cui, Tianyu and Fu, Minghao and Chen, Xiaohao and Zhan, Pengxin and Zhao, Jianshan and Li, Lan and Fu, Bowen and Liu, Jiaqi and Chen, Qing-Guo},
journal={arXiv preprint arXiv:2511.22982},
year={2025}
}
```
## 🙏 Acknowledgments
The code is built upon [Ovis](https://github.com/AIDC-AI/Ovis) and [FLUX](https://github.com/black-forest-labs/flux). We thank their authors for open-sourcing their great work.
## 📄 License
This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0).
## 🚨 Disclaimer
We used compliance-checking algorithms during the training process to ensure the compliance of the trained model to the best of our ability. Due to the complexity of the data and the diversity of language-model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.
## 🔥 We are hiring!
We are looking for both interns and full-time researchers to join our team, focusing on multimodal understanding, generation, reasoning, AI agents, and unified multimodal models. If you are interested in exploring these exciting areas, please reach out to us at qingguo.cqg@alibaba-inc.com.
# Unique model identifier
modelCode=1865
# Model name
modelName=olmo3_pytorch
# Model description
modelDescription=Olmo3 is a new family of 7B and 32B models, including Instruct and Think variants. Long chains of thought improve reasoning tasks such as math and coding.
# Application scenario
processType=推理
# Algorithm category
appScenario=文本生成
# Framework type
frameType=pytorch
# Accelerator card type
accelerateType=BW1000
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from a local checkpoint directory.
olmo = AutoModelForCausalLM.from_pretrained("/path/to/allenai/Olmo-3-7B-Think")
tokenizer = AutoTokenizer.from_pretrained("/path/to/allenai/Olmo-3-7B-Think")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# Optional: move inputs and model to the GPU (the DCU is addressed as 'cuda').
inputs = {k: v.to('cuda') for k, v in inputs.items()}
olmo = olmo.to('cuda')
response = olmo.generate(**inputs, max_new_tokens=2048, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
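# run.sh: HIP_VISIBLE_DEVICES selects the DCU card, analogous to CUDA_VISIBLE_DEVICES.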
HIP_VISIBLE_DEVICES=0 python run.py