Commit 5e887c2c authored by wanglch

Initial commit
name: 🐞 Bug
description: 提交错误报告 | File a bug/issue
title: "[BUG] <title>"
labels: []
body:
- type: checkboxes
attributes:
label: 是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
description: |
请先搜索您遇到的错误是否在已有的issues或讨论中提到过。
Please search to see if an issue / discussion already exists for the bug you encountered.
[Issues](https://github.com/QwenLM/Qwen-7B/issues)
[Discussions](https://github.com/QwenLM/Qwen-7B/discussions)
options:
- label: 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
required: true
- type: checkboxes
attributes:
label: 该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
description: |
请先搜索您遇到的错误是否已在FAQ中有相关解答。
Please search to see if an answer already exists in FAQ for the bug you encountered.
[FAQ-en](https://github.com/QwenLM/Qwen-7B/blob/main/FAQ.md)
[FAQ-zh](https://github.com/QwenLM/Qwen-7B/blob/main/FAQ_zh.md)
options:
- label: 我已经搜索过FAQ | I have searched FAQ
required: true
- type: textarea
attributes:
label: 当前行为 | Current Behavior
description: |
准确描述遇到的行为。
A concise description of what you're experiencing.
validations:
required: false
- type: textarea
attributes:
label: 期望行为 | Expected Behavior
description: |
准确描述预期的行为。
A concise description of what you expected to happen.
validations:
required: false
- type: textarea
attributes:
label: 复现方法 | Steps To Reproduce
description: |
复现当前行为的详细步骤。
Steps to reproduce the behavior.
placeholder: |
1. In this environment...
2. With this config...
3. Run '...'
4. See error...
validations:
required: false
- type: textarea
attributes:
label: 运行环境 | Environment
description: |
examples:
- **OS**: Ubuntu 20.04
- **Python**: 3.8
- **Transformers**: 4.31.0
- **PyTorch**: 2.0.1
- **CUDA**: 11.4
value: |
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
render: Markdown
validations:
required: false
- type: textarea
attributes:
label: 备注 | Anything else?
description: |
您可以在这里补充其他关于该问题背景信息的描述、链接或引用等。
您可以通过点击高亮此区域然后拖动文件的方式上传图片或日志文件。
Links? References? Anything that will give us more context about the issue you are encountering!
Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.
validations:
required: false
blank_issues_enabled: true
name: "💡 Feature Request"
description: 创建新功能请求 | Create a new ticket for a new feature request
title: "💡 [REQUEST] - <title>"
labels: [
"question"
]
body:
- type: input
id: start_date
attributes:
label: "起始日期 | Start Date"
description: |
起始开发日期
Start of development
placeholder: "month/day/year"
validations:
required: false
- type: textarea
id: implementation_pr
attributes:
label: "实现PR | Implementation PR"
description: |
实现该功能的Pull request
Pull request used
placeholder: "#Pull Request ID"
validations:
required: false
- type: textarea
id: reference_issues
attributes:
label: "相关Issues | Reference Issues"
description: |
与该功能相关的issues
Common issues
placeholder: "#Issues IDs"
validations:
required: false
- type: textarea
id: summary
attributes:
label: "摘要 | Summary"
description: |
简要描述新功能的特点
Provide a brief explanation of the feature
placeholder: |
Describe in a few lines your feature request
validations:
required: true
- type: textarea
id: basic_example
attributes:
label: "基本示例 | Basic Example"
description: Indicate here some basic examples of your feature.
placeholder: A few specific words about your feature request.
validations:
required: true
- type: textarea
id: drawbacks
attributes:
label: "缺陷 | Drawbacks"
description: |
该新功能有哪些缺陷/可能造成哪些影响?
What are the drawbacks/impacts of your feature request ?
placeholder: |
Identify the drawbacks and impacts while being neutral on your feature request
validations:
required: true
- type: textarea
id: unresolved_question
attributes:
label: "未解决问题 | Unresolved questions"
description: |
有哪些尚未解决的问题?
What questions still remain unresolved ?
placeholder: |
Identify any unresolved issues.
validations:
required: false
__pycache__
*.so
build
.coverage_*
*.egg-info
*~
.vscode/
.idea/
.DS_Store
/private/
Qwen-VL-Chat/
Qwen-VL-Chat-Int4/
SimSun.ttf
## Code of Conduct 🤝
Before diving in, please take a moment to review our Code of Conduct. It sets the tone for our community and emphasizes the importance of respect and inclusivity. [Read the Code of Conduct](LICENSE.md).
## Contribution Types 🦠🚀📚
### Bug Reports 🐞
If you encounter any bugs during your journey, don't fret! We have the Bug Busters ready to help. To report a bug, follow these steps:
1. Check if the bug has already been reported in [GitHub Issues](https://github.com/AI4Finance-Foundation/FinGPT/issues).
2. If it's a new bug, open a new issue with a concise description and provide detailed, step-by-step instructions to reproduce it.
### Feature Requests 💡
Do you have visionary ideas that could elevate FinGPT? Share them with us! When submitting a feature request, be sure to include:
1. A clear and vivid description of the feature you envision.
2. Discuss the impact and potential benefits.
### Documentation 📖
For those with a penchant for words and an eye for detail, consider contributing to our documentation. You can make the documentation more enlightening for everyone. 🧙📜
### Code Contributions 💻
Calling all AI heroes and wizards! You are the secret sauce behind the FinGPT project. To contribute code and save the financial world:
1. **Fork the Repository**: Click the "Fork" button on the top right of the repository's page. This creates your own copy of the project.
2. **Clone your Fork**: In your terminal, use the following command to clone your fork to your local machine:
```bash
git clone https://github.com/YourUsername/FinGPT.git
```
3. **Create a New Branch**: Make a new branch for your adventures. This helps keep the main codebase clean:
```bash
git checkout -b your-feature-branch
```
4. **Work Your Magic**: Implement your code or changes.
5. **Commit and Push**: Use these commands to commit your changes and push them to your fork:
```bash
git add .
git commit -m "Your commit message"
git push origin your-feature-branch
```
6. **Create a Pull Request**: Go to the original FinGPT repository and click "New Pull Request." Select your branch, write a description, and submit.
## Seeking Assistance ❓🙋‍♀️
If you find yourself stuck or have questions, remember that our support team is your sidekick. Don't hesitate to reach out. We are here to guide you through the process and provide any necessary assistance.
## Getting Started 🚀🚀
Are you ready to make a mark on the FinGPT project? Grab your cape and join us in our mission to make finance and AI even more incredible. Your contributions are the magic that fuels our journey.
🔗 [FinGPT GitHub Repository](https://github.com/AI4Finance-Foundation/FinGPT)
### May your contributions be as amazing as you are! 🌌🚀
# FAQ
## Installation & Environment
#### Which transformers version should I use?
We recommend 4.31.0.
#### I downloaded the model and code locally but can't get them to run following the tutorial. What should I do?
Don't worry. First check that your code is updated to the latest version, then confirm that you have downloaded the complete model checkpoint.
#### I can't find the file `qwen.tiktoken`. What should I do?
This is the merges file of our tokenizer; you must download it to use the tokenizer. Note that if you clone with git but without git-lfs, this file will not be downloaded. If you are unfamiliar with git-lfs, see the [official site](https://git-lfs.com/).
#### I get "module not found" errors for transformers_stream_generator / tiktoken / accelerate. What should I do?
Run `pip install -r requirements.txt`. The dependencies are listed at [https://github.com/QwenLM/Qwen-VL/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-VL/blob/main/requirements.txt).
<br><br>
## Demo & Inference
#### Is a demo provided?
`web_demo_mm.py` provides a web UI. See the README for details.
#### Does Qwen-VL support streaming inference?
Qwen-VL does not currently support streaming inference.
#### The model's output seems unrelated to the input / doesn't follow instructions / seems dumb
Check that you loaded the Qwen-VL-Chat model for inference. Qwen-VL is the unaligned pretrained base model and is not expected to follow user instructions. The latest version of the model adds a check inside the `chat` interface to prevent you from mistakenly using the pretrained model as an SFT/Chat model.
#### Is there a quantized version of the model?
Qwen-VL does not support quantization yet; efficient quantized inference will be supported in a later release.
#### Quality problems on long sequences
Check whether NTK is enabled. To enable these techniques, set `use_dynamic_ntk` and `use_logn_attn` to `true` in `config.json`. The latest code defaults both to `true`.
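Flipping `use_dynamic_ntk` and `use_logn_attn` on in a downloaded checkpoint's `config.json` can be scripted. A minimal sketch (the helper name is ours, not part of the repo):

```python
import json
from pathlib import Path

def enable_long_sequence_flags(config_path: str) -> dict:
    """Enable the long-sequence flags in a checkpoint's config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["use_dynamic_ntk"] = True  # NTK-aware scaling for long inputs
    config["use_logn_attn"] = True    # log-n attention scaling
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config
```

Point `config_path` at the `config.json` inside your local checkpoint directory.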
<br><br>
## Tokenizer
#### Why don't the token ids bos_id / eos_id / pad_id exist?
During training we use only the <|endoftext|> token as the separator between samples/documents and as the padding placeholder, so you can point bos_id, eos_id, and pad_id all at tokenizer.eod_id. Please read our tokenizer documentation to learn how to set these ids.
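A minimal sketch of that aliasing (the helper is illustrative and works for any tokenizer object exposing an `eod_id` attribute):

```python
def alias_special_ids(tokenizer):
    """Point bos/eos/pad ids at the single <|endoftext|> id, as described above."""
    tokenizer.bos_token_id = tokenizer.eod_id
    tokenizer.eos_token_id = tokenizer.eod_id
    tokenizer.pad_token_id = tokenizer.eod_id
    return tokenizer
```

After loading the Qwen-VL tokenizer with `AutoTokenizer.from_pretrained(..., trust_remote_code=True)`, calling this helper makes the three ids available to code that expects them.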
Tongyi Qianwen LICENSE AGREEMENT
Tongyi Qianwen Release Date: August 23, 2023
By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, you will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
1. Definitions
a. This Tongyi Qianwen LICENSE AGREEMENT (this "Agreement") shall mean the terms and conditions for use, reproduction, distribution and modification of the Materials as defined by this Agreement.
b. "We"(or "Us") shall mean Alibaba Cloud.
c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
e. "Tongyi Qianwen" shall mean the large language models (including Qwen-VL model and Qwen-VL-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation,
and conversions to other media types.
2. Grant of Rights
You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Alibaba Cloud's intellectual property or other rights owned by Us embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials.
3. Redistribution
You may reproduce and distribute copies of the Materials or derivative works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
a. You shall give any other recipients of the Materials or derivative works a copy of this Agreement;
b. You shall cause any modified files to carry prominent notices stating that You changed the files;
c. You shall retain in all copies of the Materials that You distribute the following attribution notices within a "Notice" text file distributed as a part of such copies: "Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved."; and
d. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such derivative works as a whole, provided Your use, reproduction, and distribution of the work otherwise complies with the terms and conditions of this Agreement.
4. Restrictions
If you are commercially using the Materials, and your product or service has more than 100 million monthly active users, You shall request a license from Us. You cannot exercise your rights under this Agreement without our express authorization.
5. Rules of use
a. The Materials may be subject to export controls or restrictions in China, the United States or other countries or regions. You shall comply with applicable laws and regulations in your use of the Materials.
b. You can not use the Materials or any output therefrom to improve any other large language model (excluding Tongyi Qianwen or derivative works thereof).
6. Intellectual Property
a. We retain ownership of all intellectual property rights in and to the Materials and derivatives made by or for Us. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
c. If you commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any entity alleging that the Materials or any output therefrom, or any part of the foregoing, infringe any intellectual property or other right owned or licensable by you, then all licences granted to you under this Agreement shall terminate as of the date such lawsuit or other proceeding is commenced or brought.
7. Disclaimer of Warranty and Limitation of Liability
a. We are not obligated to support, update, provide training for, or develop any further version of the Tongyi Qianwen Materials or to grant any license thereto.
b. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
c. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW IT’S CAUSED.
d. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
8. Survival and Termination.
a. The term of this Agreement shall commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
b. We may terminate this Agreement if you breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, you must delete and cease use of the Materials. Sections 7 and 9 shall survive the termination of this Agreement.
9. Governing Law and Jurisdiction.
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
b. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.
# Qwen-VL
Qwen-VL is a large vision-language model (LVLM) developed by Alibaba Cloud. It takes images, text, and bounding boxes as input, and produces text and bounding boxes as output.
## Paper
- [Qwen-VL: A Versatile Vision-Language Model for
Understanding, Localization, Text Reading, and Beyond](https://arxiv.org/pdf/2308.12966)
## Model Architecture
Qwen-VL is a series of multilingual vision-language models built on the Qwen-7B language model. A visual encoder and a position-aware vision-language adapter give the language model visual understanding capability.
<div align="center">
<img src="./assets/transformer.jpg"/>
</div>
## Algorithm
Qwen-VL initializes its language model from the pretrained Qwen-7B and its visual encoder from OpenCLIP ViT-bigG, with a single randomly initialized cross-attention layer in between, and is trained on roughly 1.5B image-text pairs. The final image input resolution is 448.
<div align=center>
<img src="./assets/transformer.png"/>
</div>
Qwen-VL is trained with a three-stage pipeline and achieves leading results on multiple vision-language understanding benchmarks. The model supports multilingual and multi-image input, with fine-grained visual understanding.
In addition, instruction tuning produces the interactive Qwen-VL-Chat model, which performs strongly in evaluations based on real-world user behavior. Overall, the Qwen-VL series achieves notable results on vision-language understanding tasks and holds a leading position in the open-source community.
<div align=center>
<img src="./assets/qwenvl.jpeg"/>
</div>
## Environment Setup
### Docker (Option 1)
Pull the Docker image from [SourceFind](https://www.sourcefind.cn/#/service-details) and use it as follows:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu22.04-dtk23.10.1-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=64G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name qwen-vl <your imageID> bash
cd /path/your_code_data/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install -r requirements_web_demo.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
### Dockerfile (Option 2)
```
cd /path/your_code_data/docker
docker build --no-cache -t qwen-vl:latest .
docker run --shm-size=64G --name qwen-vl -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it qwen-vl bash
```
### Anaconda (Option 3)
The special deep-learning libraries this project requires for DCU GPUs can be downloaded from the [HPC developer community](https://developer.hpccube.com/tool/).
```
DTK driver: dtk23.10
python: 3.10
torch: 2.1
torchvision: 0.16.0
deepspeed: 0.12.3
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions above must correspond exactly as listed.`
```
conda create -n qwen-vl python=3.10
conda activate qwen-vl
cd /path/your_code_data/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple
pip install -r requirements_web_demo.txt -i http://mirrors.aliyun.com/pypi/simple
```
## Dataset
Mini dataset: [assets/mm_tutorial](./assets/mm_tutorial)
For pretraining, prepare your training data by putting all samples into a single list and saving it as a JSON file. Each sample is a dict containing an `id` and a `conversations` field, the latter being a list. For a full dataset for regular training, prepare the data in the same directory structure. An example:
```
[
{
"id": "identity_0",
"conversations": [
{
"from": "user",
"value": "你好"
},
{
"from": "assistant",
"value": "我是Qwen-VL,一个支持视觉输入的大模型。"
}
]
},
{
"id": "identity_1",
"conversations": [
{
"from": "user",
"value": "Picture 1: <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>\n图中的狗是什么品种?"
},
{
"from": "assistant",
"value": "图中是一只拉布拉多犬。"
},
{
"from": "user",
"value": "框出图中的格子衬衫"
},
{
"from": "assistant",
"value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"
}
]
},
{
"id": "identity_2",
"conversations": [
{
"from": "user",
"value": "Picture 1: <img>assets/mm_tutorial/Chongqing.jpeg</img>\nPicture 2: <img>assets/mm_tutorial/Beijing.jpeg</img>\n图中都是哪"
},
{
"from": "assistant",
"value": "第一张图片是重庆的城市天际线,第二张图片是北京的天际线。"
}
]
}
]
```
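Before launching training, it can be worth sanity-checking your JSON against the structure above. A minimal validator (a hypothetical helper, not part of this repo) might look like:

```python
def validate_samples(samples):
    """Check the list-of-samples structure described above; return the sample count."""
    assert isinstance(samples, list), "top level must be a list of samples"
    for sample in samples:
        assert "id" in sample, "each sample needs an id"
        conversations = sample.get("conversations")
        assert isinstance(conversations, list) and conversations, "conversations must be a non-empty list"
        for turn in conversations:
            assert turn.get("from") in ("user", "assistant"), "unknown speaker"
            assert isinstance(turn.get("value"), str), "value must be a string"
    return len(samples)
```

Load your file with `json.load` and pass the resulting list in.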
## Training
### Single node, single GPU
```
sh finetune/finetune_lora_single_gpu.sh
```
## Inference
To run different tasks, modify the following parameters; Chinese instructions also work:
`'image'` = path to the image
`'text'` = task instruction
```
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'}, # Either a local path or a URL
{'text': 'Generate the caption in English with grounding:'},
])
```
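For illustration only, here is a pure-Python sketch of the interleaved prompt string that `from_list_format` builds, following the `Picture N: <img>…</img>` convention visible in the dataset examples above (this re-implementation is ours, not the library's code):

```python
def to_query(items):
    """Join image/text items into the Picture-N tagged prompt format."""
    parts, num_images = [], 0
    for item in items:
        if "image" in item:
            num_images += 1
            parts.append(f"Picture {num_images}: <img>{item['image']}</img>\n")
        elif "text" in item:
            parts.append(item["text"])
    return "".join(parts)
```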
### Single node, single GPU
```
python qwen_vl_inference.py
```
## Results
### Object detection
<div align=center>
<img src="./assets/mm_tutorial/2.jpg"/>
</div>
### License plate recognition
<div align=center>
<img src="./assets/mm_tutorial/car.png"/>
</div>
<div align=center>
<img src="./assets/car_num.png"/>
</div>
### Train ticket recognition
<div align=center>
<img src="./assets/train_ticket.jpg"/>
</div>
<div align=center>
<img src="./assets/train_ticket_info.png"/>
</div>
### Accuracy
Test data: [assets/mm_tutorial](./assets/mm_tutorial); accelerator cards used: V100S/K100.
| device | train_loss |
| :------: | :------: |
| V100s | 1.9149 |
| K100 | 1.9149 |
## Application Scenarios
### Algorithm category
`ocr`
### Key application industries
`Finance, education, government, research, manufacturing, energy, transportation`
## Pretrained Weights
- [Qwen/Qwen-VL-Chat](https://hf-mirror.com/Qwen/Qwen-VL-Chat/tree/main)
- [Qwen/Qwen-VL](https://hf-mirror.com/Qwen/Qwen-VL/tree/main)
## Source Repository & Issue Feedback
- http://developer.hpccube.com/codes/modelzoo/umt5.git
## References
- [Qwen-VL: A Versatile Vision-Language Model for
Understanding, Localization, Text Reading, and Beyond](https://arxiv.org/pdf/2308.12966)
- [Qwen-VL github](https://github.com/QwenLM/Qwen-VL)
# Qwen-VL-Chat Tutorial
Qwen-VL-Chat is a general-purpose multimodal large language model, so it can handle a wide range of vision-language tasks. This tutorial gives some concise examples of Qwen-VL-Chat's abilities in **visual question answering, text understanding, mathematical reasoning over charts, multi-image understanding, and grounding** (drawing bounding boxes around image regions specified by an instruction). These examples are far from the limit of what Qwen-VL-Chat can do; **you can probe its abilities further by trying different input images and prompts!**
## Initializing the Qwen-VL-Chat Model
Before using Qwen-VL-Chat, you first need to initialize its tokenizer and its model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
# Set a random seed if you want reproducible results.
# torch.manual_seed(1234)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
```
After running the code above, ```tokenizer``` holds the tokenizer used by Qwen-VL-Chat, which tokenizes and preprocesses interleaved image-text input, and ```model``` holds the Qwen-VL-Chat model itself.
## Using Qwen-VL-Chat
### **Multi-turn Visual Question Answering**
#### **The first question**
Let's start with the simplest example. As shown below, the file ```assets/mm_tutorial/Rebecca_(1939_poster).jpeg``` is the poster, released in 1939, for the 1940 film Rebecca.
![](assets/mm_tutorial/Rebecca_(1939_poster)_Small.jpeg)
Let's ask Qwen-VL-Chat for the name of the movie on the poster. First, use tokenizer.from_list_format to tokenize and preprocess the interleaved image-text input:
```python
query = tokenizer.from_list_format([
{'image': 'assets/mm_tutorial/Rebecca_(1939_poster).jpeg'},
{'text': 'What is the name of the movie in the poster?'},
])
```
Next, use ```model.chat``` to ask the model and get its reply. Since the conversation history is empty on the first turn, we pass ```history=None```:
```python
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
You should get output similar to:
> The name of the movie in the poster is "Rebecca."
The model answered the question correctly! According to the poster, the movie is indeed **Rebecca**.
#### **Multi-turn QA**
We can keep asking the model questions, for example who directed the movie. On later turns the conversation history is no longer empty, so we pass the previous history to ```model.chat``` via ```history=history```:
```python
query = tokenizer.from_list_format([
{'text': 'Who directed this movie?'},
])
response, history = model.chat(tokenizer, query=query, history=history)
print(response)
```
You should get output similar to:
> The movie "Rebecca" was directed by Alfred Hitchcock.
The model answered correctly again! According to the poster, the director of the movie is Alfred Hitchcock.
### **Text Understanding**
Qwen-VL-Chat can also understand images containing dense text. As shown below, the file ```assets/mm_tutorial/Hospital.jpg``` is a hospital sign with dense text.
![](assets/mm_tutorial/Hospital_Small.jpg)
As before, we can ask the model where the hospital's departments are located. Since the conversation history is empty, we pass ```history=None```:
```python
query = tokenizer.from_list_format([
{'image': 'assets/mm_tutorial/Hospital.jpg'},
{'text': 'Based on the photo, which floor is the Department of Otorhinolaryngology on?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
You should get output similar to:
> The Department of Otorhinolaryngology is located on the 4th floor.
You can again ask follow-up questions, in which case you pass the previous history to ```model.chat``` via ```history=history```.
```python
query = tokenizer.from_list_format([
{'text': 'Based on the photo, which floor is the Department of Surgery on?'},
])
response, history = model.chat(tokenizer, query=query, history=history)
print(response)
```
You should get output similar to:
> The Department of Surgery is located on the 3rd floor.
### **Mathematical Reasoning over Charts**
Using its chart-understanding and mathematical-reasoning abilities, Qwen-VL-Chat can handle more complex tasks! As shown below, the file ```assets/mm_tutorial/Menu.jpeg``` shows a restaurant menu. Suppose we want to know how much two Salmon Burgers and three Meat Lover's Pizzas would cost:
![](assets/mm_tutorial/Menu.jpeg)
```python
query = tokenizer.from_list_format([
{'image': 'assets/mm_tutorial/Menu.jpeg'},
{'text': 'How much would I pay if I want to order two Salmon Burger and three Meat Lover\'s Pizza? Think carefully step by step.'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
```Think carefully step by step.``` is a common prompt for guiding a model through a complex task step by step; if your task is complex, try adding it to improve accuracy. You should get output similar to:
> To order two Salmon Burgers and three Meat Lover's Pizzas, you would need to pay the following:
>
> 1. For two Salmon Burgers: x2 Salmon Burgers at $10 each = $20
> 2. For three Meat Lover's Pizzas: x3 Meat Lover's Pizzas at $12 each = $36
>
> Therefore, the total cost would be $56.
### **Multi-image Understanding and Chinese Input**
The previous examples mainly showed question answering on a single image with English questions. But Qwen-VL-Chat is a multilingual model that supports Chinese input, and it also supports multiple input images! In the following example, we ask Qwen-VL-Chat in Chinese to compare photos of the cities Chongqing and Beijing (```assets/mm_tutorial/Chongqing.jpeg``` and ```assets/mm_tutorial/Beijing.jpeg```):
![](assets/mm_tutorial/Chongqing_Small.jpeg)
![](assets/mm_tutorial/Beijing_Small.jpeg)
```python
query = tokenizer.from_list_format([
{'image': 'assets/mm_tutorial/Chongqing.jpeg'},
{'image': 'assets/mm_tutorial/Beijing.jpeg'},
{'text': '上面两张图片分别是哪两个城市?请对它们进行对比。'},
])
torch.manual_seed(5678)
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
You should get output similar to:
> The first picture shows the skyline of Chongqing, reflecting the prosperity and bustle of a modern metropolis. The second picture shows the skyline of Beijing, symbolizing the modernization and internationalization of China's capital. Both are major Chinese cities with unique cultures and development histories.
**Note that comparing cities is a rather subjective question, so the model's reply can vary considerably. Without ```torch.manual_seed(5678)``` to fix the random seed, the output will differ on every run. Even with the seed set, differences in software and hardware environments may yield results different from those in this document.**
### **Grounding**
Finally, we demonstrate Qwen-VL-Chat's ability to produce bounding boxes: given a language description, it can draw a rectangular box around the specified region of the image. That may sound abstract, so let's look at an example. As shown below, the file ```assets/mm_tutorial/Shanghai.jpg``` is a photo of Shanghai; first we use an ordinary prompt to ask the model what is in the picture.
![](assets/mm_tutorial/Shanghai_Small.jpeg)
```python
torch.manual_seed(1234)
query = tokenizer.from_list_format([
{'image': 'assets/mm_tutorial/Shanghai.jpg'},
{'text': '图里有啥'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```
You should get output similar to:
> The picture shows the skyline of Shanghai, China, including famous buildings such as the Shanghai Tower, the Jin Mao Tower, the Shanghai World Financial Center, and the Ocean Building.
Next, let's see what happens when we prompt the model with ```请给我框出图中上海环球金融中心和东方明珠``` ("Please box the Shanghai World Financial Center and the Oriental Pearl Tower in the picture"). Note that we now need to pass the previous history to ```model.chat``` via ```history=history```.
```python
query = tokenizer.from_list_format([
{'text': '请给我框出图中上海环球金融中心和东方明珠'},
])
response, history = model.chat(tokenizer, query=query, history=history)
print(response)
```
You should get output similar to:
```xml
<ref>上海环球金融中心</ref><box>(667,437),(760,874)</box><ref>东方明珠</ref><box>(506,75),(582,946)</box>
```
Qwen-VL-Chat has no hands, but it did not refuse your request either; it output something that looks odd, yet isn't: the output uses markup to give the exact positions of the Shanghai World Financial Center and the Oriental Pearl Tower in the image. You can visualize it with the following code:
```python
image = tokenizer.draw_bbox_on_latest_picture(response, history)
image.save('Shanghai_Output.jpg')
```
The saved ```Shanghai_Output.jpg``` will look similar to this screenshot:
![](assets/mm_tutorial/Shanghai_Output_Small.jpeg)
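Besides ```draw_bbox_on_latest_picture```, you can parse the ```<ref>…</ref><box>…</box>``` markup yourself, e.g. to feed the coordinates into your own drawing code. A small sketch (note that, to our understanding, the box coordinates are normalized to a 0–1000 grid rather than raw pixels):

```python
import re

BOX_RE = re.compile(r"<ref>(.*?)</ref><box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>")

def parse_boxes(response):
    """Extract (label, (x1, y1, x2, y2)) pairs from grounding markup."""
    return [
        (label, (int(x1), int(y1), int(x2), int(y2)))
        for label, x1, y1, x2, y2 in BOX_RE.findall(response)
    ]
```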
After that, you can continue chatting with Qwen-VL-Chat as usual:
```python
query = tokenizer.from_list_format([
{'text': '帮我写个这座城市的旅游计划'},
])
response, history = model.chat(tokenizer, query=query, history=history)
print(response)
```
You should get output similar to:
> OK, here is a simple Shanghai travel plan:
>
> Day 1:
> Morning: arrive in Shanghai and check in at your hotel.
> Noon: after lunch, head to the Bund to enjoy the views along the Huangpu River and visit Shanghai landmarks such as the SPDB Building and the HSBC Building.
> Afternoon: stroll the Nanjing Road Pedestrian Street to buy souvenirs or sample local food.
> Evening: have dinner at a restaurant near Nanjing Road, then go see Shanghai's night views.
>
> Day 2:
> Morning: visit the Shanghai Science and Technology Museum to learn about the history of technology and see the various exhibitions.
> Noon: have lunch at a restaurant near the museum.
> Afternoon: visit Century Park to enjoy the scenery and relax.
> Evening: have dinner on Nanjing Road or in the nearby Lujiazui area, then go see Shanghai's night views.
>
> Day 3:
> Morning: visit Shanghai Disneyland or Shanghai Haichang Ocean Park to meet Disney characters or watch marine-life shows at the ocean park.
> Noon: have lunch at a restaurant near the park.
> Afternoon: free time for shopping, local food, or museums.
> Evening: have dinner near your hotel, then depart Shanghai.
>
> Of course, this is only a simple plan; Shanghai has many other attractions and activities, such as the Shanghai Museum, Tianzifang, or Shanghai stage plays. Adjust the plan to your own interests and schedule.
**Note that a travel plan is a rather subjective request, so the model's reply can vary considerably. Without ```torch.manual_seed(1234)``` to fix the random seed, the output will differ on every run. Even with the seed set, differences in software and hardware environments may yield results different from those in this document.**