## Multi-Modal Documentation
### 📚 Tutorials
1. [MLLM Deployment Documentation](MLLM部署文档.md)
### ⭐️ Best Practice Series
A single conversation round can include multiple images (or none):
1. [Qwen-VL Best Practice](qwen-vl最佳实践.md)
2. [Qwen-Audio Best Practice](qwen-audio最佳实践.md)
3. [Deepseek-VL Best Practice](deepseek-vl最佳实践.md)
4. [Internlm-Xcomposer2 Best Practice](internlm-xcomposer2最佳实践.md)
5. [Phi3-Vision Best Practice](phi3-vision最佳实践.md)
A single conversation round can include only one image:
1. [Llava Best Practice](llava最佳实践.md)
2. [Yi-VL Best Practice](yi-vl最佳实践.md)
3. [mPLUG-Owl2 Best Practice](mplug-owl2最佳实践.md)
The entire conversation revolves around one image:
1. [CogVLM Best Practice](cogvlm最佳实践.md), [CogVLM2 Best Practice](cogvlm2最佳实践.md), [glm4v Best Practice](glm4v最佳实践.md)
2. [MiniCPM-V Best Practice](minicpm-v最佳实践.md), [MiniCPM-V-2 Best Practice](minicpm-v-2最佳实践.md), [MiniCPM-V-2.5 Best Practice](minicpm-v-2.5最佳实践.md)
3. [InternVL-Chat-V1.5 Best Practice](internvl最佳实践.md)
# Internlm-Xcomposer2 Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
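After installing, you can quickly verify the environment; this is a minimal sketch, assuming the installed `swift` package exposes a `__version__` attribute:
```python
# Sanity check: confirm ms-swift is importable and show its version.
import swift

print(swift.__version__)
```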
## Inference
Inference with [internlm-xcomposer2-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary):
```shell
# Experimental environment: A10, 3090, V100, ...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internlm-xcomposer2-7b-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< 你是谁?
我是你的助手,一个基于语言的人工智能模型,可以回答你的问题。
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>这两张图片有什么区别
这两张图片是不同的, 第一张是羊的图片, 第二张是猫的图片
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>图中有几只羊
图中有4只羊
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>计算结果是多少
计算结果是1452+45304=46756
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>根据图片中的内容写首诗
湖面波光粼粼,小舟独自飘荡。
船上点灯,照亮夜色,
星星点点,倒映水中。
远处山峦,云雾缭绕,
天空繁星,闪烁不停。
湖面如镜,倒影清晰,
小舟穿行,如诗如画。
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.internlm_xcomposer2_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 马鞍山距离阳江62公里,广州距离广州293公里。
query: 距离最远的城市是哪?
response: 距离最最远的城市是广州,距离广州293公里。
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', ' 马鞍山距离阳江62公里,广州距离广州293公里。'], ['距离最远的城市是哪?', ' 距离最远的城市是广州,距离广州293公里。']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. `--lora_target_modules ALL` is not supported. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type internlm-xcomposer2-7b-chat \
    --dataset coco-en-mini
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths and URLs are both accepted. This model does not support merge-lora.)
```json
[
{"conversations": [
{"from": "user", "value": "<img>img_path</img>11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "<img>img_path</img>ccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
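If you prefer to generate such a file programmatically, the sketch below writes the conversations format shown above with the standard `json` module; the file name and sample contents are hypothetical placeholders:
```python
import json

# Two toy samples in the conversations format; <img>...</img> marks an image.
dataset = [
    {"conversations": [
        {"from": "user", "value": "<img>images/0001.jpg</img>What is in this image?"},
        {"from": "assistant", "value": "A cat sitting on a mat."},
    ]},
    {"conversations": [
        {"from": "user", "value": "AAAAA"},
        {"from": "assistant", "value": "BBBBB"},
    ]},
]
# ensure_ascii=False keeps any non-ASCII text readable in the output file.
with open('my_dataset.json', 'w', encoding='utf-8') as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)
```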
## Inference After Fine-tuning
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/internlm-xcomposer2-7b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
# InternVL Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
pip install Pillow
```
## Inference
Inference with [internvl-chat-v1.5](https://www.modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5/summary) or [internvl-chat-v1.5-int8](https://www.modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-int8/summary).
The tutorial below uses `internvl-chat-v1_5` as an example. You can switch to the int8 model with `--model_type internvl-chat-v1_5-int8`, or use `mini-internvl-chat-2b-v1_5` / `mini-internvl-chat-4b-v1_5` for Mini-InternVL.
**Notes**
- To use a local model file, add the argument `--model_id_or_path /path/to/model`.
- If your GPU does not support flash attention, add `--use_flash_attn false`. For the int8 model, also specify `--dtype bf16` at inference time, otherwise the output may be garbled.
- The model's own config sets a small max_length of 2048; you can override it with `--max_length`.
- You can reduce GPU memory usage with `--gradient_checkpointing true`.
- **Training** the InternVL series only supports datasets that contain images.
```shell
# Experimental environment: A100
# 55GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5 --dtype bf16 --max_length 4096
# 2*30GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift infer --model_type internvl-chat-v1_5 --dtype bf16 --max_length 4096
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
This is a high-resolution image of a kitten. The kitten has striking blue eyes and a fluffy white and grey coat. The fur pattern suggests that it may be a Maine Coon or a similar breed. The kitten's ears are perked up, and it has a curious and innocent expression. The background is blurred, which brings the focus to the kitten's face.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 59,856.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
Token indices sequence length is longer than the specified maximum sequence length for this model (5142 > 4096). Running this sequence through the model will result in indexing errors
In the still of the night,
A lone boat sails on the light.
The stars above, a twinkling sight,
Reflecting in the water's might.
The trees stand tall, a silent guard,
Their leaves rustling in the yard.
The boatman's lantern, a beacon bright,
Guiding him through the night.
The river flows, a gentle stream,
Carrying the boatman's dream.
His journey long, his heart serene,
In the beauty of the scene.
The stars above, a guiding light,
Leading him through the night.
The boatman's journey, a tale to tell,
Of courage, hope, and love as well.
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.internvl_chat_v1_5
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
model_kwargs={'device_map': 'auto'})
# for GPUs that do not support flash attention
# model, tokenizer = get_model_tokenizer(model_type, torch.float16,
#                                        model_kwargs={'device_map': 'auto'},
#                                        use_flash_attn=False)
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images) # chat with image
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)  # chat without image
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 这张图片显示的是一个路标,上面标示了三个目的地及其距离:
- 马踏(Mata):14公里
- 阳江(Yangjiang):62公里
- 广州(Guangzhou):293公里
这些距离是按照路标上的指示来计算的。
query: 距离最远的城市是哪?
response: 根据这张图片,距离最远的城市是广州(Guangzhou),距离为293公里。
history: [['距离各城市多远?', '这张图片显示的是一个路标,上面标示了三个目的地及其距离:\n\n- 马踏(Mata):14公里\n- 阳江(Yangjiang):62公里\n- 广州(Guangzhou):293公里\n\n这些距离是按照路标上的指示来计算的。 '], ['距离最远的城市是哪?', '根据这张图片,距离最远的城市是广州(Guangzhou),距离为293公里。 ']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
LoRA fine-tuning:
**Notes**
- By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`.
- If your GPU does not support flash attention, add `--use_flash_attn false`.
```shell
# Experimental environment: A100
# 80GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--max_length 4096
# device_map
# Experimental environment: 2*A100...
# 2*43GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--max_length 4096
# ddp + deepspeed-zero2
# Experimental environment: 2*A100...
# 2*80GB GPU memory
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--max_length 4096 \
--deepspeed default-zero2
```
Full-parameter fine-tuning:
```bash
# Experimental environment: 4 * A100
# device map
# 4 * 72GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type internvl-chat-v1_5 \
    --dataset coco-en-2-mini \
    --max_length 4096 \
    --sft_type full
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Only single-turn conversations are supported; each conversation must contain exactly one image; local paths and URLs are both accepted.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "images": ["image_path"]}
```
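The jsonl variant is simply one JSON object per line. A minimal writer sketch, with hypothetical file name and contents:
```python
import json

# One record per line: a single-turn query/response pair plus exactly one image.
records = [
    {"query": "What is in this image?", "response": "A road sign.", "images": ["images/0001.jpg"]},
    {"query": "How many sheep are there?", "response": "Four.", "images": ["images/0002.jpg"]},
]
with open('my_dataset.jsonl', 'w', encoding='utf-8') as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + '\n')
```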
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
--max_length 4096
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx" \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir "output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx-merged" \
--load_dataset_config true \
--max_length 4096
# device map
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--ckpt_dir "output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx-merged" \
--load_dataset_config true \
--max_length 4096
```
# Llava Best Practice
Models covered in this document:
| model | model_type |
|-------|------------|
| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1_6-mistral-7b-instruct |
| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1_6-yi-34b-instruct |
|[llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary)|llama3-llava-next-8b|
|[llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary)|llava-next-72b|
|[llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary)|llava-next-110b|
The following walkthrough uses `llava-v1.6-mistral-7b` as an example; you can switch to another model by specifying `--model_type`.
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference
```shell
# Experimental environment: A100
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-mistral-7b-instruct
# 70GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-yi-34b-instruct
# 4*20GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --model_type llava1_6-yi-34b-instruct
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image shows a close-up of a kitten with a soft, blurred background that suggests a natural, outdoor setting. The kitten has a mix of white and gray fur with darker stripes, typical of a tabby pattern. Its eyes are wide open, with a striking blue color that contrasts with the kitten's fur. The kitten's nose is small and pink, and its whiskers are long and white, adding to the kitten's cute and innocent appearance. The lighting in the image is soft and diffused, creating a gentle and warm atmosphere. The focus is sharp on the kitten's face, while the rest of the image is slightly out of focus, which draws attention to the kitten's features.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 14352 + 45304 = 145304.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In the quiet of the night,
A solitary boat takes flight,
Across the water's gentle swell,
Underneath the stars that softly fell.
The boat, a vessel of the night,
Carries but one, a lone delight,
A solitary figure, lost in thought,
In the tranquil calm, they find a wraith.
The stars above, like diamonds bright,
Reflect upon the water's surface light,
Creating a path for the boat's journey,
Guiding through the night with a gentle purity.
The boat, a silent sentinel,
In the stillness, it gently swells,
A vessel of peace and calm,
In the quiet of the night, it carries on.
The figure on board, a soul at ease,
In the serene embrace of nature's peace,
They sail through the night,
Under the watchful eyes of the stars' light.
The boat, a symbol of solitude,
In the vast expanse of the universe's beauty,
A lone journey, a solitary quest,
In the quiet of the night, it finds its rest.
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = 'llava1_6-mistral-7b-instruct'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: How far is it from each city?
response: The image shows a road sign indicating the distances to three cities: Mata, Yangjiang, and Guangzhou. The distances are given in kilometers.
- Mata is 14 kilometers away.
- Yangjiang is 62 kilometers away.
- Guangzhou is 293 kilometers away.
Please note that these distances are as the crow flies and do not take into account the actual driving distance due to road conditions, traffic, or other factors.
query: Which city is the farthest?
response: The farthest city listed on the sign is Mata, which is 14 kilometers away.
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type llava1_6-mistral-7b-instruct \
    --dataset coco-en-2-mini
# Experimental environment: 2*A100...
# 2*45GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type llava1_6-yi-34b-instruct \
    --dataset coco-en-2-mini
```
Full-parameter fine-tuning:
```shell
# Experimental environment: 4 * A100
# 4 * 70GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type llava1_6-mistral-7b-instruct \
--dataset coco-en-2-mini \
--sft_type full \
--deepspeed default-zero2
# 8 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
--model_type llava1_6-yi-34b-instruct \
--dataset coco-en-2-mini \
    --sft_type full
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Only single-turn conversations are supported; each conversation must contain exactly one image; local paths and URLs are both accepted.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "images": ["image_path"]}
```
## Inference After Fine-tuning
Direct inference:
```shell
model_type="llava1_6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \
--load_dataset_config true
```
**merge-lora** and inference:
```shell
model_type="llava1_6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx-merged" \
--load_dataset_config true
```
# MiniCPM-V-2.5 Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model link:
- minicpm-v-v2_5-chat: [https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5/summary](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5/summary)
## Inference
Inference with minicpm-v-v2_5-chat:
```shell
# Experimental environment: A10, 3090, V100, ...
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_5-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
这张图片展示了一只年轻的猫咪的特写,可能是一只小猫,具有明显的特征。它的毛发主要是白色的,带有灰色和黑色的条纹和斑点,这是虎斑猫的典型特征。小猫的眼睛是蓝色的,瞳孔是圆形的,给人一种好奇和专注的表情。它的耳朵尖尖的,竖立着,显示出警觉性。小猫的鼻子是粉红色的,鼻孔是可见的。背景模糊不清,突出了小猫的特征。整体的色调柔和,重点放在小猫的毛发和眼睛上。
--------------------------------------------------
<<< clear
<<< 图中有几只羊?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
图中有四只羊。
--------------------------------------------------
<<< clear
<<< 计算结果是多少
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
计算结果是1452 + 4530 = 5982。
--------------------------------------------------
<<< clear
<<< 根据图片中的内容写首诗
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
在宁静的夜晚,船只航行,
在星光闪烁的水面上,
一只熊猫乘风破浪,
在夜空的映衬下。
船上灯火通明,照亮了前方的道路,
在宁静的水面上投下温暖的光芒,
熊猫坐在船头,享受着旅程,
在这宁静的夜晚中,享受着旅程。
星星在上方闪烁,点缀着天空,
在这宁静的夜晚中,创造出一幅美丽的画面,
船只在水面上轻轻摇晃,
在这宁静的夜晚中,创造出一幅美丽的画面。
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_v2_5_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 马踏到阳江的距离是62公里,阳江到广州的距离是293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,到广州的距离为293公里。
history: [['距离各城市多远?', '马踏到阳江的距离是62公里,阳江到广州的距离是293公里。'], ['距离最远的城市是哪?', '距离最远的城市是广州,到广州的距离为293公里。']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A100
# 32GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type minicpm-v-v2_5-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported, but the conversation as a whole may contain only one image; local paths and URLs are both accepted.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
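Because the one-image-per-conversation constraint is easy to violate when assembling data, a small validation sketch may help (the file name is a hypothetical placeholder):
```python
import json

# Check that every record references exactly one image, as required above.
with open('my_dataset.jsonl', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        record = json.loads(line)
        n_images = len(record.get('images', []))
        assert n_images == 1, f'line {lineno}: expected 1 image, found {n_images}'
```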
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# MiniCPM-V-2 Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference
Inference with [minicpm-v-v2-chat](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary):
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
这张图片展示了一只小猫的特写,它的毛色主要是黑白相间,带有一些浅色条纹,可能暗示着虎斑猫品种。小猫的眼睛是蓝色的,瞳孔看起来是黑色的,给人一种深邃和好奇的感觉。它的耳朵竖立着,尖端是白色的,与毛色相匹配。小猫的鼻子是黑色的,嘴巴微微张开,露出牙齿,表明它可能在微笑或嬉戏。背景模糊,但似乎是室内环境,可能是地板或墙壁,颜色柔和,与小猫的毛色相融合。图片中的风格化效果使小猫看起来像一幅绘画或插图,而不是一张真实的照片。
--------------------------------------------------
<<< clear
<<< 图中有几只羊?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
这幅图片描绘了一群羊在草地上。总共有四只羊,它们都长着白色的毛和棕色的角。这些羊看起来大小不一,其中一只看起来比另外三只要小一些。它们站在一片郁郁葱葱的绿草中,背景是起伏的山丘和天空。这幅图片的风格是卡通化的,羊的面部特征和身体特征都非常夸张。
--------------------------------------------------
<<< clear
<<< 计算结果是多少
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
计算结果是1452 + 45304 = 46756。
--------------------------------------------------
<<< clear
<<< 根据图片中的内容写首诗
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
这幅图片描绘了一个宁静的夜晚场景,一艘船漂浮在水面之上。船看起来是一艘小木船,船头有一个桅杆,上面挂着一个灯笼,发出温暖的光芒。船身涂成深棕色,与水面形成鲜明对比。水面反射着星星和船只的灯光,营造出一种宁静而梦幻的氛围。背景中,树木繁茂,树叶呈现出金色和绿色,暗示着可能是黄昏或黎明时分。天空布满星星,给整个场景增添了神秘感。整体氛围宁静而幽静,让人联想到一个童话般的场景。
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_v2_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 马踏到马塔14公里,到阳江62公里,到广州293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离为293公里。
history: [['距离各城市多远?', ' 马踏到马塔14公里,到阳江62公里,到广州293公里。'], ['距离最远的城市是哪?', '距离最远的城市是广州,距离为293公里。']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type minicpm-v-v2-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported, but the conversation as a whole may contain only one image; local paths and URLs are both accepted.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-v2-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/minicpm-v-v2-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-v2-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# MiniCPM-V Best Practice
The following uses `minicpm-v-3b-chat` as an example. If you want the newer MiniCPM-V multi-modal model (v2), switch `--model_type minicpm-v-3b-chat` to `--model_type minicpm-v-v2-chat`.
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
Model links:
- minicpm-v-3b-chat: [https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary)
- minicpm-v-v2-chat: [https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary)
## Inference
Inference with minicpm-v-3b-chat:
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-3b-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
该图像的特点是一只黑白相间的猫,它的眼睛睁得大大的,似乎在凝视着相机。这只猫看起来很小,可能是一只幼猫。
--------------------------------------------------
<<< clear
<<< 图中有几只羊?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
图中有四只羊。
--------------------------------------------------
<<< clear
<<< 计算结果是多少
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
计算结果为1452 + 4530 = 5982。
--------------------------------------------------
<<< clear
<<< 根据图片中的内容写首诗
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
在宁静的夜晚,一艘船在平静的湖面上航行。
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_3b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 广州到深圳的距离是230公里,而深圳到广州的距离是14公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是深圳,它位于广州和深圳之间,距离广州230公里,距离深圳14公里。
history: [['距离各城市多远?', ' 广州到深圳的距离是230公里,而深圳到广州的距离是14公里。'], ['距离最远的城市是哪?', '距离最远的城市是深圳,它位于广州和深圳之间,距离广州230公里,距离深圳14公里。']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type minicpm-v-3b-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported, but the conversation as a whole may contain only one image; local paths and URLs are both accepted.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-3b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/minicpm-v-3b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-3b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# mPLUG-Owl2 Best Practice
The following uses `mplug-owl2_1-chat` as an example; you can also choose `mplug-owl2-chat`.
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model links:
- mplug-owl2_1-chat: [https://modelscope.cn/models/iic/mPLUG-Owl2.1/summary](https://modelscope.cn/models/iic/mPLUG-Owl2.1/summary)
- mplug-owl2-chat: [https://modelscope.cn/models/iic/mPLUG-Owl2/summary](https://modelscope.cn/models/iic/mPLUG-Owl2/summary)
## Inference
Inference with `mplug-owl2_1-chat`:
```shell
# Experimental environment: A10, 3090, V100...
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type mplug-owl2_1-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image features a close-up of a cute, gray and white kitten with big blue eyes. The kitten is sitting on a table, looking directly at the viewer. The scene captures the kitten's adorable features, including its whiskers and the fur on its face. The kitten appears to be staring into the camera, creating a captivating and endearing atmosphere.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 1452 + 45304 = 46756.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In the stillness of the night, a boat glides across the water, its light shining bright. The stars twinkle above, casting a magical glow. A man and a dog are on board, enjoying the serene journey. The boat floats gently, as if it's floating on air. The calm waters reflect the stars, creating a breathtaking scene. The man and his dog are lost in their thoughts, taking in the beauty of nature. The boat seems to be floating in a dream, as if they are on a journey to find their way back home.
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.mplug_owl2_1_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
images = images * 2
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: How far is it from each city?
response: From the given information, it is 14 km from the city of Mata, 62 km from Yangjiang, and 293 km from Guangzhou.
query: Which city is the farthest?
response: The farthest city is Guangzhou, which is 293 km away.
history: [['How far is it from each city?', 'From the given information, it is 14 km from the city of Mata, 62 km from Yangjiang, and 293 km from Guangzhou.'], ['Which city is the farthest?', 'The farthest city is Guangzhou, which is 293 km away.']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100...
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type mplug-owl2_1-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; every round must contain exactly one image; local paths and URLs are both accepted.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}
```
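Following the convention in the examples above (one image per round, so the images list has one entry per history round plus the current query), a quick consistency check might look like this; the file name is a hypothetical placeholder:
```python
import json

# Each round must include an image: len(images) == len(history) + 1.
with open('my_dataset.jsonl', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        record = json.loads(line)
        n_rounds = len(record.get('history', [])) + 1
        n_images = len(record.get('images', []))
        assert n_images == n_rounds, (
            f'line {lineno}: {n_rounds} rounds but {n_images} images')
```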
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Phi3-Vision Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model link:
- phi3-vision-128k-instruct: [https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary](https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary)
## Inference
Inference with phi3-vision-128k-instruct:
```shell
# Experimental environment: A10, 3090, V100, ...
# 16GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type phi3-vision-128k-instruct
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< Who are you?
I am Phi, an AI developed by Microsoft to assist with providing information, answering questions, and helping users find solutions to their queries. How can I assist you today?
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?
The first picture shows a group of four cartoon sheep standing in a field, while the second picture is a close-up of a kitten with a blurred background. The main difference between these two pictures is the subject matter and the setting. The first picture features animals that are typically associated with farm life and agriculture, while the second picture focuses on a domestic animal, a kitten, which is more commonly found in households. Additionally, the first picture has a more peaceful and serene atmosphere, while the second picture has a more intimate and detailed view of the kitten.
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>How many sheep are there in the picture?
There are four sheep in the picture.
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>What is the result of the calculation?
The result of the calculation 1452 + 45304 is 46756.
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>Write a poem based on the content of the picture.
In the tranquil night, a boat sails,
Through the darkened river, it sets sail.
A single candle flickers, casting light,
Guiding the way through the endless night.
The stars above, like diamonds bright,
Gleam down upon the boat's gentle flight.
The moon, a silent guardian in the sky,
Watches over the boat as it sails by.
The river, a mirror to the night,
Reflects the boat's journey, a beautiful sight.
The trees on either side, standing tall,
Whisper secrets to the boat, one and all.
In the stillness of the night, a sense of peace,
The boat, the river, the trees, all in their place.
A moment frozen in time, a scene so serene,
A journey through the night, a dream so unseen.
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.phi3_vision_128k_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Which city is the farthest?
response: Guangzhou is the farthest city, located 293km away.
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?', 'The distances are as follows: Mata is 14km away, Yangjiang is 62km away, and Guangzhou is 293km away.'], ['Which city is the farthest?', 'Guangzhou is the farthest city, located 293km away.']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`. Full-parameter fine-tuning is supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 16GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type phi3-vision-128k-instruct \
    --dataset coco-en-mini
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths and URLs are both accepted.)
```json
[
{"conversations": [
{"from": "user", "value": "<img>img_path</img>11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "<img>img_path</img>ccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/phi3-vision-128k-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/phi3-vision-128k-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true --safe_serialization false
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/phi3-vision-128k-instruct/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Qwen-Audio Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
## Inference
Inference with [qwen-audio-chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary):
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-audio-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< multi-line
[INFO:swift] End multi-line input with `#`.
[INFO:swift] Input `single-line` to switch to single-line input mode.
<<<[M] 你是谁?#
我是来自达摩院的大规模语言模型,我叫通义千问。
--------------------------------------------------
<<<[M] Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/music.wav</audio>
这是首什么样的音乐#
这是电子、实验流行风格的音乐。
--------------------------------------------------
<<<[M] Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么#
这段语音中说了中文:"今天天气真好呀"。
--------------------------------------------------
<<<[M] 这段语音是男生还是女生#
根据音色判断,这段语音是男性。
"""
```
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.qwen_audio_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '这段语音是男生还是女生'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么
response: 这段语音说了中文:"今天天气真好呀"。
query: 这段语音是男生还是女生
response: 根据音色判断,这段语音是男性。
history: [['Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>\n这段语音说了什么', '这段语音说了中文:"今天天气真好呀"。'], ['这段语音是男生还是女生', '根据音色判断,这段语音是男性。']]
"""
```
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the audio model part, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 22GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen-audio-chat \
    --dataset aishell1-mini-zh
```
Full-parameter fine-tuning:
```shell
# MP
# Experimental environment: 2 * A100
# 2 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
    --sft_type full
# ZeRO2
# Experimental environment: 4 * A100
# 4 * 80GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--sft_type full \
--use_flash_attn true \
--deepspeed default-zero2
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple audio clips or none; local paths and URLs are both accepted.)
```json
[
{"conversations": [
{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\n11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\nAudio 2:<audio>audio_path2</audio>\nAudio 3:<audio>audio_path3</audio>\naaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\nccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
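To avoid hand-writing the `Audio N:<audio>...</audio>` prefixes, a small helper can assemble a user turn; the function name and paths below are hypothetical:
```python
def audio_query(text: str, audio_paths: list) -> str:
    """Prefix a query with numbered audio tags in the format used above."""
    tags = [f'Audio {i}:<audio>{p}</audio>' for i, p in enumerate(audio_paths, start=1)]
    return '\n'.join(tags + [text])

# Example: one user turn with two audio clips (placeholder paths).
print(audio_query('aaaaa', ['audio_path', 'audio_path2']))
# Audio 1:<audio>audio_path</audio>
# Audio 2:<audio>audio_path2</audio>
# aaaaa
```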
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Qwen-VL Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
## Inference
Inference with [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary):
```shell
# Experimental environment: 3090
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-vl-chat
```
Output: (local paths and URLs are supported as input)
```python
"""
<<< multi-line
[INFO:swift] End multi-line input with `#`.
[INFO:swift] Input `single-line` to switch to single-line input mode.
<<<[M] 你是谁?#
我是通义千问,由阿里云开发的AI助手。我被设计用来回答各种问题、提供信息和与用户进行对话。有什么我可以帮助你的吗?
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>
Picture 2:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>
这两张图片有什么区别#
两张图片的相同点是它们都是关于动物的插画,但是它们的动物不同。
第一张图片中的动物是绵羊,而第二张图片中的动物是小猫。
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>
图中有几只羊#
图中有四只羊。
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>
计算结果是多少#
1452 + 45304 = 46756
--------------------------------------------------
<<< clear
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>
根据图片中的内容写首诗#
月光如水船如星,独坐船头吹夜风。深林倒影照水面,萤火点点照船行。
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.qwen_vl_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
距离各城市多远?
response: 马路边距离马路边14公里;阳江边距离马路边62公里;广州边距离马路边293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离马路边293公里。
history: [['Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>\n距离各城市多远?', '马路边距离马路边14公里;阳江边距离马路边62公里;广州边距离马路边293公里。'], ['距离最远的城市是哪?', '距离最远的城市是广州,距离马路边293公里。']]
"""
```
Sample images:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. To fine-tune all linear layers, including the vision model part, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: 3090
# 23GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen-vl-chat \
    --dataset coco-en-mini
```
Full-parameter fine-tuning:
```shell
# Experimental environment: 4 * A100
# 4 * 70GB GPU memory
NPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen-vl-chat \
--dataset coco-en-mini \
    --sft_type full
```
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths and URLs are both accepted.)
```json
[
{"conversations": [
{"from": "user", "value": "Picture 1:<img>img_path</img>\n11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "Picture 1:<img>img_path</img>\nPicture 2:<img>img_path2</img>\nPicture 3:<img>img_path3</img>\naaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "Picture 1:<img>img_path</img>\nccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
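The same idea applies to the `Picture N:<img>...</img>` prefixes used by Qwen-VL; a helper sketch, with hypothetical function name and placeholder paths:
```python
def picture_query(text: str, image_paths: list) -> str:
    """Prefix a query with numbered picture tags in the format used above."""
    tags = [f'Picture {i}:<img>{p}</img>' for i, p in enumerate(image_paths, start=1)]
    return '\n'.join(tags + [text])

# Example: one user turn with three images (placeholder paths).
print(picture_query('aaaaa', ['img_path', 'img_path2', 'img_path3']))
```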
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Yi-VL Best Practice

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)

## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference

Inference with [yi-vl-6b-chat](https://modelscope.cn/models/01ai/Yi-VL-6B/summary):
```shell
# Experimental environment: A10, 3090, V100...
# 18GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-vl-6b-chat
```
Output: (local paths or URLs can be passed in)
```python
"""
<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
图片显示一只小猫坐在地板上,眼睛睁开,凝视着摄像机。小猫看起来很可爱,有灰色和白色的毛皮,以及蓝色的眼睛。它似乎正在看摄像机,可能对周围环境很好奇。
--------------------------------------------------
<<< 图中有几只羊
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
图中有四只羊.
--------------------------------------------------
<<< clear
<<< 计算结果是多少
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
1452 + 45304 = 46756
--------------------------------------------------
<<< clear
<<< 根据图片中的内容写首诗
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
夜幕降临,星光闪烁,
一艘小船在河上飘荡,
船头挂着一盏明亮的灯,
照亮了周围的黑暗。
船上有两个人,
一个在船头,另一个在船尾,
他们似乎在谈话,
在星光下享受着宁静的时刻。
河岸边,树木在黑暗中站着,
在星光下投下长长的影子。
这景象是那么的宁静,
让人想起一个古老的传说。
小船,人,和星光,
构成了一个美丽的画面,
它唤起一种宁静的感觉,
在喧嚣的城市生活之外。
"""
```
Sample images are shown below:

cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.yi_vl_6b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(2) # ...
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# streaming inference
query = '距离最远的城市是哪?'
images = images * 2  # Yi-VL expects one image per turn: cover the history turn plus the new query
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 距离甲塔14公里,距离阳江62公里,距离广州293公里,距离广州293公里。
query: 距离最远的城市是哪?
response: 最远的距离是293公里。
history: [['距离各城市多远?', '距离甲塔14公里,距离阳江62公里,距离广州293公里,距离广州293公里。'], ['距离最远的城市是哪?', '最远的距离是293公里。']]
"""
```
The sample image is shown below:

road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning

Fine-tuning of multi-modal LLMs usually uses a **custom dataset**. Here is a demo that can be run directly:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported; a sketch follows the command below.)
```shell
# Experimental environment: A10, 3090, V100...
# 19GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type yi-vl-6b-chat \
    --dataset coco-en-2-mini
```
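The note above also says full-parameter fine-tuning is supported; a sketch of that variant, with the distributed settings assumed by analogy with the other best-practice documents (GPU count and memory are not verified for this model):

```shell
# Sketch: full-parameter fine-tuning (assumed distributed setup, untested here)
NPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1 swift sft \
    --model_type yi-vl-6b-chat \
    --dataset coco-en-2-mini \
    --sft_type full
```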
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:

(Multi-turn conversations are supported; each turn must contain exactly one image; local paths or URLs can be passed in.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}
```
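In the same spirit, a jsonl file can be checked line by line against the one-image-per-turn rule; the file name is again illustrative:

```python
import json

# Each line is one sample; history turns plus the final query each need one image.
with open('my_dataset.jsonl', 'r', encoding='utf-8') as f:  # hypothetical path
    for line_no, line in enumerate(f, start=1):
        sample = json.loads(line)
        n_turns = len(sample.get('history', [])) + 1
        assert len(sample['images']) == n_turns, (
            f'line {line_no}: expected {n_turns} images, got {len(sample["images"])}')
print('jsonl dataset looks well-formed')
```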
## Inference After Fine-tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
.. currentmodule:: {{ module }}

{{ name | underline}}

.. autoclass:: {{ name }}
    :inherited-members:
    :members:

.. autogenerated from source/_templates/autosummary/class.rst

.. currentmodule:: {{ module }}

{{ name | underline}}

.. autoclass:: {{ name }}
    :members:
    :special-members: __init__, __call__

..
    autogenerated from source/_templates/classtemplate.rst
    note it does not have :inherited-members:

.. currentmodule:: {{ module }}

{{ name | underline}}

.. autoclass:: {{ name }}
    :members:
    :exclude-members: MAXBIT, MAXDIM
    :undoc-members:

..
    autogenerated from source/_templates/sobolengine.rst
    note it has specific options
swift.hub
==============

.. automodule:: swift.hub

.. currentmodule:: swift.hub

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    api.HubApi
    check_model.check_local_model_is_latest
    push_to_hub.push_to_hub
    push_to_hub.push_to_hub_async
    snapshot_download.snapshot_download
    file_download.model_file_download
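A minimal usage sketch for the download helper listed above (the model id is illustrative; check the function's signature in the source):

.. code-block:: python

    from swift.hub.snapshot_download import snapshot_download

    # Fetch a model snapshot into the local cache and print its directory.
    model_dir = snapshot_download('qwen/Qwen-VL-Chat')
    print(model_dir)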
swift.trainers
==============

.. automodule:: swift.trainers

.. currentmodule:: swift.trainers

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    trainers.Seq2SeqTrainer
    trainers.Trainer
swift.tuners
==============

.. automodule:: swift.tuners

.. currentmodule:: swift.tuners

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    adapter.AdapterConfig
    base.SwiftModel
    base.Swift
    lora.LoRAConfig
    prompt.PromptConfig
    restuning.ResTuningConfig
    side.SideConfig
    utils.SwiftConfig
    utils.SwiftOutput
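A minimal usage sketch tying the tuner classes above together (the `target_modules` field is assumed from LoRA conventions; check `lora.LoRAConfig` for the actual fields):

.. code-block:: python

    import torch.nn as nn

    from swift import LoRAConfig, Swift

    # Wrap a stand-in module with LoRA adapters via Swift.prepare_model.
    base = nn.Sequential(nn.Linear(16, 16))
    config = LoRAConfig(target_modules=['0'])  # '0' is the Linear child's name here
    model = Swift.prepare_model(base, config)  # returns a SwiftModel
    print(type(model).__name__)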
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
# import sphinx_book_theme
sys.path.insert(0, os.path.abspath('../../'))
# -- Project information -----------------------------------------------------
project = 'swift'
copyright = '2022-2023, Alibaba ModelScope'
author = 'modelscope Authors'
version_file = '../../swift/version.py'
def get_version():
    with open(version_file, 'r', encoding='utf-8') as f:
        exec(compile(f.read(), version_file, 'exec'))
    return locals()['__version__']
# The full version, including alpha/beta/rc tags
version = get_version()
release = version
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.napoleon',
'sphinx.ext.autosummary',
'sphinx.ext.autodoc',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'sphinx_copybutton',
'myst_parser',
]
# build the templated autosummary files
autosummary_generate = True
numpydoc_show_class_members = False
# Enable overriding of function signatures in the first line of the docstring.
autodoc_docstring_signature = True
# Disable docstring inheritance
autodoc_inherit_docstrings = False
# Show type hints in the description
autodoc_typehints = 'description'
# Add parameter types if the parameter is documented in the docstring
autodoc_typehints_description_target = 'documented_params'
autodoc_default_options = {
'member-order': 'bysource',
}
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# The master toctree document.
root_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['build', 'source/.ipynb_checkpoints', 'source/api/generated', 'Thumbs.db', '.DS_Store']
# A list of glob-style patterns [1] that are used to find source files.
# They are matched against the source file names relative to the source directory,
# using slashes as directory separators on all platforms.
# The default is **, meaning that all files are recursively included from the source directory.
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'sphinx_book_theme'
# html_theme_path = [sphinx_book_theme.get_html_theme_path()]
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# html_css_files = ['css/readthedocs.css']
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
# -- Extension configuration -------------------------------------------------
# Ignore >>> when copying code
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}
The courses in this folder have been transferred to [the classroom repo](https://github.com/modelscope/modelscope-classroom).