response: This speech said in Chinese: "The weather is really nice today".
query: Is this speech male or female
response: Based on the timbre, this speech is male.
history: [['Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>\nWhat did this speech say',
'This speech said in Chinese: "The weather is really nice today".'], ['Is this speech male or female', 'Based on the timbre, this speech is male.']]
"""
```
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, only the qkv projections of the LLM part are LoRA fine-tuned. If you want to fine-tune all linear layers, including the audio model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 22GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh
```
Full-parameter fine-tuning:
```shell
# MP
# Experimental environment: 2 * A100
# 2 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--sft_type full

# ZeRO2
# Experimental environment: 4 * A100
# 2 * 80GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--sft_type full \
--use_flash_attn true \
--deepspeed default-zero2
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. The following is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple audio segments or none, and audio can be passed as a local path or a URL.)
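A minimal `jsonl` sketch of what such a dataset might look like, assuming a `query`/`response`/`history` schema with the same embedded `Audio n:<audio>...</audio>` tags as the inference output above (field names and file paths here are illustrative; the linked Customization doc has the authoritative format):
```jsonl
{"query": "Audio 1:<audio>audio1.wav</audio>\nWhat did this speech say", "response": "The weather is really nice today."}
{"query": "Audio 1:<audio>audio1.wav</audio>\nAudio 2:<audio>audio2.wav</audio>\nAre these two speakers the same person", "response": "No, they are different speakers.", "history": []}
{"query": "Is this speech male or female", "response": "Based on the timbre, this speech is male.", "history": [["Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>\nWhat did this speech say", "The weather is really nice today."]]}
```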
- [Inference after Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
## Inference
Infer using [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary):
```shell
# Experimental environment: 3090
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-vl-chat
```
Output: (supports passing in local paths or URLs)
```python
"""
<<< multi-line
[INFO:swift] End multi-line input with `#`.
[INFO:swift] Input `single-line` to switch to single-line input mode.
<<<[M] Who are you?#
I am Tongyi Qianwen, an AI assistant developed by Alibaba Cloud. I am designed to answer various questions, provide information and converse with users. Is there anything I can help you with?
response: Malu边 is 14 km away from Malu; Yangjiang边 is 62 km away from Malu; Guangzhou边 is 293 km away from Malu.
query: Which city is the farthest away?
response: The farthest city is Guangzhou, 293 km away from Malu.
history: [['Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>\nHow far is it to each city?', 'Malu边 is 14 km away from Malu; Yangjiang边 is 62 km away from Malu; Guangzhou边 is 293 km away from Malu.'], ['Which city is the farthest away?', 'The farthest city is Guangzhou, 293 km away from Malu.']]
"""
```
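The `query:`/`response:`/`history:` lines above are the kind of output produced by the Python API. A minimal sketch of that usage, assuming the `swift.llm` helpers (`get_model_tokenizer`, `get_template`, `inference`) shipped with the ms-swift version these docs target:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
# Sketch: these helper names assume the swift.llm API bundled with the
# ms-swift release this guide targets; check your installed version.
from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)
from swift.utils import seed_everything

model_type = ModelType.qwen_vl_chat
template_type = get_default_template_type(model_type)

# Load the model and tokenizer, then build the chat template.
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Images are referenced with <img> tags inside the query (local path or URL).
query = """Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
How far is it to each city?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# Multi-turn: pass the accumulated history back in.
query = 'Which city is the farthest away?'
response, history = inference(model, template, query, history)
print(f'query: {query}')
print(f'response: {response}')
print(f'history: {history}')
```
The same pattern should carry over to the other model types in this document by swapping `model_type`.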
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, only the qkv projections of the LLM part are LoRA fine-tuned. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: 3090
# 23GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen-vl-chat \
--dataset coco-en-mini
```
Full parameter fine-tuning:
```shell
# Experimental environment: 2 * A100
# 2 * 55GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-vl-chat \
--dataset coco-en-mini \
--sft_type full
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn dialogues, where each turn can contain multiple images or no images, and supports passing in local paths or URLs)
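A hypothetical `jsonl` sample in the same spirit, reusing the `Picture n:<img>...</img>` query format shown in the output above (field names and image paths are placeholders; see the linked Customization doc for the authoritative schema):
```jsonl
{"query": "Picture 1:<img>img1.jpg</img>\nDescribe the image", "response": "A cat is sleeping on a sofa."}
{"query": "Picture 1:<img>img1.jpg</img>\nPicture 2:<img>https://example.com/img2.png</img>\nWhat is different between the two pictures", "response": "The second picture also contains a dog.", "history": []}
{"query": "And what color is the dog", "response": "The dog is brown.", "history": [["Picture 1:<img>img2.png</img>\nDescribe the image", "A brown dog is playing with a cat."]]}
```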
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference
Inference for [yi-vl-6b-chat](https://modelscope.cn/models/01ai/Yi-VL-6B/summary):
```shell
# Experimental environment: A10, 3090, V100...
# 18GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-vl-6b-chat
```
Output: (supports passing in local path or URL)
```python
"""
<<< Describe this image
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image shows a kitten sitting on the floor, eyes open, staring at the camera. The kitten looks very cute, with gray and white fur, and blue eyes. It seems to be looking at the camera, possibly curious about the surroundings.
response: It's 14 kilometers from Jiata, 62 kilometers from Yangjiang, 293 kilometers from Guangzhou, 293 kilometers from Guangzhou.
query: Which city is the furthest away?
response: The furthest distance is 293 kilometers.
history: [['How far is it from each city?', "It's 14 kilometers from Jiata, 62 kilometers from Yangjiang, 293 kilometers from Guangzhou, 293 kilometers from Guangzhou."], ['Which city is the furthest away?', 'The furthest distance is 293 kilometers.']]
"""
```
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, only the qkv projections of the LLM part are LoRA fine-tuned. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100...
# 19GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type yi-vl-6b-chat \
--dataset coco-en-2-mini
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. Here is an example of a custom dataset:
(Multi-turn dialogue is supported; each turn must include an image, which can be passed as a local path or a URL.)
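One way such a dataset might look, assuming a `query`/`response`/`history`/`images` schema where every turn carries exactly one image (all field names and paths here are illustrative assumptions, not the confirmed format; consult the linked Customization doc):
```jsonl
{"query": "Describe the image", "response": "A gray and white kitten is looking at the camera.", "images": ["cat.png"]}
{"query": "What is the animal doing now", "response": "The dog is running on the grass.", "history": [["Describe the image", "A brown dog is lying on the grass."]], "images": ["dog1.jpg", "dog2.jpg"]}
```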