# Qwen-Audio Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
## Inference
Inference with [qwen-audio-chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary):
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-audio-chat
```
Output: (supports passing a local path or URL)
```python
"""
<<< multi-line
[INFO:swift] End multi-line input with `#`.
[INFO:swift] Input `single-line` to switch to single-line input mode.
<<<[M] Who are you?#
I am a large language model from DAMO Academy; my name is Tongyi Qianwen.
--------------------------------------------------
<<<[M] Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/music.wav</audio>
What kind of music is this?#
This is electronic, experimental pop style music.
--------------------------------------------------
<<<[M] Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
What did this speech say?#
This speech said in Chinese: "The weather is really nice today".
--------------------------------------------------
<<<[M] Is this speech male or female?#
Based on the timbre, this speech is male.
"""
```
**Single-sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.qwen_audio_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
What did this speech say"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Is this speech male or female'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
What did this speech say
response: This speech said in Chinese: "The weather is really nice today".
query: Is this speech male or female
response: Based on the timbre, this speech is male.
history: [['Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>\nWhat did this speech say',
'This speech said in Chinese: "The weather is really nice today".'], ['Is this speech male or female', 'Based on the timbre, this speech is male.']]
"""
```
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, only the qkv of the LLM part is LoRA fine-tuned. If you want to fine-tune all linear layers, including the audio module, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 22GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
```
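If you also want to train the audio module, below is a minimal sketch of the run above with the `--lora_target_modules ALL` option mentioned earlier (GPU memory usage will be somewhat higher; not measured here):
```shell
# Sketch: LoRA fine-tuning of all linear layers, including the audio module
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-audio-chat \
    --dataset aishell1-mini-zh \
    --lora_target_modules ALL
```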
Full-parameter fine-tuning:
```shell
# MP
# Experimental environment: 2 * A100
# 2 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--sft_type full

# ZeRO2
# Experimental environment: 4 * A100
# 2 * 80GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--sft_type full \
--use_flash_attn true \
--deepspeed default-zero2
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support the json and jsonl formats. The following is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple audio clips or none, and audio can be passed as a local path or URL.)
```json
[
{"conversations": [
{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\n11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\nAudio 2:<audio>audio_path2</audio>\nAudio 3: <audio>audio_path3</audio>\naaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "Audio 1:<audio>audio_path</audio>\nccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
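To produce a file in this format programmatically, the sketch below writes the records to a json file and checks that the `<audio>` tags in each turn are balanced; the file name `audio_sft.json` and the sample row are only illustrative:
```python
import json

# Hypothetical rows in the conversations format shown above.
rows = [
    {"conversations": [
        {"from": "user", "value": "Audio 1:<audio>/path/to/sample.wav</audio>\nPlease transcribe this audio."},
        {"from": "assistant", "value": "The weather is really nice today."}
    ]}
]

# Sanity check: every turn must contain matching <audio>...</audio> tag pairs.
for row in rows:
    for turn in row["conversations"]:
        assert turn["value"].count("<audio>") == turn["value"].count("</audio>")

with open("audio_sft.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, ensure_ascii=False, indent=4)
```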
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Qwen-VL Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference after Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
## Inference
Infer using [qwen-vl-chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary):
```shell
# Experimental environment: 3090
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-vl-chat
```
Output: (supports passing in local paths or URLs)
```python
"""
<<< multi-line
[INFO:swift] End multi-line input with `#`.
[INFO:swift] Input `single-line` to switch to single-line input mode.
<<<[M] Who are you?#
I am Tongyi Qianwen, an AI assistant developed by Alibaba Cloud. I am designed to answer various questions, provide information and converse with users. Is there anything I can help you with?
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>
Picture 2:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>
What are the differences between these two pictures#
The two pictures are similar in that they are both illustrations about animals, but the animals are different.
The first picture shows sheep, while the second picture shows a kitten.
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>
How many sheep are in the picture#
There are four sheep in the picture.
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>
What is the calculation result#
1452 + 45304 = 46756
--------------------------------------------------
<<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>
Write a poem based on the content in the picture#
Starlight sparkling on the lake surface, a lone boat's shadow still as if asleep.
The man holds up a lantern to illuminate the valley, with a kitten accompanying by his side.
"""
```
Sample images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.qwen_vl_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
How far is it to each city?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest away?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
How far is it to each city?
response: Malu is 14 km away, Yangjiang is 62 km away, and Guangzhou is 293 km away.
query: Which city is the farthest away?
response: The farthest city is Guangzhou, which is 293 km away.
history: [['Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>\nHow far is it to each city?', 'Malu is 14 km away, Yangjiang is 62 km away, and Guangzhou is 293 km away.'], ['Which city is the farthest away?', 'The farthest city is Guangzhou, which is 293 km away.']]
"""
```
Sample image is as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multimodal large model fine-tuning usually uses **custom datasets**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, only the qkv of the LLM part is LoRA fine-tuned. If you want to fine-tune all linear layers, including the vision module, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: 3090
# 23GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen-vl-chat \
--dataset coco-en-mini \
```
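A variant of the run above that also trains the vision module via the `--lora_target_modules ALL` option mentioned earlier (GPU memory usage will be somewhat higher; not measured here):
```shell
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-vl-chat \
    --dataset coco-en-mini \
    --lora_target_modules ALL
```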
Full parameter fine-tuning:
```shell
# Experimental environment: 2 * A100
# 2 * 55GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-vl-chat \
--dataset coco-en-mini \
--sft_type full \
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn dialogues, where each turn can contain multiple images or no images, and supports passing in local paths or URLs)
```json
[
{"conversations": [
{"from": "user", "value": "Picture 1:<img>img_path</img>\n11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "Picture 1:<img>img_path</img>\nPicture 2:<img>img_path2</img>\nPicture 3:<img>img_path3</img>\naaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "Picture 1:<img>img_path</img>\nccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
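If your raw data is a flat list of image-caption pairs, a small script along the lines of the sketch below can convert it into the conversations format above; the raw field names and the output file name are only illustrative:
```python
import json

# Hypothetical raw data: one caption per image.
raw = [
    {"image": "/data/images/0001.jpg", "caption": "A dog running on the beach."},
    {"image": "/data/images/0002.jpg", "caption": "Two cups of coffee on a table."},
]

# Wrap each pair into a single-turn conversation with a Picture tag.
converted = [{
    "conversations": [
        {"from": "user", "value": f"Picture 1:<img>{item['image']}</img>\nPlease describe this picture."},
        {"from": "assistant", "value": item["caption"]},
    ]
} for item in raw]

with open("vl_sft.json", "w", encoding="utf-8") as f:
    json.dump(converted, f, ensure_ascii=False, indent=4)
```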
## Inference after Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```
**merge-lora** and infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Yi-VL Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference
Inference for [yi-vl-6b-chat](https://modelscope.cn/models/01ai/Yi-VL-6B/summary):
```shell
# Experimental environment: A10, 3090, V100...
# 18GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-vl-6b-chat
```
Output: (supports passing in local path or URL)
```python
"""
<<< Describe this type of image
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image shows a kitten sitting on the floor, eyes open, staring at the camera. The kitten looks very cute, with gray and white fur, and blue eyes. It seems to be looking at the camera, possibly curious about the surroundings.
--------------------------------------------------
<<< How many sheep are in the picture
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the image.
--------------------------------------------------
<<< clear
<<< What is the calculation result
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
1452 + 45304 = 46756
--------------------------------------------------
<<< clear
<<< Write a poem based on the content in the image
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
Night falls, starlight twinkles,
A small boat drifts on the river,
A bright lantern hangs on the bow,
Illuminating the surrounding darkness.
Two people are on the boat,
One at the bow, the other at the stern,
They seem to be talking,
Enjoying a tranquil moment under the starlight.
On the riverbank, trees stand in the dark,
Casting long shadows in the starlight.
The scene is so peaceful,
Reminiscent of an ancient legend.
The boat, the people, and the starlight,
Form a beautiful picture,
Evoking a feeling of serenity,
Beyond the hustle and bustle of city life.
"""
```
Sample images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.yi_vl_6b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(2) # ...
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the furthest away?'
images = images * 2
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: How far is it from each city?
response: It's 14 kilometers from Jiata, 62 kilometers from Yangjiang, 293 kilometers from Guangzhou, 293 kilometers from Guangzhou.
query: Which city is the furthest away?
response: The furthest distance is 293 kilometers.
history: [['How far is it from each city?', "It's 14 kilometers from Jiata, 62 kilometers from Yangjiang, 293 kilometers from Guangzhou, 293 kilometers from Guangzhou."], ['Which city is the furthest away?', 'The furthest distance is 293 kilometers.']]
"""
```
Sample image as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, only the qkv of the LLM part is LoRA fine-tuned. If you want to fine-tune all linear layers, including the vision module, you can specify `--lora_target_modules ALL`. Full parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100...
# 19GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type yi-vl-6b-chat \
--dataset coco-en-2-mini \
```
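Since full parameter fine-tuning is also supported, below is a sketch following the pattern of the other best practices in this document (GPU memory requirements not measured here):
```shell
# Sketch: full parameter fine-tuning on two GPUs (memory requirements not measured here)
CUDA_VISIBLE_DEVICES=0,1 swift sft \
    --model_type yi-vl-6b-chat \
    --dataset coco-en-2-mini \
    --sft_type full
```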
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support the json and jsonl formats. Here is an example of a custom dataset:
(Multi-turn dialogue is supported; each turn must include an image, which can be passed as a local path or URL.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}
```
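A minimal sketch of writing such a jsonl file from Python, one JSON object per line; the file name and contents are only illustrative:
```python
import json

# Hypothetical samples in the query/response/images format shown above.
samples = [
    {"query": "Please describe this picture.", "response": "A kitten with blue eyes.",
     "images": ["/data/images/cat.png"]},
    {"query": "How many sheep are in the picture?", "response": "There are four sheep.",
     "history": [], "images": ["/data/images/animal.png"]},
]

with open("yi_vl_sft.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```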
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
.. currentmodule:: {{ module }}

{{ name | underline}}

.. autoclass:: {{ name }}
    :inherited-members:
    :members:

.. autogenerated from source/_templates/autosummary/class.rst

.. currentmodule:: {{ module }}

{{ name | underline}}

.. autoclass:: {{ name }}
    :members:
    :special-members: __init__, __call__

..
    autogenerated from source/_templates/classtemplate.rst
    note it does not have :inherited-members:

.. currentmodule:: {{ module }}

{{ name | underline}}

.. autoclass:: {{ name }}
    :members:
    :exclude-members: MAXBIT, MAXDIM
    :undoc-members:

..
    autogenerated from source/_templates/sobolengine.rst
    note it has specific options
swift.hub
==============
.. automodule:: swift.hub

.. currentmodule:: swift.hub

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    api.HubApi
    check_model.check_local_model_is_latest
    push_to_hub.push_to_hub
    push_to_hub.push_to_hub_async
    snapshot_download.snapshot_download
    file_download.model_file_download

swift.trainers
==============
.. automodule:: swift.trainers

.. currentmodule:: swift.trainers

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    trainers.Seq2SeqTrainer
    trainers.Trainer

swift.tuners
==============
.. automodule:: swift.tuners

.. currentmodule:: swift.tuners

.. autosummary::
    :toctree: generated
    :nosignatures:
    :template: classtemplate.rst

    adapter.AdapterConfig
    base.SwiftModel
    base.Swift
    lora.LoRAConfig
    prompt.PromptConfig
    restuning.ResTuningConfig
    side.SideConfig
    utils.SwiftConfig
    utils.SwiftOutput
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
# import sphinx_book_theme
sys.path.insert(0, os.path.abspath('../../'))
# -- Project information -----------------------------------------------------
project = 'swift'
copyright = '2022-2023, Alibaba ModelScope'
author = 'modelscope Authors'
version_file = '../../swift/version.py'
def get_version():
    with open(version_file, 'r', encoding='utf-8') as f:
        exec(compile(f.read(), version_file, 'exec'))
    return locals()['__version__']
# The full version, including alpha/beta/rc tags
version = get_version()
release = version
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.napoleon',
'sphinx.ext.autosummary',
'sphinx.ext.autodoc',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'sphinx_copybutton',
'myst_parser',
]
# build the templated autosummary files
autosummary_generate = True
numpydoc_show_class_members = False
# Enable overriding of function signatures in the first line of the docstring.
autodoc_docstring_signature = True
# Disable docstring inheritance
autodoc_inherit_docstrings = False
# Show type hints in the description
autodoc_typehints = 'description'
# Add parameter types if the parameter is documented in the docstring
autodoc_typehints_description_target = 'documented_params'
autodoc_default_options = {
'member-order': 'bysource',
}
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# The master toctree document.
root_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['build', 'source_en/.ipynb_checkpoints', 'source_en/api/generated', 'Thumbs.db', '.DS_Store']
# A list of glob-style patterns [1] that are used to find source files.
# They are matched against the source file names relative to the source directory,
# using slashes as directory separators on all platforms.
# The default is **, meaning that all files are recursively included from the source directory.
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'sphinx_book_theme'
# html_theme_path = [sphinx_book_theme.get_html_theme_path()]
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# html_css_files = ['css/readthedocs.css']
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
# -- Extension configuration -------------------------------------------------
# Ignore >>> when copying code
copybutton_prompt_text = r'>>> |\.\.\. '
copybutton_prompt_is_regexp = True
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}
The courses in this folder have been transferred to [the classroom repo](https://github.com/modelscope/modelscope-classroom).
.. swift documentation file,
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.

Swift DOCUMENTATION
========================
.. toctree::
    :maxdepth: 2
    :caption: Get Started

    GetStarted/Installation.md
    GetStarted/Web-ui.md
    GetStarted/Tuners.md
    GetStarted/ResTuning.md
    GetStarted/SCEdit.md
    GetStarted/Use-PEFT.md

.. toctree::
    :maxdepth: 2
    :caption: LLM Training and Inference

    LLM/LLM-fine-tuning.md
    LLM/LLM-inference.md
    LLM/DPO.md
    LLM/LLM-eval.md
    LLM/LLM-quantization.md
    LLM/VLLM-inference-acceleration-and-deployment.md
    LLM/LLM-exp.md
    LLM/Command-line-parameters.md
    LLM/Supported-models-datasets.md
    LLM/Customization.md
    LLM/Self-cognition-best-practice.md
    LLM/Agent-fine-tuning-best-practice.md
    LLM/Agent-deployment-best-practice.md
    LLM/Qwen1.5-best-practice.md
    LLM/Grok-1-best-practice.md
    LLM/ORPO.md
    LLM/SimPO.md
    LLM/Compat-HF.md
    LLM/Benchmark.md

.. toctree::
    :maxdepth: 2
    :caption: Multi-Modal LLM Training and Inference

    Multi-Modal/qwen-vl-best-practice.md
    Multi-Modal/qwen-audio-best-practice.md
    Multi-Modal/deepseek-vl-best-practice.md
    Multi-Modal/internlm-xcomposer2-best-practice.md
    Multi-Modal/phi3-vision-best-practice.md
    Multi-Modal/llava-best-practice.md
    Multi-Modal/yi-vl-best-practice.md
    Multi-Modal/cogvlm-best-practice.md
    Multi-Modal/cogvlm2-best-practice.md
    Multi-Modal/minicpm-v-best-practice.md
    Multi-Modal/internvl-best-practice.md
    Multi-Modal/mutlimodal-deployment.md

.. toctree::
    :maxdepth: 2
    :caption: AIGC Training and Inference

    AIGC/AnimateDiff-train-infer.md

.. toctree::
    :maxdepth: 2
    :caption: API Doc

    Hub <api/swift.hub>
    Trainer <api/swift.trainers>
    Tuner <api/swift.tuners>

Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
# Copyright (c) Alibaba, Inc. and its affiliates.
from swift.aigc import animatediff_infer_main
if __name__ == '__main__':
    animatediff_infer_main()
# Copyright (c) Alibaba, Inc. and its affiliates.
from swift.aigc import animatediff_main
if __name__ == '__main__':
    animatediff_main()
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--sft_type full \
--ckpt_dir /output/path/like/checkpoints/iter-xxx \
# Experimental environment: A100 * 4
# 200GB GPU memory in total
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc_per_node=4 animatediff_sft.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
--video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
--sft_type full \
--lr_scheduler_type constant \
--trainable_modules .*motion_modules.* \
--batch_size 4 \
--eval_steps 100 \
--gradient_accumulation_steps 16 \
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
--sft_type lora \
--ckpt_dir /output/path/like/checkpoints/iter-xxx \
# Experimental environment: A100
# 20GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_sft.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
--video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
--motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
--sft_type lora \
--lr_scheduler_type constant \
--trainable_modules .*motion_modules.* \
--batch_size 1 \
--eval_steps 200 \
--dataset_sample_size 10000 \
--gradient_accumulation_steps 16 \