first

Commit f7db21eb authored Aug 22, 2024 by lvzhen
20 changed files
--- a/ms-swift/docs/resources/scedit_boy3.png
+++ b/ms-swift/docs/resources/scedit_boy3.png
--- a/ms-swift/docs/resources/scedit_boy4.png
+++ b/ms-swift/docs/resources/scedit_boy4.png
--- a/ms-swift/docs/resources/scedit_boy5.png
+++ b/ms-swift/docs/resources/scedit_boy5.png
--- a/ms-swift/docs/resources/scedit_boy6.png
+++ b/ms-swift/docs/resources/scedit_boy6.png
--- a/ms-swift/docs/resources/simpo1.png
+++ b/ms-swift/docs/resources/simpo1.png
--- a/ms-swift/docs/resources/simpo2.png
+++ b/ms-swift/docs/resources/simpo2.png
--- a/ms-swift/docs/resources/simpo3.png
+++ b/ms-swift/docs/resources/simpo3.png
--- a/ms-swift/docs/resources/simpo4.png
+++ b/ms-swift/docs/resources/simpo4.png
--- a/ms-swift/docs/resources/vdpo_data.png
+++ b/ms-swift/docs/resources/vdpo_data.png
--- a/ms-swift/docs/resources/web-ui-en.jpg
+++ b/ms-swift/docs/resources/web-ui-en.jpg
--- a/ms-swift/docs/resources/web-ui.png
+++ b/ms-swift/docs/resources/web-ui.png
--- a/ms-swift/docs/resources/xinference.jpg
+++ b/ms-swift/docs/resources/xinference.jpg
--- a/ms-swift/docs/source/.readthedocs.yaml
+++ b/ms-swift/docs/source/.readthedocs.yaml
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/source/conf.py
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#    - pdf
+#    - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+   install:
+      - requirements: requirements/docs.txt
+      - requirements: requirements/framework.txt
+      - requirements: requirements/llm.txt
--- a/ms-swift/docs/source/AIGC/AnimateDiff微调推理文档.md
+++ b/ms-swift/docs/source/AIGC/AnimateDiff微调推理文档.md
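+# Fine-Tuning and Inference with AnimateDiff
+
+SWIFT supports fine-tuning and inference for AnimateDiff. Two fine-tuning modes are currently available: full-parameter fine-tuning and LoRA fine-tuning.
+
+First, clone and install SWIFT: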
+
+```shell
+git clone https://github.com/modelscope/swift.git
+cd swift
+pip install ".[aigc]"
+```
+
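+## Full-Parameter Training
+
+### Training Results
+
+Full-parameter fine-tuning can reproduce the results of the [officially released model animatediff-motion-adapter-v1-5-2](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2/summary), but it requires a large number of short videos. The official ModelScope reproduction used a subset of the official dataset, [WebVid 2.5M](https://maxbain.com/webvid-dataset/). Sample results are shown below: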
+
+```text
+Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, girl, walking, on the street, flowers
+```
+
+
+
+![image.png](../../resources/1.gif)
+
+```text
+Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, beautiful house, mountain, snow top
+```
+
+![image.png](../../resources/2.gif)
+
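+Generation quality after training on the 2.5M subset is still somewhat unstable; developers who train on the 10M dataset should see more stable results.
+
+### Commands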
+
+```shell
+# This script is located in swift/examples/pytorch/animatediff/scripts/full
+# Experimental environment: A100 * 4
+# 200GB GPU memory in total
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0,1,2,3 \
+torchrun --nproc_per_node=4 animatediff_sft.py \
+  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
+  --csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
+  --video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
+  --sft_type full \
+  --lr_scheduler_type constant \
+  --trainable_modules .*motion_modules.* \
+  --batch_size 4 \
+  --eval_steps 100 \
+  --gradient_accumulation_steps 16
+```
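As a rough check on the command above, the number of samples behind one optimizer step is the product of the process count, the per-process batch size, and the gradient accumulation steps. This is a minimal sketch that assumes `--batch_size` is a per-process value, which the source does not confirm; the function name is hypothetical.

```python
def effective_batch_size(num_gpus: int, batch_size: int, grad_accum_steps: int) -> int:
    """Samples per optimizer step, assuming batch_size is counted per process."""
    return num_gpus * batch_size * grad_accum_steps


# Values from the full-parameter command: 4 processes, batch size 4, 16 accumulation steps.
print(effective_batch_size(num_gpus=4, batch_size=4, grad_accum_steps=16))
```

Under that assumption, the command trains with 256 videos per optimizer step.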
+
+Training used A100 * 4 with 200GB of GPU memory in total and took about 40 hours. The data format is as follows:
+
+```text
+--csv_path takes a CSV file in the following format:
+name,contentUrl
+Travel blogger shoot a story on top of mountains. young man holds camera in forest.,stock-footage-travel-blogger-shoot-a-story-on-top-of-mountains-young-man-holds-camera-in-forest.mp4
+```
+
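+The `name` field is the prompt for the short video, and `contentUrl` is the file name of the video.
+
+```text
+--video_folder takes a video directory that contains every video file referenced by contentUrl in the CSV file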
+```
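Before starting a long run, it may help to confirm that every `contentUrl` in the CSV resolves to a file in the video folder. The helper below is a hypothetical sketch built only on the documented `name,contentUrl` header; it is not part of SWIFT.

```python
import csv
from pathlib import Path
from typing import List


def find_missing_videos(csv_path: str, video_folder: str) -> List[str]:
    """Return the contentUrl values that have no matching file in video_folder."""
    folder = Path(video_folder)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the first row (name,contentUrl) as the column names.
        for row in csv.DictReader(f):
            if not (folder / row["contentUrl"]).is_file():
                missing.append(row["contentUrl"])
    return missing
```

An empty result means the two arguments are consistent with each other.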
+
+Inference with the fully fine-tuned model works as follows:
+
+```shell
+# This script is located in swift/examples/pytorch/animatediff/scripts/full
+# Experimental environment: A100
+# 18GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python animatediff_infer.py \
+  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
+  --sft_type full \
+  --ckpt_dir /output/path/like/checkpoints/iter-xxx \
+  --eval_human true
+```
+
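+Pass the output folder produced by training as `--ckpt_dir`.
+
+## LoRA Training
+
+### Commands
+
+Full-parameter training trains the entire Motion-Adapter structure from scratch. Alternatively, you can fine-tune an existing model on a small number of videos by running the following command:
+
+```shell
+# This script is located in swift/examples/pytorch/animatediff/scripts/lora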
+# Experimental environment: A100
+# 20GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python animatediff_sft.py \
+  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
+  --csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
+  --video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
+  --motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
+  --sft_type lora \
+  --lr_scheduler_type constant \
+  --trainable_modules .*motion_modules.* \
+  --batch_size 1 \
+  --eval_steps 200 \
+  --dataset_sample_size 10000 \
+  --gradient_accumulation_steps 16
+```
+
+The video data arguments are the same as above.
+
+The inference command is as follows:
+
+```shell
+# This script is located in swift/examples/pytorch/animatediff/scripts/lora
+# Experimental environment: A100
+# 18GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python animatediff_infer.py \
+  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
+  --motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
+  --sft_type lora \
+  --ckpt_dir /output/path/like/checkpoints/iter-xxx \
+  --eval_human true
+```
+
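+Pass the output folder produced by training as `--ckpt_dir`.
+
+## Parameter List
+
+The parameters supported for training and inference, and their meanings, are listed below:
+
+### Training Parameters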
+
+```text
+motion_adapter_id_or_path: Optional[str] = None # Model id or path of the motion adapter; set this to continue training from an existing official model
+motion_adapter_revision: Optional[str] = None # Revision of the motion adapter; only used when motion_adapter_id_or_path is a model id
+
+model_id_or_path: str = None # Model id or path of the SD base model
+model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id
+
+dataset_sample_size: int = None # Number of dataset rows used for training; the default trains on the full dataset
+
+sft_type: str = field(
+    default='lora', metadata={'choices': ['lora', 'full']}) # Training mode: lora or full-parameter
+
+output_dir: str = 'output' # Output folder
+ddp_backend: str = field(
+    default='nccl', metadata={'choices': ['nccl', 'gloo', 'mpi', 'ccl']}) # DDP backend, if training with DDP
+
+seed: int = 42 # Random seed
+
+lora_rank: int = 8 # LoRA parameter
+lora_alpha: int = 32 # LoRA parameter
+lora_dropout: float = 0.05 # LoRA parameter
+lora_dtype: str = 'fp32' # dtype of the LoRA modules; if `AUTO`, follow the dtype of the original module
+
+gradient_checkpointing: bool = False # Whether to enable gradient checkpointing (off by default). Note: the current diffusers version has an issue, so True is not supported
+batch_size: int = 1 # Batch size
+num_train_epochs: int = 1 # Number of epochs
+# if max_steps >= 0, override num_train_epochs
+learning_rate: Optional[float] = None # Learning rate
+weight_decay: float = 0.01 # AdamW parameter
+gradient_accumulation_steps: int = 16 # Gradient accumulation steps
+max_grad_norm: float = 1. # Maximum gradient norm
+lr_scheduler_type: str = 'cosine' # Type of lr_scheduler
+warmup_ratio: float = 0.05 # Whether to warm up, and the warmup ratio
+
+eval_steps: int = 50 # Interval between evaluations, in steps
+save_steps: Optional[int] = None # Interval between saves, in steps
+dataloader_num_workers: int = 1 # Number of dataloader workers
+
+push_to_hub: bool = False # Whether to push to the model hub
+# 'user_name/repo_name' or 'repo_name'
+hub_model_id: Optional[str] = None # Model hub id
+hub_private_repo: bool = False
+push_hub_strategy: str = field( # Push strategy: push only the last checkpoint or every checkpoint
+    default='push_best',
+    metadata={'choices': ['push_last', 'all_checkpoints']})
+# None: use env var `MODELSCOPE_API_TOKEN`
+hub_token: Optional[str] = field( # Token for the model hub
+    default=None,
+    metadata={
+        'help':
+        'SDK token can be found in https://modelscope.cn/my/myaccesstoken'
+    })
+
+ignore_args_error: bool = False  # True: notebook compatibility
+
+text_dropout_rate: float = 0.1 # Drop a fraction of the text prompts to make the model more robust
+
+validation_prompts_path: str = field( # Path to the prompt file used during evaluation; defaults to swift/aigc/configs/validation.txt
+    default=None,
+    metadata={
+        'help':
+        'The validation prompts file path, use aigc/configs/validation.txt is None'
+    })
+
+trainable_modules: str = field( # Trainable modules; the default value is recommended
+    default='.*motion_modules.*',
+    metadata={
+        'help':
+        'The trainable modules, by default, the .*motion_modules.* will be trained'
+    })
+
+mixed_precision: bool = True # Mixed-precision training
+
+enable_xformers_memory_efficient_attention: bool = True # Use xformers
+
+num_inference_steps: int = 25 #
+guidance_scale: float = 8.
+sample_size: int = 256
+sample_stride: int = 4 # Maximum length of a training video, in seconds
+sample_n_frames: int = 16 # Frames per second
+
+csv_path: str = None # Input dataset (CSV file)
+video_folder: str = None # Input dataset (video directory)
+
+motion_num_attention_heads: int = 8 # Motion adapter parameter
+motion_max_seq_length: int = 32 # Motion adapter parameter
+num_train_timesteps: int = 1000 # Inference pipeline parameter
+beta_start: int = 0.00085 # Inference pipeline parameter
+beta_end: int = 0.012 # Inference pipeline parameter
+beta_schedule: str = 'linear' # Inference pipeline parameter
+steps_offset: int = 1 # Inference pipeline parameter
+clip_sample: bool = False # Inference pipeline parameter
+
+use_wandb: bool = False # Whether to use wandb
+```
+
+### Inference Parameters
+
+```text
+motion_adapter_id_or_path: Optional[str] = None # Model id or path of the motion adapter; set this to continue training from an existing official model
+motion_adapter_revision: Optional[str] = None # Revision of the motion adapter; only used when motion_adapter_id_or_path is a model id
+
+model_id_or_path: str = None # Model id or path of the SD base model
+model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id
+
+sft_type: str = field(
+    default='lora', metadata={'choices': ['lora', 'full']}) # Training mode: lora or full-parameter
+
+ckpt_dir: Optional[str] = field(
+    default=None, metadata={'help': '/path/to/your/vx-xxx/checkpoint-xxx'}) # Output folder from training
+eval_human: bool = False  # False: eval val_dataset # Whether to evaluate with manually entered prompts
+
+seed: int = 42 # Random seed
+
+merge_lora: bool = False # Merge lora into the MotionAdapter and save the model.
+replace_if_exists: bool = False # Replace the files if the output merged dir exists when `merge_lora` is True.
+
+# other
+ignore_args_error: bool = False  # True: notebook compatibility
+
+validation_prompts_path: str = None # File used for validation when eval_human=False, one prompt per line
+
+output_path: str = './generated' # Output directory for the generated GIFs
+
+enable_xformers_memory_efficient_attention: bool = True # Use xformers
+
+num_inference_steps: int = 25 #
+guidance_scale: float = 8.
+sample_size: int = 256
+sample_stride: int = 4 # Maximum length of a training video, in seconds
+sample_n_frames: int = 16 # Frames per second
+
+motion_num_attention_heads: int = 8 # Motion adapter parameter
+motion_max_seq_length: int = 32 # Motion adapter parameter
+num_train_timesteps: int = 1000 # Inference pipeline parameter
+beta_start: int = 0.00085 # Inference pipeline parameter
+beta_end: int = 0.012 # Inference pipeline parameter
+beta_schedule: str = 'linear' # Inference pipeline parameter
+steps_offset: int = 1 # Inference pipeline parameter
+clip_sample: bool = False # Inference pipeline parameter
+
+```
--- a/ms-swift/docs/source/GetStarted/ResTuning.md
+++ b/ms-swift/docs/source/GetStarted/ResTuning.md
+# Res-Tuning Component
+
+<div align="center">
+
+## [NeurIPS 2023] Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
+
+### [arXiv](https://arxiv.org/abs/2310.19859)  |  [Project Page](https://res-tuning.github.io/)
+
+</div>
+
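+Res-Tuning is a flexible and efficient fine-tuning tuner. We decouple the tuner design from the model architecture so that tuners can be combined flexibly,
+and extend it with a new memory-saving bypass tuner that greatly reduces GPU memory consumption and the cost of multi-task inference.
+
+Res-Tuning is currently available in [SWIFT](https://github.com/modelscope/swift) as a pluggable tuner component that developers can use directly.
+
+### Supported Components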
+
+- [x] Res-Adapter
+- [x] Res-Tuning-Bypass
+- [ ] Res-Prefix
+- [ ] Res-Prompt
+
+### Usage
+
+#### Demo
+- Try the [visualization example](https://github.com/modelscope/swift/blob/main/examples/pytorch/cv/notebook/swift_vision.ipynb) we provide.
+
+#### Initializing the Tuner
+
+```Python
+from swift import ResTuningConfig
+config = ResTuningConfig(
+    dims=768,
+    root_modules=r'.*blocks.0$',
+    stem_modules=r'.*blocks\.\d+$',
+    target_modules=r'norm',
+    tuner_cfg='res_adapter'
+)
+```
+- dims: The dimensions of the hidden states.
+- root_modules: The root module to be replaced.
+- stem_modules: The stem modules to be replaced.
+- target_modules: The target module to be replaced.
+- tuner_cfg: The configuration of the tuning module.
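To get a feel for which modules the `root_modules` and `stem_modules` patterns above might select, the sketch below applies them to a few module names in the style of a timm ViT. It assumes each pattern is tested against the full dotted module name with `re.fullmatch`; how SWIFT applies the patterns internally is not stated here, and the module names are illustrative rather than read from a real model.

```python
import re

ROOT_PATTERN = r'.*blocks.0$'
STEM_PATTERN = r'.*blocks\.\d+$'

# Illustrative module names; not taken from an actual timm model.
module_names = ['patch_embed', 'blocks.0', 'blocks.0.norm1', 'blocks.1', 'blocks.11', 'norm']


def select(pattern: str, names: list) -> list:
    """Return the names matched in full by the pattern (an assumed matching rule)."""
    return [name for name in names if re.fullmatch(pattern, name)]


print('root:', select(ROOT_PATTERN, module_names))
print('stem:', select(STEM_PATTERN, module_names))
```

With these names, the root pattern picks only `blocks.0`, while the stem pattern picks every top-level block (`blocks.0`, `blocks.1`, `blocks.11`) and skips submodules such as `blocks.0.norm1`.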
+
+#### Loading the Model
+
+```Python
+from swift import Swift
+import timm, torch
+model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=100)
+model_tune = Swift.prepare_model(model, config)
+print(model_tune.get_trainable_parameters())
+print(model(torch.ones(1, 3, 224, 224)).shape)
+```
+
+
+### Citation
+```
+@inproceedings{jiang2023restuning,
+  title={Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone},
+  author={Jiang, Zeyinzi and Mao, Chaojie and Huang, Ziyuan and Ma, Ao and Lv, Yiliang and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
+  booktitle={Advances in Neural Information Processing Systems},
+  year={2023}
+}
+```
--- a/ms-swift/docs/source/GetStarted/SCEdit.md
+++ b/ms-swift/docs/source/GetStarted/SCEdit.md
+## 🔥SCEdit
+
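+SCEdit, proposed by the Alibaba TongYi Vision Intelligence Lab, is an efficient generative fine-tuning framework. It supports fine-tuning for downstream text-to-image tasks, **saving 30%-50% of training GPU memory compared with LoRA**, which enables fast transfer to specific generation scenarios. It can also be **extended directly to controllable image generation, requiring only 7.9% of the parameters of ControlNet conditional generation while saving 30% of GPU memory**, and supports conditional generation tasks such as edge maps, depth maps, segmentation maps, pose, color maps, and image completion.
+
+We trained on the 3D-style data from the [style transfer dataset](https://modelscope.cn/datasets/damo/style_custom_dataset/dataPeview) and tested with the same `Prompt: A boy in a camouflage jacket with a scarf`. The qualitative and quantitative results are as follows: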
+
+| Method    | bs   | ep   | Target Module | Param. (M)    | Mem. (MiB) | 3D style                                                     |
+| --------- | ---- | ---- | ------------- | ------------- | ---------- | ------------------------------------------------------------ |
+| LoRA/r=64 | 1    | 50   | q/k/v/out/mlp | 23.94 (2.20%) | 8440MiB    | <img src="https://intranetproxy.alipay.com/skylark/lark/0/2023/png/167218/1703665229562-0f33bbb0-c492-41b4-9f37-3ae720dca80d.png" alt="img" style="zoom:20%;" /> |
+| SCEdit    | 1    | 50   | up_blocks     | 19.68 (1.81%) | 7556MiB    | <img src="https://intranetproxy.alipay.com/skylark/lark/0/2023/png/167218/1703665933913-74b98741-3b57-46a4-9871-539df3a0112c.png" alt="img" style="zoom:20%;" /> |
+| LoRA/r=64 | 10   | 100  | q/k/v/out/mlp | 23.94 (2.20%) | 26300MiB   | <img src="https://intranetproxy.alipay.com/skylark/lark/0/2023/png/167218/1703750608529-de20d0e7-bf9c-4928-8e59-73cc54f2c8d7.png" alt="img" style="zoom:20%;" /> |
+| SCEdit    | 10   | 100  | up_blocks     | 19.68 (1.81%) | 18634MiB   | <img src="https://intranetproxy.alipay.com/skylark/lark/0/2023/png/167218/1703663033092-94492e44-341f-4259-9df4-13c168e3b5d6.png" alt="img" style="zoom:20%;" /> |
+| LoRA/r=64 | 30   | 200  | q/k/v/out/mlp | 23.94 (2.20%) | 69554MiB   | <img src="https://intranetproxy.alipay.com/skylark/lark/0/2023/png/167218/1703750626635-2e368d7b-5e99-4a06-b189-8615f302bcd7.png" alt="img" style="zoom:20%;" /> |
+| SCEdit    | 30   | 200  | up_blocks     | 19.68 (1.81%) | 43350MiB   | <img src="https://intranetproxy.alipay.com/skylark/lark/0/2023/png/167218/1703662246942-1102b1f4-93ab-4653-b943-3302f2a5259e.png" alt="img" style="zoom:20%;" /> |
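The relative memory saving for each configuration can be worked out directly from the `Mem. (MiB)` column. The snippet below is a small arithmetic sketch using only the numbers in the table; the function name is hypothetical.

```python
# Peak memory in MiB, copied from the table above: (batch size, LoRA, SCEdit).
rows = [
    (1, 8440, 7556),
    (10, 26300, 18634),
    (30, 69554, 43350),
]


def saving_percent(lora_mib: int, scedit_mib: int) -> float:
    """Memory saved by SCEdit relative to LoRA, in percent."""
    return (lora_mib - scedit_mib) / lora_mib * 100


for bs, lora, scedit in rows:
    print(f'bs={bs}: {saving_percent(lora, scedit):.1f}% less memory than LoRA')
```

For these three rows, the saving grows with batch size: about 10.5% at bs=1, 29.1% at bs=10, and 37.7% at bs=30.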
+
+To run training with SCEdit and reproduce the results above:
+
+```shell
+# First follow the installation steps in the section below
+cd examples/pytorch/multi_modal/notebook
+python text_to_image_synthesis.py
+```
--- a/ms-swift/docs/source/GetStarted/SWIFT安装.md
+++ b/ms-swift/docs/source/GetStarted/SWIFT安装.md
+# Installation and Usage
+
+## Installing the Wheel Package
+
+Install with pip:
+
+```shell
+# All capabilities
+pip install 'ms-swift[all]' -U
+# LLM only
+pip install 'ms-swift[llm]' -U
+# AIGC only
+pip install 'ms-swift[aigc]' -U
+# Adapters only
+pip install ms-swift -U
+```
+
+## Installing from Source
+
+```shell
+git clone https://github.com/modelscope/swift.git
+cd swift
+pip install -e '.[all]'
+```
+
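+## Notebook Environment
+
+The vast majority of models that Swift can train will run on an `A10` GPU, and you can use the free GPU resources officially provided by ModelScope:
+
+1. Go to the [ModelScope](https://www.modelscope.cn) website and log in
+2. Click `My Notebook` (`我的Notebook`) on the left and start a free GPU instance
+3. Enjoy the free A10 GPU
+
+## Building the Docs
+
+Swift provides complete API documentation. Run the following from the swift root directory: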
+
+```shell
+make docs
+```
+
+When the build finishes, open `docs/build/html/index.html`.
--- a/ms-swift/docs/source/GetStarted/使用tuners.md
+++ b/ms-swift/docs/source/GetStarted/使用tuners.md
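+# Basic Usage
+
+A tuner is an additional structure attached to a model to reduce the number of trainable parameters or to improve training accuracy. SWIFT currently supports the following tuners: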
+
+1. LoRA: [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/abs/2106.09685)
+2. LoRA+: [LoRA+: Efficient Low Rank Adaptation of Large Models](https://arxiv.org/pdf/2402.12354.pdf)
+3. LLaMA PRO: [LLAMA PRO: Progressive LLaMA with Block Expansion](https://arxiv.org/pdf/2401.02415.pdf)
+4. GaLore: [GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection](https://arxiv.org/abs/2403.03507)
+5. LISA: [LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning](https://arxiv.org/abs/2403.17919)
+6. UnSloth: https://github.com/unslothai/unsloth
+7. SCEdit: [SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing](https://arxiv.org/abs/2312.11392)  < [arXiv](https://arxiv.org/abs/2312.11392)  |  [Project Page](https://scedit.github.io/) >
+8. NEFTune: [Noisy Embeddings Improve Instruction Finetuning](https://arxiv.org/abs/2310.05914)
+9. LongLoRA: [Efficient Fine-tuning of Long-Context Large Language Models](https://arxiv.org/abs/2309.12307)
+10. Adapter: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751)
+11. Vision Prompt Tuning: [Visual Prompt Tuning](https://arxiv.org/abs/2203.12119)
+12. Side: [Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks](https://arxiv.org/abs/1912.13503)
+13. Res-Tuning: [Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone](https://arxiv.org/abs/2310.19859)  < [arXiv](https://arxiv.org/abs/2310.19859)  |  [Project Page](https://res-tuning.github.io/)  |  [Usage](ResTuning.md) >
+14. Tuners provided by [PEFT](https://github.com/huggingface/peft), such as IA3 and AdaLoRA
+
+## Using Tuners in Training
+
+Call `Swift.prepare_model()` to add tuners to a model:
+
+```python
+from modelscope import Model
+from swift import Swift, LoraConfig
+import torch
+model = Model.from_pretrained('ZhipuAI/chatglm3-6b', torch_dtype=torch.bfloat16, device_map='auto')
+lora_config = LoraConfig(
+                r=16,
+                target_modules=['query_key_value'],
+                lora_alpha=32,
+                lora_dropout=0.)
+model = Swift.prepare_model(model, lora_config)
+```
+
+You can also use multiple tuners at the same time:
+
+```python
+from modelscope import Model
+from swift import Swift, LoraConfig, AdapterConfig
+import torch
+model = Model.from_pretrained('ZhipuAI/chatglm3-6b', torch_dtype=torch.bfloat16, device_map='auto')
+lora_config = LoraConfig(
+                r=16,
+                target_modules=['query_key_value'],
+                lora_alpha=32,
+                lora_dropout=0.)
+adapter_config = AdapterConfig(
+                dim=model.config.hidden_size,
+                target_modules=['mlp'],
+                method_name='forward',
+                hidden_pos=0,
+                adapter_length=32,
+            )
+model = Swift.prepare_model(model, {'first_tuner': lora_config, 'second_tuner': adapter_config})
+# use model to do other things
+```
+
+When using multiple tuners, the second argument must be a dict whose keys are tuner names and whose values are tuner configs.
+
+After training, call:
+
+```python
+model.save_pretrained(save_directory='./output')
+```
+
+to save the model checkpoint. The checkpoint files contain only the tuner weights, not the weights of the model itself. The saved layout looks like this:
+
+> output
+>
+>      |-- configuration.json
+>
+>      |-- first_tuner
+>
+>                |-- adapter_config.json
+>
+>                |-- adapter_model.bin
+>
+>      |-- second_tuner
+>
+>                |-- adapter_config.json
+>
+>                |-- adapter_model.bin
+>
+>      |-- ...
+
+If only a single config is passed, the default name `default` is used:
+
+> output
+>
+>       |-- configuration.json
+>
+>       |-- default
+>
+>                 |-- adapter_config.json
+>
+>                 |-- adapter_model.bin
+>
+>       |-- ...
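Given the layouts above, the tuners stored in a checkpoint directory can be discovered by looking for subfolders that hold an `adapter_config.json`. The helper below is a hypothetical sketch based only on the directory trees shown here; it is not a SWIFT API.

```python
from pathlib import Path


def list_saved_tuners(output_dir: str) -> list:
    """Return the names of subdirectories that contain an adapter_config.json file."""
    root = Path(output_dir)
    return sorted(
        child.name for child in root.iterdir()
        if child.is_dir() and (child / 'adapter_config.json').is_file()
    )
```

For the two-tuner layout this would return `first_tuner` and `second_tuner`; for the single-config layout it would return only `default`.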
+
+### Complete Training Code
+
+```python
+# A100 18G memory
+from swift import Seq2SeqTrainer, Seq2SeqTrainingArguments
+from modelscope import MsDataset, AutoTokenizer
+from modelscope import AutoModelForCausalLM
+from swift import Swift, LoraConfig
+from swift.llm import get_template, TemplateType
+import torch
+
+# Load the model
+model = AutoModelForCausalLM.from_pretrained('ZhipuAI/chatglm3-6b', torch_dtype=torch.bfloat16, device_map='auto', trust_remote_code=True)
+lora_config = LoraConfig(
+                r=16,
+                target_modules=['query_key_value'],
+                lora_alpha=32,
+                lora_dropout=0.05)
+model = Swift.prepare_model(model, lora_config)
+tokenizer = AutoTokenizer.from_pretrained('ZhipuAI/chatglm3-6b', trust_remote_code=True)
+dataset = MsDataset.load('AI-ModelScope/alpaca-gpt4-data-en', split='train')
+template = get_template(TemplateType.chatglm3, tokenizer, max_length=1024)
+
+def encode(example):
+    inst, inp, output = example['instruction'], example.get('input', None), example['output']
+    if output is None:
+        return {}
+    if inp is None or len(inp) == 0:
+        q = inst
+    else:
+        q = f'{inst}\n{inp}'
+    example, kwargs = template.encode({'query': q, 'response': output})
+    return example
+
+dataset = dataset.map(encode).filter(lambda e: e.get('input_ids'))
+dataset = dataset.train_test_split(test_size=0.001)
+
+train_dataset, val_dataset = dataset['train'], dataset['test']
+
+
+train_args = Seq2SeqTrainingArguments(
+    output_dir='output',
+    learning_rate=1e-4,
+    num_train_epochs=2,
+    eval_steps=500,
+    save_steps=500,
+    evaluation_strategy='steps',
+    save_strategy='steps',
+    dataloader_num_workers=4,
+    per_device_train_batch_size=1,
+    gradient_accumulation_steps=16,
+    logging_steps=10,
+)
+
+trainer = Seq2SeqTrainer(
+    model=model,
+    args=train_args,
+    data_collator=template.data_collator,
+    train_dataset=train_dataset,
+    eval_dataset=val_dataset,
+    tokenizer=tokenizer)
+
+trainer.train()
+```
+
+## Using Tuners in Inference
+
+Use `Swift.from_pretrained()` to load a checkpoint saved after training:
+
+```python
+from modelscope import Model
+from swift import Swift
+import torch
+model = Model.from_pretrained('ZhipuAI/chatglm3-6b', torch_dtype=torch.bfloat16, device_map='auto')
+model = Swift.from_pretrained(model, './output')
+```
+
+### Complete Inference Code
+
+```python
+# A100 14G memory
+import torch
+from modelscope import AutoModelForCausalLM, GenerationConfig
+from modelscope import AutoTokenizer
+
+from swift import Swift
+from swift.llm import get_template, TemplateType, to_device
+
+# Load the model
+model = AutoModelForCausalLM.from_pretrained('ZhipuAI/chatglm3-6b', torch_dtype=torch.bfloat16,
+                                             device_map='auto', trust_remote_code=True)
+model = Swift.from_pretrained(model, 'output/checkpoint-xxx')
+tokenizer = AutoTokenizer.from_pretrained('ZhipuAI/chatglm3-6b', trust_remote_code=True)
+template = get_template(TemplateType.chatglm3, tokenizer, max_length=1024)
+
+examples, tokenizer_kwargs = template.encode({'query': 'How are you?'})
+if 'input_ids' in examples:
+    input_ids = torch.tensor(examples['input_ids'])[None]
+    examples['input_ids'] = input_ids
+    token_len = input_ids.shape[1]
+
+generation_config = GenerationConfig(
+    max_new_tokens=1024,
+    temperature=0.3,
+    top_k=25,
+    top_p=0.8,
+    do_sample=True,
+    repetition_penalty=1.0,
+    num_beams=10,
+    pad_token_id=tokenizer.pad_token_id,
+    eos_token_id=tokenizer.eos_token_id)
+
+device = next(model.parameters()).device
+examples = to_device(examples, device)
+
+generate_ids = model.generate(
+    generation_config=generation_config,
+    **examples)
+generate_ids = template.get_generate_ids(generate_ids, token_len)
+print(tokenizer.decode(generate_ids, **tokenizer_kwargs))
+# I'm an AI language model, so I don't have feelings or physical sensations. However, I'm here to assist you with any questions or tasks you may have. How can I help you today?
+```
+
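+# API Reference
+
+## Static Methods of the Swift Class
+
+- `Swift.prepare_model(model, config, **kwargs)`
+  - Purpose: Attaches a tuner to the model. If the config is a subclass of PeftConfig, the tuner is loaded through the corresponding Peft API. When SwiftConfig is used, this method also accepts a SwiftModel instance and can be called repeatedly, which has the same effect as passing a dict as config.
+    - This method supports loading multiple tuners of different types side by side and using them together
+  - Parameters:
+    - `model`: An instance of `torch.nn.Module` or `SwiftModel`; the model to attach tuners to
+    - `config`: An instance of `SwiftConfig` or `PeftConfig`, or a dict mapping custom tuner names to configs
+  - Returns: An instance of `SwiftModel` or `PeftModel`
+- `Swift.merge_and_unload(model)`
+  - Purpose: Merges the LoRA weights back into the original model and fully unloads the LoRA part
+  - Parameters:
+    - model: An instance of `SwiftModel` or `PeftModel` with LoRA loaded
+  - Returns: None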
+
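+- `Swift.merge(model)`
+
+  - Purpose: Merges the LoRA weights back into the original model without unloading the LoRA part
+
+  - Parameters:
+    - model: An instance of `SwiftModel` or `PeftModel` with LoRA loaded
+
+  - Returns: None
+
+- `Swift.unmerge(model)`
+
+  - Purpose: Splits the LoRA weights back out of the original model weights into the LoRA structure
+
+  - Parameters:
+    - model: An instance of `SwiftModel` or `PeftModel` with LoRA loaded
+
+  - Returns: None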
+
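+- `Swift.save_to_peft_format(ckpt_dir, output_dir)`
+
+  - Purpose: Converts a saved LoRA checkpoint into a Peft-compatible format. The main changes are:
+
+    - The `default` tuner is moved out of its `default` folder into the root of output_dir
+    - The `{tuner_name}.` segment is removed from weight keys; for example, `model.layer.0.self.in_proj.lora_A.default.weight` becomes `model.layer.0.self.in_proj.lora_A.weight`
+    - The prefix `base_model.model` is added to weight keys
+
+    - Note: Only LoRA can be converted. Other tuner types are not supported by Peft itself, so converting them raises an error. In addition, LoRAConfig has extra parameters such as `dtype`; when these are set, conversion to Peft format is not supported, and you can work around this by manually deleting the corresponding fields from adapter_config.json
+
+  - Parameters:
+
+    - ckpt_dir: The source weights directory
+    - output_dir: The target weights directory
+
+  - Returns: None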
+
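+- `Swift.from_pretrained(model, model_id, adapter_name, revision, **kwargs)`
+  - Purpose: Loads tuners from a saved weights directory onto the model. If adapter_name is not given, all tuners under the model_id directory are loaded. Like `prepare_model`, this method can be called repeatedly
+  - Parameters:
+    - model: An instance of `torch.nn.Module` or `SwiftModel`; the model to load tuners onto
+    - model_id: `str`; the tuner checkpoint to load, either a ModelScope hub id or a local directory produced by training
+    - adapter_name: `str`, `List[str]`, `Dict[str, str]`, or `None`; the tuner names to load from the tuner directory. If `None`, all tuners are loaded; if `str` or `List[str]`, only the named tuners are loaded; if `Dict`, each tuner named by a `key` is loaded and then renamed to the corresponding `value`
+    - revision: If model_id is a ModelScope id, revision selects the version to load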
+
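The key renaming described for `Swift.save_to_peft_format` can be sketched on a single weight key. This is an illustration of the two documented rules only, not SWIFT's implementation: the function name is hypothetical, and the prefix is written as `base_model.model`, the form used in PEFT state dicts.

```python
def to_peft_key(key: str, tuner_name: str = 'default', prefix: str = 'base_model.model') -> str:
    """Rename one weight key following the two documented rules."""
    # Rule 1: drop the `{tuner_name}.` segment, e.g. `lora_A.default.weight` -> `lora_A.weight`.
    key = key.replace(f'.{tuner_name}.', '.')
    # Rule 2: put the prefix in front of the key.
    return f'{prefix}.{key}'


print(to_peft_key('model.layer.0.self.in_proj.lora_A.default.weight'))
```

A plain string replace is enough for a sketch, but it would also rewrite a module that happens to be named after the tuner, so a real converter would need a stricter match.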
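+## SwiftModel Methods
+
+The methods users are most likely to call are listed below. For other internal or discouraged methods, build the API docs with the `make docs` command.
+
+- `SwiftModel.create_optimizer_param_groups(self, **defaults)`
+  - Purpose: Creates parameter groups based on the loaded tuners; currently this only has an effect for the `LoRA+` algorithm
+  - Parameters:
+    - defaults: Default arguments for `optimizer_groups`, such as `lr` and `weight_decay`
+  - Returns:
+    - The created `optimizer_groups`
+
+- `SwiftModel.add_weighted_adapter(self, ...)`
+  - Purpose: Merges existing LoRA tuners into one
+  - Parameters:
+    - This method passes through to PeftModel.add_weighted_adapter; for its parameters, see the [add_weighted_adapter documentation](https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraModel.add_weighted_adapter)
+
+- `SwiftModel.save_pretrained(self, save_directory, safe_serialization, adapter_name)`
+  - Purpose: Saves the tuner weights
+  - Parameters:
+    - save_directory: Directory to save to
+    - safe_serialization: Whether to use safe_tensors; defaults to False
+    - adapter_name: The adapter tuner to save; if omitted, all tuners are saved
+- `SwiftModel.set_active_adapters(self, adapter_names, offload=None)`
+  - Purpose: Sets the currently active adapters; adapters not in the list are deactivated
+    - During `inference`, the environment variable `USE_UNIQUE_THREAD=0/1` is supported (default `1`). If it is `0`, set_active_adapters only affects the current thread; each thread then uses the tuners it activated, and tuners in different threads do not interfere with one another
+  - Parameters:
+    - adapter_names: The tuners to activate
+    - offload: How to handle deactivated adapters. The default `None` leaves them in GPU memory; `cpu` and `meta` are also supported and offload them to the cpu or meta device to reduce GPU memory usage. When `USE_UNIQUE_THREAD=0`, do not pass offload, to avoid affecting other threads
+  - Returns: None
+- `SwiftModel.activate_adapter(self, adapter_name)`
+  - Purpose: Activates a tuner
+    - During `inference`, the environment variable `USE_UNIQUE_THREAD=0/1` is supported (default `1`). If it is `0`, activate_adapter only affects the current thread; each thread then uses the tuners it activated, and tuners in different threads do not interfere with one another
+  - Parameters:
+    - adapter_name: Name of the tuner to activate
+  - Returns: None
+- `SwiftModel.deactivate_adapter(self, adapter_name, offload)`
+  - Purpose: Deactivates a tuner
+    - During `inference`, do not call this method when the environment variable `USE_UNIQUE_THREAD=0`
+  - Parameters:
+    - adapter_name: Name of the tuner to deactivate
+    - offload: How to handle deactivated adapters. The default `None` leaves them in GPU memory; `cpu` and `meta` are also supported and offload them to the cpu or meta device to reduce GPU memory usage
+  - Returns: None
+
+- `SwiftModel.get_trainable_parameters(self)`
+
+  - Purpose: Returns information about the trainable parameters
+
+  - Parameters: none
+
+  - Returns: Trainable parameter information in the following format: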
+    ```text
+    trainable params: 100M || all params: 1000M || trainable%: 10.00% || cuda memory: 10GiB.
+    ```
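The per-thread behavior described for `set_active_adapters` under `USE_UNIQUE_THREAD=0` can be pictured with thread-local storage. The class below is a hypothetical sketch of that idea, not SWIFT's implementation; the `per_thread` flag stands in for the environment variable.

```python
import threading


class ActiveAdapters:
    """Hypothetical registry of active tuner names, shared or per thread."""

    def __init__(self, per_thread: bool):
        self.per_thread = per_thread  # True plays the role of USE_UNIQUE_THREAD=0
        self._shared = set()
        self._local = threading.local()

    def set_active(self, names):
        if self.per_thread:
            self._local.names = set(names)  # visible to the calling thread only
        else:
            self._shared = set(names)       # visible to every thread

    def get_active(self):
        if self.per_thread:
            return getattr(self._local, 'names', set())
        return self._shared


registry = ActiveAdapters(per_thread=True)
registry.set_active(['first_tuner'])

worker_view = []
worker = threading.Thread(target=lambda: worker_view.append(registry.get_active()))
worker.start()
worker.join()

print(registry.get_active())  # the main thread sees its own selection
print(worker_view[0])         # the worker thread has activated nothing
```

In shared mode a call from any thread changes what every thread sees, which is why the notes above advise against passing `offload` or calling `deactivate_adapter` when threads manage their own tuners.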
--- a/ms-swift/docs/source/GetStarted/在SWIFT内使用PEFT.md
+++ b/ms-swift/docs/source/GetStarted/在SWIFT内使用PEFT.md
+# Compatibility with Peft
+
+To support users who are used to Peft, Swift provides Peft compatibility. The following Peft components can be imported from swift:
+
+>PeftModel
+>
+>PeftConfig
+>
+>PeftModelForSeq2SeqLM
+>
+>PeftModelForSequenceClassification
+>
+>PeftModelForTokenClassification
+>
+>PeftModelForCausalLM
+>
+>PromptEncoderConfig
+>
+>PromptTuningConfig
+>
+>PrefixTuningConfig
+>
+>PromptLearningConfig
+>
+>LoraConfig
+>
+>get_peft_config
+>
+>get_peft_model_state_dict
+>
+>get_peft_model
+
+All of the components above can be imported from swift:
+
+```python
+from swift import PeftModel, PeftConfig
+```
+
+The Swift class can also initialize Peft tuners:
+
+```python
+from modelscope.models.nlp import SbertForSequenceClassification
+from modelscope.models.nlp.structbert import SbertConfig
+
+from swift import LoraConfig, Swift
+model = SbertForSequenceClassification(SbertConfig())
+lora_config = LoraConfig(target_modules=['query', 'key', 'value'])
+model = Swift.prepare_model(model, lora_config)
+```
+
+Swift wraps Peft thinly so that Peft can use models from the ModelScope hub in from_pretrained.
--- a/ms-swift/docs/source/GetStarted/界面训练推理.md
+++ b/ms-swift/docs/source/GetStarted/界面训练推理.md
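+# Web UI Training and Inference
+
+SWIFT supports training and inference through a web UI, with the same parameters as script-based training. After installing SWIFT, run the following command: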
+
+```shell
+swift web-ui
+```
+
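+to launch the web UI for training and inference.
+
+The web UI's behavior can be controlled through environment variables or arguments. The environment variables are:
+
+> WEBUI_SHARE=1/0 Defaults to 0. Controls whether gradio runs in share mode
+>
+> SWIFT_UI_LANG=en/zh Controls the web UI language
+>
+> WEBUI_SERVER The server_name argument, i.e. the web UI host IP. 0.0.0.0 allows access from any IP; 127.0.0.1 allows access only from the local machine
+>
+> WEBUI_PORT The web UI port number
+>
+> USE_INFERENCE=1/0 Defaults to 0. Controls whether the gradio inference page loads the model directly for inference or uses deployment (USE_INFERENCE=0)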
+
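The environment variables above can also be set from Python before starting the UI. This is a minimal sketch with hypothetical helper names; the values in the example (local-only access, English UI, port 7860) are placeholders, and `launch_webui` requires SWIFT to be installed.

```python
import os
import subprocess


def build_webui_env(share: bool, lang: str, server: str, port: int) -> dict:
    """Copy the current environment and set the documented web-ui variables."""
    env = dict(os.environ)
    env['WEBUI_SHARE'] = '1' if share else '0'
    env['SWIFT_UI_LANG'] = lang
    env['WEBUI_SERVER'] = server
    env['WEBUI_PORT'] = str(port)
    return env


def launch_webui(env: dict) -> subprocess.Popen:
    """Start `swift web-ui` with the given environment."""
    return subprocess.Popen(['swift', 'web-ui'], env=env)


# Example values only; call launch_webui(env) to actually start the UI.
env = build_webui_env(share=False, lang='en', server='127.0.0.1', port=7860)
```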
+To use arguments instead, see [Command-Line Arguments](../LLM/命令行参数.md#web-ui-参数).