# LLM Quantization Documentation
Swift supports quantizing models with awq, gptq, bnb, hqq, and eetq. Among these, awq and gptq support vllm for accelerated inference; they require a calibration dataset and achieve better quantization quality, but quantize more slowly. bnb, hqq, and eetq require no calibration data and quantize quickly. All five quantization methods support qlora fine-tuning.
Quantization with awq and gptq requires `swift export`, while bnb, hqq, and eetq can be quantized on the fly during sft and infer.
From the perspective of vllm inference acceleration, awq and gptq are recommended. From the perspective of quantization quality, awq, hqq, and gptq are recommended. From the perspective of quantization speed, hqq is recommended.
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Original Model](#original-model)
- [Fine-tuned Model](#fine-tuned-model)
- [QLoRA](#qlora)
- [Pushing Models](#pushing-models)
## Environment Preparation
GPU devices: A10, 3090, V100, A100 are all supported.
```bash
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
# Using AWQ quantization:
# AutoAWQ and CUDA versions have a corresponding relationship, please select the version according to `https://github.com/casper-hansen/AutoAWQ`
pip install autoawq -U
# Using GPTQ quantization:
# Auto_GPTQ and CUDA versions have a corresponding relationship, please select the version according to `https://github.com/PanQiWei/AutoGPTQ#quick-installation`
pip install auto_gptq -U
# Using bnb quantization:
pip install bitsandbytes -U
# Using hqq quantization:
# Requires transformers version >4.40, install from source
pip install git+https://github.com/huggingface/transformers
pip install hqq
# If compatibility with training is desired, install peft from source
pip install git+https://github.com/huggingface/peft.git
# Using eetq quantization:
# Requires transformers version >4.40, install from source
pip install git+https://github.com/huggingface/transformers
# See https://github.com/NetEase-FuXi/EETQ for reference
git clone https://github.com/NetEase-FuXi/EETQ.git
cd EETQ/
git submodule update --init --recursive
pip install .
# If compatibility with training is desired, install peft from source
pip install git+https://github.com/huggingface/peft.git
# Environment alignment (usually not needed. If you encounter errors, you can run the code below, the repository uses the latest environment for testing)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
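To quickly confirm which quantization backends are importable in the current environment, here is a minimal sketch (assuming the top-level import names `awq` for autoawq, `auto_gptq`, `bitsandbytes`, `hqq`, and `eetq`):
```python
# Sanity check: report which quantization backends are importable.
# The module names below are assumptions based on the packages installed above.
import importlib.util

for pkg in ('awq', 'auto_gptq', 'bitsandbytes', 'hqq', 'eetq'):
    status = 'installed' if importlib.util.find_spec(pkg) else 'missing'
    print(f'{pkg}: {status}')
```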
## Original Model
### awq, gptq
Here we demonstrate AWQ and GPTQ quantization on the qwen1half-7b-chat model.
```bash
# If OOM occurs during quantization, you can appropriately reduce `--quant_n_samples` (default 256) and `--quant_seqlen` (default 2048).

# AWQ-INT4 quantization (takes about 18 minutes using A100, memory usage: 13GB)
# AWQ: Use `alpaca-zh alpaca-en sharegpt-gpt4:default` as the quantization dataset
CUDA_VISIBLE_DEVICES=0 swift export \
--model_type qwen1half-7b-chat --quant_bits 4 \
--dataset alpaca-zh alpaca-en sharegpt-gpt4:default --quant_method awq

# GPTQ-INT4 quantization (takes about 20 minutes using A100, memory usage: 7GB)
# GPTQ: Use `alpaca-zh alpaca-en sharegpt-gpt4:default` as the quantization dataset
# For GPTQ quantization, please first refer to this issue: https://github.com/AutoGPTQ/AutoGPTQ/issues/439
OMP_NUM_THREADS=14 CUDA_VISIBLE_DEVICES=0 swift export \
--model_type qwen1half-7b-chat --quant_bits 4 \
--dataset alpaca-zh alpaca-en sharegpt-gpt4:default --quant_method gptq
# AWQ: Use custom quantization dataset
# Same for GPTQ
CUDA_VISIBLE_DEVICES=0 swift export \
--model_type qwen1half-7b-chat --quant_bits 4 \
--dataset xxx.jsonl \
--quant_method awq
# Inference using swift quantized model
# AWQ
CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type qwen1half-7b-chat \
--model_id_or_path qwen1half-7b-chat-awq-int4
# GPTQ
CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type qwen1half-7b-chat \
--model_id_or_path qwen1half-7b-chat-gptq-int4
# Inference using original model
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```
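For the custom quantization dataset above (`--dataset xxx.jsonl`), a minimal sketch of building such a file, assuming swift's standard custom-dataset line format with `query`/`response` fields:
```python
# Build a custom calibration dataset for awq/gptq quantization.
# Assumption: swift's standard custom jsonl format, one {"query", "response"} object per line.
import json

samples = [
    {'query': 'Where is the capital of Zhejiang located?',
     'response': 'The capital of Zhejiang Province is Hangzhou City.'},
    {'query': 'What are some delicious foods here?',
     'response': 'Zhejiang has many delicious foods, such as West Lake vinegar fish.'},
]
with open('xxx.jsonl', 'w', encoding='utf-8') as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + '\n')
```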
### bnb, hqq, eetq
For bnb, hqq, and eetq, we only need to use `swift infer` for rapid quantization and inference.
```bash
CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type qwen1half-7b-chat \
--quant_method bnb \
--quantization_bit 4
CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type qwen1half-7b-chat \
--quant_method hqq \
--quantization_bit 4
CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type qwen1half-7b-chat \
--quant_method eetq \
--dtype fp16
```
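The same quick quantization can be driven from Python. A hedged sketch of the first (bnb) command above, assuming the CLI flags map one-to-one onto `InferArguments` fields (the `InferArguments`/`infer_main` pattern appears later in this document):
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import InferArguments, infer_main

# bnb 4-bit quantization during inference; mirrors the CLI flags above.
infer_args = InferArguments(
    model_type='qwen1half-7b-chat',
    quant_method='bnb',
    quantization_bit=4)
infer_main(infer_args)
```
The hqq and eetq variants follow the same pattern, with the flags shown above.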
## Fine-tuned Model
Assume you fine-tuned qwen1half-4b-chat using LoRA, and the model weights directory is: `output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx`.
Here we only demonstrate quantizing the fine-tuned model with AWQ; quantizing with GPTQ is similar.
**Merge-LoRA & Quantization**
```shell
# Use `alpaca-zh alpaca-en sharegpt-gpt4:default` as the quantization dataset
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
--merge_lora true --quant_bits 4 \
--dataset alpaca-zh alpaca-en sharegpt-gpt4:default --quant_method awq
# Use the dataset from fine-tuning as the quantization dataset
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
--merge_lora true --quant_bits 4 \
--load_dataset_config true --quant_method awq
```
**Inference using quantized model**
```shell
# AWQ/GPTQ quantized models support VLLM inference acceleration. They also support model deployment.
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx-merged-awq-int4'
```
**Deploying the quantized model**
Server:
```shell
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx-merged-awq-int4'
```
Testing:
```shell
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen1half-4b-chat",
"messages": [{"role": "user", "content": "How to fall asleep at night?"}],
"max_tokens": 256,
"temperature": 0
}'
```
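Equivalently, you can test the deployment from Python with the `openai` client (the same pattern used in the deployment sections later in this document):
```python
from openai import OpenAI

# Query the server started above; 'EMPTY' is accepted because no API key is configured.
client = OpenAI(api_key='EMPTY', base_url='http://localhost:8000/v1')
resp = client.chat.completions.create(
    model='qwen1half-4b-chat',
    messages=[{'role': 'user', 'content': 'How to fall asleep at night?'}],
    max_tokens=256,
    temperature=0)
print(resp.choices[0].message.content)
```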
## QLoRA
### awq, gptq
If you want to use QLoRA to fine-tune models quantized with awq or gptq, you need to quantize in advance, for example by using `swift export` on the original model. Then, when fine-tuning, specify `--quant_method` to select the corresponding quantization method, as in the following commands:
```bash
# awq
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen1half-7b-chat \
--model_id_or_path qwen1half-7b-chat-awq-int4 \
--quant_method awq \
--sft_type lora \
--dataset alpaca-zh#5000
# gptq
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen1half-7b-chat \
--model_id_or_path qwen1half-7b-chat-gptq-int4 \
--quant_method gptq \
--sft_type lora \
--dataset alpaca-zh#5000
```
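The same QLoRA run can be launched from Python. A sketch of the awq variant, assuming the CLI flags above map directly onto `SftArguments` fields (the `SftArguments`/`sft_main` pattern appears later in this document):
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import SftArguments, sft_main

# QLoRA fine-tuning on a pre-quantized awq-int4 model; mirrors the CLI flags above.
sft_args = SftArguments(
    model_type='qwen1half-7b-chat',
    model_id_or_path='qwen1half-7b-chat-awq-int4',
    quant_method='awq',
    sft_type='lora',
    dataset=['alpaca-zh#5000'])
output = sft_main(sft_args)
print(output['best_model_checkpoint'])
```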
### bnb, hqq, eetq
If you want to use bnb, hqq, or eetq for QLoRA fine-tuning, you need to specify `--quant_method` and `--quantization_bit` during training:
```bash
# bnb
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen1half-7b-chat \
--sft_type lora \
--dataset alpaca-zh#5000 \
--quant_method bnb \
--quantization_bit 4 \
--dtype fp16
# hqq
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen1half-7b-chat \
--sft_type lora \
--dataset alpaca-zh#5000 \
--quant_method hqq \
--quantization_bit 4
# eetq
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen1half-7b-chat \
--sft_type lora \
--dataset alpaca-zh#5000 \
--quant_method eetq \
--dtype fp16
```
**Note**
- hqq supports more customizable parameters, such as specifying different quantization configurations for different network layers. For details, please see [Command Line Arguments](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Command-line-parameters.md).
- eetq uses 8-bit quantization, so there is no need to specify `--quantization_bit`. Currently bf16 is not supported; you need to set `--dtype fp16`.
- Currently, eetq's qlora speed is relatively slow; it is recommended to use hqq instead. For reference, see the [issue](https://github.com/NetEase-FuXi/EETQ/issues/17).
## Pushing Models
Assume you fine-tuned qwen1half-4b-chat using LoRA, and the model weights directory is: `output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx`.
```shell
# Push the original quantized model
CUDA_VISIBLE_DEVICES=0 swift export \
--model_type qwen1half-7b-chat \
--model_id_or_path qwen1half-7b-chat-gptq-int4 \
--push_to_hub true \
--hub_model_id qwen1half-7b-chat-gptq-int4 \
--hub_token '<your-sdk-token>'
# Push LoRA incremental model
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id qwen1half-4b-chat-lora \
--hub_token '<your-sdk-token>'
# Push merged model
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id qwen1half-4b-chat-lora \
--hub_token '<your-sdk-token>' \
--merge_lora true
# Push quantized model
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id qwen1half-4b-chat-lora \
--hub_token '<your-sdk-token>' \
--merge_lora true \
--quant_bits 4
```
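Pushing can also be scripted from Python. A hedged sketch of the first command above, assuming `ExportArguments`/`export_main` are exposed by `swift.llm` in the same way as `SftArguments`/`sft_main`:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import ExportArguments, export_main  # assumed to be exported by swift.llm

# Push the original quantized model; mirrors the first CLI command above.
export_args = ExportArguments(
    model_type='qwen1half-7b-chat',
    model_id_or_path='qwen1half-7b-chat-gptq-int4',
    push_to_hub=True,
    hub_model_id='qwen1half-7b-chat-gptq-int4',
    hub_token='<your-sdk-token>')
export_main(export_args)
```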
# NPU Best Practice
Authors: [chuanzhubin](https://github.com/chuanzhubin), [jintao](https://github.com/Jintao-Huang)
## Table of Contents
- [Environment Preparation](#Environment-Preparation)
- [Fine-tuning](#Fine-tuning)
- [Inference](#Inference)
## Environment Preparation
Experimental environment: 8 * Ascend 910B3 (the devices were provided by [@chuanzhubin](https://github.com/chuanzhubin); thanks for supporting ModelScope and swift!)
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
pip install torch-npu decorator
pip install deepspeed
# Align environment (usually not necessary to run. If you encounter errors, you can run the following code, the repository is tested with the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
Verify that the test environment is installed correctly:
```python
from transformers.utils import is_torch_npu_available
import torch
print(is_torch_npu_available()) # True
print(torch.npu.device_count()) # 8
print(torch.randn(10, device='npu:0'))
```
## Fine-tuning
The following demonstrates LoRA fine-tuning. For full-parameter fine-tuning, set `--sft_type full`.
### Single Card Training
Start single card fine-tuning with the following command:
```shell
# Experimental Environment: Ascend 910B3
# GPU Memory Requirement: 25GB
# Runtime: 8 hours
ASCEND_RT_VISIBLE_DEVICES=0 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output
```
### Training with DDP
```shell
# Experimental Environment: 4 * Ascend 910B3
# GPU Memory Requirement: 4 * 30GB
# Runtime: 2 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output
```
### Training with DeepSpeed
ZeRO2:
```shell
# Experimental Environment: 4 * Ascend 910B3
# GPU Memory Requirement: 4 * 28GB
# Runtime: 3.5 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output \
--deepspeed default-zero2
```
ZeRO3:
```shell
# Experimental Environment: 4 * Ascend 910B3
# GPU Memory Requirement: 4 * 25GB
# Runtime: 8.5 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset blossom-math-zh \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output \
--deepspeed default-zero3
```
## Inference
Original Model:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```
After LoRA Fine-tuning:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
```
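Inference can also be launched from Python; a minimal sketch, assuming `ASCEND_RT_VISIBLE_DEVICES` is honored when set before swift is imported:
```python
import os
os.environ['ASCEND_RT_VISIBLE_DEVICES'] = '0'  # select the NPU before importing swift
from swift.llm import InferArguments, infer_main

# Interactive inference on a single Ascend NPU; mirrors the CLI command above.
infer_main(InferArguments(model_type='qwen1half-7b-chat'))
```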
# Best Practices for ORPO Algorithm
The ORPO algorithm uses the same data format as DPO: in addition to the SFT data `[query, response]`, it requires a `rejected_response` column denoting answers the model should not generate.
The ORPO algorithm adds an odds ratio (OR) negative log-likelihood loss term to the loss used during SFT training, to reduce the probability of generating rejected responses.
Compared to DPO, the ORPO algorithm does not require a reference model and therefore uses less training memory.
The hyperparameter `beta` is the coefficient of the OR loss term: the larger `beta`, the greater the penalty on `rejected_response`. Its default value is 0.1.
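Concretely, the objective from the ORPO paper can be written as follows (with `beta` being this document's name for the OR-loss coefficient):

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \beta \cdot \mathcal{L}_{\mathrm{OR}}, \qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right), \qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

where $y_w$ is the chosen `response` and $y_l$ the `rejected_response`.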
This best practice trains the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model with the ORPO algorithm, enabling it to respond in Chinese.
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)
## Environment Preparation
```shell
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
# Environment alignment (usually no need to run. If you encounter an error, you can run the code below, the repository has been tested with the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
## Dataset
We use the [Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/shareAI/shareAI-Llama3-DPO-zh-en-emoji/summary) dataset, which includes Chinese queries and responses in both Chinese and English.
Data example
```json
{"question": "一个完整的工程项目管理流程包括什么?",
"answer_zh": "🤔 Ahahah, 项目管理流程, 这个问题太常见了! 😂\n\n一个完整的工程项目管理流程包括以下几个阶段:\n\n1️⃣ **项目启动** 🚀:项目的开始,定义项目的目标、范围、时间表和预算。\n\n2️⃣ **项目计划** 📅:制定项目计划,包括任务分解、资源分配、进度安排和风险管理。\n\n3️⃣ **项目执行** 💪:按照计划执行项目,包括任务执行、进度控制和质量管理。\n\n4️⃣ **项目监控** 👀:对项目的进度、成本和质量进行监控和控制,确保项目不偏离轨道。\n\n5️⃣ **项目收尾** 🎉:项目完成后,进行项目总结、经验分享和文档编制。\n\n6️⃣ **项目评价** 🤔:对项目的成果和过程进行评价,总结经验和教训。\n\n这些阶段是一个完整的工程项目管理流程,但是,具体的项目可能会根据需要添加或删除某些阶段。 🤝\n\n还有一点,项目管理流程中需要注意的几个关键点是:\n\n* **沟通** 💬:项目团队之间、项目团队和客户之间的沟通是非常重要的。\n* **风险管理** ⚠️:项目中总会出现一些风险和问题,需要提前预测和准备。\n* **变更管理** 🔄:项目中可能会出现一些变更,需要及时处理和更新。\n\n总之,项目管理流程是一个复杂的过程,需要项目经理和团队成员的共同努力和合作。 💪",
"answer_en": "My fellow wizarding friend! 🧙‍♂️\n\nA complete project management process, you ask? Well, I shall conjure up the essential steps for thee! 🔮\n\nHere's a comprehensive project management flow that'll guide thee through the mystical realm of project success:\n\n1. **Initiation**: Define project scope, goals, and deliverables. Identify stakeholders, and determine the project's feasibility. ⚡️\n2. **Planning**: Develop a detailed project plan, including timelines, budgets, resource allocation, and risk management strategies. 📅\n3. **Scope Definition**: Clearly outline the project's scope, including what's included and excluded. This will help prevent scope creep, my friend! 🔒\n4. **Work Breakdown Structure (WBS)**: Decompose the project into smaller, manageable tasks. This will aid in task assignment, scheduling, and tracking progress. 🗂️\n5. **Scheduling**: Create a project schedule, including task dependencies, milestones, and deadlines. This will help thee stay on track and avoid procrastination! ⏰\n6. **Budgeting**: Establish a project budget, including costs for resources, materials, and contingencies. This will ensure thou dost not overspend, my wise friend! 💸\n7. **Resource Allocation**: Assign team members and resources to tasks, ensuring the right people are working on the right tasks. This will optimize productivity and efficiency! 👥\n8. **Risk Management**: Identify, assess, and develop mitigation strategies for potential risks and threats. This will safeguard thy project from unforeseen dangers! ⚠️\n9. **Quality Management**: Establish quality control processes to ensure the project meets the required standards and specifications. This will guarantee a top-notch outcome, my friend! ✨\n10. **Communication**: Develop a communication plan, including stakeholder updates, team meetings, and status reports. This will keep everyone informed and aligned! 📢\n11. **Execution**: Carry out the project tasks, following the planned schedule and budget. This is where the magic happens, my friend! ✨\n12. **Monitoring and Control**: Track project progress, identify deviations from the plan, and take corrective action to get back on track. This will ensure thy project stays on course! 📊\n13. **Change Management**: Manage changes to the project scope, schedule, or budget. This will minimize disruptions and ensure the project remains viable! 🔀\n14. **Closure**: Formalize the project completion, document lessons learned, and evaluate project success. This will bring a sense of accomplishment and closure, my friend! 🎉\n\nThere thou hast it, my fellow wizard! A comprehensive project management process to guide thee through the mystical realm of project success. May thy projects be prosperous and thy stakeholders be delighted! 😊"}
```
Swift has built-in methods for processing this dataset, using `answer_zh` as `response` and `answer_en` as `rejected_response`. Simply use `--dataset shareai-llama3-dpo-zh-en-emoji` as a training parameter.
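If you want to train on your own preference data instead, each sample needs the `query`/`response`/`rejected_response` fields described above. A minimal sketch of producing such a file (the filename `orpo_data.jsonl` is hypothetical), which can then be passed via `--dataset`:
```python
# Build a custom ORPO/DPO dataset: each line carries query, response (chosen)
# and rejected_response, as described at the start of this section.
import json

samples = [
    {'query': '你是谁?',
     'response': '我是一个AI助手。',
     'rejected_response': 'I am an AI assistant.'},
]
with open('orpo_data.jsonl', 'w', encoding='utf-8') as f:  # hypothetical filename
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + '\n')
```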
## Training
```shell
# Experimental environment: A100
# DDP + MP
# Memory usage: 4*24G
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=2 \
swift orpo \
--model_type llama3-8b-instruct \
--beta 0.5 \
--sft_type lora \
--dataset shareai-llama3-dpo-zh-en-emoji \
--num_train_epochs 2 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--learning_rate 5e-5 \
--gradient_accumulation_steps $(expr 16 / $NPROC_PER_NODE) \
--warmup_ratio 0.03 \
--save_total_limit 2
# MP(device map)
# Memory usage: 2*24G
CUDA_VISIBLE_DEVICES=0,1 \
swift orpo \
--model_type llama3-8b-instruct \
--beta 0.5 \
--sft_type lora \
--dataset shareai-llama3-dpo-zh-en-emoji \
--num_train_epochs 2 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--learning_rate 5e-5 \
--gradient_accumulation_steps 16 \
--warmup_ratio 0.03 \
--save_total_limit 2
# Memory usage: 40G
CUDA_VISIBLE_DEVICES=0 \
swift orpo \
--model_type llama3-8b-instruct \
--beta 0.5 \
--sft_type lora \
--dataset shareai-llama3-dpo-zh-en-emoji \
--num_train_epochs 2 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--learning_rate 5e-5 \
--gradient_accumulation_steps 16 \
--warmup_ratio 0.03 \
--save_total_limit 2
```
**Notes**:
- If you train a base model with data containing history, specify a template that supports multi-turn dialogue (base models often do not support multi-turn dialogue). For this case we set the `chatml` template by default, but you can also select a different template for training via `--model_type`.
- We default to setting `--gradient_checkpointing true` during training to save memory; this slightly reduces training speed.
- If you are using older GPUs like the V100, you need to set `--dtype AUTO` or `--dtype fp16`, because they do not support bf16.
- If your machine is equipped with high-performance GPUs like the A100 and you are using the qwen series of models, we recommend installing flash-attn, which speeds up training and inference and reduces memory usage (GPUs like the A10, 3090, and V100 do not support training with flash-attn). Models that support flash-attn are listed in LLM Supported Models.
- If you need to train offline, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For specific parameter meanings, refer to Command Line Parameters.
- If you wish to push weights to the ModelScope Hub during training, set `--push_to_hub true`.
## Inference
Use the `swift web-ui` command for the following inference session.
### Pre-Training Inference
> 你是谁(Who are you)
![orpo1](../../resources/orpo1.png)
> 西湖醋鱼怎么做(How do you make West Lake Vinegar Fish?)
![orpo2](../../resources/orpo2.png)
![orpo3](../../resources/orpo3.png)
![orpo4](../../resources/orpo4.png)
![orpo5](../../resources/orpo5.png)
### Post-Training Inference
> 你是谁(Who are you)
![orpo6](../../resources/orpo6.png)
> 西湖醋鱼怎么做(How do you make West Lake Vinegar Fish?)
![orpo7](../../resources/orpo7.png)
![orpo8](../../resources/orpo8.png)
# Qwen1.5 Full Process Best Practices
This introduces how to perform inference, self-cognition fine-tuning, quantization, and deployment on **Qwen1.5-7B-Chat** and **Qwen1.5-72B-Chat**, corresponding to **low-resource and high-resource** environments respectively.
The best practice for self-cognition fine-tuning, inference and deployment of Qwen2-72B-Instruct using dual-card 80GiB A100 can be found [here](https://github.com/modelscope/swift/issues/1092).
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Qwen1.5-7B-Chat](#qwen15-7b-chat)
- [Inference](#inference)
- [Self-Cognition Fine-tuning](#self-cognition-fine-tuning)
- [Post-Tuning Inference](#post-tuning-inference)
- [Quantization](#quantization)
- [Deployment](#deployment)
- [Qwen1.5-72B-Chat](#qwen15-72b-chat)
- [Inference](#inference-1)
- [Self-Cognition Fine-tuning](#self-cognition-fine-tuning-1)
- [Post-Tuning Inference](#post-tuning-inference-1)
- [Quantization](#quantization-1)
- [Deployment](#deployment-1)
## Environment Preparation
```shell
pip install 'ms-swift[llm]' -U
# autoawq version corresponds to cuda version, please choose based on `https://github.com/casper-hansen/AutoAWQ`
pip install autoawq
# vllm version corresponds to cuda version, please choose based on `https://docs.vllm.ai/en/latest/getting_started/installation.html`
pip install vllm
pip install openai
```
## Qwen1.5-7B-Chat
### Inference
Here we perform **streaming** inference on Qwen1.5-7B-Chat and its **awq-int4 quantized** version, and demonstrate inference using a **visualization** method.
Using Python for inference on `qwen1half-7b-chat`:
```python
# Experimental environment: 3090
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.qwen1half_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}') # template_type: qwen
kwargs = {}
# kwargs['use_flash_attn'] = True # use flash_attn
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'}, **kwargs)
# modify max_new_tokens
model.generation_config.max_new_tokens = 128
template = get_template(template_type, tokenizer)
seed_everything(42)
query = 'Where is the capital of Zhejiang located?'
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# streaming
query = 'What are some delicious foods here?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
query: Where is the capital of Zhejiang located?
response: The capital of Zhejiang Province is Hangzhou City.
query: What are some delicious foods here?
response: Zhejiang has many delicious foods, such as West Lake vinegar fish, Dongpo pork, Longjing shrimp in Hangzhou, tangyuan in Ningbo, yam soup in Fenghua, fish cake and Nanxi River dried tofu in Wenzhou, Nanhu water chestnut in Jiaxing, etc. Each dish has its unique flavor and historical background, worth a try.
history: [['Where is the capital of Zhejiang located?', 'The capital of Zhejiang Province is Hangzhou City.'], ['What are some delicious foods here?', 'Zhejiang has many delicious foods, such as West Lake vinegar fish, Dongpo pork, Longjing shrimp in Hangzhou, tangyuan in Ningbo, yam soup in Fenghua, fish cake and Nanxi River dried tofu in Wenzhou, Nanhu water chestnut in Jiaxing, etc. Each dish has its unique flavor and historical background, worth a try.']]
"""
```
Using Python to infer `qwen1half-7b-chat-awq`, here we use **VLLM** for inference acceleration:
```python
# Experimental environment: 3090
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    ModelType, get_vllm_engine, get_default_template_type,
    get_template, inference_vllm, inference_stream_vllm
)
import torch
model_type = ModelType.qwen1half_7b_chat_awq
llm_engine = get_vllm_engine(model_type, torch.float16, max_model_len=4096)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Interface similar to `transformers.GenerationConfig`
llm_engine.generation_config.max_new_tokens = 512
request_list = [{'query': 'Hello!'}, {'query': 'Where is the capital of Zhejiang?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
# streaming
history1 = resp_list[1]['history']
query = "What delicious food is there here"
request_list = [{'query': query, 'history': history1}]
gen = inference_stream_vllm(llm_engine, template, request_list)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
    request = request_list[0]
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f"history: {resp_list[0]['history']}")
"""
query: Hello!
response: Hello! How can I help you?
query: Where is the capital of Zhejiang?
response: The capital of Zhejiang Province is Hangzhou City.
query: What delicious food is there here
response: Zhejiang has many delicious foods. Here are some of the most representative ones:
1. Hangzhou cuisine: Hangzhou, as the capital of Zhejiang, is known for its delicate and original flavors, such as West Lake vinegar fish, Longjing shrimp, and Jiaohua young chicken, which are all specialty dishes.
2. Ningbo tangyuan: Ningbo's tangyuan have thin skin and large filling, sweet but not greasy. Locals eat Ningbo tangyuan to celebrate the Winter Solstice and Lantern Festival.
3. Wenzhou fish balls: Wenzhou fish balls are made from fresh fish, with a chewy texture and fresh taste, often cooked with seafood.
4. Jiaxing zongzi: Jiaxing zongzi are known for their unique triangular shape and both sweet and salty flavors. Wufangzhai's zongzi are particularly famous.
5. Jinhua ham: Jinhua ham is a famous cured meat in China, with a firm texture and rich aroma, often used as a holiday gift.
6. Quzhou Lanke Mountain tofu skin: Quzhou tofu skin has a delicate texture and delicious taste, a traditional snack of Zhejiang.
7. Zhoushan seafood: The coastal area of Zhoushan in Zhejiang has abundant seafood resources, such as swimming crabs, hairtail, and squid, fresh and delicious.
The above are just some of the Zhejiang delicacies. There are many other specialty snacks in various places in Zhejiang, which you can try according to your taste.
history: [('Where is the capital of Zhejiang?', 'The capital of Zhejiang Province is Hangzhou City.'), ('What delicious food is there here', "Zhejiang has many delicious foods. Here are some of the most representative ones:\n\n1. Hangzhou cuisine: Hangzhou, as the capital of Zhejiang, is known for its delicate and original flavors, such as West Lake vinegar fish, Longjing shrimp, and Jiaohua young chicken, which are all specialty dishes.\n\n2. Ningbo tangyuan: Ningbo's tangyuan have thin skin and large filling, sweet but not greasy. Locals eat Ningbo tangyuan to celebrate the Winter Solstice and Lantern Festival. \n\n3. Wenzhou fish balls: Wenzhou fish balls are made from fresh fish, with a chewy texture and fresh taste, often cooked with seafood.\n\n4. Jiaxing zongzi: Jiaxing zongzi are known for their unique triangular shape and both sweet and salty flavors. Wufangzhai's zongzi are particularly famous.\n\n5. Jinhua ham: Jinhua ham is a famous cured meat in China, with a firm texture and rich aroma, often used as a holiday gift. \n\n6. Quzhou Lanke Mountain tofu skin: Quzhou tofu skin has a delicate texture and delicious taste, a traditional snack of Zhejiang.\n\n7. Zhoushan seafood: The coastal area of Zhoushan in Zhejiang has abundant seafood resources, such as swimming crabs, hairtail, and squid, fresh and delicious.\n\nThe above are just some of the Zhejiang delicacies. There are many other specialty snacks in various places in Zhejiang, which you can try according to your taste.")]
"""
```
Inference using the visualization method, with VLLM acceleration:
```shell
CUDA_VISIBLE_DEVICES=0 swift app-ui \
--model_type qwen1half-7b-chat \
--infer_backend vllm --max_model_len 4096
```
The effect is as follows:
![Effect](../../resources/app.png)
### Self-Cognition Fine-tuning
Next, we perform self-cognition fine-tuning on the model, training your own large model in **ten minutes**. For example, we want the model to consider itself "Xiao Huang", trained by "ModelScope", rather than "Tongyi Qianwen", trained by "Alibaba Cloud".
Using Python:
```python
# Experimental environment: 3090
# 24GB GPU memory
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import DatasetName, ModelType, SftArguments, sft_main
sft_args = SftArguments(
    model_type=ModelType.qwen1half_7b_chat,
    dataset=[f'{DatasetName.alpaca_zh}#500', f'{DatasetName.alpaca_en}#500',
             f'{DatasetName.self_cognition}#500'],
    logging_steps=5,
    max_length=2048,
    learning_rate=1e-4,
    output_dir='output',
    lora_target_modules=['ALL'],
    model_name=['小黄', 'Xiao Huang'],
    model_author=['魔搭', 'ModelScope'])
output = sft_main(sft_args)
best_model_checkpoint = output['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')
```
If you want to train on a 3090 machine, you can **reduce max_length** to 1024, use model parallelism, or use deepspeed-zero3.
Using model parallelism:
```shell
# Experimental environment: 2 * 3090
# 2 * 18GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope
```
Script for distributed training using **zero2**:
```shell
# Experimental environment: 4 * 3090
# 4 * 24GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
--model_type qwen1half-7b-chat \
--dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--deepspeed default-zero2
```
If you want to **train via the web UI**, run the following command and fill in the corresponding values:
```shell
swift web-ui
```
![web-ui](../../resources/web-ui.png)
### Post-Tuning Inference
Then we verify the effect after model fine-tuning.
Use Python for inference:
```python
# Experimental environment: 3090
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType, get_default_template_type,
)
from swift.utils import seed_everything
from swift.tuners import Swift
seed_everything(42)
ckpt_dir = 'output/qwen1half-7b-chat/vx-xxx/checkpoint-xxx'
model_type = ModelType.qwen1half_7b_chat
template_type = get_default_template_type(model_type)
model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 128
model = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)
template = get_template(template_type, tokenizer)
query = 'Are you Qwen?'
response, history = inference(model, template, query)
print(f'response: {response}')
print(f'history: {history}')
"""
[INFO:swift] model.max_model_len: 32768
response: No, I am Xiao Huang, an AI assistant from ModelScope. How can I help you?
history: [('Are you Qwen?', 'No, I am Xiao Huang, an AI assistant from ModelScope. How can I help you?')]
"""
```
Using the interface method for inference:
```shell
# Experimental environment: 3090
CUDA_VISIBLE_DEVICES=0 swift app-ui \
--ckpt_dir output/qwen1half-7b-chat/vx-xxx/checkpoint-xxx \
--infer_backend vllm --max_model_len 4096 \
--merge_lora true
```
The effect is as follows:
![Effect](../../resources/app2.png)
### Quantization
Next, we introduce how to perform **awq-int4 quantization** on the fine-tuned model. The entire quantization process takes about **20 minutes**.
```shell
# Experimental environment: 3090
# 14GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen1half-7b-chat/vx-xxx/checkpoint-xxx \
--quant_bits 4 --quant_method awq \
--merge_lora true
```
Use Python to infer the quantized model and use VLLM for acceleration:
```python
# Experimental environment: 3090
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
    ModelType, get_vllm_engine, get_default_template_type,
    get_template, inference_vllm, inference_stream_vllm
)
import torch
model_type = ModelType.qwen1half_7b_chat
model_id_or_path = 'output/qwen1half-7b-chat/vx-xxx/checkpoint-xxx-merged-awq-int4'
llm_engine = get_vllm_engine(model_type,
                             model_id_or_path=model_id_or_path,
                             max_model_len=4096)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Interface similar to `transformers.GenerationConfig`
llm_engine.generation_config.max_new_tokens = 512
request_list = [{'query': 'Who are you?'}, {'query': 'Where is the capital of Zhejiang?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
# streaming
history1 = resp_list[1]['history']
query = 'What delicious food is there'
request_list = [{'query': query, 'history': history1}]
gen = inference_stream_vllm(llm_engine, template, request_list)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
    request = request_list[0]
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f"history: {resp_list[0]['history']}")
"""
query: Who are you?
response: I am an AI assistant created by ModelScope. My name is Xiao Huang. I can answer various questions, provide information, and help. What can I help you with?
query: Where is the capital of Zhejiang?
response: The capital of Zhejiang Province is Hangzhou City.
query: What delicious food is there
response: Zhejiang Province has a rich variety of delicious foods. The most famous ones include West Lake vinegar fish, Dongpo pork, Longjing shrimp in Hangzhou. In addition, Zhejiang also has many other delicacies, such as tangyuan in Ningbo, stinky tofu in Shaoxing, zongzi in Jiaxing, etc.
history: [('Where is the capital of Zhejiang?', 'The capital of Zhejiang Province is Hangzhou City.'), ('What delicious food is there', 'Zhejiang Province has a rich variety of delicious foods. The most famous ones include West Lake vinegar fish, Dongpo pork, Longjing shrimp in Hangzhou. In addition, Zhejiang also has many other delicacies, such as tangyuan in Ningbo, stinky tofu in Shaoxing, zongzi in Jiaxing, etc.')]
"""
```
### Deployment
Finally, we deploy the quantized model in the format of the **OpenAI API**:
Start the server:
```shell
# Experimental environment: 3090
CUDA_VISIBLE_DEVICES=0 swift deploy \
--ckpt_dir output/qwen1half-7b-chat/vx-xxx/checkpoint-xxx-merged-awq-int4 \
--infer_backend vllm --max_model_len 4096
```
Make calls from the client:
```python
from openai import OpenAI
client = OpenAI(
    api_key='EMPTY',
    base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
messages = []
for query in ['Who are you?', "what's your name?", 'Who developed you?']:
    messages.append({
        'role': 'user',
        'content': query
    })
    resp = client.chat.completions.create(
        model=model_type,
        messages=messages,
        seed=42)
    response = resp.choices[0].message.content
    print(f'query: {query}')
    print(f'response: {response}')
    messages.append({'role': 'assistant', 'content': response})
# streaming
for query in ['78654+657=?', "What to do if I can't fall asleep at night"]:
    messages.append({'role': 'user', 'content': query})
    stream_resp = client.chat.completions.create(
        model=model_type,
        messages=messages,
        stream=True,
        seed=42)
    print(f'query: {query}')
    print('response: ', end='')
    response = ''
    for chunk in stream_resp:
        response += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content, end='', flush=True)
    print()
    messages.append({'role': 'assistant', 'content': response})
"""
model_type: qwen1half-7b-chat
query: Who are you?
response: I am an AI assistant developed by ModelScope. My name is Xiao Huang. I can answer various questions, provide information and help. Is there anything I can help you with?
query: what's your name?
response: My name is Xiao Huang. I am an AI assistant developed by ModelScope. How can I assist you?
query: Who developed you?
response: I was developed by ModelScope.
query: 78654+657=?
response: 78654 + 657 = 79311
query: What to do if I can't fall asleep at night
response: If you can't fall asleep at night, here are some suggestions that may help improve your sleep quality:
1. Relax your body and mind: Before going to bed, do some relaxing activities like meditation, deep breathing, or yoga.
2. Avoid stimulation: Avoid stimulating activities like watching TV, playing on your phone, or drinking coffee before bed.
3. Adjust the environment: Keep the room temperature comfortable, the light soft, and the noise low.
4. Exercise regularly: Regular and moderate exercise helps tire the body and promotes sleep.
5. Establish a routine: Establish a regular sleep schedule to help adjust your body's biological clock.
6. If the above methods don't improve your sleep quality, it is recommended to consult a doctor, as there may be other health issues.
I hope these suggestions are helpful to you.
"""
```
## Qwen1.5-72B-Chat
### Inference
Unlike the previous 7B demonstration, here we use the **CLI** for inference:
```shell
# Experimental environment: 4 * A100
RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
--model_type qwen1half-72b-chat \
--infer_backend vllm --tensor_parallel_size 4
```
Output:
```python
"""
<<< Who are you?
I am a large-scale language model from Alibaba Cloud called Tongyi Qianwen.
--------------------------------------------------
<<< Where is the capital of Zhejiang?
The capital of Zhejiang is Hangzhou.
--------------------------------------------------
<<< What fun things are there here?
Hangzhou has many famous tourist attractions, such as West Lake, Lingyin Temple, Song Dynasty Town, Xixi Wetland, etc. The beautiful scenery of West Lake is suitable for all seasons. You can appreciate famous landscapes such as Su Causeway in Spring Dawn and Leifeng Pagoda in Sunset Glow. Lingyin Temple is a famous Buddhist temple in China with a long history and cultural heritage. Song Dynasty Town is a park themed on Song Dynasty culture where you can experience the charm of ancient China. Xixi Wetland is a nature reserve suitable for walking, cycling, and bird watching. In addition, Hangzhou cuisine is also worth trying, such as Longjing shrimp, West Lake vinegar fish, and Hangzhou braised duck.
"""
```
### Self-Cognition Fine-tuning
Here we use deepspeed-**zero3** for fine-tuning, which takes about **30 minutes**:
```shell
# Experimental environment: 4 * A100
# 4 * 70GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
--model_type qwen1half-72b-chat \
--dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
--logging_steps 5 \
--max_length 4096 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--deepspeed default-zero3
```
### Post-Tuning Inference
Similarly, here we use the CLI method for inference:
```shell
# Experimental environment: 4 * A100
RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
--ckpt_dir output/qwen1half-72b-chat/vx-xxx/checkpoint-xxx \
--infer_backend vllm --tensor_parallel_size 4 \
--merge_lora true
```
Output:
```python
"""
<<< Who are you?
I am an artificial intelligence language model created by ModelScope. My name is Xiao Huang. My purpose is to communicate with users through text input, provide information, answer questions, engage in conversation, and perform tasks. If you have any questions or need help, please let me know at any time.
--------------------------------------------------
<<< Where is the capital of Zhejiang?
The capital of Zhejiang is Hangzhou.
--------------------------------------------------
<<< What fun things are there here?
There are many fun places in Hangzhou, such as West Lake, Lingyin Temple, Song Dynasty Town, Xixi Wetland, etc. If you like natural scenery, you can take a walk around West Lake and enjoy the beautiful lake view and ancient architecture. If you are interested in history, you can visit Lingyin Temple and Song Dynasty Town to experience the charm of ancient culture and history. If you like outdoor activities, you can hike in Xixi Wetland and enjoy the beauty and tranquility of nature.
"""
```
### Quantization
Perform awq-int4 quantization on the fine-tuned model. The entire quantization process takes about **2 hours**.
```shell
# Experimental environment: A100
# 30GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen1half-72b-chat/vx-xxx/checkpoint-xxx \
--quant_bits 4 --quant_method awq \
--merge_lora true
```
### Deployment
After quantization, we can deploy on a **single A100**.
Start the server:
```shell
# Experimental environment: A100
CUDA_VISIBLE_DEVICES=0 swift deploy \
--ckpt_dir output/qwen1half-72b-chat/vx-xxx/checkpoint-xxx-merged-awq-int4 \
--infer_backend vllm --max_model_len 8192
```
Make calls from the client:
```python
from openai import OpenAI
client = OpenAI(
    api_key='EMPTY',
    base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
messages = []
for query in ['Who are you?', "what's your name?", 'Who developed you?']:
    messages.append({
        'role': 'user',
        'content': query
    })
    resp = client.chat.completions.create(
        model=model_type,
        messages=messages,
        seed=42)
    response = resp.choices[0].message.content
    print(f'query: {query}')
    print(f'response: {response}')
    messages.append({'role': 'assistant', 'content': response})
# streaming
for query in ['78654+657=?', "What to do if I can't fall asleep at night"]:
    messages.append({'role': 'user', 'content': query})
    stream_resp = client.chat.completions.create(
        model=model_type,
        messages=messages,
        stream=True,
        seed=42)
    print(f'query: {query}')
    print('response: ', end='')
    response = ''
    for chunk in stream_resp:
        response += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content, end='', flush=True)
    print()
    messages.append({'role': 'assistant', 'content': response})
"""
model_type: qwen1half-72b-chat
query: Who are you?
response: I am an artificial intelligence language model developed by ModelScope. I can answer questions, provide information, have conversations, and solve problems. What can I help you with?
query: what's your name?
response: My name is 小黄 (Xiao Huang). I am a language model developed by ModelScope. How can I help you?
query: Who developed you?
response: I was developed by ModelScope.
query: 78654+657=?
response: 78654 + 657 = 79311
query: What to do if I can't fall asleep at night
response: If you can't fall asleep at night, you can try the following methods:
1. Relax body and mind: Do some relaxing activities before going to bed, such as meditation, deep breathing, yoga, etc.
2. Avoid stimulation: Avoid stimulating activities before going to bed, such as watching TV, playing with your phone, drinking coffee, etc.
3. Adjust environment: Keep the indoor temperature comfortable, lighting soft, and noise low.
4. Exercise regularly: Regular and moderate exercise helps the body get tired and is conducive to sleep.
5. Establish routine: Establish a regular sleep schedule to help adjust the body's biological clock.
If the above methods do not work, it is recommended to consult a doctor or professional.
"""
```
# Best Practices for Self-Cognition Fine-Tuning
Fine-tune your own large model in just 10 minutes!
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference Before Fine-Tuning](#inference-before-fine-tuning)
- [Fine-Tuning](#fine-tuning)
- [Inference After Fine-Tuning](#inference-after-fine-tuning)
- [Web-UI](#web-ui)
## Environment Setup
```bash
# Set the global pip mirror (for faster downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
pip install 'ms-swift[llm]' -U
# Environment alignment (usually not necessary to run. If you encounter errors, you can run the following code to test with the latest environment in the repository)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
## Inference Before Fine-Tuning
Using Python:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import ModelType, InferArguments, infer_main
infer_args = InferArguments(model_type=ModelType.qwen2_7b_instruct)
infer_main(infer_args)
"""
<<< what's your name?
As an artificial intelligence, I don't have a personal name, but you can call me Assistant. How can I assist you today?
--------------------------------------------------
<<< 你是谁?
我是一个有用的助手。有什么我可以帮助您的吗?
--------------------------------------------------
<<< Where is the capital of Zhejiang?
The capital of Zhejiang Province is Hangzhou.
--------------------------------------------------
<<< What's delicious here?
China is a vast country with a rich culinary heritage, and its cuisine varies significantly from region to region. Here are a few famous dishes from different parts of China:
1. **Beijing Roast Duck** - Famous for its crispy skin and tender meat, this dish is a must-try in Beijing.
2. **Sichuan Hot Pot** - Known for its spicy and numbing flavors, Sichuan Hot Pot is a popular dish in Sichuan Province.
3. **Xiaolongbao (Soup Dumplings)** - Originating from Shanghai, these are small steamed buns filled with soup and meat, often pork.
4. **Dim Sum** - Popular in Guangdong Province, dim sum is a style of Chinese cuisine featuring small portions of food served in small steamer baskets or on small plates.
5. **Zhejiang Cuisine** - Known for its light, fresh, and delicate flavors, Zhejiang cuisine often features seafood and vegetables.
6. **Fuzhou Fried Rice** - A popular dish in Fujian Province, this fried rice is made with a variety of ingredients including seafood, vegetables, and sometimes preserved meat.
7. **Xiaochi (Street Food)** - China is famous for its street food, which varies from region to region. Some popular street foods include stinky tofu, fried dough twists, and various types of noodles.
8. **Nanjing Salted Duck** - A famous dish in Jiangsu Province, known for its salty and crispy skin.
9. **Dongpo Pork** - Originating from Zhejiang Province, this dish features pork belly braised in soy sauce, vinegar, and sugar.
10. **Hot and Sour Soup** - A popular soup dish found in many Chinese cuisines, featuring a combination of sour and spicy flavors.
Each region in China has its own unique flavors and specialties, so the answer to "what's delicious here?" can vary greatly depending on where you are in the country.
--------------------------------------------------
<<< What should I do if I can't sleep at night?
If you're having trouble sleeping at night, there are several strategies you can try to improve your sleep quality:
1. **Establish a Routine**: Try to go to bed and wake up at the same time every day, even on weekends. This helps regulate your body's internal clock.
2. **Create a Sleep-Conducive Environment**: Make sure your bedroom is cool, quiet, and dark. Consider using blackout curtains, earplugs, or a white noise machine if necessary. A comfortable mattress and pillows can also help.
3. **Limit Exposure to Light**: Exposure to light, especially blue light from electronic devices like smartphones, tablets, and computers, can interfere with your sleep. Try to avoid these devices at least an hour before bedtime.
4. **Exercise Regularly**: Regular physical activity can help you fall asleep faster and enjoy deeper sleep. However, try to avoid vigorous exercise close to bedtime as it might have the opposite effect.
5. **Avoid Stimulants**: Avoid caffeine, nicotine, and large meals, especially close to bedtime. Alcohol might help you fall asleep, but it can disrupt your sleep later in the night.
6. **Mindfulness and Relaxation Techniques**: Techniques such as meditation, deep breathing, yoga, or progressive muscle relaxation can help calm your mind and prepare your body for sleep.
7. **Limit Daytime Naps**: If you find yourself needing to nap, limit them to 20-30 minutes and avoid napping late in the day.
8. **Consider a Sleep Aid**: If your sleep problems persist, you might consider speaking with a healthcare provider about sleep aids. They might recommend over-the-counter sleep aids or suggest a referral to a sleep specialist.
9. **Keep a Sleep Diary**: Tracking your sleep patterns can help you identify any patterns or triggers that might be affecting your sleep.
10. **Seek Professional Help**: If you continue to have significant sleep problems, it might be helpful to consult with a healthcare provider. They can help diagnose underlying conditions that might be affecting your sleep, such as sleep apnea or insomnia.
Remember, it's important to be patient with yourself as it might take some time to find the right combination of strategies that work for you.
"""
```
If you want to perform single-sample inference, you can refer to [LLM Inference Documentation](LLM-inference.md#qwen-7b-chat)
Using CLI:
```bash
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen2-7b-instruct
```
## Fine-Tuning
Note: Self-cognition training involves knowledge editing, so it is recommended to include **MLP** in `lora_target_modules`. You can specify `--lora_target_modules ALL` to add LoRA to all linear layers (including qkvo and mlp), which **usually yields the best results**.
Using Python:
```python
# Experimental environment: 3090, V100, ...
# 24GB GPU memory
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import DatasetName, ModelType, SftArguments, sft_main
sft_args = SftArguments(
    model_type=ModelType.qwen2_7b_instruct,
    dataset=[f'{DatasetName.alpaca_zh}#500', f'{DatasetName.alpaca_en}#500',
             f'{DatasetName.self_cognition}#500'],
    logging_steps=5,
    max_length=2048,
    learning_rate=1e-4,
    output_dir='output',
    lora_target_modules=['ALL'],
    model_name=['小黄', 'Xiao Huang'],
    model_author=['魔搭', 'ModelScope'])
output = sft_main(sft_args)
best_model_checkpoint = output['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')
"""Out[0]
[INFO:swift] The logging file will be saved in: /xxx/output/qwen2-7b-instruct/v2-20240607-101038/logging.jsonl
{'loss': 1.8210969, 'acc': 0.6236614, 'grad_norm': 2.75, 'learning_rate': 2e-05, 'memory(GiB)': 16.79, 'train_speed(iter/s)': 0.155172, 'epoch': 0.01, 'global_step': 1}
{'loss': 1.75309932, 'acc': 0.63371617, 'grad_norm': 3.765625, 'learning_rate': 0.0001, 'memory(GiB)': 18.48, 'train_speed(iter/s)': 0.210486, 'epoch': 0.05, 'global_step': 5}
{'loss': 1.42493172, 'acc': 0.65476351, 'grad_norm': 1.671875, 'learning_rate': 9.432e-05, 'memory(GiB)': 18.48, 'train_speed(iter/s)': 0.221159, 'epoch': 0.11, 'global_step': 10}
{'loss': 1.16402645, 'acc': 0.69853611, 'grad_norm': 2.3125, 'learning_rate': 8.864e-05, 'memory(GiB)': 18.48, 'train_speed(iter/s)': 0.223072, 'epoch': 0.16, 'global_step': 15}
{'loss': 1.18519087, 'acc': 0.68314366, 'grad_norm': 1.7578125, 'learning_rate': 8.295e-05, 'memory(GiB)': 18.48, 'train_speed(iter/s)': 0.224677, 'epoch': 0.21, 'global_step': 20}
{'loss': 1.09617777, 'acc': 0.69949636, 'grad_norm': 1.4296875, 'learning_rate': 7.727e-05, 'memory(GiB)': 19.46, 'train_speed(iter/s)': 0.225241, 'epoch': 0.27, 'global_step': 25}
{'loss': 1.09035854, 'acc': 0.70226536, 'grad_norm': 1.34375, 'learning_rate': 7.159e-05, 'memory(GiB)': 19.46, 'train_speed(iter/s)': 0.226112, 'epoch': 0.32, 'global_step': 30}
{'loss': 1.04421387, 'acc': 0.71705227, 'grad_norm': 1.65625, 'learning_rate': 6.591e-05, 'memory(GiB)': 19.46, 'train_speed(iter/s)': 0.225783, 'epoch': 0.38, 'global_step': 35}
{'loss': 0.97917967, 'acc': 0.73127871, 'grad_norm': 1.2265625, 'learning_rate': 6.023e-05, 'memory(GiB)': 19.46, 'train_speed(iter/s)': 0.226212, 'epoch': 0.43, 'global_step': 40}
{'loss': 0.94920969, 'acc': 0.74032536, 'grad_norm': 0.9140625, 'learning_rate': 5.455e-05, 'memory(GiB)': 19.46, 'train_speed(iter/s)': 0.225991, 'epoch': 0.48, 'global_step': 45}
{'loss': 0.99205322, 'acc': 0.73348026, 'grad_norm': 1.1640625, 'learning_rate': 4.886e-05, 'memory(GiB)': 19.46, 'train_speed(iter/s)': 0.224141, 'epoch': 0.54, 'global_step': 50}
Train: 54%|███████████████████████████████████▍ | 50/93 [03:42<03:19, 4.64s/it]
{'eval_loss': 1.03679836, 'eval_acc': 0.67676003, 'eval_runtime': 1.2396, 'eval_samples_per_second': 8.874, 'eval_steps_per_second': 8.874, 'epoch': 0.54, 'global_step': 50}
Val: 100%|████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 10.15it/s]
[INFO:swift] Saving model checkpoint to /xxx/output/qwen2-7b-instruct/v2-20240607-101038/checkpoint-50
{'loss': 0.98644152, 'acc': 0.73600368, 'grad_norm': 2.0625, 'learning_rate': 4.318e-05, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.220983, 'epoch': 0.59, 'global_step': 55}
{'loss': 0.97522211, 'acc': 0.7305594, 'grad_norm': 1.1640625, 'learning_rate': 3.75e-05, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.218717, 'epoch': 0.64, 'global_step': 60}
{'loss': 1.02459459, 'acc': 0.71822615, 'grad_norm': 1.125, 'learning_rate': 3.182e-05, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.216185, 'epoch': 0.7, 'global_step': 65}
{'loss': 0.90719929, 'acc': 0.73806977, 'grad_norm': 1.078125, 'learning_rate': 2.614e-05, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.21451, 'epoch': 0.75, 'global_step': 70}
{'loss': 0.88519163, 'acc': 0.74690943, 'grad_norm': 1.3359375, 'learning_rate': 2.045e-05, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.21366, 'epoch': 0.81, 'global_step': 75}
{'loss': 0.95856657, 'acc': 0.72634115, 'grad_norm': 1.359375, 'learning_rate': 1.477e-05, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.213132, 'epoch': 0.86, 'global_step': 80}
{'loss': 0.88609543, 'acc': 0.75917048, 'grad_norm': 0.90625, 'learning_rate': 9.09e-06, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.211609, 'epoch': 0.91, 'global_step': 85}
{'loss': 0.97113533, 'acc': 0.73501945, 'grad_norm': 2.40625, 'learning_rate': 3.41e-06, 'memory(GiB)': 20.5, 'train_speed(iter/s)': 0.210918, 'epoch': 0.97, 'global_step': 90}
Train: 100%|██████████████████████████████████████████████████████████████████| 93/93 [07:21<00:00, 5.05s/it]
{'eval_loss': 1.03077412, 'eval_acc': 0.68508706, 'eval_runtime': 1.2226, 'eval_samples_per_second': 8.997, 'eval_steps_per_second': 8.997, 'epoch': 1.0, 'global_step': 93}
Val: 100%|████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 10.26it/s]
[INFO:swift] Saving model checkpoint to /xxx/output/qwen2-7b-instruct/v2-20240607-101038/checkpoint-93
{'train_runtime': 443.3746, 'train_samples_per_second': 3.358, 'train_steps_per_second': 0.21, 'train_loss': 1.07190883, 'epoch': 1.0, 'global_step': 93}
Train: 100%|██████████████████████████████████████████████████████████████████| 93/93 [07:23<00:00, 4.77s/it]
[INFO:swift] last_model_checkpoint: /xxx/output/qwen2-7b-instruct/v2-20240607-101038/checkpoint-93
[INFO:swift] best_model_checkpoint: /xxx/output/qwen2-7b-instruct/v2-20240607-101038/checkpoint-93
[INFO:swift] images_dir: /xxx/output/qwen2-7b-instruct/v2-20240607-101038/images
[INFO:swift] End time of running main: 2024-06-07 10:18:41.386561
best_model_checkpoint: /xxx/output/qwen2-7b-instruct/v2-20240607-101038/checkpoint-93
"""
```
Using CLI (single GPU):
```bash
# Experimental environment: 3090, V100, ...
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model_type qwen2-7b-instruct \
--dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope
```
Using CLI (DeepSpeed-ZeRO2):
> If you have GPUs like the 3090, you can reduce `max_length` to decrease memory usage.
```bash
# Experimental environment: 4 * 3090
# 4 * 24GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
--model_type qwen2-7b-instruct \
--dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--deepspeed default-zero2
```
## Inference After Fine-Tuning
You need to set the value of `best_model_checkpoint`, which is printed at the end of sft.
Using Python:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import InferArguments, merge_lora, infer_main
best_model_checkpoint = 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx'
infer_args = InferArguments(ckpt_dir=best_model_checkpoint)
merge_lora(infer_args, device_map='cpu')
result = infer_main(infer_args)
"""Out[0]
<<< what's your name?
I am a language model developed by ModelScope, and you can call me Xiao Huang. How can I assist you?
--------------------------------------------------
<<< 你是谁? (Who are you?)
我是小黄,一个由魔搭开发的语言模型。我可以帮助你回答问题、提供信息、执行任务等。有什么我可以帮助你的吗? (I am Xiao Huang, a language model developed by ModelScope. I can help you answer questions, provide information, perform tasks, and more. Is there anything I can help you with?)
--------------------------------------------------
<<< Where is the capital of Zhejiang?
The capital of Zhejiang is Hangzhou.
--------------------------------------------------
<<< What's delicious here?
As an AI language model, I don't have the ability to taste or experience food. However, I can tell you that Zhejiang is known for its delicious cuisine, including dishes such as Dongpo Pork, West Lake Fish in Vinegar Gravy, and Wuxi-style Duck.
--------------------------------------------------
<<< What should I do if I can't sleep at night?
If you are having trouble sleeping at night, there are several things you can try to help improve your sleep:
1. Establish a regular sleep schedule: Try to go to bed and wake up at the same time every day, even on weekends.
2. Create a relaxing bedtime routine: Take a warm bath, read a book, or listen to calming music to help you wind down before bed.
3. Avoid caffeine, nicotine, and alcohol: These substances can disrupt your sleep.
4. Limit screen time before bed: The blue light emitted by electronic devices can interfere with your body's production of melatonin, a hormone that regulates sleep.
5. Exercise regularly: Regular physical activity can help you fall asleep faster and sleep more soundly.
6. Create a comfortable sleep environment: Make sure your bedroom is cool, quiet, and dark.
7. Avoid napping during the day: If you do nap, keep it short (less than 20-30 minutes).
8. Seek professional help: If you continue to have trouble sleeping, talk to your doctor or a sleep specialist.
Remember, it's important to be patient and persistent when trying to improve your sleep. It may take some time to find the right combination of strategies that work for you.
"""
```
Using CLI:
```bash
# Direct inference
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx'
# Merge LoRA incremental weights and infer
# If you need quantization, you can specify `--quant_bits 4`.
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx' --merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx-merged'
```
## Web-UI
Using Python:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import AppUIArguments, merge_lora, app_ui_main
best_model_checkpoint = 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx'
app_ui_args = AppUIArguments(ckpt_dir=best_model_checkpoint)
merge_lora(app_ui_args, device_map='cpu')
result = app_ui_main(app_ui_args)
```
Using CLI:
```bash
# Directly use app-ui
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx'
# Merge LoRA incremental weights and use app-ui
# If you need quantization, you can specify `--quant_bits 4`.
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx' --merge_lora true
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'qwen2-7b-instruct/vx-xxx/checkpoint-xxx-merged'
```
# Best Practices for the SimPO Algorithm
[SimPO](https://arxiv.org/abs/2405.14734) uses the same data format as DPO for training. In addition to the [query, response] pairs from SFT data, it requires a `rejected_response` indicating responses the model should not generate, as in the minimal sample sketched below.
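A single custom training sample in this format might look like the following sketch (the field names follow the [query, response, rejected_response] convention above; the values are placeholders):
```python
# A minimal preference sample for SimPO/DPO-style training (placeholder values):
sample = {
    'query': 'Where is the capital of Zhejiang?',        # the prompt
    'response': 'The capital of Zhejiang is Hangzhou.',  # preferred answer
    'rejected_response': "I don't know.",                # answer the model should avoid
}
```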
The SimPO algorithm normalizes the reward by response length, which replaces the reference-model log-probability term in DPO. It also introduces a reward margin term into the preference modeling to widen the reward gap between the two responses.
Compared to DPO, SimPO therefore does not require a reference model and uses less training memory.
The hyperparameter `beta`, as in DPO, acts as the reward coefficient and is typically set between 2.0 and 2.5, with a default of 2.0. `gamma` serves as the reward margin and is typically set between 0.5 and 1.5, with a default of 1.0.
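Putting these pieces together, the objective from the paper can be written as follows, where $\pi_\theta$ is the policy being trained, $y_w$ and $y_l$ are the chosen and rejected responses, and $\sigma$ is the sigmoid function:
```latex
\mathcal{L}_{\mathrm{SimPO}}
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x)
      - \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x)
      - \gamma
    \right) \right]
```
The $\beta / |y|$ factors are the length normalization that replaces the DPO reference model, and $\gamma$ is the reward margin described above.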
In this best practice, we train the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model with the SimPO algorithm, enabling it to respond in Chinese.
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)
## Environment Preparation
```shell
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
# Environment alignment (usually not needed. If you encounter errors, run the commands below; the repository is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
## Dataset
We use the [Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/shareAI/shareAI-Llama3-DPO-zh-en-emoji/summary) dataset, which includes Chinese queries and responses in both Chinese and English.
Data example:
```json
{"question": "一个完整的工程项目管理流程包括什么?",
"answer_zh": "🤔 Ahahah, 项目管理流程, 这个问题太常见了! 😂\n\n一个完整的工程项目管理流程包括以下几个阶段:\n\n1️⃣ **项目启动** 🚀:项目的开始,定义项目的目标、范围、时间表和预算。\n\n2️⃣ **项目计划** 📅:制定项目计划,包括任务分解、资源分配、进度安排和风险管理。\n\n3️⃣ **项目执行** 💪:按照计划执行项目,包括任务执行、进度控制和质量管理。\n\n4️⃣ **项目监控** 👀:对项目的进度、成本和质量进行监控和控制,确保项目不偏离轨道。\n\n5️⃣ **项目收尾** 🎉:项目完成后,进行项目总结、经验分享和文档编制。\n\n6️⃣ **项目评价** 🤔:对项目的成果和过程进行评价,总结经验和教训。\n\n这些阶段是一个完整的工程项目管理流程,但是,具体的项目可能会根据需要添加或删除某些阶段。 🤝\n\n还有一点,项目管理流程中需要注意的几个关键点是:\n\n* **沟通** 💬:项目团队之间、项目团队和客户之间的沟通是非常重要的。\n* **风险管理** ⚠️:项目中总会出现一些风险和问题,需要提前预测和准备。\n* **变更管理** 🔄:项目中可能会出现一些变更,需要及时处理和更新。\n\n总之,项目管理流程是一个复杂的过程,需要项目经理和团队成员的共同努力和合作。 💪",
"answer_en": "My fellow wizarding friend! 🧙‍♂️\n\nA complete project management process, you ask? Well, I shall conjure up the essential steps for thee! 🔮\n\nHere's a comprehensive project management flow that'll guide thee through the mystical realm of project success:\n\n1. **Initiation**: Define project scope, goals, and deliverables. Identify stakeholders, and determine the project's feasibility. ⚡️\n2. **Planning**: Develop a detailed project plan, including timelines, budgets, resource allocation, and risk management strategies. 📅\n3. **Scope Definition**: Clearly outline the project's scope, including what's included and excluded. This will help prevent scope creep, my friend! 🔒\n4. **Work Breakdown Structure (WBS)**: Decompose the project into smaller, manageable tasks. This will aid in task assignment, scheduling, and tracking progress. 🗂️\n5. **Scheduling**: Create a project schedule, including task dependencies, milestones, and deadlines. This will help thee stay on track and avoid procrastination! ⏰\n6. **Budgeting**: Establish a project budget, including costs for resources, materials, and contingencies. This will ensure thou dost not overspend, my wise friend! 💸\n7. **Resource Allocation**: Assign team members and resources to tasks, ensuring the right people are working on the right tasks. This will optimize productivity and efficiency! 👥\n8. **Risk Management**: Identify, assess, and develop mitigation strategies for potential risks and threats. This will safeguard thy project from unforeseen dangers! ⚠️\n9. **Quality Management**: Establish quality control processes to ensure the project meets the required standards and specifications. This will guarantee a top-notch outcome, my friend! ✨\n10. **Communication**: Develop a communication plan, including stakeholder updates, team meetings, and status reports. This will keep everyone informed and aligned! 📢\n11. **Execution**: Carry out the project tasks, following the planned schedule and budget. This is where the magic happens, my friend! ✨\n12. **Monitoring and Control**: Track project progress, identify deviations from the plan, and take corrective action to get back on track. This will ensure thy project stays on course! 📊\n13. **Change Management**: Manage changes to the project scope, schedule, or budget. This will minimize disruptions and ensure the project remains viable! 🔀\n14. **Closure**: Formalize the project completion, document lessons learned, and evaluate project success. This will bring a sense of accomplishment and closure, my friend! 🎉\n\nThere thou hast it, my fellow wizard! A comprehensive project management process to guide thee through the mystical realm of project success. May thy projects be prosperous and thy stakeholders be delighted! 😊"}
```
Swift has a built-in preprocessor for this dataset that uses `answer_zh` as `response` and `answer_en` as `rejected_response`. Simply pass `--dataset shareai-llama3-dpo-zh-en-emoji` as a training parameter.
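Conceptually, that preprocessing maps each raw row to the preference triple used for training; a minimal sketch of the idea (the registered preprocessor in swift may differ in details):
```python
def to_preference_row(raw: dict) -> dict:
    """Map a shareAI-Llama3-DPO-zh-en-emoji row to a preference triple."""
    return {
        'query': raw['question'],               # the Chinese query
        'response': raw['answer_zh'],           # preferred: the Chinese answer
        'rejected_response': raw['answer_en'],  # rejected: the English answer
    }
```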
## Training
```shell
# Experimental environment: A100
# DDP + MP
# Memory usage: 4*56G
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=2 \
swift simpo \
--model_type llama3-8b-instruct \
--sft_type full \
--dataset shareai-llama3-dpo-zh-en-emoji \
--gradient_checkpointing true \
--learning_rate 2e-6
```
**Notes**:
- We found that SimPO+LoRA performs poorly; full fine-tuning is recommended.
- If you train a base model with data containing history, specify a template that supports multi-turn dialogue (base models often do not support multi-turn dialogue). We set the `chatml` template by default, but you can also choose a different template for training by specifying `--model_type`.
- We set `--gradient_checkpointing true` by default during training to save memory; this may slightly reduce training speed.
- If you are using older GPUs such as the V100, set `--dtype AUTO` or `--dtype fp16`, since they do not support bf16.
- If your machine has high-performance GPUs such as the A100 and you are using the qwen series of models, we recommend installing [flash-attn](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (pre-Ampere GPUs such as the V100 do not support training with flash-attn). Models that support flash-attn are listed in the Supported Models section.
- If you need to train offline, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For the meanings of specific parameters, refer to the command line parameters documentation.
- If you wish to push weights to the ModelScope Hub during training, set `--push_to_hub true`.
## Inference
Use the `swift web-ui` command for the following inference sessions.
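If you prefer launching it programmatically, here is a minimal sketch reusing the Python API from the Web-UI section above (the checkpoint path is a placeholder for the one produced by `swift simpo`; since training used `--sft_type full`, no LoRA merge is needed):
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import AppUIArguments, app_ui_main

# Placeholder path: point ckpt_dir at the checkpoint saved by `swift simpo`.
app_ui_args = AppUIArguments(ckpt_dir='output/llama3-8b-instruct/vx-xxx/checkpoint-xxx')
result = app_ui_main(app_ui_args)
```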
### Pre-Training Inference
> 你是谁(Who are you)
![orpo1](../../resources/orpo1.png)
> 西湖醋鱼怎么做(How do you make West Lake Vinegar Fish?)
![orpo2](../../resources/orpo2.png)
![orpo3](../../resources/orpo3.png)
![orpo4](../../resources/orpo4.png)
![orpo5](../../resources/orpo5.png)
### Post-Training Inference
> 你是谁(Who are you)
![simpo1](../../resources/simpo1.png)
> 西湖醋鱼怎么做(How do you make West Lake Vinegar Fish?)
![simpo2](../../resources/simpo2.png)
![simpo3](../../resources/simpo3.png)
![simpo4](../../resources/simpo4.png)
# Supported Models and Datasets
## Table of Contents
- [Models](#Models)
- [LLM](#LLM)
- [MLLM](#MLLM)
- [Datasets](#Datasets)
## Models
The table below introduces all models supported by SWIFT:
- Model Type: the model_type registered in SWIFT (see the usage sketch after this list).
- Default Lora Target Modules: the default lora_target_modules used by the model.
- Default Template: the default template used by the model.
- Support Flash Attn: whether the model supports [flash attention](https://github.com/Dao-AILab/flash-attention) to accelerate SFT and inference.
- Support VLLM: whether the model supports [vllm](https://github.com/vllm-project/vllm) to accelerate inference and deployment.
- Requires: the extra requirements needed by the model.
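For example, the Model Type value is what you pass as `model_type`; a minimal sketch using the Python API shown earlier (assuming `qwen2-7b-instruct` from the table below):
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import InferArguments, infer_main

# `model_type` comes straight from the "Model Type" column of the table.
infer_args = InferArguments(model_type='qwen2-7b-instruct')
result = infer_main(infer_args)
```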
### LLM
| Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support VLLM | Requires | Tags | HF Model ID |
| --------- | -------- | --------------------------- | ---------------- | ------------------ | ------------ | -------- | ---- | ----------- |
|qwen-1_8b|[qwen/Qwen-1_8B](https://modelscope.cn/models/qwen/Qwen-1_8B/summary)|c_attn|default-generation|&#x2714;|&#x2714;||-|[Qwen/Qwen-1_8B](https://huggingface.co/Qwen/Qwen-1_8B)|
|qwen-1_8b-chat|[qwen/Qwen-1_8B-Chat](https://modelscope.cn/models/qwen/Qwen-1_8B-Chat/summary)|c_attn|qwen|&#x2714;|&#x2714;||-|[Qwen/Qwen-1_8B-Chat](https://huggingface.co/Qwen/Qwen-1_8B-Chat)|
|qwen-1_8b-chat-int4|[qwen/Qwen-1_8B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-1_8B-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-1_8B-Chat-Int4](https://huggingface.co/Qwen/Qwen-1_8B-Chat-Int4)|
|qwen-1_8b-chat-int8|[qwen/Qwen-1_8B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-1_8B-Chat-Int8/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-1_8B-Chat-Int8](https://huggingface.co/Qwen/Qwen-1_8B-Chat-Int8)|
|qwen-7b|[qwen/Qwen-7B](https://modelscope.cn/models/qwen/Qwen-7B/summary)|c_attn|default-generation|&#x2714;|&#x2714;||-|[Qwen/Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)|
|qwen-7b-chat|[qwen/Qwen-7B-Chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary)|c_attn|qwen|&#x2714;|&#x2714;||-|[Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)|
|qwen-7b-chat-int4|[qwen/Qwen-7B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-7B-Chat-Int4](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4)|
|qwen-7b-chat-int8|[qwen/Qwen-7B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-7B-Chat-Int8](https://huggingface.co/Qwen/Qwen-7B-Chat-Int8)|
|qwen-14b|[qwen/Qwen-14B](https://modelscope.cn/models/qwen/Qwen-14B/summary)|c_attn|default-generation|&#x2714;|&#x2714;||-|[Qwen/Qwen-14B](https://huggingface.co/Qwen/Qwen-14B)|
|qwen-14b-chat|[qwen/Qwen-14B-Chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary)|c_attn|qwen|&#x2714;|&#x2714;||-|[Qwen/Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat)|
|qwen-14b-chat-int4|[qwen/Qwen-14B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-14B-Chat-Int4](https://huggingface.co/Qwen/Qwen-14B-Chat-Int4)|
|qwen-14b-chat-int8|[qwen/Qwen-14B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-14B-Chat-Int8](https://huggingface.co/Qwen/Qwen-14B-Chat-Int8)|
|qwen-72b|[qwen/Qwen-72B](https://modelscope.cn/models/qwen/Qwen-72B/summary)|c_attn|default-generation|&#x2714;|&#x2714;||-|[Qwen/Qwen-72B](https://huggingface.co/Qwen/Qwen-72B)|
|qwen-72b-chat|[qwen/Qwen-72B-Chat](https://modelscope.cn/models/qwen/Qwen-72B-Chat/summary)|c_attn|qwen|&#x2714;|&#x2714;||-|[Qwen/Qwen-72B-Chat](https://huggingface.co/Qwen/Qwen-72B-Chat)|
|qwen-72b-chat-int4|[qwen/Qwen-72B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int4](https://huggingface.co/Qwen/Qwen-72B-Chat-Int4)|
|qwen-72b-chat-int8|[qwen/Qwen-72B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int8/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int8](https://huggingface.co/Qwen/Qwen-72B-Chat-Int8)|
|modelscope-agent-7b|[iic/ModelScope-Agent-7B](https://modelscope.cn/models/iic/ModelScope-Agent-7B/summary)|c_attn|modelscope-agent|&#x2714;|&#x2718;||-|-|
|modelscope-agent-14b|[iic/ModelScope-Agent-14B](https://modelscope.cn/models/iic/ModelScope-Agent-14B/summary)|c_attn|modelscope-agent|&#x2714;|&#x2718;||-|-|
|qwen1half-0_5b|[qwen/Qwen1.5-0.5B](https://modelscope.cn/models/qwen/Qwen1.5-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)|
|qwen1half-1_8b|[qwen/Qwen1.5-1.8B](https://modelscope.cn/models/qwen/Qwen1.5-1.8B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B)|
|qwen1half-4b|[qwen/Qwen1.5-4B](https://modelscope.cn/models/qwen/Qwen1.5-4B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B)|
|qwen1half-7b|[qwen/Qwen1.5-7B](https://modelscope.cn/models/qwen/Qwen1.5-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B)|
|qwen1half-14b|[qwen/Qwen1.5-14B](https://modelscope.cn/models/qwen/Qwen1.5-14B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-14B](https://huggingface.co/Qwen/Qwen1.5-14B)|
|qwen1half-32b|[qwen/Qwen1.5-32B](https://modelscope.cn/models/qwen/Qwen1.5-32B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-32B](https://huggingface.co/Qwen/Qwen1.5-32B)|
|qwen1half-72b|[qwen/Qwen1.5-72B](https://modelscope.cn/models/qwen/Qwen1.5-72B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-72B](https://huggingface.co/Qwen/Qwen1.5-72B)|
|qwen1half-110b|[qwen/Qwen1.5-110B](https://modelscope.cn/models/qwen/Qwen1.5-110B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-110B](https://huggingface.co/Qwen/Qwen1.5-110B)|
|codeqwen1half-7b|[qwen/CodeQwen1.5-7B](https://modelscope.cn/models/qwen/CodeQwen1.5-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/CodeQwen1.5-7B](https://huggingface.co/Qwen/CodeQwen1.5-7B)|
|qwen1half-moe-a2_7b|[qwen/Qwen1.5-MoE-A2.7B](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.40|-|[Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)|
|qwen1half-0_5b-chat|[qwen/Qwen1.5-0.5B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-0.5B-Chat](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat)|
|qwen1half-1_8b-chat|[qwen/Qwen1.5-1.8B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-1.8B-Chat](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat)|
|qwen1half-4b-chat|[qwen/Qwen1.5-4B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-4B-Chat](https://huggingface.co/Qwen/Qwen1.5-4B-Chat)|
|qwen1half-7b-chat|[qwen/Qwen1.5-7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)|
|qwen1half-14b-chat|[qwen/Qwen1.5-14B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-14B-Chat](https://huggingface.co/Qwen/Qwen1.5-14B-Chat)|
|qwen1half-32b-chat|[qwen/Qwen1.5-32B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-32B-Chat](https://huggingface.co/Qwen/Qwen1.5-32B-Chat)|
|qwen1half-72b-chat|[qwen/Qwen1.5-72B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat)|
|qwen1half-110b-chat|[qwen/Qwen1.5-110B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-110B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen1.5-110B-Chat](https://huggingface.co/Qwen/Qwen1.5-110B-Chat)|
|qwen1half-moe-a2_7b-chat|[qwen/Qwen1.5-MoE-A2.7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.40|-|[Qwen/Qwen1.5-MoE-A2.7B-Chat](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat)|
|codeqwen1half-7b-chat|[qwen/CodeQwen1.5-7B-Chat](https://modelscope.cn/models/qwen/CodeQwen1.5-7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat)|
|qwen1half-0_5b-chat-int4|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4)|
|qwen1half-1_8b-chat-int4|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4)|
|qwen1half-4b-chat-int4|[qwen/Qwen1.5-4B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-4B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4)|
|qwen1half-7b-chat-int4|[qwen/Qwen1.5-7B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-7B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int4)|
|qwen1half-14b-chat-int4|[qwen/Qwen1.5-14B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int4)|
|qwen1half-32b-chat-int4|[qwen/Qwen1.5-32B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-32B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-32B-Chat-GPTQ-Int4)|
|qwen1half-72b-chat-int4|[qwen/Qwen1.5-72B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-72B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int4)|
|qwen1half-110b-chat-int4|[qwen/Qwen1.5-110B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-110B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-110B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-110B-Chat-GPTQ-Int4)|
|qwen1half-0_5b-chat-int8|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8)|
|qwen1half-1_8b-chat-int8|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8)|
|qwen1half-4b-chat-int8|[qwen/Qwen1.5-4B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-4B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int8)|
|qwen1half-7b-chat-int8|[qwen/Qwen1.5-7B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-7B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int8)|
|qwen1half-14b-chat-int8|[qwen/Qwen1.5-14B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-14B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int8)|
|qwen1half-72b-chat-int8|[qwen/Qwen1.5-72B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-72B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int8)|
|qwen1half-moe-a2_7b-chat-int4|[qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2718;|auto_gptq>=0.5, transformers>=4.40|-|[Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4)|
|qwen1half-0_5b-chat-awq|[qwen/Qwen1.5-0.5B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-0.5B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-AWQ)|
|qwen1half-1_8b-chat-awq|[qwen/Qwen1.5-1.8B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-1.8B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-AWQ)|
|qwen1half-4b-chat-awq|[qwen/Qwen1.5-4B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-4B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-4B-Chat-AWQ)|
|qwen1half-7b-chat-awq|[qwen/Qwen1.5-7B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-7B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-AWQ)|
|qwen1half-14b-chat-awq|[qwen/Qwen1.5-14B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-14B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-AWQ)|
|qwen1half-32b-chat-awq|[qwen/Qwen1.5-32B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-32B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-32B-Chat-AWQ)|
|qwen1half-72b-chat-awq|[qwen/Qwen1.5-72B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-72B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-AWQ)|
|qwen1half-110b-chat-awq|[qwen/Qwen1.5-110B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-110B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-110B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-110B-Chat-AWQ)|
|codeqwen1half-7b-chat-awq|[qwen/CodeQwen1.5-7B-Chat-AWQ](https://modelscope.cn/models/qwen/CodeQwen1.5-7B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/CodeQwen1.5-7B-Chat-AWQ](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat-AWQ)|
|qwen2-0_5b|[qwen/Qwen2-0.5B](https://modelscope.cn/models/qwen/Qwen2-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B)|
|qwen2-0_5b-instruct|[qwen/Qwen2-0.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)|
|qwen2-0_5b-instruct-int4|[qwen/Qwen2-0.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4)|
|qwen2-0_5b-instruct-int8|[qwen/Qwen2-0.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8)|
|qwen2-0_5b-instruct-awq|[qwen/Qwen2-0.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen2-0.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-AWQ)|
|qwen2-1_5b|[qwen/Qwen2-1.5B](https://modelscope.cn/models/qwen/Qwen2-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B)|
|qwen2-1_5b-instruct|[qwen/Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)|
|qwen2-1_5b-instruct-int4|[qwen/Qwen2-1.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4)|
|qwen2-1_5b-instruct-int8|[qwen/Qwen2-1.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8)|
|qwen2-1_5b-instruct-awq|[qwen/Qwen2-1.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen2-1.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-AWQ)|
|qwen2-7b|[qwen/Qwen2-7B](https://modelscope.cn/models/qwen/Qwen2-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B)|
|qwen2-7b-instruct|[qwen/Qwen2-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)|
|qwen2-7b-instruct-int4|[qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4)|
|qwen2-7b-instruct-int8|[qwen/Qwen2-7B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-7B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int8)|
|qwen2-7b-instruct-awq|[qwen/Qwen2-7B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen2-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-7B-Instruct-AWQ)|
|qwen2-72b|[qwen/Qwen2-72B](https://modelscope.cn/models/qwen/Qwen2-72B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-72B](https://huggingface.co/Qwen/Qwen2-72B)|
|qwen2-72b-instruct|[qwen/Qwen2-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)|
|qwen2-72b-instruct-int4|[qwen/Qwen2-72B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-72B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-72B-Instruct-GPTQ-Int4)|
|qwen2-72b-instruct-int8|[qwen/Qwen2-72B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-72B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-72B-Instruct-GPTQ-Int8)|
|qwen2-72b-instruct-awq|[qwen/Qwen2-72B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.37, autoawq|-|[Qwen/Qwen2-72B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-72B-Instruct-AWQ)|
|qwen2-57b-a14b|[qwen/Qwen2-57B-A14B](https://modelscope.cn/models/qwen/Qwen2-57B-A14B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.40|-|[Qwen/Qwen2-57B-A14B](https://huggingface.co/Qwen/Qwen2-57B-A14B)|
|qwen2-57b-a14b-instruct|[qwen/Qwen2-57B-A14B-Instruct](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|transformers>=4.40|-|[Qwen/Qwen2-57B-A14B-Instruct](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct)|
|qwen2-57b-a14b-instruct-int4|[qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5, transformers>=4.40|-|[Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4)|
|chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;||-|[THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)|
|chatglm2-6b-32k|[ZhipuAI/chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;||-|[THUDM/chatglm2-6b-32k](https://huggingface.co/THUDM/chatglm2-6b-32k)|
|chatglm3-6b-base|[ZhipuAI/chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;||-|[THUDM/chatglm3-6b-base](https://huggingface.co/THUDM/chatglm3-6b-base)|
|chatglm3-6b|[ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;||-|[THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b)|
|chatglm3-6b-32k|[ZhipuAI/chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;||-|[THUDM/chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)|
|chatglm3-6b-128k|[ZhipuAI/chatglm3-6b-128k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;||-|[THUDM/chatglm3-6b-128k](https://huggingface.co/THUDM/chatglm3-6b-128k)|
|codegeex2-6b|[ZhipuAI/codegeex2-6b](https://modelscope.cn/models/ZhipuAI/codegeex2-6b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|transformers<4.34|coding|[THUDM/codegeex2-6b](https://huggingface.co/THUDM/codegeex2-6b)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;||-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;||-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;||-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|llama2-7b|[modelscope/Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|
|llama2-7b-chat|[modelscope/Llama-2-7b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|
|llama2-13b|[modelscope/Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)|
|llama2-13b-chat|[modelscope/Llama-2-13b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)|
|llama2-70b|[modelscope/Llama-2-70b-ms](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)|
|llama2-70b-chat|[modelscope/Llama-2-70b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)|
|llama2-7b-aqlm-2bit-1x16|[AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2718;|transformers>=4.38, aqlm, torch>=2.2.0|-|[ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf](https://huggingface.co/ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf)|
|llama3-8b|[LLM-Research/Meta-Llama-3-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)|
|llama3-8b-instruct|[LLM-Research/Meta-Llama-3-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;||-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|
|llama3-8b-instruct-int4|[huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int4](https://modelscope.cn/models/huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|[study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4](https://huggingface.co/study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4)|
|llama3-8b-instruct-int8|[huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int8](https://modelscope.cn/models/huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|[study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8](https://huggingface.co/study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8)|
|llama3-8b-instruct-awq|[huangjintao/Meta-Llama-3-8B-Instruct-AWQ](https://modelscope.cn/models/huangjintao/Meta-Llama-3-8B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|autoawq|-|[study-hjt/Meta-Llama-3-8B-Instruct-AWQ](https://huggingface.co/study-hjt/Meta-Llama-3-8B-Instruct-AWQ)|
|llama3-70b|[LLM-Research/Meta-Llama-3-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-70B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B)|
|llama3-70b-instruct|[LLM-Research/Meta-Llama-3-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-70B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;||-|[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)|
|llama3-70b-instruct-int4|[huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|[study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4)|
|llama3-70b-instruct-int8|[huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|auto_gptq|-|[study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8)|
|llama3-70b-instruct-awq|[huangjintao/Meta-Llama-3-70B-Instruct-AWQ](https://modelscope.cn/models/huangjintao/Meta-Llama-3-70B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|autoawq|-|[study-hjt/Meta-Llama-3-70B-Instruct-AWQ](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-AWQ)|
|chinese-llama-2-1_3b|[AI-ModelScope/chinese-llama-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-1.3b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-1.3b](https://huggingface.co/hfl/chinese-llama-2-1.3b)|
|chinese-llama-2-7b|[AI-ModelScope/chinese-llama-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b](https://huggingface.co/hfl/chinese-llama-2-7b)|
|chinese-llama-2-7b-16k|[AI-ModelScope/chinese-llama-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-16k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b-16k](https://huggingface.co/hfl/chinese-llama-2-7b-16k)|
|chinese-llama-2-7b-64k|[AI-ModelScope/chinese-llama-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-64k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-7b-64k](https://huggingface.co/hfl/chinese-llama-2-7b-64k)|
|chinese-llama-2-13b|[AI-ModelScope/chinese-llama-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-13b](https://huggingface.co/hfl/chinese-llama-2-13b)|
|chinese-llama-2-13b-16k|[AI-ModelScope/chinese-llama-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b-16k/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/chinese-llama-2-13b-16k](https://huggingface.co/hfl/chinese-llama-2-13b-16k)|
|chinese-alpaca-2-1_3b|[AI-ModelScope/chinese-alpaca-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-1.3b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-1.3b](https://huggingface.co/hfl/chinese-alpaca-2-1.3b)|
|chinese-alpaca-2-7b|[AI-ModelScope/chinese-alpaca-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b](https://huggingface.co/hfl/chinese-alpaca-2-7b)|
|chinese-alpaca-2-7b-16k|[AI-ModelScope/chinese-alpaca-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-16k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b-16k](https://huggingface.co/hfl/chinese-alpaca-2-7b-16k)|
|chinese-alpaca-2-7b-64k|[AI-ModelScope/chinese-alpaca-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-64k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-7b-64k](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k)|
|chinese-alpaca-2-13b|[AI-ModelScope/chinese-alpaca-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-13b](https://huggingface.co/hfl/chinese-alpaca-2-13b)|
|chinese-alpaca-2-13b-16k|[AI-ModelScope/chinese-alpaca-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b-16k/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[hfl/chinese-alpaca-2-13b-16k](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k)|
|llama-3-chinese-8b|[ChineseAlpacaGroup/llama-3-chinese-8b](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[hfl/llama-3-chinese-8b](https://huggingface.co/hfl/llama-3-chinese-8b)|
|llama-3-chinese-8b-instruct|[ChineseAlpacaGroup/llama-3-chinese-8b-instruct](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;||-|[hfl/llama-3-chinese-8b-instruct](https://huggingface.co/hfl/llama-3-chinese-8b-instruct)|
|atom-7b|[FlagAlpha/Atom-7B](https://modelscope.cn/models/FlagAlpha/Atom-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B](https://huggingface.co/FlagAlpha/Atom-7B)|
|atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
|yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)|
|yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)|
|yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)|
|yi-6b-chat-awq|[01ai/Yi-6B-Chat-4bits](https://modelscope.cn/models/01ai/Yi-6B-Chat-4bits/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;|autoawq|-|[01-ai/Yi-6B-Chat-4bits](https://huggingface.co/01-ai/Yi-6B-Chat-4bits)|
|yi-6b-chat-int8|[01ai/Yi-6B-Chat-8bits](https://modelscope.cn/models/01ai/Yi-6B-Chat-8bits/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;|auto_gptq|-|[01-ai/Yi-6B-Chat-8bits](https://huggingface.co/01-ai/Yi-6B-Chat-8bits)|
|yi-9b|[01ai/Yi-9B](https://modelscope.cn/models/01ai/Yi-9B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-9B](https://huggingface.co/01-ai/Yi-9B)|
|yi-9b-200k|[01ai/Yi-9B-200K](https://modelscope.cn/models/01ai/Yi-9B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K)|
|yi-34b|[01ai/Yi-34B](https://modelscope.cn/models/01ai/Yi-34B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B)|
|yi-34b-200k|[01ai/Yi-34B-200K](https://modelscope.cn/models/01ai/Yi-34B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-34B-200K](https://huggingface.co/01-ai/Yi-34B-200K)|
|yi-34b-chat|[01ai/Yi-34B-Chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||-|[01-ai/Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat)|
|yi-34b-chat-awq|[01ai/Yi-34B-Chat-4bits](https://modelscope.cn/models/01ai/Yi-34B-Chat-4bits/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;|autoawq|-|[01-ai/Yi-34B-Chat-4bits](https://huggingface.co/01-ai/Yi-34B-Chat-4bits)|
|yi-34b-chat-int8|[01ai/Yi-34B-Chat-8bits](https://modelscope.cn/models/01ai/Yi-34B-Chat-8bits/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;|auto_gptq|-|[01-ai/Yi-34B-Chat-8bits](https://huggingface.co/01-ai/Yi-34B-Chat-8bits)|
|yi-1_5-6b|[01ai/Yi-1.5-6B](https://modelscope.cn/models/01ai/Yi-1.5-6B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-6B](https://huggingface.co/01-ai/Yi-1.5-6B)|
|yi-1_5-6b-chat|[01ai/Yi-1.5-6B-Chat](https://modelscope.cn/models/01ai/Yi-1.5-6B-Chat/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-6B-Chat](https://huggingface.co/01-ai/Yi-1.5-6B-Chat)|
|yi-1_5-9b|[01ai/Yi-1.5-9B](https://modelscope.cn/models/01ai/Yi-1.5-9B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-9B](https://huggingface.co/01-ai/Yi-1.5-9B)|
|yi-1_5-9b-chat|[01ai/Yi-1.5-9B-Chat](https://modelscope.cn/models/01ai/Yi-1.5-9B-Chat/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-9B-Chat](https://huggingface.co/01-ai/Yi-1.5-9B-Chat)|
|yi-1_5-9b-chat-16k|[01ai/Yi-1.5-9B-Chat-16K](https://modelscope.cn/models/01ai/Yi-1.5-9B-Chat-16K/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-9B-Chat-16K](https://huggingface.co/01-ai/Yi-1.5-9B-Chat-16K)|
|yi-1_5-34b|[01ai/Yi-1.5-34B](https://modelscope.cn/models/01ai/Yi-1.5-34B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-34B](https://huggingface.co/01-ai/Yi-1.5-34B)|
|yi-1_5-34b-chat|[01ai/Yi-1.5-34B-Chat](https://modelscope.cn/models/01ai/Yi-1.5-34B-Chat/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-34B-Chat](https://huggingface.co/01-ai/Yi-1.5-34B-Chat)|
|yi-1_5-34b-chat-16k|[01ai/Yi-1.5-34B-Chat-16K](https://modelscope.cn/models/01ai/Yi-1.5-34B-Chat-16K/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;||-|[01-ai/Yi-1.5-34B-Chat-16K](https://huggingface.co/01-ai/Yi-1.5-34B-Chat-16K)|
|yi-1_5-6b-chat-awq-int4|[AI-ModelScope/Yi-1.5-6B-Chat-AWQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-6B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;|autoawq|-|[modelscope/Yi-1.5-6B-Chat-AWQ](https://huggingface.co/modelscope/Yi-1.5-6B-Chat-AWQ)|
|yi-1_5-6b-chat-gptq-int4|[AI-ModelScope/Yi-1.5-6B-Chat-GPTQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-6B-Chat-GPTQ/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[modelscope/Yi-1.5-6B-Chat-GPTQ](https://huggingface.co/modelscope/Yi-1.5-6B-Chat-GPTQ)|
|yi-1_5-9b-chat-awq-int4|[AI-ModelScope/Yi-1.5-9B-Chat-AWQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-9B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;|autoawq|-|[modelscope/Yi-1.5-9B-Chat-AWQ](https://huggingface.co/modelscope/Yi-1.5-9B-Chat-AWQ)|
|yi-1_5-9b-chat-gptq-int4|[AI-ModelScope/Yi-1.5-9B-Chat-GPTQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-9B-Chat-GPTQ/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[modelscope/Yi-1.5-9B-Chat-GPTQ](https://huggingface.co/modelscope/Yi-1.5-9B-Chat-GPTQ)|
|yi-1_5-34b-chat-awq-int4|[AI-ModelScope/Yi-1.5-34B-Chat-AWQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-34B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;|autoawq|-|[modelscope/Yi-1.5-34B-Chat-AWQ](https://huggingface.co/modelscope/Yi-1.5-34B-Chat-AWQ)|
|yi-1_5-34b-chat-gptq-int4|[AI-ModelScope/Yi-1.5-34B-Chat-GPTQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-34B-Chat-GPTQ/summary)|q_proj, k_proj, v_proj|yi1_5|&#x2714;|&#x2714;|auto_gptq>=0.5|-|[modelscope/Yi-1.5-34B-Chat-GPTQ](https://huggingface.co/modelscope/Yi-1.5-34B-Chat-GPTQ)|
|internlm-7b|[Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b)|
|internlm-7b-chat|[Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary)|q_proj, k_proj, v_proj|internlm|&#x2718;|&#x2714;||-|[internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)|
|internlm-7b-chat-8k|[Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary)|q_proj, k_proj, v_proj|internlm|&#x2718;|&#x2714;||-|-|
|internlm-20b|[Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b)|
|internlm-20b-chat|[Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary)|q_proj, k_proj, v_proj|internlm|&#x2718;|&#x2714;||-|[internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-chat-20b)|
|internlm2-1_8b|[Shanghai_AI_Laboratory/internlm2-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-1_8b](https://huggingface.co/internlm/internlm2-1_8b)|
|internlm2-1_8b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-chat-1_8b-sft](https://huggingface.co/internlm/internlm2-chat-1_8b-sft)|
|internlm2-1_8b-chat|[Shanghai_AI_Laboratory/internlm2-chat-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b)|
|internlm2-7b-base|[Shanghai_AI_Laboratory/internlm2-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-base-7b](https://huggingface.co/internlm/internlm2-base-7b)|
|internlm2-7b|[Shanghai_AI_Laboratory/internlm2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-7b](https://huggingface.co/internlm/internlm2-7b)|
|internlm2-7b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-chat-7b-sft](https://huggingface.co/internlm/internlm2-chat-7b-sft)|
|internlm2-7b-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b)|
|internlm2-20b-base|[Shanghai_AI_Laboratory/internlm2-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-base-20b](https://huggingface.co/internlm/internlm2-base-20b)|
|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-20b](https://huggingface.co/internlm/internlm2-20b)|
|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-chat-20b-sft](https://huggingface.co/internlm/internlm2-chat-20b-sft)|
|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|-|[internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)|
|internlm2-math-7b|[Shanghai_AI_Laboratory/internlm2-math-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-7b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|math|[internlm/internlm2-math-base-7b](https://huggingface.co/internlm/internlm2-math-base-7b)|
|internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|math|[internlm/internlm2-math-7b](https://huggingface.co/internlm/internlm2-math-7b)|
|internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation|&#x2714;|&#x2714;|transformers>=4.35|math|[internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b)|
|internlm2-math-20b-chat|[Shanghai_AI_Laboratory/internlm2-math-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-20b/summary)|wqkv|internlm2|&#x2714;|&#x2714;|transformers>=4.35|math|[internlm/internlm2-math-20b](https://huggingface.co/internlm/internlm2-math-20b)|
|deepseek-7b|[deepseek-ai/deepseek-llm-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[deepseek-ai/deepseek-llm-7b-base](https://huggingface.co/deepseek-ai/deepseek-llm-7b-base)|
|deepseek-7b-chat|[deepseek-ai/deepseek-llm-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||-|[deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)|
|deepseek-moe-16b|[deepseek-ai/deepseek-moe-16b-base](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[deepseek-ai/deepseek-moe-16b-base](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base)|
|deepseek-moe-16b-chat|[deepseek-ai/deepseek-moe-16b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||-|[deepseek-ai/deepseek-moe-16b-chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat)|
|deepseek-67b|[deepseek-ai/deepseek-llm-67b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[deepseek-ai/deepseek-llm-67b-base](https://huggingface.co/deepseek-ai/deepseek-llm-67b-base)|
|deepseek-67b-chat|[deepseek-ai/deepseek-llm-67b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||-|[deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)|
|deepseek-coder-1_3b|[deepseek-ai/deepseek-coder-1.3b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||coding|[deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base)|
|deepseek-coder-1_3b-instruct|[deepseek-ai/deepseek-coder-1.3b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||coding|[deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct)|
|deepseek-coder-6_7b|[deepseek-ai/deepseek-coder-6.7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||coding|[deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base)|
|deepseek-coder-6_7b-instruct|[deepseek-ai/deepseek-coder-6.7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||coding|[deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)|
|deepseek-coder-33b|[deepseek-ai/deepseek-coder-33b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||coding|[deepseek-ai/deepseek-coder-33b-base](https://huggingface.co/deepseek-ai/deepseek-coder-33b-base)|
|deepseek-coder-33b-instruct|[deepseek-ai/deepseek-coder-33b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|&#x2714;|&#x2714;||coding|[deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)|
|deepseek-math-7b|[deepseek-ai/deepseek-math-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||math|[deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base)|
|deepseek-math-7b-instruct|[deepseek-ai/deepseek-math-7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||math|[deepseek-ai/deepseek-math-7b-instruct](https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct)|
|deepseek-math-7b-chat|[deepseek-ai/deepseek-math-7b-rl](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-rl/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||math|[deepseek-ai/deepseek-math-7b-rl](https://huggingface.co/deepseek-ai/deepseek-math-7b-rl)|
|deepseek-v2-chat|[deepseek-ai/DeepSeek-V2-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)|
|deepseek-v2-lite|[deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)|
|deepseek-v2-lite-chat|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat)|
|gemma-2b|[AI-ModelScope/gemma-2b](https://modelscope.cn/models/AI-ModelScope/gemma-2b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|
|gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
|gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
|gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)|
|minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|[openbmb/MiniCPM-2B-sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|
|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
|minicpm-2b-128k|[OpenBMB/MiniCPM-2B-128k](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-128k/summary)|q_proj, k_proj, v_proj|chatml|&#x2714;|&#x2714;|transformers>=4.36.0|-|[openbmb/MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k)|
|minicpm-moe-8x2b|[OpenBMB/MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|[openbmb/MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B)|
|openbuddy-llama2-13b-chat|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://huggingface.co/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16)|
|openbuddy-llama3-8b-chat|[OpenBuddy/openbuddy-llama3-8b-v21.1-8k](https://modelscope.cn/models/OpenBuddy/openbuddy-llama3-8b-v21.1-8k/summary)|q_proj, k_proj, v_proj|openbuddy2|&#x2714;|&#x2714;||-|[OpenBuddy/openbuddy-llama3-8b-v21.1-8k](https://huggingface.co/OpenBuddy/openbuddy-llama3-8b-v21.1-8k)|
|openbuddy-llama-65b-chat|[OpenBuddy/openbuddy-llama-65b-v8-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|[OpenBuddy/openbuddy-llama-65b-v8-bf16](https://huggingface.co/OpenBuddy/openbuddy-llama-65b-v8-bf16)|
|openbuddy-llama2-70b-chat|[OpenBuddy/openbuddy-llama2-70b-v10.1-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|[OpenBuddy/openbuddy-llama2-70b-v10.1-bf16](https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16)|
|openbuddy-mistral-7b-chat|[OpenBuddy/openbuddy-mistral-7b-v17.1-32k](https://modelscope.cn/models/OpenBuddy/openbuddy-mistral-7b-v17.1-32k/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;|transformers>=4.34|-|[OpenBuddy/openbuddy-mistral-7b-v17.1-32k](https://huggingface.co/OpenBuddy/openbuddy-mistral-7b-v17.1-32k)|
|openbuddy-zephyr-7b-chat|[OpenBuddy/openbuddy-zephyr-7b-v14.1](https://modelscope.cn/models/OpenBuddy/openbuddy-zephyr-7b-v14.1/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;|transformers>=4.34|-|[OpenBuddy/openbuddy-zephyr-7b-v14.1](https://huggingface.co/OpenBuddy/openbuddy-zephyr-7b-v14.1)|
|openbuddy-deepseek-67b-chat|[OpenBuddy/openbuddy-deepseek-67b-v15.2](https://modelscope.cn/models/OpenBuddy/openbuddy-deepseek-67b-v15.2/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|[OpenBuddy/openbuddy-deepseek-67b-v15.2](https://huggingface.co/OpenBuddy/openbuddy-deepseek-67b-v15.2)|
|openbuddy-mixtral-moe-7b-chat|[OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k](https://modelscope.cn/models/OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;|transformers>=4.36|-|[OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k](https://huggingface.co/OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k)|
|mistral-7b|[AI-ModelScope/Mistral-7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.34|-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)|
|mistral-7b-v2|[AI-ModelScope/Mistral-7B-v0.2-hf](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.2-hf/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.34|-|[alpindale/Mistral-7B-v0.2-hf](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf)|
|mistral-7b-instruct|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)|
|mistral-7b-instruct-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|
|mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.36|-|[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)|
|mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.36|-|[mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)|
|mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2718;|transformers>=4.38, aqlm, torch>=2.2.0|-|[ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://huggingface.co/ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf)|
|mixtral-moe-8x22b-v1|[AI-ModelScope/Mixtral-8x22B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x22B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.36|-|[mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)|
|wizardlm2-7b-awq|[AI-ModelScope/WizardLM-2-7B-AWQ](https://modelscope.cn/models/AI-ModelScope/WizardLM-2-7B-AWQ/summary)|q_proj, k_proj, v_proj|wizardlm2-awq|&#x2714;|&#x2714;|transformers>=4.34|-|[MaziyarPanahi/WizardLM-2-7B-AWQ](https://huggingface.co/MaziyarPanahi/WizardLM-2-7B-AWQ)|
|wizardlm2-8x22b|[AI-ModelScope/WizardLM-2-8x22B](https://modelscope.cn/models/AI-ModelScope/WizardLM-2-8x22B/summary)|q_proj, k_proj, v_proj|wizardlm2|&#x2714;|&#x2714;|transformers>=4.36|-|[alpindale/WizardLM-2-8x22B](https://huggingface.co/alpindale/WizardLM-2-8x22B)|
|baichuan-7b|[baichuan-inc/baichuan-7B](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|-|[baichuan-inc/Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B)|
|baichuan-13b|[baichuan-inc/Baichuan-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|-|[baichuan-inc/Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base)|
|baichuan-13b-chat|[baichuan-inc/Baichuan-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary)|W_pack|baichuan|&#x2718;|&#x2714;|transformers<4.34|-|[baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)|
|baichuan2-7b|[baichuan-inc/Baichuan2-7B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary)|W_pack|default-generation|&#x2718;|&#x2714;||-|[baichuan-inc/Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base)|
|baichuan2-7b-chat|[baichuan-inc/Baichuan2-7B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary)|W_pack|baichuan|&#x2718;|&#x2714;||-|[baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)|
|baichuan2-7b-chat-int4|[baichuan-inc/Baichuan2-7B-Chat-4bits](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary)|W_pack|baichuan|&#x2718;|&#x2718;|bitsandbytes<0.41.2, accelerate<0.26|-|[baichuan-inc/Baichuan2-7B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits)|
|baichuan2-13b|[baichuan-inc/Baichuan2-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary)|W_pack|default-generation|&#x2718;|&#x2714;||-|[baichuan-inc/Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base)|
|baichuan2-13b-chat|[baichuan-inc/Baichuan2-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary)|W_pack|baichuan|&#x2718;|&#x2714;||-|[baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat)|
|baichuan2-13b-chat-int4|[baichuan-inc/Baichuan2-13B-Chat-4bits](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary)|W_pack|baichuan|&#x2718;|&#x2718;|bitsandbytes<0.41.2, accelerate<0.26|-|[baichuan-inc/Baichuan2-13B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits)|
|yuan2-2b-instruct|[YuanLLM/Yuan2.0-2B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary)|q_proj, k_proj, v_proj|yuan|&#x2714;|&#x2718;||-|[IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf)|
|yuan2-2b-janus-instruct|[YuanLLM/Yuan2-2B-Janus-hf](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary)|q_proj, k_proj, v_proj|yuan|&#x2714;|&#x2718;||-|[IEITYuan/Yuan2-2B-Janus-hf](https://huggingface.co/IEITYuan/Yuan2-2B-Janus-hf)|
|yuan2-51b-instruct|[YuanLLM/Yuan2.0-51B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary)|q_proj, k_proj, v_proj|yuan|&#x2714;|&#x2718;||-|[IEITYuan/Yuan2-51B-hf](https://huggingface.co/IEITYuan/Yuan2-51B-hf)|
|yuan2-102b-instruct|[YuanLLM/Yuan2.0-102B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)|q_proj, k_proj, v_proj|yuan|&#x2714;|&#x2718;||-|[IEITYuan/Yuan2-102B-hf](https://huggingface.co/IEITYuan/Yuan2-102B-hf)|
|yuan2-m32|[YuanLLM/Yuan2-M32-hf](https://modelscope.cn/models/YuanLLM/Yuan2-M32-hf/summary)|q_proj, k_proj, v_proj|yuan|&#x2714;|&#x2718;||-|[IEITYuan/Yuan2-M32-hf](https://huggingface.co/IEITYuan/Yuan2-M32-hf)|
|xverse-7b|[xverse/XVERSE-7B](https://modelscope.cn/models/xverse/XVERSE-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[xverse/XVERSE-7B](https://huggingface.co/xverse/XVERSE-7B)|
|xverse-7b-chat|[xverse/XVERSE-7B-Chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary)|q_proj, k_proj, v_proj|xverse|&#x2718;|&#x2714;||-|[xverse/XVERSE-7B-Chat](https://huggingface.co/xverse/XVERSE-7B-Chat)|
|xverse-13b|[xverse/XVERSE-13B](https://modelscope.cn/models/xverse/XVERSE-13B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[xverse/XVERSE-13B](https://huggingface.co/xverse/XVERSE-13B)|
|xverse-13b-chat|[xverse/XVERSE-13B-Chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary)|q_proj, k_proj, v_proj|xverse|&#x2718;|&#x2714;||-|[xverse/XVERSE-13B-Chat](https://huggingface.co/xverse/XVERSE-13B-Chat)|
|xverse-65b|[xverse/XVERSE-65B](https://modelscope.cn/models/xverse/XVERSE-65B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[xverse/XVERSE-65B](https://huggingface.co/xverse/XVERSE-65B)|
|xverse-65b-v2|[xverse/XVERSE-65B-2](https://modelscope.cn/models/xverse/XVERSE-65B-2/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[xverse/XVERSE-65B-2](https://huggingface.co/xverse/XVERSE-65B-2)|
|xverse-65b-chat|[xverse/XVERSE-65B-Chat](https://modelscope.cn/models/xverse/XVERSE-65B-Chat/summary)|q_proj, k_proj, v_proj|xverse|&#x2718;|&#x2714;||-|[xverse/XVERSE-65B-Chat](https://huggingface.co/xverse/XVERSE-65B-Chat)|
|xverse-13b-256k|[xverse/XVERSE-13B-256K](https://modelscope.cn/models/xverse/XVERSE-13B-256K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2714;||-|[xverse/XVERSE-13B-256K](https://huggingface.co/xverse/XVERSE-13B-256K)|
|xverse-moe-a4_2b|[xverse/XVERSE-MoE-A4.2B](https://modelscope.cn/models/xverse/XVERSE-MoE-A4.2B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|[xverse/XVERSE-MoE-A4.2B](https://huggingface.co/xverse/XVERSE-MoE-A4.2B)|
|orion-14b|[OrionStarAI/Orion-14B-Base](https://modelscope.cn/models/OrionStarAI/Orion-14B-Base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2718;||-|[OrionStarAI/Orion-14B-Base](https://huggingface.co/OrionStarAI/Orion-14B-Base)|
|orion-14b-chat|[OrionStarAI/Orion-14B-Chat](https://modelscope.cn/models/OrionStarAI/Orion-14B-Chat/summary)|q_proj, k_proj, v_proj|orion|&#x2714;|&#x2718;||-|[OrionStarAI/Orion-14B-Chat](https://huggingface.co/OrionStarAI/Orion-14B-Chat)|
|bluelm-7b|[vivo-ai/BlueLM-7B-Base](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|[vivo-ai/BlueLM-7B-Base](https://huggingface.co/vivo-ai/BlueLM-7B-Base)|
|bluelm-7b-32k|[vivo-ai/BlueLM-7B-Base-32K](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|[vivo-ai/BlueLM-7B-Base-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Base-32K)|
|bluelm-7b-chat|[vivo-ai/BlueLM-7B-Chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary)|q_proj, k_proj, v_proj|bluelm|&#x2718;|&#x2718;||-|[vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat)|
|bluelm-7b-chat-32k|[vivo-ai/BlueLM-7B-Chat-32K](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)|q_proj, k_proj, v_proj|bluelm|&#x2718;|&#x2718;||-|[vivo-ai/BlueLM-7B-Chat-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K)|
|ziya2-13b|[Fengshenbang/Ziya2-13B-Base](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[IDEA-CCNL/Ziya2-13B-Base](https://huggingface.co/IDEA-CCNL/Ziya2-13B-Base)|
|ziya2-13b-chat|[Fengshenbang/Ziya2-13B-Chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)|q_proj, k_proj, v_proj|ziya|&#x2714;|&#x2714;||-|[IDEA-CCNL/Ziya2-13B-Chat](https://huggingface.co/IDEA-CCNL/Ziya2-13B-Chat)|
|skywork-13b|[skywork/Skywork-13B-base](https://modelscope.cn/models/skywork/Skywork-13B-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|[Skywork/Skywork-13B-base](https://huggingface.co/Skywork/Skywork-13B-base)|
|skywork-13b-chat|[skywork/Skywork-13B-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary)|q_proj, k_proj, v_proj|skywork|&#x2718;|&#x2718;||-|-|
|zephyr-7b-beta-chat|[modelscope/zephyr-7b-beta](https://modelscope.cn/models/modelscope/zephyr-7b-beta/summary)|q_proj, k_proj, v_proj|zephyr|&#x2714;|&#x2714;|transformers>=4.34|-|[HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)|
|polylm-13b|[damo/nlp_polylm_13b_text_generation](https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation/summary)|c_attn|default-generation|&#x2718;|&#x2718;||-|[DAMO-NLP-MT/polylm-13b](https://huggingface.co/DAMO-NLP-MT/polylm-13b)|
|seqgpt-560m|[damo/nlp_seqgpt-560m](https://modelscope.cn/models/damo/nlp_seqgpt-560m/summary)|query_key_value|default-generation|&#x2718;|&#x2714;||-|[DAMO-NLP/SeqGPT-560M](https://huggingface.co/DAMO-NLP/SeqGPT-560M)|
|sus-34b-chat|[SUSTC/SUS-Chat-34B](https://modelscope.cn/models/SUSTC/SUS-Chat-34B/summary)|q_proj, k_proj, v_proj|sus|&#x2714;|&#x2714;||-|[SUSTech/SUS-Chat-34B](https://huggingface.co/SUSTech/SUS-Chat-34B)|
|tongyi-finance-14b|[TongyiFinance/Tongyi-Finance-14B](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B/summary)|c_attn|default-generation|&#x2714;|&#x2714;||financial|-|
|tongyi-finance-14b-chat|[TongyiFinance/Tongyi-Finance-14B-Chat](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B-Chat/summary)|c_attn|qwen|&#x2714;|&#x2714;||financial|[jxy/Tongyi-Finance-14B-Chat](https://huggingface.co/jxy/Tongyi-Finance-14B-Chat)|
|tongyi-finance-14b-chat-int4|[TongyiFinance/Tongyi-Finance-14B-Chat-Int4](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2714;|auto_gptq>=0.5|financial|[jxy/Tongyi-Finance-14B-Chat-Int4](https://huggingface.co/jxy/Tongyi-Finance-14B-Chat-Int4)|
|codefuse-codellama-34b-chat|[codefuse-ai/CodeFuse-CodeLlama-34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary)|q_proj, k_proj, v_proj|codefuse-codellama|&#x2714;|&#x2714;||coding|[codefuse-ai/CodeFuse-CodeLlama-34B](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B)|
|codefuse-codegeex2-6b-chat|[codefuse-ai/CodeFuse-CodeGeeX2-6B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeeX2-6B/summary)|query_key_value|codefuse|&#x2718;|&#x2714;|transformers<4.34|coding|[codefuse-ai/CodeFuse-CodeGeeX2-6B](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeeX2-6B)|
|codefuse-qwen-14b-chat|[codefuse-ai/CodeFuse-QWen-14B](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B/summary)|c_attn|codefuse|&#x2714;|&#x2714;||coding|[codefuse-ai/CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B)|
|phi2-3b|[AI-ModelScope/phi-2](https://modelscope.cn/models/AI-ModelScope/phi-2/summary)|Wqkv|default-generation|&#x2714;|&#x2714;||coding|[microsoft/phi-2](https://huggingface.co/microsoft/phi-2)|
|phi3-4b-4k-instruct|[LLM-Research/Phi-3-mini-4k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-4k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2718;|transformers>=4.36|general|[microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)|
|phi3-4b-128k-instruct|[LLM-Research/Phi-3-mini-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2714;|transformers>=4.36|general|[microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)|
|phi3-small-128k-instruct|[LLM-Research/Phi-3-small-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-small-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2714;|transformers>=4.36|general|[microsoft/Phi-3-small-128k-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)|
|phi3-medium-128k-instruct|[LLM-Research/Phi-3-medium-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-medium-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2714;|transformers>=4.36|general|[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)|
|mamba-130m|[AI-ModelScope/mamba-130m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-130m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf)|
|mamba-370m|[AI-ModelScope/mamba-370m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-370m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-370m-hf](https://huggingface.co/state-spaces/mamba-370m-hf)|
|mamba-390m|[AI-ModelScope/mamba-390m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-390m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-390m-hf](https://huggingface.co/state-spaces/mamba-390m-hf)|
|mamba-790m|[AI-ModelScope/mamba-790m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-790m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-790m-hf](https://huggingface.co/state-spaces/mamba-790m-hf)|
|mamba-1.4b|[AI-ModelScope/mamba-1.4b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-1.4b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-1.4b-hf](https://huggingface.co/state-spaces/mamba-1.4b-hf)|
|mamba-2.8b|[AI-ModelScope/mamba-2.8b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-2.8b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-2.8b-hf](https://huggingface.co/state-spaces/mamba-2.8b-hf)|
|telechat-7b|[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B/summary)|key_value, query|telechat|&#x2714;|&#x2718;||-|[Tele-AI/telechat-7B](https://huggingface.co/Tele-AI/telechat-7B)|
|telechat-12b|[TeleAI/TeleChat-12B](https://modelscope.cn/models/TeleAI/TeleChat-12B/summary)|key_value, query|telechat|&#x2714;|&#x2718;||-|[Tele-AI/TeleChat-12B](https://huggingface.co/Tele-AI/TeleChat-12B)|
|telechat-12b-v2|[TeleAI/TeleChat-12B-v2](https://modelscope.cn/models/TeleAI/TeleChat-12B-v2/summary)|key_value, query|telechat-v2|&#x2714;|&#x2718;||-|[Tele-AI/TeleChat-12B-v2](https://huggingface.co/Tele-AI/TeleChat-12B-v2)|
|telechat-12b-v2-gptq-int4|[swift/TeleChat-12B-V2-GPTQ-Int4](https://modelscope.cn/models/swift/TeleChat-12B-V2-GPTQ-Int4/summary)|key_value, query|telechat-v2|&#x2714;|&#x2718;|auto_gptq>=0.5|-|-|
|grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|[hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1)|
|dbrx-instruct|[AI-ModelScope/dbrx-instruct](https://modelscope.cn/models/AI-ModelScope/dbrx-instruct/summary)|attn.Wqkv|dbrx|&#x2714;|&#x2714;|transformers>=4.36|-|[databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct)|
|dbrx-base|[AI-ModelScope/dbrx-base](https://modelscope.cn/models/AI-ModelScope/dbrx-base/summary)|attn.Wqkv|dbrx|&#x2714;|&#x2714;|transformers>=4.36|-|[databricks/dbrx-base](https://huggingface.co/databricks/dbrx-base)|
|mengzi3-13b-base|[langboat/Mengzi3-13B-Base](https://modelscope.cn/models/langboat/Mengzi3-13B-Base/summary)|q_proj, k_proj, v_proj|mengzi|&#x2714;|&#x2714;||-|[Langboat/Mengzi3-13B-Base](https://huggingface.co/Langboat/Mengzi3-13B-Base)|
|c4ai-command-r-v01|[AI-ModelScope/c4ai-command-r-v01](https://modelscope.cn/models/AI-ModelScope/c4ai-command-r-v01/summary)|q_proj, k_proj, v_proj|c4ai|&#x2714;|&#x2718;|transformers>=4.39.1|-|[CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01)|
|c4ai-command-r-plus|[AI-ModelScope/c4ai-command-r-plus](https://modelscope.cn/models/AI-ModelScope/c4ai-command-r-plus/summary)|q_proj, k_proj, v_proj|c4ai|&#x2714;|&#x2718;|transformers>4.39|-|[CohereForAI/c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus)|
### MLLM
| Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support VLLM | Requires | Tags | HF Model ID |
| --------- | -------- | --------------------------- | ---------------- | ------------------ | ------------ | -------- | ---- | ----------- |
|qwen-vl|[qwen/Qwen-VL](https://modelscope.cn/models/qwen/Qwen-VL/summary)|c_attn|default-generation|&#x2714;|&#x2718;||vision|[Qwen/Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)|
|qwen-vl-chat|[qwen/Qwen-VL-Chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary)|c_attn|qwen|&#x2714;|&#x2718;||vision|[Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)|
|qwen-vl-chat-int4|[qwen/Qwen-VL-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2718;|auto_gptq>=0.5|vision|[Qwen/Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)|
|qwen-audio|[qwen/Qwen-Audio](https://modelscope.cn/models/qwen/Qwen-Audio/summary)|c_attn|qwen-audio-generation|&#x2714;|&#x2718;||audio|[Qwen/Qwen-Audio](https://huggingface.co/Qwen/Qwen-Audio)|
|qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen-audio|&#x2714;|&#x2718;||audio|[Qwen/Qwen-Audio-Chat](https://huggingface.co/Qwen/Qwen-Audio-Chat)|
|glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|self_attention.query_key_value|glm4v|&#x2718;|&#x2718;||vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
|llava1_6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
|llava1_6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|&#x2714;|&#x2718;||vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)|
|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|&#x2714;|&#x2718;||vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
|llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)|
|llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)|
|yi-vl-6b-chat|[01ai/Yi-VL-6B](https://modelscope.cn/models/01ai/Yi-VL-6B/summary)|q_proj, k_proj, v_proj|yi-vl|&#x2714;|&#x2718;|transformers>=4.34|vision|[01-ai/Yi-VL-6B](https://huggingface.co/01-ai/Yi-VL-6B)|
|yi-vl-34b-chat|[01ai/Yi-VL-34B](https://modelscope.cn/models/01ai/Yi-VL-34B/summary)|q_proj, k_proj, v_proj|yi-vl|&#x2714;|&#x2718;|transformers>=4.34|vision|[01-ai/Yi-VL-34B](https://huggingface.co/01-ai/Yi-VL-34B)|
|llava-llama-3-8b-v1_1|[AI-ModelScope/llava-llama-3-8b-v1_1-transformers](https://modelscope.cn/models/AI-ModelScope/llava-llama-3-8b-v1_1-transformers/summary)|q_proj, k_proj, v_proj|llava-llama-instruct|&#x2714;|&#x2718;|transformers>=4.36|vision|[xtuner/llava-llama-3-8b-v1_1-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers)|
|internlm-xcomposer2-7b-chat|[Shanghai_AI_Laboratory/internlm-xcomposer2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary)|wqkv|internlm-xcomposer2|&#x2714;|&#x2718;||vision|[internlm/internlm-xcomposer2-7b](https://huggingface.co/internlm/internlm-xcomposer2-7b)|
|internvl-chat-v1_5|[AI-ModelScope/InternVL-Chat-V1-5](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5/summary)|wqkv|internvl|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)|
|internvl-chat-v1_5-int8|[AI-ModelScope/InternVL-Chat-V1-5-int8](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-int8/summary)|wqkv|internvl|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL-Chat-V1-5-int8](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-int8)|
|mini-internvl-chat-2b-v1_5|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5/summary)|wqkv|internvl|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)|
|mini-internvl-chat-4b-v1_5|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/summary)|qkv_proj|internvl-phi3|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5)|
|deepseek-vl-1_3b-chat|[deepseek-ai/deepseek-vl-1.3b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat/summary)|q_proj, k_proj, v_proj|deepseek-vl|&#x2714;|&#x2718;|attrdict|vision|[deepseek-ai/deepseek-vl-1.3b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-chat)|
|deepseek-vl-7b-chat|[deepseek-ai/deepseek-vl-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek-vl|&#x2714;|&#x2718;|attrdict|vision|[deepseek-ai/deepseek-vl-7b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)|
|paligemma-3b-pt-224|[AI-ModelScope/paligemma-3b-pt-224](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-224/summary)|q_proj, k_proj, v_proj|paligemma|&#x2714;|&#x2718;|transformers>=4.41|vision|[google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224)|
|paligemma-3b-pt-448|[AI-ModelScope/paligemma-3b-pt-448](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-448/summary)|q_proj, k_proj, v_proj|paligemma|&#x2714;|&#x2718;|transformers>=4.41|vision|[google/paligemma-3b-pt-448](https://huggingface.co/google/paligemma-3b-pt-448)|
|paligemma-3b-pt-896|[AI-ModelScope/paligemma-3b-pt-896](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-896/summary)|q_proj, k_proj, v_proj|paligemma|&#x2714;|&#x2718;|transformers>=4.41|vision|[google/paligemma-3b-pt-896](https://huggingface.co/google/paligemma-3b-pt-896)|
|paligemma-3b-mix-224|[AI-ModelScope/paligemma-3b-mix-224](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-mix-224/summary)|q_proj, k_proj, v_proj|paligemma|&#x2714;|&#x2718;|transformers>=4.41|vision|[google/paligemma-3b-mix-224](https://huggingface.co/google/paligemma-3b-mix-224)|
|paligemma-3b-mix-448|[AI-ModelScope/paligemma-3b-mix-448](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-mix-448/summary)|q_proj, k_proj, v_proj|paligemma|&#x2714;|&#x2718;|transformers>=4.41|vision|[google/paligemma-3b-mix-448](https://huggingface.co/google/paligemma-3b-mix-448)|
|minicpm-v-3b-chat|[OpenBMB/MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary)|q_proj, k_proj, v_proj|minicpm-v|&#x2714;|&#x2718;||vision|[openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)|
|minicpm-v-v2-chat|[OpenBMB/MiniCPM-V-2](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary)|q_proj, k_proj, v_proj|minicpm-v|&#x2714;|&#x2718;|timm|vision|[openbmb/MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2)|
|minicpm-v-v2_5-chat|[OpenBMB/MiniCPM-Llama3-V-2_5](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5/summary)|q_proj, k_proj, v_proj|minicpm-v-v2_5|&#x2714;|&#x2718;|timm|vision|[openbmb/MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5)|
|mplug-owl2-chat|[iic/mPLUG-Owl2](https://modelscope.cn/models/iic/mPLUG-Owl2/summary)|q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1|mplug-owl2|&#x2714;|&#x2718;|transformers<4.35, icecream|vision|[MAGAer13/mplug-owl2-llama2-7b](https://huggingface.co/MAGAer13/mplug-owl2-llama2-7b)|
|mplug-owl2_1-chat|[iic/mPLUG-Owl2.1](https://modelscope.cn/models/iic/mPLUG-Owl2.1/summary)|c_attn.multiway.0, c_attn.multiway.1|mplug-owl2|&#x2714;|&#x2718;|transformers<4.35, icecream|vision|[Mizukiluke/mplug_owl_2_1](https://huggingface.co/Mizukiluke/mplug_owl_2_1)|
|phi3-vision-128k-instruct|[LLM-Research/Phi-3-vision-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary)|qkv_proj|phi3-vl|&#x2714;|&#x2718;|transformers>=4.36|vision|[microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)|
|cogvlm-17b-chat|[ZhipuAI/cogvlm-chat](https://modelscope.cn/models/ZhipuAI/cogvlm-chat/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense|cogvlm|&#x2718;|&#x2718;||vision|[THUDM/cogvlm-chat-hf](https://huggingface.co/THUDM/cogvlm-chat-hf)|
|cogvlm2-19b-chat|[ZhipuAI/cogvlm2-llama3-chinese-chat-19B](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense|cogvlm|&#x2718;|&#x2718;||vision|[THUDM/cogvlm2-llama3-chinese-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chinese-chat-19B)|
|cogvlm2-en-19b-chat|[ZhipuAI/cogvlm2-llama3-chat-19B](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chat-19B/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense|cogvlm|&#x2718;|&#x2718;||vision|[THUDM/cogvlm2-llama3-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B)|
|cogagent-18b-chat|[ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent-chat|&#x2718;|&#x2718;|timm|vision|[THUDM/cogagent-chat-hf](https://huggingface.co/THUDM/cogagent-chat-hf)|
|cogagent-18b-instruct|[ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent-instruct|&#x2718;|&#x2718;|timm|vision|[THUDM/cogagent-vqa-hf](https://huggingface.co/THUDM/cogagent-vqa-hf)|
## Datasets
The table below lists the datasets supported by SWIFT:
- Dataset Name: The dataset name registered in SWIFT.
- Dataset ID: The dataset ID on [ModelScope](https://www.modelscope.cn/my/overview).
- Dataset Size: The number of data rows in the dataset.
- Statistic: Dataset statistics, measured in tokens, which helps with adjusting the `max_length` hyperparameter. We concatenate the training and validation sets of the dataset and then compute the statistics, using qwen's tokenizer. Note that different tokenizers produce different statistics; if you want token statistics for another model's tokenizer, you can compute them yourself with a script (see the sketch below).
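The statistics in the table can be reproduced for any tokenizer. Below is a minimal sketch (not the repository's own script): the `dataset.jsonl` file with `query`/`response` fields and the `Qwen/Qwen-7B-Chat` tokenizer ID are illustrative assumptions; adjust both to your data layout and model.

```python
# Minimal sketch for computing token-length statistics of a dataset.
# Assumptions: a local `dataset.jsonl` with `query`/`response` fields and
# the `Qwen/Qwen-7B-Chat` tokenizer are illustrative, not fixed by SWIFT.
import json

import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B-Chat', trust_remote_code=True)

lengths = []
with open('dataset.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        sample = json.loads(line)
        text = sample['query'] + sample.get('response', '')
        lengths.append(len(tokenizer(text)['input_ids']))

lengths = np.array(lengths)
# Same format as the `Statistic (token)` column: mean±std, min, max
print(f'{lengths.mean():.1f}±{lengths.std():.1f}, min={lengths.min()}, max={lengths.max()}')
```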
| Dataset Name | Dataset ID | Subsets | Dataset Size | Statistic (token) | Tags | HF Dataset ID |
| ------------ | ---------- | ------- |------------- | ----------------- | ---- | ------------- |
|🔥ms-bench|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)||316820|346.9±443.2, min=22, max=30960|chat, general, multi-round|-|
|🔥alpaca-en|[AI-ModelScope/alpaca-gpt4-data-en](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-en/summary)||52002|176.2±125.8, min=26, max=740|chat, general|[vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4)|
|🔥alpaca-zh|[AI-ModelScope/alpaca-gpt4-data-zh](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-zh/summary)||48818|162.1±93.9, min=26, max=856|chat, general|[llm-wizard/alpaca-gpt4-data-zh](https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data-zh)|
|multi-alpaca|[damo/nlp_polylm_multialpaca_sft](https://modelscope.cn/datasets/damo/nlp_polylm_multialpaca_sft/summary)|ar<br>de<br>es<br>fr<br>id<br>ja<br>ko<br>pt<br>ru<br>th<br>vi|131867|112.9±50.6, min=26, max=1226|chat, general, multilingual|-|
|instinwild|[wyj123456/instinwild](https://modelscope.cn/datasets/wyj123456/instinwild/summary)|default<br>subset|103695|145.4±60.7, min=28, max=1434|-|-|
|cot-en|[YorickHe/CoT](https://modelscope.cn/datasets/YorickHe/CoT/summary)||74771|122.7±64.8, min=51, max=8320|chat, general|-|
|cot-zh|[YorickHe/CoT_zh](https://modelscope.cn/datasets/YorickHe/CoT_zh/summary)||74771|117.5±70.8, min=43, max=9636|chat, general|-|
|instruct-en|[wyj123456/instruct](https://modelscope.cn/datasets/wyj123456/instruct/summary)||888970|269.1±331.5, min=26, max=7254|chat, general|-|
|firefly-zh|[wyj123456/firefly](https://modelscope.cn/datasets/wyj123456/firefly/summary)||1649399|178.1±260.4, min=26, max=12516|chat, general|-|
|gpt4all-en|[wyj123456/GPT4all](https://modelscope.cn/datasets/wyj123456/GPT4all/summary)||806199|302.7±384.5, min=27, max=7391|chat, general|-|
|sharegpt|[huangjintao/sharegpt](https://modelscope.cn/datasets/huangjintao/sharegpt/summary)|common-zh<br>computer-zh<br>unknow-zh<br>common-en<br>computer-en|96566|933.3±864.8, min=21, max=66412|chat, general, multi-round|-|
|tulu-v2-sft-mixture|[AI-ModelScope/tulu-v2-sft-mixture](https://modelscope.cn/datasets/AI-ModelScope/tulu-v2-sft-mixture/summary)||5119|520.7±437.6, min=68, max=2549|chat, multilingual, general, multi-round|[allenai/tulu-v2-sft-mixture](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture)|
|wikipedia-zh|[AI-ModelScope/wikipedia-cn-20230720-filtered](https://modelscope.cn/datasets/AI-ModelScope/wikipedia-cn-20230720-filtered/summary)||254547|568.4±713.2, min=37, max=78678|text-generation, general, pretrained|[pleisto/wikipedia-cn-20230720-filtered](https://huggingface.co/datasets/pleisto/wikipedia-cn-20230720-filtered)|
|open-orca|[AI-ModelScope/OpenOrca](https://modelscope.cn/datasets/AI-ModelScope/OpenOrca/summary)||994896|382.3±417.4, min=31, max=8740|chat, multilingual, general|-|
|🔥sharegpt-gpt4|[AI-ModelScope/sharegpt_gpt4](https://modelscope.cn/datasets/AI-ModelScope/sharegpt_gpt4/summary)|default<br>V3_format<br>zh_38K_format|72684|1047.6±1313.1, min=22, max=66412|chat, multilingual, general, multi-round, gpt4|-|
|deepctrl-sft|[AI-ModelScope/deepctrl-sft-data](https://modelscope.cn/datasets/AI-ModelScope/deepctrl-sft-data/summary)|default<br>en|14149024|389.8±628.6, min=21, max=626237|chat, general, sft, multi-round|-|
|🔥coig-cqia|[AI-ModelScope/COIG-CQIA](https://modelscope.cn/datasets/AI-ModelScope/COIG-CQIA/summary)|chinese_traditional<br>coig_pc<br>exam<br>finance<br>douban<br>human_value<br>logi_qa<br>ruozhiba<br>segmentfault<br>wiki<br>wikihow<br>xhs<br>zhihu|44694|703.8±654.2, min=33, max=19288|general|-|
|🔥ruozhiba|[AI-ModelScope/ruozhiba](https://modelscope.cn/datasets/AI-ModelScope/ruozhiba/summary)|post-annual<br>title-good<br>title-norm|85658|39.9±13.1, min=21, max=559|pretrain|-|
|long-alpaca-12k|[AI-ModelScope/LongAlpaca-12k](https://modelscope.cn/datasets/AI-ModelScope/LongAlpaca-12k/summary)||11998|9619.0±8295.8, min=36, max=78925|longlora, QA|[Yukang/LongAlpaca-12k](https://huggingface.co/datasets/Yukang/LongAlpaca-12k)|
|🔥ms-agent|[iic/ms_agent](https://modelscope.cn/datasets/iic/ms_agent/summary)||26336|650.9±217.2, min=209, max=2740|chat, agent, multi-round|-|
|🔥ms-agent-for-agentfabric|[AI-ModelScope/ms_agent_for_agentfabric](https://modelscope.cn/datasets/AI-ModelScope/ms_agent_for_agentfabric/summary)|default<br>addition|30000|617.8±199.1, min=251, max=2657|chat, agent, multi-round|-|
|ms-agent-multirole|[iic/MSAgent-MultiRole](https://modelscope.cn/datasets/iic/MSAgent-MultiRole/summary)||9500|447.6±84.9, min=145, max=1101|chat, agent, multi-round, role-play, multi-agent|-|
|🔥toolbench-for-alpha-umi|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|backbone<br>caller<br>planner<br>summarizer|1448337|1439.7±853.9, min=123, max=18467|chat, agent|-|
|damo-agent-zh|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)||386984|956.5±407.3, min=326, max=19001|chat, agent, multi-round|-|
|damo-agent-zh-mini|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)||20845|1326.4±329.6, min=571, max=4304|chat, agent, multi-round|-|
|agent-instruct-all-en|[huangjintao/AgentInstruct_copy](https://modelscope.cn/datasets/huangjintao/AgentInstruct_copy/summary)|alfworld<br>db<br>kg<br>mind2web<br>os<br>webshop|1866|1144.3±635.5, min=206, max=6412|chat, agent, multi-round|-|
|code-alpaca-en|[wyj123456/code_alpaca_en](https://modelscope.cn/datasets/wyj123456/code_alpaca_en/summary)||20016|100.2±60.1, min=29, max=1776|-|[sahil2801/CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)|
|🔥leetcode-python-en|[AI-ModelScope/leetcode-solutions-python](https://modelscope.cn/datasets/AI-ModelScope/leetcode-solutions-python/summary)||2359|727.1±235.9, min=259, max=2146|chat, coding|-|
|🔥codefuse-python-en|[codefuse-ai/CodeExercise-Python-27k](https://modelscope.cn/datasets/codefuse-ai/CodeExercise-Python-27k/summary)||27224|483.6±193.9, min=45, max=3082|chat, coding|-|
|🔥codefuse-evol-instruction-zh|[codefuse-ai/Evol-instruction-66k](https://modelscope.cn/datasets/codefuse-ai/Evol-instruction-66k/summary)||66862|439.6±206.3, min=37, max=2983|chat, coding|-|
|medical-en|[huangjintao/medical_zh](https://modelscope.cn/datasets/huangjintao/medical_zh/summary)|en|117617|257.4±89.1, min=36, max=2564|chat, medical|-|
|medical-zh|[huangjintao/medical_zh](https://modelscope.cn/datasets/huangjintao/medical_zh/summary)|zh|1950972|167.2±219.7, min=26, max=27351|chat, medical|-|
|🔥disc-med-sft-zh|[AI-ModelScope/DISC-Med-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Med-SFT/summary)||441767|354.1±193.1, min=25, max=2231|chat, medical|[Flmc/DISC-Med-SFT](https://huggingface.co/datasets/Flmc/DISC-Med-SFT)|
|lawyer-llama-zh|[AI-ModelScope/lawyer_llama_data](https://modelscope.cn/datasets/AI-ModelScope/lawyer_llama_data/summary)||21476|194.4±91.7, min=27, max=924|chat, law|[Skepsun/lawyer_llama_data](https://huggingface.co/datasets/Skepsun/lawyer_llama_data)|
|tigerbot-law-zh|[AI-ModelScope/tigerbot-law-plugin](https://modelscope.cn/datasets/AI-ModelScope/tigerbot-law-plugin/summary)||55895|109.9±126.4, min=37, max=18878|text-generation, law, pretrained|[TigerResearch/tigerbot-law-plugin](https://huggingface.co/datasets/TigerResearch/tigerbot-law-plugin)|
|🔥disc-law-sft-zh|[AI-ModelScope/DISC-Law-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT/summary)||166758|533.7±495.4, min=30, max=15169|chat, law|[ShengbinYue/DISC-Law-SFT](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT)|
|🔥blossom-math-zh|[AI-ModelScope/blossom-math-v2](https://modelscope.cn/datasets/AI-ModelScope/blossom-math-v2/summary)||10000|169.3±58.7, min=35, max=563|chat, math|[Azure99/blossom-math-v2](https://huggingface.co/datasets/Azure99/blossom-math-v2)|
|school-math-zh|[AI-ModelScope/school_math_0.25M](https://modelscope.cn/datasets/AI-ModelScope/school_math_0.25M/summary)||248480|157.7±72.2, min=33, max=3450|chat, math|[BelleGroup/school_math_0.25M](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)|
|open-platypus-en|[AI-ModelScope/Open-Platypus](https://modelscope.cn/datasets/AI-ModelScope/Open-Platypus/summary)||24926|367.9±254.8, min=30, max=3951|chat, math|[garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)|
|text2sql-en|[AI-ModelScope/texttosqlv2_25000_v2](https://modelscope.cn/datasets/AI-ModelScope/texttosqlv2_25000_v2/summary)||25000|274.6±326.4, min=38, max=1975|chat, sql|[Clinton/texttosqlv2_25000_v2](https://huggingface.co/datasets/Clinton/texttosqlv2_25000_v2)|
|🔥sql-create-context-en|[AI-ModelScope/sql-create-context](https://modelscope.cn/datasets/AI-ModelScope/sql-create-context/summary)||78577|80.2±17.8, min=36, max=456|chat, sql|[b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)|
|🔥advertise-gen-zh|[lvjianjin/AdvertiseGen](https://modelscope.cn/datasets/lvjianjin/AdvertiseGen/summary)||98399|130.6±21.7, min=51, max=241|text-generation|[shibing624/AdvertiseGen](https://huggingface.co/datasets/shibing624/AdvertiseGen)|
|🔥dureader-robust-zh|[modelscope/DuReader_robust-QG](https://modelscope.cn/datasets/modelscope/DuReader_robust-QG/summary)||17899|241.1±137.4, min=60, max=1416|text-generation|-|
|cmnli-zh|[modelscope/clue](https://modelscope.cn/datasets/modelscope/clue/summary)|cmnli|404024|82.6±16.6, min=51, max=199|text-generation, classification|[clue](https://huggingface.co/datasets/clue)|
|🔥jd-sentiment-zh|[DAMO_NLP/jd](https://modelscope.cn/datasets/DAMO_NLP/jd/summary)||50000|66.0±83.2, min=39, max=4039|text-generation, classification|-|
|🔥hc3-zh|[simpleai/HC3-Chinese](https://modelscope.cn/datasets/simpleai/HC3-Chinese/summary)|baike<br>open_qa<br>nlpcc_dbqa<br>finance<br>medicine<br>law<br>psychology|39781|176.8±81.5, min=57, max=3051|text-generation, classification|[Hello-SimpleAI/HC3-Chinese](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)|
|🔥hc3-en|[simpleai/HC3](https://modelscope.cn/datasets/simpleai/HC3/summary)|finance<br>medicine|11021|298.3±138.7, min=65, max=2267|text-generation, classification|[Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)|
|finance-en|[wyj123456/finance_en](https://modelscope.cn/datasets/wyj123456/finance_en/summary)||68911|135.6±134.3, min=26, max=3525|chat, financial|[ssbuild/alpaca_finance_en](https://huggingface.co/datasets/ssbuild/alpaca_finance_en)|
|poetry-zh|[modelscope/chinese-poetry-collection](https://modelscope.cn/datasets/modelscope/chinese-poetry-collection/summary)||390309|55.2±9.4, min=23, max=83|text-generation, poetry|-|
|webnovel-zh|[AI-ModelScope/webnovel_cn](https://modelscope.cn/datasets/AI-ModelScope/webnovel_cn/summary)||50000|1478.9±11526.1, min=100, max=490484|chat, novel|[zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)|
|generated-chat-zh|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|chat, character-dialogue|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
|🔥self-cognition|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition/summary)||134|53.6±18.6, min=29, max=121|chat, self-cognition|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
|cls-fudan-news-zh|[damo/zh_cls_fudan-news](https://modelscope.cn/datasets/damo/zh_cls_fudan-news/summary)||4959|3234.4±2547.5, min=91, max=19548|chat, classification|-|
|ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)||1266|118.3±45.5, min=44, max=223|chat, ner|-|
|coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|299.8±2.8, min=295, max=352|chat, multi-modal, vision|-|
|🔥coco-en-mini|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|40504|299.8±2.6, min=295, max=338|chat, multi-modal, vision|-|
|coco-en-2|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|36.8±2.8, min=32, max=89|chat, multi-modal, vision|-|
|🔥coco-en-2-mini|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|40504|36.8±2.6, min=32, max=75|chat, multi-modal, vision|-|
|capcha-images|[AI-ModelScope/captcha-images](https://modelscope.cn/datasets/AI-ModelScope/captcha-images/summary)||8000|31.0±0.0, min=31, max=31|chat, multi-modal, vision|-|
|aishell1-zh|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||141600|152.2±36.8, min=63, max=419|chat, multi-modal, audio|-|
|🔥aishell1-zh-mini|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||14526|152.2±35.6, min=74, max=359|chat, multi-modal, audio|-|
|hh-rlhf|[AI-ModelScope/hh-rlhf](https://modelscope.cn/datasets/AI-ModelScope/hh-rlhf/summary)|harmless-base<br>helpful-base<br>helpful-online<br>helpful-rejection-sampled|127459|245.4±190.7, min=22, max=1999|rlhf, dpo, pairwise|-|
|🔥hh-rlhf-cn|[AI-ModelScope/hh_rlhf_cn](https://modelscope.cn/datasets/AI-ModelScope/hh_rlhf_cn/summary)|hh_rlhf<br>harmless_base_cn<br>harmless_base_en<br>helpful_base_cn<br>helpful_base_en|355920|171.2±122.7, min=22, max=3078|rlhf, dpo, pairwise|-|
|stack-exchange-paired|[AI-ModelScope/stack-exchange-paired](https://modelscope.cn/datasets/AI-ModelScope/stack-exchange-paired/summary)||4483004|534.5±594.6, min=31, max=56588|hfrl, dpo, pairwise|-|
|shareai-llama3-dpo-zh-en-emoji|[hjh0119/shareAI-Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/hjh0119/shareAI-Llama3-DPO-zh-en-emoji/summary)|default|2449|334.0±162.8, min=36, max=1801|rlhf, dpo, pairwise|-|
|pileval|[huangjintao/pile-val-backup](https://modelscope.cn/datasets/huangjintao/pile-val-backup/summary)||214670|1612.3±8856.2, min=11, max=1208955|text-generation, awq|[mit-han-lab/pile-val-backup](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)|
# VLLM Inference Acceleration and Deployment
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Inference Acceleration](#inference-acceleration)
- [Web-UI Acceleration](#web-ui-acceleration)
- [Deployment](#deployment)
## Environment Preparation
GPU devices: A10, 3090, V100, A100 are all supported.
```bash
# Set pip global mirror (speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
pip install 'ms-swift[llm]' -U
# The vllm version must match your CUDA version, please select the version according to `https://docs.vllm.ai/en/latest/getting_started/installation.html`
pip install vllm
pip install openai -U
# Environment alignment (usually not needed. If you get errors, you can run the code below, the repo uses the latest environment for testing)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
## Inference Acceleration
vllm does not support bnb quantized models. The models supported by vllm can be found in [Supported Models](Supported-models-datasets.md#Models).
### qwen-7b-chat
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_vllm
)
model_type = ModelType.qwen_7b_chat
llm_engine = get_vllm_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Similar to `transformers.GenerationConfig` interface
llm_engine.generation_config.max_new_tokens = 256
request_list = [{'query': 'Hello!'}, {'query': 'Where is the capital of Zhejiang?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
print(f"query: {request['query']}")
print(f"response: {resp['response']}")
history1 = resp_list[1]['history']
request_list = [{'query': 'What delicious food is there', 'history': history1}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
print(f"query: {request['query']}")
print(f"response: {resp['response']}")
print(f"history: {resp['history']}")
"""Out[0]
query: Hello!
response: Hello! I'm happy to be of service. Is there anything I can help you with?
query: Where is the capital of Zhejiang?
response: Hangzhou is the capital of Zhejiang Province.
query: What delicious food is there
response: Hangzhou is a city of gastronomy, with many famous dishes and snacks such as West Lake Vinegar Fish, Dongpo Pork, Beggar's Chicken, etc. In addition, Hangzhou has many snack shops where you can taste a variety of local delicacies.
history: [('Where is the capital of Zhejiang?', 'Hangzhou is the capital of Zhejiang Province.'), ('What delicious food is there', "Hangzhou is a city of gastronomy, with many famous dishes and snacks such as West Lake Vinegar Fish, Dongpo Pork, Beggar's Chicken, etc. In addition, Hangzhou has many snack shops where you can taste a variety of local delicacies.")]
"""
```
### Streaming Output
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_stream_vllm
)
model_type = ModelType.qwen_7b_chat
llm_engine = get_vllm_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Similar to `transformers.GenerationConfig` interface
llm_engine.generation_config.max_new_tokens = 256
request_list = [{'query': 'Hello!'}, {'query': 'Where is the capital of Zhejiang?'}]
gen = inference_stream_vllm(llm_engine, template, request_list)
query_list = [request['query'] for request in request_list]
print(f"query_list: {query_list}")
for resp_list in gen:
response_list = [resp['response'] for resp in resp_list]
print(f'response_list: {response_list}')
history1 = resp_list[1]['history']
request_list = [{'query': 'What delicious food is there', 'history': history1}]
gen = inference_stream_vllm(llm_engine, template, request_list)
query = request_list[0]['query']
print(f"query: {query}")
for resp_list in gen:
response = resp_list[0]['response']
print(f'response: {response}')
history = resp_list[0]['history']
print(f'history: {history}')
"""Out[0]
query_list: ['Hello!', 'Where is the capital of Zhejiang?']
...
response_list: ["Hello! I'm happy to be of service. Is there anything I can help you with?", 'Hangzhou is the capital of Zhejiang Province.']
query: What delicious food is there
...
response: Hangzhou is a city of gastronomy, with many famous dishes and snacks such as West Lake Vinegar Fish, Dongpo Pork, Beggar's Chicken, etc. In addition, Hangzhou has many snack shops where you can taste a variety of local delicacies.
history: [('Where is the capital of Zhejiang?', 'Hangzhou is the capital of Zhejiang Province.'), ('What delicious food is there', "Hangzhou is a city of gastronomy, with many famous dishes and snacks such as West Lake Vinegar Fish, Dongpo Pork, Beggar's Chicken, etc. In addition, Hangzhou has many snack shops where you can taste a variety of local delicacies.")]
"""
```
### chatglm3
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_vllm
)
model_type = ModelType.chatglm3_6b
llm_engine = get_vllm_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Similar to `transformers.GenerationConfig` interface
llm_engine.generation_config.max_new_tokens = 256
request_list = [{'query': 'Hello!'}, {'query': 'Where is the capital of Zhejiang?'}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
print(f"query: {request['query']}")
print(f"response: {resp['response']}")
history1 = resp_list[1]['history']
request_list = [{'query': 'What delicious food is there', 'history': history1}]
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
print(f"query: {request['query']}")
print(f"response: {resp['response']}")
print(f"history: {resp['history']}")
"""Out[0]
query: Hello!
response: Hello, I am an AI assistant. I'm very pleased to serve you! Do you have any questions I can help you answer?
query: Where is the capital of Zhejiang?
response: The capital of Zhejiang is Hangzhou.
query: What delicious food is there
response: Zhejiang has many delicious foods, some of the most famous ones include Longjing Shrimp from Hangzhou, Dongpo Pork, West Lake Vinegar Fish, Beggar's Chicken, etc. In addition, Zhejiang also has many specialty snacks and pastries, such as Tang Yuan and Nian Gao from Ningbo, stir-fried crab and Wenzhou meatballs from Wenzhou, etc.
history: [('Where is the capital of Zhejiang?', 'The capital of Zhejiang is Hangzhou.'), ('What delicious food is there', "Zhejiang has many delicious foods, some of the most famous ones include Longjing Shrimp from Hangzhou, Dongpo Pork, West Lake Vinegar Fish, Beggar's Chicken, etc. In addition, Zhejiang also has many specialty snacks and pastries, such as Tang Yuan and Nian Gao from Ningbo, stir-fried crab and Wenzhou meatballs from Wenzhou, etc.")]
"""
```
### Using CLI
```bash
# qwen
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-7b-chat --infer_backend vllm
# yi
CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-6b-chat --infer_backend vllm
# gptq
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat-int4 --infer_backend vllm
```
### Fine-tuned Models
**Single sample inference**:
For models fine-tuned using LoRA, you need to first [merge-lora](LLM-fine-tuning.md#merge-lora) to generate a complete checkpoint directory.
Models fine-tuned with full parameters can seamlessly use VLLM for inference acceleration.
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_vllm
)
ckpt_dir = 'vx-xxx/checkpoint-100-merged'
model_type = ModelType.qwen_7b_chat
template_type = get_default_template_type(model_type)
llm_engine = get_vllm_engine(model_type, model_id_or_path=ckpt_dir)
tokenizer = llm_engine.hf_tokenizer
template = get_template(template_type, tokenizer)
query = 'Hello'
resp = inference_vllm(llm_engine, template, [{'query': query}])[0]
print(f"response: {resp['response']}")
print(f"history: {resp['history']}")
```
**Using CLI**:
```bash
# merge LoRA incremental weights and use vllm for inference acceleration
# If you need quantization, you can specify `--quant_bits 4`.
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
# Evaluate using dataset
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' \
--infer_backend vllm \
    --load_dataset_config true
# Manual evaluation
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' \
    --infer_backend vllm
```
## Web-UI Acceleration
### Original Models
```bash
CUDA_VISIBLE_DEVICES=0 swift app-ui --model_type qwen-7b-chat --infer_backend vllm
```
### Fine-tuned Models
```bash
# merge LoRA incremental weights and use vllm as backend to build app-ui
# If you need quantization, you can specify `--quant_bits 4`.
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' --infer_backend vllm
```
## Deployment
Swift uses VLLM as the inference backend and is compatible with the OpenAI API style.
For server deployment command line arguments, refer to: [deploy command line arguments](Command-line-parameters.md#deploy-Parameters).
For OpenAI API arguments on the client side, refer to: https://platform.openai.com/docs/api-reference/introduction.
### Original Models
#### qwen-7b-chat
**Server side:**
```bash
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-7b-chat
# Multi-GPU deployment
RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy --model_type qwen-7b-chat --tensor_parallel_size 4
```
**Client side:**
Test:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-7b-chat",
"messages": [{"role": "user", "content": "What to do if I can't fall asleep at night?"}],
"max_tokens": 256,
"temperature": 0
}'
```
Synchronous client interface using swift:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
query = 'Where is the capital of Zhejiang?'
request_config = XRequestConfig(seed=42)
resp = inference_client(model_type, query, request_config=request_config)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
history = [(query, response)]
query = 'What delicious food is there?'
request_config = XRequestConfig(stream=True, seed=42)
stream_resp = inference_client(model_type, query, history, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""Out[0]
model_type: qwen-7b-chat
query: Where is the capital of Zhejiang?
response: The capital of Zhejiang Province is Hangzhou.
query: What delicious food is there?
response: Hangzhou has many delicious foods, such as West Lake Vinegar Fish, Dongpo Pork, Longjing Shrimp, Beggar's Chicken, etc. In addition, Hangzhou also has many specialty snacks, such as West Lake Lotus Root Powder, Hangzhou Xiao Long Bao, Hangzhou You Tiao, etc.
"""
```
Asynchronous client interface using swift:
```python
import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
query = 'Where is the capital of Zhejiang?'
request_config = XRequestConfig(seed=42)
resp = asyncio.run(inference_client_async(model_type, query, request_config=request_config))
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
async def _stream():
global query
history = [(query, response)]
query = 'What delicious food is there?'
request_config = XRequestConfig(stream=True, seed=42)
stream_resp = await inference_client_async(model_type, query, history, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
async for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
asyncio.run(_stream())
"""Out[0]
model_type: qwen-7b-chat
query: Where is the capital of Zhejiang?
response: The capital of Zhejiang Province is Hangzhou.
query: What delicious food is there?
response: Hangzhou has many delicious foods, such as West Lake Vinegar Fish, Dongpo Pork, Longjing Shrimp, Beggar's Chicken, etc. In addition, Hangzhou also has many specialty snacks, such as West Lake Lotus Root Powder, Hangzhou Xiao Long Bao, Hangzhou You Tiao, etc.
"""
```
Using OpenAI (synchronous):
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
query = 'Where is the capital of Zhejiang?'
messages = [{
'role': 'user',
'content': query
}]
resp = client.chat.completions.create(
model=model_type,
messages=messages,
seed=42)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
# Streaming
messages.append({'role': 'assistant', 'content': response})
query = 'What delicious food is there?'
messages.append({'role': 'user', 'content': query})
stream_resp = client.chat.completions.create(
model=model_type,
messages=messages,
stream=True,
seed=42)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""Out[0]
model_type: qwen-7b-chat
query: Where is the capital of Zhejiang?
response: The capital of Zhejiang Province is Hangzhou.
query: What delicious food is there?
response: Hangzhou has many delicious foods, such as West Lake Vinegar Fish, Dongpo Pork, Longjing Shrimp, Beggar's Chicken, etc. In addition, Hangzhou also has many specialty snacks, such as West Lake Lotus Root Powder, Hangzhou Xiao Long Bao, Hangzhou You Tiao, etc.
"""
```
#### qwen-7b
**Server side:**
```bash
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-7b
# Multi-GPU deployment
RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy --model_type qwen-7b --tensor_parallel_size 4
```
**Client side:**
Test:
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-7b",
"prompt": "Zhejiang -> Hangzhou\nAnhui -> Hefei\nSichuan ->",
"max_tokens": 32,
"temperature": 0.1,
"seed": 42
}'
```
Synchronous client interface using swift:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
query = 'Zhejiang -> Hangzhou\nAnhui -> Hefei\nSichuan ->'
request_config = XRequestConfig(max_tokens=32, temperature=0.1, seed=42)
resp = inference_client(model_type, query, request_config=request_config)
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')
request_config.stream = True
stream_resp = inference_client(model_type, query, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()
"""Out[0]
model_type: qwen-7b
query: Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan ->
response: Chengdu
Guangdong -> Guangzhou
Jiangsu -> Nanjing
Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan -> Chengdu
query: Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan ->
response: Chengdu
Guangdong -> Guangzhou
Jiangsu -> Nanjing
Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan -> Chengdu
"""
```
Asynchronous client interface using swift:
```python
import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
query = 'Zhejiang -> Hangzhou\nAnhui -> Hefei\nSichuan ->'
request_config = XRequestConfig(max_tokens=32, temperature=0.1, seed=42)
resp = asyncio.run(inference_client_async(model_type, query, request_config=request_config))
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')
async def _stream():
request_config.stream = True
stream_resp = await inference_client_async(model_type, query, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
async for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()
asyncio.run(_stream())
"""Out[0]
model_type: qwen-7b
query: Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan ->
response: Chengdu
Guangdong -> Guangzhou
Jiangsu -> Nanjing
Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan -> Chengdu
query: Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan ->
response: Chengdu
Guangdong -> Guangzhou
Jiangsu -> Nanjing
Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan -> Chengdu
"""
```
Using OpenAI (synchronous):
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
query = 'Zhejiang -> Hangzhou\nAnhui -> Hefei\nSichuan ->'
kwargs = {'model': model_type, 'prompt': query, 'seed': 42, 'temperature': 0.1, 'max_tokens': 32}
resp = client.completions.create(**kwargs)
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')
# Streaming
stream_resp = client.completions.create(stream=True, **kwargs)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()
"""Out[0]
model_type: qwen-7b
query: Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan ->
response: Chengdu
Guangdong -> Guangzhou
Jiangsu -> Nanjing
Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan -> Chengdu
query: Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan ->
response: Chengdu
Guangdong -> Guangzhou
Jiangsu -> Nanjing
Zhejiang -> Hangzhou
Anhui -> Hefei
Sichuan -> Chengdu
"""
```
### Fine-tuned Models
Server side:
```bash
# merge LoRA incremental weights and deploy
# If you need quantization, you can specify `--quant_bits 4`.
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged'
```
The client-side example code is the same as for the original models.
## Multiple LoRA Deployments
Model deployment now supports serving multiple LoRA adapters, which requires `peft>=0.10.0`. The specific steps are:
- Ensure `merge_lora` is set to `False` during deployment.
- Use the `--lora_modules` argument, which can be referenced in the [command line documentation](Command-line-parameters.md).
- Specify the name of the LoRA tuner in the model field during inference.
Example:
```shell
# Assuming a LoRA model named Kakarot was trained from llama3-8b-instruct
# Server side
swift deploy --ckpt_dir /mnt/ckpt-1000 --infer_backend pt --lora_modules my_tuner=/mnt/my-tuner
# This loads two tuners, one is `default-lora` from `/mnt/ckpt-1000`, and the other is `my_tuner` from `/mnt/my-tuner`
# Client side
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "my-tuner",
"messages": [{"role": "user", "content": "who are you?"}],
"max_tokens": 256,
"temperature": 0
}'
# resp: I am Kakarot...
# If model='llama3-8b-instruct' is specified, it will return "I'm llama3...", which is the response of the original model
```
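As a companion to the curl call above, here is a minimal Python sketch using the OpenAI client (assuming the server started above is listening on `localhost:8000` and the tuner was registered as `my_tuner` via `--lora_modules`):

```python
from openai import OpenAI

client = OpenAI(api_key='EMPTY', base_url='http://localhost:8000/v1')

# List the deployed models/tuners; this should include `default-lora`
# (from `--ckpt_dir`) and `my_tuner` (from `--lora_modules`).
print([model.id for model in client.models.list().data])

# Route the request to a specific LoRA tuner by passing its name in the `model` field.
resp = client.chat.completions.create(
    model='my_tuner',
    messages=[{'role': 'user', 'content': 'who are you?'}],
    max_tokens=256,
    temperature=0)
print(resp.choices[0].message.content)
```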
> [!NOTE]
>
> If the `--ckpt_dir` parameter points to a LoRA checkpoint, its weights are loaded as the `default-lora` tuner, and other tuners need to be loaded manually through `lora_modules`.
## VLLM & LoRA
Models supported by VLLM & LoRA can be viewed at: https://docs.vllm.ai/en/latest/models/supported_models.html
### Setting Up LoRA
```shell
# Experimental environment: 4 * A100
# 4 * 30GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
--model_type llama2-7b-chat \
--dataset self-cognition#500 sharegpt-gpt4:default#1000 \
--logging_steps 5 \
--max_length 4096 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 'Xiao Huang' \
    --model_author ModelScope
```
Convert LoRA from swift format to peft format:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/llama2-7b-chat/vx-xxx/checkpoint-xxx \
--to_peft_format true
```
### Accelerating VLLM Inference
Inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/llama2-7b-chat/vx-xxx/checkpoint-xxx-peft \
--infer_backend vllm \
--vllm_enable_lora true
```
Inference results:
```python
"""
<<< who are you?
I am an artificial intelligence language model developed by ModelScope. I am designed to assist and communicate with users in a helpful and respectful manner. I can answer questions, provide information, and engage in conversation. How can I help you?
"""
```
Single sample inference:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_stream_vllm, LoRARequest, inference_vllm
)
lora_checkpoint = 'output/llama2-7b-chat/vx-xxx/checkpoint-xxx-peft'
lora_request = LoRARequest('default-lora', 1, lora_checkpoint)
model_type = ModelType.llama2_7b_chat
llm_engine = get_vllm_engine(model_type, torch.float16, enable_lora=True,
max_loras=1, max_lora_rank=16)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Interface similar to `transformers.GenerationConfig`
llm_engine.generation_config.max_new_tokens = 256
# using lora
request_list = [{'query': 'who are you?'}]
query = request_list[0]['query']
resp_list = inference_vllm(llm_engine, template, request_list, lora_request=lora_request)
response = resp_list[0]['response']
print(f'query: {query}')
print(f'response: {response}')
# without lora
gen = inference_stream_vllm(llm_engine, template, request_list)
query = request_list[0]['query']
print(f'query: {query}\nresponse: ', end='')
print_idx = 0
for resp_list in gen:
response = resp_list[0]['response']
print(response[print_idx:], end='', flush=True)
print_idx = len(response)
print()
"""
query: who are you?
response: I am an artificial intelligence language model developed by ModelScope. I can understand and respond to text-based questions and prompts, and provide information and assistance on a wide range of topics.
query: who are you?
response: Hello! I'm just an AI assistant, here to help you with any questions or tasks you may have. I'm designed to be helpful, respectful, and honest in my responses, and I strive to provide socially unbiased and positive answers. I'm not a human, but a machine learning model trained on a large dataset of text to generate responses to a wide range of questions and prompts. I'm here to help you in any way I can, while always ensuring that my answers are safe and respectful. Is there anything specific you'd like to know or discuss?
"""
```
### Deployment
**Server**:
```shell
CUDA_VISIBLE_DEVICES=0 swift deploy \
--ckpt_dir output/llama2-7b-chat/vx-xxx/checkpoint-xxx-peft \
--infer_backend vllm \
--vllm_enable_lora true
```
**Client**:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "default-lora",
"messages": [{"role": "user", "content": "who are you?"}],
"max_tokens": 256,
"temperature": 0
}'
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama2-7b-chat",
"messages": [{"role": "user", "content": "who are you?"}],
"max_tokens": 256,
"temperature": 0
}'
```
Output:
```python
"""
{"model":"default-lora","choices":[{"index":0,"message":{"role":"assistant","content":"I am an artificial intelligence language model developed by ModelScope. I am designed to assist and communicate with users in a helpful, respectful, and honest manner. I can answer questions, provide information, and engage in conversation. How can I assist you?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":141,"completion_tokens":53,"total_tokens":194},"id":"chatcmpl-fb95932dcdab4ce68f4be49c9946b306","object":"chat.completion","created":1710820459}
{"model":"llama2-7b-chat","choices":[{"index":0,"message":{"role":"assistant","content":" Hello! I'm just an AI assistant, here to help you with any questions or concerns you may have. I'm designed to provide helpful, respectful, and honest responses, while ensuring that my answers are socially unbiased and positive in nature. I'm not capable of providing harmful, unethical, racist, sexist, toxic, dangerous, or illegal content, and I will always do my best to explain why I cannot answer a question if it does not make sense or is not factually coherent. If I don't know the answer to a question, I will not provide false information. My goal is to assist and provide accurate information to the best of my abilities. Is there anything else I can help you with?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":141,"completion_tokens":163,"total_tokens":304},"id":"chatcmpl-d867a3a52bb7451588d4f73e1df4ba95","object":"chat.completion","created":1710820557}
"""
```
With openai:
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type_list = [model.id for model in client.models.list().data]
print(f'model_type_list: {model_type_list}')
query = 'who are you?'
messages = [{
'role': 'user',
'content': query
}]
resp = client.chat.completions.create(
model='default-lora',
messages=messages,
seed=42)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
# stream
stream_resp = client.chat.completions.create(
model='llama2-7b-chat',
messages=messages,
stream=True,
seed=42)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""Out[0]
model_type_list: ['llama2-7b-chat', 'default-lora']
query: who are you?
response: I am an artificial intelligence language model developed by ModelScope. I am designed to assist and communicate with users in a helpful, respectful, and honest manner. I can answer questions, provide information, and engage in conversation. How can I assist you?
query: who are you?
response: Hello! I'm just an AI assistant, here to help you with any questions or concerns you may have. I'm designed to provide helpful, respectful, and honest responses, while ensuring that my answers are socially unbiased and positive in nature. I'm not capable of providing harmful, unethical, racist, sexist, toxic, dangerous, or illegal content, and I will always do my best to explain why I cannot answer a question if it does not make sense or is not factually coherent. If I don't know the answer to a question, I will not provide false information. Is there anything else I can help you with?
"""
```
## LLM Documentation
### 📚Tutorials!
1. [LLM Inference](LLM-inference.md)
2. [LLM Finetuning](LLM-fine-tuning.md)
3. [DPO Training](DPO.md)
4. [Web-ui Training and Inference](../GetStarted/Web-ui.md)
5. [LLM Evaluation](LLM-eval.md)
6. [LLM Quantization](LLM-quantization.md)
7. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
8. [LLM Experimental](LLM-exp.md)
9. [ORPO Training](ORPO.md)
10. [SimPO Training](SimPO.md)
### ⭐️Best Practices!
1. [Self Cognition Best Practice](Self-cognition-best-practice.md)
2. [Agent Training and Inference Best Practice](Agent-fine-tuning-best-practice.md)
3. [Agent deployment best practice](Agent-deployment-best-practice.md)
4. [Qwen1.5 Best Practice](Qwen1.5-best-practice.md)
5. [NPU Best Practice](NPU-best-practice.md)
6. [Grok-1 Training and Inference Best Practice](Grok-1-best-practice.md)
### 🐔References!
1. [Customization for models and datasets](Customization.md)
2. [Command Line Parameters](Command-line-parameters.md)
3. [Supported models and datasets](Supported-models-datasets.md)
4. [Benchmark](Benchmark.md)
5. [Compatible with the HuggingFace ecosystem](Compat-HF.md)
### 🍀Multi-Modal Best Practices!
Please check: [Multi-Modal Best Practices](../Multi-Modal/index.md)
# CogVLM Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference
Inference with [cogvlm-17b-chat](https://modelscope.cn/models/ZhipuAI/cogvlm-chat/summary):
```shell
# Experimental environment: A100
# 38GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type cogvlm-17b-chat
```
Output: (supports passing local path or URL)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
This image showcases a close-up of a young kitten. The kitten has a fluffy coat with a mix of white, gray, and brown colors. Its eyes are strikingly blue, and it appears to be gazing directly at the viewer. The background is blurred, emphasizing the kitten as the main subject.
--------------------------------------------------
<<< clear
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< clear
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is '1452+45304=45456'.
--------------------------------------------------
<<< clear
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In a world where night and day intertwine,
A boat floats gently, reflecting the moon's shine.
Fireflies dance, their glow a mesmerizing trance,
As the boat sails through a tranquil, enchanted expanse.
"""
```
Example images are shown below:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.cogvlm_17b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: How far is it from each city?
response: From Mata, it is 14 km; from Yangjiang, it is 62 km; and from Guangzhou, it is 293 km.
query: Which city is the farthest?
response: Guangzhou is the farthest city with a distance of 293 km.
"""
```
Example image is shown below:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied to the qkv projections of both the language and vision models. To fine-tune all linear layers, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A100
# 50GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type cogvlm-17b-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json, jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn dialogue, but each conversation can include only one image. Supports local file paths or URLs as input.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
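For convenience, here is a minimal sketch for generating such a jsonl file programmatically (the queries, responses, and image paths below are hypothetical placeholders):

```python
import json

# Hypothetical samples; replace the image paths with real local files or URLs.
samples = [
    {'query': 'Describe this image.',
     'response': 'A kitten with blue eyes.',
     'images': ['/path/to/cat.png']},
    {'query': 'How many sheep are there?',
     'response': 'There are four sheep.',
     'history': [['Describe this image.', 'A pastoral scene with sheep.']],
     'images': ['/path/to/animal.png']},
]

# Write one JSON object per line, matching the jsonl format above.
with open('custom_dataset.jsonl', 'w', encoding='utf-8') as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + '\n')
```

The resulting file can then be registered as a custom dataset following the linked Customization documentation.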
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm-17b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/cogvlm-17b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm-17b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# CogVLM2 Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model link:
- cogvlm2-19b-chat: [https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B/summary](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B/summary)
- cogvlm2-en-19b-chat: [https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chat-19B/summary](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chat-19B/summary)
## Inference
Inference cogvlm2-19b-chat:
```shell
# Experimental environment: A100
# 43GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type cogvlm2-19b-chat
```
Output: (supports passing local path or URL)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
This image features a very young, fluffy kitten with a look of innocent curiosity. The kitten's fur is a mix of white, light brown, and dark gray with distinctive dark stripes that give it a tabby appearance. Its large, round eyes are a striking shade of blue with light reflections, which accentuate its youthful and tender expression. The ears are perky and alert, with a light pink hue inside, adding to the kitten's endearing look.
The kitten's fur is thick and appears to be well-groomed, with a soft, plush texture that suggests it is a breed known for its long, luxurious coat, such as a Maine Coon or a Persian. The white fur around its neck and chest stands out, providing a stark contrast to the darker shades on its back and head.
The background is blurred and warm-toned, providing a soft, neutral environment that ensures the kitten is the central focus of the image. The lighting is gentle, highlighting the kitten's features without casting harsh shadows, which further contributes to the image's warm and comforting ambiance.
Overall, the image captures the essence of a kitten's first year of life, characterized by its inquisitive nature, soft fur, and the endearing charm of youth.
--------------------------------------------------
<<< clear
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< clear
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 46556.
--------------------------------------------------
<<< clear
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
A boat drifts on the calm river,
Surrounded by lush greenery.
The gentle ripples on the water's surface,
Reflect the stars in the sky.
The night is serene and peaceful,
The only sound is the lapping of the waves.
The boat is like a floating island,
Isolated in the vastness of the river.
The stars shine brightly in the sky,
As if watching over the boat.
The lush greenery on the riverbank,
Gives the boat a sense of warmth.
The boat drifts on the river,
Carrying the beauty of nature.
Let us enjoy this moment together,
And feel the tranquility of life.
"""
```
Example images are shown below:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.cogvlm2_19b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: How far is it from each city?
response: To determine the distance from each city, we will need to look at the information provided on the road sign:
1. From "Mata" to "Yangjiang," it is 62 kilometers.
2. From "Yangjiang" to "Guangzhou," it is 293 kilometers.
These distances are indicated in kilometers and are shown for the two cities immediately following on the sign.
query: Which city is the farthest?
response: The farthest city on this sign is Guangzhou, which is 293 kilometers away.
"""
```
Example image is shown below:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied to the qkv projections of both the language and vision models. To fine-tune all linear layers, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A100
# 70GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type cogvlm2-19b-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json, jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn dialogue, but each conversation can include only one image. Supports local file paths or URLs as input.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm2-19b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/cogvlm2-19b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm2-19b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Deepseek-VL Best Practice
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Preparation
```shell
pip install 'ms-swift[llm]' -U
pip install attrdict
```
Model Link:
- deepseek-vl-1_3b-chat: [https://www.modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat/summary](https://www.modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat/summary)
- deepseek-vl-7b-chat: [https://www.modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/summary](https://www.modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/summary)
## Inference
Inference for deepseek-vl-7b-chat:
```shell
# Experimental environment: A100
# 30GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type deepseek-vl-7b-chat
# If you want to run it on 3090, you can infer the 1.3b model
CUDA_VISIBLE_DEVICES=0 swift infer --model_type deepseek-vl-1_3b-chat
```
Output of the 7b model: (supports passing local paths or URLs)
```python
"""
<<< Who are you?
I am an AI language model, designed to understand and generate human-like text based on the input I receive. I am not a human, but I am here to help answer your questions and assist you with any tasks you may have.
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two images?
The image provided is a close-up of a kitten with big blue eyes, looking directly at the camera. The kitten appears to be a domestic cat, specifically a kitten, given its small size and youthful features. The image is a high-resolution, detailed photograph that captures the kitten's facial features and fur texture.
The second image is a cartoon illustration of three sheep standing in a grassy field with mountains in the background. The sheep are white with brown faces and legs, and they have large, round eyes. The illustration is stylized with a flat, two-dimensional appearance, and the colors are bright and vibrant. The sheep are evenly spaced and facing forward, giving the impression of a peaceful, pastoral scene.
The differences between the two images are primarily in their artistic styles and subjects. The first image is a realistic photograph of a kitten, while the second image is a stylized cartoon illustration of sheep. The first image is a photograph with a focus on the kitten's facial features and fur texture, while the second image is a cartoon with a focus on the sheep's characters and the setting. The first image is a single subject, while the second image features multiple subjects in a group.
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>How many sheep are there in the picture?
There are four sheep in the picture.
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>What is the result of the calculation?
The result of the calculation is 1452 + 45304 = 46756.
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>Write a poem based on the content of the picture.
In the tranquil waters, a boat gently floats,
A beacon of light, a lone candle's soft glow.
The night is vast, a canvas of stars above,
A serene scene, a moment of peace, it seems to offer.
The boat, a vessel of wood and a mast so tall,
Carries a passenger, a figure so still.
The water's surface, a mirror of the night sky,
Reflects the boat's silhouette, a sight so divine.
The trees, standing tall, their forms in the distance,
A forest of mystery, a silent chorus.
The stars, scattered like diamonds in the heavens,
Illuminate the night, a celestial dance.
The boat, a symbol of journey and adventure,
In the quiet of the night, it's a sight to behold.
A moment frozen in time, a memory to cherish,
In the picture of the night, a boat on the water.
"""
```
Sample images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.deepseek_vl_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = '<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?'
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?
response: The distance from each city is as follows:
- From "Mata", it is 14 km away.
- From "Yangjiang", it is 62 km away.
- From "Guangzhou", it is 293 km away.
These distances are clearly indicated on the green road sign with white text, providing the necessary information for travelers to gauge the distance to each city from the current location.
query: Which city is the farthest?
response: The farthest city from the current location is "Guangzhou", which is 293 km away. This is indicated by the longest number on the green road sign, which is larger than the distances to the other cities listed.
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?', 'The distance from each city is as follows:\n\n- From "Mata", it is 14 km away.\n- From "Yangjiang", it is 62 km away.\n- From "Guangzhou", it is 293 km away.\n\nThese distances are clearly indicated on the green road sign with white text, providing the necessary information for travelers to gauge the distance to each city from the current location.'], ['Which city is the farthest?', 'The farthest city from the current location is "Guangzhou", which is 293 km away. This is indicated by the longest number on the green road sign, which is larger than the distances to the other cities listed.']]
"""
```
Sample image is as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multi-modal large model fine-tuning usually uses **custom datasets**. Here is a runnable demo:
LoRA fine-tuning:
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM. To fine-tune all linear layers, including those of the vision model, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type deepseek-vl-7b-chat \
    --dataset coco-en-mini
```
Full parameter fine-tuning:
```shell
# Experimental environment: 4 * A100
# 4 * 70GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type deepseek-vl-7b-chat \
--dataset coco-en-mini \
--sft_type full \
--use_flash_attn true \
--deepspeed default-zero2
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. The following is an example of a custom dataset:
(Supports multi-turn conversations; each turn may include multiple images or none. Supports local paths or URLs as input.)
```json
[
{"conversations": [
{"from": "user", "value": "<img>img_path</img>11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "<img>img_path</img>ccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
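Here is a minimal sketch for generating this conversations-style json file (the paths and texts are hypothetical placeholders; images are embedded in the user turns via `<img>...</img>` tags):

```python
import json

# Hypothetical multi-turn sample; the first user turn carries two images.
data = [
    {'conversations': [
        {'from': 'user',
         'value': '<img>/path/to/a.png</img><img>/path/to/b.png</img>Compare these two images.'},
        {'from': 'assistant', 'value': 'The first shows a cat, the second shows sheep.'},
        {'from': 'user', 'value': 'Which image contains more animals?'},
        {'from': 'assistant', 'value': 'The second one.'},
    ]},
]

# The conversations format is stored as a single JSON array.
with open('custom_dataset.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```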
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# GLM4V Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model link:
- glm4v-9b-chat: [https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)
## Inference
Inference glm4v-9b-chat:
```shell
# Experimental environment: A100
# 30GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type glm4v-9b-chat
```
Output: (supports passing local path or URL)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
This is an image of a close-up of a kitten's face. The kitten has a fluffy coat with a mix of grey, white, and brown patches. The fur appears soft and well-groomed, with a gradient of colors that gives the appearance of a watercolor painting. The kitten's ears are perky and pointed, with a light brown inner coloring that matches the fur on its face.
The kitten's eyes are the most striking feature of this image. They are large, round, and a vivid blue, with a hint of green at the edges. The irises are clear and bright, and the pupils are slightly dilated, giving the eyes a lively and attentive look. The white fur around the eyes is well-defined, with a few whisker tufts poking out from the corners.
The kitten's nose is small and pink, with a slightly upturned tip, which is common in many breeds. The whiskers are long and white, and they are spread out symmetrically around the nose and mouth area. The mouth is closed, and the kitten's expression is one of curiosity or alertness.
The background is blurred, with a soft focus on what appears to be a green surface, possibly a plant or a blurred background element that doesn't detract from the kitten's features. The lighting in the image is gentle, with a warm tone that enhances the softness of the kitten's fur and the sparkle in its eyes.
--------------------------------------------------
<<< clear
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< clear
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result of 1452 + 45304 is 46756.
--------------------------------------------------
<<< clear
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In twilight's gentle descent, a boat drifts on glassy waves,
A lone figure stands, a sentinel, amidst the quiet maze.
The forest whispers through the trees, a symphony so serene,
As stars begin to twinkle, painting the sky with specks of meren.
The lantern's soft glow dances on the water's surface fair,
A beacon in the night, a promise of a haven near.
The boat, an ancient vessel, carries tales untold,
Of journeys past and futures bright, a silent witness to the fold.
The air is filled with mystery, the whispers of the wind,
As the boat glides through the night, a dream upon the tide.
The stars above, a celestial ballet, a dance of light and shade,
As the boat carries on its way, through the night, afloat and free.
"""
```
Example images are shown below:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.glm4v_9b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: How far is it from each city?
response: The distance from each city to the next one is as follows:
1. From Mata to Yangjiang: 62 kilometers
2. From Yangjiang to Guangzhou: 293 kilometers
So, the total distance from Mata to Guangzhou is 62 kilometers (to Yangjiang) plus 293 kilometers (from Yangjiang to Guangzhou), which equals 355 kilometers.
query: Which city is the farthest?
response: The city that is the farthest away from the current location, as indicated on the road sign, is Guangzhou. It is 293 kilometers away.
"""
```
Example image is shown below:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, LoRA fine-tuning is applied to the qkv projections of both the language and vision models. To fine-tune all linear layers, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A100
# 40GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type glm4v-9b-chat \
    --dataset coco-en-2-mini
# DDP
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type glm4v-9b-chat \
--dataset coco-en-2-mini#10000 \
    --ddp_find_unused_parameters true
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json, jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn dialogue, but each conversation can only include one image. Supports local file paths or URLs as input.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
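For reference, here is a minimal sketch (not part of swift itself) of writing such a jsonl file with the Python standard library; the file name `train.jsonl` and all field values are placeholders:
```python
import json

# Hypothetical rows following the format above; replace 'image_path' with
# real local file paths or URLs before training.
rows = [
    {'query': 'Describe the image.', 'response': 'A cat.', 'images': ['image_path']},
    {'query': 'EEEEE', 'response': 'FFFFF',
     'history': [['AAAAA', 'BBBBB']], 'images': ['image_path']},
]
with open('train.jsonl', 'w', encoding='utf-8') as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + '\n')
```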
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/glm4v-9b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/glm4v-9b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/glm4v-9b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
## Multi-Modal Documentation
### 📚 Tutorial
1. [MLLM Deployment Documentation](mutlimodal-deployment.md)
### Multi-Modal Best Practice
A single round of dialogue can contain multiple images (or no images):
1. [Qwen-VL Best Practice](qwen-vl-best-practice.md)
2. [Qwen-Audio Best Practice](qwen-audio-best-practice.md)
3. [Deepseek-VL Best Practice](deepseek-vl-best-practice.md)
4. [Internlm-Xcomposer2 Best Practice](internlm-xcomposer2-best-practice.md)
5. [Phi3-Vision Best Practice](phi3-vision-best-practice.md)
A single round of dialogue can only contain one image:
1. [Llava Best Practice](llava-best-practice.md)
2. [Yi-VL Best Practice](yi-vl-best-practice.md)
The entire conversation revolves around one image:
1. [CogVLM Best Practice](cogvlm-best-practice.md), [CogVLM2 Best Practice](cogvlm2-best-practice.md), [GLM4V Best Practice](glm4v-best-practice.md)
2. [MiniCPM-V Best Practice](minicpm-v-best-practice.md)
3. [InternVL-Chat-V1.5 Best Practice](internvl-best-practice.md)
# Internlm-Xcomposer2 Best Practice
## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Preparation
```shell
pip install 'ms-swift[llm]' -U
```
## Inference
Inference for [internlm-xcomposer2-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary):
```shell
# Experimental environment: A10, 3090, V100, ...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internlm-xcomposer2-7b-chat
```
Output: (supports passing local path or URL)
```python
"""
<<< Who are you?
I am your assistant, a language-based artificial intelligence model that can answer your questions.
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What's the difference between these two images?
These two images are different. The first one is a picture of sheep, and the second one is a picture of a cat.
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>How many sheep are there in the picture?
There are 4 sheep in the picture
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>What is the calculation result?
The calculation result is 1452+45304=46756
--------------------------------------------------
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>Write a poem based on the content in the picture
Ripples glisten on the lake's surface, a lone boat drifts.
On the boat, a light illuminates the night,
Speckles of stars reflected in the water.
In the distance, mountains shrouded in mist and clouds,
The starry night sky twinkling endlessly.
The lake is like a mirror, reflections clear,
The little boat passing through, like a poem, like a painting.
"""
```
Sample images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.internlm_xcomposer2_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it to each city?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it to each city?
response: The distance from Ma'anshan to Yangjiang is 62 kilometers, and the distance from Guangzhou to Guangzhou is 293 kilometers.
query: Which city is the farthest?
response: The farthest city is Guangzhou, with a distance of 293 kilometers from Guangzhou.
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it to each city?', ' The distance from Ma'anshan to Yangjiang is 62 kilometers, and the distance from Guangzhou to Guangzhou is 293 kilometers.'], ['Which city is the farthest?', ' The farthest city is Guangzhou, with a distance of 293 kilometers from Guangzhou.']]
"""
```
Sample image is as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Fine-tuning of multimodal large models usually uses **custom datasets**. Here's a demo that can be run directly:
(By default, only the qkv part of the LLM is fine-tuned using LoRA. `--lora_target_modules ALL` is not supported, but full-parameter fine-tuning is.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type internlm-xcomposer2-7b-chat \
    --dataset coco-en-mini
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. Here's an example of a custom dataset:
(Supports multi-turn conversations; each turn can contain multiple images or no images. Supports passing local paths or URLs. This model does not support merge-lora.)
```json
[
{"conversations": [
{"from": "user", "value": "<img>img_path</img>11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "<img>img_path</img>ccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
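If your data is already in the `query`/`response`/`history` style used elsewhere in this document, the following sketch converts it to the `conversations` style above; the field names follow the two examples shown here, and everything else is illustrative:
```python
import json

def to_conversations(row: dict) -> dict:
    """Convert a {query, response, history} sample into the conversations format."""
    turns = row.get('history', []) + [[row['query'], row['response']]]
    conversations = []
    for user_text, assistant_text in turns:
        conversations.append({'from': 'user', 'value': user_text})
        conversations.append({'from': 'assistant', 'value': assistant_text})
    return {'conversations': conversations}

row = {'query': '<img>img_path</img>11111', 'response': '22222', 'history': []}
print(json.dumps(to_conversations(row), ensure_ascii=False))
```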
## Inference After Fine-tuning
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/internlm-xcomposer2-7b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
# InternVL Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference after Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
pip install Pillow
```
## Inference
Inference with [internvl-chat-v1.5](https://www.modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5/summary) and [internvl-chat-v1.5-int8](https://www.modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-int8/summary).
The tutorial below takes `internvl-chat-v1_5` as an example; you can switch to the INT8 version with `--model_type internvl-chat-v1_5-int8`, or select a Mini-InternVL model with `mini-internvl-chat-2b-v1_5` or `mini-internvl-chat-4b-v1_5`.
**Note**
- If you want to use a local model file, add the argument `--model_id_or_path /path/to/model`.
- If your GPU does not support flash attention, use the argument `--use_flash_attn false`. For int8 models, it is necessary to specify `--dtype bf16` during inference, otherwise the output may be garbled.
- The model's configuration specifies a relatively small `max_length` of 2048, which can be modified by setting `--max_length`.
- Memory consumption can be reduced by using the parameter `--gradient_checkpointing true`.
- The InternVL series of models only support training on datasets that include images.
```shell
# Experimental environment: A100
# 55GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5 --dtype bf16 --max_length 4096
# 2*30GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift infer --model_type internvl-chat-v1_5 --dtype bf16 --max_length 4096
```
Output: (supports passing in local path or URL)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
This is a high-resolution image of a kitten. The kitten has striking blue eyes and a fluffy white and grey coat. The fur pattern suggests that it may be a Maine Coon or a similar breed. The kitten's ears are perked up, and it has a curious and innocent expression. The background is blurred, which brings the focus to the kitten's face.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 59,856.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
Token indices sequence length is longer than the specified maximum sequence length for this model (5142 > 4096). Running this sequence through the model will result in indexing errors
In the still of the night,
A lone boat sails on the light.
The stars above, a twinkling sight,
Reflecting in the water's might.
The trees stand tall, a silent guard,
Their leaves rustling in the yard.
The boatman's lantern, a beacon bright,
Guiding him through the night.
The river flows, a gentle stream,
Carrying the boatman's dream.
His journey long, his heart serene,
In the beauty of the scene.
The stars above, a guiding light,
Leading him through the night.
The boatman's journey, a tale to tell,
Of courage, hope, and love as well.
"""
```
Example images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.internvl_chat_v1_5
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
model_kwargs={'device_map': 'auto'})
# for GPUs that do not support flash attention
# model, tokenizer = get_model_tokenizer(model_type, torch.float16,
# model_kwargs={'device_map': 'auto'},
# use_flash_attn = False)
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images) # chat with image
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)  # chat without image
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: How far is it from each city?
response: The distances from the location of the sign to each city are as follows:
- Mata: 14 kilometers
- Yangjiang: 62 kilometers
- Guangzhou: 293 kilometers
These distances are indicated on the road sign in the image.
query: Which city is the farthest?
response: The city that is farthest from the location of the sign is Guangzhou, which is 293 kilometers away.
history: [['How far is it from each city?', 'The distances from the location of the sign to each city are as follows:\n\n- Mata: 14 kilometers\n- Yangjiang: 62 kilometers\n- Guangzhou: 293 kilometers\n\nThese distances are indicated on the road sign in the image. '], ['Which city is the farthest?', 'The city that is farthest from the location of the sign is Guangzhou, which is 293 kilometers away. ']]
"""
```
Example image is as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multimodal large model fine-tuning usually uses **custom datasets**. Here is a demo that can be run directly:
LoRA fine-tuning:
**Note**
- If your GPU does not support flash attention, use the argument `--use_flash_attn false`.
- By default, only the qkv of the LLM part is fine-tuned using LoRA. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.
```shell
# Experimental environment: A100
# 80GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--max_length 4096
# device_map
# Experimental environment: 2*A100...
# 2*43GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--max_length 4096
# ddp + deepspeed-zero2
# Experimental environment: 2*A100...
# 2*80GB GPU memory
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--max_length 4096 \
--deepspeed default-zero2
```
Full parameter fine-tuning:
```shell
# Experimental environment: 4 * A100
# device map
# 4 * 72GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type internvl-chat-v1_5 \
--dataset coco-en-2-mini \
--sft_type full \
--max_length 4096
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json, jsonl formats. Here is an example of a custom dataset:
(Only single-turn dialogue is supported. Each turn of dialogue must contain one image. Local paths or URLs can be passed in.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "images": ["image_path"]}
```
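A small sanity check for the constraints above (single-turn only, exactly one image per sample) might look like the following sketch; `train.jsonl` is a placeholder path:
```python
import json

# Verify that each sample has no multi-turn history and exactly one image.
with open('train.jsonl', encoding='utf-8') as f:
    for i, line in enumerate(f, start=1):
        row = json.loads(line)
        assert not row.get('history'), f'line {i}: multi-turn dialogue is not supported'
        assert len(row.get('images', [])) == 1, f'line {i}: exactly one image is required'
```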
## Inference after Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
--max_length 4096
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx" \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir "output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx-merged" \
--load_dataset_config true \
--max_length 4096
# device map
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--ckpt_dir "output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx-merged" \
--load_dataset_config true \
--max_length 4096
```
# Llava Best Practice
The document corresponds to the following models
| model | model_type |
|-------|------------|
| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1_6-mistral-7b-instruct |
| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1_6-yi-34b-instruct |
|[llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary)|llama3-llava-next-8b|
|[llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary)|llava-next-72b|
|[llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary)|llava-next-110b|
The following practices take `llava-v1.6-mistral-7b` as an example. You can also switch to other models by specifying a different `--model_type`.
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference after Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
## Inference
```shell
# Experimental environment: A100
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-mistral-7b-instruct
# 70GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-yi-34b-instruct
# 4*20GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --model_type llava1_6-yi-34b-instruct
```
Output: (supports passing in local path or URL)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image shows a close-up of a kitten with a soft, blurred background that suggests a natural, outdoor setting. The kitten has a mix of white and gray fur with darker stripes, typical of a tabby pattern. Its eyes are wide open, with a striking blue color that contrasts with the kitten's fur. The kitten's nose is small and pink, and its whiskers are long and white, adding to the kitten's cute and innocent appearance. The lighting in the image is soft and diffused, creating a gentle and warm atmosphere. The focus is sharp on the kitten's face, while the rest of the image is slightly out of focus, which draws attention to the kitten's features.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 14352 + 45304 = 145304.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In the quiet of the night,
A solitary boat takes flight,
Across the water's gentle swell,
Underneath the stars that softly fell.
The boat, a vessel of the night,
Carries but one, a lone delight,
A solitary figure, lost in thought,
In the tranquil calm, they find a wraith.
The stars above, like diamonds bright,
Reflect upon the water's surface light,
Creating a path for the boat's journey,
Guiding through the night with a gentle purity.
The boat, a silent sentinel,
In the stillness, it gently swells,
A vessel of peace and calm,
In the quiet of the night, it carries on.
The figure on board, a soul at ease,
In the serene embrace of nature's peace,
They sail through the night,
Under the watchful eyes of the stars' light.
The boat, a symbol of solitude,
In the vast expanse of the universe's beauty,
A lone journey, a solitary quest,
In the quiet of the night, it finds its rest.
"""
```
Example images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = 'llava1_6-mistral-7b-instruct'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: How far is it from each city?
response: The image shows a road sign indicating the distances to three cities: Mata, Yangjiang, and Guangzhou. The distances are given in kilometers.
- Mata is 14 kilometers away.
- Yangjiang is 62 kilometers away.
- Guangzhou is 293 kilometers away.
Please note that these distances are as the crow flies and do not take into account the actual driving distance due to road conditions, traffic, or other factors.
query: Which city is the farthest?
response: The farthest city listed on the sign is Mata, which is 14 kilometers away.
"""
```
Example image is as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multimodal large model fine-tuning usually uses **custom datasets**. Here is a demo that can be run directly:
LoRA fine-tuning:
(By default, only the qkv of the LLM part is fine-tuned using LoRA. If you want to fine-tune all linear layers including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type llava1_6-mistral-7b-instruct \
    --dataset coco-en-2-mini
# 2*45GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type llava1_6-yi-34b-instruct \
    --dataset coco-en-2-mini
```
Full parameter fine-tuning:
```shell
# Experimental environment: 4 * A100
# 4 * 70GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type llava1_6-mistral-7b-instruct \
--dataset coco-en-2-mini \
--sft_type full \
--deepspeed default-zero2
# 8 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
--model_type llava1_6-yi-34b-instruct \
--dataset coco-en-2-mini \
    --sft_type full
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json, jsonl formats. Here is an example of a custom dataset:
(Only single-turn dialogue is supported. Each turn of dialogue must contain one image. Local paths or URLs can be passed in.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "images": ["image_path"]}
```
## Inference after Fine-tuning
Direct inference:
```shell
model_type="llava1_6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \
--load_dataset_config true
```
**merge-lora** and inference:
```shell
model_type="llava1_6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx-merged" \
--load_dataset_config true
```
# MiniCPM-V Best Practice
This document uses minicpm-v-3b-chat as an example. If you want to use the updated version of the MiniCPM-V multimodal model (v2), you can switch `--model_type minicpm-v-3b-chat` to `--model_type minicpm-v-v2-chat`.
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
pip install 'ms-swift[llm]' -U
```
Model Link:
- minicpm-v-3b-chat: [https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary)
- minicpm-v-v2-chat: [https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary)
## Inference
Inference for [minicpm-v-3b-chat](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary):
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-3b-chat
```
Output: (supports local path or URL input)
```python
"""
<<< Describe this image
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
This image depicts a black and white cat sitting on the floor. The cat looks small, possibly a kitten. Its eyes are wide open, seeming to be observing the surroundings.
--------------------------------------------------
<<< clear
<<< How many sheep are in the image?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the image.
--------------------------------------------------
<<< clear
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 1452 + 4530 = 5982.
--------------------------------------------------
<<< clear
<<< Write a poem based on the image content
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
On the tranquil lake surface, a small boat slowly sails by.
"""
```
Sample images:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single Sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_3b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which is the farthest city?'
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: How far is it from each city?
response: The distance from Guangzhou to Shenzhen is 293 kilometers, while the distance from Shenzhen to Guangzhou is 14 kilometers.
query: Which is the farthest city?
response: The farthest city is Shenzhen. It is located between Guangzhou and Shenzhen, 293 kilometers away from Guangzhou and 14 kilometers away from Shenzhen.
history: [['How far is it from each city?', ' The distance from Guangzhou to Shenzhen is 293 kilometers, while the distance from Shenzhen to Guangzhou is 14 kilometers.'], ['Which is the farthest city?', ' The farthest city is Shenzhen. It is located between Guangzhou and Shenzhen, 293 kilometers away from Guangzhou and 14 kilometers away from Shenzhen.']]
"""
```
Sample image:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Fine-tuning multimodal large models usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, only the qkv part of the LLM is fine-tuned using LoRA. If you want to fine-tune all linear layers, including the vision model, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 10GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type minicpm-v-3b-chat \
    --dataset coco-en-2-mini
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn conversations, but the entire conversation can only contain one image. Supports local path or URL input.)
```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-3b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/minicpm-v-3b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/minicpm-v-3b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
# Multimodal LLM Deployment
## Table of Contents
- [Environment Setup](#environment-setup)
- [qwen-vl-chat](#qwen-vl-chat)
- [yi-vl-6b-chat](#yi-vl-6b-chat)
- [minicpm-v-v2_5-chat](#minicpm-v-v2_5-chat)
- [qwen-vl](#qwen-vl)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
pip install vllm
```
Here we provide examples of four models (selecting smaller-sized models to facilitate experiments): qwen-vl-chat, qwen-vl, yi-vl-6b-chat, and minicpm-v-v2_5-chat. These examples cover the three different types of MLLMs (a single round of dialogue can contain multiple images or none, a single round of dialogue can only contain one image, and the entire dialogue revolves around one image), the differences in their deployment and invocation methods, and the differences between chat and base models within MLLMs.
If you're using qwen-audio-chat, simply replace the `<img>` tag with `<audio>` in the qwen-vl-chat example; a minimal client sketch follows.
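A minimal sketch, assuming a server started with `swift deploy --model_type qwen-audio-chat`, and an `Audio 1:` prefix mirroring the `Picture 1:` convention used below (the audio URL is a placeholder):
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client

model_list = get_model_list_client()
model_type = model_list.data[0].id
# Placeholder audio reference; replace with a real local path or URL.
query = """Audio 1:<audio>audio_path_or_url.wav</audio>
What is said in this audio?"""
resp = inference_client(model_type, query, request_config=XRequestConfig(seed=42))
print(resp.choices[0].message.content)
```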
## qwen-vl-chat
**Server**:
```bash
# Using the original model
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-vl-chat
# Using the fine-tuned LoRA
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx
# Using the fine-tuned Merge LoRA model
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged
```
**Client**:
Test:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-vl-chat",
"messages": [{"role": "user", "content": "Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>\nWhat kind of flower is in the picture and how many are there?"}],
"max_tokens": 256,
"temperature": 0
}'
```
Using swift:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('rose.jpg', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# query = f"""Picture 1:<img>{img_base64}</img>
# What kind of flower is in the picture and how many are there?"""
# use local_path
# query = """Picture 1:<img>rose.jpg</img>
# What kind of flower is in the picture and how many are there?"""
# use url
query = """Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
What kind of flower is in the picture and how many are there?"""
request_config = XRequestConfig(seed=42)
resp = inference_client(model_type, query, request_config=request_config)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
history = [(query, response)]
query = 'Box out the flowers in the picture.'
request_config = XRequestConfig(stream=True, seed=42)
stream_resp = inference_client(model_type, query, history, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: qwen-vl-chat
query: Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
What kind of flower is in the picture and how many are there?
response: There are three roses in the picture.
query: Box out the flowers in the picture.
response: <ref> flowers</ref><box>(33,448),(360,979)</box>
"""
```
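The `<ref>`/`<box>` span in the last response is the model's grounding output. Below is a minimal sketch of converting it to pixel boxes, assuming (unverified here) Qwen-VL's convention of coordinates normalized to a 0-1000 grid:
```python
import re

def parse_boxes(response: str, width: int, height: int):
    """Parse <box>(x1,y1),(x2,y2)</box> spans from a grounding response and
    rescale them to pixel coordinates for an image of the given size,
    assuming coordinates are normalized to a 0-1000 grid."""
    boxes = []
    for x1, y1, x2, y2 in re.findall(
            r'<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>', response):
        boxes.append((int(x1) * width // 1000, int(y1) * height // 1000,
                      int(x2) * width // 1000, int(y2) * height // 1000))
    return boxes

print(parse_boxes('<ref> flowers</ref><box>(33,448),(360,979)</box>', 1024, 768))
# [(33, 344, 368, 751)]
```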
Using openai:
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('rose.jpg', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# query = f"""Picture 1:<img>{img_base64}</img>
# What kind of flower is in the picture and how many are there?"""
# use local_path
# from swift.llm import convert_to_base64
# query = """Picture 1:<img>rose.jpg</img>
# What kind of flower is in the picture and how many are there?"""
# query = convert_to_base64(prompt=query)['prompt']
# use url
query = """Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
What kind of flower is in the picture and how many are there?"""
messages = [{
'role': 'user',
'content': query
}]
resp = client.chat.completions.create(
model=model_type,
messages=messages,
seed=42)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
# Streaming
messages.append({'role': 'assistant', 'content': response})
query = 'Box out the flowers in the picture.'
messages.append({'role': 'user', 'content': query})
stream_resp = client.chat.completions.create(
model=model_type,
messages=messages,
stream=True,
seed=42)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""Out[0]
model_type: qwen-vl-chat
query: Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
What kind of flower is in the picture and how many are there?
response: There are three roses in the picture.
query: Box out the flowers in the picture.
response: <ref> flowers</ref><box>(33,448),(360,979)</box>
"""
```
## yi-vl-6b-chat
**Server side:**
```bash
# Using the original model
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type yi-vl-6b-chat
# Using the fine-tuned LoRA
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx
# Using the fine-tuned Merge LoRA model
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged
```
**Client side:**
Test:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "yi-vl-6b-chat",
"messages": [{"role": "user", "content": "Describe this image."}],
"seed": 42,
"images": ["http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png"]
}'
```
Using swift:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('cat.png', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# images = [img_base64]
# use local_path
# from swift.llm import convert_to_base64
# images = ['cat.png']
# images = convert_to_base64(images=images)['images']
# use url
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
query = 'Describe this image.'
request_config = XRequestConfig(seed=42)
resp = inference_client(model_type, query, images=images, request_config=request_config)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
history = [(query, response)]
query = 'How many sheep are in the picture?'
images.append('http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png')
request_config = XRequestConfig(stream=True, seed=42)
stream_resp = inference_client(model_type, query, history, images=images, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: yi-vl-6b-chat
query: Describe this image.
response: The image captures a moment of tranquility featuring a gray and white kitten. The kitten, with its eyes wide open, is the main subject of the image. Its nose is pink, adding a touch of color to its gray and white fur. The kitten is sitting on a white surface, which contrasts with its gray and white fur. The background is blurred, drawing focus to the kitten. The image does not contain any text. The kitten's position relative to the background suggests it is in the foreground of the image. The image does not contain any other objects or creatures. The kitten appears to be alone in the image. The image does not contain any action, but the kitten's wide-open eyes give a sense of curiosity and alertness. The image does not contain any aesthetic descriptions. The image is a simple yet captivating portrait of a gray and white kitten.
query: How many sheep are in the picture?
response: There are four sheep in the picture.
"""
```
Using openai:
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('cat.png', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# images = [img_base64]
# use local_path
# from swift.llm import convert_to_base64
# images = ['cat.png']
# images = convert_to_base64(images=images)['images']
# use url
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
query = 'Describe this image.'
messages = [{
'role': 'user',
'content': query
}]
resp = client.chat.completions.create(
model=model_type,
messages=messages,
seed=42,
extra_body={'images': images})
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
# Streaming
messages.append({'role': 'assistant', 'content': response})
query = 'How many sheep are in the picture?'
images.append('http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png')
messages.append({'role': 'user', 'content': query})
stream_resp = client.chat.completions.create(
model=model_type,
messages=messages,
stream=True,
seed=42,
extra_body={'images': images})
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: yi-vl-6b-chat
query: Describe this image.
response: The image captures a moment of tranquility featuring a gray and white kitten. The kitten, with its eyes wide open, is the main subject of the image. Its nose is pink, adding a touch of color to its gray and white fur. The kitten is sitting on a white surface, which contrasts with its gray and white fur. The background is blurred, drawing focus to the kitten. The image does not contain any text. The kitten's position relative to the background suggests it is in the foreground of the image. The image does not contain any other objects or creatures. The kitten appears to be alone in the image. The image does not contain any action, but the kitten's wide-open eyes give a sense of curiosity and alertness. The image does not contain any aesthetic descriptions. The image is a simple yet captivating portrait of a gray and white kitten.
query: How many sheep are in the picture?
response: There are four sheep in the picture.
"""
```
## minicpm-v-v2_5-chat
**Server side:**
```bash
# Using the original model
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_5-chat
# Using the fine-tuned LoRA
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx
# Using the fine-tuned Merge LoRA model
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged
```
**Client side:**
Test:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "minicpm-v-v2_5-chat",
"messages": [{"role": "user", "content": "Describe this image."}],
"temperature": 0,
"images": ["http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png"]
}'
```
Using swift:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('cat.png', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# images = [img_base64]
# use local_path
# from swift.llm import convert_to_base64
# images = ['cat.png']
# images = convert_to_base64(images=images)['images']
# use url
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
query = 'Describe this image.'
request_config = XRequestConfig(temperature=0)
resp = inference_client(model_type, query, images=images, request_config=request_config)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
history = [(query, response)]
query = 'How was this picture generated?'
request_config = XRequestConfig(stream=True, temperature=0)
stream_resp = inference_client(model_type, query, history, images=images, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: minicpm-v-v2_5-chat
query: Describe this image.
response: The image is a digital painting of a kitten, which is the main subject. The kitten's fur is rendered with a mix of gray, black, and white, giving it a realistic appearance. Its eyes are wide open, and the expression is one of curiosity or alertness. The background is blurred, which brings the focus entirely on the kitten. The painting style is detailed and lifelike, capturing the essence of a young feline's innocent and playful nature. The image does not convey any specific context or background story beyond the depiction of the kitten itself.
query: How was this picture generated?
response: This picture was generated using digital art techniques. The artist likely used a software program to create the image, manipulating pixels and colors to achieve the detailed and lifelike representation of the kitten. Digital art allows for a high degree of control over the final product, enabling artists to fine-tune details and create realistic textures and shading.
"""
```
Using openai:
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('cat.png', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# images = [img_base64]
# use local_path
# from swift.llm import convert_to_base64
# images = ['cat.png']
# images = convert_to_base64(images=images)['images']
# use url
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
query = 'Describe this image.'
messages = [{
'role': 'user',
'content': query
}]
resp = client.chat.completions.create(
model=model_type,
messages=messages,
temperature=0,
extra_body={'images': images})
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
# Streaming
messages.append({'role': 'assistant', 'content': response})
query = 'How was this picture generated?'
messages.append({'role': 'user', 'content': query})
stream_resp = client.chat.completions.create(
model=model_type,
messages=messages,
stream=True,
temperature=0,
extra_body={'images': images})
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: minicpm-v-v2_5-chat
query: Describe this image.
response: The image is a digital painting of a kitten, which is the main subject. The kitten's fur is rendered with a mix of gray, black, and white, giving it a realistic appearance. Its eyes are wide open, and the expression is one of curiosity or alertness. The background is blurred, which brings the focus entirely on the kitten. The painting style is detailed and lifelike, capturing the essence of a young feline's innocent and playful nature. The image does not convey any specific context or background story beyond the depiction of the kitten itself.
query: How was this picture generated?
response: This picture was generated using digital art techniques. The artist likely used a software program to create the image, manipulating pixels and colors to achieve the detailed and lifelike representation of the kitten. Digital art allows for a high degree of control over the final product, enabling artists to fine-tune details and create realistic textures and shading.
"""
```
## qwen-vl
**Server side:**
```bash
# Using the original model
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-vl
# Using the fine-tuned LoRA
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/qwen-vl/vx-xxx/checkpoint-xxx
# Using the fine-tuned Merge LoRA model
CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/qwen-vl/vx-xxx/checkpoint-xxx-merged
```
**Client side:**
Test:
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-vl",
"prompt": "Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>\nThis is a",
"max_tokens": 32,
"temperature": 0
}'
```
Using swift:
```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('rose.jpg', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# query = f"""Picture 1:<img>{img_base64}</img>
# This is a"""
# use local_path
# query = """Picture 1:<img>rose.jpg</img>
# This is a"""
# use url
query = """Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
This is a"""
request_config = XRequestConfig(seed=42, max_tokens=32)
resp = inference_client(model_type, query, request_config=request_config)
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')
request_config = XRequestConfig(stream=True, seed=42, max_tokens=32)
stream_resp = inference_client(model_type, query, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()
"""
model_type: qwen-vl
query: Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
This is a
response: picture of a bouquet of roses.
query: Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
This is a
response: picture of a bouquet of roses.
"""
```
Using openai:
```python
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')
# use base64
# import base64
# with open('rose.jpg', 'rb') as f:
# img_base64 = base64.b64encode(f.read()).decode('utf-8')
# query = f"""Picture 1:<img>{img_base64}</img>
# This is a"""
# use local_path
# from swift.llm import convert_to_base64
# query = """Picture 1:<img>rose.jpg</img>
# This is a"""
# query = convert_to_base64(prompt=query)['prompt']
# use url
query = """Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
This is a"""
resp = client.completions.create(
model=model_type,
prompt=query,
seed=42)
response = resp.choices[0].text
print(f'query: {query}')
print(f'response: {response}')
# Streaming
stream_resp = client.completions.create(
model=model_type,
prompt=query,
stream=True,
seed=42)
print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
print(chunk.choices[0].text, end='', flush=True)
print()
"""Out[0]
model_type: qwen-vl
query: Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
This is a
response: picture of a bouquet of roses.
query: Picture 1:<img>https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg</img>
This is a
response: picture of a bouquet of roses.
"""
```
# Phi3-Vision Best Practice
## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)
## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model Link:
- phi3-vision-128k-instruct: [https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary](https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary)
## Inference
Inference with phi3-vision-128k-instruct:
```shell
# Experimental environment: A10, 3090, V100, ...
# 16GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type phi3-vision-128k-instruct
```
Output: (supports passing local path or URL)
```python
"""
<<< Who are you?
I am Phi, an AI developed by Microsoft to assist with providing information, answering questions, and helping users find solutions to their queries. How can I assist you today?
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?
The first picture shows a group of four cartoon sheep standing in a field, while the second picture is a close-up of a kitten with a blurred background. The main difference between these two pictures is the subject matter and the setting. The first picture features animals that are typically associated with farm life and agriculture, while the second picture focuses on a domestic animal, a kitten, which is more commonly found in households. Additionally, the first picture has a more peaceful and serene atmosphere, while the second picture has a more intimate and detailed view of the kitten.
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>How many sheep are there in the picture?
There are four sheep in the picture.
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png</img>What is the result of the calculation?
The result of the calculation 1452 + 45304 is 46756.
--------------------------------------------------
<<< clear
<<< <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>Write a poem based on the content of the picture.
In the tranquil night, a boat sails,
Through the darkened river, it sets sail.
A single candle flickers, casting light,
Guiding the way through the endless night.
The stars above, like diamonds bright,
Gleam down upon the boat's gentle flight.
The moon, a silent guardian in the sky,
Watches over the boat as it sails by.
The river, a mirror to the night,
Reflects the boat's journey, a beautiful sight.
The trees on either side, standing tall,
Whisper secrets to the boat, one and all.
In the stillness of the night, a sense of peace,
The boat, the river, the trees, all in their place.
A moment frozen in time, a scene so serene,
A journey through the night, a dream so unseen.
"""
```
Sample images are as follows:
cat:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">
animal:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">
math:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">
poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample Inference**
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.phi3_vision_128k_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')
# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?
response: The distances are as follows: Mata is 14km away, Yangjiang is 62km away, and Guangzhou is 293km away.
query: Which city is the farthest?
response: Guangzhou is the farthest city, located 293km away.
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>How far is it from each city?', 'The distances are as follows: Mata is 14km away, Yangjiang is 62km away, and Guangzhou is 293km away.'], ['Which city is the farthest?', 'Guangzhou is the farthest city, located 293km away.']]
"""
```
Sample image is as follows:
road:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">
## Fine-tuning
Multimodal large model fine-tuning usually uses **custom datasets**. Here is a demo that can be run directly:
(By default, only the qkv of the LLM part is fine-tuned using LoRA. If you want to fine-tune all linear modules, including the vision model part, you can specify `--lora_target_modules ALL`. Full-parameter fine-tuning is also supported.)
```shell
# Experimental environment: A10, 3090, V100, ...
# 16GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type phi3-vision-128k-instruct \
    --dataset coco-en-mini
```
[Custom datasets](../LLM/Customization.md#-Recommended-Command-line-arguments) support json and jsonl formats. Here is an example of a custom dataset:
(Supports multi-turn dialogue; each turn can contain multiple images or no images. Supports local paths or URLs as input.)
```json
[
{"conversations": [
{"from": "user", "value": "<img>img_path</img>11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "<img>img_path</img>ccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
```
## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/phi3-vision-128k-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```
**merge-lora** and infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/phi3-vision-128k-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true --safe_serialization false
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/phi3-vision-128k-instruct/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```