Commit c3a3c678 authored by chenych

Update to v0.9.3

parent 1bc2def5
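...@@ -32,6 +32,9 @@ LLaMA Factory is a framework for large language model training and inference, supporting ModelScope ...@@ -32,6 +32,9 @@ LLaMA Factory is a framework for large language model training and inference, supporting ModelScope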
| [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 | | [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 |
| [Llama 3-3.3](https://huggingface.co/meta-llama) | 1B/3B/8B/70B | llama3 | | [Llama 3-3.3](https://huggingface.co/meta-llama) | 1B/3B/8B/70B | llama3 |
| [Llama 4](https://huggingface.co/meta-llama) | 109B/402B | llama4 | | [Llama 4](https://huggingface.co/meta-llama) | 109B/402B | llama4 |
| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
| [OLMo](https://hf-mirror.com/allenai) | 1B/7B | olmo | | [OLMo](https://hf-mirror.com/allenai) | 1B/7B | olmo |
| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen | | [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
| [Qwen3 (MoE)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/235B | qwen3 | | [Qwen3 (MoE)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/235B | qwen3 |
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
[![GitHub contributors](https://img.shields.io/github/contributors/hiyouga/LLaMA-Factory?color=orange)](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors) [![GitHub contributors](https://img.shields.io/github/contributors/hiyouga/LLaMA-Factory?color=orange)](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors)
[![GitHub workflow](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml/badge.svg)](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml) [![GitHub workflow](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml/badge.svg)](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml)
[![PyPI](https://img.shields.io/pypi/v/llamafactory)](https://pypi.org/project/llamafactory/) [![PyPI](https://img.shields.io/pypi/v/llamafactory)](https://pypi.org/project/llamafactory/)
[![Citation](https://img.shields.io/badge/citation-544-green)](https://scholar.google.com/scholar?cites=12620864006390196564) [![Citation](https://img.shields.io/badge/citation-614-green)](https://scholar.google.com/scholar?cites=12620864006390196564)
[![Docker Pulls](https://img.shields.io/docker/pulls/hiyouga/llamafactory)](https://hub.docker.com/r/hiyouga/llamafactory/tags) [![Docker Pulls](https://img.shields.io/docker/pulls/hiyouga/llamafactory)](https://hub.docker.com/r/hiyouga/llamafactory/tags)
[![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai) [![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai)
...@@ -14,6 +14,7 @@ ...@@ -14,6 +14,7 @@
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing)
[![Open in DSW](https://gallery.pai-ml.com/assets/open-in-dsw.svg)](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) [![Open in DSW](https://gallery.pai-ml.com/assets/open-in-dsw.svg)](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory)
[![Open in Alaya](assets/alaya_new.svg)](https://docs.alayanew.com/docs/documents/newActivities/llamafactory/?utm_source=LLaMA-Factory)
[![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/hiyouga/LLaMA-Board) [![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/hiyouga/LLaMA-Board)
[![Open in Studios](https://img.shields.io/badge/ModelScope-Open%20in%20Studios-blue)](https://modelscope.cn/studios/hiyouga/LLaMA-Board) [![Open in Studios](https://img.shields.io/badge/ModelScope-Open%20in%20Studios-blue)](https://modelscope.cn/studios/hiyouga/LLaMA-Board)
[![Open in Novita](https://img.shields.io/badge/Novita-Deploy%20Template-blue)](https://novita.ai/templates-library/105981?sharer=88115474-394e-4bda-968e-b88e123d0c47) [![Open in Novita](https://img.shields.io/badge/Novita-Deploy%20Template-blue)](https://novita.ai/templates-library/105981?sharer=88115474-394e-4bda-968e-b88e123d0c47)
...@@ -40,7 +41,7 @@ ...@@ -40,7 +41,7 @@
</div> </div>
👋 Join our [WeChat](assets/wechat.jpg) or [NPU user group](assets/wechat_npu.jpg). 👋 Join our [WeChat group](assets/wechat.jpg), [NPU user group](assets/wechat_npu.jpg) or [Alaya NeW user group](assets/wechat_alaya.png).
\[ English | [中文](README_zh.md) \] \[ English | [中文](README_zh.md) \]
...@@ -54,6 +55,7 @@ Choose your path: ...@@ -54,6 +55,7 @@ Choose your path:
- **Colab (free)**: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing - **Colab (free)**: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
- **Local machine**: Please refer to [usage](#getting-started) - **Local machine**: Please refer to [usage](#getting-started)
- **PAI-DSW (free trial)**: https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory - **PAI-DSW (free trial)**: https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory
- **Alaya NeW (cloud GPU deal)**: https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory
> [!NOTE] > [!NOTE]
> Except for the above links, all other websites are unauthorized third-party websites. Please use them with caution. > Except for the above links, all other websites are unauthorized third-party websites. Please use them with caution.
...@@ -103,12 +105,14 @@ Choose your path: ...@@ -103,12 +105,14 @@ Choose your path:
## Blogs ## Blogs
- [A One-Stop Code-Free Model Reinforcement Learning and Deployment Platform based on LLaMA-Factory and EasyR1](https://aws.amazon.com/cn/blogs/china/building-llm-model-hub-based-on-llamafactory-and-easyr1/) (Chinese)
- [Fine-tune Qwen2.5-VL for Autonomous Driving using LLaMA-Factory](https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory) (Chinese)
- [How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod](https://aws.amazon.com/cn/blogs/machine-learning/how-apoidea-group-enhances-visual-information-extraction-from-banking-documents-with-multimodal-models-using-llama-factory-on-amazon-sagemaker-hyperpod/) (English) - [How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod](https://aws.amazon.com/cn/blogs/machine-learning/how-apoidea-group-enhances-visual-information-extraction-from-banking-documents-with-multimodal-models-using-llama-factory-on-amazon-sagemaker-hyperpod/) (English)
- [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g) (English) - [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g) (English)
- [LLaMA Factory: Fine-tuning the DeepSeek-R1-Distill-Qwen-7B Model for News Classifier](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b) (Chinese)
<details><summary>All Blogs</summary> <details><summary>All Blogs</summary>
- [LLaMA Factory: Fine-tuning the DeepSeek-R1-Distill-Qwen-7B Model for News Classifier](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b) (Chinese)
- [A One-Stop Code-Free Model Fine-Tuning \& Deployment Platform based on SageMaker and LLaMA-Factory](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/) (Chinese) - [A One-Stop Code-Free Model Fine-Tuning \& Deployment Platform based on SageMaker and LLaMA-Factory](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/) (Chinese)
- [LLaMA Factory Multi-Modal Fine-Tuning Practice: Fine-Tuning Qwen2-VL for Personal Tourist Guide](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) (Chinese) - [LLaMA Factory Multi-Modal Fine-Tuning Practice: Fine-Tuning Qwen2-VL for Personal Tourist Guide](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) (Chinese)
- [LLaMA Factory: Fine-tuning the LLaMA3 Model for Role-Playing](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) (Chinese) - [LLaMA Factory: Fine-tuning the LLaMA3 Model for Role-Playing](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) (Chinese)
...@@ -277,7 +281,7 @@ Choose your path: ...@@ -277,7 +281,7 @@ Choose your path:
| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next | | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video | | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo | | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
| [MiniCPM](https://huggingface.co/openbmb) | 1B/2B/4B | cpm/cpm3 | | [MiniCPM](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
| [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v | | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral | | [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral | | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
...@@ -414,7 +418,7 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t ...@@ -414,7 +418,7 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
- [DPO mixed (en&zh)](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k) - [DPO mixed (en&zh)](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k)
- [UltraFeedback (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) - [UltraFeedback (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
- [COIG-P (en&zh)](https://huggingface.co/datasets/m-a-p/COIG-P) - [COIG-P (zh)](https://huggingface.co/datasets/m-a-p/COIG-P)
- [RLHF-V (en)](https://huggingface.co/datasets/openbmb/RLHF-V-Dataset) - [RLHF-V (en)](https://huggingface.co/datasets/openbmb/RLHF-V-Dataset)
- [VLFeedback (en)](https://huggingface.co/datasets/Zhihui/VLFeedback) - [VLFeedback (en)](https://huggingface.co/datasets/Zhihui/VLFeedback)
- [RLAIF-V (en)](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset) - [RLAIF-V (en)](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset)
...@@ -490,6 +494,8 @@ Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel ...@@ -490,6 +494,8 @@ Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel
docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
``` ```
This image is built on Ubuntu 22.04 (x86\_64), CUDA 12.4, Python 3.11, PyTorch 2.6.0, and Flash-attn 2.7.4.
Find the pre-built images: https://hub.docker.com/r/hiyouga/llamafactory/tags Find the pre-built images: https://hub.docker.com/r/hiyouga/llamafactory/tags
Please refer to [build docker](#build-docker) to build the image yourself. Please refer to [build docker](#build-docker) to build the image yourself.
...@@ -677,11 +683,6 @@ docker build -f ./docker/docker-cuda/Dockerfile \ ...@@ -677,11 +683,6 @@ docker build -f ./docker/docker-cuda/Dockerfile \
-t llamafactory:latest . -t llamafactory:latest .
docker run -dit --ipc=host --gpus=all \ docker run -dit --ipc=host --gpus=all \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./shared_data:/app/shared_data \
-v ./output:/app/output \
-p 7860:7860 \ -p 7860:7860 \
-p 8000:8000 \ -p 8000:8000 \
--name llamafactory \ --name llamafactory \
...@@ -699,11 +700,6 @@ docker build -f ./docker/docker-npu/Dockerfile \ ...@@ -699,11 +700,6 @@ docker build -f ./docker/docker-npu/Dockerfile \
-t llamafactory:latest . -t llamafactory:latest .
docker run -dit --ipc=host \ docker run -dit --ipc=host \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./shared_data:/app/shared_data \
-v ./output:/app/output \
-v /usr/local/dcmi:/usr/local/dcmi \ -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
...@@ -729,11 +725,6 @@ docker build -f ./docker/docker-rocm/Dockerfile \ ...@@ -729,11 +725,6 @@ docker build -f ./docker/docker-rocm/Dockerfile \
-t llamafactory:latest . -t llamafactory:latest .
docker run -dit --ipc=host \ docker run -dit --ipc=host \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./shared_data:/app/shared_data \
-v ./output:/app/output \
-p 7860:7860 \ -p 7860:7860 \
-p 8000:8000 \ -p 8000:8000 \
--device /dev/kfd \ --device /dev/kfd \
...@@ -746,12 +737,14 @@ docker exec -it llamafactory bash ...@@ -746,12 +737,14 @@ docker exec -it llamafactory bash
</details> </details>
<details><summary>Details about volume</summary> <details><summary>Use Docker volumes</summary>
- `hf_cache`: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory. You can uncomment `VOLUME [ "/root/.cache/huggingface", "/app/shared_data", "/app/output" ]` in the Dockerfile to use data volumes.
- `ms_cache`: Similar to Hugging Face cache but for ModelScope users.
- `om_cache`: Similar to Hugging Face cache but for Modelers users. When running the Docker container, use the `-v ./hf_cache:/root/.cache/huggingface` argument to mount the local directory into the container. The following data volumes are available.
- `shared_data`: Place datasets on this dir of the host machine so that they can be selected on LLaMA Board GUI.
- `hf_cache`: Utilize Hugging Face cache on the host machine.
- `shared_data`: The directory to store datasets on the host machine.
- `output`: Set export dir to this location so that the merged result can be accessed directly on the host machine. - `output`: Set export dir to this location so that the merged result can be accessed directly on the host machine.
</details> </details>
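The volume description above can be sketched as a small dry-run helper. This is a minimal sketch, not the project's own tooling: the function name `build_run_cmd` and its `host_root` argument are hypothetical, and the trailing image name `llamafactory:latest` is assumed from the `docker build -t llamafactory:latest .` step. The container paths are the three listed in the `VOLUME` line. The helper only prints the command, so it can be inspected before anything is started.

```shell
#!/bin/sh
# Sketch only: print (do not execute) a `docker run` command that bind-mounts
# the three host directories described above.
# build_run_cmd and host_root are hypothetical names used for illustration.
build_run_cmd() {
    host_root="${1:-.}"   # host directory that holds hf_cache, shared_data, output
    printf '%s' "docker run -dit --ipc=host --gpus=all"
    # printf repeats the format once per argument, giving one -v flag per mount
    printf ' -v %s' \
        "$host_root/hf_cache:/root/.cache/huggingface" \
        "$host_root/shared_data:/app/shared_data" \
        "$host_root/output:/app/output"
    # Image name assumed from the build step; adjust if you tagged it differently
    printf ' %s\n' "-p 7860:7860 -p 8000:8000 --name llamafactory llamafactory:latest"
}

# Print the command for mounts relative to the current directory
build_run_cmd .
```

Review the printed command and run it manually once the paths look right. Passing another directory (for example `build_run_cmd /data`) points all three mounts at that location.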
...@@ -901,6 +894,7 @@ If you have a project that should be incorporated, please contact via email or c ...@@ -901,6 +894,7 @@ If you have a project that should be incorporated, please contact via email or c
1. Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [[paper]](https://aclanthology.org/2024.finnlp-2.1/) 1. Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [[paper]](https://aclanthology.org/2024.finnlp-2.1/)
1. Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [[arxiv]](https://arxiv.org/abs/2408.08072) 1. Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [[arxiv]](https://arxiv.org/abs/2408.08072)
1. Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [[paper]](https://dl.acm.org/doi/10.1145/3627673.3679611) 1. Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [[paper]](https://dl.acm.org/doi/10.1145/3627673.3679611)
1. Zhang et al. CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling. ACL 2024. [[paper]](https://aclanthology.org/2024.findings-acl.830.pdf)
1. **[StarWhisper](https://github.com/Yu-Yang-Li/StarWhisper)**: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B. 1. **[StarWhisper](https://github.com/Yu-Yang-Li/StarWhisper)**: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
1. **[DISC-LawLLM](https://github.com/FudanDISC/DISC-LawLLM)**: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge. 1. **[DISC-LawLLM](https://github.com/FudanDISC/DISC-LawLLM)**: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
1. **[Sunsimiao](https://github.com/X-D-Lab/Sunsimiao)**: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B. 1. **[Sunsimiao](https://github.com/X-D-Lab/Sunsimiao)**: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
...@@ -915,7 +909,7 @@ If you have a project that should be incorporated, please contact via email or c ...@@ -915,7 +909,7 @@ If you have a project that should be incorporated, please contact via email or c
1. **[360-LLaMA-Factory](https://github.com/Qihoo360/360-LLaMA-Factory)**: A modified library that supports long sequence SFT & DPO using ring attention. 1. **[360-LLaMA-Factory](https://github.com/Qihoo360/360-LLaMA-Factory)**: A modified library that supports long sequence SFT & DPO using ring attention.
1. **[Sky-T1](https://novasky-ai.github.io/posts/sky-t1/)**: An o1-like model fine-tuned by NovaSky AI with very small cost. 1. **[Sky-T1](https://novasky-ai.github.io/posts/sky-t1/)**: An o1-like model fine-tuned by NovaSky AI with very small cost.
1. **[WeClone](https://github.com/xming521/WeClone)**: One-stop solution for creating your digital avatar from chat logs. 1. **[WeClone](https://github.com/xming521/WeClone)**: One-stop solution for creating your digital avatar from chat logs.
1. **[EmoLLM](https://github.com/SmartFlowAI/EmoLLM)**: A project about large language models (LLMs) and mental health.
</details> </details>
## License ## License
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
[![GitHub contributors](https://img.shields.io/github/contributors/hiyouga/LLaMA-Factory?color=orange)](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors) [![GitHub contributors](https://img.shields.io/github/contributors/hiyouga/LLaMA-Factory?color=orange)](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors)
[![GitHub workflow](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml/badge.svg)](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml) [![GitHub workflow](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml/badge.svg)](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml)
[![PyPI](https://img.shields.io/pypi/v/llamafactory)](https://pypi.org/project/llamafactory/) [![PyPI](https://img.shields.io/pypi/v/llamafactory)](https://pypi.org/project/llamafactory/)
[![Citation](https://img.shields.io/badge/citation-544-green)](https://scholar.google.com/scholar?cites=12620864006390196564) [![Citation](https://img.shields.io/badge/citation-614-green)](https://scholar.google.com/scholar?cites=12620864006390196564)
[![Docker Pulls](https://img.shields.io/docker/pulls/hiyouga/llamafactory)](https://hub.docker.com/r/hiyouga/llamafactory/tags) [![Docker Pulls](https://img.shields.io/docker/pulls/hiyouga/llamafactory)](https://hub.docker.com/r/hiyouga/llamafactory/tags)
[![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai) [![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai)
...@@ -14,6 +14,7 @@ ...@@ -14,6 +14,7 @@
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing)
[![Open in DSW](https://gallery.pai-ml.com/assets/open-in-dsw.svg)](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) [![Open in DSW](https://gallery.pai-ml.com/assets/open-in-dsw.svg)](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory)
[![Open in Alaya](assets/alaya_new.svg)](https://docs.alayanew.com/docs/documents/newActivities/llamafactory/?utm_source=LLaMA-Factory)
[![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/hiyouga/LLaMA-Board) [![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/hiyouga/LLaMA-Board)
[![Open in Studios](https://img.shields.io/badge/ModelScope-Open%20in%20Studios-blue)](https://modelscope.cn/studios/hiyouga/LLaMA-Board) [![Open in Studios](https://img.shields.io/badge/ModelScope-Open%20in%20Studios-blue)](https://modelscope.cn/studios/hiyouga/LLaMA-Board)
[![Open in Novita](https://img.shields.io/badge/Novita-Deploy%20Template-blue)](https://novita.ai/templates-library/105981?sharer=88115474-394e-4bda-968e-b88e123d0c47) [![Open in Novita](https://img.shields.io/badge/Novita-Deploy%20Template-blue)](https://novita.ai/templates-library/105981?sharer=88115474-394e-4bda-968e-b88e123d0c47)
...@@ -40,7 +41,7 @@ ...@@ -40,7 +41,7 @@
</div> </div>
👋 Join our [WeChat group](assets/wechat.jpg) or [NPU user group](assets/wechat_npu.jpg). 👋 Join our [WeChat group](assets/wechat.jpg), [NPU user group](assets/wechat_npu.jpg) or [Alaya NeW cloud GPU discount group](assets/wechat_alaya.png).
\[ [English](README.md) | 中文 \] \[ [English](README.md) | 中文 \]
...@@ -56,6 +57,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc ...@@ -56,6 +57,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
- **Colab (free)**: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing - **Colab (free)**: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing
- **Local machine**: Please refer to [usage](#如何使用) - **Local machine**: Please refer to [usage](#如何使用)
- **PAI-DSW (free trial)**: https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory - **PAI-DSW (free trial)**: https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory
- **Alaya NeW (cloud GPU deal)**: https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory
> [!NOTE] > [!NOTE]
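> Except for the above links, all other websites are unauthorized third-party websites. Please use them with caution. > Except for the above links, all other websites are unauthorized third-party websites. Please use them with caution.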
...@@ -105,12 +107,14 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc ...@@ -105,12 +107,14 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
## Official Blogs ## Official Blogs
- [A One-Stop Code-Free Model Reinforcement Learning and Deployment Platform based on LLaMA-Factory and EasyR1](https://aws.amazon.com/cn/blogs/china/building-llm-model-hub-based-on-llamafactory-and-easyr1/) (Chinese)
- [Fine-tune Qwen2.5-VL for Autonomous Driving using LLaMA-Factory](https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory) (Chinese)
- [How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod](https://aws.amazon.com/cn/blogs/machine-learning/how-apoidea-group-enhances-visual-information-extraction-from-banking-documents-with-multimodal-models-using-llama-factory-on-amazon-sagemaker-hyperpod/) (English) - [How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod](https://aws.amazon.com/cn/blogs/machine-learning/how-apoidea-group-enhances-visual-information-extraction-from-banking-documents-with-multimodal-models-using-llama-factory-on-amazon-sagemaker-hyperpod/) (English)
- [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9) (Chinese) - [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9) (Chinese)
- [LLaMA Factory: Fine-tuning the DeepSeek-R1-Distill-Qwen-7B Model for News Classifier](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b) (Chinese)
<details><summary>All Blogs</summary> <details><summary>All Blogs</summary>
- [LLaMA Factory: Fine-tuning the DeepSeek-R1-Distill-Qwen-7B Model for News Classifier](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b) (Chinese)
- [A One-Stop Code-Free Model Fine-Tuning \& Deployment Platform based on SageMaker and LLaMA-Factory](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/) (Chinese) - [A One-Stop Code-Free Model Fine-Tuning \& Deployment Platform based on SageMaker and LLaMA-Factory](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/) (Chinese)
- [LLaMA Factory Multi-Modal Fine-Tuning Practice: Fine-Tuning Qwen2-VL for Personal Tourist Guide](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) (Chinese) - [LLaMA Factory Multi-Modal Fine-Tuning Practice: Fine-Tuning Qwen2-VL for Personal Tourist Guide](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) (Chinese)
- [LLaMA Factory: Fine-tuning the LLaMA3 Model for Role-Playing](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) (Chinese) - [LLaMA Factory: Fine-tuning the LLaMA3 Model for Role-Playing](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) (Chinese)
...@@ -279,7 +283,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc ...@@ -279,7 +283,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next | | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video | | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo | | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
| [MiniCPM](https://huggingface.co/openbmb) | 1B/2B/4B | cpm/cpm3 | | [MiniCPM](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
| [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v | | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral | | [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral | | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
...@@ -416,7 +420,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc ...@@ -416,7 +420,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
- [DPO mixed (en&zh)](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k) - [DPO mixed (en&zh)](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k)
- [UltraFeedback (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) - [UltraFeedback (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
- [COIG-P (en&zh)](https://huggingface.co/datasets/m-a-p/COIG-P) - [COIG-P (zh)](https://huggingface.co/datasets/m-a-p/COIG-P)
- [RLHF-V (en)](https://huggingface.co/datasets/openbmb/RLHF-V-Dataset) - [RLHF-V (en)](https://huggingface.co/datasets/openbmb/RLHF-V-Dataset)
- [VLFeedback (en)](https://huggingface.co/datasets/Zhihui/VLFeedback) - [VLFeedback (en)](https://huggingface.co/datasets/Zhihui/VLFeedback)
- [RLAIF-V (en)](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset) - [RLAIF-V (en)](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset)
...@@ -492,6 +496,8 @@ pip install -e ".[torch,metrics]" --no-build-isolation ...@@ -492,6 +496,8 @@ pip install -e ".[torch,metrics]" --no-build-isolation
docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
``` ```
This image is built on Ubuntu 22.04 (x86\_64), CUDA 12.4, Python 3.11, PyTorch 2.6.0, and Flash-attn 2.7.4.
Find all images: https://hub.docker.com/r/hiyouga/llamafactory/tags Find all images: https://hub.docker.com/r/hiyouga/llamafactory/tags
Please refer to [build Docker](#构建-docker) to rebuild the image. Please refer to [build Docker](#构建-docker) to rebuild the image.
...@@ -679,11 +685,6 @@ docker build -f ./docker/docker-cuda/Dockerfile \ ...@@ -679,11 +685,6 @@ docker build -f ./docker/docker-cuda/Dockerfile \
-t llamafactory:latest . -t llamafactory:latest .
docker run -dit --ipc=host --gpus=all \ docker run -dit --ipc=host --gpus=all \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./shared_data:/app/shared_data \
-v ./output:/app/output \
-p 7860:7860 \ -p 7860:7860 \
-p 8000:8000 \ -p 8000:8000 \
--name llamafactory \ --name llamafactory \
...@@ -701,11 +702,6 @@ docker build -f ./docker/docker-npu/Dockerfile \ ...@@ -701,11 +702,6 @@ docker build -f ./docker/docker-npu/Dockerfile \
-t llamafactory:latest . -t llamafactory:latest .
docker run -dit --ipc=host \ docker run -dit --ipc=host \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./shared_data:/app/shared_data \
-v ./output:/app/output \
-v /usr/local/dcmi:/usr/local/dcmi \ -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
...@@ -731,11 +727,6 @@ docker build -f ./docker/docker-rocm/Dockerfile \ ...@@ -731,11 +727,6 @@ docker build -f ./docker/docker-rocm/Dockerfile \
-t llamafactory:latest . -t llamafactory:latest .
docker run -dit --ipc=host \ docker run -dit --ipc=host \
-v ./hf_cache:/root/.cache/huggingface \
-v ./ms_cache:/root/.cache/modelscope \
-v ./om_cache:/root/.cache/openmind \
-v ./shared_data:/app/shared_data \
-v ./output:/app/output \
-p 7860:7860 \ -p 7860:7860 \
-p 8000:8000 \ -p 8000:8000 \
--device /dev/kfd \ --device /dev/kfd \
...@@ -748,11 +739,13 @@ docker exec -it llamafactory bash ...@@ -748,11 +739,13 @@ docker exec -it llamafactory bash
</details> </details>
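<details><summary>Details about volumes</summary> <details><summary>Use data volumes</summary>
You can uncomment `VOLUME [ "/root/.cache/huggingface", "/app/shared_data", "/app/output" ]` in the Dockerfile to use data volumes.
When running the Docker container, use the `-v ./hf_cache:/root/.cache/huggingface` argument to mount a data volume. The data volumes are described below.
- `hf_cache`: Use the Hugging Face cache folder on the host machine; it can be changed to a different directory. - `hf_cache`: Use the Hugging Face cache folder on the host machine.
- `ms_cache`: Similar to the Hugging Face cache folder, but for ModelScope users.
- `om_cache`: Similar to the Hugging Face cache folder, but for Modelers users.
- `shared_data`: The folder on the host machine where datasets are stored. - `shared_data`: The folder on the host machine where datasets are stored.
- `output`: Set the export directory to this path so that the exported model can be accessed on the host machine. - `output`: Set the export directory to this path so that the exported model can be accessed on the host machine.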
......
[Figure: SVG badge, 150×20, reading "Open in Alaya NeW" with an embedded Alaya NeW icon]
[Figure: SVG bar chart (Matplotlib v3.7.1, dated 2023-11-18), titled "ChatGLM2-6B - - 1×A100". Training Speed: ChatGLM P-Tuning 5.81, LLaMA-Factory 21.67. Rouge Score: ChatGLM P-Tuning 7.20, LLaMA-Factory 7.36. GPU Memory (GB): ChatGLM P-Tuning 5.78, LLaMA-Factory 5.14.]
assets/wechat.jpg (binary image updated, 169 KB to 167 KB)
...@@ -165,6 +165,14 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 ...@@ -165,6 +165,14 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
``` ```
### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
To launch an elastic job that is retried up to `MAX_RESTARTS` times on failure, run the following command on at least `MIN_NNODES` and at most `MAX_NNODES` nodes. `RDZV_ID` should be set to a unique job ID (shared by all nodes participating in the job). See also [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html).
```bash
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
```
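The mapping from these environment variables to a `torchrun` command line can be sketched roughly as below. This is a minimal illustration modeled on the launcher change in this commit, not the actual implementation: `build_torchrun_command` is a hypothetical helper name, and the fallback values for `NPROC_PER_NODE`, `MASTER_ADDR`, and `MASTER_PORT` are placeholders (the real launcher picks its own defaults).

```python
from typing import Mapping


def build_torchrun_command(env: Mapping[str, str], script: str, script_args: list[str]) -> list[str]:
    """Hypothetical helper: turn launcher environment variables into torchrun arguments."""
    nnodes = env.get("NNODES", "1")
    node_rank = env.get("NODE_RANK", "0")
    nproc_per_node = env.get("NPROC_PER_NODE", "1")  # placeholder default (assumption)
    master_addr = env.get("MASTER_ADDR", "127.0.0.1")  # placeholder default (assumption)
    master_port = env.get("MASTER_PORT", "29500")  # placeholder default (assumption)
    max_restarts = env.get("MAX_RESTARTS", "0")
    rdzv_id = env.get("RDZV_ID")
    min_nnodes = env.get("MIN_NNODES")
    max_nnodes = env.get("MAX_NNODES")

    if rdzv_id is not None:
        # Elastic launch: use a MIN:MAX node range only when both bounds are set.
        rdzv_nnodes = nnodes
        if min_nnodes is not None and max_nnodes is not None:
            rdzv_nnodes = f"{min_nnodes}:{max_nnodes}"
        return [
            "torchrun",
            "--nnodes", rdzv_nnodes,
            "--nproc-per-node", nproc_per_node,
            "--rdzv-id", rdzv_id,
            "--rdzv-backend", "c10d",
            "--rdzv-endpoint", f"{master_addr}:{master_port}",
            "--max-restarts", max_restarts,
            script,
            *script_args,
        ]

    # Static launch: a fixed node count with an explicit rank per node.
    return [
        "torchrun",
        "--nnodes", nnodes,
        "--node_rank", node_rank,
        "--nproc_per_node", nproc_per_node,
        "--master_addr", master_addr,
        "--master_port", master_port,
        script,
        *script_args,
    ]
```

Returning an argument list rather than a single string means the result can be passed directly to `subprocess.run` without `shell=True`. Note that the elastic path is selected by the presence of `RDZV_ID`, so leaving it unset falls back to the static multi-node launch.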
#### Multimodal Supervised Fine-Tuning #### Multimodal Supervised Fine-Tuning
```bash ```bash
......
...@@ -106,6 +106,14 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 ...@@ -106,6 +106,14 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
``` ```
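### Elastic and Fault-Tolerant Multi-Node Supervised Fine-Tuning
To launch multi-node supervised fine-tuning with elastic nodes and fault tolerance, run the following command on every node. The number of nodes may vary within the range `MIN_NNODES:MAX_NNODES`, and each node may be restarted at most `MAX_RESTARTS` times after a failure. `RDZV_ID` should be set to a unique job ID (shared by all nodes participating in the job). For more information, see the official [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html) documentation.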
```bash
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
```
#### Distribute GPU Memory Evenly with DeepSpeed ZeRO-3 #### Distribute GPU Memory Evenly with DeepSpeed ZeRO-3
```bash ```bash
......
...@@ -12,6 +12,18 @@ ...@@ -12,6 +12,18 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Why do we need this script for qwen_omni?
The qwen_omni model consists of two parts:
1. [Thinker]: [audio_encoder, vision_encoder, LLM backbone], which our repository supports for post-training.
2. [Talker]: [audio_decoder, wave_model], which cannot be post-trained without a specific tokenizer.
During post-training, only the [Thinker] part is trained and the [Talker] part is dropped.
So, to get the complete model, we need to merge the [Talker] part back into the [Thinker] part.
LoRA mode: [Thinker + LoRA weights] + [Original Talker] -> [Omni model]
Full mode: [Thinker] + [Original Talker] -> [Omni model]
For the processor, we save the one from the trained model instead of the one from the original model.
"""
import os import os
import shutil import shutil
......
...@@ -42,24 +42,22 @@ def get_console_scripts() -> list[str]: ...@@ -42,24 +42,22 @@ def get_console_scripts() -> list[str]:
extra_require = { extra_require = {
"torch": ["torch>=1.13.1"], "torch": ["torch>=2.0.0", "torchvision>=0.15.0"],
"torch-npu": ["torch==2.4.0", "torch-npu==2.4.0.post2", "decorator"], "torch-npu": ["torch==2.4.0", "torch-npu==2.4.0.post2", "decorator"],
"metrics": ["nltk", "jieba", "rouge-chinese"], "metrics": ["nltk", "jieba", "rouge-chinese"],
"deepspeed": ["deepspeed>=0.10.0,<=0.16.5"], "deepspeed": ["deepspeed>=0.10.0,<=0.16.9"],
"liger-kernel": ["liger-kernel>=0.5.5"], "liger-kernel": ["liger-kernel>=0.5.5"],
"bitsandbytes": ["bitsandbytes>=0.39.0"], "bitsandbytes": ["bitsandbytes>=0.39.0"],
"hqq": ["hqq"], "hqq": ["hqq"],
"eetq": ["eetq"], "eetq": ["eetq"],
"gptq": ["optimum>=1.17.0", "auto-gptq>=0.5.0"], "gptq": ["optimum>=1.24.0", "gptqmodel>=2.0.0"],
"awq": ["autoawq"],
"aqlm": ["aqlm[gpu]>=1.1.0"], "aqlm": ["aqlm[gpu]>=1.1.0"],
"vllm": ["vllm>=0.4.3,<=0.8.4"], "vllm": ["vllm>=0.4.3,<=0.9.1"],
"sglang": ["sglang[srt]>=0.4.5", "transformers==4.51.1"], "sglang": ["sglang[srt]>=0.4.5", "transformers==4.51.1"],
"galore": ["galore-torch"], "galore": ["galore-torch"],
"apollo": ["apollo-torch"], "apollo": ["apollo-torch"],
"badam": ["badam>=1.2.1"], "badam": ["badam>=1.2.1"],
"adam-mini": ["adam-mini"], "adam-mini": ["adam-mini"],
"qwen": ["transformers_stream_generator"],
"minicpm_v": [ "minicpm_v": [
"soundfile", "soundfile",
"torchvision", "torchvision",
...@@ -69,7 +67,6 @@ extra_require = { ...@@ -69,7 +67,6 @@ extra_require = {
"msgpack", "msgpack",
"referencing", "referencing",
"jsonschema_specifications", "jsonschema_specifications",
"transformers==4.48.3",
], ],
"modelscope": ["modelscope"], "modelscope": ["modelscope"],
"openmind": ["openmind"], "openmind": ["openmind"],
......
...@@ -83,7 +83,13 @@ def main(): ...@@ -83,7 +83,13 @@ def main():
master_port = os.getenv("MASTER_PORT", str(find_available_port())) master_port = os.getenv("MASTER_PORT", str(find_available_port()))
logger.info_rank0(f"Initializing {nproc_per_node} distributed tasks at: {master_addr}:{master_port}") logger.info_rank0(f"Initializing {nproc_per_node} distributed tasks at: {master_addr}:{master_port}")
if int(nnodes) > 1: if int(nnodes) > 1:
print(f"Multi-node training enabled: num nodes: {nnodes}, node rank: {node_rank}") logger.info_rank0(f"Multi-node training enabled: num nodes: {nnodes}, node rank: {node_rank}")
# elastic launch support
max_restarts = os.getenv("MAX_RESTARTS", "0")
rdzv_id = os.getenv("RDZV_ID")
min_nnodes = os.getenv("MIN_NNODES")
max_nnodes = os.getenv("MAX_NNODES")
env = deepcopy(os.environ) env = deepcopy(os.environ)
if is_env_enabled("OPTIM_TORCH", "1"): if is_env_enabled("OPTIM_TORCH", "1"):
...@@ -91,25 +97,55 @@ def main(): ...@@ -91,25 +97,55 @@ def main():
env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
env["TORCH_NCCL_AVOID_RECORD_STREAMS"] = "1" env["TORCH_NCCL_AVOID_RECORD_STREAMS"] = "1"
# NOTE: DO NOT USE shell=True to avoid security risk if rdzv_id is not None:
process = subprocess.run( # launch elastic job with fault tolerant support when possible
( # see also https://docs.pytorch.org/docs/stable/elastic/train_script.html
"torchrun --nnodes {nnodes} --node_rank {node_rank} --nproc_per_node {nproc_per_node} " rdzv_nnodes = nnodes
"--master_addr {master_addr} --master_port {master_port} {file_name} {args}" # elastic number of nodes if MIN_NNODES and MAX_NNODES are set
if min_nnodes is not None and max_nnodes is not None:
rdzv_nnodes = f"{min_nnodes}:{max_nnodes}"
process = subprocess.run(
(
"torchrun --nnodes {rdzv_nnodes} --nproc-per-node {nproc_per_node} "
"--rdzv-id {rdzv_id} --rdzv-backend c10d --rdzv-endpoint {master_addr}:{master_port} "
"--max-restarts {max_restarts} {file_name} {args}"
)
.format(
rdzv_nnodes=rdzv_nnodes,
nproc_per_node=nproc_per_node,
rdzv_id=rdzv_id,
master_addr=master_addr,
master_port=master_port,
max_restarts=max_restarts,
file_name=launcher.__file__,
args=" ".join(sys.argv[1:]),
)
.split(),
env=env,
check=True,
) )
.format( else:
nnodes=nnodes, # NOTE: DO NOT USE shell=True to avoid security risk
node_rank=node_rank, process = subprocess.run(
nproc_per_node=nproc_per_node, (
master_addr=master_addr, "torchrun --nnodes {nnodes} --node_rank {node_rank} --nproc_per_node {nproc_per_node} "
master_port=master_port, "--master_addr {master_addr} --master_port {master_port} {file_name} {args}"
file_name=launcher.__file__, )
args=" ".join(sys.argv[1:]), .format(
nnodes=nnodes,
node_rank=node_rank,
nproc_per_node=nproc_per_node,
master_addr=master_addr,
master_port=master_port,
file_name=launcher.__file__,
args=" ".join(sys.argv[1:]),
)
.split(),
env=env,
check=True,
) )
.split(),
env=env,
check=True,
)
sys.exit(process.returncode) sys.exit(process.returncode)
elif command in COMMAND_MAP: elif command in COMMAND_MAP:
COMMAND_MAP[command]() COMMAND_MAP[command]()
......
...@@ -21,6 +21,7 @@ from typing import TYPE_CHECKING, Any, Literal, Optional ...@@ -21,6 +21,7 @@ from typing import TYPE_CHECKING, Any, Literal, Optional
import numpy as np import numpy as np
import torch import torch
import torch.nn.functional as F import torch.nn.functional as F
from peft import PeftModel
from transformers import DataCollatorForSeq2Seq from transformers import DataCollatorForSeq2Seq
from ..extras.constants import AUDIO_PLACEHOLDER, IGNORE_INDEX, IMAGE_PLACEHOLDER from ..extras.constants import AUDIO_PLACEHOLDER, IGNORE_INDEX, IMAGE_PLACEHOLDER
...@@ -94,6 +95,16 @@ class MultiModalDataCollatorForSeq2Seq(DataCollatorForSeq2Seq): ...@@ -94,6 +95,16 @@ class MultiModalDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
if self.template is None: if self.template is None:
raise ValueError("Template is required for MultiModalDataCollator.") raise ValueError("Template is required for MultiModalDataCollator.")
if isinstance(self.model, PeftModel):
self.model = self.model.base_model.model
if self.model is not None and hasattr(self.model, "get_rope_index"): # for qwen2vl mrope
self.get_rope_func = self.model.get_rope_index # transformers < 4.52.0 or qwen2.5 omni
elif self.model is not None and hasattr(self.model, "model") and hasattr(self.model.model, "get_rope_index"):
self.get_rope_func = self.model.model.get_rope_index # transformers >= 4.52.0
else:
self.get_rope_func = None
def __call__(self, features: list[dict[str, Any]]) -> dict[str, "torch.Tensor"]: def __call__(self, features: list[dict[str, Any]]) -> dict[str, "torch.Tensor"]:
batch_images, batch_videos, batch_audios = [], [], [] batch_images, batch_videos, batch_audios = [], [], []
batch_imglens, batch_vidlens, batch_audlens, batch_input_ids = [], [], [], [] batch_imglens, batch_vidlens, batch_audlens, batch_input_ids = [], [], [], []
...@@ -171,7 +182,7 @@ class MultiModalDataCollatorForSeq2Seq(DataCollatorForSeq2Seq): ...@@ -171,7 +182,7 @@ class MultiModalDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
features: dict[str, torch.Tensor] = super().__call__(features) features: dict[str, torch.Tensor] = super().__call__(features)
if self.model is not None and hasattr(self.model, "get_rope_index"): # for qwen2vl mrope if self.get_rope_func is not None:
rope_index_kwargs = { rope_index_kwargs = {
"input_ids": features["input_ids"], "input_ids": features["input_ids"],
"image_grid_thw": mm_inputs.get("image_grid_thw"), "image_grid_thw": mm_inputs.get("image_grid_thw"),
...@@ -180,27 +191,29 @@ class MultiModalDataCollatorForSeq2Seq(DataCollatorForSeq2Seq): ...@@ -180,27 +191,29 @@ class MultiModalDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
} }
if "second_per_grid_ts" in mm_inputs: # for qwen2vl if "second_per_grid_ts" in mm_inputs: # for qwen2vl
rope_index_kwargs["second_per_grid_ts"] = mm_inputs.get("second_per_grid_ts") rope_index_kwargs["second_per_grid_ts"] = mm_inputs.get("second_per_grid_ts")
if "video_second_per_grid" in mm_inputs: # for qwen2omni elif "video_second_per_grid" in mm_inputs: # for qwen2.5 omni
rope_index_kwargs["second_per_grids"] = mm_inputs.get("video_second_per_grid") rope_index_kwargs["second_per_grids"] = mm_inputs.get("video_second_per_grid")
if getattr(self.model.config, "model_type", None) == "qwen2_5_omni_thinker": # for qwen2omni if getattr(self.model.config, "model_type", None) == "qwen2_5_omni_thinker": # for qwen2.5 omni
rope_index_kwargs["use_audio_in_video"] = getattr(self.processor, "use_audio_in_video", False) rope_index_kwargs["use_audio_in_video"] = getattr(self.processor, "use_audio_in_video", False)
feature_attention_mask = mm_inputs.get("feature_attention_mask", None) feature_attention_mask = mm_inputs.get("feature_attention_mask", None)
if feature_attention_mask is not None: if feature_attention_mask is not None: # FIXME: need to get video image lengths
audio_feature_lengths = torch.sum( audio_feature_lengths = torch.sum(feature_attention_mask, dim=1)
feature_attention_mask, dim=1
) # FIXME need to get video image lengths
rope_index_kwargs["audio_seqlens"] = audio_feature_lengths # prepare for input rope_index_kwargs["audio_seqlens"] = audio_feature_lengths # prepare for input
delta0 = (1 - rope_index_kwargs["attention_mask"]).sum(dim=-1).unsqueeze(1) features["position_ids"], rope_deltas = self.get_rope_func(**rope_index_kwargs)
# avoid conflict features["rope_deltas"] = rope_deltas - (1 - rope_index_kwargs["attention_mask"]).sum(
new_position_ids, rope_deltas = self.model.get_rope_index(**rope_index_kwargs) dim=-1
features["position_ids"], features["rope_deltas"] = ( ).unsqueeze(-1)
new_position_ids.clone(),
rope_deltas - delta0,
) # avoid inplace operation FIXME
else: # for qwen2vl else: # for qwen2vl
features["position_ids"], features["rope_deltas"] = self.model.get_rope_index(**rope_index_kwargs) features["position_ids"], features["rope_deltas"] = self.get_rope_func(**rope_index_kwargs)
if (
self.model is not None
and getattr(self.model.config, "model_type", None) in ["qwen2_vl", "qwen2_5_vl", "qwen2_5_omni_thinker"]
and ("position_ids" not in features or features["position_ids"].dim() != 3)
):
raise ValueError("Qwen2-VL/Qwen2.5-Omni model requires 3D position ids for mrope.")
if "cross_attention_mask" in mm_inputs: # for mllama inputs when pad_to_multiple_of is enabled if "cross_attention_mask" in mm_inputs: # for mllama inputs when pad_to_multiple_of is enabled
cross_attention_mask = mm_inputs.pop("cross_attention_mask") cross_attention_mask = mm_inputs.pop("cross_attention_mask")
......
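The lookup of `get_rope_index` in the collator hunk above can be sketched in isolation as follows. This is a simplified illustration, not the collator itself: `resolve_rope_func` and the three dummy classes are hypothetical names, and the sketch omits the `PeftModel` unwrapping that the real code performs first.

```python
from typing import Any, Callable, Optional


def resolve_rope_func(model: Optional[Any]) -> Optional[Callable[..., Any]]:
    """Hypothetical helper: find get_rope_index on the model or on its inner `model` attribute."""
    if model is None:
        return None
    if hasattr(model, "get_rope_index"):
        # Layout where the method lives on the top-level model object.
        return model.get_rope_index
    inner = getattr(model, "model", None)
    if inner is not None and hasattr(inner, "get_rope_index"):
        # Layout where the method lives one level down, on `model.model`.
        return inner.get_rope_index
    return None


# Dummy stand-ins for the two layouts (illustrative only).
class TopLevelRope:
    def get_rope_index(self) -> str:
        return "top-level"


class InnerRope:
    def get_rope_index(self) -> str:
        return "nested"


class NestedRope:
    def __init__(self) -> None:
        self.model = InnerRope()
```

Resolving the callable once and storing it means the per-batch code only has to check whether the stored function is `None`, instead of repeating the attribute probing on every call.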
...@@ -1274,9 +1274,10 @@ class PixtralPlugin(BasePlugin): ...@@ -1274,9 +1274,10 @@ class PixtralPlugin(BasePlugin):
content = message["content"] content = message["content"]
while IMAGE_PLACEHOLDER in content: while IMAGE_PLACEHOLDER in content:
if self.expand_mm_tokens: if self.expand_mm_tokens:
patch_size = processor.patch_size * getattr(processor, "spatial_merge_size", 1)
height, width = next(image_sizes) height, width = next(image_sizes)
num_height_tokens = height // processor.patch_size num_height_tokens = height // patch_size
num_width_tokens = width // processor.patch_size num_width_tokens = width // patch_size
replace_tokens = [[self.image_token] * num_width_tokens + [image_break_token]] * num_height_tokens replace_tokens = [[self.image_token] * num_width_tokens + [image_break_token]] * num_height_tokens
replace_tokens = [item for sublist in replace_tokens for item in sublist] # flatten list replace_tokens = [item for sublist in replace_tokens for item in sublist] # flatten list
replace_tokens[-1] = image_end_token replace_tokens[-1] = image_end_token
......
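The effect of the `spatial_merge_size` change on the number of image placeholder tokens can be illustrated with a small standalone sketch. `expand_image_tokens` is a hypothetical helper that mirrors the token-expansion logic in the hunk above; the `[IMG_BREAK]` and `[IMG_END]` strings are placeholder assumptions, since the actual break and end tokens are not shown in this diff.

```python
def expand_image_tokens(
    height: int,
    width: int,
    patch_size: int,
    spatial_merge_size: int = 1,
    image_token: str = "[IMG]",
    image_break_token: str = "[IMG_BREAK]",  # placeholder value (assumption)
    image_end_token: str = "[IMG_END]",  # placeholder value (assumption)
) -> list[str]:
    """Hypothetical helper: one row of image tokens per patch row, each row ending in a break token."""
    effective_patch = patch_size * spatial_merge_size
    num_height_tokens = height // effective_patch
    num_width_tokens = width // effective_patch
    rows = [[image_token] * num_width_tokens + [image_break_token]] * num_height_tokens
    tokens = [item for row in rows for item in row]  # flatten the rows
    if tokens:
        tokens[-1] = image_end_token  # the final break token becomes the end token
    return tokens
```

With a merge size of 1 the result matches the previous behavior, while a merge size of 2 halves the token count along each axis, so an image produces roughly a quarter as many placeholder tokens.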
...@@ -501,7 +501,11 @@ def register_template( ...@@ -501,7 +501,11 @@ def register_template(
default_slots = ["{{content}}"] if efficient_eos else ["{{content}}", {"eos_token"}] default_slots = ["{{content}}"] if efficient_eos else ["{{content}}", {"eos_token"}]
default_user_formatter = StringFormatter(slots=["{{content}}"]) default_user_formatter = StringFormatter(slots=["{{content}}"])
default_assistant_formatter = StringFormatter(slots=default_slots) default_assistant_formatter = StringFormatter(slots=default_slots)
default_function_formatter = FunctionFormatter(slots=default_slots, tool_format="default") if format_assistant is not None:
default_function_formatter = FunctionFormatter(slots=format_assistant.slots, tool_format="default")
else:
default_function_formatter = FunctionFormatter(slots=default_slots, tool_format="default")
default_tool_formatter = ToolFormatter(tool_format="default") default_tool_formatter = ToolFormatter(tool_format="default")
default_prefix_formatter = EmptyFormatter() default_prefix_formatter = EmptyFormatter()
TEMPLATES[name] = template_class( TEMPLATES[name] = template_class(
...@@ -798,6 +802,19 @@ register_template( ...@@ -798,6 +802,19 @@ register_template(
format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]), format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]), format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]), format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
format_observation=StringFormatter(slots=["<|im_start|>tool\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
stop_words=["<|im_end|>"],
)
# copied from chatml template
register_template(
name="cpm4",
format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
format_observation=StringFormatter(slots=["<|im_start|>tool\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
format_prefix=EmptyFormatter(slots=[{"bos_token"}]), format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
stop_words=["<|im_end|>"], stop_words=["<|im_end|>"],
) )
...@@ -880,7 +897,6 @@ register_template( ...@@ -880,7 +897,6 @@ register_template(
register_template( register_template(
name="empty", name="empty",
format_assistant=StringFormatter(slots=["{{content}}"]), format_assistant=StringFormatter(slots=["{{content}}"]),
replace_jinja_template=True,
) )
...@@ -1434,6 +1450,7 @@ register_template( ...@@ -1434,6 +1450,7 @@ register_template(
format_observation=StringFormatter(slots=["""[TOOL_RESULTS]{"content": {{content}}}[/TOOL_RESULTS]"""]), format_observation=StringFormatter(slots=["""[TOOL_RESULTS]{"content": {{content}}}[/TOOL_RESULTS]"""]),
format_tools=ToolFormatter(tool_format="mistral"), format_tools=ToolFormatter(tool_format="mistral"),
format_prefix=EmptyFormatter(slots=[{"bos_token"}]), format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
mm_plugin=get_mm_plugin(name="pixtral", image_token="[IMG]"),
) )
......
...@@ -513,7 +513,7 @@ register_model_group( ...@@ -513,7 +513,7 @@ register_model_group(
register_model_group( register_model_group(
models={ models={
"DeepSeek-V2-236B-0628-Chat": { "DeepSeek-V2-0628-236B-Chat": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2-Chat-0628", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2-Chat-0628",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2-Chat-0628", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2-Chat-0628",
}, },
...@@ -521,7 +521,7 @@ register_model_group( ...@@ -521,7 +521,7 @@ register_model_group(
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2.5", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2.5",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2.5", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2.5",
}, },
"DeepSeek-V2.5-236B-1210-Chat": { "DeepSeek-V2.5-1210-236B-Chat": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2.5-1210", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2.5-1210",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2.5-1210", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2.5-1210",
}, },
...@@ -533,7 +533,7 @@ register_model_group( ...@@ -533,7 +533,7 @@ register_model_group(
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V3", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V3",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V3", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V3",
}, },
"DeepSeek-V3-671B-0324-Chat": { "DeepSeek-V3-0324-671B-Chat": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V3-0324", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V3-0324",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V3-0324", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V3-0324",
}, },
...@@ -556,10 +556,6 @@ register_model_group( ...@@ -556,10 +556,6 @@ register_model_group(
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
}, },
"DeepSeek-R1-8B-0528-Distill": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
},
"DeepSeek-R1-14B-Distill": { "DeepSeek-R1-14B-Distill": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
...@@ -580,7 +576,11 @@ register_model_group( ...@@ -580,7 +576,11 @@ register_model_group(
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1",
}, },
"DeepSeek-R1-671B-0528-Chat": { "DeepSeek-R1-0528-8B-Distill": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
},
"DeepSeek-R1-0528-671B-Chat": {
DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-0528", DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-0528",
DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-0528", DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-0528",
}, },
...@@ -756,15 +756,15 @@ register_model_group( ...@@ -756,15 +756,15 @@ register_model_group(
DownloadSource.DEFAULT: "THUDM/glm-4-9b-chat-1m", DownloadSource.DEFAULT: "THUDM/glm-4-9b-chat-1m",
DownloadSource.MODELSCOPE: "ZhipuAI/glm-4-9b-chat-1m", DownloadSource.MODELSCOPE: "ZhipuAI/glm-4-9b-chat-1m",
}, },
"GLM-4-9B-0414-Chat": { "GLM-4-0414-9B-Chat": {
DownloadSource.DEFAULT: "THUDM/GLM-4-9B-0414", DownloadSource.DEFAULT: "THUDM/GLM-4-9B-0414",
DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-9B-0414", DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-9B-0414",
}, },
"GLM-4-32B-0414": { "GLM-4-0414-32B-Base": {
DownloadSource.DEFAULT: "THUDM/GLM-4-32B-Base-0414", DownloadSource.DEFAULT: "THUDM/GLM-4-32B-Base-0414",
DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-32B-Base-0414", DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-32B-Base-0414",
}, },
"GLM-4-32B-0414-Chat": { "GLM-4-0414-32B-Chat": {
DownloadSource.DEFAULT: "THUDM/GLM-4-32B-0414", DownloadSource.DEFAULT: "THUDM/GLM-4-32B-0414",
DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-32B-0414", DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-32B-0414",
}, },
...@@ -775,11 +775,11 @@ register_model_group( ...@@ -775,11 +775,11 @@ register_model_group(
register_model_group( register_model_group(
models={ models={
"GLM-Z1-9B-0414-Chat": { "GLM-Z1-0414-9B-Chat": {
DownloadSource.DEFAULT: "THUDM/GLM-Z1-9B-0414", DownloadSource.DEFAULT: "THUDM/GLM-Z1-9B-0414",
DownloadSource.MODELSCOPE: "ZhipuAI/GLM-Z1-9B-0414", DownloadSource.MODELSCOPE: "ZhipuAI/GLM-Z1-9B-0414",
}, },
"GLM-Z1-32B-0414-Chat": { "GLM-Z1-0414-32B-Chat": {
DownloadSource.DEFAULT: "THUDM/GLM-Z1-32B-0414", DownloadSource.DEFAULT: "THUDM/GLM-Z1-32B-0414",
DownloadSource.MODELSCOPE: "ZhipuAI/GLM-Z1-32B-0414", DownloadSource.MODELSCOPE: "ZhipuAI/GLM-Z1-32B-0414",
}, },
...@@ -1503,6 +1503,21 @@ register_model_group( ...@@ -1503,6 +1503,21 @@ register_model_group(
) )
register_model_group(
models={
"MiniCPM4-0.5B-Chat": {
DownloadSource.DEFAULT: "openbmb/MiniCPM4-0.5B",
DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM4-0.5B",
},
"MiniCPM4-8B-Chat": {
DownloadSource.DEFAULT: "openbmb/MiniCPM4-8B",
DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM4-8B",
},
},
template="cpm4",
)
register_model_group( register_model_group(
models={ models={
"MiniCPM-o-2_6": { "MiniCPM-o-2_6": {
...@@ -1592,6 +1607,22 @@ register_model_group( ...@@ -1592,6 +1607,22 @@ register_model_group(
) )
register_model_group(
models={
"Mistral-Small-3.1-24B-Base": {
DownloadSource.DEFAULT: "mistralai/Mistral-Small-3.1-24B-Base-2503",
DownloadSource.MODELSCOPE: "mistralai/Mistral-Small-3.1-24B-Base-2503",
},
"Mistral-Small-3.1-24B-Instruct": {
DownloadSource.DEFAULT: "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
DownloadSource.MODELSCOPE: "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
},
},
template="mistral_small",
multimodal=True,
)
register_model_group( register_model_group(
models={ models={
"Mixtral-8x7B-v0.1": { "Mixtral-8x7B-v0.1": {
......
...@@ -27,7 +27,7 @@ import trl ...@@ -27,7 +27,7 @@ import trl
from transformers.utils import is_torch_cuda_available, is_torch_npu_available from transformers.utils import is_torch_cuda_available, is_torch_npu_available
VERSION = "0.9.3.dev0" VERSION = "0.9.4.dev0"
def print_env() -> None: def print_env() -> None:
......
...@@ -202,6 +202,15 @@ class RLHFArguments: ...@@ -202,6 +202,15 @@ class RLHFArguments:
default="lora", default="lora",
metadata={"help": "The type of the reward model in PPO training. Lora model only supports lora training."}, metadata={"help": "The type of the reward model in PPO training. Lora model only supports lora training."},
) )
ld_alpha: Optional[float] = field(
default=None,
metadata={
"help": (
"Alpha parameter from the LD-DPO paper, which controls the weighting of"
" the verbose token log-probabilities in responses."
)
},
)
@dataclass @dataclass
......
...@@ -148,7 +148,7 @@ def _check_extra_dependencies( ...@@ -148,7 +148,7 @@ def _check_extra_dependencies(
check_version("mixture-of-depth>=1.1.6", mandatory=True) check_version("mixture-of-depth>=1.1.6", mandatory=True)
if model_args.infer_backend == EngineName.VLLM: if model_args.infer_backend == EngineName.VLLM:
check_version("vllm>=0.4.3,<=0.8.6") check_version("vllm>=0.4.3,<=0.9.1")
check_version("vllm", mandatory=True) check_version("vllm", mandatory=True)
elif model_args.infer_backend == EngineName.SGLANG: elif model_args.infer_backend == EngineName.SGLANG:
check_version("sglang>=0.4.5") check_version("sglang>=0.4.5")
...@@ -169,10 +169,15 @@ def _check_extra_dependencies( ...@@ -169,10 +169,15 @@ def _check_extra_dependencies(
if finetuning_args.plot_loss: if finetuning_args.plot_loss:
check_version("matplotlib", mandatory=True) check_version("matplotlib", mandatory=True)
if training_args is not None and training_args.predict_with_generate: if training_args is not None:
check_version("jieba", mandatory=True) if training_args.deepspeed:
check_version("nltk", mandatory=True) # pin deepspeed version < 0.17 because of https://github.com/deepspeedai/DeepSpeed/issues/7347
check_version("rouge_chinese", mandatory=True) check_version("deepspeed>=0.10.0,<=0.16.9", mandatory=True)
if training_args.predict_with_generate:
check_version("jieba", mandatory=True)
check_version("nltk", mandatory=True)
check_version("rouge_chinese", mandatory=True)
def _parse_train_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _TRAIN_CLS: def _parse_train_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _TRAIN_CLS:
......