Commit 84987715 authored by chenych

update to v0.9.2

parent 317a82e2
@@ -5,7 +5,7 @@
[![GitHub contributors](https://img.shields.io/github/contributors/hiyouga/LLaMA-Factory?color=orange)](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors)
[![GitHub workflow](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml/badge.svg)](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml)
[![PyPI](https://img.shields.io/pypi/v/llamafactory)](https://pypi.org/project/llamafactory/)
-[![Citation](https://img.shields.io/badge/citation-319-green)](https://scholar.google.com/scholar?cites=12620864006390196564)
+[![Citation](https://img.shields.io/badge/citation-349-green)](https://scholar.google.com/scholar?cites=12620864006390196564)
[![GitHub pull request](https://img.shields.io/badge/PRs-welcome-blue)](https://github.com/hiyouga/LLaMA-Factory/pulls)
[![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai)
@@ -37,10 +37,10 @@ https://github.com/user-attachments/assets/7c96b465-9df7-45f4-8053-bf03e58386d3
Choose your path:
-- **Documentation (WIP)**: https://llamafactory.readthedocs.io/zh-cn/latest/
+- **Documentation**: https://llamafactory.readthedocs.io/en/latest/
-- **Colab**: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
+- **Colab (free)**: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
- **Local machine**: Please refer to [usage](#getting-started)
-- **PAI-DSW**: [Llama3 Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) | [Qwen2-VL Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) | [DeepSeek-R1-Distill Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b)
+- **PAI-DSW (free trial)**: [Llama3 Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) | [Qwen2-VL Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) | [DeepSeek-R1-Distill Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b)
- **Amazon SageMaker**: [Blog](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/)

> [!NOTE]
@@ -403,7 +403,7 @@ huggingface-cli login
| Optional     | Minimum | Recommend |
| ------------ | ------- | --------- |
| CUDA         | 11.6    | 12.2      |
-| deepspeed    | 0.10.0  | 0.16.2    |
+| deepspeed    | 0.10.0  | 0.16.4    |
| bitsandbytes | 0.39.0  | 0.43.1    |
| vllm         | 0.4.3   | 0.7.3     |
| flash-attn   | 2.3.0   | 2.7.2     |
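The optional packages above correspond to pip extras of the `llamafactory` package. A minimal install sketch from a source checkout (the extra names are taken from the Dockerfile added later in this commit; enable only what your hardware supports):

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
# "metrics" plus optional backends; see the Dockerfile's INSTALL_* arguments
pip install -e ".[metrics,deepspeed,bitsandbytes]"
```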
@@ -412,15 +412,14 @@ huggingface-cli login
\* *estimated*
-| Method                   | Bits |  7B   |  13B  |  30B  |  70B   |  110B  | 8x7B  | 8x22B  |
-| ------------------------ | ---- | ----- | ----- | ----- | ------ | ------ | ----- | ------ |
-| Full                     | 32   | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
-| Full                     | 16   | 60GB  | 120GB | 300GB | 600GB  | 900GB  | 400GB | 1200GB |
-| Freeze                   | 16   | 20GB  | 40GB  | 80GB  | 200GB  | 360GB  | 160GB | 400GB  |
-| LoRA/GaLore/APOLLO/BAdam | 16   | 16GB  | 32GB  | 64GB  | 160GB  | 240GB  | 120GB | 320GB  |
-| QLoRA                    | 8    | 10GB  | 20GB  | 40GB  | 80GB   | 140GB  | 60GB  | 160GB  |
-| QLoRA                    | 4    | 6GB   | 12GB  | 24GB  | 48GB   | 72GB   | 30GB  | 96GB   |
-| QLoRA                    | 2    | 4GB   | 8GB   | 16GB  | 24GB   | 48GB   | 18GB  | 48GB   |
+| Method                          | Bits |  7B   |  14B  |  30B  |  70B   |  `x`B   |
+| ------------------------------- | ---- | ----- | ----- | ----- | ------ | ------- |
+| Full (`bf16` or `fp16`)         | 32   | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
+| Full (`pure_bf16`)              | 16   | 60GB  | 120GB | 300GB | 600GB  | `8x`GB  |
+| Freeze/LoRA/GaLore/APOLLO/BAdam | 16   | 16GB  | 32GB  | 64GB  | 160GB  | `2x`GB  |
+| QLoRA                           | 8    | 10GB  | 20GB  | 40GB  | 80GB   | `x`GB   |
+| QLoRA                           | 4    | 6GB   | 12GB  | 24GB  | 48GB   | `x/2`GB |
+| QLoRA                           | 2    | 4GB   | 8GB   | 16GB  | 24GB   | `x/4`GB |
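As a worked example of the new `x`B column: for a hypothetical 32B dense model, 16-bit Freeze/LoRA tuning needs roughly 2 × 32 = 64 GB, 8-bit QLoRA about 32 GB, and 4-bit QLoRA about 32 / 2 = 16 GB. These are estimates; actual usage also depends on batch size and sequence length.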
## Getting Started
@@ -490,12 +489,12 @@ bash Ascend-cann-kernels-910b_8.0.0.alpha002_linux-"$(uname -i)".run --install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
| Requirement  | Minimum | Recommend      |
| ------------ | ------- | -------------- |
| CANN         | 8.0.RC1 | 8.0.0.alpha002 |
| torch        | 2.1.0   | 2.4.0          |
| torch-npu    | 2.1.0   | 2.4.0.post2    |
-| deepspeed    | 0.13.2  | 0.16.2         |
+| deepspeed    | 0.13.2  | 0.13.2         |
Remember to use `ASCEND_RT_VISIBLE_DEVICES` instead of `CUDA_VISIBLE_DEVICES` to specify the device to use.
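For example, a minimal sketch (the recipe path follows the `examples/` layout referenced elsewhere in this README):

```bash
# Select the first two Ascend NPUs; CUDA_VISIBLE_DEVICES has no effect here.
ASCEND_RT_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```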
@@ -560,6 +559,8 @@ See [examples/README.md](examples/README.md) for advanced usage (including distr
> [!TIP]
> Use `llamafactory-cli help` to show help information.
+>
+> Read [FAQs](https://github.com/hiyouga/LLaMA-Factory/issues/4614) first if you encounter any problems.

### Fine-Tuning with LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
...
@@ -5,7 +5,7 @@
[![GitHub contributors](https://img.shields.io/github/contributors/hiyouga/LLaMA-Factory?color=orange)](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors)
[![GitHub workflow](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml/badge.svg)](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml)
[![PyPI](https://img.shields.io/pypi/v/llamafactory)](https://pypi.org/project/llamafactory/)
-[![Citation](https://img.shields.io/badge/citation-319-green)](https://scholar.google.com/scholar?cites=12620864006390196564)
+[![Citation](https://img.shields.io/badge/citation-349-green)](https://scholar.google.com/scholar?cites=12620864006390196564)
[![GitHub pull request](https://img.shields.io/badge/PRs-welcome-blue)](https://github.com/hiyouga/LLaMA-Factory/pulls)
[![Twitter](https://img.shields.io/twitter/follow/llamafactory_ai)](https://twitter.com/llamafactory_ai)
@@ -40,9 +40,9 @@ https://github.com/user-attachments/assets/e6ce34b0-52d5-4f3e-a830-592106c4c272
- **Beginner tutorial**: https://zhuanlan.zhihu.com/p/695287607
- **Framework documentation**: https://llamafactory.readthedocs.io/zh-cn/latest/
-- **Colab**: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing
+- **Colab (free)**: https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing
- **Local machine**: see [usage](#如何使用)
-- **PAI-DSW**: [Llama3 example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) | [Qwen2-VL example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) | [DeepSeek-R1-Distill example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b)
+- **PAI-DSW (free trial)**: [Llama3 example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) | [Qwen2-VL example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) | [DeepSeek-R1-Distill example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b)
- **Amazon SageMaker**: [Blog](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/)

> [!NOTE]
@@ -405,7 +405,7 @@ huggingface-cli login
| Optional     | Minimum | Recommend |
| ------------ | ------- | --------- |
| CUDA         | 11.6    | 12.2      |
-| deepspeed    | 0.10.0  | 0.16.2    |
+| deepspeed    | 0.10.0  | 0.16.4    |
| bitsandbytes | 0.39.0  | 0.43.1    |
| vllm         | 0.4.3   | 0.7.3     |
| flash-attn   | 2.3.0   | 2.7.2     |
@@ -414,15 +414,14 @@ huggingface-cli login
\* *estimated*
-| Method                   | Bits |  7B   |  13B  |  30B  |  70B   |  110B  | 8x7B  | 8x22B  |
-| ------------------------ | ---- | ----- | ----- | ----- | ------ | ------ | ----- | ------ |
-| Full                     | 32   | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
-| Full                     | 16   | 60GB  | 120GB | 300GB | 600GB  | 900GB  | 400GB | 1200GB |
-| Freeze                   | 16   | 20GB  | 40GB  | 80GB  | 200GB  | 360GB  | 160GB | 400GB  |
-| LoRA/GaLore/APOLLO/BAdam | 16   | 16GB  | 32GB  | 64GB  | 160GB  | 240GB  | 120GB | 320GB  |
-| QLoRA                    | 8    | 10GB  | 20GB  | 40GB  | 80GB   | 140GB  | 60GB  | 160GB  |
-| QLoRA                    | 4    | 6GB   | 12GB  | 24GB  | 48GB   | 72GB   | 30GB  | 96GB   |
-| QLoRA                    | 2    | 4GB   | 8GB   | 16GB  | 24GB   | 48GB   | 18GB  | 48GB   |
+| Method                          | Bits |  7B   |  14B  |  30B  |  70B   |  `x`B   |
+| ------------------------------- | ---- | ----- | ----- | ----- | ------ | ------- |
+| Full (`bf16` or `fp16`)         | 32   | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
+| Full (`pure_bf16`)              | 16   | 60GB  | 120GB | 300GB | 600GB  | `8x`GB  |
+| Freeze/LoRA/GaLore/APOLLO/BAdam | 16   | 16GB  | 32GB  | 64GB  | 160GB  | `2x`GB  |
+| QLoRA                           | 8    | 10GB  | 20GB  | 40GB  | 80GB   | `x`GB   |
+| QLoRA                           | 4    | 6GB   | 12GB  | 24GB  | 48GB   | `x/2`GB |
+| QLoRA                           | 2    | 4GB   | 8GB   | 16GB  | 24GB   | `x/4`GB |
## Getting Started
@@ -493,12 +492,12 @@ bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
| Requirement  | Minimum | Recommend      |
| ------------ | ------- | -------------- |
-| CANN         | 8.0.RC1 | 8.0.RC1        |
-| torch        | 2.1.0   | 2.1.0          |
-| torch-npu    | 2.1.0   | 2.1.0.post3    |
+| CANN         | 8.0.RC1 | 8.0.0.alpha002 |
+| torch        | 2.1.0   | 2.4.0          |
+| torch-npu    | 2.1.0   | 2.4.0.post2    |
| deepspeed    | 0.13.2  | 0.13.2         |
Use `ASCEND_RT_VISIBLE_DEVICES` instead of `CUDA_VISIBLE_DEVICES` to specify the compute device.
@@ -563,6 +562,8 @@ llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
> [!TIP]
> Use `llamafactory-cli help` to show help information.
+>
+> If you run into errors, read the [FAQs](https://github.com/hiyouga/LLaMA-Factory/issues/4614) first.

### Fine-Tuning with the LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
...
assets/wechat.jpg: binary image updated (167 KB → 164 KB)
assets/wechat_npu.jpg: binary image updated (167 KB → 167 KB)
@@ -10,7 +10,7 @@
      "role": "assistant"
    },
    {
-     "content": "What are they doing?",
+     "content": "What are they doing?<image>",
      "role": "user"
    },
    {
@@ -19,6 +19,7 @@
      }
    ],
    "images": [
+     "mllm_demo_data/1.jpg",
      "mllm_demo_data/1.jpg"
    ]
  },
@@ -79,7 +80,7 @@
      "role": "assistant"
    },
    {
-     "content": "他们在做什么?",
+     "content": "他们在做什么?<image>",
      "role": "user"
    },
    {
@@ -88,6 +89,7 @@
      }
    ],
    "images": [
+     "mllm_demo_data/1.jpg",
      "mllm_demo_data/1.jpg"
    ]
  },
...
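A note on the multimodal demo-data hunks above: in LLaMA-Factory's data format, each `<image>` tag inside a message marks where an image attaches, and the `images` list supplies one file path per tag. The conversation contains two image-bearing user turns, so the edit adds the missing `<image>` placeholder and duplicates the `mllm_demo_data/1.jpg` entry to keep tags and files in one-to-one correspondence.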
# Use the NVIDIA official image with PyTorch 2.3.0 by default
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html
ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.02-py3
FROM ${BASE_IMAGE}

# Define environments
ENV MAX_JOBS=4
ENV FLASH_ATTENTION_FORCE_BUILD=TRUE
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

# Define installation arguments
ARG INSTALL_BNB=false
ARG INSTALL_VLLM=false
ARG INSTALL_DEEPSPEED=false
ARG INSTALL_FLASHATTN=false
ARG INSTALL_LIGER_KERNEL=false
ARG INSTALL_HQQ=false
ARG INSTALL_EETQ=false
ARG PIP_INDEX=https://pypi.org/simple
ARG HTTP_PROXY=

# Set the working directory
WORKDIR /app

# Set the http proxy
# NOTE: variables exported inside a RUN instruction last only for that layer,
# which is why the pip steps below also pass --proxy explicitly.
RUN if [ -n "$HTTP_PROXY" ]; then \
        echo "Configuring proxy..."; \
        export http_proxy=$HTTP_PROXY; \
        export https_proxy=$HTTP_PROXY; \
    fi

# Install the requirements
COPY requirements.txt /app
RUN pip config set global.index-url "$PIP_INDEX" && \
    pip config set global.extra-index-url "$PIP_INDEX" && \
    python -m pip install --upgrade pip && \
    if [ -n "$HTTP_PROXY" ]; then \
        python -m pip install --proxy=$HTTP_PROXY -r requirements.txt; \
    else \
        python -m pip install -r requirements.txt; \
    fi

# Copy the rest of the application into the image
COPY . /app

# Install LLaMA Factory with the extras selected by the INSTALL_* build arguments
RUN EXTRA_PACKAGES="metrics"; \
    if [ "$INSTALL_BNB" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},bitsandbytes"; \
    fi; \
    if [ "$INSTALL_VLLM" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},vllm"; \
    fi; \
    if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
    fi; \
    if [ "$INSTALL_LIGER_KERNEL" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},liger-kernel"; \
    fi; \
    if [ "$INSTALL_HQQ" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},hqq"; \
    fi; \
    if [ "$INSTALL_EETQ" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},eetq"; \
    fi; \
    if [ -n "$HTTP_PROXY" ]; then \
        pip install --proxy=$HTTP_PROXY -e ".[$EXTRA_PACKAGES]"; \
    else \
        pip install -e ".[$EXTRA_PACKAGES]"; \
    fi

# Rebuild flash attention: remove the base image's prebuilt wheels, then
# compile from source only if requested
RUN pip uninstall -y transformer-engine flash-attn && \
    if [ "$INSTALL_FLASHATTN" == "true" ]; then \
        pip uninstall -y ninja && \
        if [ -n "$HTTP_PROXY" ]; then \
            pip install --proxy=$HTTP_PROXY ninja && \
            pip install --proxy=$HTTP_PROXY --no-cache-dir flash-attn --no-build-isolation; \
        else \
            pip install ninja && \
            pip install --no-cache-dir flash-attn --no-build-isolation; \
        fi; \
    fi

# Unset the http proxy (defensive only; RUN-level exports never persist
# across layers in the first place)
RUN if [ -n "$HTTP_PROXY" ]; then \
        unset http_proxy; \
        unset https_proxy; \
    fi

# Set up volumes
VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]

# Expose port 7860 for the LLaMA Board
ENV GRADIO_SERVER_PORT=7860
EXPOSE 7860

# Expose port 8000 for the API service
ENV API_PORT=8000
EXPOSE 8000
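A sketch of building and running this image by hand (the tag name and mount paths are illustrative; the build runs from the repository root because the compose file below uses `context: ../..`):

```bash
# Build from the repository root so the COPY steps can see the source tree.
docker build -f docker/docker-cuda/Dockerfile \
  --build-arg INSTALL_DEEPSPEED=true \
  -t llamafactory:latest .

# Run with GPUs and a persistent Hugging Face cache.
docker run -it --gpus all \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -p 7860:7860 -p 8000:8000 \
  llamafactory:latest bash
```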
services:
  llamafactory:
    build:
      dockerfile: ./docker/docker-cuda/Dockerfile
      context: ../..
      args:
        INSTALL_BNB: "false"
        INSTALL_VLLM: "false"
        INSTALL_DEEPSPEED: "false"
        INSTALL_FLASHATTN: "false"
        INSTALL_LIGER_KERNEL: "false"
        INSTALL_HQQ: "false"
        INSTALL_EETQ: "false"
        PIP_INDEX: https://pypi.org/simple
    container_name: llamafactory
    volumes:
      - ../../hf_cache:/root/.cache/huggingface
      - ../../ms_cache:/root/.cache/modelscope
      - ../../om_cache:/root/.cache/openmind
      - ../../data:/app/data
      - ../../output:/app/output
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    tty: true
    shm_size: "16gb"
    stdin_open: true
    command: bash
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]
    restart: unless-stopped
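Alternatively, the compose file drives the same build in one step; a sketch, assuming it is saved as `docker/docker-cuda/docker-compose.yml`:

```bash
cd docker/docker-cuda
docker compose up -d                   # build on first run, then start detached
docker compose exec llamafactory bash  # open a shell inside the container
```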
# Use the Ubuntu 22.04 image with CANN 8.0.0 by default
# More versions can be found at https://hub.docker.com/r/ascendai/cann/tags
# FROM ascendai/cann:8.0.rc1-910-ubuntu22.04-py3.8
FROM ascendai/cann:8.0.0-910b-ubuntu22.04-py3.10
# FROM ascendai/cann:8.0.rc1-910-openeuler22.03-py3.8
# FROM ascendai/cann:8.0.rc1-910b-openeuler22.03-py3.8

# Define environments
ENV DEBIAN_FRONTEND=noninteractive

# Define installation arguments
ARG INSTALL_DEEPSPEED=false
ARG PIP_INDEX=https://pypi.org/simple
ARG TORCH_INDEX=https://download.pytorch.org/whl/cpu
ARG HTTP_PROXY=

# Set the working directory
WORKDIR /app

# Set the http proxy
# NOTE: RUN-level exports last only for this layer; the pip steps below
# pass --proxy explicitly.
RUN if [ -n "$HTTP_PROXY" ]; then \
        echo "Configuring proxy..."; \
        export http_proxy=$HTTP_PROXY; \
        export https_proxy=$HTTP_PROXY; \
    fi

# Install the requirements
COPY requirements.txt /app
RUN pip config set global.index-url "$PIP_INDEX" && \
    pip config set global.extra-index-url "$TORCH_INDEX" && \
    python -m pip install --upgrade pip && \
    if [ -n "$HTTP_PROXY" ]; then \
        python -m pip install --proxy=$HTTP_PROXY -r requirements.txt; \
    else \
        python -m pip install -r requirements.txt; \
    fi

# Copy the rest of the application into the image
COPY . /app

# Install LLaMA Factory with torch-npu support
RUN EXTRA_PACKAGES="torch-npu,metrics"; \
    if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
    fi; \
    if [ -n "$HTTP_PROXY" ]; then \
        pip install --proxy=$HTTP_PROXY -e ".[$EXTRA_PACKAGES]"; \
    else \
        pip install -e ".[$EXTRA_PACKAGES]"; \
    fi

# Unset the http proxy (defensive only; RUN-level exports never persist
# across layers in the first place)
RUN if [ -n "$HTTP_PROXY" ]; then \
        unset http_proxy; \
        unset https_proxy; \
    fi

# Set up volumes
VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]

# Expose port 7860 for the LLaMA Board
ENV GRADIO_SERVER_PORT=7860
EXPOSE 7860

# Expose port 8000 for the API service
ENV API_PORT=8000
EXPOSE 8000
services:
  llamafactory:
    build:
      dockerfile: ./docker/docker-npu/Dockerfile
      context: ../..
      args:
        INSTALL_DEEPSPEED: "false"
        PIP_INDEX: https://pypi.org/simple
    container_name: llamafactory
    volumes:
      - ../../hf_cache:/root/.cache/huggingface
      - ../../ms_cache:/root/.cache/modelscope
      - ../../om_cache:/root/.cache/openmind
      - ../../data:/app/data
      - ../../output:/app/output
      - /usr/local/dcmi:/usr/local/dcmi
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
      - /etc/ascend_install.info:/etc/ascend_install.info
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    tty: true
    shm_size: "16gb"
    stdin_open: true
    command: bash
    devices:
      - /dev/davinci0
      - /dev/davinci_manager
      - /dev/devmm_svm
      - /dev/hisi_hdc
    restart: unless-stopped
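Since this service mounts the host's Ascend driver and the `npu-smi` binary, a quick sanity check from inside the container looks like this (a sketch; `npu-smi info` is the standard Ascend device-status command):

```bash
cd docker/docker-npu
docker compose up -d
docker compose exec llamafactory npu-smi info  # should list the mounted davinci0 device
```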
@@ -4,12 +4,12 @@ services:
      dockerfile: ./docker/docker-rocm/Dockerfile
      context: ../..
      args:
-       INSTALL_BNB: false
-       INSTALL_VLLM: false
-       INSTALL_DEEPSPEED: false
-       INSTALL_FLASHATTN: false
-       INSTALL_LIGER_KERNEL: false
-       INSTALL_HQQ: false
+       INSTALL_BNB: "false"
+       INSTALL_VLLM: "false"
+       INSTALL_DEEPSPEED: "false"
+       INSTALL_FLASHATTN: "false"
+       INSTALL_LIGER_KERNEL: "false"
+       INSTALL_HQQ: "false"
        PIP_INDEX: https://pypi.org/simple
    container_name: llamafactory
    volumes:
@@ -24,7 +24,7 @@ services:
      - "8000:8000"
    ipc: host
    tty: true
-   shm_size: '16gb'
+   shm_size: "16gb"
    stdin_open: true
    command: bash
    devices:
...
@@ -14,7 +14,7 @@ fsdp_config:
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
-mixed_precision: fp16 # or bf16
+mixed_precision: bf16 # or fp16
num_machines: 1 # the number of nodes
num_processes: 2 # the number of GPUs in all nodes
rdzv_backend: static
...
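For context, this accelerate config is consumed by `accelerate launch`; one plausible invocation, assuming the config sits at `examples/accelerate/fsdp_config.yaml` and is paired with an FSDP recipe from the repository:

```bash
# Launch an FSDP fine-tuning run through accelerate (paths assumed).
accelerate launch \
  --config_file examples/accelerate/fsdp_config.yaml \
  src/train.py examples/extras/fsdp_qlora/llama3_lora_sft.yaml
```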
@@ -19,7 +19,7 @@
    "stage": 0,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
-   "overlap_comm": true,
+   "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
...
@@ -19,7 +19,7 @@
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
-   "overlap_comm": true,
+   "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
...
@@ -23,7 +23,7 @@
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
-   "overlap_comm": true,
+   "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
...
@@ -17,7 +17,7 @@
  },
  "zero_optimization": {
    "stage": 3,
-   "overlap_comm": true,
+   "overlap_comm": false,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
...
@@ -25,7 +25,7 @@
      "device": "cpu",
      "pin_memory": true
    },
-   "overlap_comm": true,
+   "overlap_comm": false,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
...
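A note on the repeated change above: in DeepSpeed, `overlap_comm` overlaps gradient reduction with the backward pass to hide communication latency, at the cost of extra GPU memory for communication buffers. Turning it off in all bundled ZeRO configs, as this commit does, presumably trades a little throughput for lower memory pressure and more predictable behavior.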
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
-deepspeed: examples/deepspeed/ds_z3_config.json
+deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: identity,alpaca_en_demo
@@ -14,6 +15,7 @@ cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
+dataloader_num_workers: 4

### output
output_dir: saves/llama3-8b/full/sft
@@ -21,6 +23,7 @@ logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
+save_only_model: false

### train
per_device_train_batch_size: 1
@@ -31,9 +34,11 @@ lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
+resume_from_checkpoint: null

### eval
-val_size: 0.1
-per_device_eval_batch_size: 1
-eval_strategy: steps
-eval_steps: 500
+# eval_dataset: alpaca_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
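To connect this config to the CLI shown earlier, one plausible launch (the in-repo path `examples/train_full/llama3_full_sft.yaml` is assumed; `FORCE_TORCHRUN=1` forces a torchrun-based multi-GPU launch, which the ZeRO-3 setting needs):

```bash
# Full-parameter SFT with DeepSpeed ZeRO-3 across all visible GPUs.
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
```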
@@ -10,7 +10,7 @@ do_train: true
finetuning_type: full
freeze_vision_tower: true # choices: [true, false]
freeze_multi_modal_projector: true # choices: [true, false]
-train_mm_proj_only: false # choices: [true, false]
+freeze_language_model: false # choices: [true, false]
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
...
### model
model_name_or_path: /data/luopl/Qwen/Qwen2.5-72B

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: identity,alpaca_zh_demo,alpaca_en_demo
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 4

### output
output_dir: saves/qwen2.5_72b/lora/sft/
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 250
### model
model_name_or_path: /data/luopl/Qwen/Qwen2.5-72B

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj
deepspeed: examples/deepspeed/ds_z3_offload_config.json

### dataset
dataset: identity,alpaca_zh_demo,alpaca_en_demo
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 4

### output
output_dir: saves/qwen2.5_72b/lora/sft/
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 250
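The two new configs above are identical except for the DeepSpeed file: the second uses ZeRO-3 with CPU offload, which fits the 72B model on less GPU memory at the cost of speed. A launch sketch, with hypothetical names for where the configs might be saved:

```bash
# Plain ZeRO-3 (parameters and optimizer states sharded across GPUs)
llamafactory-cli train qwen2_5_72b_lora_sft_ds3.yaml

# ZeRO-3 with CPU offload (slower, lower GPU memory)
llamafactory-cli train qwen2_5_72b_lora_sft_ds3_offload.yaml
```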