Commit b6ca96e3 authored by shihm

update readme

parent ca625f43
| ----------------------------------------------------------------- | -------------------------------- | ------------------- |
| [Baichuan 2](https://huggingface.co/baichuan-inc)                 | 7B/13B                           | baichuan2           |
| [ChatGLM3](https://huggingface.co/THUDM)                          | 6B                               | chatglm3            |
| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai)     | 7B/16B/67B/236B                  | deepseek            |
| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai)              | 236B/671B                        | deepseek3           |
| [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai)       | 1.5B/7B/8B/14B/32B/70B/671B      | deepseekr1          |
| [ERNIE-4.5](https://huggingface.co/baidu)                         | 0.3B/21B/300B                    | ernie/ernie_nothink |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google)          | 2B/7B/9B/27B                     | gemma               |
| [Gemma 3](https://huggingface.co/google)                          | 1B/4B/12B/27B                    | gemma3/gemma (1B)   |
| [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/THUDM)**         | 9B/32B                           | glm4                |
| [GLM-4.1V](https://huggingface.co/THUDM)*                         | 9B                               | glm4v               |
| [Granite 3-4](https://huggingface.co/ibm-granite)                 | 1B/2B/3B/7B/8B                   | granite3/granite4   |
| [Hunyuan(MT)](https://huggingface.co/tencent/)                    | 7B                               | hunyuan             |
| [InternLM 2-3](https://huggingface.co/internlm)                   | 7B/8B/20B                        | intern2             |
| [InternVL 2.5-3](https://huggingface.co/OpenGVLab)                | 1B/2B/8B/14B/38B/78B             | intern_vl           |
| [Llama 2](https://huggingface.co/meta-llama)                      | 7B/13B/70B                       | llama2              |
| [Llama 3-3.3](https://huggingface.co/meta-llama)                  | 1B/3B/8B/70B                     | llama3              |
| [Llama 4](https://huggingface.co/meta-llama)                      | 109B/402B                        | llama4              |
| [MiMo](https://huggingface.co/XiaomiMiMo)                         | 7B/309B                          | mimo/mimo_v2        |
| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai)        | 8B/12B                           | ministral           |
| [Mistral/Mixtral](https://huggingface.co/mistralai)               | 7B/8x7B/8x22B                    | mistral             |
| [Mistral Small](https://huggingface.co/mistralai)                 | 24B                              | mistral_small       |
> 4. If `deepspeed-cpu-offload-stage3` fails with `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`, this is a bug in DeepSpeed itself; see the official [issue](https://github.com/microsoft/DeepSpeed/issues/5634) for the workaround
>
> 5. The `TypeError: argument of type 'NoneType' is not iterable` error is caused by the upstream transformers version; see this [issue](https://github.com/huggingface/transformers/pull/38328)
> 6. For `MiMo`, add `ddp_find_unused_parameters: true`, `gradient_checkpointing: true`, and `gradient_checkpointing_kwargs: use_reentrant: false` to the yaml to resolve the conflict between DDP and LLaMA-Factory
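The MiMo settings from note 6, laid out as a YAML fragment for readability (a sketch; the key names come from the note above, and the nesting of `use_reentrant` under `gradient_checkpointing_kwargs` follows the usual LLaMA-Factory yaml layout):

```yaml
# Added to the training yaml to resolve the DDP / LLaMA-Factory conflict for MiMo
ddp_find_unused_parameters: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
```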
>
> \*: install `transformers` from the main branch and set `DISABLE_VERSION_CHECK=1` to skip the version check.
>
> \*\*: a specific `transformers` version is required to use that model, e.g. **GLM4 requires transformers==4.51.3**
>
> \*\*\*: before installing, change `requires-python = ">=3.11.0"` to `requires-python = ">=3.10.0"` in `pyproject.toml` to skip the Python version check
>
> \*\*\*\*: reinstall the `torchaudio` build matching `pytorch2.9.1` and your `DTK` version
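The `pyproject.toml` edit from note \*\*\* can also be scripted with `sed`; a minimal sketch, demonstrated on a throwaway copy (`/tmp/pyproject_demo.toml` is a stand-in path) so the real file stays untouched:

```shell
# Create a stand-in file containing the line from the note, then rewrite it in place
printf 'requires-python = ">=3.11.0"\n' > /tmp/pyproject_demo.toml
sed -i 's/>=3\.11\.0/>=3.10.0/' /tmp/pyproject_demo.toml
cat /tmp/pyproject_demo.toml
```

Run against the real `pyproject.toml` in the repo root before `pip install -e .` if you prefer not to edit the file by hand.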
## Installing from Source
### Environment Setup
...@@ -80,13 +88,14 @@ LLaMA Factory是一个大语言模型训练和推理的框架,支持了魔搭 ...@@ -80,13 +88,14 @@ LLaMA Factory是一个大语言模型训练和推理的框架,支持了魔搭
#### Docker (Method 1)
Based on the Guangyuan (光源) pytorch2.9.1 base image. Image download: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); choose the image version matching pytorch2.9.1, python, dtk, and your OS.
```bash
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/pytorch:2.9.1-ubuntu22.04-dtk26.04-0130-py3.10-20260204
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/LlamaFactory
git checkout v0.9.4
pip install -e ".[torch,metrics]" --no-build-isolation
```
docker build --no-cache -t llama-factory:latest .
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash
cd /your_code_path/LlamaFactory
git checkout v0.9.4
pip install -e ".[torch,metrics]" --no-build-isolation
```
```bash
DTK: 25.04
python: 3.10
torch: 2.9.1
vllm: ≥0.4.3
deepspeed: 0.14.2+das.opt2.dtk2504
```
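To sanity-check an environment against this list, a small POSIX-shell helper can compare version strings (a sketch: `ver_ge` is a hypothetical helper name; it strips local build suffixes such as `+das.opt2.dtk2504` and relies on GNU `sort -V` for version ordering):

```shell
# ver_ge INSTALLED MINIMUM  -> exit status 0 when INSTALLED >= MINIMUM
ver_ge() {
  a=${1%%+*}; b=${2%%+*}   # drop any local build suffix after '+'
  # version-sort both; if the minimum sorts first (or equal), the check passes
  [ "$(printf '%s\n%s\n' "$b" "$a" | sort -V | head -n1)" = "$b" ]
}
ver_ge "2.9.1" "2.9.1" && echo "torch ok"
ver_ge "0.14.2+das.opt2.dtk2504" "0.14.2" && echo "deepspeed ok"
```

Feed it the output of `pip show <pkg>` or `python -c "import torch; print(torch.__version__)"` to verify an installation.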
```bash
git clone http://developer.hpccube.com/codes/OpenDAS/llama-factory.git
cd /your_code_path/LlamaFactory
git checkout v0.9.4
pip install -e ".[torch,metrics]" --no-build-isolation
# (optional) deepspeed multi-node training