# Unsloth
## Introduction

The Unsloth framework is built on Triton kernels to optimize model training speed and GPU memory usage. When fine-tuning Mistral, Gemma, or Llama with Unsloth, training can be 2-5x faster and memory usage can be reduced by up to 70%. A minimal fine-tuning sketch is included at the end of this document.

## Installation

Supported components:

+ Python 3.10

### 1. Install by building from source

#### Build environment preparation

Two ways to prepare the environment are provided:

1. Use a sourcefind (光源) PyTorch base image. Image download: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); choose the image that matches your PyTorch, Python, DTK, and OS versions.
2. Use an existing Python environment and install PyTorch. PyTorch whl packages can be downloaded from [torch-2.1.0](https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.1); choose the whl that matches your Python and DTK versions, then install it:

```shell
pip install torch*   # the downloaded torch whl package
```

#### Build and install from source

- Download the code:

```shell
git clone http://developer.hpccube.com/codes/OpenDAS/unsloth.git
# switch to the branch required for your build
```

- Build and install:

```shell
cd unsloth
pip install .
```

- If you want to modify Unsloth yourself, you can clone the latest upstream code from GitHub, apply the changes below, and then run `pip install`:

```
vim unsloth/kernels/cross_entropy_loss.py
    MAX_FUSED_SIZE = 65536 -> MAX_FUSED_SIZE = 16384
    num_warps = 32 -> num_warps = 8
    # located below _chunked_cross_entropy_forward[(n_rows, n_chunks,)] in the Fast_CrossEntropyLoss class

vim unsloth/kernels/utils.py
    if BLOCK_SIZE >= 32768: num_warps = 32 -> if BLOCK_SIZE >= 32768: num_warps = 8
    elif BLOCK_SIZE >= 8192: num_warps = 16 -> elif BLOCK_SIZE >= 8192: num_warps = 8
    # located in the calculate_settings function

vim unsloth/models/_utils.py
    model_architectures = ["llama", "mistral", "gemma", "gemma2", "qwen2",] -> model_architectures = ["llama", "mistral", "qwen2",]

vim unsloth/models/llama.py
    Q = Q.transpose(1, 2) -> Q = Q.transpose(1, 2).half()
    K = K.transpose(1, 2) -> K = K.transpose(1, 2).half()
    V = V.transpose(1, 2) -> V = V.transpose(1, 2).half()
    # located under elif HAS_FLASH_ATTENTION and attention_mask is None in LlamaAttention_fast_forward
```

#### Notes

+ If downloads during `pip install` are slow, add the Tsinghua PyPI mirror: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`

## Verification

- Run `python -c "import unsloth"`. Expected output: `Unsloth: Will patch your computer to enable 2x faster free finetuning.`

## Known Issues

- None

## References

- [README_origin](README_origin.md)
- [Unsloth](https://github.com/unslothai/unsloth.git)
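## Minimal fine-tuning sketch

The following is a minimal sketch of a LoRA fine-tuning run built on Unsloth's `FastLanguageModel` API together with TRL's `SFTTrainer`, assuming an older TRL release that still accepts `tokenizer`, `dataset_text_field`, and `max_seq_length` directly. The model name, dataset contents, and hyperparameters below are placeholder assumptions for illustration, not values prescribed by this guide; adjust them to your own model and data.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

# Load a 4-bit quantized base model.
# The model name is a placeholder; any Llama/Mistral/Qwen2 checkpoint
# supported by your Unsloth build can be used instead.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-2-7b-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,          # auto-detect; fp16 is assumed here
    load_in_4bit = True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)

# Tiny in-memory placeholder dataset with a plain "text" column.
dataset = Dataset.from_dict({
    "text": ["Q: What does Unsloth accelerate?\nA: LoRA fine-tuning of LLMs."] * 64
})

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = True,        # assumption: half precision on this hardware
        logging_steps = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```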