# Unsloth
## Introduction
The Unsloth framework builds on Triton to optimize model training speed and GPU memory usage. Fine-tuning Mistral, Gemma, or Llama with Unsloth can be 2-5x faster while using up to 70% less memory.
## Installation
Supported components:
+ Python 3.10
### 1. Build and install from source
#### Build environment setup
Two ways to prepare the environment are provided:
1. Use the Sourcefind PyTorch base image: download it from [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch), choosing the image that matches your PyTorch, Python, DTK, and OS versions (see the container sketch after the install command below).
2. Use an existing Python environment: install PyTorch from the whl packages at [torch-2.1.0](https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.1), choosing the whl that matches your Python and DTK versions. Install it as follows:
```shell
pip install torch*  # install the downloaded torch whl package
```
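For option 1, once an image has been chosen, a container can be started along the lines of the sketch below. The image reference is a placeholder (use the exact pull command shown on the sourcefind.cn page), and the device/permission flags are the ones typically needed for DCU containers; adjust them to your environment.
```shell
# IMAGE is a placeholder: copy the exact image name/tag from the sourcefind.cn page above.
IMAGE=<pytorch-image-matching-your-python-and-dtk-version>
docker pull "$IMAGE"

# Typical device/permission flags for DCU containers; adjust as needed.
docker run -it --shm-size=16G \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  "$IMAGE" /bin/bash
```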
#### Build and install from source
- Download the source code
```shell
git clone http://developer.hpccube.com/codes/OpenDAS/unsloth.git  # switch to the branch you need before building
```
- Build and install
```shell
cd unsloth
pip install .
```
```
# If you modify unsloth yourself (e.g. clone the latest upstream code from GitHub and pip install it),
# apply the following changes afterwards:
vim unsloth/kernels/cross_entropy_loss.py:
MAX_FUSED_SIZE = 65536 -> MAX_FUSED_SIZE = 16384
num_warps = 32 -> num_warps = 8  # under _chunked_cross_entropy_forward[(n_rows, n_chunks,)] in the Fast_CrossEntropyLoss class
vim unsloth/kernels/utils.py
if BLOCK_SIZE >= 32768: num_warps = 32 -> if BLOCK_SIZE >= 32768: num_warps = 8
elif BLOCK_SIZE >= 8192: num_warps = 16 -> elif BLOCK_SIZE >= 8192: num_warps = 8
# inside the calculate_settings function
vim unsloth/models/_utils.py
model_architectures = ["llama", "mistral", "gemma", "gemma2", "qwen2",] -> model_architectures = ["llama", "mistral", "qwen2",]
vim unsloth/models/llama.py
Q = Q.transpose(1, 2) -> Q = Q.transpose(1, 2).half()
K = K.transpose(1, 2) -> K = K.transpose(1, 2).half()
V = V.transpose(1, 2) -> V = V.transpose(1, 2).half()
# inside LlamaAttention_fast_forward, under the `elif HAS_FLASH_ATTENTION and attention_mask is None` branch
```
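After reinstalling with `pip install .`, a quick check along these lines, assuming `MAX_FUSED_SIZE` remains a module-level name in `cross_entropy_loss.py`, confirms the patched value is the one actually imported:
```shell
python -c "from unsloth.kernels import cross_entropy_loss as m; print(m.MAX_FUSED_SIZE)"  # expect 16384
```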
#### Notes
+ If `pip install` downloads packages too slowly, append the Tsinghua PyPI mirror: `-i https://pypi.tuna.tsinghua.edu.cn/simple/` (see the example below).
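For example, the source install step with the mirror appended:
```shell
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple/
```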
## Verification
- Run `python -c "import unsloth"`; the expected output is: `Unsloth: Will patch your computer to enable 2x faster free finetuning.`
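For a fuller end-to-end check, a short LoRA fine-tuning run can serve as a smoke test. The sketch below follows typical upstream Unsloth usage and assumes `trl` and `datasets` are also installed; the model name, dataset, and hyperparameters are placeholders, and depending on your `trl` version, `dataset_text_field`/`max_seq_length` may need to be passed via `SFTConfig` instead.
```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Placeholder model; any supported llama/mistral/qwen2 checkpoint should work.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-2-7b-hf",
    max_seq_length=2048,
    dtype=None,            # let Unsloth pick the dtype
    load_in_4bit=False,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# Placeholder dataset with a plain "text" column.
dataset = load_dataset("imdb", split="train[:1000]")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=10,          # a few steps are enough for a smoke test
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```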
## Known Issues
- None
## References
- [README_origin](README_origin.md)
- [Unsloth](https://github.com/unslothai/unsloth.git)