# <div align="center"><strong>Unsloth</strong></div>
## Introduction
The unsloth framework uses Triton to optimize model training speed and GPU memory usage. Fine-tuning Mistral, Gemma, or Llama with Unsloth can be 2-5x faster while using up to 70% less memory.

## Installation
Supported components:
+ Python 3.10

### 1. Installing from source

#### Build environment setup
Two ways to prepare the environment:

1. Use the SourceFind PyTorch base image. Image download: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); choose the image that matches your PyTorch, Python, DTK, and OS versions.

2. Use an existing Python environment and install PyTorch. PyTorch whl packages: [torch-2.1.0](https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.1); download the whl that matches your Python and DTK versions, then install it:
```shell
pip install torch-*.whl  # the downloaded torch whl package
```

#### Build and install from source
- Download the source
```shell
git clone http://developer.hpccube.com/codes/OpenDAS/unsloth.git  # switch to the branch you need
```
- Build and install
```shell
cd unsloth
pip install .
```

```
# To modify unsloth yourself: clone the latest upstream code from GitHub,
# apply the changes below, then run pip install:
vim unsloth/kernels/cross_entropy_loss.py
MAX_FUSED_SIZE = 65536 -> MAX_FUSED_SIZE = 16384
num_warps = 32 -> num_warps = 8 # under _chunked_cross_entropy_forward[(n_rows, n_chunks,)] in class Fast_CrossEntropyLoss

vim unsloth/kernels/utils.py
if   BLOCK_SIZE >= 32768: num_warps = 32 -> if   BLOCK_SIZE >= 32768: num_warps = 8
elif BLOCK_SIZE >=  8192: num_warps = 16 -> elif BLOCK_SIZE >=  8192: num_warps = 8
# inside the function calculate_settings

vim unsloth/models/_utils.py
model_architectures = ["llama", "mistral", "gemma", "gemma2", "qwen2",] -> model_architectures = ["llama", "mistral", "qwen2",] 

vim unsloth/models/llama.py
Q = Q.transpose(1, 2) -> Q = Q.transpose(1, 2).half()
K = K.transpose(1, 2) -> K = K.transpose(1, 2).half()
V = V.transpose(1, 2) -> V = V.transpose(1, 2).half()
# in LlamaAttention_fast_forward, under elif HAS_FLASH_ATTENTION and attention_mask is None
```
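The intent of the two kernel patches above can be illustrated with a small plain-Python sketch (no Triton needed). The constants and warp thresholds mirror the edits above; `next_power_of_2` stands in for `triton.next_power_of_2`, and `n_chunks_for_vocab` is an illustrative helper, not unsloth API:

```python
# Plain-Python sketch of what the two kernel patches change.
MAX_FUSED_SIZE = 16384  # cross_entropy_loss.py: patched down from 65536


def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (stand-in for triton.next_power_of_2)."""
    return 1 << (n - 1).bit_length()


def calculate_settings(n: int):
    """utils.py after the patch: num_warps is capped at 8 for large blocks."""
    BLOCK_SIZE = next_power_of_2(n)
    num_warps = 4
    if BLOCK_SIZE >= 32768:
        num_warps = 8   # was 32 before the patch
    elif BLOCK_SIZE >= 8192:
        num_warps = 8   # was 16 before the patch
    elif BLOCK_SIZE >= 2048:
        num_warps = 8
    return BLOCK_SIZE, num_warps


def n_chunks_for_vocab(vocab_size: int) -> int:
    """Illustrative: the chunked cross-entropy forward splits the vocab
    dimension into chunks of at most MAX_FUSED_SIZE columns."""
    return (vocab_size + MAX_FUSED_SIZE - 1) // MAX_FUSED_SIZE


print(calculate_settings(50000))    # -> (65536, 8): 8 warps instead of 32
print(n_chunks_for_vocab(128256))   # -> 8 chunks of <=16384 instead of 2 of <=65536
```

Together, the smaller fused size and the 8-warp cap trade a few extra kernel launches for block shapes that fit the DCU hardware.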
#### Notes
+ If pip install downloads are slow, add the Tsinghua PyPI mirror: -i https://pypi.tuna.tsinghua.edu.cn/simple/

## Verification
- Run `python -c "import unsloth"`; expected output: `Unsloth: Will patch your computer to enable 2x faster free finetuning.`

## Known Issues
-

## References
- [README_origin](README_origin.md)
- [Unsloth](https://github.com/unslothai/unsloth.git)