# llama.cpp

![cover](imgs/cover.png)

## Introduction

**Project goal**

The main goal of *llama.cpp* is to enable large language model (LLM) inference with minimal setup and hardware requirements, with state-of-the-art performance both locally and in the cloud.

**Key features**

- **Dependency-free C/C++ implementation**: the project is plain C/C++ with no external dependencies.
- **DCU support**: DCU devices are supported via HIP.
- **CPU+DCU hybrid inference**: supports hybrid CPU+DCU inference to partially accelerate models, particularly when the model is larger than the available device memory.
- **Integer quantization**: supports 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use.



## Environment setup

**Docker**

```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-py3.10-dtk24.04.3-ubuntu20.04

docker run -i -t -d --device=/dev/kfd --privileged --network=host --device=/dev/dri --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <host path>:<container path> -v /opt/hyhal:/opt/hyhal:ro --group-add video --shm-size 16G --name <container name> <image ID>
```

**Download the source**

```bash
git clone http://developer.sourcefind.cn/codes/OpenDAS/llama.cpp.git -b branch  # branch name
```

**Install dependencies**

```bash
cd llama.cpp
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

**Build**

```bash
export LIBRARY_PATH=/opt/dtk/llvm/lib/clang/15.0.0/lib/linux/:$LIBRARY_PATH
sh bianyi.sh    # run the repo's build script ("bianyi" = build)
```



## Model format conversion

```bash
python convert_hf_to_gguf.py /path/to/model    # generates model.gguf in the same directory
```
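The integer quantization listed under key features is typically applied after conversion. A minimal sketch, assuming the build step above produced the standard `llama-quantize` tool in `build/bin` and that `Q4_K_M` is among the available quantization types (the output filename is illustrative):

```shell
# Quantize the converted GGUF down to 4-bit to speed up inference
# and reduce memory use.
# Usage: llama-quantize <input.gguf> <output.gguf> <type>
./build/bin/llama-quantize /path/to/model.gguf /path/to/model-Q4_K_M.gguf Q4_K_M
```

The quantized file can then be passed to `llama-cli`, `llama-server`, or the benchmark tools in place of the original `.gguf`.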



## Run

```bash
cd build/bin

# Option 1: command-line interface
./llama-cli -m /path/to/model.gguf -ngl 9999 -fa -co -p "You are a helpful assistant" -cnv

# Option 2: HTTP server
./llama-server -m /path/to/model.gguf -ngl 9999 -fa --port 8080  
```
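Once `llama-server` is running, it can be queried over HTTP. A hedged sketch, assuming the server is listening on `localhost:8080` (as configured above) and that this build exposes the standard OpenAI-compatible `/v1/chat/completions` route:

```shell
# Send a chat request to the server started with --port 8080 above.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Hello, who are you?"}
        ]
      }'
```

The response is a JSON object whose generated text sits under `choices[0].message.content`.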



## Performance testing

```bash
./llama-batched-bench -m /path/to/model.gguf -pps -c 0 -ngl 9999 -npp 2 -ntg 2000 -npl 1 -fa
```

> [!NOTE]
>
> Run any of the binaries with `--help` to see what each parameter means.



**References**

[llama.cpp (b4160) README](https://github.com/ggerganov/llama.cpp/tree/b4160?tab=readme-ov-file)