"fmoe/git@developer.sourcefind.cn:OpenDAS/fastmoe.git" did not exist on "ddfaaf49858d0f270411bfee537897c8241ef07f"
README.md 5.87 KB
Newer Older
dcuai's avatar
dcuai committed
1
# LLaMA

## Paper
- [https://arxiv.org/pdf/2302.13971.pdf](https://arxiv.org/pdf/2302.13971.pdf)

## Model Architecture
The LLaMA network is based on the Transformer architecture, incorporating various improvements that were later proposed and used in other models such as PaLM. The main differences from the original architecture are:

- Pre-normalization. To improve training stability, the input of each Transformer sub-layer is normalized instead of the output, using the RMSNorm normalization function.
- SwiGLU activation function [PaLM]. The ReLU non-linearity is replaced with the SwiGLU activation function to improve performance, using a dimension of 2/3 · 4d instead of the 4d used in PaLM.
- Rotary embeddings. Absolute positional embeddings are removed; instead, rotary position embeddings (RoPE) are added at every layer of the network.
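The three changes above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the actual LLaMA implementation; the function names, tensor shapes, and the RoPE frequency convention are assumptions for demonstration:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root mean square of the input. Unlike LayerNorm,
    # there is no mean subtraction or bias; it is applied to sub-layer inputs.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x, w_gate, w_up):
    # SwiGLU feed-forward gate: SiLU(x @ W_gate) * (x @ W_up).
    gate = x @ w_gate
    return gate / (1.0 + np.exp(-gate)) * (x @ w_up)

def rope(x, positions, base=10000.0):
    # Rotary position embedding: rotate consecutive feature pairs by a
    # position-dependent angle; applied at every attention layer.
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)       # (d/2,)
    angles = positions[:, None] * inv_freq[None, :]    # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

d_model = 8
hidden = int(2 / 3 * 4 * d_model)   # 2/3 * 4d instead of PaLM's 4d
rng = np.random.default_rng(0)
x = rng.normal(size=(4, d_model))   # (sequence, features)
h = rms_norm(x, np.ones(d_model))
h = rope(h, np.arange(4.0))
print(h.shape, hidden)              # (4, 8) 21
```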

![img](./docs/llama_str.png)

## Algorithm Overview
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. The models are trained on trillions of tokens and show that state-of-the-art models can be trained exclusively on publicly available datasets, without relying on proprietary and inaccessible datasets.

![img](./docs/llama_pri.png)

## Environment Setup

### Docker (Option 1)
```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
1. Install Rust

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

2. Install Protoc

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

3. Install the TGI service

```bash
cd llama_tgi
cd text-generation-inference
cd server
make install-exllama    # install the exllama kernels
make install-exllamav2  # install the exllamav2 kernels
cd ..                   # back to the project root
source $HOME/.cargo/env
BUILD_EXTENSIONS=True make install  # install the text-generation service
```

4. Install the benchmark tool

```bash
cd text-generation-inference
make install-benchmark
```

Note: if installation is too slow, you can speed it up by switching the default package index:

```bash
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

In addition, if `cargo install` is too slow, you can also speed it up by adding a mirror source in `~/.cargo/config`.
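For example, the source-replacement entry below points Cargo at the TUNA mirror of the crates.io index; this mirror is one option among several, and any crates.io mirror can be substituted:

```toml
# ~/.cargo/config
[source.crates-io]
replace-with = 'mirror'

[source.mirror]
registry = "https://mirrors.tuna.tsinghua.edu.cn/git/crates.io-index.git"
```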

5. Check the installed version

```bash
text-generation-launcher -V  # the version number matches the upstream release
```

## Datasets


## Inference

### Model Download

| Base model                                                   | Chat model                                                   | GPTQ model                                                   |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | [Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) |
| [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b) | [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | [Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) |
| [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) | [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | [Llama-2-70B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-Chat-GPTQ) |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |                                                              |
| [Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) |                                                              |


### Deploying TGI

#### Before use
```bash
export PYTORCH_TUNABLEOP_ENABLED=0
```
#### 1. Start the TGI service
```shell
HIP_VISIBLE_DEVICES=3 text-generation-launcher --dtype=float16 --model-id /path/to/Llama-2-7b-chat-hf --port 3001
```
For more options, run:
```shell
text-generation-launcher --help
```
#### 2. Query the service

Using the curl command:
```shell
curl 127.0.0.1:3001/generate \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100,"temperature":0.7}}' \
    -H 'Content-Type: application/json'
```
Calling from Python:
```python
import requests

headers = {
    "Content-Type": "application/json",
}

data = {
    'inputs': 'What is Deep Learning?',
    'parameters': {
        'max_new_tokens': 20,
    },
}

response = requests.post('http://127.0.0.1:3001/generate', headers=headers, json=data)
print(response.json())
# {'generated_text': '\n\nDeep Learning is a subset of Machine Learning that is concerned with the development of algorithms that can'}
```
For more API details, see [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference)
#### 3. TGI benchmark
Example:
```shell
text-generation-benchmark -s 32 -d 128 --runs 10 --tokenizer-name /path/to/Llama-2-7b-chat-hf
```
Note: the TGI service must be running before the TGI benchmark can be used. In addition, `--tokenizer-name` must match the model used by the service.
For more options, run:
```shell
text-generation-benchmark --help
```

## Results

![img1](./readme_images/img1.png)

### Accuracy


## Application Scenarios
### Algorithm Category
Conversational Q&A

### Key Application Industries
Finance, scientific research, education

## Source Repository and Issue Feedback
* [https://developer.sourcefind.cn/codes/modelzoo/llama_tgi](https://developer.sourcefind.cn/codes/modelzoo/llama_tgi)

## References
* [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)