<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2024-04-25 10:38:07
 * @LastEditTime: 2024-04-25 17:47:01
-->
# LLAMA

## Paper
- [https://arxiv.org/pdf/2302.13971.pdf](https://arxiv.org/pdf/2302.13971.pdf)

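## Model Architecture
The LLaMA network is based on the Transformer architecture and adopts several improvements that were proposed later and used in models such as PaLM. The main differences from the original architecture are:
* Pre-normalization: to improve training stability, the input of each Transformer sub-layer is normalized instead of its output, using the RMSNorm normalization function.
* SwiGLU activation function [PaLM]: the ReLU non-linearity is replaced with SwiGLU to improve performance, with a feed-forward hidden dimension of 2/3 · 4d instead of the 4d used in PaLM.
* Rotary embeddings: absolute positional embeddings are removed; rotary positional embeddings (RoPE) are applied at each layer of the network instead.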

![img](./docs/llama_str.png)
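
A minimal pure-Python sketch of the pre-normalization and SwiGLU points above. It assumes the usual formulations: RMSNorm computes `x / sqrt(mean(x²) + eps) * g` with a learned gain `g`, and the SwiGLU feed-forward computes `w2 · (SiLU(w1 · x) ⊙ (w3 · x))`. The weight names `w1`/`w2`/`w3`, the `eps` value, and rounding the 2/3 · 4d hidden size up to a multiple of 256 are assumptions borrowed from Meta's public reference code, not from this repository.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by 1/sqrt(mean(x^2) + eps), then by a learned gain.

    Unlike LayerNorm there is no mean subtraction and no bias.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * g for v, g in zip(x, weight)]


def silu(z):
    """SiLU (Swish) activation z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(w, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def swiglu_ffn(x, w1, w2, w3):
    """SwiGLU feed-forward: w2 @ (silu(w1 @ x) * (w3 @ x))."""
    gate = [silu(v) for v in matvec(w1, x)]
    up = matvec(w3, x)
    return matvec(w2, [g * u for g, u in zip(gate, up)])


def ffn_hidden_dim(dim, multiple_of=256):
    """2/3 * 4d hidden size, rounded up to a multiple of `multiple_of` (assumed)."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def pre_norm_ffn_block(x, norm_weight, w1, w2, w3):
    """Pre-normalization: normalize the sub-layer input, then add the raw input back."""
    h = swiglu_ffn(rms_norm(x, norm_weight), w1, w2, w3)
    return [a + b for a, b in zip(x, h)]


print(ffn_hidden_dim(4096))  # → 11008
```

With `d = 4096`, 2/3 · 4d is about 10923, and rounding up to a multiple of 256 gives 11008, which matches the feed-forward size of the 7B model.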

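The rotary embedding idea can be sketched in a few lines: each pair of features is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset. This sketch rotates consecutive pairs `(x[2i], x[2i+1])` with frequencies `base**(-2i/d)` and `base = 10000`, following the RoFormer formulation; production implementations (including Hugging Face's) may pair dimensions differently, so treat it as an illustration only.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (a, b).
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.1, -0.4, 0.7, 0.2]
k = [0.5, 0.3, -0.2, 0.9]
# Same relative offset (m - n = 3) at different absolute positions.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 13), rope(k, 10))
print(abs(s1 - s2) < 1e-9)  # → True
```

Because the score only sees `m - n`, no separate absolute position embedding is needed, which is why LLaMA drops it.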
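## Algorithm
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. The models are trained on trillions of tokens, showing that state-of-the-art models can be trained exclusively on publicly available datasets, without relying on proprietary and inaccessible data.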

![img](./docs/llama_pri.png)

## Environment Setup

Pull the inference docker image from [SourceFind (光源)](https://www.sourcefind.cn/#/image/dcu/custom):
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.3.3-dtk23.10-py38
# Replace <Image ID> with the ID of the image pulled above
# <Host Path>: directory on the host
# <Container Path>: mount point of that directory inside the container
docker run -it --name llama --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

Image dependencies:
* DTK driver: dtk23.10
* PyTorch: 2.1.0
* vllm: 0.3.3
* xformers: 0.0.23
* flash_attn: 2.0.4
* Python: 3.8
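
A small sketch for checking a running container against the versions listed above, using `importlib.metadata` from the Python 3.8 standard library. It compares only the part of each version before any `+` (vendor builds often append a local label; this is an assumption about the image), and it does not check the DTK driver. The `EXPECTED` table and helper names are hypothetical.

```python
import sys
from importlib import metadata

# Package versions from the list above (distribution names assumed).
EXPECTED = {"torch": "2.1.0", "vllm": "0.3.3", "xformers": "0.0.23", "flash_attn": "2.0.4"}


def base_version(version):
    """Drop a local version label, e.g. '2.1.0+abc' -> '2.1.0'."""
    return version.split("+", 1)[0]


def check(expected, lookup=metadata.version):
    """Return {package: (installed version or None, matches expected)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except metadata.PackageNotFoundError:
            have = None
        report[pkg] = (have, have is not None and base_version(have) == want)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(expected 3.8)")
    for pkg, (have, ok) in check(EXPECTED).items():
        print(f"{pkg:12s} installed={have} expected={EXPECTED[pkg]} {'OK' if ok else 'MISMATCH'}")
```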

## Dataset


## Inference

### Model Download

[Llama-2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

[Llama-2-13B-Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)

[Llama-2-70B-Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)
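
The weights can also be fetched programmatically. A minimal sketch, assuming the `huggingface_hub` package is installed and that access to the gated Llama 2 repositories has been granted and a token configured (for example with `huggingface-cli login`); the `REPOS` mapping and the helper names are hypothetical.

```python
# Repo IDs taken from the links above.
REPOS = {
    "7b": "meta-llama/Llama-2-7b-chat-hf",
    "13b": "meta-llama/Llama-2-13b-chat-hf",
    "70b": "meta-llama/Llama-2-70b-chat-hf",
}


def repo_id(size):
    """Map a size label such as '7B' to its Hugging Face repo ID."""
    return REPOS[size.lower()]


def download(size, local_dir):
    """Download every file of the chosen model into local_dir and return the path."""
    # Imported lazily so the mapping above can be used without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(size), local_dir=local_dir)
```

For example, `download("7B", "./Llama-2-7b-chat-hf")` fetches the 7B chat model; the resulting directory can then be passed as the model path below.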

### Offline Batch Inference
```bash
python offline_inference.py
```
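Here, `prompts` are the input prompts; `temperature` controls sampling randomness: lower values make generation more deterministic, higher values make it more random, and 0 means greedy sampling (default: 1); `max_tokens=16` is the maximum number of tokens to generate (default: 16);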
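`model` is the model path; `tensor_parallel_size=1` is the number of accelerator cards to use (default: 1); `dtype="float16"` is the inference data type. If the model weights are stored in bfloat16, change it to float16 for inference.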
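
The repository's `offline_inference.py` is not reproduced here. The following is a hedged sketch of what such a script typically looks like with vLLM's offline `LLM` / `SamplingParams` API, wired to the parameters described above; the helper names `sampling_kwargs`, `engine_kwargs` and `run` are hypothetical.

```python
# Hypothetical sketch of an offline batch-inference script built on vLLM's
# LLM / SamplingParams API; not the repository's own offline_inference.py.


def sampling_kwargs(temperature=1.0, max_tokens=16):
    """Arguments for vllm.SamplingParams; temperature 0 means greedy sampling."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    return {"temperature": temperature, "max_tokens": max_tokens}


def engine_kwargs(model, tensor_parallel_size=1, dtype="float16"):
    """Arguments for vllm.LLM; keep float16 even for bfloat16 checkpoints."""
    return {"model": model, "tensor_parallel_size": tensor_parallel_size, "dtype": dtype}


def run(prompts, model, temperature=1.0, max_tokens=16,
        tensor_parallel_size=1, dtype="float16"):
    """Generate one completion per prompt and print it."""
    # Imported here so the helpers above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(model, tensor_parallel_size, dtype))
    params = SamplingParams(**sampling_kwargs(temperature, max_tokens))
    for output in llm.generate(prompts, params):
        # Each RequestOutput holds the prompt and its list of generated sequences.
        print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```

On a machine with vLLM installed, `run(["I believe the meaning of life is"], "/path/to/Llama-2-7b-chat-hf", max_tokens=128)` prints output in the same `Prompt: ..., Generated text: ...` shape as the result shown later in this README.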

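To make the `temperature` description concrete, here is a textbook sketch of temperature sampling in pure Python: logits are divided by the temperature before the softmax, and a temperature of 0 is treated as greedy (argmax) decoding. It illustrates the idea only and is not vLLM's internal sampler, which also applies options such as `top_p` and `top_k`.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 falls back to greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    # Lower temperature concentrates probability on the top token.
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
print(sample_token(logits, 0))  # → 0 (greedy)
```
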
### OpenAI-Compatible Server
Start the server:
```bash
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf --enforce-eager --dtype float16 --trust-remote-code
```
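Here, `--model` is the path of the model to load and `--dtype` is the data type (float16). By default, the server uses the chat template predefined in the tokenizer; `--chat-template` can supply a new template that overrides the default one.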
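
For reference, the Llama-2-chat prompt format that the default template is meant to produce can be sketched as follows. The marker strings follow Meta's public reference code as we understand it; the exact whitespace and BOS/EOS handling of the template shipped with a given tokenizer may differ, so treat this as an aid for writing or checking a `--chat-template` override, not as the canonical template. `llama2_prompt` is a hypothetical helper.

```python
# Llama-2-chat prompt markers (as in Meta's reference code, to our understanding).
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def llama2_prompt(messages, bos="<s>", eos="</s>"):
    """Render OpenAI-style messages into a Llama-2-chat prompt string.

    Expects an optional leading system message followed by alternating
    user/assistant turns that end with a user turn.
    """
    msgs = [dict(m) for m in messages]
    if msgs and msgs[0]["role"] == "system":
        system = msgs.pop(0)["content"]
        if not msgs:
            raise ValueError("a user message must follow the system message")
        # The system prompt is folded into the first user turn.
        msgs[0]["content"] = B_SYS + system + E_SYS + msgs[0]["content"]
    if not msgs or msgs[-1]["role"] != "user":
        raise ValueError("the conversation must end with a user message")

    parts = []
    for i in range(0, len(msgs) - 1, 2):
        user, assistant = msgs[i], msgs[i + 1]
        if user["role"] != "user" or assistant["role"] != "assistant":
            raise ValueError("turns must alternate user/assistant")
        parts.append(f"{bos}{B_INST} {user['content'].strip()} {E_INST} "
                     f"{assistant['content'].strip()} {eos}")
    parts.append(f"{bos}{B_INST} {msgs[-1]['content'].strip()} {E_INST}")
    return "".join(parts)


print(repr(llama2_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I believe the meaning of life is"},
])))
```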

List the available models:
```bash
curl http://localhost:8000/v1/models
```
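
The same listing from Python, as a minimal standard-library sketch. It assumes the server started above is reachable at `http://localhost:8000` and that the response follows the OpenAI list format (`{"object": "list", "data": [{"id": ...}, ...]}`); `BASE_URL` and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed address of the server started above (vLLM's default port is 8000).
BASE_URL = "http://localhost:8000/v1"


def model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url=BASE_URL, timeout=10):
    """GET /v1/models and return the IDs of the served models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

While the server is running, `print(list_models())` should include the `--model` value passed at startup.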

### Using the OpenAI Completions API with vLLM
```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "I believe the meaning of life is",
        "max_tokens": 7,
        "temperature": 0
    }'
```
Alternatively, use [vllm/examples/openai_completion_client.py](https://developer.hpccube.com/codes/OpenDAS/vllm/-/tree/3e147e194e5a3b0fc25a61dd91fdc8a682cbba9d/examples/openai_completion_client.py).
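
A dependency-free equivalent of the `curl` call above, using only Python's standard library rather than the `openai` client. `BASE_URL` and the helper names are assumptions; the generated text is read from `choices[0].text` as in the OpenAI Completions response format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed server address


def completion_body(prompt, model="meta-llama/Llama-2-7b-chat-hf",
                    max_tokens=7, temperature=0):
    """Request body matching the curl example above."""
    return {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}


def post_json(url, body, timeout=60):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def complete(prompt, base_url=BASE_URL, **kwargs):
    """Return the text of the first completion choice."""
    result = post_json(f"{base_url}/completions", completion_body(prompt, **kwargs))
    return result["choices"][0]["text"]
```

For example, `complete("I believe the meaning of life is")` sends the same request as the `curl` command.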


### Using the OpenAI Chat API with vLLM
```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "I believe the meaning of life is"}
        ]
    }'
```
Alternatively, use [vllm/examples/openai_chatcompletion_client.py](https://developer.hpccube.com/codes/OpenDAS/vllm/-/tree/3e147e194e5a3b0fc25a61dd91fdc8a682cbba9d/examples/openai_chatcompletion_client.py).
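
A standard-library sketch of the chat request, assuming the server address used above and the OpenAI chat response format, where the reply sits in `choices[0].message.content`. `BASE_URL` and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed server address


def chat_body(messages, model="meta-llama/Llama-2-7b-chat-hf"):
    """Request body for /v1/chat/completions."""
    return {"model": model, "messages": messages}


def reply_text(response):
    """Content of the first choice in an OpenAI-style chat response."""
    return response["choices"][0]["message"]["content"]


def chat(messages, base_url=BASE_URL, timeout=120):
    """Send a chat request and return the assistant's reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_body(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return reply_text(json.load(resp))
```

For example, `chat([{"role": "user", "content": "I believe the meaning of life is"}])` returns the model's reply as a string; the server applies the chat template before generation.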


## Result
Accelerator used: 1 × DCU-K100-64G
```
Prompt: 'I believe the meaning of life is', Generated text: ' to find purpose, happiness, and fulfillment. Here are some reasons why:\n\n1. Purpose: Having a sense of purpose gives life meaning and direction. It helps individuals set goals and work towards achieving them, which can lead to a sense of accomplishment and fulfillment.\n2. Happiness: Happiness is a fundamental aspect of life that brings joy and satisfaction.
```

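## Accuracy


## Application Scenarios

### Algorithm Category
Conversational question answering

### Key Application Industries
Finance, scientific research, education

## Source Repository and Issue Feedback
* [https://developer.hpccube.com/codes/modelzoo/llama_vllm](https://developer.hpccube.com/codes/modelzoo/llama_vllm)

## References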
* [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)