Commit f19baa5f authored by zhuwenwen

add llama2 config and usage

parent 447071ef
@@ -56,13 +56,17 @@ make -j12
### Model Download
[LLama-7B](https://huggingface.co/decapoda-research/llama-7b-hf)
[LLama-13B](https://huggingface.co/decapoda-research/llama-13b-hf)
[LLama-30B](https://huggingface.co/decapoda-research/llama-30b-hf)
[LLama-65B](https://huggingface.co/decapoda-research/llama-65b-hf)
[LLama2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf)
[LLama2-13B](https://huggingface.co/meta-llama/Llama-2-13b-hf)
Supported models: llama-7B, llama-13B, llama-30B, llama-65B, llama2-7B, llama2-13B
@@ -77,14 +81,14 @@ python ../examples/cpp/llama/huggingface_llama_convert.py \
For example, to convert llama-7b: `-in_file` is the input model path, `-saved_dir` is the output path, `-infer_gpu_num` is the tensor-parallel size for inference, `-weight_data_type` is the inference data type, and `-model_name` is the model name. For other models, change the paths and `-model_name` accordingly.
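As a minimal sketch of the full conversion call (the hunk header above shows only its first line): the paths below are placeholders, and the exact `-model_name` value `llama_7b` is an assumption.

```bash
# Paths and the model name are placeholders; adjust them to your layout.
# The config excerpt further below uses <saved_dir>/1-gpu as model_dir,
# which suggests the converter appends the GPU count to -saved_dir.
python ../examples/cpp/llama/huggingface_llama_convert.py \
    -in_file /data/models/llama-7b-hf \
    -saved_dir /data/models/llama-7b-infer \
    -infer_gpu_num 1 \
    -weight_data_type fp16 \
    -model_name llama_7b
```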
### Run LLama-7B
1. Generate the `gemm_config.in` file
data_type = 0 (FP32) or 1 (FP16)
```bash
./bin/gpt_gemm 1 1 7 32 128 11008 32000 1 1
```
The arguments above correspond to:
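The full argument list is elided by the diff. As a hedged reading based on FasterTransformer's usual `gpt_gemm` convention (not confirmed by this repo's text), the positions are:

```bash
# Assumed positional meaning, following FasterTransformer's gpt_gemm convention:
#   ./bin/gpt_gemm <batch_size> <beam_width> <max_input_len> <head_num> \
#                  <size_per_head> <inter_size> <vocab_size> <data_type> <tensor_para_size>
# So for LLama-7B above: batch 1, beam 1, input length 7, 32 heads of 128 dims,
# inter_size 11008, vocab_size 32000, data_type 1 (fp16), tensor_para_size 1.
```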
@@ -105,31 +109,33 @@ data_type = 1时,data_type = fp16;data_type = 0时,data_type = fp32,tensor_p
The program reads the ids in `../examples/cpp/llama/start_ids.csv` as the input tokens, and the generated result is saved to `.out`.
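For illustration, a hypothetical `start_ids.csv`: the format (one comma-separated row of token ids per request) matches FasterTransformer's other examples, and the ids themselves are made up, not taken from this repo.

```bash
# Hypothetical start_ids.csv: one comma-separated row of token ids per request.
# The leading 1 matches the start_id configured for llama2 further below.
printf '1, 306, 4658, 278, 6593, 310, 2834, 338\n' > ../examples/cpp/llama/start_ids.csv
```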
### Run LLama-13B
```bash
./bin/gpt_gemm 1 1 7 40 128 13824 32000 1 1
./bin/llama_example
```
### Run LLama-33B
```bash
./bin/gpt_gemm 1 1 7 52 128 17920 32000 1 2
mpirun --allow-run-as-root -np 2 ./bin/llama_example
```
### Run LLama-65B
```bash
./bin/gpt_gemm 1 1 7 64 128 22016 32000 1 8
mpirun --allow-run-as-root -np 8 ./bin/llama_example
```
### Parameter Configuration Notes
Note: LLama2-7B and LLama2-13B run the same way as LLama-7B and LLama-13B (a sketch follows the parameter mapping below).
The LLama-33B model needs 2 GPUs (32 GB each) for fp16 inference, and the LLama-65B model needs 8 GPUs (32 GB each).
After downloading a LLama model from huggingface, you can inspect its config.json; in the mapping below, the left-hand side is the fastertransformer parameter and the right-hand side is the corresponding field in config.json.
```bash
head_num=num_attention_heads
...
```
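As referenced in the note above, a sketch of running LLama2-7B the same way as LLama-7B: since Llama-2-7B keeps the 7B architecture (32 heads of 128 dims, inter_size 11008, vocab_size 32000), the same gemm numbers should apply. This assumes the converted weights sit where the config excerpt below points.

```bash
# Same dimensions as LLama-7B, fp16, single GPU.
./bin/gpt_gemm 1 1 7 32 128 11008 32000 1 1
# Select model_name=llama2_7b / model_dir=/data/models/llama-2-7b-infer/1-gpu
# in the config (see the excerpt below), then run:
./bin/llama_example
```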
@@ -18,10 +18,10 @@ model_dir=/data/models/llama-7b-infer/1-gpu
; model_dir=/data/models/llama-65b-hf-infer/8-gpu
; model_name=llama2_7b
; model_dir=/data/models/llama-2-7b-infer/1-gpu
; model_name=llama2_13b
; model_dir=/data/models/llama-2-13b-infer/1-gpu
[request]
beam_width=1 # beam width for beam search
@@ -90,9 +90,9 @@ inter_size = 11008
num_layer = 32
rotary_embedding = 128
layernorm_eps = 1e-05
vocab_size = 32000
start_id = 1
end_id = 2
weight_data_type = fp16
[llama2_13b]
@@ -102,7 +102,7 @@ inter_size = 13824
num_layer = 40
rotary_embedding = 128
layernorm_eps = 1e-05
vocab_size = 32000
start_id = 1
end_id = 2
weight_data_type = fp16
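For reference, the diffed values assemble into a section like the sketch below; `head_num` and `size_per_head` are not visible in this hunk and are assumptions from the standard Llama-2-7B architecture.

```ini
; Sketch of a complete [llama2_7b] section; head_num and size_per_head are
; assumptions (standard Llama-2-7B architecture), not shown in the diff.
[llama2_7b]
head_num = 32
size_per_head = 128
inter_size = 11008
num_layer = 32
rotary_embedding = 128
layernorm_eps = 1e-05
vocab_size = 32000
start_id = 1
end_id = 2
weight_data_type = fp16
```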