Commit 7d576a9a authored by Rayyyyy

Add results images and infer_hf.py

parent e2fa7e61
@@ -128,22 +128,23 @@ bash examples/pretrain_yuan2.0_2.1B.sh
```
## Inference
If the `--model_path_or_name` argument is not specified, the `IEITYuan/Yuan2-M32-hf` model is used for inference by default.
Tip: to avoid the error `RuntimeError: FlashAttention forward only supports head dimension at most 128`, set `"use_flash_attention": false` in `/path/of/Yuan2-M32-hf/config.json`.
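The config edit in the tip above can also be scripted; a minimal sketch (the helper name and path handling are illustrative, not part of this repo):

```python
import json

def disable_flash_attention(config_path):
    """Set "use_flash_attention": false in a model's config.json, in place."""
    with open(config_path) as f:
        cfg = json.load(f)
    # Avoids the "head dimension at most 128" FlashAttention error
    cfg["use_flash_attention"] = False
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
    return cfg
```

Run it once against the downloaded checkpoint directory before launching inference.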
```bash
pip install -U huggingface_hub hf_transfer
export HF_ENDPOINT=https://hf-mirror.com/
HIP_VISIBLE_DEVICES=0,1,2,3 python infer_hf.py --model_path_or_name /path/of/Yuan2-M32-hf
```
## Result
<div align=center>
<img src="./doc/result.png" width=1500 height=400/>
</div>
### Accuracy
Test data: BAAI (智源) glm_trian_data; accelerator card: K100.
| device | dtype | params | acc |
| :------: | :------: | :------: | :------: |
| A800 | fp16 | | |
| K100 | fp16 | | |
## Application Scenarios
### Algorithm Category
......
```python
# infer_hf.py: minimal Hugging Face transformers inference for Yuan2-M32
import argparse
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--model_path_or_name', default="IEITYuan/Yuan2-M32-hf", help='local model path or Hub model name')
args = parser.parse_args()
model_path_or_name = args.model_path_or_name

# Tokenizer: Yuan uses <eod> as its end-of-document token plus a set of extra special tokens
tokenizer = LlamaTokenizer.from_pretrained(model_path_or_name, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>', '<commit_before>', '<commit_msg>', '<commit_after>', '<jupyter_start>', '<jupyter_text>', '<jupyter_code>', '<jupyter_output>', '<empty_output>'], special_tokens=True)

# Model: load in fp16, sharded across the visible GPUs by device_map='auto'
model = AutoModelForCausalLM.from_pretrained(model_path_or_name, trust_remote_code=True, device_map='auto', torch_dtype=torch.float16)

prompt = "写一篇春游作文"  # "Write an essay about a spring outing"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(input_ids, do_sample=False, max_length=100)
result = tokenizer.decode(outputs[0])
print("***", result)
```
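The tokenizer above registers `<sep>` as a special token; Yuan-style chat prompts typically separate dialogue turns with it. A hypothetical helper sketching that convention (the exact turn format is an assumption, not taken from this repo):

```python
def build_prompt(turns):
    """Join dialogue turns with <sep>; the trailing <sep> cues the model to answer.

    This turn-joining convention is an assumption for illustration.
    """
    return "<sep>".join(turns) + "<sep>"

# Single-turn prompt as used in infer_hf.py above
print(build_prompt(["写一篇春游作文"]))  # → 写一篇春游作文<sep>
```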
@@ -3,7 +3,7 @@ modelCode=710
# Model name
modelName=yuan2.0-m32_pytorch
# Model description
modelDescription=A new-generation foundation large language model released by Inspur Information (浪潮信息)
# Application scenarios
appScenario=推理,训练,对话问答,家居,教育,科研
# Framework type
......
accelerate