Commit 7d576a9a authored by Rayyyyy

Add results images and infer_hf.py

parent e2fa7e61
@@ -128,22 +128,23 @@ bash examples/pretrain_yuan2.0_2.1B.sh
```
## Inference
If the `--model_path_or_name` argument is not specified, the `IEITYuan/Yuan2-M32-hf` model is used for inference by default.
Tips: to avoid the error `RuntimeError: FlashAttention forward only supports head dimension at most 128`, set `"use_flash_attention": false` in the `/path/of/Yuan2-M32-hf/config.json` file.
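The config edit above can also be scripted. Below is a minimal sketch; the helper name `disable_flash_attention` is an illustration and not part of this repository, and the path is a placeholder you should adjust to your local checkout:

```python
import json

def disable_flash_attention(config_path):
    # Read the model's config.json, flip the flag, and write it back.
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["use_flash_attention"] = False
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
    return cfg

# usage: disable_flash_attention("/path/of/Yuan2-M32-hf/config.json")
```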
```bash
python vllm/yuan_inference.py --model_path /path/of/Yuan2-M32-hf
pip install -U huggingface_hub hf_transfer
export HF_ENDPOINT=https://hf-mirror.com/
HIP_VISIBLE_DEVICES=0,1,2,3 python infer_hf.py --model_path_or_name /path/of/Yuan2-M32-hf
```
## Results
<div align=center>
<img src="./doc/result.png" width=1500 height=400/>
</div>
### Accuracy
Test data: the BAAI (智源) glm_trian_data dataset; accelerator used: K100.
| device | dtype | params | acc |
| :------: | :------: | :------: | :------: |
| A800 | fp16 | | |
| K100 | fp16 | | |
Results not yet available.
## Application Scenarios
### Algorithm Category
......
# Load model directly
import argparse
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
# parse command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--model_path_or_name', default="IEITYuan/Yuan2-M32-hf", help='model path')
args = parser.parse_args()
model_path_or_name = args.model_path_or_name
device = "cuda"
tokenizer = LlamaTokenizer.from_pretrained(model_path_or_name, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)
model = AutoModelForCausalLM.from_pretrained(model_path_or_name, trust_remote_code=True, device_map='auto', torch_dtype=torch.float16)
prompts = "写一篇春游作文"  # "Write an essay about a spring outing"
input_tensor = tokenizer(prompts, return_tensors="pt")["input_ids"].to(device)
outputs = model.generate(input_tensor, do_sample=False, max_length=100)
result = tokenizer.decode(outputs[0])
print("***", result)
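The generation call in the script above can be wrapped in a small helper that decodes only the newly generated tokens instead of the full sequence. This is a sketch for illustration; the function name `generate_text` and the `max_new_tokens` default are assumptions, not part of the original `infer_hf.py`:

```python
import torch

def generate_text(model, tokenizer, prompt, max_new_tokens=100, device="cuda"):
    # Tokenize the prompt and move it to the model's device.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    with torch.no_grad():
        output_ids = model.generate(input_ids, do_sample=False,
                                    max_new_tokens=max_new_tokens)
    # Decode only the tokens generated after the prompt, dropping
    # special tokens such as <eod> from the returned string.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```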
@@ -3,7 +3,7 @@ modelCode=710
# Model name
modelName=yuan2.0-m32_pytorch
# Model description
modelDescription=Yuan 2.0-M32 greatly improves model compute efficiency: while matching the overall performance of the 70B-parameter LLaMA3, it significantly reduces the compute cost of model training, fine-tuning, and inference, consuming only 1/19 of LLaMA3's compute
modelDescription=A new-generation foundation language model released by IEIT Systems (浪潮信息)
# Application scenarios
appScenario=inference,training,dialogue Q&A,home,education,scientific research
# Framework type
......
accelerate
\ No newline at end of file