Commit 7d576a9a authored by Rayyyyy

Add results images and infer_hf.py

parent e2fa7e61
@@ -128,22 +128,23 @@ bash examples/pretrain_yuan2.0_2.1B.sh
```
## Inference
If the `--model_path_or_name` argument is not specified, the `IEITYuan/Yuan2-M32-hf` model is used for inference by default.
Tip: to avoid the error `RuntimeError: FlashAttention forward only supports head dimension at most 128`, set `"use_flash_attention": false` in `/path/of/Yuan2-M32-hf/config.json`.
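The config edit in the tip above can also be scripted; a minimal sketch (the helper name and path handling are illustrative, not part of this repo):

```python
import json

def disable_flash_attention(config_path):
    """Set "use_flash_attention": false in a model's config.json, in place."""
    with open(config_path) as f:
        cfg = json.load(f)
    # Avoids the "head dimension at most 128" FlashAttention error
    cfg["use_flash_attention"] = False
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
    return cfg
```

Run it once against the downloaded checkpoint directory before launching inference.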
```bash
pip install -U huggingface_hub hf_transfer
export HF_ENDPOINT=https://hf-mirror.com/
HIP_VISIBLE_DEVICES=0,1,2,3 python infer_hf.py --model_path_or_name /path/of/Yuan2-M32-hf
```
## Result
<div align=center>
<img src="./doc/result.png" width=1500 height=400/>
</div>
### Accuracy
Test data: BAAI (智源) glm_trian_data; accelerator card: K100.
| device | dtype | params | acc |
| :------: | :------: | :------: | :------: |
| A800 | fp16 | | |
| K100 | fp16 | | |
## Application Scenarios
### Algorithm Category
......
```python
# infer_hf.py: minimal Hugging Face transformers inference for Yuan2-M32
import argparse
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--model_path_or_name', default="IEITYuan/Yuan2-M32-hf", help='local model path or Hub model name')
args = parser.parse_args()
model_path_or_name = args.model_path_or_name

# Tokenizer: Yuan uses <eod> as its end-of-document token plus a set of extra special tokens
tokenizer = LlamaTokenizer.from_pretrained(model_path_or_name, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>', '<commit_before>', '<commit_msg>', '<commit_after>', '<jupyter_start>', '<jupyter_text>', '<jupyter_code>', '<jupyter_output>', '<empty_output>'], special_tokens=True)

# Model: load in fp16, sharded across the visible GPUs by device_map='auto'
model = AutoModelForCausalLM.from_pretrained(model_path_or_name, trust_remote_code=True, device_map='auto', torch_dtype=torch.float16)

prompt = "写一篇春游作文"  # "Write an essay about a spring outing"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(input_ids, do_sample=False, max_length=100)
result = tokenizer.decode(outputs[0])
print("***", result)
```
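The tokenizer above registers `<sep>` as a special token; Yuan-style chat prompts typically separate dialogue turns with it. A hypothetical helper sketching that convention (the exact turn format is an assumption, not taken from this repo):

```python
def build_prompt(turns):
    """Join dialogue turns with <sep>; the trailing <sep> cues the model to answer.

    This turn-joining convention is an assumption for illustration.
    """
    return "<sep>".join(turns) + "<sep>"

# Single-turn prompt as used in infer_hf.py above
print(build_prompt(["写一篇春游作文"]))  # → 写一篇春游作文<sep>
```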
@@ -3,7 +3,7 @@ modelCode=710
# Model name
modelName=yuan2.0-m32_pytorch
# Model description
modelDescription=A new-generation foundation large language model released by Inspur Information (浪潮信息)
# Application scenarios
appScenario=推理,训练,对话问答,家居,教育,科研
# Framework type
......
accelerate