Initial commit

Commit dd7e61d4 authored Feb 12, 2025 by luopl
11 changed files
--- a/Dockerfile
+++ b/Dockerfile
+FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10
\ No newline at end of file
--- a/README.md
+++ b/README.md
+# phi-4_pytorch
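+## Paper
+Phi-4 Technical Report
+- https://arxiv.org/abs/2412.08905
+## Model Architecture
+phi-4 is a 14-billion-parameter model built on the Transformer architecture, focused on improving reasoning and question answering in STEM domains. Its architecture is nearly identical to that of phi-3, but thanks to improved data quality and an optimized training recipe,
+the model performs exceptionally well for its parameter scale on reasoning-focused benchmarks. Unlike earlier Phi models, phi-4 does not only gain capability by distilling a teacher model (such as GPT-4); on some tasks it also surpasses the teacher.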
+<div align=center>
+    <img src="./assets/transfomer.png"/>
+</div>
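+## Algorithm Principles
+The development of phi-4 is guided by three core principles:
+- Synthetic data for pretraining and midtraining: high-quality synthetic datasets are designed to strengthen reasoning and problem solving while keeping the data diverse and relevant. By adjusting the training curriculum and the data mixture, the share of synthetic tokens in pretraining and midtraining is significantly higher than in previous generations of Phi models.
+- Curation and filtering of high-quality organic data: organic data such as web content, licensed books, and code repositories are filtered to extract seed data that encourages deep reasoning and has educational value. These seeds are the basis for synthetic data generation and are also used directly in pretraining to improve knowledge and reasoning quality.
+- Post-training optimization: the post-training pipeline is further improved by refining the supervised fine-tuning (SFT) datasets and by developing a new method for generating DPO pairs based on pivotal token search.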
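The third principle relies on DPO (Direct Preference Optimization). As a rough illustration of what a DPO pair is used for, the sketch below computes the standard DPO loss for one (chosen, rejected) pair from sequence log-probabilities. It does not reproduce phi-4's pivotal token search or its pair generation; the function name and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    All inputs are summed log-probabilities of a full response under
    the policy being trained or under the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
```

The loss equals log 2 when the policy and the reference agree, and it shrinks as the policy assigns relatively more probability to the chosen response than the reference does.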
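+## Environment Setup
+### Docker (Method 1)
+Running in Docker is recommended. The image can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details); the image address and usage steps are given below.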
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10
+docker run -it --shm-size=1024G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name phi-4_pytorch  <your IMAGE ID> bash # replace <your IMAGE ID> with the ID of the image pulled above; for this image it is b030eb4a853a
+cd /path/your_code_data/
+git clone http://developer.sourcefind.cn/codes/OpenDAS/llama-factory.git
+cd llama-factory
+pip install -e ".[torch,metrics]"
+```
+Tips: the versions of the DCU-related tools above (dtk driver, python, torch, vllm, etc.) must match one another exactly.
+### Dockerfile (Method 2)
+The Dockerfile can be used as follows:
+```
+docker build -t phi4:latest .
+docker run -it --shm-size=1024G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name phi-4_pytorch  phi4 bash 
+cd /path/your_code_data/
+git clone http://developer.sourcefind.cn/codes/OpenDAS/llama-factory.git
+cd llama-factory
+pip install -e ".[torch,metrics]"
+```
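+### Anaconda (Method 3)
+The steps for configuring and building the environment locally are given below.
+The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.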
+```
+DTK driver:dtk24.04.3
+python:3.10
+torch:2.3.0
+flash-attn:2.6.1
+vllm:0.6.2
+lmslim:0.1.2
+xformers:0.0.25
+triton:2.1.0
+deepspeed:0.14.2
+apex:1.3.0
+```
+`Tips: the versions of the DCU-related tools above (dtk driver, python, torch, etc.) must match one another exactly.`
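Because the versions have to match exactly, it can help to check an environment before training. The following is a minimal sketch that compares installed packages against the versions listed above. The distribution names are assumptions (they may differ for DCU builds), only entries with unambiguous pip names are included, and a local build suffix such as `+xyz` is ignored when comparing.

```python
from importlib import metadata

# Versions copied from the list above. The distribution names are assumptions;
# the DTK driver and Python itself are not pip distributions and are skipped.
EXPECTED = {
    "torch": "2.3.0",
    "flash-attn": "2.6.1",
    "vllm": "0.6.2",
    "lmslim": "0.1.2",
    "xformers": "0.0.25",
    "triton": "2.1.0",
    "deepspeed": "0.14.2",
}


def version_matches(installed, expected):
    """True if the versions agree once a local suffix like '+xyz' is dropped."""
    return installed.split("+", 1)[0] == expected


def check_environment(expected=EXPECTED):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    problems = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed at all
        if have is None or not version_matches(have, want):
            problems[name] = (have, want)
    return problems


if __name__ == "__main__":
    for name, (have, want) in check_environment().items():
        print(f"{name}: installed={have}, expected={want}")
```

An empty result from `check_environment()` means every listed package was found at the expected version.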
+Install the other, non-deep-learning libraries according to requirement.txt:
+```
+cd /path/your_code_data/
+git clone http://developer.sourcefind.cn/codes/OpenDAS/llama-factory.git
+cd llama-factory
+pip install -e ".[torch,metrics]"
+```
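+## Dataset
+The identity and alpaca_en_demo datasets are used; both are already included in the data directory.
+The training data directory is structured as follows. Prepare a full dataset for regular training using the same layout: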
+```
+ ── data
+    ├── identity.json
+    ├── alpaca_en_demo.json
+    └── ...
+```
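When preparing a custom dataset in this layout, a quick structural check can catch malformed samples early. The sketch below assumes the custom file is a JSON list of alpaca-style records with `instruction`, optional `input`, and `output` fields, as in alpaca_en_demo.json. The helper names are hypothetical, and the check does not cover registering the dataset with LLaMA-Factory.

```python
import json

# "input" is optional in alpaca-style records, so only these two are required.
REQUIRED_KEYS = ("instruction", "output")


def invalid_records(records):
    """Return the indices of records that are not usable alpaca-style samples."""
    bad = []
    for i, rec in enumerate(records):
        ok = isinstance(rec, dict) and all(
            isinstance(rec.get(key), str) and rec[key].strip()
            for key in REQUIRED_KEYS
        )
        if not ok:
            bad.append(i)
    return bad


def check_file(path):
    """Load a JSON dataset file and report the indices of invalid records."""
    with open(path, encoding="utf-8") as f:
        return invalid_records(json.load(f))
```

For example, `check_file("data/alpaca_en_demo.json")` would return an empty list if every record has a non-empty instruction and output.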
+## Training
+Fine-tuning uses the LLaMA-Factory framework.
+### Single Node, Single Card (LoRA fine-tuning)
+```
+# Note: set the model path in the .yaml file to your own model location and adjust the other parameters as needed
+cd /path/your_code_data/
+mv phi-4_lora_sft.yaml llama-factory/examples/train_lora/phi-4_lora_sft.yaml
+cd llama-factory
+HIP_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/phi-4_lora_sft.yaml
+```
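The note in the block above asks for the model location in the .yaml file to be changed by hand. As an alternative, the sketch below rewrites the `model_name_or_path` key with a regular expression so that no YAML library is needed. The function name and the example path `/data/models/phi-4` are hypothetical.

```python
import re


def set_model_path(yaml_text, model_path):
    """Replace the value of the top-level `model_name_or_path` key in YAML text."""
    new_text, count = re.subn(
        r"(?m)^model_name_or_path:.*$",
        # A function replacement avoids backslash handling in the path.
        lambda match: f"model_name_or_path: {model_path}",
        yaml_text,
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one model_name_or_path key, found {count}"
        )
    return new_text
```

To apply it, read `phi-4_lora_sft.yaml`, pass its text through `set_model_path(text, "/data/models/phi-4")`, and write the result back before running `llamafactory-cli train`.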
+### Single Node, Multiple Cards (LoRA fine-tuning)
+2-card fine-tuning
+```
+HIP_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/phi-4_lora_sft.yaml
+```
+## Inference
+Inference uses the vllm framework.
+### Single Node, Single Card
+```
+# Note: set the model path in the script to your own model location and adjust the other parameters as needed
+cd /path/your_code_data/
+python ./inference/phi_single_infer.py
+```
+### Single Node, Multiple Cards
+```
+HIP_VISIBLE_DEVICES=0,1 python ./inference/phi_multi_infer.py
+```
+Here, tensor_parallel_size=2 is the number of cards used, and prompt is the text you want to send to the model.
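Since `tensor_parallel_size` has to agree with the cards exposed through `HIP_VISIBLE_DEVICES`, a small guard before constructing the `LLM` object can catch a mismatch early. This is a minimal sketch with hypothetical helper names. It only parses the environment variable and does not query the hardware.

```python
import os


def visible_device_count(value):
    """Count device ids in a HIP_VISIBLE_DEVICES-style string such as "0,1"."""
    return len([dev for dev in value.split(",") if dev.strip()])


def check_tensor_parallel(tensor_parallel_size, env=os.environ):
    """Raise if more cards are requested than HIP_VISIBLE_DEVICES exposes."""
    value = env.get("HIP_VISIBLE_DEVICES")
    if value is None:
        return  # variable unset: nothing to compare against in this sketch
    available = visible_device_count(value)
    if tensor_parallel_size > available:
        raise ValueError(
            f"tensor_parallel_size={tensor_parallel_size} but only "
            f"{available} device(s) are visible"
        )
```

Calling `check_tensor_parallel(2)` at the top of `main()` would fail fast if the script were launched with a single visible card.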
+## Results
+<div align=left>
+    <img src="./assets/result.png"/>
+</div>
+### Accuracy
+Model: phi-4
+Dataset: identity, alpaca_en_demo
+Training mode: LoRA fine-tuning
+Hardware: 2 cards, K100 AI
+Convergence when training on DCU:
+<div align=left>
+    <img src="./assets/training_loss.png"/>
+</div>
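+## Application Scenarios
+### Algorithm Category
+`Conversational question answering`
+### Key Application Industries
+`Scientific research, education, government, finance`
+## Pretrained Weights
+[phi-4 model download link on SCNet](http://113.200.138.88:18080/aimodels/microsoft/phi-4)
+## Source Repository and Issue Feedback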
+- http://developer.hpccube.com/codes/modelzoo/phi-4_pytorch.git
+## References
+- https://github.com/hiyouga/LLaMA-Factory
+- https://huggingface.co/microsoft/phi-4
--- a/assets/result.png
+++ b/assets/result.png
--- a/assets/training_eval_loss.png
+++ b/assets/training_eval_loss.png
--- a/assets/training_loss.png
+++ b/assets/training_loss.png
--- a/assets/transfomer.png
+++ b/assets/transfomer.png
--- a/icon.png
+++ b/icon.png
--- a/inference/phi_multi_infer.py
+++ b/inference/phi_multi_infer.py
+import torch
+from transformers import AutoTokenizer
+from vllm import LLM, SamplingParams
+def main():
+    # Initialize the tokenizer
+    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4", trust_remote_code=True)
+    # Create a sampling params object.
+    sampling_params = SamplingParams(temperature=0.8, repetition_penalty=1.05, max_tokens=512)
+    # Create an LLM object with model path and configuration.
+    llm = LLM(model="microsoft/phi-4",
+              tensor_parallel_size=2,
+              trust_remote_code=True,
+              gpu_memory_utilization=0.95,
+              dtype="float16",
+              max_model_len=512,
+              enforce_eager=True)
+    # Prepare your prompts
+    prompt = "Tell me something about large language models."
+    messages = [
+        {"role": "user", "content": prompt}
+    ]
+    text = tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+    # generate outputs
+    outputs = llm.generate([text], sampling_params)
+    # Print the outputs.
+    for output in outputs:
+        prompt = output.prompt
+        generated_text = output.outputs[0].text
+        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+if __name__ == '__main__':
+    main()
--- a/inference/phi_single_infer.py
+++ b/inference/phi_single_infer.py
+from transformers import AutoTokenizer
+from vllm import LLM, SamplingParams
+# Initialize the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4", trust_remote_code=True)
+# Decoding hyperparameters for phi-4: temperature=0.0 selects greedy decoding;
+# max_tokens is the maximum number of tokens to generate.
+sampling_params = SamplingParams(temperature=0.0, repetition_penalty=1.05, max_tokens=512)
+# Input the model name or path.
+llm = LLM(model="microsoft/phi-4", trust_remote_code=True)
+# Prepare your prompts
+prompt = "Tell me something about large language models."
+messages = [
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# generate outputs
+outputs = llm.generate([text], sampling_params)
+# Print the outputs.
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
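Both inference scripts send a single prompt, although `llm.generate` takes a list. One way to extend them to several prompts is to build one chat `messages` list per prompt first. The helper below is a hypothetical sketch; its name and the optional `system_prompt` parameter are not part of this commit.

```python
def build_chat_batches(prompts, system_prompt=None):
    """Turn plain prompt strings into one chat `messages` list per prompt."""
    batches = []
    for prompt in prompts:
        messages = []
        if system_prompt:
            # Optional system turn, placed before the user turn.
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        batches.append(messages)
    return batches
```

Each returned list can be passed through `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` exactly as in the scripts, and the resulting strings collected into the list given to `llm.generate`.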
--- a/model.properties
+++ b/model.properties
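+# Unique model identifier
+modelCode=1406
+# Model name
+modelName=phi-4_pytorch
+# Model description
+modelDescription=phi-4 is a 14-billion-parameter model built on the Transformer architecture, focused on improving reasoning and question answering in STEM domains.
+# Application scenarios
+appScenario=inference,training,conversational question answering,scientific research,education,government,finance
+# Framework type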
+frameType=Pytorch
--- a/phi-4_lora_sft.yaml
+++ b/phi-4_lora_sft.yaml
+### model
+model_name_or_path: microsoft/phi-4
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: all
+### dataset
+dataset: identity,alpaca_en_demo
+template: phi4
+cutoff_len: 1024
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+### output
+output_dir: saves/phi-4-14b/lora/sft2
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 1
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.2
+bf16: true
+ddp_timeout: 180000000
+### eval
+val_size: 0.2
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 300
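To see how the values in this config interact (batch size, `val_size`, `warmup_ratio`, `eval_steps`, `save_steps`), the following sketch estimates the number of optimizer steps. It is an approximation under stated assumptions: the sample count and device count are inputs you supply, the validation split is taken as `val_size` times the sample count, and warmup is rounded up. The exact numbers reported by the trainer may differ slightly.

```python
import math


def training_schedule(n_samples, n_devices, val_size=0.2, per_device_batch=1,
                      grad_accum=1, epochs=3.0, warmup_ratio=0.2):
    """Estimate step counts for a run configured like phi-4_lora_sft.yaml."""
    n_eval = math.ceil(n_samples * val_size)   # samples held out for evaluation
    n_train = n_samples - n_eval
    # Samples consumed per optimizer step across all devices.
    effective_batch = per_device_batch * n_devices * grad_accum
    steps_per_epoch = math.ceil(n_train / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "n_train": n_train,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }
```

For a hypothetical 1000 samples on 2 cards this gives 800 training samples, 400 steps per epoch, 1200 total steps, and 240 warmup steps, so `eval_steps: 300` would trigger four evaluations and `save_steps: 500` two intermediate checkpoints.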