Commit f34e15fb authored by chenzk

v1.1

parent 1e883691
@@ -20,7 +20,8 @@ mv Qwen3_pytorch Qwen3 # strip the framework-name suffix
### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.4-ubuntu22.04-dtk25.04-rc7-das1.5-py3.10-20250429-dev-qwen3-only
# docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
# Replace <your IMAGE ID> below with the ID of the image pulled above; for this image it is 6063b673703a
docker run -it --shm-size=64G -v $PWD/Qwen3:/home/Qwen3 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name qwen3 <your IMAGE ID> bash
cd /home/Qwen3
```
@@ -42,7 +43,7 @@ python:python3.10
```
torch:2.4.1
torchvision:0.19.1
triton:3.0.0
vllm:0.8.4
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.4.0
```
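A quick way to confirm the container matches the versions above (a minimal sketch; it assumes the image from Method 1, and that the DCU's ROCm-derived torch build reports devices through the `torch.cuda` API):
```
import torch
import transformers
import vllm

print("torch:", torch.__version__)                # expect 2.4.1
print("vllm:", vllm.__version__)                  # expect 0.8.4
print("transformers:", transformers.__version__)
print("device visible:", torch.cuda.is_available())
```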
@@ -72,15 +73,21 @@ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
### Single node, multiple GPUs
```
cd /home/Qwen3
# Method 1: pytorch inference
# This project uses Qwen3-8B as the example; other Qwen3 models work the same way.
python infer_transformers.py
# Method 2: vllm inference
python infer_vllm.py # vllm=0.8.4
```
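For reference, a minimal sketch of what the pytorch path does (the model path and generation length here are placeholders; the repo's `infer_transformers.py` may differ in details):
```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # placeholder; point this at your local weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # same thinking switch used in infer_vllm.py below
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][len(inputs.input_ids[0]):], skip_special_tokens=True))
```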
For more details, see the upstream project's [`README_orgin`](./README_orgin.md).
## result
Example of vllm inference output:
`Input:`
```
prompt: "Give me a short introduction to large language models." prompt: "Give me a short introduction to large language models."
...@@ -88,11 +95,7 @@ prompt: "Give me a short introduction to large language models." ...@@ -88,11 +95,7 @@ prompt: "Give me a short introduction to large language models."
`Output:`
```
Generated text: "<think>\nOkay, the user wants a short introduction to large language models. Let me start by defining what they are. They're AI systems trained on massive text data, right? I should mention their ability to understand and generate human-like text. Maybe include examples like GPT or BERT.\n\nWait, the user might not know the difference between different models. Should I explain the training process? Like using unsupervised learning on vast datasets. Also, highlight their applications: answering questions, writing stories, coding. But keep it concise since it's supposed to be short.\n\nOh, and maybe touch on their significance in NLP. Emphasize that they can handle multiple languages and tasks. Need to make sure it's clear without too much jargon. Let me check if I'm missing any key points. Oh, scalability and adaptability could be important. Alright, structure it with a definition, how they work, applications, and impact. Keep each part brief.\n</think>\n\nLarge language models (LLMs) are advanced artificial intelligence systems trained on vast amounts of text data to understand and generate human-like language. They use deep learning techniques to process and produce coherent responses across multiple languages and tasks, such as answering questions, writing stories, coding, and more. By analyzing patterns in text, LLMs can adapt to diverse contexts, making them powerful tools for natural language processing (NLP) and a wide range of applications, from customer service to creative writing. Their ability to scale and learn from extensive datasets has revolutionized how machines interact with and understand human communication."
``` ```
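In thinking mode the chain of thought is wrapped in `<think>...</think>`, so it can be split from the final answer when only the answer is needed (a minimal sketch, not part of the repo's scripts):
```
# Split a Qwen3 thinking-mode completion into reasoning and final answer.
def split_thinking(generated_text):
    if "</think>" in generated_text:
        thinking, answer = generated_text.split("</think>", 1)
        return thinking.replace("<think>", "").strip(), answer.strip()
    return "", generated_text.strip()  # non-thinking mode: no <think> block

demo = "<think>\nreasoning goes here\n</think>\n\nFinal answer."
thinking, answer = split_thinking(demo)
print(answer)  # -> Final answer.
```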
### Accuracy
...
FROM image.sourcefind.cn:5000/dcu/admin/base/custom:vllm0.8.4-ubuntu22.04-dtk25.04-rc7-das1.5-py3.10-20250429-dev-qwen3-only
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04/env.sh
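To build an image from this Dockerfile, something like `docker build -t qwen3:v1.1 .` from the repository root should work (the tag name is arbitrary).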
...
@@ -23,7 +23,7 @@ if __name__ == '__main__':
{"role": "user", "content": prompt} {"role": "user", "content": prompt}
] ]
''' '''
prompt = "How many r's are in the word \"strawberry\"" prompt = "Give me a short introduction to large language models."
messages = [ messages = [
{"role": "user", "content": prompt} {"role": "user", "content": prompt}
] ]
@@ -31,7 +31,8 @@ if __name__ == '__main__':
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
    )
    # generate outputs
@@ -41,5 +42,4 @@ if __name__ == '__main__':
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Generated text: {generated_text!r}")
\ No newline at end of file
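For context, here is a minimal end-to-end sketch of the vLLM flow this script implements; the model path and sampling values are placeholders, not the repo's exact settings. Setting `enable_thinking=False` yields an answer without the `<think>` block:
```
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "Qwen/Qwen3-8B"  # placeholder; point this at your local weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
llm = LLM(model=model_path)
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # non-thinking mode: output has no <think> block
)
outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```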