Commit a52e53db authored by chenzk (v1.0)
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
version: 2
build:
os: ubuntu-22.04
tools:
python: "3"
sphinx:
configuration: docs/source/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
# formats:
# - pdf
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/requirements-docs.txt
---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B-Base
---
# Qwen3-8B
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>
## Qwen3 Highlights
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:
- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significant enhancement in reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.
## Model Overview
**Qwen3-8B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 8.2B
- Number of Parameters (Non-Embedding): 6.95B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 tokens natively and [131,072 tokens with YaRN](#processing-long-texts).
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
## Quickstart
The code for Qwen3 has been merged into the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3'
```
The following contains a code snippet illustrating how to use the model to generate content based on the given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-8B"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
**model_inputs,
max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("thinking content:", thinking_content)
print("content:", content)
```
For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.4` to create an OpenAI-compatible API endpoint:
- SGLang:
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-8B --reasoning-parser qwen3
```
- vLLM:
```shell
vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser deepseek_r1
```
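Once the server is up, the endpoint can be queried with any OpenAI-compatible client. A minimal sketch with the `openai` Python SDK (the port and the `EMPTY` API key are assumptions matching the default vLLM launch above; SGLang listens on port 30000 by default):
```python
# Minimal sketch: query the OpenAI-compatible endpoint started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
)
print(response.choices[0].message.content)
```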
For local use, applications such as llama.cpp, Ollama, LMStudio, and MLX-LM also support Qwen3.
## Switching Between Thinking and Non-Thinking Mode
> [!TIP]
> The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
> Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.
### `enable_thinking=True`
By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting `enable_thinking=True` or leaving it as the default value in `tokenizer.apply_chat_template`, the model will engage its thinking mode.
```python
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
enable_thinking=True # True is the default value for enable_thinking
)
```
In this mode, the model will generate thinking content wrapped in a `<think>...</think>` block, followed by the final response.
> [!NOTE]
> For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
### `enable_thinking=False`
We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.
```python
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
enable_thinking=False # Setting enable_thinking=False disables thinking mode
)
```
In this mode, the model will not generate any think content and will not include a `<think>...</think>` block.
> [!NOTE]
> For non-thinking mode, we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
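Both sets of recommendations map directly onto the standard sampling arguments of `model.generate` in `transformers`; a minimal sketch, continuing from the Quickstart snippet above (`min_p` support assumes a reasonably recent `transformers` release):
```python
# Sketch: pass the recommended sampling settings explicitly (thinking mode shown;
# for non-thinking mode use temperature=0.7, top_p=0.8).
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,    # never use greedy decoding
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,         # requires a recent transformers release
)
```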
### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input
We provide a soft switch mechanism that allows users to dynamically control the model's behavior when `enable_thinking=True`. Specifically, you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
Here is an example of a multi-turn conversation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-8B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)
        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})
        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```
> [!NOTE]
> For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
> When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.
## Agentic Use
Qwen3 excels in tool-calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of the agentic abilities of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
To define the available tools, you can use an MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools by yourself.
```python
from qwen_agent.agents import Assistant
# Define LLM
llm_cfg = {
'model': 'Qwen3-8B',
# Use the endpoint provided by Alibaba Model Studio:
# 'model_type': 'qwen_dashscope',
# 'api_key': os.getenv('DASHSCOPE_API_KEY'),
# Use a custom endpoint compatible with OpenAI API:
'model_server': 'http://localhost:8000/v1', # api_base
'api_key': 'EMPTY',
# Other parameters:
# 'generate_cfg': {
# # Add: when the response content is `<think>this is the thought</think>this is the answer`;
# # Do not add: When the response has been separated by reasoning_content and content.
# 'thought_in_content': True,
# },
}
# Define Tools
tools = [
{'mcpServers': { # You can specify the MCP configuration file
'time': {
'command': 'uvx',
'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
},
"fetch": {
"command": "uvx",
"args": ["mcp-server-fetch"]
}
}
},
'code_interpreter', # Built-in tools
]
# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)
# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
```
## Processing Long Texts
Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.
YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
- Modifying the model files:
In the `config.json` file, add the `rope_scaling` fields:
```json
{
...,
"rope_scaling": {
"type": "yarn",
"factor": 4.0,
"original_max_position_embeddings": 32768
}
}
```
For `llama.cpp`, you need to regenerate the GGUF file after the modification.
- Passing command line arguments:
For `vllm`, you can use
```shell
vllm serve ... --rope-scaling '{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
```
For `sglang`, you can use
```shell
python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
```
For `llama-server` from `llama.cpp`, you can use
```shell
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
```
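For `transformers`, an alternative to editing `config.json` on disk is to override the configuration at load time. A minimal sketch (assumes `transformers>=4.51.0`, as noted below):
```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"

# Attach the YaRN rope_scaling settings to the config instead of editing config.json.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072  # raise the usable context window accordingly

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```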
> [!IMPORTANT]
> If you encounter the following warning
> ```
> Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
> ```
> please upgrade `transformers>=4.51.0`.
> [!NOTE]
> All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.**
> We advise adding the `rope_scaling` configuration only when processing long contexts is required.
> It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` as 2.0.
> [!NOTE]
> The default `max_position_embeddings` in `config.json` is set to 40,960. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance.
> [!TIP]
> The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default and no extra configuration is needed.
## Best Practices
To achieve optimal performance, we recommend the following settings:
1. **Sampling Parameters**:
- For thinking mode (`enable_thinking=True`), use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0`. **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
- For non-thinking mode (`enable_thinking=False`), we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
- For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
- **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
- **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should include only the final output part, not the thinking content. This is already implemented in the provided Jinja2 chat template. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that this best practice is followed (see the sketch below).
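For example, a minimal sketch of stripping the thinking block before it enters the history (the helper name and regular expression are illustrative, based on the `<think>...</think>` format described above):
```python
import re

def strip_thinking(assistant_text: str) -> str:
    """Remove the <think>...</think> block so only the final answer enters the history."""
    return re.sub(r"<think>.*?</think>", "", assistant_text, flags=re.DOTALL).strip()

# Illustrative usage with a raw model output:
raw_output = '<think>Count the r\'s one by one...</think>There are three r\'s in "strawberries".'
history = [{"role": "user", "content": "How many r's in strawberries?"}]
history.append({"role": "assistant", "content": strip_thinking(raw_output)})
print(history[-1]["content"])
```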
### Citation
If you find our work helpful, feel free to cite it.
```
@misc{qwen3,
title = {Qwen3},
url = {https://qwenlm.github.io/blog/qwen3/},
author = {Qwen Team},
month = {April},
year = {2025}
}
```
# Qwen3
Qwen3 is the latest generation of the Qwen large language model series, offering dense and mixture-of-experts (MoE) models that switch seamlessly between thinking and non-thinking modes and are suited to dialogue, reasoning, coding, and agent applications.
## Paper
`None`
## Model Structure
Qwen3 adopts a standard decoder-only architecture and introduces MoE to improve performance. It is the first "hybrid reasoning model" series, integrating "fast thinking" and "slow thinking" into a single model.
<div align=center>
<img src="./doc/qwen.png"/>
</div>
## Algorithm Principle
The input is embedded and then passed through attention and FFN layers to extract features. Finally, Softmax converts the unnormalized score vector (logits) produced by the decoder's last layer into a probability distribution, where each element is the probability of generating the corresponding token; the model can then sample from this distribution or pick the most likely token as the prediction.
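A minimal sketch of this final decoding step (illustrative tensors only; the vocabulary size and scores are made up):
```python
import torch
import torch.nn.functional as F

# logits: unnormalized scores from the decoder's last layer, shape (vocab_size,)
logits = torch.randn(151936)                 # illustrative vocabulary size
probs = F.softmax(logits, dim=-1)            # convert scores into a probability distribution
next_token_id = torch.argmax(probs).item()   # greedy pick of the most likely token
# In practice, sampling (temperature / top-p / top-k) is usually used instead of argmax.
print(next_token_id)
```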
## Environment Setup
```
mv Qwen3_pytorch Qwen3 # drop the framework-name suffix
```
### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
# Replace <your IMAGE ID> with the ID of the docker image pulled above; for this image it is e77c15729879
docker run -it --shm-size=64G -v $PWD/Qwen3:/home/Qwen3 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name qwen3 <your IMAGE ID> bash
cd /home/Qwen3
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
### Dockerfile (Method 2)
```
cd /home/Qwen3/docker
docker build --no-cache -t qwen3:latest .
docker run --shm-size=64G --name qwen3 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../Qwen3:/home/Qwen3 -it qwen3 bash
# If installing the environment through the Dockerfile takes a long time, comment out the pip install inside it and install the Python libraries after starting the container: pip install -r requirements.txt
```
### Anaconda (Method 3)
1. The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the Hygon (光合) developer community:
- https://developer.hpccube.com/tool/
```
DTK driver: dtk2504
python:python3.10
torch:2.4.1
torchvision:0.19.1
triton:3.0.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.4.0
transformers:4.51.0
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions listed above must match one another exactly.`
2. Install the remaining non-special libraries according to requirements.txt
```
cd /home/Qwen3
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```
## Dataset
`None`
## Training
## Inference
Pretrained weights directory structure:
```
/home/Qwen3/
└── Qwen/Qwen3-8B
```
### Single Node, Multiple GPUs
```
# This project uses Qwen3-8B as an example; other Qwen3 models work analogously.
cd /home/Qwen3
python infer_transformers.py
# vllm>=0.8.4 is currently being adapted; vLLM-based inference will be released later.
```
For more information, refer to [`README_orgin`](./README_orgin.md) in the upstream project.
## Result
`Input:`
```
prompt: "Give me a short introduction to large language models."
```
`Output:`
```
<think>
Okay, the user wants a short introduction to large language models. Let me start by defining what they are. I should mention they're AI systems trained on massive text data. Maybe include how they process and generate human-like text. Also, touch on their applications like answering questions, creating content, coding. Need to keep it concise but cover the key points. Oh, and maybe mention their size, like parameters, but not too technical. Avoid jargon. Make sure it's easy to understand. Let me check if I'm missing anything important. Oh, maybe a sentence about their training process? Or just stick to the basics. Alright, structure: definition, training data, capabilities, applications. Keep each part brief. That should work.
</think>
Large language models (LLMs) are advanced artificial intelligence systems trained on vast amounts of text data to understand and generate human-like language. They can process and respond to complex queries, create written content, code, and even engage in conversations. These models, often with billions of parameters, excel at tasks like answering questions, summarizing information, and translating languages, making them versatile tools for various applications, from customer service to research and creative writing.
```
### Accuracy
The accuracy on DCU is consistent with that on GPU; inference framework: pytorch.
## Application Scenarios
### Algorithm Category
`Dialogue / Q&A`
### Key Application Industries
`Manufacturing, media, finance, energy, healthcare, smart home, education`
## Pretrained Weights
Download from the ModelScope community: [Qwen/Qwen3-8B](https://www.modelscope.cn/Qwen/Qwen3-8B.git)
## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/Qwen3_pytorch.git
## References
- https://github.com/QwenLM/Qwen3.git
# Qwen3
<p align="center">
<img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
<p>
<p align="center">
💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 Paper &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qwenlm.github.io/blog/qwen3/">Blog</a> &nbsp&nbsp | &nbsp&nbsp📖 <a href="https://qwen.readthedocs.io/">Documentation</a>
<br>
🖥️ <a href="https://huggingface.co/spaces/Qwen/Qwen3-Demo">Demo</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://github.com/QwenLM/Qwen/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp
</p>
Visit our Hugging Face or ModelScope organization (click links above), search checkpoints with names starting with `Qwen3-` or visit the [Qwen3 collection](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f), and you will find all you need! Enjoy!
To learn more about Qwen3, feel free to read our documentation \[[EN](https://qwen.readthedocs.io/en/latest/)|[ZH](https://qwen.readthedocs.io/zh-cn/latest/)\]. Our documentation consists of the following sections:
- Quickstart: the basic usages and demonstrations;
- Inference: the guidance for the inference with Transformers, including batch inference, streaming, etc.;
- Run Locally: the instructions for running LLM locally on CPU and GPU, with frameworks like llama.cpp and Ollama;
- Deployment: the demonstration of how to deploy Qwen for large-scale inference with frameworks like SGLang, vLLM, TGI, etc.;
- Quantization: the practice of quantizing LLMs with GPTQ, AWQ, as well as the guidance for how to make high-quality quantized GGUF files;
- Training: the instructions for post-training, including SFT and RLHF (TODO) with frameworks like Axolotl, LLaMA-Factory, etc.
- Framework: the usage of Qwen with frameworks for application, e.g., RAG, Agent, etc.
## Introduction
We are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models.
These models represent our most advanced and intelligent systems to date, building on our experience with QwQ and Qwen2.5.
We are making the weights of Qwen3 available to the public, including both dense and Mixture-of-Experts (MoE) models.
The highlights from Qwen3 include:
- **Dense and Mixture-of-Experts (MoE) models of various sizes**, available in 0.6B, 1.7B, 4B, 8B, 14B, 32B and 30B-A3B, 235B-A22B.
- **Seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose chat), ensuring optimal performance across various scenarios.
- **Significant enhancement in reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.
> [!IMPORTANT]
> Qwen3 models adopt a different naming scheme.
>
> The post-trained models do not use the "-Instruct" suffix any more. For example, Qwen3-32B is the newer version of Qwen2.5-32B-Instruct.
>
> The base models now have names ending with "-Base".
## News
- 2025.04.29: We released the Qwen3 series. Check our [blog](https://qwenlm.github.io/blog/qwen3) for more details!
- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
- 2024.06.06: We released the Qwen2 series. Check our [blog](https://qwenlm.github.io/blog/qwen2/)!
- 2024.03.28: We released the first MoE model of Qwen: Qwen1.5-MoE-A2.7B! Temporarily, only HF transformers and vLLM support the model. We will soon add the support of llama.cpp, mlx-lm, etc. Check our [blog](https://qwenlm.github.io/blog/qwen-moe/) for more information!
- 2024.02.05: We released the Qwen1.5 series.
## Performance
Detailed evaluation results are reported in this <a href="https://qwenlm.github.io/blog/qwen3/"> 📑 blog</a>.
For requirements on GPU memory and the respective throughput, see the results [here](https://qwen.readthedocs.io/en/latest/getting_started/speed_benchmark.html).
## Run Qwen3
### 🤗 Transformers
Transformers is a library of pretrained natural language processing models for inference and training.
The latest version of `transformers` is recommended and `transformers>=4.51.0` is required.
The following contains a code snippet illustrating how to use the model to generate content based on the given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-8B"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
**model_inputs,
max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
# the result will begin with thinking content in <think></think> tags, followed by the actual response
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
By default, Qwen3 models will think before responding.
This can be controlled by
- `enable_thinking=False`: Passing `enable_thinking=False` to `tokenizer.apply_chat_template` will strictly prevent the model from generating thinking content.
- `/think` and `/no_think` instructions: Use these words in the system or user message to signify whether Qwen3 should think. In multi-turn conversations, the latest instruction is followed.
### ModelScope
We strongly advise users, especially those in mainland China, to use ModelScope.
ModelScope adopts a Python API similar to Transformers.
The CLI tool `modelscope download` can help you solve issues concerning downloading checkpoints.
### llama.cpp
[`llama.cpp`](https://github.com/ggml-org/llama.cpp) enables LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware.
`llama.cpp>=b5092` is required.
To use the CLI, run the following in a terminal:
```shell
./llama-cli -hf Qwen/Qwen3-8B-GGUF:Q8_0 --jinja --color -ngl 99 -fa -sm row --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 -c 40960 -n 32768 --no-context-shift
# CTRL+C to exit
```
To use the API server, run the following in a terminal:
```shell
./llama-server -hf Qwen/Qwen3-8B-GGUF:Q8_0 --jinja --reasoning-format deepseek -ngl 99 -fa -sm row --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 -c 40960 -n 32768 --no-context-shift --port 8080
```
A simple web front end will be at `http://localhost:8080` and an OpenAI-compatible API will be at `http://localhost:8080/v1`.
For additional guides, please refer to [our documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html).
### Ollama
After [installing ollama](https://ollama.com/), you can initiate the ollama service with the following command:
```shell
ollama serve
# You need to keep this service running whenever you are using ollama
```
To pull a model checkpoint and run the model, use the `ollama run` command. You can specify a model size by adding a suffix to `qwen3`, such as `:8b` or `:30b-a3b`:
```shell
ollama run qwen3:8b
# To exit, type "/bye" and press ENTER
```
You can also access the ollama service via its OpenAI-compatible API.
Please note that you need to (1) keep `ollama serve` running while using the API, and (2) execute `ollama run qwen3:8b` before utilizing this API to ensure that the model checkpoint is prepared.
The API is at `http://localhost:11434/v1/` by default.
For additional details, please visit [ollama.ai](https://ollama.com/).
### LMStudio
Qwen3 has already been supported by [lmstudio.ai](https://lmstudio.ai/). You can directly use LMStudio with our GGUF files.
### MLX-LM
If you are running on Apple Silicon, [`mlx-lm`](https://github.com/ml-explore/mlx-lm) also supports Qwen3 (`mlx-lm>=0.24.0`).
Look for models ending with MLX on HuggingFace Hub.
<!-- ### OpenVINO
Qwen2.5 has already been supported by [OpenVINO toolkit](https://github.com/openvinotoolkit). You can install and run this [chatbot example](https://github.com/OpenVINO-dev-contest/Qwen2.openvino) with Intel CPU, integrated GPU or discrete GPU. -->
<!-- ### Text generation web UI
You can directly use [`text-generation-webui`](https://github.com/oobabooga/text-generation-webui) for creating a web UI demo. If you use GGUF, remember to install the latest wheel of `llama.cpp` with the support of Qwen2.5. -->
<!-- ### llamafile
Clone [`llamafile`](https://github.com/Mozilla-Ocho/llamafile), run source install, and then create your own llamafile with the GGUF file following the guide [here](https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#creating-llamafiles). You are able to run one line of command, say `./qwen.llamafile`, to create a demo. -->
## Deploy Qwen3
Qwen3 is supported by multiple inference frameworks.
Here we demonstrate the usage of `SGLang` and `vLLM`.
You can also find Qwen3 models from various inference providers, e.g., [Alibaba Cloud Model Studio](https://www.alibabacloud.com/en/product/modelstudio).
### SGLang
[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models.
SGLang can be used to launch a server with an OpenAI-compatible API service.
`sglang>=0.4.6.post1` is required.
It is as easy as
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-8B --port 30000 --reasoning-parser qwen3
```
An OpenAI-compatible API will be available at `http://localhost:30000/v1`.
### vLLM
[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.
`vllm>=0.8.4` is required.
```shell
vllm serve Qwen/Qwen3-8B --port 8000 --enable-reasoning --reasoning-parser deepseek_r1
```
An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
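With the reasoning parser enabled, the server returns the thinking content in a separate `reasoning_content` field alongside `content` (a non-standard extension to the OpenAI API). A minimal sketch with the `openai` Python SDK, assuming the vLLM command above:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "How many r's are in 'strawberries'?"}],
)
message = completion.choices[0].message
# Thinking content parsed out by the server (non-standard field; may be absent).
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```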
### MindIE
For deployment on Ascend NPUs, please visit [Modelers](https://modelers.cn/) and search for Qwen3.
<!--
### OpenLLM
[OpenLLM](https://github.com/bentoml/OpenLLM) allows you to easily run Qwen2.5 as OpenAI-compatible APIs. You can start a model server using `openllm serve`. For example:
```bash
openllm serve qwen2.5:7b
```
The server is active at `http://localhost:3000/`, providing OpenAI-compatible APIs. You can create an OpenAI client to call its chat API. For more information, refer to [our documentation](https://qwen.readthedocs.io/en/latest/deployment/openllm.html). -->
## Build with Qwen3
### Tool Use
For tool use capabilities, we recommend taking a look at [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent), which provides a wrapper around these APIs to support tool use or function calling with MCP support.
Tool use with Qwen3 can also be conducted with SGLang, vLLM, Transformers, llama.cpp, Ollama, etc.
Follow guides in our documentation to see how to enable the support.
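When serving through an OpenAI-compatible endpoint, tools are declared in the request body. The sketch below shows the request shape only; the tool definition is hypothetical, and the server must be launched with the appropriate tool-call parsing options for `tool_calls` to be populated:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "What is the temperature in Beijing?"}],
    tools=tools,
)
# tool_calls is filled only when the server parses tool calls from the generation.
print(completion.choices[0].message.tool_calls)
```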
### Finetuning
We advise you to use training frameworks, including [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl), [unsloth](https://github.com/unslothai/unsloth), [Swift](https://github.com/modelscope/swift), [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory), etc., to finetune your models with SFT, DPO, GRPO, etc.
## License Agreement
All our open-source models are licensed under Apache 2.0.
You can find the license files in the respective Hugging Face repositories.
## Citation
If you find our work helpful, feel free to cite it.
```
@article{qwen2.5,
title = {Qwen2.5 Technical Report},
author = {An Yang and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chengyuan Li and Dayiheng Liu and Fei Huang and Haoran Wei and Huan Lin and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Yang and Jiaxi Yang and Jingren Zhou and Junyang Lin and Kai Dang and Keming Lu and Keqin Bao and Kexin Yang and Le Yu and Mei Li and Mingfeng Xue and Pei Zhang and Qin Zhu and Rui Men and Runji Lin and Tianhao Li and Tingyu Xia and Xingzhang Ren and Xuancheng Ren and Yang Fan and Yang Su and Yichang Zhang and Yu Wan and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zihan Qiu},
journal = {arXiv preprint arXiv:2412.15115},
year = {2024}
}
@article{qwen2,
title = {Qwen2 Technical Report},
author = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
journal = {arXiv preprint arXiv:2407.10671},
year = {2024}
}
```
## Contact Us
If you are interested to leave a message to either our research team or product team, join our [Discord](https://discord.gg/z3GAxXZ9Ce) or [WeChat groups](assets/wechat.png)!
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
ENV DEBIAN_FRONTEND=noninteractive
# RUN yum update && yum install -y git cmake wget build-essential
# RUN source /opt/dtk-dtk25.04/env.sh
# # Install pip dependencies
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
transformers>=4.51.0
ARG CUDA_VERSION=12.1.0
ARG from=nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu20.04
FROM ${from} as base
RUN <<EOF
apt update -y && apt upgrade -y && apt install -y --no-install-recommends \
git \
git-lfs \
python3 \
python3-pip \
python3-dev \
wget \
vim \
&& rm -rf /var/lib/apt/lists/*
EOF
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN git lfs install
FROM base as dev
WORKDIR /
RUN mkdir -p /data/shared/Qwen
WORKDIR /data/shared/Qwen/
FROM dev as bundle_req
RUN pip install --no-cache-dir networkx==3.1
RUN pip3 install --no-cache-dir torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
RUN pip3 install --no-cache-dir transformers==4.40.2 accelerate tiktoken einops scipy
FROM bundle_req as bundle_finetune
ARG BUNDLE_FINETUNE=true
RUN <<EOF
if [ "$BUNDLE_FINETUNE" = "true" ]; then
cd /data/shared/Qwen
# Full-finetune / LoRA.
pip3 install --no-cache-dir "deepspeed==0.14.2" "peft==0.11.1"
# Q-LoRA.
apt update -y && DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends \
libopenmpi-dev openmpi-bin \
&& rm -rf /var/lib/apt/lists/*
pip3 install --no-cache-dir "optimum==1.20.0" "auto-gptq==0.7.1" "autoawq==0.2.5" mpi4py
fi
EOF
FROM bundle_finetune as bundle_vllm
ARG BUNDLE_VLLM=true
RUN <<EOF
if [ "$BUNDLE_VLLM" = "true" ]; then
cd /data/shared/Qwen
pip3 install --no-cache-dir vllm==0.4.3 "fschat[model_worker,webui]==0.2.36"
fi
EOF
FROM bundle_vllm as bundle_flash_attention
ARG BUNDLE_FLASH_ATTENTION=true
RUN <<EOF
if [ "$BUNDLE_FLASH_ATTENTION" = "true" ]; then
pip3 install --no-cache-dir flash-attn==2.5.8 --no-build-isolation
fi
EOF
FROM bundle_flash_attention as final
COPY ../examples/sft/* ./
COPY ../examples/demo/* ./
EXPOSE 80
#!/usr/bin/env bash
#
# This script will automatically pull the docker image from DockerHub and start a container to run the Qwen-Chat cli-demo.
IMAGE_NAME=qwenllm/qwen:2-cu121
QWEN_CHECKPOINT_PATH=/path/to/Qwen-Instruct
CONTAINER_NAME=qwen2
function usage() {
echo '
Usage: bash docker/docker_cli_demo.sh [-i IMAGE_NAME] -c [/path/to/Qwen-Instruct] [-n CONTAINER_NAME]
'
}
while [[ "$1" != "" ]]; do
case $1 in
-i | --image-name )
shift
IMAGE_NAME=$1
;;
-c | --checkpoint )
shift
QWEN_CHECKPOINT_PATH=$1
;;
-n | --container-name )
shift
CONTAINER_NAME=$1
;;
-h | --help )
usage
exit 0
;;
* )
echo "Unknown argument ${1}"
exit 1
;;
esac
shift
done
if [ ! -e ${QWEN_CHECKPOINT_PATH}/config.json ]; then
echo "Checkpoint config.json file not found in ${QWEN_CHECKPOINT_PATH}, exit."
exit 1
fi
sudo docker pull ${IMAGE_NAME} || {
echo "Pulling image ${IMAGE_NAME} failed, exit."
exit 1
}
sudo docker run --gpus all --rm --name ${CONTAINER_NAME} \
--mount type=bind,source=${QWEN_CHECKPOINT_PATH},target=/data/shared/Qwen/Qwen-Instruct \
-it ${IMAGE_NAME} \
python cli_demo.py -c /data/shared/Qwen/Qwen-Instruct/
#!/usr/bin/env bash
#
# This script will automatically pull the docker image from DockerHub and start a daemon container to run the Qwen-Chat web-demo.
IMAGE_NAME=qwenllm/qwen:2-cu121
QWEN_CHECKPOINT_PATH=/path/to/Qwen-Instruct
PORT=8901
CONTAINER_NAME=qwen2
function usage() {
echo '
Usage: bash docker/docker_web_demo.sh [-i IMAGE_NAME] -c [/path/to/Qwen-Instruct] [-n CONTAINER_NAME] [--port PORT]
'
}
while [[ "$1" != "" ]]; do
case $1 in
-i | --image-name )
shift
IMAGE_NAME=$1
;;
-c | --checkpoint )
shift
QWEN_CHECKPOINT_PATH=$1
;;
-n | --container-name )
shift
CONTAINER_NAME=$1
;;
--port )
shift
PORT=$1
;;
-h | --help )
usage
exit 0
;;
* )
echo "Unknown argument ${1}"
exit 1
;;
esac
shift
done
if [ ! -e ${QWEN_CHECKPOINT_PATH}/config.json ]; then
echo "Checkpoint config.json file not found in ${QWEN_CHECKPOINT_PATH}, exit."
exit 1
fi
sudo docker pull ${IMAGE_NAME} || {
echo "Pulling image ${IMAGE_NAME} failed, exit."
exit 1
}
sudo docker run --gpus all -d --restart always --name ${CONTAINER_NAME} \
-v /var/run/docker.sock:/var/run/docker.sock -p ${PORT}:80 \
--mount type=bind,source=${QWEN_CHECKPOINT_PATH},target=/data/shared/Qwen/Qwen-Instruct \
-it ${IMAGE_NAME} \
python web_demo.py --server-port 80 --server-name 0.0.0.0 -c /data/shared/Qwen/Qwen-Instruct/ && {
echo "Successfully started web demo. Open 'http://localhost:${PORT}' to try!
Run \`docker logs ${CONTAINER_NAME}\` to check demo status.
Run \`docker rm -f ${CONTAINER_NAME}\` to stop and remove the demo."
}
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
# Qwen Documentation
This is the source of the documentation at <https://qwen.readthedocs.io>.
## Quick Start
We use `sphinx` to manage the documentation and use the `furo` theme.
To get started, simply run
```bash
pip install -r requirements-docs.txt
```
Then run `make html` or `sphinx-build -M html source build` and it will compile the docs and put them under the `build/html` directory.
## Translation
The documentation is available in both English and Simplified Chinese. We use
`sphinx-intl` to work with Sphinx translation flow, following [this article](https://www.sphinx-doc.org/en/master/usage/advanced/intl.html).
You need to install the Python package `sphinx-intl` before starting.
1. After updating the English documentation, run `make gettext`, and the pot files will be placed in the `build/gettext` directory. `make gettext` can be slow if the doc is long.
2. Use the generated pot files to update the po files:
```bash
sphinx-intl update -p build/gettext -l zh_CN -w 0
```
3. Translate the po files in `locales/zh_CN/LC_MESSAGES`. Pay attention to fuzzy matches (messages after `#, fuzzy`). Please be careful not to break reST notation.
4. Build translated document: `make -e SPHINXOPTS="-D language='zh_CN'" html` or `sphinx-build -M html source build -D language=zh_CN`
## Auto Build
```bash
pip install sphinx-autobuild
```
To autobuild the default version:
```bash
sphinx-autobuild source build/html
```
To autobuild the translated version:
```bash
sphinx-autobuild source build/html -D language=zh_CN --watch locales/zh_CN
```
By default, the doc is at `http://127.0.0.1:8000`
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/deployment/openllm.rst:2 986ea00cb5af4a0d82f974ed79a82430
msgid "OpenLLM"
msgstr "OpenLLM"
#: ../../Qwen/source/deployment/openllm.rst:5 78be03fbdccb429892b03bf84596411b
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"
#: ../../Qwen/source/deployment/openllm.rst:7 a001f11d1c5440188121d20b3baf59db
msgid "OpenLLM allows developers to run Qwen2.5 models of different sizes as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Qwen2.5. Visit `the OpenLLM repository <https://github.com/bentoml/OpenLLM/>`_ to learn more."
msgstr "OpenLLM 允许开发者通过一个命令运行不同大小的 Qwen2.5 模型,提供 OpenAI 兼容的 API。它具有内置的聊天 UI,先进的推理后端,以及简化的工作流程来使用 Qwen2.5 创建企业级云部署。访问 `OpenLLM 仓库 <https://github.com/bentoml/OpenLLM/>`_ 了解更多信息。"
#: ../../Qwen/source/deployment/openllm.rst:10 229f89c3be65442bbe15905d75a0d13d
msgid "Installation"
msgstr "安装"
#: ../../Qwen/source/deployment/openllm.rst:12 79421f700fbc426cb6ce9841aff67503
msgid "Install OpenLLM using ``pip``."
msgstr "使用 ``pip`` 安装 OpenLLM。"
#: ../../Qwen/source/deployment/openllm.rst:18 69cfd6fe2e274173ad4065be91b71472
msgid "Verify the installation and display the help information:"
msgstr "验证安装并显示帮助信息:"
#: ../../Qwen/source/deployment/openllm.rst:25 503cae99b14c4ef4b322b8ec0bd2d32d
msgid "Quickstart"
msgstr "快速开始"
#: ../../Qwen/source/deployment/openllm.rst:27 0ea788c801404d8780404611c87644b0
msgid "Before you run any Qwen2.5 model, ensure your model repository is up to date by syncing it with OpenLLM's latest official repository."
msgstr "在运行任何 Qwen2.5 模型之前,确保您的模型仓库与 OpenLLM 的最新官方仓库同步。"
#: ../../Qwen/source/deployment/openllm.rst:33 8852ff46ecdb45b2bfc9885bbfaacb02
msgid "List the supported Qwen2.5 models:"
msgstr "列出支持的 Qwen2.5 模型:"
#: ../../Qwen/source/deployment/openllm.rst:39 3e4f6c11396844adb30d4e5812339484
msgid "The results also display the required GPU resources and supported platforms:"
msgstr "结果还会显示所需的 GPU 资源和支持的平台:"
#: ../../Qwen/source/deployment/openllm.rst:57 ac4c0db02f5249d5882940820779db9a
msgid "To start a server with one of the models, use ``openllm serve`` like this:"
msgstr "要使用其中一个模型来启动服务器,请使用 ``openllm serve`` 命令,例如:"
#: ../../Qwen/source/deployment/openllm.rst:63 0a1d3ec35c684e3bb3e971c916aa9be7
msgid "By default, the server starts at ``http://localhost:3000/``."
msgstr "默认情况下,服务器启动在 http://localhost:3000/。"
#: ../../Qwen/source/deployment/openllm.rst:66 2e787de9a62f4342bdf8f88ee0df5379
msgid "Interact with the model server"
msgstr "与模型服务器交互"
#: ../../Qwen/source/deployment/openllm.rst:68 b22802ad9027458bb30ea0da665fea36
msgid "With the model server up and running, you can call its APIs in the following ways:"
msgstr "服务器运行后,可以通过以下方式调用其 API:"
#: ../../Qwen/source/deployment/openllm.rst 76214ea690094930899d6f2eddcc1454
msgid "CURL"
msgstr "CURL"
#: ../../Qwen/source/deployment/openllm.rst:74 42775a3df58f474782d29f2f82707bd9
msgid "Send an HTTP request to its ``/generate`` endpoint via CURL:"
msgstr "通过 CURL 向其 ``/generate`` 端点发送 HTTP 请求:"
#: ../../Qwen/source/deployment/openllm.rst 4f0ff3eee2ab49dda5a72bd611a9d45e
msgid "Python client"
msgstr "Python 客户端"
#: ../../Qwen/source/deployment/openllm.rst:91 ce2e11a46e434798947b1e74ce82a19c
msgid "Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example:"
msgstr "使用支持 OpenAI API 协议的框架和工具来调用。例如:"
#: ../../Qwen/source/deployment/openllm.rst 107921d1a855430ca70c8c163d37c7f2
msgid "Chat UI"
msgstr "聊天 UI"
#: ../../Qwen/source/deployment/openllm.rst:118
#: b92df2759cd54c2b8316e2a160ede656
msgid "OpenLLM provides a chat UI at the ``/chat`` endpoint for the LLM server at http://localhost:3000/chat."
msgstr "OpenLLM 为 LLM 服务器提供的聊天 UI 位于 ``/chat`` 端点,地址为 http://localhost:3000/chat。"
#: ../../Qwen/source/deployment/openllm.rst:123
#: 0d3fa679178f443caf9c87623001be1f
msgid "Model repository"
msgstr "模型仓库"
#: ../../Qwen/source/deployment/openllm.rst:125
#: 54d6a9bdcc064aeb95a23b60d3d575ab
msgid "A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See our `documentation to learn details <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_."
msgstr "OpenLLM 中的模型仓库表示可用的 LLM 目录。您可以为 OpenLLM 添加自定义的 Qwen2.5 模型仓库,以满足您的特定需求。请参阅 `我们的文档 <https://github.com/bentoml/OpenLLM?tab=readme-ov-file#model-repository>`_ 了解详细信息。"
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/deployment/sglang.md:1 4886c9be510e44ba968bba79c7e01e2b
msgid "SGLang"
msgstr ""
#: ../../Qwen/source/deployment/sglang.md:3 fa388b3c599c454bbe22dc7c831723c1
msgid "[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models."
msgstr "[SGLang](https://github.com/sgl-project/sglang) 是一个用于大型语言模型和视觉语言模型的快速推理框架。"
#: ../../Qwen/source/deployment/sglang.md:5 43fe1ab3622b4d619de1ba451ff5b5c4
msgid "To learn more about SGLang, please refer to the [documentation](https://docs.sglang.ai/)."
msgstr "要了解更多关于 SGLang 的信息,请参阅[官方文档](https://docs.sglang.ai/)。"
#: ../../Qwen/source/deployment/sglang.md:7 4e7093847f104f5c91bf12495db0e2df
msgid "Environment Setup"
msgstr "环境配置"
#: ../../Qwen/source/deployment/sglang.md:9 404501b6bb754a01afa398ce270f4ad6
msgid "By default, you can install `sglang` with pip in a clean environment:"
msgstr "默认情况下,你可以通过 pip 在新环境中安装 `sglang` : "
#: ../../Qwen/source/deployment/sglang.md:15 8794cc70acd141eeaef4717a190b11f4
msgid "Please note that `sglang` relies on `flashinfer-python` and has strict dependencies on `torch` and its CUDA versions. Check the note in the official document for installation ([link](https://docs.sglang.ai/start/install.html)) for more help."
msgstr "请留意预构建的 `sglang` 依赖 `flashinfer-python`,并对`torch`和其CUDA版本有强依赖。请查看[官方文档](https://docs.sglang.ai/start/install.html)中的注意事项以获取有关安装的帮助。"
#: ../../Qwen/source/deployment/sglang.md:18 06e04edfe3094363bcbc5b8758c8b16c
msgid "API Service"
msgstr "API 服务"
#: ../../Qwen/source/deployment/sglang.md:20 5969d8121d8a4af99d790844c4b348c5
msgid "It is easy to build an OpenAI-compatible API service with SGLang, which can be deployed as a server that implements OpenAI API protocol. By default, it starts the server at `http://localhost:30000`. You can specify the address with `--host` and `--port` arguments. Run the command as shown below:"
msgstr "借助 SGLang ,构建一个与OpenAI API兼容的API服务十分简便,该服务可以作为实现OpenAI API协议的服务器进行部署。默认情况下,它将在 `http://localhost:30000` 启动服务器。您可以通过 `--host` 和 `--port` 参数来自定义地址。请按照以下所示运行命令:"
#: ../../Qwen/source/deployment/sglang.md:28 32a52bb639634b9b9c196696dc20e2c5
msgid "By default, if the `--model-path` does not point to a valid local directory, it will download the model files from the HuggingFace Hub. To download model from ModelScope, set the following before running the above command:"
msgstr "默认情况下,如果模型未指向有效的本地目录,它将从 HuggingFace Hub 下载模型文件。要从 ModelScope 下载模型,请在运行上述命令之前设置以下内容:"
#: ../../Qwen/source/deployment/sglang.md:34 cd33984af3e045549668c8ad682f7612
msgid "For distrbiuted inference with tensor parallelism, it is as simple as"
msgstr "对于使用张量并行的分布式推理,操作非常简单:"
#: ../../Qwen/source/deployment/sglang.md:38 4db95581ffd046a9b6d532933403d985
msgid "The above command will use tensor parallelism on 4 GPUs. You should change the number of GPUs according to your demand."
msgstr "上述命令将在 4 块 GPU 上使用张量并行。您应根据需求调整 GPU 的数量。"
#: ../../Qwen/source/deployment/sglang.md:41 765cc12e934b4ab6881f0a71693fcc3d
msgid "Basic Usage"
msgstr "基本用法"
#: ../../Qwen/source/deployment/sglang.md:43 51032557dac94cb3b14c3842076192a8
msgid "Then, you can use the [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) to communicate with Qwen:"
msgstr "然后,您可以利用 [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) 来与Qwen进行对话:"
#: ../../Qwen/source/deployment/sglang.md 0708e8d2e6a44e94956e44f3a83bb4d8
#: 3bfc1bfe04ea4b49bfd1d5c6b5af52d7
msgid "curl"
msgstr ""
#: ../../Qwen/source/deployment/sglang.md 3b62fc6e456d44a6ba9cc8f5519fc3c6
#: ab964c5641584f7a9ef4252ecf0428cb
msgid "Python"
msgstr ""
#: ../../Qwen/source/deployment/sglang.md:63
#: ../../Qwen/source/deployment/sglang.md:130 18da82bbe0db4a59aa430b68b91db904
#: a2fccb4d7c164911a35e5ff6f30d98df
msgid "You can use the API client with the `openai` Python SDK as shown below:"
msgstr "或者您可以如下面所示使用 `openai` Python SDK中的 API 客户端:"
#: ../../Qwen/source/deployment/sglang.md:91 2bff40bb9f104cf2b19e6cf8169bf18d
msgid "While the default sampling parameters would work most of the time for thinking mode, it is recommended to adjust the sampling parameters according to your application, and always pass the sampling parameters to the API."
msgstr "虽然默认的采样参数在大多数情况下适用于思考模式,但建议根据您的应用调整采样参数,并始终将采样参数传递给 API。"
#: ../../Qwen/source/deployment/sglang.md:97 e10b0bbcaa7c4e54a59f7a30fa8760ef
msgid "Thinking & Non-Thinking Modes"
msgstr "思考与非思考模式"
#: ../../Qwen/source/deployment/sglang.md:100 ff0a121d43d5494597e8fc3b832f4893
msgid "This feature has not been released. For more information, please see this [pull request](https://github.com/sgl-project/sglang/pull/5551)."
msgstr "此功能尚未发布。更多信息,请参阅此[pull request](https://github.com/sgl-project/sglang/pull/5551)。"
#: ../../Qwen/source/deployment/sglang.md:104 8ba3c8c378ed4df7acb28f04e41bf067
msgid "Qwen3 models will think before respond. This behaviour could be controled by either the hard switch, which could disable thinking completely, or the soft switch, where the model follows the instruction of the user on whether or not it should think."
msgstr "Qwen3 模型会在回复前进行思考。这种行为可以通过硬开关(完全禁用思考)或软开关(模型遵循用户关于是否应该思考的指令)来控制。"
#: ../../Qwen/source/deployment/sglang.md:107 dcc39b3925704aee927b220cbf9b341d
msgid "The hard switch is availabe in SGLang through the following configuration to the API call. To disable thinking, use"
msgstr "硬开关在 vLLM 中可以通过以下 API 调用配置使用。要禁用思考,请使用"
#: ../../Qwen/source/deployment/sglang.md:159 952c90f4f1c84daba9cb66bfeb32725f
msgid "It is recommended to set sampling parameters differently for thinking and non-thinking modes."
msgstr "建议为思考模式和非思考模式分别设置不同的采样参数。"
#: ../../Qwen/source/deployment/sglang.md:162 750cee1281d74246bc7cf47ac9e0d502
msgid "Parsing Thinking Content"
msgstr "解析思考内容"
#: ../../Qwen/source/deployment/sglang.md:164 4f1f6c5d59134ea1bf6a625cd5081c51
msgid "SGLang supports parsing the thinking content from the model generation into structured messages:"
msgstr "SGLang 支持将模型生成的思考内容解析为结构化消息:"
#: ../../Qwen/source/deployment/sglang.md:169 0517d0a9cf694f6caabcbe69e3e1e845
msgid "The response message will have a field named `reasoning_content` in addition to `content`, containing the thinking content generated by the model."
msgstr "响应消息除了包含 `content` 字段外,还会有一个名为 `reasoning_content` 的字段,其中包含模型生成的思考内容。"
#: ../../Qwen/source/deployment/sglang.md:172 0225706aa7fe441c82d34f81b348fd42
msgid "Please note that this feature is not OpenAI API compatible."
msgstr "请注意,此功能与 OpenAI API 规范不一致。"
#: ../../Qwen/source/deployment/sglang.md:175 45a5f606e86543c08eacf7686b5a2def
msgid "Parsing Tool Calls"
msgstr "解析工具调用"
#: ../../Qwen/source/deployment/sglang.md:177 0aa3be18c7a5476cb915d6686c58387d
msgid "SGLang supports parsing the tool calling content from the model generation into structured messages:"
msgstr "SGLang 支持将模型生成的工具调用内容解析为结构化消息:"
#: ../../Qwen/source/deployment/sglang.md:182 dc096c7fb79c4b9ca0dd2c9cdd7ec890
msgid "For more information, please refer to [our guide on Function Calling](../framework/function_call.md)."
msgstr "详细信息,请参阅[函数调用的指南](../framework/function_call.md#vllm)。"
#: ../../Qwen/source/deployment/sglang.md:184 a58bba52efc44663af792d859bd3b410
msgid "Structured/JSON Output"
msgstr "结构化/JSON输出"
#: ../../Qwen/source/deployment/sglang.md:186 518f257e9d6d4080b41f980467573f7f
msgid "SGLang supports structured/JSON output. Please refer to [SGLang's documentation](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API). Besides, it is also recommended to instruct the model to generate the specific format in the system message or in your prompt."
msgstr "SGLang 支持结构化/JSON 输出。请参阅[SGLan文档](https://docs.sglang.ai/backend/structured_outputs.html#OpenAI-Compatible-API)。此外,还建议在系统消息或您的提示中指示模型生成特定格式。"
#: ../../Qwen/source/deployment/sglang.md:190 3a6d08a831584d6b8392da2650e8bf0b
msgid "Serving Quantized models"
msgstr "部署量化模型"
#: ../../Qwen/source/deployment/sglang.md:192 b2fe212a02b84349940f4c0c30cde88d
msgid "Qwen3 comes with two types of pre-quantized models, FP8 and AWQ."
msgstr "Qwen3 提供了两种类型的预量化模型:FP8 和 AWQ。"
#: ../../Qwen/source/deployment/sglang.md:194 efc85fc46a564483bdb872dbf5d61f3c
msgid "The command serving those models are the same as the original models except for the name change:"
msgstr "部署这些模型的命令与原始模型相同,只是名称有所更改:"
#: ../../Qwen/source/deployment/sglang.md:203 11a6d1bb983d4e60a55f5d579f1eb76b
msgid "Context Length"
msgstr "上下文长度"
#: ../../Qwen/source/deployment/sglang.md:205 de0293719e06477fbde6afc533973b1a
msgid "The context length for Qwen3 models in pretraining is up to 32,768 tokenns. To handle context length substantially exceeding 32,768 tokens, RoPE scaling techniques should be applied. We have validated the performance of [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts."
msgstr "Qwen3 模型在预训练中的上下文长度最长为 32,768 个 token。为了处理显著超过 32,768 个 token 的上下文长度,应应用 RoPE 缩放技术。我们已经验证了 [YaRN](https://arxiv.org/abs/2309.00071) 的性能,这是一种增强模型长度外推的技术,可确保在长文本上的最佳性能。"
#: ../../Qwen/source/deployment/sglang.md:209 0ee16aabbc794331a329e52ab2ca40e7
msgid "SGLang supports YaRN, which can be configured as"
msgstr "SGLang 支持 YaRN,可以配置为"
#: ../../Qwen/source/deployment/sglang.md:215 c3ba0a9b3502462795dfd887912e9357
msgid "SGLang implements static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** We advise adding the `rope_scaling` configuration only when processing long contexts is required. It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` as 2.0."
msgstr "SGLang 实现了静态 YaRN,这意味着无论输入长度如何,缩放因子都保持不变,**这可能会对较短文本的性能产生影响。** 我们建议仅在需要处理长上下文时添加 `rope_scaling` 配置。还建议根据需要调整 `factor`。例如,如果您的应用程序的典型上下文长度为 65,536 个 token,则最好将 `factor` 设置为 2.0。"
#: ../../Qwen/source/deployment/sglang.md:221 398c3e38c94e446aa9922dd04dce609c
msgid "The default `max_position_embeddings` in `config.json` is set to 40,960, which is used by SGLang. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing and leave adequate room for model thinking. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance."
msgstr "`config.json` 中的默认 `max_position_embeddings` 被设置为 40,960,SGLang 将使用该值。此分配包括为输出保留 32,768 个 token,为典型提示保留 8,192 个 token,这足以应对大多数涉及短文本处理的场景,并为模型思考留出充足空间。如果平均上下文长度不超过 32,768 个 token,我们不建议在此场景中启用 YaRN,因为这可能会降低模型性能。"
# Copyright (C) 2024, Qwen Team, Alibaba Group.
# This file is distributed under the same license as the Qwen package.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/deployment/skypilot.rst:2 795ad4f30e27494d93675f71bb1a5cc4
msgid "SkyPilot"
msgstr ""
#: ../../Qwen/source/deployment/skypilot.rst:5 aad807db94a24d868c9c1b364b47e152
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"
#: ../../Qwen/source/deployment/skypilot.rst:8 d6bbf736584f4bbfa9c300d50a2ed669
msgid "What is SkyPilot"
msgstr "SkyPilot 是什么"
#: ../../Qwen/source/deployment/skypilot.rst:10
#: b66facae41bf493880e43044e2915a45
msgid "SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, the highest GPU availability, and managed execution. Its features include:"
msgstr "SkyPilot 是一个可以在任何云上运行 LLM 、 AI 应用以及批量任务的框架,旨在实现最大程度的成本节省、最高的 GPU 可用性以及受管理的执行过程。其特性包括:"
#: ../../Qwen/source/deployment/skypilot.rst:14
#: 621f021163c549d0aadb1c911a3a3ef5
msgid "Get the best GPU availability by utilizing multiple resources pools across multiple regions and clouds."
msgstr "通过跨区域和跨云充分利用多个资源池,以获得最佳的 GPU 可用性。"
#: ../../Qwen/source/deployment/skypilot.rst:16
#: ea1723c3b5be454cad3219836f4386d8
msgid "Pay absolute minimum — SkyPilot picks the cheapest resources across regions and clouds. No managed solution markups."
msgstr "把费用降到最低—— SkyPilot 在各区域和云平台中为您挑选最便宜的资源。无需任何托管解决方案的额外加价。"
#: ../../Qwen/source/deployment/skypilot.rst:18
#: e479693ecf08411ca35d8d0727c8f441
msgid "Scale up to multiple replicas across different locations and accelerators, all served with a single endpoint"
msgstr "将服务扩展到多个副本上,所有副本通过单一 endpoint 对外提供服务"
#: ../../Qwen/source/deployment/skypilot.rst:20
#: 1f9cdd2ae2544d1faa8a4c463ee0e42c
msgid "Everything stays in your cloud account (your VMs & buckets)"
msgstr "所有内容均保存在您的云账户中(包括您的虚拟机和 bucket )"
#: ../../Qwen/source/deployment/skypilot.rst:21
#: 5bb9b617764942d989e5093463a359f0
msgid "Completely private - no one else sees your chat history"
msgstr "完全私密 - 没有其他人能看到您的聊天记录"
#: ../../Qwen/source/deployment/skypilot.rst:24
#: cf0c456ac72f40ac98790c11dc243317
msgid "Install SkyPilot"
msgstr "安装 SkyPilot"
#: ../../Qwen/source/deployment/skypilot.rst:26
#: 78d86c1fa8104b138b01aed640b262fc
msgid "We advise you to follow the `instruction <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ to install SkyPilot. Here we provide a simple example of using ``pip`` for the installation as shown below."
msgstr "我们建议您按照 `指示 <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ 安装 SkyPilot 。以下为您提供了一个使用 ``pip`` 进行安装的简单示例:"
#: ../../Qwen/source/deployment/skypilot.rst:38
#: a7c88265bf404f55b85388c81a240199
msgid "After that, you need to verify cloud access with a command like:"
msgstr "随后,您需要用如下命令确认是否能使用云:"
#: ../../Qwen/source/deployment/skypilot.rst:44
#: 72025dfba0144f63a720f6da0dd39bfa
msgid "For more information, check the `official document <https://skypilot.readthedocs.io/en/latest/getting-started/installation.html>`__ and see if you have set up your cloud accounts correctly."
msgstr "若需更多信息,请查阅官方文档,确认您的云账户设置是否正确无误。"
#: ../../Qwen/source/deployment/skypilot.rst:47
#: 61be006061554e5ea40d55497e11e192
msgid "Alternatively, you can also use the official docker image with SkyPilot master branch automatically cloned by running:"
msgstr "或者,您也可以使用官方提供的 docker 镜像,可以自动克隆 SkyPilot 的主分支:"
#: ../../Qwen/source/deployment/skypilot.rst:63
#: 4ae89fb44c6643a3a82fca5cee622af4
msgid "Running Qwen2.5-72B-Instruct with SkyPilot"
msgstr "使用 SkyPilot 运行 Qwen2.5-72B-Instruct "
#: ../../Qwen/source/deployment/skypilot.rst:65
#: 1bc4973c2eb745689ded0af54ba33e0e
msgid "Start serving Qwen2.5-72B-Instruct on a single instance with any available GPU in the list specified in `serve-72b.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/serve-72b.yaml>`__ with a vLLM-powered OpenAI-compatible endpoint:"
msgstr "`serve-72b.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/serve-72b.yaml>`__ 中列出了支持的 GPU 。您可使用配备这类 GPU 的单个运算实例来部署 Qwen2.5-72B-Instruct 服务。该服务由 vLLM 搭建,并与 OpenAI API 兼容。以下为部署方法:"
#: ../../Qwen/source/deployment/skypilot.rst:74
#: ../../Qwen/source/deployment/skypilot.rst:123
#: ac3692ed16974facbd58b6886cd111af b325de015e7b4bb0a91491d3f7418792
msgid "**Before launching, make sure you have changed Qwen/Qwen2-72B-Instruct to Qwen/Qwen2.5-72B-Instruct in the YAML file.**"
msgstr "**在启动之前,请先将 YAML 文件中的 Qwen/Qwen2-72B-Instruct 修改为 Qwen/Qwen2.5-72B-Instruct。**"
#: ../../Qwen/source/deployment/skypilot.rst:76
#: 6046b3c86fae4a43878fbadbeb33fbd8
msgid "Send a request to the endpoint for completion:"
msgstr "向该 endpoint 发送续写请求:"
#: ../../Qwen/source/deployment/skypilot.rst:90
#: 2ec56c2028a94f568fd2c1a65063d25a
msgid "Send a request for chat completion:"
msgstr "向该 endpoint 发送对话续写请求"
#: ../../Qwen/source/deployment/skypilot.rst:112
#: c8e140ddfd914ff5a460621a7ca1891e
msgid "Scale up the service with SkyPilot Serve"
msgstr "使用 SkyPilot Serve 扩展服务规模"
#: ../../Qwen/source/deployment/skypilot.rst:114
#: 0db304ab396d45adb6017d78cd1ee4a2
msgid "With `SkyPilot Serve <https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html>`__, a serving library built on top of SkyPilot, scaling up the Qwen service is as simple as running:"
msgstr "使用 `SkyPilot Serve <https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html>`__ 扩展 Qwen 的服务规模非常容易,只需运行:"
#: ../../Qwen/source/deployment/skypilot.rst:125
#: 25bbbf9e49be44d3899074ff97202d71
msgid "This will start the service with multiple replicas on the cheapest available locations and accelerators. SkyServe will automatically manage the replicas, monitor their health, autoscale based on load, and restart them when needed."
msgstr "这将启动服务,使用多个副本部署在最经济的可用位置和加速器上。 SkyServe 将自动管理这些副本,监控其健康状况,根据负载进行自动伸缩,并在必要时重启它们。"
#: ../../Qwen/source/deployment/skypilot.rst:130
#: bda628bab7ef41a0918dc4b80a9b3cfe
msgid "A single endpoint will be returned and any request sent to the endpoint will be routed to the ready replicas."
msgstr "将返回一个 endpoint ,所有发送至该endpoint的请求都将被路由至就绪状态的副本。"
#: ../../Qwen/source/deployment/skypilot.rst:133
#: b232dbbdcf674d56bcf9c0331c020864
msgid "To check the status of the service, run:"
msgstr "运行如下命令检查服务的状态:"
#: ../../Qwen/source/deployment/skypilot.rst:139
#: 556b854caf7243fb93f253ebe2dc9033
msgid "After a while, you will see the following output:"
msgstr "很快,您将看到如下输出:"
#: ../../Qwen/source/deployment/skypilot.rst:152
#: 5a6055c5a42c4b2db6693c1095688de8
msgid "As shown, the service is now backed by 2 replicas, one on Azure and one on GCP, and the accelerator type is chosen to be **the cheapest available one** on the clouds. That said, it maximizes the availability of the service while minimizing the cost."
msgstr "如下所示:该服务现由两个副本提供支持,一个位于 Azure 平台,另一个位于 GCP 平台。同时,已为服务选择云服务商提供的 **最经济实惠** 的加速器类型。这样既最大限度地提升了服务的可用性,又尽可能降低了成本。"
#: ../../Qwen/source/deployment/skypilot.rst:157
#: a18533d33dc54a1091ded0b4bba0a1eb
msgid "To access the model, we use a ``curl -L`` command (``-L`` to follow redirect) to send the request to the endpoint:"
msgstr "要访问模型,我们使用带有 ``curl -L`` (用于跟随重定向),将请求发送到 endpoint :"
#: ../../Qwen/source/deployment/skypilot.rst:182
#: 34cd50fd79e24d8895075f7841b025e4
msgid "Accessing Qwen2.5 with Chat GUI"
msgstr "使用 Chat GUI 调用 Qwen2.5"
#: ../../Qwen/source/deployment/skypilot.rst:184
#: ca6994cda1cb469e83ce8c026bb67e42
msgid "It is also possible to access the Qwen2.5 service with GUI by connecting a `FastChat GUI server <https://github.com/lm-sys/FastChat>`__ to the endpoint launched above (see `gui.yaml <https://github.com/skypilot-org/skypilot/blob/master/llm/qwen/gui.yaml>`__)."
msgstr "可以通过 `FastChat <https://github.com/lm-sys/FastChat>`__ 来使用 GUI 调用 Qwen2.5 的服务:"
#: ../../Qwen/source/deployment/skypilot.rst:188
#: 99a63e55ab5c46258c20ab89cdfa39dc
msgid "Start the Chat Web UI:"
msgstr "开启一个 Chat Web UI"
#: ../../Qwen/source/deployment/skypilot.rst:194
#: e61593a092c146f8a06af896d6af17f2
msgid "**Before launching, make sure you have changed Qwen/Qwen1.5-72B-Chat to Qwen/Qwen2.5-72B-Instruct in the YAML file.**"
msgstr "**在启动之前,请先将 YAML 文件中的 Qwen/Qwen1.5-72B-Chat 修改为 Qwen/Qwen2.5-72B-Instruct。**"
#: ../../Qwen/source/deployment/skypilot.rst:196
#: 9631068a8b424aa8af6dc6911daac7a9
msgid "Then, we can access the GUI at the returned gradio link:"
msgstr "随后,我们可以通过返回的 gradio 链接来访问 GUI :"
#: ../../Qwen/source/deployment/skypilot.rst:202
#: 1464a56dcd06404aafbe6d7d2c72212b
msgid "Note that you may get better results by using a different temperature and top_p value."
msgstr "你可以通过使用不同的温度和 top_p 值来尝试取得更好的结果。"
#: ../../Qwen/source/deployment/skypilot.rst:205
#: d257f49d835e4c12b28bc680bb78a9cb
msgid "Summary"
msgstr "总结"
#: ../../Qwen/source/deployment/skypilot.rst:207
#: 06b9684a19774eaba4f69862332c5166
msgid "With SkyPilot, it is easy for you to deploy Qwen2.5 on any cloud. We advise you to read the official doc for more usages and updates. Check `this <https://skypilot.readthedocs.io/>`__ out!"
msgstr "通过 SkyPilot ,你可以轻松地在任何云上部署 Qwen2.5 。我们建议您阅读 `官方文档 <https://skypilot.readthedocs.io/>`__ 了解更多用法和最新进展。"
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/deployment/tgi.rst:2 2abcc96f9deb4b9187ac9d88fc69e929
msgid "TGI"
msgstr ""
#: ../../Qwen/source/deployment/tgi.rst:5 2d124d7cb95f47388aa48c662932ef9b
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"
#: ../../Qwen/source/deployment/tgi.rst:7 4e5d299c4fdd46d5aba38c9af5765792
msgid "Hugging Face's Text Generation Inference (TGI) is a production-ready framework specifically designed for deploying and serving large language models (LLMs) for text generation tasks. It offers a seamless deployment experience, powered by a robust set of features:"
msgstr "Hugging Face 的 Text Generation Inference (TGI) 是一个专为部署大规模语言模型 (Large Language Models, LLMs) 而设计的生产级框架。TGI提供了流畅的部署体验,并稳定支持如下特性:"
#: ../../Qwen/source/deployment/tgi.rst:9 ecd4fc11a95140959915d062791ceba1
msgid "`Speculative Decoding <Speculative Decoding_>`_: Accelerates generation speeds."
msgstr "`推测解码 (Speculative Decoding) <Speculative Decoding_>`_ :提升生成速度。"
#: ../../Qwen/source/deployment/tgi.rst:10 84590a56416348bf85b3f296cf57e257
msgid "`Tensor Parallelism`_: Enables efficient deployment across multiple GPUs."
msgstr "张量并行 (`Tensor Parallelism`_) :高效多卡部署。"
#: ../../Qwen/source/deployment/tgi.rst:11 a996d6ecd7b94c5cb9752d370f29a9b1
msgid "`Token Streaming`_: Allows for the continuous generation of text."
msgstr "流式生成 (`Token Streaming`_) :支持持续性生成文本。"
#: ../../Qwen/source/deployment/tgi.rst:12 8f591c045ba34f4581bb19652db9f9b3
msgid "Versatile Device Support: Works seamlessly with `AMD`_, `Gaudi`_ and `AWS Inferentia`_."
msgstr "灵活的硬件支持:与 `AMD`_ , `Gaudi`_ 和 `AWS Inferentia`_ 无缝衔接。"
#: ../../Qwen/source/deployment/tgi.rst:21 5e8a98b91fc146e0b581422faa683a18
msgid "Installation"
msgstr "安装"
#: ../../Qwen/source/deployment/tgi.rst:23 684ef25bfb0e460999d6dcccce41b85f
msgid "The easiest way to use TGI is via the TGI docker image. In this guide, we show how to use TGI with docker."
msgstr "通过 TGI docker 镜像使用 TGI 轻而易举。本文将主要介绍 TGI 的 docker 用法。"
#: ../../Qwen/source/deployment/tgi.rst:25 c563fa3eccb04d00a477c1d2e8b15c38
msgid "It's possible to run it locally via Conda or build locally. Please refer to `Installation Guide <https://huggingface.co/docs/text-generation-inference/installation>`_ and `CLI tool <https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_cli>`_ for detailed instructions."
msgstr "也可通过 Conda 实机安装或搭建服务。请参考 `Installation Guide <https://huggingface.co/docs/text-generation-inference/installation>`_ 与 `CLI tool <https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/using_cli>`_ 以了解详细说明。"
#: ../../Qwen/source/deployment/tgi.rst:28 b55fc58ff4cb472abca08296409c7837
msgid "Deploy Qwen2.5 with TGI"
msgstr "通过 TGI 部署 Qwen2.5"
#: ../../Qwen/source/deployment/tgi.rst:30 586a8425ec5d413592fd7daf579c7e87
msgid "**Find a Qwen2.5 Model:** Choose a model from `the Qwen2.5 collection <https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e>`_."
msgstr "**选定 Qwen2.5 模型:** 从 `the Qwen2.5 collection <https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e>`_ 中挑选模型。"
#: ../../Qwen/source/deployment/tgi.rst:31 50fcab8da35941eca308786979dbaf38
msgid "**Deployment Command:** Run the following command in your terminal, replacing ``model`` with your chosen Qwen2.5 model ID and ``volume`` with the path to your local data directory:"
msgstr "**部署TGI服务:** 在终端中运行以下命令,注意替换 ``model`` 为选定的 Qwen2.5 模型 ID 、 ``volume`` 为本地的数据路径: "
#: ../../Qwen/source/deployment/tgi.rst:42 2a800533a7d84bdeab1da0976b0cab53
msgid "Using TGI API"
msgstr "使用 TGI API"
#: ../../Qwen/source/deployment/tgi.rst:44 f05d1ec08140452782d0659543fad7d1
msgid "Once deployed, the model will be available on the mapped port (8080)."
msgstr "一旦成功部署,API 将于选定的映射端口 (8080) 提供服务。"
#: ../../Qwen/source/deployment/tgi.rst:46 f265dc1522b049c98ba31fd5d255c50f
msgid "TGI comes with a handy API for streaming response:"
msgstr "TGI 提供了简单直接的 API 支持流式生成:"
#: ../../Qwen/source/deployment/tgi.rst:54 e9cc4c0571b74bd08b2a59347503e653
msgid "It's also available on OpenAI style API:"
msgstr "也可使用 OpenAI 风格的 API 使用 TGI :"
#: ../../Qwen/source/deployment/tgi.rst:73 5dc7e9c74fc04483ba8e5dcdd7052020
msgid "The model field in the JSON is not used by TGI, you can put anything."
msgstr "JSON 中的 model 字段不会被 TGI 识别,您可传入任意值。"
#: ../../Qwen/source/deployment/tgi.rst:75 d60f837152014cda8baebc90d65d1cc0
#, python-format
msgid "Refer to the `TGI Swagger UI <https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/completions>`_ for a complete API reference."
msgstr "完整 API 文档,请查阅 `TGI Swagger UI <https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/completions>`_ 。"
#: ../../Qwen/source/deployment/tgi.rst:77 b59564031e5548088aef828f9753e337
msgid "You can also use Python API:"
msgstr "你也可以使用 Python 访问 API :"
#: ../../Qwen/source/deployment/tgi.rst:106 62646cecb024479ebfeca5f3063e7322
msgid "Quantization for Performance"
msgstr "量化"
#: ../../Qwen/source/deployment/tgi.rst:108 4a8d39bf37be4820afb230f9a977b431
msgid "Data-dependent quantization (GPTQ and AWQ)"
msgstr "依赖数据的量化方案( GPTQ 与 AWQ )"
#: ../../Qwen/source/deployment/tgi.rst:110 ef2b18f47e4f4f7ebb017be628cb0be9
msgid "Both GPTQ and AWQ models are data-dependent. The official quantized models can be found from `the Qwen2.5 collection`_ and you can also quantize models with your own dataset to make it perform better on your use case."
msgstr "GPTQ 与 AWQ 均依赖数据进行量化。我们提供了预先量化好的模型,请于 `the Qwen2.5 collection`_ 查找。你也可以使用自己的数据集自行量化,以在你的场景中取得更好效果。"
#: ../../Qwen/source/deployment/tgi.rst:112 53d94278a2e3409abb9980ebc7c96c24
msgid "The following shows the command to start TGI with Qwen2.5-7B-Instruct-GPTQ-Int4:"
msgstr "以下是通过 TGI 部署 Qwen2.5-7B-Instruct-GPTQ-Int4 的指令:"
#: ../../Qwen/source/deployment/tgi.rst:122 68ff8a07d0eb40cfa67d79e01adea070
msgid "If the model is quantized with AWQ, e.g. Qwen/Qwen2.5-7B-Instruct-AWQ, please use ``--quantize awq``."
msgstr "如果模型是 AWQ 量化的,如 Qwen/Qwen2.5-7B-Instruct-AWQ ,请使用 ``--quantize awq`` 。"
#: ../../Qwen/source/deployment/tgi.rst:124 b4c3b82b1f2a43a8a02383fd0afbda5f
msgid "Data-agnostic quantization"
msgstr "不依赖数据的量化方案"
#: ../../Qwen/source/deployment/tgi.rst:126 7a6b89c94b72407482b96790f5bbd272
msgid "EETQ on the other side is not data dependent and can be used with any model. Note that we're passing in the original model (instead of a quantized model) with the ``--quantize eetq`` flag."
msgstr "EETQ 是一种不依赖数据的量化方案,可直接用于任意模型。请注意,我们需要传入原始模型,并使用 ``--quantize eetq`` 标志。"
#: ../../Qwen/source/deployment/tgi.rst:138 763166da65924887b3bba99ea4d2baab
msgid "Multi-Accelerators Deployment"
msgstr "多卡部署"
#: ../../Qwen/source/deployment/tgi.rst:140 ddcfcff947894f168c7945ae9c42a579
msgid "Use the ``--num-shard`` flag to specify the number of accelerators. Please also use ``--shm-size 1g`` to enable shared memory for optimal NCCL performance (`reference <https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#a-note-on-shared-memory-shm>`__):"
msgstr "使用 ``--num-shard`` 指定卡书数量。 请务必传入 ``--shm-size 1g`` 让 NCCL 发挥最好性能 (`说明 <https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#a-note-on-shared-memory-shm>`__) :"
#: ../../Qwen/source/deployment/tgi.rst:151 520c46fb404c4ec9bf89280e4a71f1e8
msgid "Speculative Decoding"
msgstr "推测性解码 (Speculative Decoding)"
#: ../../Qwen/source/deployment/tgi.rst:153 74c6b65f76b74d56ad109af9da11f66e
msgid "Speculative decoding can reduce the time per token by speculating on the next token. Use the ``--speculative-decoding`` flag, setting the value to the number of tokens to speculate on (default: 0 for no speculation):"
msgstr "推测性解码 (Speculative Decoding) 通过预先推测下一 token 来节约每 token 需要的时间。使用 ``--speculative-decoding`` 设定预先推测 token 的数量 (默认为0,表示不预先推测):"
#: ../../Qwen/source/deployment/tgi.rst:164 dee05ee0fb1a4f2da42b250192d943f5
msgid "The overall performance of speculative decoding highly depends on the type of task. It works best for code or highly repetitive text."
msgstr "推测性解码的加速效果依赖于任务类型,对于代码或重复性较高的文本生成任务,提速更明显。"
#: ../../Qwen/source/deployment/tgi.rst:166 731f300bc1174589901dd5feb26e8b2f
msgid "More context on speculative decoding can be found `here <https://huggingface.co/docs/text-generation-inference/conceptual/speculation>`__."
msgstr "更多说明可查阅 `此文档 <https://huggingface.co/docs/text-generation-inference/conceptual/speculation>`__ 。"
#: ../../Qwen/source/deployment/tgi.rst:170 65a7d5553dd145398f9705c1ee6c28f0
msgid "Zero-Code Deployment with HF Inference Endpoints"
msgstr "使用 HF Inference Endpoints 零代码部署"
#: ../../Qwen/source/deployment/tgi.rst:172 721c3a7578f846ae8e21e595923e17e7
msgid "For effortless deployment, leverage Hugging Face Inference Endpoints:"
msgstr "使用 Hugging Face Inference Endpoints 不费吹灰之力:"
#: ../../Qwen/source/deployment/tgi.rst:174 7741607488d94a9f8be2ffcb6a5322fb
msgid "**GUI interface:** `<https://huggingface.co/inference-endpoints/dedicated>`__"
msgstr ""
#: ../../Qwen/source/deployment/tgi.rst:175 02ff4520e66f4a42828483da7d25445f
msgid "**Coding interface:** `<https://huggingface.co/blog/tgi-messages-api>`__"
msgstr ""
#: ../../Qwen/source/deployment/tgi.rst:177 d35f9dd4bc96400cb6c7584012d2df49
msgid "Once deployed, the endpoint can be used as usual."
msgstr "一旦部署成功,服务使用与本地无异。"
#: ../../Qwen/source/deployment/tgi.rst:181 61c1b825bbf24be2aaaeb99de3f0660e
msgid "Common Issues"
msgstr "常见问题"
#: ../../Qwen/source/deployment/tgi.rst:183 b55a2d286fc24dbe92b79ab5c010c7af
msgid "Qwen2.5 supports long context lengths, so carefully choose the values for ``--max-batch-prefill-tokens``, ``--max-total-tokens``, and ``--max-input-tokens`` to avoid potential out-of-memory (OOM) issues. If an OOM occurs, you'll receive an error message upon startup. The following shows an example to modify those parameters:"
msgstr "Qwen2.5 支持长上下文,谨慎设定 ``--max-batch-prefill-tokens`` , ``--max-total-tokens`` 和 ``--max-input-tokens`` 以避免 out-of-memory (OOM) 。如 OOM ,你将在启动 TGI 时收到错误提示。以下为修改这些参数的示例:"
# Copyright (C) 2024, Qwen Team, Alibaba Group.
# This file is distributed under the same license as the Qwen package.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/deployment/vllm.md:1 faa6e2bc47c24c6dab7113f73d67b0c4
msgid "vLLM"
msgstr ""
#: ../../Qwen/source/deployment/vllm.md:3 45682d60b2ee469bac6a473f7aacbe38
msgid "We recommend you trying [vLLM](https://github.com/vllm-project/vllm) for your deployment of Qwen. It is simple to use, and it is fast with state-of-the-art serving throughput, efficient management of attention key value memory with PagedAttention, continuous batching of input requests, optimized CUDA kernels, etc. To learn more about vLLM, please refer to the [paper](https://arxiv.org/abs/2309.06180) and [documentation](https://docs.vllm.ai/)."
msgstr "我们建议您在部署 Qwen 时尝试使用 [vLLM](https://github.com/vllm-project/vllm)。它易于使用,且具有最先进的服务吞吐量、高效的注意力键值内存管理(通过PagedAttention实现)、连续批处理输入请求、优化的CUDA内核等功能。要了解更多关于vLLM的信息,请参阅 [论文](https://arxiv.org/abs/2309.06180) 和 [文档](https://docs.vllm.ai/)。"
#: ../../Qwen/source/deployment/vllm.md:7 b6e6f5a91b9e4b749a6aaeca89358752
msgid "Environment Setup"
msgstr "环境配置"
#: ../../Qwen/source/deployment/vllm.md:9 14f9a1b8015d45388f5681145ddfcb0b
msgid "By default, you can install `vllm` with pip in a clean environment:"
msgstr "默认情况下,你可以通过 pip 在新环境中安装 `vllm` : "
#: ../../Qwen/source/deployment/vllm.md:15 fdda232bfb3c4d3895643b3ba7f78cbd
msgid "Please note that the prebuilt `vllm` has strict dependencies on `torch` and its CUDA versions. Check the note in the official document for installation ([link](https://docs.vllm.ai/en/latest/getting_started/installation.html)) for more help."
msgstr "请留意预构建的`vllm`对`torch`和其CUDA版本有强依赖。请查看[vLLM官方文档](https://docs.vllm.ai/en/latest/getting_started/installation.html)中的注意事项以获取有关安装的帮助。"
#: ../../Qwen/source/deployment/vllm.md:18 a175a37698bf4cfcb0c7bb33509e3775
msgid "API Service"
msgstr "API 服务"
#: ../../Qwen/source/deployment/vllm.md:20 6221b5708f054b839b17d3c21d086657
msgid "It is easy to build an OpenAI-compatible API service with vLLM, which can be deployed as a server that implements OpenAI API protocol. By default, it starts the server at `http://localhost:8000`. You can specify the address with `--host` and `--port` arguments. Run the command as shown below:"
msgstr "借助vLLM,构建一个与OpenAI API兼容的API服务十分简便,该服务可以作为实现OpenAI API协议的服务器进行部署。默认情况下,它将在 `http://localhost:8000` 启动服务器。您可以通过 `--host` 和 `--port` 参数来自定义地址。请按照以下所示运行命令:"
#: ../../Qwen/source/deployment/vllm.md:28 5b3fb351eff5402caf53fb28af098a14
msgid "By default, if the model does not point to a valid local directory, it will download the model files from the HuggingFace Hub. To download model from ModelScope, set the following before running the above command:"
msgstr "默认情况下,如果模型未指向有效的本地目录,它将从 HuggingFace Hub 下载模型文件。要从 ModelScope 下载模型,请在运行上述命令之前设置以下内容:"
#: ../../Qwen/source/deployment/vllm.md:34 e968dc2b83f94d8db88730e49cc2b557
msgid "For distrbiuted inference with tensor parallelism, it is as simple as"
msgstr "对于使用张量并行的分布式推理,操作非常简单:"
#: ../../Qwen/source/deployment/vllm.md:38 a017a4820c164f99b1b818eff1ece7e2
msgid "The above command will use tensor parallelism on 4 GPUs. You should change the number of GPUs according to your demand."
msgstr "上述命令将在 4 块 GPU 上使用张量并行。您应根据需求调整 GPU 的数量。"
#: ../../Qwen/source/deployment/vllm.md:41 cf79ffc98aeb4aaeaa87eb67a27bf931
msgid "Basic Usage"
msgstr "基本用法"
#: ../../Qwen/source/deployment/vllm.md:43 f422098f08af453fba9a04ffba7a65cf
msgid "Then, you can use the [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) to communicate with Qwen:"
msgstr "然后,您可以利用 [create chat interface](https://platform.openai.com/docs/api-reference/chat/completions/create) 来与Qwen进行对话:"
#: ../../Qwen/source/deployment/vllm.md 4fe8fdefc345451692648e733e009f2f
#: a8b7164993794f1398b4fb97662752d5
msgid "curl"
msgstr ""
#: ../../Qwen/source/deployment/vllm.md 5e2fab5952fd4fca952fb4d6bbca2a00
#: c5aff5ea28cd48bc8040b80173a609b8
msgid "Python"
msgstr ""
#: ../../Qwen/source/deployment/vllm.md:63
#: ../../Qwen/source/deployment/vllm.md:127 648478738dd3476c8dcdaa99cd345bfe
#: aa39dbf0acd646afa03c5fe79bb74011
msgid "You can use the API client with the `openai` Python SDK as shown below:"
msgstr "或者您可以如下面所示使用 `openai` Python SDK中的 API 客户端:"
#: ../../Qwen/source/deployment/vllm.md:91 a4dc5343d3214279828ee2a8e8d06106
msgid "`vllm` will use the sampling parameters from the `generation_config.json` in the model files."
msgstr "`vllm` 将使用模型文件中 `generation_config.json` 的采样参数。"
#: ../../Qwen/source/deployment/vllm.md:93 ba2912bb69f64837887bc32f1107b9f0
msgid "While the default sampling parameters would work most of the time for thinking mode, it is recommended to adjust the sampling parameters according to your application, and always pass the sampling parameters to the API."
msgstr "虽然默认的采样参数在大多数情况下适用于思考模式,但建议根据您的应用调整采样参数,并始终将采样参数传递给 API。"
#: ../../Qwen/source/deployment/vllm.md:99 9a47113d7a7b44b89f504244fada649f
msgid "Thinking & Non-Thinking Modes"
msgstr "思考与非思考模式"
#: ../../Qwen/source/deployment/vllm.md:101 20105d15da5a44bc8af9f9d628b54cb3
msgid "Qwen3 models will think before respond. This behaviour could be controled by either the hard switch, which could disable thinking completely, or the soft switch, where the model follows the instruction of the user on whether or not it should think."
msgstr "Qwen3 模型会在回复前进行思考。这种行为可以通过硬开关(完全禁用思考)或软开关(模型遵循用户关于是否应该思考的指令)来控制。"
#: ../../Qwen/source/deployment/vllm.md:104 df2e0e8d7b77407ba31e9efcea1440cf
msgid "The hard switch is availabe in vLLM through the following configuration to the API call. To disable thinking, use"
msgstr "硬开关在 vLLM 中可以通过以下 API 调用配置使用。要禁用思考,请使用"
#: ../../Qwen/source/deployment/vllm.md:156 31b40d3493e1487b8624048e64b07321
msgid "It is recommended to set sampling parameters differently for thinking and non-thinking modes."
msgstr "建议为思考模式和非思考模式分别设置不同的采样参数。"
#: ../../Qwen/source/deployment/vllm.md:159 ca30c135f89d4d0f9297484617c2c291
msgid "Parsing Thinking Content"
msgstr "解析思考内容"
#: ../../Qwen/source/deployment/vllm.md:161 8f8766d09dcc42dab5dfeba44e9495f0
msgid "vLLM supports parsing the thinking content from the model generation into structured messages:"
msgstr "vLLM 支持将模型生成的思考内容解析为结构化消息:"
#: ../../Qwen/source/deployment/vllm.md:166 ce2e2c1c45804d758aaa536b2a134236
msgid "The response message will have a field named `reasoning_content` in addition to `content`, containing the thinking content generated by the model."
msgstr "响应消息除了包含 `content` 字段外,还会有一个名为 `reasoning_content` 的字段,其中包含模型生成的思考内容。"
#: ../../Qwen/source/deployment/vllm.md:169 5740a6011e2e4b7194c8b3ede0ede490
msgid "Please note that this feature is not OpenAI API compatible."
msgstr "请注意,此功能与 OpenAI API 规范不一致。"
#: ../../Qwen/source/deployment/vllm.md:172 4bab7e0f542e42c1a6834649c677a13f
msgid "Parsing Tool Calls"
msgstr "解析工具调用"
#: ../../Qwen/source/deployment/vllm.md:174 0285408a203d41ff9b7e35f216f911f3
msgid "vLLM supports parsing the tool calling content from the model generation into structured messages:"
msgstr "vLLM 支持将模型生成的工具调用内容解析为结构化消息:"
#: ../../Qwen/source/deployment/vllm.md:179 b73b1c7b953c4cf9ab0fc4c7f7cce27f
msgid "For more information, please refer to [our guide on Function Calling](../framework/function_call.md#vllm)."
msgstr "详细信息,请参阅[函数调用的指南](../framework/function_call.md#vllm)。"
#: ../../Qwen/source/deployment/vllm.md:182 768aec6e39424dd1835c56497f3f9c19
msgid "As of vLLM 0.5.4, it is not supported to parse the thinking content and the tool calling from the model generation at the same time."
msgstr "在 vLLM 0.5.4 版本中,尚不支持同时解析模型生成的思考内容和工具调用。"
#: ../../Qwen/source/deployment/vllm.md:185 51dc25116a9346849b9c25436c64e770
msgid "Structured/JSON Output"
msgstr "结构化/JSON输出"
#: ../../Qwen/source/deployment/vllm.md:187 abfa9f6a9eb942d5878241295a2fd7d2
msgid "vLLM supports structured/JSON output. Please refer to [vLLM's documentation](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#extra-parameters-for-chat-api) for the `guided_json` parameters. Besides, it is also recommended to instruct the model to generate the specific format in the system message or in your prompt."
msgstr "vLLM 支持结构化/JSON 输出。请参照[vLLM文档](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#extra-parameters-for-chat-api)了解 `guided_json` 参数。此外,也建议在系统消息或用户提示中指示模型生成特定格式,避免仅依赖于推理参数配置。"
#: ../../Qwen/source/deployment/vllm.md:192 d84c568411164a8c8259228be8f6433a
msgid "Serving Quantized models"
msgstr "部署量化模型"
#: ../../Qwen/source/deployment/vllm.md:194 d3b9d138d1ec4bf88eec3130a92c8d32
msgid "Qwen3 comes with two types of pre-quantized models, FP8 and AWQ."
msgstr "Qwen3 提供了两种类型的预量化模型:FP8 和 AWQ。"
#: ../../Qwen/source/deployment/vllm.md:196 c16aa9098f9e4faf8b19df4430181d1c
msgid "The command serving those models are the same as the original models except for the name change:"
msgstr "部署这些模型的命令与原始模型相同,只是名称有所更改:"
#: ../../Qwen/source/deployment/vllm.md:206 774a3f0a10914f988aff2609b02ccb4c
msgid "FP8 computation is supported on NVIDIA GPUs with compute capability > 8.9, that is, Ada Lovelace, Hopper, and later GPUs."
msgstr "FP8 计算在计算能力 > 8.9 的 NVIDIA GPU 上受支持,即 Ada Lovelace、Hopper 及更新的 GPU。"
#: ../../Qwen/source/deployment/vllm.md:208 71a5529b10014da598a308aed2ff81cb
msgid "FP8 models will run on compute capability > 8.0 (Ampere) as weight-only W8A16, utilizing FP8 Marlin."
msgstr ""msgstr "FP8 模型将在计算能力 > 8.0(Ampere)的 GPU 上以仅权重 W8A16 的形式运行,利用 FP8 Marlin 技术。"
#: ../../Qwen/source/deployment/vllm.md:212 66310479fe11424d926533edd6d21dd0
msgid "As of vLLM 0.5.4, there are currently compatibility issues with `vllm` with the Qwen3 FP8 checkpoints. For a quick fix, you should make the following changes to the file `vllm/vllm/model_executor/layers/linear.py`:"
msgstr "在 vLLM 0.5.4 版本中,目前 `vllm` 与 Qwen3 FP8 检查点存在兼容性问题。要快速解决此问题,您应对文件 `vllm/vllm/model_executor/layers/linear.py` 进行以下更改:"
#: ../../Qwen/source/deployment/vllm.md:236 e10bf662530d4be884f61474a197df6f
msgid "Context Length"
msgstr "上下文长度"
#: ../../Qwen/source/deployment/vllm.md:238 100e104b60fb418b8c79ed341092efaa
msgid "The context length for Qwen3 models in pretraining is up to 32,768 tokenns. To handle context length substantially exceeding 32,768 tokens, RoPE scaling techniques should be applied. We have validated the performance of [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts."
msgstr "Qwen3 模型在预训练中的上下文长度最长为 32,768 个 token。为了处理显著超过 32,768 个 token 的上下文长度,应应用 RoPE 缩放技术。我们已经验证了 [YaRN](https://arxiv.org/abs/2309.00071) 的性能,这是一种增强模型长度外推的技术,可确保在长文本上的最佳性能。"
#: ../../Qwen/source/deployment/vllm.md:242 3f987cfdb9114eb7be76a18cd0d01a1a
msgid "vLLM supports YaRN, which can be configured as"
msgstr "vLLM 支持 YaRN,可以配置为"
#: ../../Qwen/source/deployment/vllm.md:248 fb9ef4ed4c9640a18574830368379d15
msgid "vLLM implements static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.** We advise adding the `rope_scaling` configuration only when processing long contexts is required. It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` as 2.0."
msgstr "vLLM 实现了静态 YaRN,这意味着无论输入长度如何,缩放因子都保持不变,**这可能会对较短文本的性能产生影响。** 我们建议仅在需要处理长上下文时添加 `rope_scaling` 配置。还建议根据需要调整 `factor`。例如,如果您的应用程序的典型上下文长度为 65,536 个 token,则最好将 `factor` 设置为 2.0。"
#: ../../Qwen/source/deployment/vllm.md:254 e670bcf6664e490c8b5a7e0cc4ebba41
msgid "The default `max_position_embeddings` in `config.json` is set to 40,960, which used by vLLM, if `--max-model-len` is not specified. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing and leave adequate room for model thinking. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance."
msgstr "如果未指定 `--max-model-len`,`config.json` 中的默认 `max_position_embeddings` 被设置为 40,960,vLLM 将使用该值。此分配包括为输出保留 32,768 个 token,为典型提示保留 8,192 个 token,这足以应对大多数涉及短文本处理的场景,并为模型思考留出充足空间。如果平均上下文长度不超过 32,768 个 token,我们不建议在此场景中启用 YaRN,因为这可能会降低模型性能。"
#: ../../Qwen/source/deployment/vllm.md:259 386a1e853cbb4ec38ea7050f21bdd0d8
msgid "Python Library"
msgstr "Python 库使用"
#: ../../Qwen/source/deployment/vllm.md:261 8ec4de6067e24624807613c89745e894
msgid "vLLM can also be directly used as a Python library, which is convinient for offline batch inference but lack some API-only features, such as parsing model generation to structure messages."
msgstr "vLLM 也可以直接用作 Python 库,这对离线批量推理非常方便,但缺少一些仅限 API 的功能,例如将模型生成解析为结构化消息。"
#: ../../Qwen/source/deployment/vllm.md:263 b3722c7e75bd44e5a6b2c3b7f44fd30f
msgid "The following shows the basic usage of vLLM as a library:"
msgstr "以下展示了将 vLLM 用作库的基本用法:"
#: ../../Qwen/source/deployment/vllm.md:300 534845226be74248bd11a9b93fa153a0
msgid "FAQ"
msgstr "常见问题解答"
#: ../../Qwen/source/deployment/vllm.md:302 d1d72b72ccdc459c85ad8e383207931c
msgid "You may encounter OOM issues that are pretty annoying. We recommend two arguments for you to make some fix."
msgstr "您可能会遇到令人烦恼的OOM(内存溢出)问题。我们推荐您尝试两个参数进行修复。"
#: ../../Qwen/source/deployment/vllm.md:305 e5c77a3ba017433cb6b1a5c7e6015863
msgid "The first one is `--max-model-len`. Our provided default `max_position_embedding` is `40960` and thus the maximum length for the serving is also this value, leading to higher requirements of memory. Reducing it to a proper length for yourself often helps with the OOM issue."
msgstr "第一个参数是 `--max-model-len` 。我们提供的默认最大位置嵌入(`max_position_embedding`)为 40960 ,因此服务时的最大长度也是这个值,这会导致更高的内存需求。将此值适当减小通常有助于解决OOM问题。"
#: ../../Qwen/source/deployment/vllm.md:308 dc513e2855454776a4902cc8381b6c72
msgid "Another argument you can pay attention to is `--gpu-memory-utilization`. vLLM will pre-allocate this much GPU memory. By default, it is `0.9`. This is also why you find a vLLM service always takes so much memory. If you are in eager mode (by default it is not), you can level it up to tackle the OOM problem. Otherwise, CUDA Graphs are used, which will use GPU memory not controlled by vLLM, and you should try lowering it. If it doesn't work, you should try `--enforce-eager`, which may slow down infernece, or reduce the `--max-model-len`."
msgstr "另一个您可以关注的参数是 `--gpu-memory-utilization` 。 vLLM将预分配该参数指定比例的显存。默认情况下,该值为 `0.9`。这也是为什么您发现一个vLLM服务总是占用大量内存的原因。如果你使用了eager模式(默认不是),您可以将其调高以应对OOM问题。反之,vLLM会使用CUDA Graphs,而CUDA Graphs会额外占用不受vLLM管理的显存;此时,您应当尝试降低`--gpu-memory-utilization`。如果还是无法解决,可以尝试`--enforce-eager`(这会影响推理效率)或缩小`--max-model-len`。"
#~ msgid "Installation"
#~ msgstr "安装"
#~ msgid "Offline Batched Inference"
#~ msgstr "离线推理"
#~ msgid "Models supported by Qwen2.5 codes are supported by vLLM. The simplest usage of vLLM is offline batched inference as demonstrated below."
#~ msgstr "Qwen2.5代码支持的模型都被vLLM所支持。 vLLM最简单的使用方式是通过以下演示进行离线批量推理。"
#~ msgid "OpenAI-Compatible API Service"
#~ msgstr "OpenAI兼容的API服务"
#~ msgid "You don't need to worry about chat template as it by default uses the chat template provided by the tokenizer."
#~ msgstr "你无需担心chat模板,因为它默认会使用由tokenizer提供的chat模板。"
#~ msgid "The OpenAI-compatible server in `vllm` comes with [a default set of sampling parameters](https://github.com/vllm-project/vllm/blob/v0.5.2/vllm/entrypoints/openai/protocol.py#L130), which are not suitable for Qwen2.5 models and prone to repetition. We advise you to always pass sampling parameters to the API."
#~ msgstr "`vllm` 中的 OpenAI 兼容服务器使用 [一组默认的采样参数](https://github.com/vllm-project/vllm/blob/v0.5.2/vllm/entrypoints/openai/protocol.py#L130)。这组默认参数并不适用于 Qwen2.5 模型,并可能加重重复问题。我们建议您总是为该API传入合适的采样参数。"
#~ msgid "Tool Use"
#~ msgstr "工具使用"
#~ msgid "Multi-GPU Distributed Serving"
#~ msgstr "多卡分布式部署"
#~ msgid "To scale up your serving throughput, distributed serving helps you by leveraging more GPU devices. Besides, for large models like `Qwen2.5-72B-Instruct`, it is impossible to serve it on a single GPU. Here, we demonstrate how to run `Qwen2.5-72B-Instruct` with tensor parallelism just by passing in the argument `tensor_parallel_size`:"
#~ msgstr "要提高模型的处理吞吐量,分布式服务可以通过利用更多的GPU设备来帮助您。特别是对于像 `Qwen2.5-72B-Instruct` 这样的大模型,单个GPU无法支撑其在线服务。在这里,我们通过演示如何仅通过传入参数 `tensor_parallel_size` ,来使用张量并行来运行 `Qwen2.5-72B-Instruct` 模型:"
#~ msgid "Offline"
#~ msgstr "离线推理"
#~ msgid "API"
#~ msgstr ""
#~ msgid "Extended Context Support"
#~ msgstr "上下文支持扩展"
#~ msgid "vLLM supports YARN and it can be enabled by add a `rope_scaling` field to the `config.json` file of the model. For example,"
#~ msgstr "vLLM 支持 YaRN,并且可以通过在模型的 `config.json` 文件中添加一个 `rope_scaling` 字段来启用它。例如,"
#~ msgid "vLLM supports different types of quantized models, including AWQ, GPTQ, SqueezeLLM, etc. Here we show how to deploy AWQ and GPTQ models. The usage is almost the same as above except for an additional argument for quantization. For example, to run an AWQ model. e.g., `Qwen2.5-7B-Instruct-AWQ`:"
#~ msgstr "vLLM 支持多种类型的量化模型,例如 AWQ、GPTQ、SqueezeLLM 等。这里我们将展示如何部署 AWQ 和 GPTQ 模型。使用方法与上述基本相同,只不过需要额外指定一个量化参数。例如,要运行一个 AWQ 模型,例如 `Qwen2.5-7B-Instruct-AWQ` :"
#~ msgid "or GPTQ models like `Qwen2.5-7B-Instruct-GPTQ-Int4`:"
#~ msgstr "或者是GPTQ模型比如 `Qwen2.5-7B-Instruct-GPTQ-Int4` :"
#~ msgid "Additionally, vLLM supports the combination of AWQ or GPTQ models with KV cache quantization, namely FP8 E5M2 KV Cache. For example,"
#~ msgstr "此外,vLLM支持将AWQ或GPTQ模型与KV缓存量化相结合,即FP8 E5M2 KV Cache方案。例如:"
#~ msgid "Troubleshooting"
#~ msgstr "常见问题"
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/framework/Langchain.rst:2 6f9b66430d9c495592b1e275fdfd7c9e
msgid "Langchain"
msgstr ""
#: ../../Qwen/source/framework/Langchain.rst:5 1205af46f88e4d6681003403109385c3
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"
#: ../../Qwen/source/framework/Langchain.rst:7 115ee7b1c8404629a8f98175264cc114
msgid "This guide helps you build a question-answering application based on a local knowledge base using ``Qwen2.5-7B-Instruct`` with ``langchain``. The goal is to establish a knowledge base Q&A solution."
msgstr "本教程旨在帮助您利用 ``Qwen2.5-7B-Instruct`` 与 ``langchain`` ,基于本地知识库构建问答应用。目标是建立一个知识库问答解决方案。"
#: ../../Qwen/source/framework/Langchain.rst:12
#: 7257b95612fb423bb9ca73212fd12a02
msgid "Basic Usage"
msgstr "基础用法"
#: ../../Qwen/source/framework/Langchain.rst:14
#: fecf7a682dcc4c15a53da1f7cdf145e5
msgid "The implementation process of this project includes loading files -> reading text -> segmenting text -> vectorizing text -> vectorizing questions -> matching the top k most similar text vectors with the question vectors -> incorporating the matched text as context along with the question into the prompt -> submitting to the Qwen2.5-7B-Instruct to generate an answer. Below is an example:"
msgstr "您可以仅使用您的文档配合 ``langchain`` 来构建一个问答应用。该项目的实现流程包括加载文件 -> 阅读文本 -> 文本分段 -> 文本向量化 -> 问题向量化 -> 将最相似的前k个文本向量与问题向量匹配 -> 将匹配的文本作为上下文连同问题一起纳入提示 -> 提交给Qwen2.5-7B-Instruct生成答案。以下是一个示例:"
#: ../../Qwen/source/framework/Langchain.rst:98
#: 6ad1ebd2ef4a49f9aa66cfdf777e1290
msgid "After loading the Qwen2.5-7B-Instruct model, you should specify the txt file for retrieval."
msgstr "加载Qwen2.5-7B-Instruct模型后,您可以指定需要用于知识库问答的txt文件。"
#: ../../Qwen/source/framework/Langchain.rst:274
#: 00467b1e4e294a26b9f49886633331e0
msgid "Next Step"
msgstr "下一步"
#: ../../Qwen/source/framework/Langchain.rst:276
#: 15ed906687054af78545290ba0746380
msgid "Now you can chat with Qwen2.5 use your own document. Continue to read the documentation and try to figure out more advanced usages of model retrieval!"
msgstr "现在,您可以在您自己的文档上与Qwen2.5进行交流。继续阅读文档,尝试探索模型检索的更多高级用法!"
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/framework/LlamaIndex.rst:2
#: 2e41f8696c20488d8593b670c6361edf
msgid "LlamaIndex"
msgstr "LlamaIndex"
#: ../../Qwen/source/framework/LlamaIndex.rst:5
#: 20b3836fd391457bb00bf75b61e23e0d
msgid "To be updated for Qwen3."
msgstr "仍需为Qwen3更新。"
#: ../../Qwen/source/framework/LlamaIndex.rst:7
#: 86d9e6f0684749aab40a9824cd026fa3
msgid "To connect Qwen2.5 with external data, such as documents, web pages, etc., we offer a tutorial on `LlamaIndex <https://www.llamaindex.ai/>`__. This guide helps you quickly implement retrieval-augmented generation (RAG) using LlamaIndex with Qwen2.5."
msgstr "为了实现 Qwen2.5 与外部数据(例如文档、网页等)的连接,我们提供了 `LlamaIndex <https://www.llamaindex.ai/>`__ 的详细教程。本指南旨在帮助用户利用 LlamaIndex 与 Qwen2.5 快速部署检索增强生成(RAG)技术。"
#: ../../Qwen/source/framework/LlamaIndex.rst:11
#: 71ed222858054687a5b33222bb6ac086
msgid "Preparation"
msgstr "环境准备"
#: ../../Qwen/source/framework/LlamaIndex.rst:13
#: 161d9153d6484dd5a1f1bdb340847814
msgid "To implement RAG, we advise you to install the LlamaIndex-related packages first."
msgstr "为实现检索增强生成(RAG),我们建议您首先安装与 LlamaIndex 相关的软件包。"
#: ../../Qwen/source/framework/LlamaIndex.rst:16
#: a8d6acb1001a42c88185b971ae2de3bf
msgid "The following is a simple code snippet showing how to do this:"
msgstr "以下是一个简单的代码示例:"
#: ../../Qwen/source/framework/LlamaIndex.rst:25
#: e441d3b8fb6d4a13b52e1560ef250b16
msgid "Set Parameters"
msgstr "设置参数"
#: ../../Qwen/source/framework/LlamaIndex.rst:27
#: c2481804c3f34c7f883eed92ffa3111e
msgid "Now we can set up LLM, embedding model, and the related configurations. Qwen2.5-Instruct supports conversations in multiple languages, including English and Chinese. You can use the ``bge-base-en-v1.5`` model to retrieve from English documents, and you can download the ``bge-base-zh-v1.5`` model to retrieve from Chinese documents. You can also choose ``bge-large`` or ``bge-small`` as the embedding model or modify the context window size or text chunk size depending on your computing resources. Qwen2.5 model families support a maximum of 32K context window size (up to 128K for 7B, 14B, 32B, and 72B, requiring extra configuration)"
msgstr "现在,我们可以设置语言模型和向量模型。Qwen2.5-Instruct支持包括英语和中文在内的多种语言对话。您可以使用 ``bge-base-en-v1.5`` 模型来检索英文文档,下载 ``bge-base-zh-v1.5`` 模型以检索中文文档。根据您的计算资源,您还可以选择 ``bge-large`` 或 ``bge-small`` 作为向量模型,或调整上下文窗口大小或文本块大小。Qwen2.5模型系列支持最大32K上下文窗口大小(7B 、14B 、32B 及 72B可扩展支持 128K 上下文,但需要额外配置)"
#: ../../Qwen/source/framework/LlamaIndex.rst:85
#: 74c35d5a03734c289d162dfa3813ada6
msgid "Build Index"
msgstr "构建索引"
#: ../../Qwen/source/framework/LlamaIndex.rst:87
#: c49859d4ea5f49dba1fa2263f3ae284d
msgid "Now we can build index from documents or websites."
msgstr "现在我们可以从文档或网站构建索引。"
#: ../../Qwen/source/framework/LlamaIndex.rst:89
#: b460d000037e4266a4d9f43d38f1f9b0
msgid "The following code snippet demonstrates how to build an index for files (regardless of whether they are in PDF or TXT format) in a local folder named 'document'."
msgstr "以下代码片段展示了如何为本地名为'document'的文件夹中的文件(无论是PDF格式还是TXT格式)构建索引。"
#: ../../Qwen/source/framework/LlamaIndex.rst:102
#: a416d18b227940e29fac1f59851ff8c4
msgid "The following code snippet demonstrates how to build an index for the content in a list of websites."
msgstr "以下代码片段展示了如何为一系列网站的内容构建索引。"
#: ../../Qwen/source/framework/LlamaIndex.rst:118
#: 487cf928d048424fa1b50438f701137c
msgid "To save and load the index, you can use the following code snippet."
msgstr "要保存和加载已构建的索引,您可以使用以下代码示例。"
#: ../../Qwen/source/framework/LlamaIndex.rst:132
#: c68419c4318d46e891f5df9191be6d2d
msgid "RAG"
msgstr "检索增强(RAG)"
#: ../../Qwen/source/framework/LlamaIndex.rst:134
#: 8ad20a8f43fe496084a40f963ba97440
msgid "Now you can perform queries, and Qwen2.5 will answer based on the content of the indexed documents."
msgstr "现在您可以输入查询,Qwen2.5 将基于索引文档的内容提供答案。"
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2024, Qwen Team
# This file is distributed under the same license as the Qwen package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
msgid ""
msgstr ""
"Project-Id-Version: Qwen \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-04-28 19:42+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"
#: ../../Qwen/source/framework/function_call.md:6
#: 9beab99bf6ea4ebaa37d53ed4100b34d
msgid "Function Calling"
msgstr "函数调用"
#: ../../Qwen/source/framework/function_call.md:9
#: 68bbe10408334355bc375ced535d2192
msgid "To be updated for Qwen3. Since the support for tool calling in Qwen3 is a superset of that in Qwen2, the examples would still work."
msgstr "即将更新以适配 Qwen3。由于 Qwen3 对工具调用的支持是 Qwen2 的超集,因此这些示例仍然适用。"
#: ../../Qwen/source/framework/function_call.md:13
#: e365d2d9a6d0456f8c7eacd41676a9bb
msgid "Preface"
msgstr "前言"
#: ../../Qwen/source/framework/function_call.md:15
#: 098c449c10fc4d7a9d2b43607428bc4b
msgid "Function calling with large language models is a huge and evolving topic. It is particularly important for AI applications:"
msgstr "使用大型语言模型进行函数调用 (Function Calling) 是一个庞大且不断发展的主题。这对AI应用尤为重要:"
#: ../../Qwen/source/framework/function_call.md:17
#: 1374bf593fd547d2abe3bd539785fd93
msgid "either for AI-native applications that strive to work around the shortcomings of current AI technology,"
msgstr "无论是为了绕过当前AI技术的局限性,而设计的原生AI应用,"
#: ../../Qwen/source/framework/function_call.md:18
#: 49bfba2b48c344aab7704fd297e46075
msgid "or for existing applications that seeks the integration of AI technology to improve performance, user interaction and experience, or efficiency."
msgstr "还是为了提升性能、用户体验或效率,寻求整合AI技术的现有应用。"
#: ../../Qwen/source/framework/function_call.md:20
#: 2d90524d75e84021857692fad0253dfd
msgid "This guide will not delve into those discussions or which role an LLM should play in an application and the related best practice. Those views are reflected in the design of AI application frameworks: from LangChain to LlamaIndex to QwenAgent."
msgstr "本指南不会深入讨论LLM在应用中应扮演的角色及相关的最佳实践。这些观点反映在AI应用框架的设计上:从LangChain到LlamaIndex再到QwenAgent。"
#: ../../Qwen/source/framework/function_call.md:23
#: eb136bb2cdd845d580dca33855b8926c
msgid "Instead, we will talk about how Qwen2.5 can be used to support function calling and how it can be used to achieve your goals, from the inference usage for developing application to the inner workings for hardcore customizations. In this guide,"
msgstr "相反,我们将讨论如何使用Qwen2.5来支持函数调用,以及如何利用它实现你的目标,从开发应用时的推理用途,到硬核定制的内部运作。在这个指南中,"
#: ../../Qwen/source/framework/function_call.md:25
#: f354efacac02496799de32fa0f819a20
msgid "We will first demonstrate how to use function calling with Qwen2.5."
msgstr "我们首先将展示如何使用Qwen2.5进行函数调用。"
#: ../../Qwen/source/framework/function_call.md:26
#: 0f77fb1c6a63449a924575764b204612
msgid "Then, we will introduce the technical details on functional calling with Qwen2.5, which are mainly about the templates."
msgstr "接着,我们将介绍使用Qwen2.5进行函数调用的技术细节,主要涉及模板的使用。"
#: ../../Qwen/source/framework/function_call.md:28
#: 174e8cb35a594a06aac5d8fe3f0f96ba
msgid "Before starting, there is one thing we have not yet introduced, that is ..."
msgstr "在开始之前,还有一件事我们尚未介绍,那就是…"
#: ../../Qwen/source/framework/function_call.md:30
#: 61295ff5189b4887aded9a3dd4c87b3a
msgid "What is function calling?"
msgstr "什么是函数调用?"
#: ../../Qwen/source/framework/function_call.md:33
#: 8256bec4b06b401db4f1d6af9e169a1e
msgid "There is another term \"tool use\" that may be used to refer to the same concept. While some may argue that tools are a generalized form of functions, at present, their difference exists only technically as different I/O types of programming interfaces."
msgstr "这一概念也可能被称为“工具使用” (\"tool use\")。虽然有人认为“工具”是“函数”的泛化形式,但在当前,它们的区别仅在技术层面上,表现为编程接口的不同输入输出类型。"
#: ../../Qwen/source/framework/function_call.md:37
#: 2ac692a7f3fc4c1182ba5ca670e2569b
msgid "Large language models (LLMs) are powerful things. However, sometimes LLMs by themselves are simply not capable enough."
msgstr "大型语言模型(LLMs)确实强大。然而,有时候单靠大型语言模型的能力还是不够的。"
#: ../../Qwen/source/framework/function_call.md:39
#: f41173c244994701a410d58790e8d053
msgid "On the one hand, LLMs have inherent modeling limitations. For one, they do not know things that are not in their training data, which include those happened after their training ended. In addition, they learn things in the way of likelihood, which suggests that they may not be precise enough for tasks with fixed rule sets, e.g., mathematical computation."
msgstr "一方面,大型语言模型存在建模局限性。首先,对于训练数据中没有的信息,包括训练结束后发生的事情,它们并不了解。此外,它们通过概率方式学习,这意味着对于有固定规则集的任务,如数学计算,可能不够精确。"
#: ../../Qwen/source/framework/function_call.md:42
#: 3ab73b4ca14e40bca40e5a657f284f78
msgid "On the other hand, it is not easy to use LLMs as a Plug-and-Play service programmatically with other things. LLMs mostly talk in words that are open to interpretation and thus ambiguous, while other software or applications or systems talk in code and through programming interfaces that are pre-defined and fixed and structured."
msgstr "另一方面,将大型语言模型作为即插即用服务与其它系统进行编程式协作,并非易事。大型语言模型的表达多含主观解释成分,因而产生歧义;而其他软件、应用或系统则通过预定义、固定和结构化的代码及编程接口进行沟通。"
#: ../../Qwen/source/framework/function_call.md:45
#: f65c57ab8f254bff9a7c281260f5e6c7
msgid "To this end, function calling establishes a common protocol that specifies how LLMs should interact with the other things. The procedure is mainly as follows:"
msgstr "为此,函数调用确立了一个通用协议,规定了大型语言模型应与其他实体互动的流程。主要流程如下:"
#: ../../Qwen/source/framework/function_call.md:47
#: f246814de8eb454983996810f4dbe082
msgid "The application provides a set of functions and the instructions of the functions to an LLM."
msgstr "应用程序向大型语言模型提供一组函数及其使用说明。"
#: ../../Qwen/source/framework/function_call.md:48
#: c520064a05924ef2923ebd712a2c8e52
msgid "The LLM choose to or not to, or is forced to use one or many of the functions, in response to user queries."
msgstr "大型语言模型根据用户查询,选择使用或不使用,或被迫使用一个或多个函数。"
#: ../../Qwen/source/framework/function_call.md:49
#: 6f5467688c8c474b89580a7d53f718a1
msgid "If the LLM chooses to use the functions, it states how the functions should be used based on the function instructions."
msgstr "如果大型语言模型选择使用这些函数,它会根据函数说明如何使用。"
#: ../../Qwen/source/framework/function_call.md:50
#: 0054606454134e82b07427591564cac4
msgid "The chosen functions are used as such by the application and the results are obtained, which are then given to the LLM if further interaction is needed."
msgstr "应用程序按照选择使用这些函数,并获取结果。如果需要进一步互动,结果将提供给大型语言模型。"
#: ../../Qwen/source/framework/function_call.md:52
#: 143abc83c56042d09ab8440a8a91b0dd
msgid "They are many ways for LLMs to understand and follow this protocol. As always, the key is prompt engineering or an internalized template known by the model. Qwen2.5 were pre-trained with various types of templates that could support function calling, so that users can directly make use of this procedure."
msgstr "大型语言模型理解并遵循此协议有多种方式。关键在于提示工程 (Prompt Engineering) 或模型内化的模板。Qwen2预先训练了多种支持函数调用的模板,以便用户可以直接利用这一过程。"
#: ../../Qwen/source/framework/function_call.md:57
#: e469caad6ca54deb8a8d34249f3b35cd
msgid "Inference with Function Calling"
msgstr "使用函数调用进行推理"
#: ../../Qwen/source/framework/function_call.md:60
#: ded933602fd14966978b243ad3046976
msgid "Please be aware that the inference usage is subject to change as the frameworks and the Qwen models evolve."
msgstr "请注意,随着框架和Qwen模型的不断演进,推理的使用方式可能会发生变化。"
#: ../../Qwen/source/framework/function_call.md:63
#: ab264772fc0e496693304ab0e1b77f31
msgid "As function calling is essentially implemented using prompt engineering, you could manually construct the model inputs for Qwen2 models. However, frameworks with function calling support can help you with all that laborious work."
msgstr "由于函数调用本质上是通过提示工程实现的,您可以手动构建Qwen2模型的输入。但是,支持函数调用的框架可以帮助您完成所有繁重的工作。"
#: ../../Qwen/source/framework/function_call.md:66
#: 8b53faa722604d40b8e6a16679742736
msgid "In the following, we will introduce the usage (via dedicated function calling chat template) with"
msgstr "接下来,我们将介绍(通过专用的函数调用模板)使用"
#: ../../Qwen/source/framework/function_call.md:67
#: 0375c440ccf04d778f992d6d0a4cbe88
msgid "**Qwen-Agent**,"
msgstr "**Qwen-Agent**,"
#: ../../Qwen/source/framework/function_call.md:68
#: 2ddfd6f2c18048fdbfd5c6ef4e9c15eb
msgid "**Hugging Face transformers**,"
msgstr "**Hugging Face transformers**,"
#: ../../Qwen/source/framework/function_call.md:69
#: f5a16e12451744558b5f0aa1e830a158
msgid "**Ollama**, and"
msgstr "**Ollama**,和"
#: ../../Qwen/source/framework/function_call.md:70
#: 9ab26b9373194451abc76749fefdb6d4
msgid "**vLLM**."
msgstr "**vLLM**。"
#: ../../Qwen/source/framework/function_call.md:72
#: 6157f3d3533b41a0821a42366eab0623
msgid "If you are familiar with the usage of OpenAI API, you could also directly use the OpenAI-compatible API services for Qwen2.5. However, not all of them support function calling for Qwen2.5. Currently, supported solutions include the self-hosted service by [Ollama](https://github.com/ollama/ollama/blob/main/docs/openai.md) or [vLLM](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api) and the cloud service of [ModelStudio \\[zh\\]](https://help.aliyun.com/zh/model-studio/developer-reference/compatibility-of-openai-with-dashscope#97e2b45391x08)."
msgstr "如果您熟悉OpenAI API的使用,您也可以直接使用适用于Qwen2.5的OpenAI兼容API服务。然而,并非所有服务都支持Qwen2.5的函数调用。目前,支持的解决方案包括由[Ollama](https//github.com/ollama/ollama/blob/main/docs/openai.md)或[vLLM](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api)提供的自托管服务和[阿里云百炼](https://help.aliyun.com/zh/model-studio/developer-reference/compatibility-of-openai-with-dashscope#97e2b45391x08)的云服务。"
#: ../../Qwen/source/framework/function_call.md:76
#: 425c7b0191fa48a6bfd57ecfd1764041
msgid "If you are familiar with application frameworks, e.g., LangChain, you can also use function calling abilities in Qwen2.5 via ReAct Prompting."
msgstr "如果您熟悉应用框架,例如LangChain,您也可以通过ReAct Prompting在Qwen2.5中使用函数调用功能。"
#: ../../Qwen/source/framework/function_call.md:78
#: 508b40fa17d24550a46415881e7db25b
msgid "The Example Case"
msgstr "案例"
#: ../../Qwen/source/framework/function_call.md:80
#: 0b47cd45a43345a79f6746c6955ca28f
msgid "Let's also use an example to demonstrate the inference usage. We assume **Python 3.11** is used as the programming language."
msgstr "我们同样通过一个示例来展示推理的使用方法。假设我们使用的编程语言是**Python 3.11**。"
#: ../../Qwen/source/framework/function_call.md:83
#: ff88f7b842024062bcff856d0e1e4771
msgid "**Scenario**: Suppose we would like to ask the model about the temperature of a location. Normally, the model would reply that it cannot provide real-time information. But we have two tools that can be used to obtain the current temperature of and the temperature at a given date of a city respectively, and we would like the model to make use of them."
msgstr "**场景**:假设我们要询问模型某个地点的温度。通常,模型会回答无法提供实时信息。但我们有两个工具,可以分别获取城市的当前温度和指定日期的温度,我们希望模型能够利用这些工具。"
#: ../../Qwen/source/framework/function_call.md:87
#: 9e478a1634bb499c88442ed6a8dcddc8
msgid "To set up the example case, you can use the following code:"
msgstr "为了这个示例案例,您可以使用以下代码:"
#: ../../Qwen/source/framework/function_call.md
#: aebb484e1452437eb26717c460a1f7ea
msgid "Preparation Code"
msgstr "准备代码"
#: ../../Qwen/source/framework/function_call.md:194
#: c6066c74ceab4ca4b2a15a5d4137c22a
msgid "In particular, the tools should be described using JSON Schema and the messages should contain as much available information as possible. You can find the explanations of the tools and messages below:"
msgstr "工具应使用JSON Schema进行描述,消息应包含尽可能多的有效信息。您可以在下面找到工具和消息的解释:"
#: ../../Qwen/source/framework/function_call.md
#: 9a22b936b9894fa6be0cc492c64abb63
msgid "Example Tools"
msgstr "示例工具"
#: ../../Qwen/source/framework/function_call.md:199
#: 596b3ce24b164e23a09a375ac55ada45
msgid "The tools should be described using the following JSON:"
msgstr "工具应使用以下JSON进行描述:"
#: ../../Qwen/source/framework/function_call.md:263
#: 3be94377bbd94229b3da3cf040818ab9
msgid "For each **tool**, it is a JSON object with two fields:"
msgstr "对于每个**工具**,它是一个具有两个字段的JSON object:"
#: ../../Qwen/source/framework/function_call.md:264
#: 8f3b20c0c2494d51ae6afe6a75662540
msgid "`type`: a string specifying the type of the tool, currently only `\"function\"` is valid"
msgstr "`type`:string,用于指定工具类型,目前仅`\"function\"`有效"
#: ../../Qwen/source/framework/function_call.md:265
#: 23a80c3dfb31433598154f6d75e5fa67
msgid "`function`: an object detailing the instructions to use the function"
msgstr "`function`:object,详细说明了如何使用该函数"
#: ../../Qwen/source/framework/function_call.md:267
#: 25d6764f41c14cd5a54ea390f5fa746d
msgid "For each **function**, it is a JSON object with three fields:"
msgstr "对于每个**function**,它是一个具有三个字段的JSON object:"
#: ../../Qwen/source/framework/function_call.md:268
#: cd048320507042129336abe75fa962e7
msgid "`name`: a string indicating the name of the function"
msgstr "`name`:string 表示函数名称"
#: ../../Qwen/source/framework/function_call.md:269
#: 20186ca105aa4914a4ed6a9821a80555
msgid "`description`: a string describing what the function is used for"
msgstr "`description`:string 描述函数用途"
#: ../../Qwen/source/framework/function_call.md:270
#: 93a5888532534627afafdb4a2ed8d2be
msgid "`parameters`: [a JSON Schema](https://json-schema.org/learn/getting-started-step-by-step) that specifies the parameters the function accepts. Please refer to the linked documentation for how to compose a JSON Schema. Notable fields include `type`, `required`, and `enum`."
msgstr "`parameters`:[JSON Schema](https://json-schema.org/learn/getting-started-step-by-step),用于指定函数接受的参数。请参阅链接文档以了解如何构建JSON Schema。值得注意的字段包括`type`、`required`和`enum`。"
#: ../../Qwen/source/framework/function_call.md:272
#: 0f8d5fa5bf764e888441b5fa445c98ae
msgid "Most frameworks use the tool format and some may use the function format. Which one to use should be obvious according to the naming."
msgstr "大多数框架使用“工具”格式,有些可能使用“函数”格式。根据命名,应该很明显应该使用哪一个。"
#: ../../Qwen/source/framework/function_call.md
#: f15fa96fb4b245b28fd544a5c4a74958
msgid "Example Messages"
msgstr "示例消息"
#: ../../Qwen/source/framework/function_call.md:279
#: 906e53e3f10542819f8506c780957196
msgid "Our query is `What's the temperature in San Francisco now? How about tomorrow?`. Since the model does not know what the current date is, let alone tomorrow, we should provide the date in the inputs. Here, we decide to supply that information in the system message after the default system message `You are Qwen, created by Alibaba Cloud. You are a helpful assistant.`. You could append the date to user message in your application code."
msgstr "我们的查询是`What's the temperature in San Francisco now? How about tomorrow?`。由于模型不知道当前日期,更不用说明天了,我们应该在输入中提供日期。在这里,我们决定在默认系统消息`You are Qwen, created by Alibaba Cloud. You are a helpful assistant.`之后的系统消息中提供该信息。您可以在应用程序代码中将日期附加到用户消息。"
#: ../../Qwen/source/framework/function_call.md:292
#: ../../Qwen/source/framework/function_call.md:555
#: 16b171b36c9f46fea2b30a3b0491db55 ce3d6dc46c5b420484ad78a89e492b1e
msgid "Qwen-Agent"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:294
#: 6ef16c1b08664d7bb94253e4726d1ad9
msgid "[Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) is actually a Python Agent framework for developing AI applications. Although its intended use cases are higher-level than efficient inference, it does contain the **canonical implementation** of function calling for Qwen2.5. It provides the function calling ability for Qwen2.5 to an OpenAI-compatible API through templates that is transparent to users."
msgstr "[Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) 实际上是一个用于开发AI应用的Python智能体框架。尽管其设计用例比高效推理更高级,但它确实包含了Qwen2.5函数调用的**规范实现**。基于OpenAI兼容API,它可以通过模板为Qwen2.5提供了对用户透明的的函数调用能力。"
#: ../../Qwen/source/framework/function_call.md:299
#: 978b5270e15749269a680fbae4a05ab2
msgid "It's worth noting that since a lot of stuff can be done under the scene with application frameworks, currently the official function calling implementation for Qwen2.5 is very flexible and beyond simple templating, making it hard to adapt it other frameworks that use less capable templating engines."
msgstr "值得注意的是,由于应用框架可以在幕后完成大量工作,目前Qwen2.5官方的函数调用实现非常灵活且超出了简单的模板化,这使得它难以适应那些使用能力较弱的模板引擎的其他框架。"
#: ../../Qwen/source/framework/function_call.md:301
#: 8108cc93576b427f86c033cf6847c59a
msgid "Before starting, let's make sure the latest library is installed:"
msgstr "在开始之前,让我们确保已安装了最新的库:"
#: ../../Qwen/source/framework/function_call.md:306
#: baeca47856fe4f8e9f105b6d1678c648
msgid "For this guide, we are at version v0.0.10."
msgstr "对于本指南,我们处于版本v0.0.10。"
#: ../../Qwen/source/framework/function_call.md:308
#: ../../Qwen/source/framework/function_call.md:454
#: ../../Qwen/source/framework/function_call.md:670
#: ../../Qwen/source/framework/function_call.md:782
#: b9ce56b7e4544aaba3812335006de981 e3b06686bcbd4cd6bc99ee238def55ea
#: ecfd39b570fb4fd0b405abccefe91de2 f111d6e7f89e498798711aad21fe8dd4
msgid "Preparing"
msgstr "准备工作"
#: ../../Qwen/source/framework/function_call.md:310
#: 45fecd8307254305a2018726f8adb3ac
msgid "Qwen-Agent can wrap an OpenAI-compatible API that does not support function calling. You can serve such an API with most inference frameworks or obtain one from cloud providers like DashScope or Together."
msgstr "Qwen-Agent可以封装一个不支持函数调用的OpenAI兼容API。您可以使用大多数推理框架来提供此类API,或者从DashScope或Together等云提供商处获取一个。"
#: ../../Qwen/source/framework/function_call.md:313
#: 615ebcb0c6cb437f9723d5b4119800a5
msgid "Assuming there is an OpenAI-compatible API at `http://localhost:8000/v1`, Qwen-Agent provides a shortcut function `get_chat_model` to obtain a model inference class with function calling support:"
msgstr "假设在`http://localhost:8000/v1`处有一个OpenAI兼容API,Qwen-Agent提供了一个快捷函数`get_chat_model`,用于获取具有函数调用支持的模型推理类:"
#: ../../Qwen/source/framework/function_call.md:325
#: 783f6b93cc164dfaac23aa4ac20e2c2a
msgid "In the above, `model_server` is the `api_base` common used in other OpenAI-compatible API clients. It is advised to provide the `api_key` (but not via plaintext in the code), even if the API server does not check it, in which case, you can set it to anything."
msgstr "在上述代码中,`model_server`是其他OpenAI兼容API客户端常用的`api_base`。建议您提供`api_key`(但不要以明文形式出现在代码中),即使API服务器不检查它,在这种情况下,您可以将其设置为任何值。"
#: ../../Qwen/source/framework/function_call.md:328
#: 0231335d788a4ff08ba062da648afc94
msgid "For model inputs, the common message structure for system, user, and assistant history should be used:"
msgstr "对于模型输入,应使用系统、用户和助手历史记录的通用消息结构:"
#: ../../Qwen/source/framework/function_call.md:338
#: fe2abed2ecda45a88a0a819123624376
msgid "We add the current date to the system message so that the \"tomorrow\" in the user message is anchored. It can also be added to the user message if one desires."
msgstr "我们在系统消息中添加当前日期,以便使用户消息中的\"明天\"有明确的参照点。如果需要,也可以将其添加到用户消息中。"
#: ../../Qwen/source/framework/function_call.md:341
#: 89c4992bdf324623b53ebe1a54191a99
msgid "At the time, Qwen-Agent works with functions instead of tools. This requires a small change to our tool descriptions, that is, extracting the function fields:"
msgstr "目前,Qwen-Agent使用“函数”而非“工具”。这需要对我们工具描述进行一些小的更改,即提取函数字段:"
#: ../../Qwen/source/framework/function_call.md:348
#: ../../Qwen/source/framework/function_call.md:495
#: ../../Qwen/source/framework/function_call.md:684
#: ../../Qwen/source/framework/function_call.md:813
#: 17daa4342c054a8e9b72169a6ebf49a1 67ef3d2ae00b4b4c9153a10047ca2522
#: 75839b0a245e4704834a7d8b12c5d2b9 7dd2c5cc0b404552afb2dbbfe1532cda
msgid "Tool Calls and Tool Results"
msgstr "工具调用和工具结果"
#: ../../Qwen/source/framework/function_call.md:350
#: b271cc28f0ff4422bb5f1984363e4730
msgid "To interact with the model, the `chat` method should be used:"
msgstr "为了与模型交互,应使用`chat`方法:"
#: ../../Qwen/source/framework/function_call.md:362
#: c3ea39a48e3d4890a006d4cedb109064
msgid "In the above code, the `chat` method receives the `messages`, the `functions`, and an `extra_generate_cfg` parameter. You can put sampling parameters, such as `temperature`, and `top_p`, in the `extra_generate_cfg`. Here, we add to it a special control `parallel_function_calls` provided by Qwen-Agent. As its name suggests, it will enable parallel function calls, which means that the model may generate multiple function calls for a single turn as it deems fit."
msgstr "在上述代码中,`chat`方法接收`messages`、`functions`以及一个`extra_generate_cfg`参数。你可以在`extra_generate_cfg`中放入诸如`temperature`和`top_p`等采样参数。这里,我们添加了Qwen-Agent提供的特殊控制`parallel_function_calls`。顾名思义,它将启用并行函数调用,这意味着模型可能为单次请求生成多个函数调用,按照其判断进行。"
#: ../../Qwen/source/framework/function_call.md:367
#: 717eca19455f452daaccc3626dd93ac9
msgid "The `chat` method returns a generator of list, each of which may contain multiple messages. Since we enable `parallel_function_calls`, we should get two messages in the responses:"
msgstr "`chat`方法返回一个列表的生成器,每个列表可能包含多条消息。因为我们启用了`parallel_function_calls`,我们应该在响应中得到两条消息:"
#: ../../Qwen/source/framework/function_call.md:377
#: 67f5fab10f914b37bf0ae28e5fb4a271
msgid "As we can see, Qwen-Agent attempts to parse the model generation in an easier to use structural format. The details related to function calls are placed in the `function_call` field of the messages:"
msgstr "我们可以看到,Qwen-Agent试图以更易于使用的结构化格式解析模型生成。与函数调用相关的详细信息被放置在消息的`function_call`字段中:"
#: ../../Qwen/source/framework/function_call.md:379
#: 29c4b62805bc4e2ab7a914c591e0c9da
msgid "`name`: a string representing the function to call"
msgstr "`name`:代表要调用的函数的字符串"
#: ../../Qwen/source/framework/function_call.md:380
#: 36396d04bf1e412380a4c03c286e263a
msgid "`arguments`: a JSON-formatted string representing the arguments the function should be called with"
msgstr "`arguments`:表示函数应带有的参数的JSON格式字符串"
#: ../../Qwen/source/framework/function_call.md:382
#: 4630f89282b5414ea41d1ef19cd21b6b
msgid "Note that Qwen2.5-7B-Instruct is quite capable:"
msgstr "请注意,Qwen2.5-7B-Instruct相当强大:"
#: ../../Qwen/source/framework/function_call.md:383
#: 74c060396f974d09aa25053dd1a3d401
msgid "It has followed the function instructions to add the state and the country to the location."
msgstr "它遵循函数指令,在位置中添加了州和国家。"
#: ../../Qwen/source/framework/function_call.md:384
#: b0ee4de6a3844f2dbd728c6957b033d2
msgid "It has correctly induced the date of tomorrow and given in the format required by the function."
msgstr "它正确地推断出明天的日期,并以函数要求的格式给出。"
#: ../../Qwen/source/framework/function_call.md:386
#: 44c59fa19c3a40e2a9b06c24245c1801
msgid "Then comes the critical part -- checking and applying the function call:"
msgstr "接下来是关键部分——检查和应用函数调用:"
#: ../../Qwen/source/framework/function_call.md:402
#: fca90539a17c459b960daf68d509970f
msgid "To get tool results:"
msgstr "获取工具结果:"
#: ../../Qwen/source/framework/function_call.md:403
#: 0d88f729b9474ab0a760f004a7594752
msgid "line 1: We should iterate the function calls in the order the model generates them."
msgstr "第1行:我们应该按模型生成它们的顺序迭代函数调用。"
#: ../../Qwen/source/framework/function_call.md:404
#: 6e061ac2d81f46a2b93830206c2841e4
msgid "line 2: We can check if a function call is needed as deemed by the model by checking the `function_call` field of the generated messages."
msgstr "第2行:通过检查生成消息的`function_call`字段,我们可以查看是否需要按模型判断进行函数调用。"
#: ../../Qwen/source/framework/function_call.md:405
#: b400b27066094316b28b0f5742a080a9
msgid "line 3-4: The related details including the name and the arguments of the function can also be found there, which are `name` and `arguments` respectively."
msgstr "第3-4行:相关详情,包括函数名称和参数,也可以在那里找到,分别是`name`和`arguments`。"
#: ../../Qwen/source/framework/function_call.md:406
#: 9d38560976b0456fa38de4b87007cf27
msgid "line 6: With the details, one should call the function and obtain the results. Here, we assume there is a function named [`get_function_by_name`](#prepcode) to help us get the related function by its name."
msgstr "第6行:有了这些细节,应该调用函数并获取结果。这里,我们假设有一个名为[`get_function_by_name`](#prepcode)的函数来帮助我们根据名称获取相关函数。"
#: ../../Qwen/source/framework/function_call.md:408
#: 4c335071c730449faf0eed0159d525c4
msgid "line 8-12: With the result obtained, add the function result to the messages as `content` and with `role` as `\"function\"`."
msgstr "第8-12行:获得结果后,将函数结果作为`content`添加到消息中,并将`role`设置为`\"function\"`。"
#: ../../Qwen/source/framework/function_call.md:410
#: 0b71ee2ed93342018509aaafad0701d4
msgid "Now the messages are"
msgstr "现在消息是"
#: ../../Qwen/source/framework/function_call.md:422
#: ../../Qwen/source/framework/function_call.md:624
#: ../../Qwen/source/framework/function_call.md:750
#: ../../Qwen/source/framework/function_call.md:900
#: 5e6a5cb155d74d94b1adf0278bb896a1 738554523dce4f6394a9aaf5ffd935f6
#: 7b7eaed481b344b2b581aed31cafbb67 c9afee6dee944cb393d302c297b13b27
msgid "Final Response"
msgstr "最终响应"
#: ../../Qwen/source/framework/function_call.md:424
#: 8b6ac2c7a95747e1bbc86cbc445709d0
msgid "Finally, run the model again to get the final model results:"
msgstr "最后,再次运行模型以获取最终的模型结果:"
#: ../../Qwen/source/framework/function_call.md:432
#: 6ce8ec5e4d0049dab95f830e46c6dcea
msgid "The final response should be like"
msgstr "最终响应应如下所示"
#: ../../Qwen/source/framework/function_call.md:438
#: ../../Qwen/source/framework/function_call.md:555
#: 8c8a902a7d3a40e3b8b6304e3cfd60aa d73648ac333743cd8e21e14eae3db734
msgid "Hugging Face transformers"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:440
#: f5acbc281bcb44efa53d6818fa72f5d7
msgid "Since function calling is based on prompt engineering and templates, `transformers` supports it with its tokenizer utilities, in particular, the `tokenizer.apply_chat_template` method, which hides the sophistication of constructing the model inputs, using the Jinja templating engine. However, it means that users should handle the model output part on their own, which includes parsing the generated function call message."
msgstr "由于函数调用基于提示工程和模板,`transformers`通过其tokenizer工具支持这一功能,特别是`tokenizer.apply_chat_template`方法,它利用Jinja模板引擎隐藏了构建模型输入的复杂性。然而,这意味着用户需要自行处理模型输出部分,包括解析生成的函数调用消息。"
#: ../../Qwen/source/framework/function_call.md:443
#: 43387eb2f7c74c0980b05b258aec9491
msgid "The blog piece [_Tool Use, Unified_](https://huggingface.co/blog/unified-tool-use) is very helpful in understanding its design. Be sure to take a look."
msgstr "博客文章[_Tool Use, Unified_](https://huggingface.co/blog/unified-tool-use)对于理解其设计非常有帮助。务必阅读一下。"
#: ../../Qwen/source/framework/function_call.md:446
#: 09e1372474d8426dac3a9241c06eda5c
msgid "Tool use API is available in transformers since v4.42.0. Before starting, let's check that:"
msgstr "自v4.42.0版本起,transformers中提供了工具使用API。在开始之前,让我们确认这一点:"
#: ../../Qwen/source/framework/function_call.md:452
#: 322f241f77b040048a084f8cd76c2e0a
msgid "For this guide, we are at version v4.44.2."
msgstr "对于本指南,我们处于v4.44.2版本。"
#: ../../Qwen/source/framework/function_call.md:456
#: 1b1b289980df4b6a9cb4bf1c29d4272d
msgid "For Qwen2.5, the chat template in `tokenizer_config.json` has already included support for the Hermes-style tool use. We simply need to load the model and the tokenizer:"
msgstr "对于 Qwen2.5,`tokenizer_config.json` 中的聊天模板已经包含了对 Hermes 风格工具调用的支持。我们只需加载模型和分词器:"
#: ../../Qwen/source/framework/function_call.md:472
#: ../../Qwen/source/framework/function_call.md:674
#: ../../Qwen/source/framework/function_call.md:790
#: 6541fc7c9b774e69b2b0b97a4c491459 888996a83df34b91b30b1355ddfc3494
#: ea933584017e47cfb87ceff594f54c9c
msgid "The inputs are the same with those in [the preparation code](#prepcode):"
msgstr "输入与[准备代码](#prepcode)中的相同:"
#: ../../Qwen/source/framework/function_call.md:479
#: d7ab8f1b743b41038810a7d223c8ffc9
msgid "In `transformers`, you can also directly use Python functions as tools with certain constraints[^get_json_schema_note]:"
msgstr "在`transformers`中,您也可以直接将Python函数作为工具使用,但需遵循特定约束[^get_json_schema_note]:"
#: ../../Qwen/source/framework/function_call.md:497
#: a5231c97237249739c5d73db49695b05
msgid "To construct the input sequence, we should use the `apply_chat_template` method and then let the model continue the texts:"
msgstr "为了构造输入序列,我们应该使用`apply_chat_template`方法,然后让模型继续生成文本:"
#: ../../Qwen/source/framework/function_call.md:506
#: cb19105f6e464ad1a3eee0c9fe907bb1
msgid "The output texts should be like"
msgstr "输出文本应如下所示:"
#: ../../Qwen/source/framework/function_call.md:516
#: 0300726829e542d9942f51a2772206ff
msgid "Now we need to do two things:"
msgstr "现在我们需要做两件事:"
#: ../../Qwen/source/framework/function_call.md:517
#: a0c8a86da0524084b636946b4cfeaf87
msgid "Parse the generated tool calls to a message and add them to the messages, so that the model knows which tools are used."
msgstr "解析生成的工具调用为一条消息,并将其添加到消息列表中,以便模型了解所使用的工具。"
#: ../../Qwen/source/framework/function_call.md:518
#: 90ccd935a4554dcbb536a444cd96592d
msgid "Obtain the results of the tools and add them to the messages, so that the model knows the results of the tool calls."
msgstr "获取工具的结果并将其添加到消息列表中,以便模型了解工具调用的结果。"
#: ../../Qwen/source/framework/function_call.md:520
#: 8568f15805ef469790801e79525ba25c
msgid "In `transformers`, the tool calls should be a field of assistant messages. Let's use a simple function called `try_parse_tool_calls` to parse the tool calls:"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:552
#: be6ae865b4a6414681ef3d5ed1957c39
msgid "This function does not cover all possible scenarios and thus is prone to errors. But it should suffice for the purpose of this guide."
msgstr ""
#: ../../Qwen/source/framework/function_call.md:556
#: 9de3bba707eb45f09490e25489b410fd
msgid "The template in the `tokenizer_config.json` assumes that the generated content alongside tool calls is in the same message instead of separate assistant messages, e.g.,"
msgstr "`tokenizer_config.json` 中的模板假设生成的内容和工具调用是在同一消息中,而不是分开的助手消息,例如:"
#: ../../Qwen/source/framework/function_call.md:566
#: 098c29ea125943ebbc162aabb091c773
msgid "instead of"
msgstr "而非"
#: ../../Qwen/source/framework/function_call.md:583
#: e6738454a6cd4f408b5d375eadd851ee
msgid "This is implemented roughly in `try_parse_tool_calls` but keep that in mind if you are writing your own tool call parser."
msgstr "`try_parse_tool_calls` 中大致实现了这一约定,但如果你正在编写自己的工具调用解析器,请留意这一点。"
#: ../../Qwen/source/framework/function_call.md:604
#: b2b6427b91524ed0bb0c333c0aebfe53
msgid "The messages now should be like"
msgstr "现在消息应如下所示:"
#: ../../Qwen/source/framework/function_call.md:618
#: 7eae05a0693f4e10bb0a3d505939b1ba
msgid "The messages are similar to those of Qwen-Agent, but there are some major differences:"
msgstr "这些消息类似于Qwen-Agent的消息,但存在一些主要差异:"
#: ../../Qwen/source/framework/function_call.md:619
#: 0685adf142544e6cb3eb76eec3cd9017
msgid "Tools instead of functions"
msgstr "工具而非函数"
#: ../../Qwen/source/framework/function_call.md:620
#: e5c7371882e441008feb0b17910716ee
msgid "Parallel calls are by default"
msgstr "默认情况下为并行调用"
#: ../../Qwen/source/framework/function_call.md:621
#: e6e4530ff2fb4689ac48203bb796b250
msgid "Multiple tool calls as a list in a single assistant message, instead of multiple messages."
msgstr "多个工具调用以列表形式在一个助手消息中,而不是多个消息"
#: ../../Qwen/source/framework/function_call.md:622
#: d6e37c7dd0fb44a79d29e0306a1ff80a
msgid "The function arguments are parsed into a dict if it is a valid JSON-formatted string."
msgstr "如果函数参数是有效的JSON格式字符串,则将其解析为字典。"
#: ../../Qwen/source/framework/function_call.md:626
#: db64d30026b645d389aa1366d55eb177
msgid "Then it's time for the model to generate the actual response for us based on the tool results. Let's query the model again:"
msgstr "现在是时候根据工具结果,让模型为我们生成实际响应了。再次查询模型:"
#: ../../Qwen/source/framework/function_call.md:636
#: 58924dcb06804e05a7b9308933733104
msgid "The output_text should be like"
msgstr "输出文本应如下所示:"
#: ../../Qwen/source/framework/function_call.md:641
#: 00a88d2136e648f68133ebc6cf0e01b6
msgid "Add the result text as an assistant message and the final messages should be ready for further interaction:"
msgstr "将结果文本作为助手消息添加,最终消息应准备好进行进一步交互:"
#: ../../Qwen/source/framework/function_call.md:555
#: ../../Qwen/source/framework/function_call.md:646
#: 581caffc70b5478d8508e59d56166ad5 f64b7d2b14f64b42b959e5a6e75a3bf4
msgid "Ollama"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:648
#: 5c295d58873d4f949e8a640ab1309f30
msgid "Ollama is a set of tools for serving LLMs locally. It also relies on its template implementation to support function calling. Different from transformers, which is written in Python and uses the Jinja template whose syntax is heavily inspired by Django and Python, Ollama, which is mostly written in Go, uses Go's [text/template](https://pkg.go.dev/text/template) packages. In addition, Ollama implements internally a helper function so that it can automatically parse the generated tool calls in texts to structured messages if the format supported."
msgstr "Ollama是一套用于本地部署LLMs的工具集。它还依赖于其模板实现来支持函数调用。不同于使用Python编写的transformers,采用了受Django和Python语法启发的Jinja模板,主要用Go编写的Ollama则使用了Go的[text/template](https://pkg.go.dev/text/template)包。此外,Ollama内部实现了辅助函数,如果格式被支持的话,它可以自动解析文本中生成的工具调用为结构化的消息。"
#: ../../Qwen/source/framework/function_call.md:653
#: 92e60ea979c74a0aa9c49d3d4175f12a
msgid "You could check the [Tool support](https://ollama.com/blog/tool-support) blog post first."
msgstr "您可以先查阅[Tool support](https://ollama.com/blog/tool-support)的博客文章。"
#: ../../Qwen/source/framework/function_call.md:655
#: 76a1fb72c14f4d9fbf77fe08e79d2c35
msgid "Tool support has been available in Ollama since v0.3.0. You can run the following to check the Ollama version:"
msgstr "自v0.3.0版本以来,Ollama已经提供了工具支持。您可以运行以下命令来检查Ollama的版本:"
#: ../../Qwen/source/framework/function_call.md:660
#: 772f68fcfa2d412a86182d819dde28ad
msgid "If lower than expected, follow [the official instructions](https://ollama.com/download) to install the latest version."
msgstr "如果版本低于预期,请遵循[官方说明](https://ollama.com/download)安装最新版本。"
#: ../../Qwen/source/framework/function_call.md:662
#: 718f28a07fe64d2687e09d472d5cce3a
msgid "In this guide, we will aslo use [ollama-python](https://github.com/ollama/ollama-python), before starting, make sure it is available in your environment:"
msgstr "在本指南中,我们将使用[ollama-python](https://github.com/ollama/ollama-python),在开始之前,请确保您的环境中已安装此库:"
#: ../../Qwen/source/framework/function_call.md:667
#: 34d15300b24e447bb2ee7e24dd0567f7
msgid "For this guide, the `ollama` binary is at v0.3.9 and the `ollama` Python library is at v0.3.2."
msgstr "对于本指南,`ollama`二进制文件的版本为v0.3.9,`ollama` Python库的版本为v0.3.2。"
#: ../../Qwen/source/framework/function_call.md:672
#: e0b6a3f28628471f88bb07298f742b12
msgid "The messages structure used in Ollama is the same with that in `transformers` and the template in [Qwen2.5 Ollama models](https://ollama.com/library/qwen2.5) has supported tool use."
msgstr "Ollama 中使用的消息结构与 `transformers` 中的相同,并且 [Qwen2.5 Ollama 模型](https://ollama.com/library/qwen2.5) 的模板已经支持工具调用。"
#: ../../Qwen/source/framework/function_call.md:681
#: 64eb1d9868c54bc19076c00b0485d371
msgid "Note that you cannot pass Python functions as tools directly and `tools` has to be a `dict`."
msgstr "请注意,您不能直接将Python函数作为工具传递,`tool`的类型必须是`dict`。"
#: ../../Qwen/source/framework/function_call.md:686
#: 3b4c611ec6ce484ca21bb4ed97255d4d
msgid "We can use the `ollama.chat` method to directly query the underlying API:"
msgstr "我们可以使用`ollama.chat`方法直接查询底层API:"
#: ../../Qwen/source/framework/function_call.md:698
#: 94735aa651304d088c97dadacd7c456b
msgid "The main fields in the response could be:"
msgstr "响应中的主要字段可能是:"
#: ../../Qwen/source/framework/function_call.md:713
#: 347166b959a2441c856d78ec1b964233
msgid "Ollama's tool call parser has succeeded in parsing the tool calls. If not, you may refine [the `try_parse_tool_calls` function above](#parse-function). Then, we can obtain the tool results and add them to the messages. The following is basically the same with `transformers`:"
msgstr "Ollama的工具调用解析器成功解析出了工具调用。如果没有成功,您可以改进[上面的`try_parse_tool_calls`函数](#parse-function)。然后,我们可以获取工具结果并将其添加到消息中。以下操作与`transformers`基本相同:"
#: ../../Qwen/source/framework/function_call.md:736
#: ../../Qwen/source/framework/function_call.md:886
#: 3413d71073e543e793ff7a41961402dc e3daa8f3d7a74680b3412b5ad71936fc
msgid "The messages are now like"
msgstr "现在消息如下:"
#: ../../Qwen/source/framework/function_call.md:752
#: bbeba05f6afa4ac8b7aa8116a2968155
msgid "The rest are easy:"
msgstr "剩下的部分很简单:"
#: ../../Qwen/source/framework/function_call.md:763
#: ed728030eac04637b75496c8f9dc8d42
msgid "The final message should be like the following:"
msgstr "最终的消息应该如下所示:"
#: ../../Qwen/source/framework/function_call.md:555
#: ../../Qwen/source/framework/function_call.md:769
#: d1f29f8199d44329a0f14b51930c9bb8 e0704a724b0842c1bdaa29dd46f0b21a
msgid "vLLM"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:771
#: 055e984addc44e04a88e762a01dd54e0
msgid "vLLM is a fast and easy-to-use library for LLM inference and serving. It uses the tokenizer from `transformers` to format the input, so we should have no trouble preparing the input. In addition, vLLm also implements helper functions so that generated tool calls can be parsed automatically if the format is supported."
msgstr "vLLM 是一个快速且易于使用的库,用于大型语言模型的推理和部署。它使用 `transformers` 中的分词器来格式化输入,因此我们在准备输入时应该不会遇到任何问题。此外,vLLM 还实现了辅助函数,以便在支持的情况下自动解析生成的工具调用。"
#: ../../Qwen/source/framework/function_call.md:775
#: 34aeafdc888f4d109f08c6cb46b80d03
msgid "Tool support has been available in `vllm` since v0.6.0. Be sure to install a version that supports tool use. For more information, check the [vLLM documentation](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api)."
msgstr "工具支持自 v0.6.0 版本起已在 `vllm` 中可用。请确保安装了一个支持工具调用的版本。更多信息,请查阅 [vLLM 文档](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api)"
#: ../../Qwen/source/framework/function_call.md:779
#: 926aa4074d81422f94dba12bd220cf22
msgid "For this guide, we are at version v0.6.1.post2. We will use the OpenAI-Compatible API by `vllm` with the API client from the `openai` Python library."
msgstr "在本指南中,我们使用的是 v0.6.1.post2 版本。我们将使用 `vllm` 提供的 OpenAI 兼容 API,并通过 `openai` Python 库的 API 客户端来进行操作。"
#: ../../Qwen/source/framework/function_call.md:784
#: 18d4ce8e46714651a01fc8c6b8c30587
msgid "For Qwen2.5, the chat template in tokenizer_config.json has already included support for the Hermes-style tool use. We simply need to start a OpenAI-compatible API with vLLM:"
msgstr "对于 Qwen2.5,`tokenizer_config.json` 中的聊天模板已经包含了对 Hermes 风格工具调用的支持。我们只需要启动一个由 vLLM 提供的 OpenAI 兼容 API 即可:"
#: ../../Qwen/source/framework/function_call.md:797
#: 2c4b4f2aa0414a7baeeb48701898c09c
msgid "Let's also initialize the client:"
msgstr "我们先初始化API客户端:"
#: ../../Qwen/source/framework/function_call.md:815
#: 0aa57070a73c4785a371c2a454ce9360
msgid "We can use the create chat completions endpoint to query the model:"
msgstr "我们可以使用create chat completions endpoint直接查询底层API:"
#: ../../Qwen/source/framework/function_call.md:831
#: ec216e66499d48ee8db45c7fe0a92ebb
msgid "vLLM should be able to parse the tool calls for us, and the main fields in the response (`response.choices[0]`) should be like"
msgstr "vLLM应当可以为我们解析工具调用,回复的主要字段(`response.choices[0]`)应如下所示:"
#: ../../Qwen/source/framework/function_call.md:858
#: 82944ee90fcf4223a220674c83ca0255
msgid "Note that the function arguments are JSON-formatted strings, which Qwen-Agent follows but `transformers` and Ollama differs."
msgstr "请注意这里函数的参数是JSON格式字符串,Qwen-Agent与其一致,但`transformers`和Ollama与之相异。"
#: ../../Qwen/source/framework/function_call.md:860
#: afbeea17f4974f3aa4cd04d1af81f6e1
msgid "As before, chances are that there are corner cases where tool calls are generated but they are malformed and cannot be parsed. For production code, we should try parsing by ourselves."
msgstr "如前所述,有可能存在边界情况,模型生成了工具调用但格式不良也无法被解析。对于生产代码,我们需要尝试自行解析。"
#: ../../Qwen/source/framework/function_call.md:863
#: 6679ef4d0e494546bada423b26f7427c
msgid "Then, we can obtain the tool results and add them to the messages as shown below:"
msgstr "随后,我们可以调用工具并获得结果,然后将它们加入消息中:"
#: ../../Qwen/source/framework/function_call.md:884
#: 2efb4934cb904461aa61117a3df94c1d
msgid "It should be noted that the OpenAI API uses `tool_call_id` to identify the relation between tool results and tool calls."
msgstr "这里需要注意OpenAI API使用`tool_call_id`字段来识别工具结果和工具调用间的联系。"
#: ../../Qwen/source/framework/function_call.md:902
#: bbcb57b29a374c25bf6114fd4ca1e44a
msgid "Let's call the endpoint again to seed the tool results and get response:"
msgstr "让我们再次查询接口,以给模型提供工具结果并获得回复:"
#: ../../Qwen/source/framework/function_call.md:919
#: 7a2c0dfa5cff41df911c16597fd6166d
msgid "The final response (`response.choices[0].message.content`) should be like"
msgstr "最终响应(`response.choices[0].message.content`)应如下所示"
#: ../../Qwen/source/framework/function_call.md:924
#: af3cbc40b1b440868c89b2db695caf78
msgid "Discussions"
msgstr "小结"
#: ../../Qwen/source/framework/function_call.md:926
#: 00b863962f2349c1843e5caef08e7b11
msgid "Now, we have introduced how to conduct inference with function calling using Qwen2 in three different frameworks! Let's make a brief comparison."
msgstr "现在,我们已经介绍了如何使用Qwen2在三种不同的框架中通过函数调用进行推理!让我们做一个简要的比较。"
#: ../../Qwen/source/framework/function_call.md:555
#: d76096fd53be4eedaf0356aefa56711d
msgid "Item"
msgstr "项目"
#: ../../Qwen/source/framework/function_call.md:555
#: c3673767712f4e08888b6be0600889e4
msgid "OpenAI API"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 82945e42496f4f41aec5173a035e1b57
msgid "Type"
msgstr "类型"
#: ../../Qwen/source/framework/function_call.md:555
#: 1815e9c221894b3babd8a442fa32bcbd 4b2955158e474544b12f4da04ad815f9
#: 6cab1554299047a1aad0c6e1f86ee5cf a5ebb0d610af4661bee4ffc8d041b819
msgid "HTTP API"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 043bdab8051049ac8cf9e76c535ade2b 89650682599d48708f5784a805fa4e0b
msgid "Python Library"
msgstr "Python库"
#: ../../Qwen/source/framework/function_call.md:555
#: f90552caaf474c0fa1df085f82461eef
msgid "Inference Backend"
msgstr "推理后端"
#: ../../Qwen/source/framework/function_call.md:555
#: 6cc61d9e39fe4eeaa2bbacc9dc576fdb e70b9a499e5f454ba898c004872f531b
msgid "-"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 59f22256744d416f98d03df8f2278c5f ffcd7f68799c4a499a2c14135aec2b87
msgid "PyTorch"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: a9e2e27a54c542e0a62d2dfa88852b10
msgid "llama.cpp"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 0024df8a279c4bb4a8a15785807595d1
msgid "Templating Backend"
msgstr "模板后端"
#: ../../Qwen/source/framework/function_call.md:555
#: 0a624fffdba54431b5e296c3aacf622d 2d81ae6fb5994f4e88303841395a8f05
msgid "Jinja"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 3ec6d091c75d47838ca192daccd85a8b
msgid "Go `text/template`"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: c538c37491c14245995e39510fc3488a
msgid "Python"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: bc969123acb444f49b6f6f34fa1a765b
msgid "Tools/Functions"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 0099d942538c498789553fad009c1ab0 0285033e57ff48e19d792bbdd164c0be
#: 59560c34831d48db925d9e56e862c152 9de65d6f04584de483b2224e91424c03
msgid "Tools"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 32250c9196b64591833201139c23afc9
msgid "Functions"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 38fed11b5de64e1dba338d98a9b83acd
msgid "Parallel Calls"
msgstr "并行调用"
#: ../../Qwen/source/framework/function_call.md:555
#: e0ac64fc47db455c9dbc435b9e944146
msgid "Default Yes (Configurable)"
msgstr "默认是(可配置)"
#: ../../Qwen/source/framework/function_call.md:555
#: 316ff3cad523414d87967f17ec0d8ca6 62922963f9ef49b9b759e5b1c88c241f
#: c0513c97d7964c708d6858315d1e64de
msgid "Yes"
msgstr "是"
#: ../../Qwen/source/framework/function_call.md:555
#: 53188b15532042388b3d90cf06546c0e
msgid "Default No (Configurable)"
msgstr "默认否(可配置)"
#: ../../Qwen/source/framework/function_call.md:555
#: 7885d6b81a36481f8cb099f1c0fe9635
msgid "Call Format"
msgstr "调用格式"
#: ../../Qwen/source/framework/function_call.md:555
#: 365e626c700b47e2b943c616796ad4e7 915ff0bab0534d7399db78c8f80177fc
#: ca4dcf05fd06448387b191655a3eb286 d74bbd9ef649402e97a8a92fd1669646
msgid "Single assistant message with `tool_calls`"
msgstr "带有`tool_calls`的单个助手消息"
#: ../../Qwen/source/framework/function_call.md:555
#: 0f3197469a194ebdab97cc71290d76c5
msgid "Multiple assistant messages with `function_call`"
msgstr "带有`function_call`的多个助手消息"
#: ../../Qwen/source/framework/function_call.md:555
#: 8c9ca98e2d504ffcb7a01c44511ff570
msgid "Call Argument Format"
msgstr "调用参数格式"
#: ../../Qwen/source/framework/function_call.md:555
#: 20a03bfe9dc64de1b50fbbc02d704fb4 7e738ab62348417d8853aeaa45c6c91e
#: ccc62ac381c6488ba814d5a11a846dc9
msgid "string"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 021b4af5157d4ba4b329dd5b20c01424 f450234ab42e4a4b82d5f1bd2a3bc6b3
msgid "object"
msgstr ""
#: ../../Qwen/source/framework/function_call.md:555
#: 821091c4a16f4b13b76c8c6b3ddb0288
msgid "Call Result Format"
msgstr "调用结果格式"
#: ../../Qwen/source/framework/function_call.md:555
#: a09900cc7d3b403bbbe3539ac5b30dcc ba591284bea645129938adf064ee939a
#: cbba7c99bad7497b85bec11223978cc9 cde37dd4f93546a08987269cbf5889cd
msgid "Multiple tool messages with `content`"
msgstr "带有`content`的多个工具消息"
#: ../../Qwen/source/framework/function_call.md:555
#: c760b606407d46499fcff6b5fb498a16
msgid "Multiple function messages with `content`"
msgstr "带有`content`的多个函数消息"
#: ../../Qwen/source/framework/function_call.md:941
#: f72961ec5f5448849f916c3fa2fab7aa
msgid "There are some details not shown in the above table:"
msgstr "上表中有些特性未被体现:"
#: ../../Qwen/source/framework/function_call.md:942
#: 1874dbb8d1814d7a91e949f90e56f0eb
msgid "OpenAI API comes with Python, Node.js, Go, and .NET SDKs. It also follows the OpenAPI standard."
msgstr "OpenAI API附带了Python、Node.js、Go和.NET SDK。它还遵循OpenAPI标准。"
#: ../../Qwen/source/framework/function_call.md:943
#: 3c70aea4b962498da3822193c6052cb8
msgid "Ollama comes with Python and Node.js SDKs. It has OpenAI-compatible API at a different base url that can be accessed using OpenAI API SDK."
msgstr "Ollama附带了Python和Node.js SDK。它在不同的base URL上具有与OpenAI兼容的API,可以使用OpenAI API SDK访问。"
#: ../../Qwen/source/framework/function_call.md:944
#: 01ff2cb248df45049c9c6dd1b415bba8
msgid "Qwen-Agent as an application framework can call the tools automatically for you, which is introduced in [the Qwen-Agent guide](./qwen_agent)."
msgstr "作为应用程序框架,Qwen-Agent可以自动为您调用工具,这在[Qwen-Agent指南](./qwen_agent)中有所介绍。"
#: ../../Qwen/source/framework/function_call.md:947
#: 515bc99e272149938b4c2add0deee022
msgid "In addition, there are more on the model side of function calling, which means you may need to consider more things in production code:"
msgstr "此外,在函数调用的模型方面还有更多内容,这意味着您可能需要在生产代码中考虑更多的事情:"
#: ../../Qwen/source/framework/function_call.md:948
#: d988e24a141d430980204d8badc9a44b
msgid "**Accuracy of function calling**: When it comes to evaluate the accuracy of function calling, there are two aspects: (a) whether the correct functions (including no functions) are selected and (b) whether the correct function arguments are generated. It is not always the case that Qwen2.5 will be accurate. Function calling can involve knowledge that is deep and domain-specific. Sometimes, it doesn't fully understand the function and select the wrong one by mistake. Sometimes, it can fall into a loop and require calling the same function again and again. Sometimes, it will fabricate required function arguments instead of asking the user for input. To improve the function calling accuracy, it is advised to first try prompt engineering: does a more detailed function description help? can we provide instructions and examples to the model in the system message? If not, finetuning on your own data could also improve performance."
msgstr "**函数调用准确性**:在评估函数调用的准确性时,有两个方面:(a) 是否选择了正确的函数(包括没有函数)以及(b) 是否生成了正确的函数参数。Qwen2.5并不总是准确的。函数调用可能涉及深入且领域特定的知识。有时,它不能完全理解函数并错误地选择了错误的函数。有时,它可能会陷入循环,需要反复调用相同的函数。有时,它会伪造所需的函数参数而不是向用户请求输入。为了提高函数调用的准确性,建议首先尝试提示工程:更详细的函数描述是否有所帮助?我们是否可以在系统消息中为模型提供指导和示例?如果没有,使用自己的数据进行微调也可以提高性能。"
#: ../../Qwen/source/framework/function_call.md:961
#: 4215aa1f6ddd4cf29be2e121fc6313ff
msgid "**Protocol consistency**: Even with the proper function calling template, the protocol may break. The model may generate extra texts to tool calls, e.g., explanations. The generated tool call may be invalid JSON-formatted string but a representation of a Python dict The generated tool call may be valid JSON but not conforms to the provided JSON Schema. For those kinds of issues, while some of them could be addressed with prompt engineering, some are caused by the nature of LLMs and can be hard to resolve in a general manner by LLMs themselves. While we strive to improve Qwen2.5 in this regard, edge cases are unlikely to be eliminated completely."
msgstr "**协议一致性**:即使具备恰当的函数调用模板,协议也可能被破坏。模型可能会在工具调用中生成额外文本,例如解释说明。生成的工具调用可能是无效的JSON格式字符串,但是是Python dict的字符串表示;生成的工具调用可能是有效的JSON,但不符合提供的JSON Schema。对于这类问题,虽然有些可以通过提示工程解决,但有些是由大型语言模型的本质引起的,很难由大模型本身以通用方式解决。尽管我们在这一方面努力改进Qwen2.5,但极端情况不太可能被完全消除。"
#: ../../Qwen/source/framework/function_call.md:970
#: ba9dfcb4b398472a977699081eb2e1af
msgid "Function Calling Templates"
msgstr "函数调用模板"
#: ../../Qwen/source/framework/function_call.md:972
#: bf17665d24494f919113d22f59aca750
msgid "The template design for function calling often includes the following aspects:"
msgstr "函数调用的模板设计通常包括以下方面:"
#: ../../Qwen/source/framework/function_call.md:973
#: 59c47a9fdc2246c08655158e700e457c
msgid "How to describe the functions to the model, so that the model understands what they are and how to use them."
msgstr "如何向模型描述这些函数,以便模型理解它们是什么以及如何使用它们。"
#: ../../Qwen/source/framework/function_call.md:974
#: 6c1d6432a8c24e19bccb999462c604bb
msgid "How to prompt the model, so that it knows that functions can be used and in what format to generate the function calls."
msgstr "如何提示模型,以便它知道可以使用函数,并以何种格式生成函数调用。"
#: ../../Qwen/source/framework/function_call.md:975
#: db33ee8dc25049218020c43a98bd82c3
msgid "How to tell a function call generation from others in generated text, so that we can extract the calls from the generated texts and actually make the calls."
msgstr "如何从生成的文本中区分函数调用与其他内容,以便我们能够从生成的文本中提取调用并实际执行调用。"
#: ../../Qwen/source/framework/function_call.md:976
#: a8365ddddf4140239d1de118b25dcecc
msgid "How to incorporate the function results to the text, so that the model can tell them from its own generation and make connection among the calls and the results."
msgstr "如何将函数结果融入文本中,以便模型能够将其与自己的生成区分开来,并在调用和结果之间建立联系。"
#: ../../Qwen/source/framework/function_call.md:978
#: e9842cf66f10461eacecec888bd6ec9e
msgid "For experienced prompt engineers, it should be possible to make any LLM support function calling, using in-context learning techniques and with representative examples, though with varied accuracy and stability depending on how \"zero-shot\" the task at hand is."
msgstr "对于经验丰富的提示工程师而言,应该有可能利用上下文学习技术和代表性示例,使任何大模型支持函数调用,尽管准确性和稳定性会根据手头任务的“零样本”程度而有所不同。"
#: ../../Qwen/source/framework/function_call.md:980
#: 27d8b9ba72444e5d89f42674af815d5f
msgid "Starting from ReAct Prompting"
msgstr "从ReAct Prompting开始"
#: ../../Qwen/source/framework/function_call.md:982
#: 895d264e474046a6b293266a43788e5a
msgid "For example, ReAct Prompting can be used to implement function calling with an extra element of planning:"
msgstr "例如,可以使用ReAct Prompting实现带有额外规划元素的函数调用:"
#: ../../Qwen/source/framework/function_call.md:983
#: 1ee0b022742b44c4984b3230b9b8d59c
msgid "**Thought**: the overt reasoning path, analyzing the functions and the user query and saying it out \"loud\""
msgstr "**Thought**:显而易见的推理路径,分析函数和用户查询,并大声“说”出来"
#: ../../Qwen/source/framework/function_call.md:984
#: 65c7b8f481e34cd287d757d7717b47d7
msgid "**Action**: the function to use and the arguments with which the function should be called"
msgstr "**Action**:要使用的函数以及调用该函数时应使用的参数"
#: ../../Qwen/source/framework/function_call.md:985
#: db5bbcc3860047b3b5022c48d1ca45f1
msgid "**Observation**: the results of the function"
msgstr "**Observation**:函数的结果"
#: ../../Qwen/source/framework/function_call.md:987
#: e0da151009e94c31a8475e2bb1e24694
msgid "In fact, Qwen2 is verse in the following variant of ReAct Prompting (similar to LangChain ReAct) to make the intermediate texts more structured:"
msgstr "实际上,Qwen2熟练掌握以下变体的ReAct Prompting(类似于LangChain ReAct),以使中间文本更具结构化:"
#: ../../Qwen/source/framework/function_call.md:1017
#: b13ae622a62f40a0bd6b79da2f9cdfe1
msgid "As you can see, there is no apparent user/assistant conversation structure in the template. The model will simply continue the texts. One should write the code to actively detect which step the model is at and in particular to add the observations in the process, until the Final Answer is generated."
msgstr "如您所见,模板中没有明显的用户/助手对话结构。模型将简单地继续文本。应该编写代码来主动检测模型处于哪个步骤,并特别在过程中添加观察结果,直到生成最终答案。"
#: ../../Qwen/source/framework/function_call.md:1021
#: cde0405e4e434d38b652f48e09c215b8
msgid "However, as most programming interfaces accept the message structure, there should be some kind of adapter between the two. [The ReAct Chat Agent](https://github.com/QwenLM/Qwen-Agent/blob/v0.0.10/qwen_agent/agents/react_chat.py) in Qwen-Agent facilitates this kind of conversion."
msgstr "然而,由于大多数编程接口接受“message”结构,两者之间应该有某种适配器。[Qwen-Agent中的ReAct Chat Agent](https://github.com/QwenLM/Qwen-Agent/blob/v0.0.10/qwen_agent/agents/react_chat.py)实现了这种转换。"
#: ../../Qwen/source/framework/function_call.md:1024
#: c477ceb3bba940cdb1ad6697a80000af
msgid "Qwen2 Function Calling Template"
msgstr "Qwen2 函数调用模板"
#: ../../Qwen/source/framework/function_call.md:1026
#: 59e9eb1bca624a3980c17f50de84eec4
msgid "As a step forward, the official Qwen2 function calling template is in the vein of the ReAct Prompting format but focuses more on"
msgstr "作为向前迈进的一步,官方的Qwen2函数调用模板沿袭了ReAct Prompting格式,但更侧重于"
#: ../../Qwen/source/framework/function_call.md:1027
#: 002b6a55039a45b192c9dff521b7c360
msgid "differentiating the keywords like `Question`, `Thought`, `Action`, etc., from generation,"
msgstr "将诸如`Question`、`Thought`、`Action`等关键词与生成区分开来,"
#: ../../Qwen/source/framework/function_call.md:1028
#: 041d040c25754ea0b50f80cf75deebf8
msgid "simplifying the process,"
msgstr "简化这一过程,"
#: ../../Qwen/source/framework/function_call.md:1029
#: 102345af873d49d8b455d3e0bb840ea7
msgid "supporting better multi-turn conversation, and"
msgstr "更好支持多轮对话,以及"
#: ../../Qwen/source/framework/function_call.md:1030
#: 7fae2cd9100a40f4b4928bd8e191c489
msgid "adding controls for specialized usage."
msgstr "为特异性使用添加控制。"
#: ../../Qwen/source/framework/function_call.md:1033
#: dd5768a0d69a4276a79e6ffbf7fba497
msgid "An equivalent example would be"
msgstr "一个等效的例子是"
#: ../../Qwen/source/framework/function_call.md:1065
#: 55d0107bfa5e47ddbbb631e6a8f7a113
msgid "Let's first list the obvious differences:"
msgstr "我们先列出明显的差异:"
#: ../../Qwen/source/framework/function_call.md:1066
#: b7f6a14154d84d6f922d350a07f72abd
msgid "Keywords (`✿FUNCTION✿`, `✿ARGS✿`, etc.) seem rare in ordinary text and more semantically related to function calling, but not special tokens yet."
msgstr "关键字(`✿FUNCTION✿`, `✿ARGS✿`等)在普通文本中似乎很少见,且与函数调用语义相关,但尚未成为特殊token。"
#: ../../Qwen/source/framework/function_call.md:1067
#: 978d3ec5660642948dab4f42e3364745
msgid "Thought is omitted. This could affect accuracy for some use cases."
msgstr "Thought被省略了。这可能会影响某些使用场景的准确性。"
#: ../../Qwen/source/framework/function_call.md:1068
#: f0876e39e84641ada6c6a0f2338e20cc
msgid "Use the system-user-assistant format for multi-turn conversations. Function calling prompting is moved to the system message."
msgstr "对于多轮对话,请采用系统-用户-助手格式。函数调用提示已移至系统消息中。"
#: ../../Qwen/source/framework/function_call.md:1070
#: 921d88ca676942ee859d82fd557b4173
msgid "How about adding controls for specialized usage? The template actually has the following variants:"
msgstr "那对于特异性使用添加的控制呢?实际上,该模板有以下变体:"
#: ../../Qwen/source/framework/function_call.md:1072
#: be16d60d0f9c4e389c256c7af46c5c88
msgid "Language: the above is for non-Chinese language; there is another template in Chinese."
msgstr "语言:上述内容适用于非中文;另有一份中文模板。"
#: ../../Qwen/source/framework/function_call.md:1073
#: 6b4267e7d3854c5db6329d7451d4b085
msgid "Parallel Calls: the above is for non-parallel calls; there is another template for parallel calls."
msgstr "并行调用:上述内容适用于非并行调用;另有一份并行调用的模板。"
#: ../../Qwen/source/framework/function_call.md:1075
#: 9b3596c169504fe08ccb3287addfdcab
msgid "In the canonical implementation in Qwen-Agent, those switches are implemented in Python, according to the configuration and current input."
msgstr "在Qwen-Agent的标准实现中,这些开关是根据配置和当前输入,用Python实现的。"
#: ../../Qwen/source/framework/function_call.md:1077
#: 947ef7d8aa5e4025b483d97410764ed7
msgid "The actual text with _parallel calls_ should be like the following:"
msgstr "带有_并行调用_的实际文本应如下所示:"
#: ../../Qwen/source/framework/function_call.md:1123
#: 935ec61058b44a7eb1fdd5a31df47e6e
msgid "This template is hard to adapt for other frameworks that use less capable templating engines. But it is doable at least partially for Jinja, which is Python-oriented after all. We didn't use it because using the template in `transformers` leads to more changes to the inference usage, which are not very common for beginners."
msgstr "这份模板很难为使用功能较弱模板引擎的其他框架进行适配。但至少部分地,对于Jinja(毕竟它是面向Python的)来说是可行的。我们没有使用它,是因为在`transformers`中使用该模板会导致推理使用方式有更多变化,而这些变化对初学者来说并不常见。"
#: ../../Qwen/source/framework/function_call.md:1127
#: 4e3c19410244459785fa13e410b209cb
msgid "For the interested, you can find the Jinja template and key points on usage below:"
msgstr "对于有兴趣的人,您可以在下方找到Jinja模板及其使用要点:"
#: ../../Qwen/source/framework/function_call.md
#: bf0cdb89d8c84bb180cee54c0f8c9274
msgid "Qwen2 Function Calling Jinja Template"
msgstr "Qwen2 函数调用Jinja模板"
#: ../../Qwen/source/framework/function_call.md:1200
#: d4defd9c03cb4b28aa9018dde74639c5
msgid "To use this template in `transformers`:"
msgstr "要在`transformers`中使用此模板:"
#: ../../Qwen/source/framework/function_call.md:1202
#: e8ca3b34c632423d8d4ae3c7e272b7b4
msgid "Switches can be enabled by passing them to the `apply_chat_template` method, e.g., `tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, parallel_tool_call=True, language=\"zh\", tokenize=False)`. By default, it is for English non-parallel function calling."
msgstr "可以通过将它们传递给`apply_chat_template`方法来启用开关,例如,`tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, parallel_tool_call=True, language=\"zh\", tokenize=False)`。默认情况下,这是用于英语非并行函数调用。"
#: ../../Qwen/source/framework/function_call.md:1204
#: 73a45eb9f800455dbfa23b73c6b8add7
msgid "The tool arguments should be a Python `dict` instead of a JSON-formatted object `str`."
msgstr "工具参数应为Python `dict`,而不是JSON格式的对象`str`。"
#: ../../Qwen/source/framework/function_call.md:1206
#: 7d3bbe9beebb475fb1bf567820a2723f
msgid "Since the generation needs to be stopped at `✿RESULT✿` or else the model will generate fabricated tool results, we should add it to `stop_strings` in `generation_config`:"
msgstr "由于生成需要在遇到`✿RESULT✿`时停止,不然模型会继续生成编造的工具结果,我们需要将这些字符串加到`generation_config`中的`stop_strings`字段:"
#: ../../Qwen/source/framework/function_call.md:1211
#: ca9c67ecfacd49918c85b52101ee2ca3
msgid "As a result of using `stop_strings`, you need to pass the tokenizer to `model.generate` as `model.generate(**inputs, tokenizer=tokenizer, max_new_tokens=512)`."
msgstr "由于使用了`stop_strings`,您需要将tokenizer传递给`model.generate`,即`model.generate(**inputs, tokenizer=tokenizer, max_new_tokens=512)`。"
#: ../../Qwen/source/framework/function_call.md:1213
#: f7dd23c1309c4da8bd95fb1a21ace195
msgid "`response`, i.e., the model generation based on the tool calls and tool results, may contain a leading space. You should not strip it for the model. It is resulted from the tokenization and the template design."
msgstr "基于工具调用和工具结果的模型生成,即`response`,可能包含一个前导空格。作为后续消息输入模型时,不要碰这个空格。这是由tokenization和模板设计导致的。"
#: ../../Qwen/source/framework/function_call.md:1215
#: ddc6078fa85040fd96b4c211b4ef8091
msgid "The `try_parse_tool_calls` function should also be modified accordingly."
msgstr "`try_parse_tool_calls`函数也应进行相应的修改。"
#: ../../Qwen/source/framework/function_call.md:1219
#: a959a209b43144a883a32645a1da9a7b
msgid "Qwen2.5 Function Calling Templates"
msgstr "Qwen2.5 函数调用模板\""
#: ../../Qwen/source/framework/function_call.md:1221
#: df3d4e40a6024963999e703789407c4b
msgid "For `transformers` and Ollama, we have also used templates that are easier to implement with Jinja or Go. They are variants of [the Nous Research's Hermes function calling template](https://github.com/NousResearch/Hermes-Function-Calling#prompt-format-for-function-calling). The Jinja template and the Go template should produce basically the same results. They final text should look like the following:"
msgstr "对于`transformers`和Ollama,我们也使用易于Jinja和Go实现的模板,它们是[Nous Research的Hermes函数调用模板](https://github.com/NousResearch/Hermes-Function-Calling#prompt-format-for-function-calling)的变体。Jinja模板和Go模板应基本产生相同的结果。最终文本应如下所示:"
#: ../../Qwen/source/framework/function_call.md:1266
#: 679f928fd3b64de9ae62ffbd36b8d8de
msgid "While the text may seem different from the previous one, the basic prompting structure is still the same. There are just more structural tags and more JSON-formatted strings."
msgstr "虽然文本可能与官方的有所不同,但基本的提示结构仍然相同。只是有更多结构标签和更多JSON格式的字符串。"
#: ../../Qwen/source/framework/function_call.md:1271
#: de22c8cf18f24afcbfc29fb305c8099a
msgid "There is one thing we haven't talked about: how should functions be described to the LLMs. In short, you could describe them as you would normally describe them in an API documentation, as long as you can effectively parse, validate, and execute the tool calls generated by the models. The format with JSON Schema appears a valid and common choice."
msgstr "有一件事我们尚未提及:如何向大型语言模型描述函数。简而言之,你可以像在API文档中通常描述它们那样来描述它们,只要你能有效地解析、验证并执行由模型生成的工具调用。带有JSON Schema的格式似乎是一个有效且常见的选择。"
#: ../../Qwen/source/framework/function_call.md:1276
#: 0bda938218774642b5d0296ecdd6a5bc
msgid "Finally"
msgstr "最后"
#: ../../Qwen/source/framework/function_call.md:1278
#: f981058838464bdaac62e94f18d148bd
msgid "In whichever way you choose to use function calling with Qwen2.5, keep in mind that the limitation and the perks of prompt engineering applies:"
msgstr "无论你选择哪种方式在Qwen2.5中使用函数调用,请记住提示工程的限制和优势适用:"
#: ../../Qwen/source/framework/function_call.md:1279
#: bdd5e0dee036466ba54c614e8bd254ca
msgid "It is not guaranteed that the model generation will always follow the protocol even with proper prompting or templates. Especially, for the templates that are more complex and relies more on the model itself to think and stay on track than the ones that are simpler and relies on the template and the use of control or special tokens. The latter one, of course, requires some kind of training. In production code, be prepared that if it breaks, countermeasures or rectifications are in place."
msgstr "无法保证模型生成将始终遵循协议,即使有适当的提示或模板。特别是对于那些更复杂且更多依赖于模型本身思考和保持方向的模板,而非那些更简单且依赖于模板以及控制或特殊标记使用的模板。当然,后者需要某种训练。在生产代码中,要准备好如果出现问题,采取补救措施或修正措施。"
#: ../../Qwen/source/framework/function_call.md:1283
#: e032b53f33c042ba9e84a5e16b27edeb
msgid "If in certain scenarios, the generation is not up to expectation, you can refine the template to add more instructions or constraints. While the templates mentioned here are general enough, they may not be the best or the most specific or the most concise for your use cases. The ultimate solution is fine-tuning using your own data."
msgstr "如果在某些场景下,生成结果未达到预期,你可以细化模板以添加更多指令或约束。尽管这里提到的模板足够通用,但对于你的具体使用案例,它们可能不是最佳的、最具体的或最简洁的。最终解决方案是使用你自己的数据进行微调。"
#: ../../Qwen/source/framework/function_call.md:1287
#: 091873c9749b4684891f95af4418d831
msgid "Have fun prompting!"
msgstr "享受提示的乐趣吧!"
#: ../../Qwen/source/framework/function_call.md:485
#: e73e525ef897455dbf61663090503acf
msgid "`transformers` will use `transformers.utils.get_json_schema` to generate the tool descriptions from Python functions. There are some gotchas with `get_json_schema`, and it is advised to check [its doc \\[v4.44.2\\]](https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/utils/chat_template_utils.py#L183-L288) before relying on it."
msgstr "`transformers`将使用`transformers.utils.get_json_schema`从Python函数生成工具描述。`get_json_schema`存在一些陷阱,在依赖它之前建议查看[其文档\\[v4.44.2\\]](https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/utils/chat_template_utils.py#L183-L288)。"
#: ../../Qwen/source/framework/function_call.md:488
#: b365673acafc44fdbd6b5335aff712e9
msgid "The function should use Python type hints for parameter types and has a Google-style docstring for function description and parameter descriptions."
msgstr "函数应使用Python类型注释表示参数类型,并具有Google风格的docstring用于函数描述和参数描述。"
#: ../../Qwen/source/framework/function_call.md:489
#: 5088a9323fbc4f0cb076dc1599cdfd7f
msgid "Supported types are limited, since the types needs to be mapped to JSON Schema. In particular, `typing.Literal` is not supported. You can instead add `(choices: ...)` at the end of a parameter description, which will be mapped to a `enum` type in JSON Schema."
msgstr "支持的类型有限,因为这些类型需要映射到JSON Schema。特别是,`typing.Literal`不受支持。你可以在参数描述的末尾添加`(choices: ...)`,这将在JSON Schema中映射为`enum`类型。"
#: ../../Qwen/source/framework/function_call.md:493
#: 799f082def754a39aac54e6201210023
msgid "Please be aware that all the returned results in the examples in the linked docstring are actually the content of the `function` field in the actual returned results."
msgstr "请注意,链接docstring中的所有返回结果示例实际上是实际返回结果中`function`字段的内容。"
#~ msgid "In `transformers`, the tool calls should be a field of assistant messages.[^tool_call_arg_format] Let's use a simple function called `try_parse_tool_calls` to parse the tool calls, which can be found in [the preparation code](#prepcode). This function does not cover all possible scenarios and thus is prone to errors. But it should suffice for the purpose of this guide."
#~ msgstr "在`transformers`中,工具调用应该是助手消息的一个字段[^tool_call_arg_format]。让我们使用一个简单的函数`try_parse_tool_calls`来解析工具调用,该函数可以在[准备代码](#prepcode)中找到。此函数并未涵盖所有可能场景,因此容易出错。但对于本指南的目的而言,它应该足够了。"
#~ msgid "However, note that the model generates arguments in tool calls not as a JSON object but a JSON-formatted string of the JSON object. For `transformers` and `ollama`, as the interfaces require the arguments to be JSON objects or Python dicts, there will be differences between the actual model generation and the template results for tool call arguments."
#~ msgstr "然而,请注意,模型在工具调用中生成的参数不是作为JSON对象,而是该JSON对象的JSON格式字符串。对于`transformers`和`ollama`,由于接口要求参数为JSON对象或Python字典,因此实际模型生成和模板结果之间的工具调用参数格式将存在差异。"