Commit 9d6c8867 authored by chenych's avatar chenych

Add vllm serve

parent 8011b5aa
@@ -61,6 +61,7 @@ pip install transformers>=4.51.0
## Inference
### Inference with vLLM
#### offline
```bash
## The HF_ENDPOINT environment variable is required
export HF_ENDPOINT=https://hf-mirror.com
@@ -68,6 +69,37 @@ export HF_ENDPOINT=https://hf-mirror.com
python infer_vllm.py --model_name_or_path /path/your_model_path/
```
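Once scores come back from offline inference, the usual next step is to rank the candidate documents by relevance. Below is a minimal sketch: the ranking helper is plain Python, while the guarded `__main__` section assumes vLLM's `LLM(..., task="score")` offline API (the repo's `infer_vllm.py` may differ in detail).

```python
"""Offline rerank sketch. The vLLM call is an assumption based on the
`score` task shown in the serve command; adapt it to infer_vllm.py."""
from typing import List, Tuple


def rank_documents(docs: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    # Pair each document with its relevance score and sort descending,
    # so the most relevant document comes first.
    return sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Requires a GPU environment with vLLM installed.
    from vllm import LLM

    llm = LLM(model="/path/your_model_path/", task="score")
    query = "What is the capital of France?"
    docs = ["The capital of France is Paris.", "Berlin is in Germany."]
    outputs = llm.score([query] * len(docs), docs)
    scores = [o.outputs.score for o in outputs]
    for doc, score in rank_documents(docs, scores):
        print(f"{score:.4f}  {doc}")
```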
#### server
vLLM added an `hf-overrides` mechanism for the official Qwen3 rerank models; for the details, see [[New Model]: Support Qwen3 Embedding & Reranker by noooop · Pull Request #19260 · vllm-project/vllm](https://github.com/vllm-project/vllm/pull/19260)
1. Start the server with:
```bash
vllm serve Qwen/Qwen3-Reranker-0.6B \
--host 0.0.0.0 --port 8080 --block-size 16 \
--api-key 123456 --dtype auto \
--trust-remote-code \
--served-model-name Qwen3-reranker \
--enable-prefix-caching \
--gpu-memory-utilization 0.9 \
--task score --disable-log-requests \
  --hf-overrides '{"architectures":["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
```
2. Query the server with:
```bash
curl -X 'POST' 'http://127.0.0.1:8080/score' \
-H 'accept: application/json' \
-H 'Authorization: Bearer 123456' \
-H 'Content-Type: application/json' \
-d '{
"model": "Qwen3-reranker",
"encoding_format": "float",
"text_1": "What is the capital of France?",
"text_2": "The capital of France is Paris."
}'
```
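The same request can be issued from Python. The sketch below builds the request with only the standard library, assuming the server from step 1 is running at `127.0.0.1:8080` with api-key `123456`; the response shape follows vLLM's score endpoint, so inspect it rather than relying on the exact fields.

```python
"""Python equivalent of the curl call above (standard library only)."""
import json
import urllib.request


def build_score_request(query: str, document: str,
                        model: str = "Qwen3-reranker",
                        api_key: str = "123456",
                        base_url: str = "http://127.0.0.1:8080") -> urllib.request.Request:
    # Mirror the JSON body and headers from the curl example.
    payload = {
        "model": model,
        "encoding_format": "float",
        "text_1": query,
        "text_2": document,
    }
    return urllib.request.Request(
        f"{base_url}/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    req = build_score_request("What is the capital of France?",
                              "The capital of France is Paris.")
    with urllib.request.urlopen(req) as resp:
        # The body contains the relevance score(s) for the text pair.
        print(json.load(resp))
```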
## Results
<div align=center>
<img src="./doc/results-dcu.png"/>