# Qwen3-Reranker
## Paper
[Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models](https://arxiv.org/abs/2506.05176)

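## Model Overview
The Qwen3 Embedding series is the latest dedicated model family in the Qwen3 lineup, designed specifically for text embedding and ranking tasks. It inherits the strong multilingual capability, long-text understanding, and reasoning skills of its foundation models. The series achieves significant advances across a range of text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.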
<div align=center>
    <img src="./doc/methods.png"/>
</div>

## Environment Dependencies
| Software | Version |
| :------: | :------: |
| DTK | 26.04 |
| Python | 3.10.12 |
| Transformers | 4.57.6 |
| Torch | 2.5.1+das.opt1.dtk2604.20260206.ga29664ea |
| vLLM | 0.11.0+das.opt1.rc4.dtk2604.20260305.g49a30c70 |

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-py3.10

```bash
docker run -it \
    --shm-size 256g \
    --network=host \
    --name qwen3-reranker \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mkfd \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk26.04-py3.10 bash
```
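
Before starting the container, it can help to confirm that the host paths referenced by the `docker run` command above actually exist. The following is a minimal sketch and not part of this repository: the helper name `missing_host_paths` is hypothetical, and the path list is simply copied from the `--device` and `-v` options above (the placeholder `/path/your_code_data/` is left out).

```python
import os

# Host paths taken from the --device and -v options of the docker run command.
REQUIRED_HOST_PATHS = ["/dev/kfd", "/dev/dri", "/dev/mkfd", "/opt/hyhal"]


def missing_host_paths(paths=REQUIRED_HOST_PATHS):
    """Return the subset of `paths` that does not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_host_paths()
    if missing:
        print("Missing host paths:", ", ".join(missing))
    else:
        print("All required host paths are present.")
```

An empty result only shows that the paths exist; it does not verify driver versions or permissions.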

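## Pretrained Weights
**Download the model that matches your DCU model, as listed in the table below. FP8 models are supported only on BW1100/BW1101; do not use them on other models!**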

| Model | Weight Size | Data Type | Supported DCU Models | Minimum Cards Required | Download |
|:-----:|:----------:|:----------:|:----------:|:---------------------:|:----------:|
| Qwen3-Reranker-0.6B | 0.6B | BF16 | K100AI | 1 | [HuggingFace](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) |
| Qwen3-Reranker-4B | 4B | BF16 | K100AI | 1 | [HuggingFace](https://huggingface.co/Qwen/Qwen3-Reranker-4B) |
| Qwen3-Reranker-8B | 8B | BF16 | K100AI | 1 | [HuggingFace](https://huggingface.co/Qwen/Qwen3-Reranker-8B) |

## Dataset
`None yet`

## Training
`None yet`

## Inference
### vLLM
#### Single-Node Inference
##### offline
```bash
export VLLM_USE_NN=0
export ALLREDUCE_STREAM_WITH_COMPUTE=1
# --model_name_or_path: path to the model weights
python infer_vllm.py --model_name_or_path /path/your_model_path/
```

##### serve
1. Start the service:
```bash
export VLLM_USE_NN=0
export ALLREDUCE_STREAM_WITH_COMPUTE=1

vllm serve Qwen/Qwen3-Reranker-0.6B \
    --max-model-len 4096 \
    --block-size 16 \
    --trust-remote-code \
    --enforce-eager \
    --enable-prefix-caching \
    --served-model-name Qwen3-reranker \
    --task score \
    --disable-log-requests \
    --hf_overrides '{"architectures":["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
```

2. Test command:
```bash
curl http://127.0.0.1:8000/score   \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "text_1": "ping",
        "text_2": "pong",
        "model": "Qwen3-reranker"
    }'
```
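
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the service from step 1 is listening on `127.0.0.1:8000` under the served model name `Qwen3-reranker`. The helper names are hypothetical, and the response layout (a `data` list whose items carry `index` and `score` fields) is an assumption about the score endpoint that should be checked against an actual response.

```python
import json
import urllib.request

SCORE_URL = "http://127.0.0.1:8000/score"  # same endpoint as the curl command


def build_score_request(text_1, text_2, model="Qwen3-reranker", url=SCORE_URL):
    """Build a POST request carrying the same JSON body as the curl example."""
    body = json.dumps({"text_1": text_1, "text_2": text_2, "model": model})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def extract_scores(response):
    """Pull the scores out of a decoded JSON response, ordered by index.

    Assumption: the response looks like
    {"data": [{"index": 0, "score": 0.5}, ...]}.
    """
    items = sorted(response.get("data", []), key=lambda item: item.get("index", 0))
    return [item["score"] for item in items]


def score(text_1, text_2, model="Qwen3-reranker", url=SCORE_URL, timeout=30):
    """Send the request to a running server and return the list of scores."""
    request = build_score_request(text_1, text_2, model=model, url=url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_scores(json.loads(reply.read().decode("utf-8")))
```

With the server running, `score("ping", "pong")` sends the same payload as the curl command. The request builder and the response parser are kept separate so that each can be checked without a live server.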

## Results
<div align=center>
    <img src="./doc/results-dcu.png"/>
</div>

### Accuracy
`Accuracy on DCU is consistent with GPU. Inference framework: vLLM.`
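
One way to check this on your own inputs is to score the same query-document pairs on both platforms and compare the results within a tolerance. The sketch below is hypothetical: the function name and the default tolerance are assumptions chosen for illustration, not values taken from this repository.

```python
import math


def scores_match(dcu_scores, gpu_scores, abs_tol=1e-2):
    """Return True when both score lists have equal length and agree pairwise."""
    if len(dcu_scores) != len(gpu_scores):
        return False
    return all(
        math.isclose(d, g, rel_tol=0.0, abs_tol=abs_tol)
        for d, g in zip(dcu_scores, gpu_scores)
    )
```

An absolute tolerance is used because relevance scores lie in a bounded range; a suitable threshold depends on the data type and the workload.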

## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/modelzoo/qwen3-reranker

## References
- http://github.com/QwenLM/Qwen3-Embedding