# Why hf_overrides is needed for the official original version:
# vLLM converts the original checkpoint to Qwen3ForSequenceClassification
# when loading it, for better performance.
# - First, we need to use `"architectures": ["Qwen3ForSequenceClassification"]`
#   to manually route to Qwen3ForSequenceClassification.
# - Second, we extract the vectors corresponding to classifier_from_token
#   from lm_head using `"classifier_from_token": ["no", "yes"]`.
# - Third, we convert these two vectors into one vector. This conversion
#   is enabled by `"is_original_qwen3_reranker": True`.
# Please use the query_template and document_template to format the query and
# document for better reranker results.
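# Why two lm_head vectors can become one: a sketch of the math behind the
# third step above, not vLLM's actual loader code. Assume the original
# reranker scores a pair as the softmax probability of "yes" over the two
# logits h.w_no and h.w_yes (h is the final hidden state). Because
#   softmax([a, b])[1] == 1 / (1 + exp(a - b)) == sigmoid(b - a),
# the same score comes from one classifier row w_yes - w_no followed by a
# sigmoid. The helper names below are hypothetical and use plain lists.
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def two_way_softmax_yes(hidden, w_no, w_yes):
    """P("yes") from a softmax over the "no"/"yes" logits (original form)."""
    logit_no, logit_yes = dot(hidden, w_no), dot(hidden, w_yes)
    m = max(logit_no, logit_yes)  # subtract the max for numerical stability
    e_no, e_yes = math.exp(logit_no - m), math.exp(logit_yes - m)
    return e_yes / (e_no + e_yes)


def folded_score(hidden, w_no, w_yes):
    """Same probability from a single folded row (w_yes - w_no) plus a sigmoid."""
    w = [y - n for y, n in zip(w_yes, w_no)]
    x = dot(hidden, w)
    # Numerically stable sigmoid for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)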
prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
"Given a web search query, retrieve relevant passages that answer the query"
)
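# A minimal sketch of formatting one query-document pair with a prefix, an
# instruction, and a suffix before scoring. The template layout
# ("<Instruct>: ... <Query>: ..." and "<Document>: ...") and the `suffix`
# argument are assumptions for illustration; check the Qwen3-Reranker model
# card for the exact strings. `format_pair` is a hypothetical helper, not
# part of vLLM.
def format_pair(query, document, instruction, prefix, suffix):
    """Return the (text_1, text_2) strings for one query-document pair."""
    # Query side: system prefix, then the task instruction and the query.
    text_1 = f"{prefix}<Instruct>: {instruction}\n<Query>: {query}\n"
    # Document side: the document, then whatever closes the chat turn.
    text_2 = f"<Document>: {document}{suffix}"
    return text_1, text_2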
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
# Corresponding documents to be scored against each query
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
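# How query-document pairs are formed: a sketch, assuming the pairing rules
# of vLLM's scoring API, where a single text_1 is broadcast against every
# text_2 (1:N) and two equal-length lists are paired index by index (N:N).
# `make_pairs` is hypothetical and only illustrates that rule.
def make_pairs(text_1, text_2):
    """Return the list of (query, document) pairs that would be scored."""
    if isinstance(text_2, str):
        text_2 = [text_2]  # a single document becomes a one-item list
    if isinstance(text_1, str):
        text_1 = [text_1] * len(text_2)  # 1:N, broadcast the single query
    if len(text_1) != len(text_2):
        raise ValueError("text_1 and text_2 must have the same length")
    return list(zip(text_1, text_2))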
# Initialize the LLM model with the original Qwen3-Reranker configuration
llm = get_llm()
# Compute relevance scores for each query-document pair.
# score() returns one relevance score per pair.
outputs = llm.score(queries, documents)
# Request payload for the score API
data = {
    "model": "Qwen/Qwen3-Reranker-0.6B",
    "text_1": queries,
    "text_2": documents,
}
def main():
    """Main function to send a score request to the vLLM server.

    This function sends a POST request to the /score endpoint with
    the query and documents, then prints the relevance scores.
    """
    # Send POST request to the vLLM server's score endpoint
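# A minimal sketch of the request main() is meant to send, using only the
# standard library (urllib) rather than any particular HTTP client. The base
# URL (a local server on port 8000, vLLM's usual default) and the response
# shape {"data": [{"index": ..., "score": ...}, ...]} are assumptions.
# `build_score_request` and `extract_scores` are hypothetical helpers.
import json
import urllib.request


def build_score_request(payload, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request to the /score endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_scores(response_json):
    """Return the scores ordered by pair index from the assumed response body."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["score"] for item in items]


# Usage (requires a running server):
#   request = build_score_request(data)
#   with urllib.request.urlopen(request) as response:
#       print(extract_scores(json.load(response)))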