Commit af295e9b, authored by Cyrus Leung, committed by GitHub

[Bugfix] Update `--hf-overrides` for `Alibaba-NLP/gte-Qwen2` (#14609)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent a1c8f379
@@ -541,14 +541,11 @@ You should manually set mean pooling by passing `--override-pooler-config '{"poo
 :::
 :::{note}
-Unlike base Qwen2, `Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
-You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
-On the other hand, its 1.5B variant (`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
-despite being described otherwise on its model card.
-Regardless of the variant, you need to enable `--trust-remote-code` for the correct tokenizer to be
-loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
+The HF implementation of `Alibaba-NLP/gte-Qwen2-1.5B-instruct` is hardcoded to use causal attention despite what is shown in `config.json`. To compare vLLM vs HF results,
+you should set `--hf-overrides '{"is_causal": true}'` in vLLM so that the two implementations are consistent with each other.
+For both the 1.5B and 7B variants, you also need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
+See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
 :::
 If your model is not in the above list, we will try to automatically convert the model using
...
@@ -42,8 +42,8 @@ def test_models(
     if model == "ssmits/Qwen2-7B-Instruct-embed-base":
         vllm_extra_kwargs["override_pooler_config"] = \
             PoolerConfig(pooling_type="MEAN")
-    if model == "Alibaba-NLP/gte-Qwen2-7B-instruct":
-        vllm_extra_kwargs["hf_overrides"] = {"is_causal": False}
+    if model == "Alibaba-NLP/gte-Qwen2-1.5B-instruct":
+        vllm_extra_kwargs["hf_overrides"] = {"is_causal": True}
     # The example_prompts has ending "\n", for example:
     # "Write a short story about a robot that dreams for the first time.\n"
...
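For readers applying the updated note from a script, here is a minimal sketch of how the flags could be assembled per model. The helper `build_serve_command` and the `HF_OVERRIDES` table are hypothetical names invented for illustration and are not part of vLLM or this commit. The sketch also assumes the server is launched with `vllm serve`. Only `--hf-overrides` and `--trust-remote-code` come from the diff above.

```python
import json
import shlex

# Hypothetical table mirroring the updated test: only the 1.5B variant
# gets is_causal forced to True so that vLLM matches the HF implementation.
HF_OVERRIDES = {
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct": {"is_causal": True},
}


def build_serve_command(model: str) -> list[str]:
    """Assemble an argv list for launching the given model (illustrative only)."""
    # The note says both variants need --trust-remote-code for the tokenizer.
    argv = ["vllm", "serve", model, "--trust-remote-code"]
    overrides = HF_OVERRIDES.get(model)
    if overrides:
        # json.dumps emits lowercase `true`, the JSON form the flag expects.
        argv += ["--hf-overrides", json.dumps(overrides)]
    return argv


if __name__ == "__main__":
    # shlex.join adds the shell quoting around the JSON argument.
    print(shlex.join(build_serve_command("Alibaba-NLP/gte-Qwen2-1.5B-instruct")))
```

Building the JSON with `json.dumps` rather than by hand avoids the common slip of writing Python's `True` where the flag needs JSON's `true`. With this table, the 7B variant gets no `--hf-overrides` argument at all.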