[Bugfix] Update `--hf-overrides` for `Alibaba-NLP/gte-Qwen2` (#14609)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Unverified commit af295e9b, authored Mar 11, 2025 by Cyrus Leung and committed by GitHub on Mar 11, 2025.
Showing 6 additions and 9 deletions across 2 files:

docs/source/models/supported_models.md +4 -7

tests/models/embedding/language/test_embedding.py +2 -2

--- a/docs/source/models/supported_models.md
+++ b/docs/source/models/supported_models.md
@@ -541,14 +541,11 @@ You should manually set mean pooling by passing `--override-pooler-config '{"poo
 :::
 :::{note}
-Unlike base Qwen2, `Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
-You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
-On the other hand, its 1.5B variant (`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
-despite being described otherwise on its model card.
-Regardless of the variant, you need to enable `--trust-remote-code` for the correct tokenizer to be
-loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
+The HF implementation of `Alibaba-NLP/gte-Qwen2-1.5B-instruct` is hardcoded to use causal attention despite what is shown in `config.json`. To compare vLLM vs HF results,
+you should set `--hf-overrides '{"is_causal": true}'` in vLLM so that the two implementations are consistent with each other.
+For both the 1.5B and 7B variants, you also need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
+See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
 :::
 If your model is not in the above list, we will try to automatically convert the model using

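The note above turns on whether the model uses causal or bi-directional attention. As a toy illustration (plain Python, not vLLM or HF code; `attention_mask` and `masked_attention` are hypothetical names), the sketch below shows what an `is_causal`-style switch changes: under a causal mask each position sees only itself and earlier positions, while a bi-directional mask lets every position see the whole sequence. It assumes a single head, scalar values, and uniform scores, so each output is simply the average of the values a position can see.

```python
import math


def attention_mask(seq_len: int, is_causal: bool) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j."""
    return [
        [(j <= i) if is_causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


def masked_attention(scores: list[list[float]], values: list[float],
                     mask: list[list[bool]]) -> list[float]:
    """Toy single-head attention over scalar values: masked softmax, then weighted sum."""
    outputs = []
    for row_scores, row_mask in zip(scores, mask):
        visible = [j for j, allowed in enumerate(row_mask) if allowed]
        # Subtract the largest visible score for numerical stability.
        peak = max(row_scores[j] for j in visible)
        weights = {j: math.exp(row_scores[j] - peak) for j in visible}
        total = sum(weights.values())
        outputs.append(sum(weights[j] / total * values[j] for j in visible))
    return outputs


values = [1.0, 2.0, 3.0]
# Equal scores everywhere, so attention reduces to averaging the visible values.
uniform_scores = [[0.0] * 3 for _ in range(3)]

causal = masked_attention(uniform_scores, values, attention_mask(3, is_causal=True))
bidirectional = masked_attention(uniform_scores, values, attention_mask(3, is_causal=False))
print(causal)         # each position averages itself and earlier positions only
print(bidirectional)  # every position averages the whole sequence
```

Even in this one-layer toy, the outputs for earlier positions differ between the two masks, so embeddings from two implementations are only comparable when both apply the same mask. That is why the note asks for `--hf-overrides '{"is_causal": true}'` when comparing vLLM against the causal HF implementation.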
--- a/tests/models/embedding/language/test_embedding.py
+++ b/tests/models/embedding/language/test_embedding.py
@@ -42,8 +42,8 @@ def test_models(
     if model == "ssmits/Qwen2-7B-Instruct-embed-base":
         vllm_extra_kwargs["override_pooler_config"] = \
             PoolerConfig(pooling_type="MEAN")
-    if model == "Alibaba-NLP/gte-Qwen2-7B-instruct":
-        vllm_extra_kwargs["hf_overrides"] = {"is_causal": False}
+    if model == "Alibaba-NLP/gte-Qwen2-1.5B-instruct":
+        vllm_extra_kwargs["hf_overrides"] = {"is_causal": True}
     # The example_prompts has ending "\n", for example:
     # "Write a short story about a robot that dreams for the first time.\n"
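The test passes `hf_overrides` as a Python dict (`{"is_causal": True}`), whereas the docs pass the same override on the command line as a JSON string (`'{"is_causal": true}'`). The following standard-library sketch shows how the dict maps to the CLI form. `build_serve_command` is a hypothetical helper, and it assumes the `vllm serve <model>` entry point together with the `--trust-remote-code` and `--hf-overrides` flags named in the docs note.

```python
import json
import shlex


def build_serve_command(model: str, hf_overrides: dict) -> str:
    """Hypothetical helper: render a `vllm serve` command line for a model.

    json.dumps turns Python's True into JSON's lowercase true, matching the
    JSON form shown in the docs for --hf-overrides.
    """
    args = [
        "vllm", "serve", model,
        # Per the docs note, needed for both gte-Qwen2 variants so the correct tokenizer loads.
        "--trust-remote-code",
        "--hf-overrides", json.dumps(hf_overrides),
    ]
    # shlex.join single-quotes the JSON argument so the shell passes it through intact.
    return shlex.join(args)


print(build_serve_command("Alibaba-NLP/gte-Qwen2-1.5B-instruct", {"is_causal": True}))
# vllm serve Alibaba-NLP/gte-Qwen2-1.5B-instruct --trust-remote-code --hf-overrides '{"is_causal": true}'
```

Serializing through `json.dumps` rather than writing the string by hand avoids the easy mistake of passing Python's `True` where the CLI expects JSON.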