[Doc] Link to RFC for pooling optimizations (#21806)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Commit a2480251 authored Jul 29, 2025 by Cyrus Leung, committed by GitHub Jul 28, 2025

Showing with 3 additions and 3 deletions

docs/models/pooling_models.md +3 -3

--- a/docs/models/pooling_models.md
+++ b/docs/models/pooling_models.md
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 !!! note
-    We currently support pooling models primarily as a matter of convenience.
+    We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
-    As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-    pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+    We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
 ## Configuration