Merge tag 'v0.18.1rc0' into v0.18.1rc0-ori

0da93439 · zhuwenwen · 25f2f756 · 298e5108
Commit 0da93439 authored Mar 26, 2026 by zhuwenwen
20 changed files
--- a/docs/models/pooling_models/token_classify.md
+++ b/docs/models/pooling_models/token_classify.md
--- a/docs/models/pooling_models/token_embed.md
+++ b/docs/models/pooling_models/token_embed.md
--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
--- a/docs/serving/expert_parallel_deployment.md
+++ b/docs/serving/expert_parallel_deployment.md
@@ -23,7 +23,6 @@ vLLM provides multiple communication backends for EP. Use `--all2all-backend` to
 | `deepep_low_latency` | Multi-node decode | CUDA graph support, masked layout, optimized for decode | Decode-dominated workloads, low-latency scenarios |
 | `flashinfer_nvlink_one_sided` | MNNVL systems | FlashInfer's one-sided A2A strategy for multi-node NVLink | High-throughput workloads |
 | `flashinfer_nvlink_two_sided` | MNNVL systems | FlashInfer's two-sided A2A strategy for multi-node NVLink | Systems with NVLink across nodes |
-| `naive` | Testing/debugging | Simple broadcast-based implementation | Debugging, not recommended for production |

 ## Single Node Deployment


--- a/docs/serving/offline_inference.md
+++ b/docs/serving/offline_inference.md
@@ -16,7 +16,7 @@ After initializing the `LLM` instance, use the available APIs to perform model i
 The available APIs depend on the model type:

 - [Generative models](../models/generative_models.md) output logprobs which are sampled from to obtain the final output text.
- [Pooling models](../models/pooling_models.md) output their hidden states directly.
+- [Pooling models](../models/pooling_models/README.md) output their hidden states directly.

 !!! info
    [API Reference](../api/README.md#offline-inference)

--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
--- a/docs/training/async_rl.md
+++ b/docs/training/async_rl.md
--- a/docs/training/rlhf.md
+++ b/docs/training/rlhf.md
--- a/docs/training/weight_transfer/README.md
+++ b/docs/training/weight_transfer/README.md
--- a/docs/training/weight_transfer/base.md
+++ b/docs/training/weight_transfer/base.md
--- a/docs/training/weight_transfer/ipc.md
+++ b/docs/training/weight_transfer/ipc.md
--- a/docs/training/weight_transfer/nccl.md
+++ b/docs/training/weight_transfer/nccl.md
--- a/examples/offline_inference/audio_language.py
+++ b/examples/offline_inference/audio_language.py
--- a/examples/offline_inference/rlhf.py
+++ b/examples/offline_inference/rlhf.py
--- a/examples/offline_inference/rlhf_colocate.py
+++ b/examples/offline_inference/rlhf_colocate.py
--- a/examples/offline_inference/rlhf_online_quant.py
+++ b/examples/offline_inference/rlhf_online_quant.py
--- a/examples/offline_inference/rlhf_utils.py
+++ b/examples/offline_inference/rlhf_utils.py
--- a/examples/online_serving/openai_chat_completion_client_for_multimodal.py
+++ b/examples/online_serving/openai_chat_completion_client_for_multimodal.py
--- a/examples/online_serving/openai_realtime_client.py
+++ b/examples/online_serving/openai_realtime_client.py
--- a/examples/online_serving/openai_realtime_microphone_client.py
+++ b/examples/online_serving/openai_realtime_microphone_client.py