@@ -34,10 +34,10 @@ To enable the DBO system pass in the `--enable-dbo` argument to your vllm serve
* `--dbo-decode-token-threshold`: the minimum number of tokens in a decode-only batch required to enable DBO for that batch
* `--dbo-prefill-token-threshold`: the minimum number of tokens in a batch containing at least one prefill required to enable DBO for that batch
Currently, DBO is only supported with DeepEP, so DeepEP must be installed and the `VLLM_ALL2ALL_BACKEND` environment variable must be set to `deepep_low_latency` if your workload is primarily decode requests, or `deepep_high_throughput` if your workload is primarily prefill requests.
Currently, DBO is only supported with DeepEP, so DeepEP must be installed and the `--all2all-backend` argument must be set to `deepep_low_latency` if your workload is primarily decode requests, or `deepep_high_throughput` if your workload is primarily prefill requests.
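The backend choice described above can be sketched as a small shell snippet. This is only an illustration that assumes the `--all2all-backend` form of the setting; `WORKLOAD` and `DBO_FLAGS` are hypothetical variable names introduced here, not vLLM settings, and the snippet only prints the flags rather than starting a server.

```shell
# WORKLOAD is a hypothetical variable for this sketch, not a vLLM setting.
# It defaults to "decode" when unset.
WORKLOAD="${WORKLOAD:-decode}"

case "$WORKLOAD" in
  prefill) BACKEND="deepep_high_throughput" ;;  # workload is primarily prefill requests
  *)       BACKEND="deepep_low_latency" ;;      # workload is primarily decode requests
esac

# Collect the DBO-related flags to append to your vllm serve command.
DBO_FLAGS="--enable-dbo --all2all-backend $BACKEND"
echo "$DBO_FLAGS"
```

Setting `WORKLOAD=prefill` before running the snippet switches the printed backend to `deepep_high_throughput`; any other value falls back to the low-latency backend.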
Below is a command that will spin up a two DP rank server with expert parallelism and DBO enabled.
@@ -14,13 +14,16 @@ Before using EP, you need to install the necessary dependencies. We are actively
### Backend Selection Guide
vLLM provides three communication backends for EP:
vLLM provides multiple communication backends for EP. Use `--all2all-backend` to select one:
| Backend | Use Case | Features | Best For |
|---------|----------|----------|----------|
| `pplx` | Single node | Chunked prefill support | Development, best for intra-node deployments |
| `allgather_reducescatter` | Default backend | Standard all2all using allgather/reducescatter primitives | General purpose, works with any EP+DP configuration |