<img src="../assets/img/disagg-same-node.svg" alt="Same-node RDMA communication between prefill and decode pods" />
</Frame>
**Options (best to worst):**
1. InfiniBand RDMA with GPUDirect → GPU-to-GPU, bypasses CPU
2. RoCE RDMA with GPUDirect → GPU-to-GPU, bypasses CPU
3. Host-staged RDMA → GPU→CPU→RDMA→CPU→GPU
4. TCP (fallback) → GPU→CPU→TCP→CPU→GPU

**Best Practice**: Use RDMA even for same-node communication. The overhead is minimal, and behavior stays consistent whether the pods land on the same node or on different nodes.
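
The preference order above can be read as a simple fallback chain. The following is a minimal sketch of that selection logic, not the project's actual implementation: `NodeCapabilities`, its fields, and `pick_transport` are hypothetical names introduced for illustration, and the sketch assumes that an RDMA-capable NIC without GPUDirect falls back to host staging.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    # Ordered best to worst, mirroring the options list above.
    IB_GPUDIRECT = "InfiniBand RDMA with GPUDirect"
    ROCE_GPUDIRECT = "RoCE RDMA with GPUDirect"
    HOST_STAGED_RDMA = "Host-staged RDMA"
    TCP = "TCP (fallback)"


@dataclass(frozen=True)
class NodeCapabilities:
    """Hypothetical capability flags; how they are detected is out of scope."""
    has_infiniband: bool = False
    has_roce: bool = False
    has_gpudirect: bool = False


def pick_transport(caps: NodeCapabilities) -> Transport:
    """Return the best transport the node supports, falling back step by step."""
    if caps.has_infiniband and caps.has_gpudirect:
        return Transport.IB_GPUDIRECT
    if caps.has_roce and caps.has_gpudirect:
        return Transport.ROCE_GPUDIRECT
    if caps.has_infiniband or caps.has_roce:
        # Assumption: RDMA NIC present but no GPUDirect, so data is staged via host memory.
        return Transport.HOST_STAGED_RDMA
    return Transport.TCP


if __name__ == "__main__":
    caps = NodeCapabilities(has_roce=True, has_gpudirect=True)
    print(pick_transport(caps).value)
```

Because the function depends only on node capabilities and not on pod placement, the same choice applies whether the prefill and decode pods share a node or not, which is the consistency the best practice is after.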
...
...
@@ -177,24 +110,9 @@ When prefill and decode workers are on the **same physical node**:
When prefill and decode workers are on **different nodes**: