***Notice: We will package all deployment files into Helm Chart format in the near future. Interested community members can contact us to contribute***
At this point, we have completed the deployment of the 1P1D SGlang engine part.
To allow our users to directly experience the model API, we still need a load balancer to handle sequential calls between prefill and decode. Different companies implement LBs differently, and the community will also officially release a new LB component written in Rust in the near future.
Currently, we use a static K8S service + minilb approach to implement model API calls.
1. The current deployment startup parameters may not be fully compatible with all RDMA scenarios. Different RDMA NCCL-related environment configurations may be needed in different network environments.
2. Some preset, optimized configurations for EPLB are not used here. You can adjust them according to [6017](https://github.com/sgl-project/sglang/issues/6017) as needed.