[k8s] Clarified the usage of shared memory. (#4341)

Unverified commit f60f2931, authored Mar 27, 2025 by Jiří Suchomel, committed by GitHub Mar 27, 2025

Showing with 7 additions and 0 deletions

docker/k8s-sglang-service.yaml +6 -0

docs/backend/server_arguments.md +1 -0

--- a/docker/k8s-sglang-service.yaml
+++ b/docker/k8s-sglang-service.yaml
@@ -39,6 +39,8 @@ spec:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
+            - name: shm
+              mountPath: /dev/shm
            - name: hf-cache
              mountPath: /root/.cache/huggingface
              readOnly: true
@@ -52,6 +54,10 @@ spec:
            initialDelaySeconds: 30
            periodSeconds: 10
      volumes:
+        - name: shm
+          emptyDir:
+            medium: Memory
+            sizeLimit: 10Gi
        - name: hf-cache
          hostPath:
            path: /root/.cache/huggingface
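
One way to confirm that the `shm` volume added above is actually visible to the server process is to check the size of `/dev/shm` from inside the container. The following is only a minimal sketch: the helper name `shm_report` and the 10 GiB default (mirroring the `sizeLimit: 10Gi` in the manifest) are assumptions for illustration, not part of SGLang. Depending on the Kubernetes version, the reported tmpfs size may not match `sizeLimit` exactly.

```python
import shutil


def shm_report(path="/dev/shm", min_bytes=10 * 1024**3):
    """Return (total_bytes, ok) for the filesystem mounted at `path`.

    `ok` is True when the filesystem is at least `min_bytes` large.
    Both the helper and its default threshold are hypothetical.
    """
    total = shutil.disk_usage(path).total
    return total, total >= min_bytes
```

Running `shm_report()` in a Python shell inside the pod (or inside a Docker container started with `--shm-size`) returns the total size in bytes and whether it meets the chosen threshold.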

--- a/docs/backend/server_arguments.md
+++ b/docs/backend/server_arguments.md
@@ -21,6 +21,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 ```

 - See [hyperparameter tuning](hyperparameter_tuning.md) on tuning hyperparameters for better performance.
+- For Docker and Kubernetes runs, you need to set up shared memory, which is used for communication between processes. See `--shm-size` for Docker and the `/dev/shm` size setting for Kubernetes manifests.
 - If you see out-of-memory errors during prefill for long prompts, try to set a smaller chunked prefill size.

 ```bash