Unverified Commit 053ac33e authored by ishandhanani's avatar ishandhanani Committed by GitHub
Browse files

fix: readme instructions for worker running (#2266)

parent dbb4caaf
...@@ -115,11 +115,11 @@ Dynamo provides a simple way to spin up a local set of inference components incl ...@@ -115,11 +115,11 @@ Dynamo provides a simple way to spin up a local set of inference components incl
``` ```
# Start an OpenAI compatible HTTP server, a pre-processor (prompt templating and tokenization) and a router: # Start an OpenAI compatible HTTP server, a pre-processor (prompt templating and tokenization) and a router:
python -m dynamo.frontend [--http-port 8080] python -m dynamo.frontend --http-port 8080
# Start the SGLang engine, connecting to NATS and etcd to receive requests. You can run several of these, # Start the SGLang engine, connecting to NATS and etcd to receive requests. You can run several of these,
# both for the same model and for multiple models. The frontend node will discover them. # both for the same model and for multiple models. The frontend node will discover them.
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B python -m dynamo.sglang.worker --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --skip-tokenizer-init
``` ```
#### Send a Request #### Send a Request
......
...@@ -67,8 +67,6 @@ docker run \ ...@@ -67,8 +67,6 @@ docker run \
```bash ```bash
# run ingress # run ingress
python3 -m dynamo.frontend --http-port=8000 & python3 -m dynamo.frontend --http-port=8000 &
# optionally run the http server that allows you to flush the kv cache for all workers (see benchmarking section below)
python3 utils/sgl_http_server.py --ns dynamo &
# run prefill worker # run prefill worker
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=2048 \ SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=2048 \
MC_TE_METRIC=true \ MC_TE_METRIC=true \
...@@ -82,7 +80,7 @@ NCCL_CUMEM_ENABLE=1 \ ...@@ -82,7 +80,7 @@ NCCL_CUMEM_ENABLE=1 \
SGLANG_USE_MESSAGE_QUEUE_BROADCASTER=0 \ SGLANG_USE_MESSAGE_QUEUE_BROADCASTER=0 \
SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK=1 \ SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK=1 \
PYTHONUNBUFFERED=1 \ PYTHONUNBUFFERED=1 \
python3 components/worker.py \ python3 -m dynamo.sglang.worker \
--served-model-name deepseek-ai/DeepSeek-R1 \ --served-model-name deepseek-ai/DeepSeek-R1 \
--model-path /model/ \ --model-path /model/ \
--skip-tokenizer-init \ --skip-tokenizer-init \
...@@ -90,7 +88,6 @@ python3 components/worker.py \ ...@@ -90,7 +88,6 @@ python3 components/worker.py \
--disaggregation-mode prefill \ --disaggregation-mode prefill \
--dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \ --dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
--disaggregation-bootstrap-port 30001 \ --disaggregation-bootstrap-port 30001 \
--disaggregation-transfer-backend nixl \
--nnodes 2 \ --nnodes 2 \
--node-rank 0 \ --node-rank 0 \
--tp-size 8 \ --tp-size 8 \
...@@ -134,7 +131,7 @@ NCCL_CUMEM_ENABLE=1 \ ...@@ -134,7 +131,7 @@ NCCL_CUMEM_ENABLE=1 \
SGLANG_USE_MESSAGE_QUEUE_BROADCASTER=0 \ SGLANG_USE_MESSAGE_QUEUE_BROADCASTER=0 \
SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK=1 \ SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK=1 \
PYTHONUNBUFFERED=1 \ PYTHONUNBUFFERED=1 \
python3 components/decode_worker.py \ python3 -m dynamo.sglang.decode_worker \
--served-model-name deepseek-ai/DeepSeek-R1 \ --served-model-name deepseek-ai/DeepSeek-R1 \
--model-path /model/ \ --model-path /model/ \
--skip-tokenizer-init \ --skip-tokenizer-init \
......
...@@ -94,7 +94,6 @@ if [ "$mode" = "prefill" ]; then ...@@ -94,7 +94,6 @@ if [ "$mode" = "prefill" ]; then
--disaggregation-mode prefill \ --disaggregation-mode prefill \
--dist-init-addr "$HOST_IP:$PORT" \ --dist-init-addr "$HOST_IP:$PORT" \
--disaggregation-bootstrap-port 30001 \ --disaggregation-bootstrap-port 30001 \
--disaggregation-transfer-backend nixl \
--nnodes "$TOTAL_NODES" \ --nnodes "$TOTAL_NODES" \
--node-rank "$RANK" \ --node-rank "$RANK" \
--tp-size "$TOTAL_GPUS" \ --tp-size "$TOTAL_GPUS" \
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment