Unverified commit 2becce56 authored by Ryan McCormick, committed by GitHub

chore: Add SERVED_MODEL_NAME for consistent model name regardless of MODEL_PATH (#1632)

parent 57f5725d
@@ -99,6 +99,13 @@ export MOUNTS="${PWD}:/mnt"
 # https://huggingface.co/deepseek-ai/DeepSeek-R1
 export MODEL_PATH="nvidia/DeepSeek-R1-FP4"
+# The name the model will be served/queried under, matching what's
+# returned by the /v1/models endpoint.
+#
+# By default this is inferred from MODEL_PATH, but when using locally downloaded
+# model weights, it can be nice to have explicit control over the name.
+export SERVED_MODEL_NAME="nvidia/DeepSeek-R1-FP4"
 # NOTE: This path assumes you have mounted the config file into /mnt inside
 # the container. See the MOUNTS variable in srun_script.sh
 export ENGINE_CONFIG="/mnt/agg_DEP16_dsr1.yaml"
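The new comment ties SERVED_MODEL_NAME to the IDs reported by `/v1/models`. A minimal sketch for checking that match, assuming the frontend returns the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`) and that `python3` is available for JSON parsing. The helper names `list_model_ids` and `served_name_is_listed` are hypothetical and not part of this commit:

```bash
#!/usr/bin/env bash

# Print one model ID per line from an OpenAI-style /v1/models response on stdin.
# Assumes the {"data": [{"id": ...}, ...]} shape; python3 only parses the JSON.
list_model_ids() {
  python3 -c 'import json, sys; sys.stdout.write("".join(m["id"] + "\n" for m in json.load(sys.stdin).get("data", [])))'
}

# Succeed only if SERVED_MODEL_NAME exactly matches one of the listed IDs.
# -x forces a whole-line match, -F treats the name literally (no regex).
served_name_is_listed() {
  list_model_ids | grep -qxF -- "${SERVED_MODEL_NAME}"
}
```

For example, with `HOST` and `PORT` set as in the `curl` verification step: `curl -s "${HOST}:${PORT}/v1/models" | served_name_is_listed && echo "served as ${SERVED_MODEL_NAME}"`.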
@@ -148,7 +155,7 @@ export ENGINE_CONFIG="/mnt/agg_DEP16_dsr1.yaml"
 4. After the model fully finishes loading on all ranks, the worker will register itself,
 and the OpenAI frontend will detect it, signaled by this output:
 ```
-0: 2025-06-13T02:46:35.040Z INFO dynamo_llm::discovery::watcher: added model model_name="Deepseek-R1-FP4"
+0: 2025-06-13T02:46:35.040Z INFO dynamo_llm::discovery::watcher: added model model_name="nvidia/DeepSeek-R1-FP4"
 ```
 5. At this point, with the worker fully initialized and detected by the frontend,
 it is now ready for inference.
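Step 4 relies on spotting the `added model` line by eye. A minimal polling sketch, assuming the frontend's output is redirected to a log file and that the line keeps the format shown above (`model_name="..."`). `wait_for_model`, its arguments, and the one-second poll interval are illustrative assumptions:

```bash
#!/usr/bin/env bash

# Poll a log file until the frontend reports the given model, or time out.
# Args: log file, expected model name, timeout in seconds (default 600).
wait_for_model() {
  local log_file="$1" model_name="$2" timeout_s="${3:-600}"
  local pattern="added model model_name=\"${model_name}\""
  local waited=0
  # -F matches the pattern literally, so '/' and '.' in the name are safe.
  until grep -qF -- "${pattern}" "${log_file}" 2>/dev/null; do
    if (( waited >= timeout_s )); then
      echo "ERROR: ${model_name} not registered after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "Model ${model_name} is registered."
}
```

For example, `wait_for_model frontend.log "${SERVED_MODEL_NAME}" 600`, where `frontend.log` is a hypothetical path. Because SERVED_MODEL_NAME now fixes the name the frontend reports, the expected name can come from that variable instead of being hard-coded.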
@@ -161,11 +168,11 @@ To verify the deployed model is working, send a `curl` request:
 # NOTE: $HOST assumes running on head node, but can be changed to $HEAD_NODE_IP instead.
 HOST=localhost
 PORT=8000
-MODEL=Deepseek-R1-FP4
+# "model" here should match the model name returned by the /v1/models endpoint
 curl -w "%{http_code}" ${HOST}:${PORT}/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "'${MODEL}'",
+"model": "'${SERVED_MODEL_NAME}'",
 "messages": [
 {
 "role": "user",
...
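The updated request reads SERVED_MODEL_NAME directly, so it only works in a shell where that variable has been exported; in a fresh terminal the body would carry an empty `"model": ""`. A minimal sketch that builds an equivalent body with `printf` and refuses to run when the variable is missing. `build_chat_request` is a hypothetical helper, and it does no JSON escaping, so names or prompts containing `"` or `\` would need a real JSON tool:

```bash
#!/usr/bin/env bash

# Print a chat-completions request body for SERVED_MODEL_NAME and one user prompt.
# Returns 1 (printing nothing on stdout) if SERVED_MODEL_NAME is unset or empty.
# NOTE: values are inserted verbatim; no JSON escaping is performed.
build_chat_request() {
  local prompt="$1"
  if [[ -z "${SERVED_MODEL_NAME:-}" ]]; then
    echo "ERROR: SERVED_MODEL_NAME is not set in this shell." >&2
    return 1
  fi
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' \
    "${SERVED_MODEL_NAME}" "${prompt}"
}
```

Chaining with `&&` keeps `curl` from running with an empty body: `payload="$(build_chat_request 'Hello!')" && curl -w "%{http_code}" "${HOST}:${PORT}/v1/chat/completions" -H "Content-Type: application/json" -d "${payload}"`.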
@@ -10,6 +10,12 @@ if [[ -z ${MODEL_PATH} ]]; then
 exit 1
 fi
+if [[ -z ${SERVED_MODEL_NAME} ]]; then
+echo "WARNING: SERVED_MODEL_NAME was not set. It will be derived from MODEL_PATH."
+fi
 if [[ -z ${ENGINE_CONFIG} ]]; then
 echo "ERROR: ENGINE_CONFIG was not set."
 echo "ERROR: ENGINE_CONFIG must be set to a valid Dynamo+TRTLLM engine config file."
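The script now mixes two kinds of checks: an `ERROR` path for MODEL_PATH and ENGINE_CONFIG, and a `WARNING` for SERVED_MODEL_NAME. A sketch of how they could be factored into helpers using bash indirect expansion (`${!name}`). `require_var` and `warn_if_unset` are hypothetical names, and unlike the original script they write to stderr:

```bash
#!/usr/bin/env bash

# Fail (return 1) if the variable named by $1 is empty, mirroring the ERROR checks.
# Optional $2 is an extra hint line.
require_var() {
  local name="$1" hint="${2:-}"
  if [[ -z "${!name:-}" ]]; then
    echo "ERROR: ${name} was not set." >&2
    [[ -n "${hint}" ]] && echo "ERROR: ${hint}" >&2
    return 1
  fi
}

# Only warn if the variable named by $1 is empty, mirroring the SERVED_MODEL_NAME check.
warn_if_unset() {
  local name="$1" hint="${2:-}"
  if [[ -z "${!name:-}" ]]; then
    echo "WARNING: ${name} was not set.${hint:+ ${hint}}" >&2
  fi
}
```

For example: `require_var ENGINE_CONFIG "ENGINE_CONFIG must be set to a valid Dynamo+TRTLLM engine config file." || exit 1`, and `warn_if_unset SERVED_MODEL_NAME "It will be derived from MODEL_PATH."`.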
@@ -23,4 +29,5 @@ fi
 trtllm-llmapi-launch \
 python3 /workspace/launch/dynamo-run/src/subprocess/trtllm_inc.py \
 --model-path "${MODEL_PATH}" \
+--model-name "${SERVED_MODEL_NAME}" \
 --extra-engine-args "${ENGINE_CONFIG}"
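As committed, the launch command always passes `--model-name "${SERVED_MODEL_NAME}"`, so when the variable is unset `trtllm_inc.py` receives an empty string; whether it treats that the same as omitting the flag is not visible in this diff. One hedged alternative is to build the arguments in a bash array and add `--model-name` only when a name is set. `build_worker_args` and the global `worker_args` array are illustrative assumptions:

```bash
#!/usr/bin/env bash

# Fill the global array worker_args with the trtllm_inc.py arguments.
# --model-name is appended only when SERVED_MODEL_NAME is non-empty, leaving
# the worker's own default naming in effect otherwise.
build_worker_args() {
  worker_args=(--model-path "${MODEL_PATH}")
  if [[ -n "${SERVED_MODEL_NAME:-}" ]]; then
    worker_args+=(--model-name "${SERVED_MODEL_NAME}")
  fi
  worker_args+=(--extra-engine-args "${ENGINE_CONFIG}")
}
```

Calling `build_worker_args` and then passing `"${worker_args[@]}"` after the `trtllm_inc.py` path keeps each value a single argument, even if it contains spaces.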