# 10MB, which allows for a larger context size: the default NATS max payload size is 1MB, and 256K tokens (each token being an int32, 4 bytes) tips over that 1MB max.
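The arithmetic behind this comment can be checked with a short sketch. It assumes "256K" means 256 × 1024 tokens and "1MB" means 1,048,576 bytes; the helper name `token_payload_bytes` is hypothetical and only illustrates the calculation.

```python
# Assumption: the NATS default max payload is 1 MiB (1,048,576 bytes).
NATS_DEFAULT_MAX_PAYLOAD = 1024 * 1024
# Token IDs are int32 values, 4 bytes each (as stated in the comment above).
BYTES_PER_TOKEN = 4


def token_payload_bytes(num_tokens: int) -> int:
    """Raw size of the token IDs alone, excluding any message framing."""
    return num_tokens * BYTES_PER_TOKEN


# Assumption: "256K tokens" means 256 * 1024 tokens.
tokens_256k = 256 * 1024
raw = token_payload_bytes(tokens_256k)

# The raw token bytes already fill the default limit, so any additional
# request metadata or framing pushes the message over it.
print(raw, raw >= NATS_DEFAULT_MAX_PAYLOAD)
```

Under these assumptions the token IDs alone occupy the entire default limit, which is why a larger setting such as 10MB leaves room for request overhead.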
| [**Prompt Embeddings**](./prompt-embeddings.md) | ✅ | Requires `--enable-prompt-embeds` flag |
### Large Scale P/D and WideEP Features
...
@@ -152,6 +153,10 @@ vLLM workers are configured through command-line arguments. Key parameters inclu
- `--is-prefill-worker`: Enable prefill-only mode for disaggregated serving
- `--metrics-endpoint-port`: Port for publishing KV metrics to Dynamo
- `--connector`: Specify which `kv_transfer_config` vLLM should use: `[nixl, lmcache, kvbm, none]`. This is a helper flag that overwrites the engine's `KVTransferConfig`.
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# Prompt Embeddings
Dynamo supports prompt embeddings (also known as prompt embeds) as a secure alternative input method to traditional text prompts. By allowing applications to use pre-computed embeddings for inference, this feature not only offers greater flexibility in prompt engineering but also significantly enhances privacy and data security. With prompt embeddings, sensitive user data can be transformed into embeddings before ever reaching the inference server, reducing the risk of exposing confidential information during the AI workflow.
/// Note that <|endoftext|> is the document separator that the model sees during training, so if a prompt is not specified the model will generate as if from the beginning of a new document.