@@ -101,7 +101,7 @@ docker compose -f deploy/docker-compose.yml up -d
## 2. Select an engine
-We publish Python wheels specialized for each of our supported engines: vllm, sglang, trtllm, and llama.cpp. The examples that follow use SGLang; continue reading for other engines.
+We publish Python wheels specialized for each of our supported engines: vllm, sglang, and trtllm. The examples that follow use SGLang; continue reading for other engines.
You can enable [request migration](/docs/architecture/request_migration.md) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker.
For example, `--migration-limit=3` allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](/docs/architecture/request_migration.md) documentation for details on how this works.
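As a rough mental model of the limit, the sketch below tries a request on successive workers and counts each hand-off after a failure as one migration. It is a toy, not Dynamo's implementation: `WorkerFailure`, `MigrationResult`, and `run_with_migration` are hypothetical names, workers are plain Python callables, and it does not model how a real engine carries partially generated output over to the new worker.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class WorkerFailure(Exception):
    """Raised by a hypothetical worker that dies while serving a request."""


@dataclass
class MigrationResult:
    output: str
    migrations: int  # how many times the request moved to another worker
    workers_tried: List[str] = field(default_factory=list)


def run_with_migration(
    prompt: str,
    workers: List[Callable[[str], str]],
    migration_limit: int,
) -> MigrationResult:
    """Try workers in order, migrating to the next one on failure.

    The first attempt is not a migration, so at most migration_limit + 1
    workers are tried before the request fails.
    """
    tried: List[str] = []
    for migrations, worker in enumerate(workers[: migration_limit + 1]):
        tried.append(worker.__name__)
        try:
            return MigrationResult(worker(prompt), migrations, tried)
        except WorkerFailure:
            continue  # migrate: hand the request to the next worker
    raise WorkerFailure(f"request failed after trying {len(tried)} worker(s)")


if __name__ == "__main__":
    def crashed(prompt: str) -> str:
        raise WorkerFailure("worker went away")

    def healthy(prompt: str) -> str:
        return prompt.upper()

    # Two failures, then success: 2 migrations, within a limit of 3.
    result = run_with_migration("hello", [crashed, crashed, healthy], migration_limit=3)
    print(result.output, result.migrations, result.workers_tried)
```

With the default limit of 0, the first failure is final, which matches the `default=0` of `--migration-limit`: migration stays off unless you opt in.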
    help=f"Dynamo endpoint string in 'dyn://namespace.component.endpoint' format. Default: {DEFAULT_ENDPOINT}",
)
parser.add_argument(
    "--model-name",
    type=str,
    default="",
    help="Name to serve the model under. Defaults to deriving it from model path.",
)
parser.add_argument(
    "--context-length",
    type=int,
    default=None,
    help="Max model context length. Defaults to the model's max, usually model_max_length from tokenizer_config.json. Reducing this reduces VRAM requirements.",
)
parser.add_argument(
    "--migration-limit",
    type=int,
    default=0,
    help="Maximum number of times a request may be migrated to a different engine worker. The number may be overridden by the engine.",
)
args = parser.parse_args()
config = Config()
config.model_path = args.model_path
if args.model_name:
    config.model_name = args.model_name
else:
    # This becomes an `Option` on the Rust side
    config.model_name = None
endpoint_str = args.endpoint.replace("dyn://", "", 1)
endpoint_parts = endpoint_str.split(".")
if len(endpoint_parts) != 3:
    logging.error(
        f"Invalid endpoint format: '{args.endpoint}'. Expected 'dyn://namespace.component.endpoint' or 'namespace.component.endpoint'."
    )
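The endpoint check above strips the `dyn://` scheme, splits on dots, and expects exactly three parts. A minimal standalone sketch of the same validation follows; `Endpoint` and `parse_endpoint` are hypothetical names, not part of Dynamo's API. It deliberately differs from the original in a few ways: it strips `dyn://` only as a prefix (the original's `replace(..., 1)` removes the first occurrence anywhere in the string), it rejects empty parts such as `ns..gen`, and it raises `ValueError` instead of logging, since what the original does after `logging.error` is not shown here.

```python
from typing import NamedTuple

SCHEME = "dyn://"


class Endpoint(NamedTuple):
    namespace: str
    component: str
    endpoint: str


def parse_endpoint(value: str) -> Endpoint:
    """Parse 'dyn://namespace.component.endpoint'; the scheme is optional."""
    # Remove the scheme only when it is a leading prefix.
    body = value[len(SCHEME):] if value.startswith(SCHEME) else value
    parts = body.split(".")
    # Require exactly three non-empty dot-separated parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Invalid endpoint format: '{value}'. Expected "
            "'dyn://namespace.component.endpoint' or 'namespace.component.endpoint'."
        )
    return Endpoint(*parts)


if __name__ == "__main__":
    print(parse_endpoint("dyn://dynamo.backend.generate"))
```

Returning a named tuple keeps the three parts addressable by name (`.namespace`, `.component`, `.endpoint`) rather than by index.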
-`dynamo-run` is a Rust binary that lets you easily run a model, explore the Dynamo components, and demonstrates the Rust API. It supports the `mistral.rs` and `llama.cpp` engines. `mistralrs` is the default engine.
+`dynamo-run` is a Rust binary that lets you easily run a model, explore the Dynamo components, and demonstrates the Rust API. It supports the `mistral.rs` engine, as well as the testing engines `echo` and `mocker`.
It is primarily for development and rapid prototyping. For production use, we recommend the Python-wrapped components; see the main project README.
...
@@ -16,7 +16,6 @@ To adjust verbosity, use `-v` to enable debug logging or `-vv` to enable full tr