@@ -411,6 +411,22 @@ To pass extra arguments to the vllm engine see [Extra engine arguments](#extra-e
...
vllm attempts to allocate enough KV cache for the full context length at startup. If that does not fit in your available memory, pass `--context-length <value>` to request a smaller context length.
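
To pick a `<value>`, a rough estimate of the KV cache size can help. The sketch below uses the standard per-token formula for transformer attention (one key and one value tensor per layer). The function names and the model dimensions in the example are hypothetical and are not read from any real model config; the memory vllm actually reserves also depends on its own startup profiling, so treat the result as a ballpark only.

```python
def kv_cache_bytes(context_length, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate the KV cache size in bytes for one sequence of `context_length` tokens.

    bytes_per_value=2 assumes a 16-bit cache dtype (an assumption, adjust as needed).
    """
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * context_length


def max_context_length(budget_bytes, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Largest context length whose estimated KV cache fits in `budget_bytes`."""
    per_token = kv_cache_bytes(1, num_layers, num_kv_heads, head_dim, bytes_per_value)
    return budget_bytes // per_token


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
    budget = 4 * 1024**3  # assume 4 GiB is left for the KV cache after loading weights
    print(max_context_length(budget, num_layers=32, num_kv_heads=8, head_dim=128))
```

The estimate covers a single sequence; serving several requests concurrently needs proportionally more cache, so a smaller value than the one printed may be appropriate.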