Unverified commit 3ec8c25c authored by Suhong Moon, committed by GitHub

[Docs] Update documentation for gpu-memory-utilization option (#2162)

parent 671af2b1
@@ -89,9 +89,11 @@ Below, you can find an explanation of every engine argument for vLLM:

     CPU swap space size (GiB) per GPU.

-.. option:: --gpu-memory-utilization <percentage>
+.. option:: --gpu-memory-utilization <fraction>

-    The percentage of GPU memory to be used for the model executor.
+    The fraction of GPU memory to be used for the model executor, which can range from 0 to 1.
+    For example, a value of 0.5 would imply 50% GPU memory utilization.
+    If unspecified, will use the default value of 0.9.

 .. option:: --max-num-batched-tokens <tokens>
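The fraction-to-budget arithmetic described in the updated option text can be sketched with a small, hypothetical helper (not part of vLLM; the function name and signature are illustrative only):

```python
# Illustrative sketch, not vLLM code: how a fractional
# --gpu-memory-utilization value maps to a memory budget in GiB.

def executor_budget_gib(total_gpu_mem_gib: float,
                        gpu_memory_utilization: float = 0.9) -> float:
    """Return the GiB of GPU memory the model executor may use.

    The fraction must lie in (0, 1]; 0.9 mirrors the documented default.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu-memory-utilization must be in (0, 1]")
    return total_gpu_mem_gib * gpu_memory_utilization

# On a 24 GiB GPU, the default 0.9 leaves roughly 21.6 GiB for the
# executor, while 0.5 (i.e. 50% utilization) leaves 12 GiB.
print(executor_budget_gib(24.0))
print(executor_budget_gib(24.0, 0.5))
```

On the command line, the flag takes the fraction directly, e.g. `--gpu-memory-utilization 0.5` for 50% utilization.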