norm/vllm · Commit 3ec8c25c (unverified)
Authored Dec 17, 2023 by Suhong Moon; committed by GitHub on Dec 17, 2023
Parent: 671af2b1

[Docs] Update documentation for gpu-memory-utilization option (#2162)
Showing 1 changed file with 4 additions and 2 deletions.

docs/source/models/engine_args.rst (+4, -2)
@@ -89,9 +89,11 @@ Below, you can find an explanation of every engine argument for vLLM:

     CPU swap space size (GiB) per GPU.

-.. option:: --gpu-memory-utilization <percentage>
+.. option:: --gpu-memory-utilization <fraction>

-    The percentage of GPU memory to be used for the model executor.
+    The fraction of GPU memory to be used for the model executor, which can range from 0 to 1.
+    For example, a value of 0.5 would imply 50% GPU memory utilization.
+    If unspecified, will use the default value of 0.9.

 .. option:: --max-num-batched-tokens <tokens>
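For readers wiring this up in code rather than on the command line: a minimal sketch of the clarified semantics, assuming vLLM's Python entry point, where this engine argument maps to the gpu_memory_utilization keyword of the LLM class (the model name and prompt below are illustrative, not from the diff):

    from vllm import LLM

    # gpu_memory_utilization is a fraction in (0, 1], not a percentage:
    # 0.5 asks the engine to use roughly half of each GPU's memory for the
    # model executor (weights, activations, KV cache). Omitting it falls
    # back to the documented default of 0.9.
    llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)

    outputs = llm.generate("Hello, my name is")
    print(outputs[0].outputs[0].text)

When launching a server instead, the same setting is passed as the flag documented above, e.g. --gpu-memory-utilization 0.5.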