"benchmarks/vscode:/vscode.git/clone" did not exist on "b6ec077e058e15e5b853793924e6643ec6c579aa"
Commit d6484ef3 authored by Harry Mellor, committed by GitHub

Add full API docs and improve the UX of navigating them (#17485)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent 46fae69c
@@ -26,7 +26,7 @@ logger = init_logger(__name__)
 class XPUWorker(LoRANotSupportedWorkerBase, Worker):
 """A worker class that executes (a partition of) the model on a GPU.
 Each worker is associated with a single XPU device. The worker is
 responsible for maintaining the KV cache and executing the model on the
 XPU. In case of distributed inference, each worker is assigned a partition
@@ -93,9 +93,10 @@ class XPUWorker(LoRANotSupportedWorkerBase, Worker):
 Then, it calculate the maximum possible number of GPU and CPU blocks
 that can be allocated with the remaining free memory.
-.. tip::
+:::{tip}
 You may limit the usage of GPU memory
 by adjusting the `gpu_memory_utilization` parameter.
+:::
 """
 # Profile the memory usage of the model and get the maximum number of
 # cache blocks that can be allocated with the remaining free memory.
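
The docstring above says the worker profiles memory usage and then computes how many cache blocks fit in the remaining free memory, with `gpu_memory_utilization` capping the usable share. The following is a minimal sketch of that arithmetic only, not the code from this commit: the function name `estimate_num_blocks` and the parameters `total_memory`, `peak_memory`, and `cache_block_size` are hypothetical, and only `gpu_memory_utilization` is taken from the docstring.

```python
def estimate_num_blocks(
    total_memory: int,
    peak_memory: int,
    gpu_memory_utilization: float,
    cache_block_size: int,
) -> int:
    """Estimate how many cache blocks fit in the memory left after profiling.

    All names except `gpu_memory_utilization` are illustrative assumptions.
    Sizes are in bytes (any consistent unit works).
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    if cache_block_size <= 0:
        raise ValueError("cache_block_size must be positive")

    # Memory the worker is allowed to use, minus what profiling showed the
    # model itself needs at peak.
    budget = total_memory * gpu_memory_utilization - peak_memory

    # Never report a negative block count when the model alone exceeds
    # the allowed share.
    return max(int(budget // cache_block_size), 0)
```

Under these assumptions, lowering `gpu_memory_utilization` shrinks the budget and therefore the number of blocks, which is the effect the tip describes.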