[Doc] Update documentation on Tensorizer (#5471)

Unverified commit 6e2527a7, authored Jun 14, 2024 by Sanger Steel; committed by GitHub Jun 14, 2024.
14 additions and 1 deletion across 3 files.

docs/source/index.rst +1 -0

docs/source/serving/tensorizer.rst +12 -0

vllm/engine/arg_utils.py +1 -1

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -81,6 +81,7 @@ Documentation
   serving/env_vars
   serving/usage_stats
   serving/integrations
+   serving/tensorizer
 .. toctree::
   :maxdepth: 1

--- /dev/null
+++ b/docs/source/serving/tensorizer.rst
@@ -0,0 +1,12 @@
+.. _tensorizer:
+
+Loading Models with CoreWeave's Tensorizer
+==========================================
+
+vLLM supports loading models with `CoreWeave's Tensorizer <https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer>`_.
+vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
+at runtime extremely quickly directly to the GPU, resulting in significantly
+shorter Pod startup times and CPU memory usage. Tensor encryption is also supported.
+For more information on CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_. For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
+the `vLLM example script <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_.
\ No newline at end of file
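As a rough illustration of the workflow the new page describes, the Python sketch below assembles a command line that starts vLLM's OpenAI-compatible server on a model that has already been serialized with the example script. It assumes the `--load-format tensorizer` and `--model-loader-extra-config` options of vLLM from around this release, with the loader config passed as a JSON object containing `tensorizer_uri`. The helper name `build_tensorizer_server_cmd` and the S3 path are hypothetical, and S3 credentials and encryption settings are omitted. Check the example script and your vLLM version's `--help` output for the exact options.

```python
import json
import shlex


def build_tensorizer_server_cmd(model: str, tensorizer_uri: str) -> list[str]:
    """Hypothetical helper: argv for serving a tensorized model with vLLM.

    Assumes vLLM accepts --load-format tensorizer and a JSON string for
    --model-loader-extra-config; verify against your installed version.
    """
    # tensorizer_uri points at the serialized tensors (local path,
    # HTTP/HTTPS URL, or S3 URI); it is sent to the loader as JSON.
    extra_config = json.dumps({"tensorizer_uri": tensorizer_uri})
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--load-format", "tensorizer",
        "--model-loader-extra-config", extra_config,
    ]


if __name__ == "__main__":
    # Hypothetical bucket path for an already-serialized model.
    cmd = build_tensorizer_server_cmd(
        "facebook/opt-125m",
        "s3://my-bucket/vllm/facebook/opt-125m/v1/model.tensors",
    )
    # Print a shell-quoted form that can be copied into a terminal.
    print(shlex.join(cmd))
```

Building the argument list in Python rather than as one shell string keeps the JSON quoting intact; `shlex.join` is only used to print a copy-pasteable form.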
--- a/vllm/engine/arg_utils.py
+++ b/vllm/engine/arg_utils.py
@@ -230,7 +230,7 @@ class EngineArgs:
            '* "dummy" will initialize the weights with random values, '
            'which is mainly for profiling.\n'
            '* "tensorizer" will load the weights using tensorizer from '
-            'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+            'CoreWeave. See the Tensorize vLLM Model script in the Examples '
            'section for more information.\n'
            '* "bitsandbytes" will load the weights using bitsandbytes '
            'quantization.\n')
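The one-character change in `arg_utils.py` fixes a common Python pitfall. Adjacent string literals are concatenated with no separator, so the help text previously read "...in the Examplessection for more information." The sketch below reproduces the before and after fragments from this diff. It also adds a small, hypothetical lint helper, `find_missing_separators`, which flags fragment boundaries where two word characters would fuse. The helper is a heuristic, not part of vLLM, and it would also flag intentional joins such as a URL or identifier split across literals.

```python
# Help-text fragments from this diff, before and after the fix.
BEFORE_PARTS = [
    '* "tensorizer" will load the weights using tensorizer from ',
    'CoreWeave. See the Tensorize vLLM Model script in the Examples',
    'section for more information.\n',
]
AFTER_PARTS = [
    '* "tensorizer" will load the weights using tensorizer from ',
    'CoreWeave. See the Tensorize vLLM Model script in the Examples ',
    'section for more information.\n',
]

# The same fragments written as adjacent literals, as in arg_utils.py.
# Python joins them at compile time with no separator.
BEFORE_HELP = ('* "tensorizer" will load the weights using tensorizer from '
               'CoreWeave. See the Tensorize vLLM Model script in the Examples'
               'section for more information.\n')


def find_missing_separators(fragments: list[str]) -> list[int]:
    """Return each index i where fragments[i] + fragments[i + 1] fuses two words."""
    suspicious = []
    for i, (left, right) in enumerate(zip(fragments, fragments[1:])):
        # A word character on both sides of the boundary means there is
        # no space or newline between the two fragments.
        if left and right and left[-1].isalnum() and right[0].isalnum():
            suspicious.append(i)
    return suspicious


if __name__ == "__main__":
    print(repr(BEFORE_HELP))
    print(find_missing_separators(BEFORE_PARTS))  # [1]
    print(find_missing_separators(AFTER_PARTS))   # []
```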