OpenDAS
vllm_cscc
Commit b1cb1d3d (Unverified)
authored Mar 26, 2026 by Mateusz Sokół
committed by GitHub, Mar 26, 2026

DOC: Documentation pages fixes (#38125)

Signed-off-by: Mateusz Sokół <mat646@gmail.com>

parent 6ae8bbd0
Showing 1 changed file with 31 additions and 26 deletions
vllm/config/load.py
@@ -24,38 +24,43 @@ class LoadConfig:
     """Configuration for loading the model weights."""
     load_format: str | LoadFormats = "auto"
-    """The format of the model weights to load:\n
+    """
+    The format of the model weights to load.
     - "auto" will try to load the weights in the safetensors format and fall
-    back to the pytorch bin format if safetensors format is not available.\n
-    - "pt" will load the weights in the pytorch bin format.\n
-    - "safetensors" will load the weights in the safetensors format.\n
+    back to the pytorch bin format if safetensors format is not available.
+    - "pt" will load the weights in the pytorch bin format.
+    - "safetensors" will load the weights in the safetensors format.
     - "instanttensor" will load the Safetensors weights on CUDA devices using
     InstantTensor, which enables distributed loading with pipelined prefetching
-    and fast direct I/O.\n
+    and fast direct I/O.
     - "npcache" will load the weights in pytorch format and store a numpy cache
-    to speed up the loading.\n
+    to speed up the loading.
     - "dummy" will initialize the weights with random values, which is mainly
-    for profiling.\n
+    for profiling.
     - "tensorizer" will use CoreWeave's tensorizer library for fast weight
     loading. See the Tensorize vLLM Model script in the Examples section for
-    more information.\n
+    more information.
     - "runai_streamer" will load the Safetensors weights using Run:ai Model
-    Streamer.\n
+    Streamer.
     - "runai_streamer_sharded" will load weights from pre-sharded checkpoint
-    files using Run:ai Model Streamer.\n
-    - "bitsandbytes" will load the weights using bitsandbytes quantization.\n
+    files using Run:ai Model Streamer.
+    - "bitsandbytes" will load the weights using bitsandbytes quantization.
     - "sharded_state" will load weights from pre-sharded checkpoint files,
-    supporting efficient loading of tensor-parallel models.\n
+    supporting efficient loading of tensor-parallel models.
     - "gguf" will load weights from GGUF format files (details specified in
-    https://github.com/ggml-org/ggml/blob/master/docs/gguf.md).\n
+    https://github.com/ggml-org/ggml/blob/master/docs/gguf.md).
     - "mistral" will load weights from consolidated safetensors files used by
-    Mistral models.\n
-    - Other custom values can be supported via plugins."""
+    Mistral models.
+    - Other custom values can be supported via plugins.
+    """
     download_dir: str | None = None
     """Directory to download and load the weights, default to the default
     cache directory of Hugging Face."""
     safetensors_load_strategy: str | None = None
-    """Specifies the loading strategy for safetensors weights.
+    """
+    Specifies the loading strategy for safetensors weights.
     - None (default): Uses memory-mapped (lazy) loading. When an NFS
     filesystem is detected and the total checkpoint size fits within 90%%
     of available RAM, prefetching is enabled automatically.
...
...
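As a reading aid for the `load_format` docstring above, the "auto" behaviour it describes (prefer safetensors, fall back to the pytorch bin format) could be sketched roughly as follows. This is an illustration only, not vLLM's loader: the helper name `resolve_auto_format` and the file-extension checks (`*.safetensors`, `*.bin`, `*.pt`) are assumptions made for the example.

```python
from pathlib import Path


def resolve_auto_format(checkpoint_dir: str) -> str:
    """Hypothetical helper (not part of vLLM): mimic the documented "auto" rule.

    Prefer safetensors files; fall back to pytorch bin files if none exist.
    The extensions checked here are assumptions made for this sketch.
    """
    root = Path(checkpoint_dir)
    # Safetensors takes priority when any such file is present.
    if any(root.glob("*.safetensors")):
        return "safetensors"
    # Otherwise fall back to the pytorch bin format.
    if any(root.glob("*.bin")) or any(root.glob("*.pt")):
        return "pt"
    raise FileNotFoundError(f"no weight files found in {checkpoint_dir!r}")
```

With a directory that holds only `model.bin`, this sketch returns `"pt"`; once a `.safetensors` file is also present it returns `"safetensors"`.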
@@ -72,7 +77,7 @@ class LoadConfig:
     - "torchao": Weights are loaded in upfront and then reconstructed
     into torchao tensor subclasses. This is used when the checkpoint
     was quantized using torchao and saved using safetensors.
-    Needs torchao >= 0.14.0
+    Needs `torchao >= 0.14.0`.
     """
     model_loader_extra_config: dict | TensorizerConfig = Field(default_factory=dict)
     """Extra config for model loader. This will be passed to the model loader
...
...
@@ -88,13 +93,13 @@ class LoadConfig:
     weights."""
     pt_load_map_location: str | dict[str, str] = "cpu"
     """
-    pt_load_map_location: the map location for loading pytorch checkpoint, to
-    support loading checkpoints can only be loaded on certain devices like
-    "cuda", this is equivalent to {"": "cuda"}. Another supported format is
-    mapping from different devices like from GPU 1 to GPU 0:
-    {"cuda:1": "cuda:0"}. Note that when passed from command line, the strings
-    in dictionary needs to be double quoted for json parsing. For more details,
-    see original doc for `map_location`
-    in https://pytorch.org/docs/stable/generated/torch.load.html
+    The map location for loading pytorch checkpoint, to support loading
+    checkpoints can only be loaded on certain devices like "cuda", this
+    is equivalent to `{"": "cuda"}`. Another supported format is mapping
+    from different devices like from GPU 1 to GPU 0: `{"cuda:1": "cuda:0"}`.
+    Note that when passed from command line, the strings in dictionary
+    need to be double quoted for json parsing. For more details, see
+    the original doc for `map_location`
+    parameter in [`torch.load`][] parameter.
     """
     def compute_hash(self) -> str:
...
...
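The `pt_load_map_location` docstring above accepts either a plain device string or a device-to-device mapping, and notes that a mapping passed on the command line must use double-quoted strings so it parses as JSON. A minimal sketch of that rule, assuming a hypothetical helper `parse_map_location` that is not part of vLLM:

```python
import json


def parse_map_location(value: str) -> dict[str, str]:
    """Hypothetical helper (not part of vLLM): normalize a command-line value.

    A bare device string such as "cuda" is treated as {"": "cuda"}, as the
    docstring states. Anything starting with "{" is parsed as JSON, which
    is why the keys and values must be double quoted.
    """
    text = value.strip()
    if text.startswith("{"):
        mapping = json.loads(text)  # raises ValueError on single quotes
        if not isinstance(mapping, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
        ):
            raise ValueError("expected a JSON object mapping device to device")
        return mapping
    return {"": text}
```

For example, `parse_map_location('{"cuda:1": "cuda:0"}')` yields the GPU 1 to GPU 0 mapping, while the single-quoted form `{'cuda:1': 'cuda:0'}` is rejected by the JSON parser.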