Unverified commit b1cb1d3d, authored by Mateusz Sokół, committed by GitHub
DOC: Documentation pages fixes (#38125)


Signed-off-by: Mateusz Sokół <mat646@gmail.com>
parent 6ae8bbd0
@@ -24,38 +24,43 @@ class LoadConfig:
"""Configuration for loading the model weights.""" """Configuration for loading the model weights."""
load_format: str | LoadFormats = "auto" load_format: str | LoadFormats = "auto"
"""The format of the model weights to load:\n """
The format of the model weights to load.
- "auto" will try to load the weights in the safetensors format and fall - "auto" will try to load the weights in the safetensors format and fall
back to the pytorch bin format if safetensors format is not available.\n back to the pytorch bin format if safetensors format is not available.
- "pt" will load the weights in the pytorch bin format.\n - "pt" will load the weights in the pytorch bin format.
- "safetensors" will load the weights in the safetensors format.\n - "safetensors" will load the weights in the safetensors format.
- "instanttensor" will load the Safetensors weights on CUDA devices using - "instanttensor" will load the Safetensors weights on CUDA devices using
InstantTensor, which enables distributed loading with pipelined prefetching InstantTensor, which enables distributed loading with pipelined prefetching
and fast direct I/O.\n and fast direct I/O.
- "npcache" will load the weights in pytorch format and store a numpy cache - "npcache" will load the weights in pytorch format and store a numpy cache
to speed up the loading.\n to speed up the loading.
- "dummy" will initialize the weights with random values, which is mainly - "dummy" will initialize the weights with random values, which is mainly
for profiling.\n for profiling.
- "tensorizer" will use CoreWeave's tensorizer library for fast weight - "tensorizer" will use CoreWeave's tensorizer library for fast weight
loading. See the Tensorize vLLM Model script in the Examples section for loading. See the Tensorize vLLM Model script in the Examples section for
more information.\n more information.
- "runai_streamer" will load the Safetensors weights using Run:ai Model - "runai_streamer" will load the Safetensors weights using Run:ai Model
Streamer.\n Streamer.
- "runai_streamer_sharded" will load weights from pre-sharded checkpoint - "runai_streamer_sharded" will load weights from pre-sharded checkpoint
files using Run:ai Model Streamer.\n files using Run:ai Model Streamer.
- "bitsandbytes" will load the weights using bitsandbytes quantization.\n - "bitsandbytes" will load the weights using bitsandbytes quantization.
- "sharded_state" will load weights from pre-sharded checkpoint files, - "sharded_state" will load weights from pre-sharded checkpoint files,
supporting efficient loading of tensor-parallel models.\n supporting efficient loading of tensor-parallel models.
- "gguf" will load weights from GGUF format files (details specified in - "gguf" will load weights from GGUF format files (details specified in
https://github.com/ggml-org/ggml/blob/master/docs/gguf.md).\n https://github.com/ggml-org/ggml/blob/master/docs/gguf.md).
- "mistral" will load weights from consolidated safetensors files used by - "mistral" will load weights from consolidated safetensors files used by
Mistral models.\n Mistral models.\n
- Other custom values can be supported via plugins.""" - Other custom values can be supported via plugins.
"""
download_dir: str | None = None download_dir: str | None = None
"""Directory to download and load the weights, default to the default """Directory to download and load the weights, default to the default
cache directory of Hugging Face.""" cache directory of Hugging Face."""
safetensors_load_strategy: str | None = None safetensors_load_strategy: str | None = None
"""Specifies the loading strategy for safetensors weights. """
Specifies the loading strategy for safetensors weights.
- None (default): Uses memory-mapped (lazy) loading. When an NFS - None (default): Uses memory-mapped (lazy) loading. When an NFS
filesystem is detected and the total checkpoint size fits within 90%% filesystem is detected and the total checkpoint size fits within 90%%
of available RAM, prefetching is enabled automatically. of available RAM, prefetching is enabled automatically.
@@ -72,7 +77,7 @@ class LoadConfig:
- "torchao": Weights are loaded in upfront and then reconstructed - "torchao": Weights are loaded in upfront and then reconstructed
into torchao tensor subclasses. This is used when the checkpoint into torchao tensor subclasses. This is used when the checkpoint
was quantized using torchao and saved using safetensors. was quantized using torchao and saved using safetensors.
Needs torchao >= 0.14.0 Needs `torchao >= 0.14.0`.
""" """
model_loader_extra_config: dict | TensorizerConfig = Field(default_factory=dict) model_loader_extra_config: dict | TensorizerConfig = Field(default_factory=dict)
"""Extra config for model loader. This will be passed to the model loader """Extra config for model loader. This will be passed to the model loader
@@ -88,13 +93,13 @@ class LoadConfig:
weights.""" weights."""
pt_load_map_location: str | dict[str, str] = "cpu" pt_load_map_location: str | dict[str, str] = "cpu"
""" """
pt_load_map_location: the map location for loading pytorch checkpoint, to The map location for loading pytorch checkpoint, to support loading
support loading checkpoints can only be loaded on certain devices like checkpoints can only be loaded on certain devices like "cuda", this
"cuda", this is equivalent to {"": "cuda"}. Another supported format is is equivalent to `{"": "cuda"}`. Another supported format is mapping
mapping from different devices like from GPU 1 to GPU 0: from different devices like from GPU 1 to GPU 0: `{"cuda:1": "cuda:0"}`.
{"cuda:1": "cuda:0"}. Note that when passed from command line, the strings Note that when passed from command line, the strings in dictionary
in dictionary needs to be double quoted for json parsing. For more details, need to be double quoted for json parsing. For more details, see
see original doc for `map_location` in https://pytorch.org/docs/stable/generated/torch.load.html the original doc for `map_location` parameter in [`torch.load`][] parameter.
""" """
def compute_hash(self) -> str: def compute_hash(self) -> str:
......
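
The `pt_load_map_location` docstring in the last hunk notes that dictionary values passed on the command line must use double-quoted strings so that they parse as JSON. The following is a minimal sketch of why, using only the standard library. `parse_map_location` is a hypothetical helper written for this illustration, not part of vLLM, and it assumes the value arrives either as a bare device name or as a JSON object.

```python
from __future__ import annotations

import json


def parse_map_location(raw: str) -> str | dict[str, str]:
    """Hypothetical helper: read a command-line value as a device name or a JSON mapping."""
    text = raw.strip()
    if not text.startswith("{"):
        # A bare device name such as "cpu" or "cuda" is kept as a plain string.
        return text
    # json.loads raises json.JSONDecodeError when keys or values are single-quoted.
    mapping = json.loads(text)
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()):
        raise ValueError("map location must map device strings to device strings")
    return mapping


print(parse_map_location("cuda"))
print(parse_map_location('{"cuda:1": "cuda:0"}'))

try:
    # Python-style single quotes are not valid JSON, so this form is rejected.
    parse_map_location("{'cuda:1': 'cuda:0'}")
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

Because Python's `json` module follows the JSON grammar, the single-quoted form fails before any device mapping is attempted. In a POSIX shell, wrapping the whole value in single quotes keeps the inner double quotes intact.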