"...text-generation-inference.git" did not exist on "91d9beec90fba479a6751a4c8efae25adc28b001"
Unverified Commit 421ee07e authored by Steven Liu, committed by GitHub

[docs] Parallel loading of shards (#12135)



* initial

* feedback

* Update docs/source/en/using-diffusers/loading.md

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent 123506ee
@@ -112,6 +112,30 @@ print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
If a component is not explicitly specified in the dictionary and no `default` is provided, it will be loaded with `torch.float32`.
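For example, a minimal sketch of the dict form, reusing the Wan pipeline from the example below; the printed dtypes match the behavior described above:

```py
import torch
from diffusers import DiffusionPipeline

# load the transformer in bfloat16 and every other component in float16;
# a component missing from the dict, with no "default" entry, would load in float32
pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
```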
### Parallel loading
Large models are often [sharded](../training/distributed_inference#model-sharding) into smaller files so that they are easier to load. Diffusers supports loading shards in parallel to speed up the loading process.
Set the environment variables below to enable parallel loading.
- Set `HF_ENABLE_PARALLEL_LOADING` to `"YES"` to enable parallel loading of shards.
- Set `HF_PARALLEL_LOADING_WORKERS` to configure the number of parallel threads used to load shards (a tuning sketch follows the example below). More workers load a model faster but use more memory.
Set the `device_map` argument to `"cuda"` to pre-allocate a large chunk of memory upfront based on the model size. Warming up the memory allocator once avoids many smaller allocation calls later, which substantially reduces model load time.
```py
import os

import torch
from diffusers import DiffusionPipeline

# opt in to parallel shard loading before the pipeline is created
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"

pipeline = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
```
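To also tune the worker count, set both variables before loading the pipeline. A minimal sketch; the value `"8"` is an arbitrary example, not a recommended default:

```py
import os

# both variables are read when the pipeline is loaded, so set them first;
# "8" is an arbitrary example value
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"
os.environ["HF_PARALLEL_LOADING_WORKERS"] = "8"
```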
### Local pipeline
To load a pipeline locally, use [git-lfs](https://git-lfs.github.com/) to manually download a checkpoint to your local disk.
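As a minimal sketch, assuming the checkpoint was cloned with `git lfs install && git clone https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers`, pass the local directory to `from_pretrained` instead of the Hub id:

```py
import torch
from diffusers import DiffusionPipeline

# "./Wan2.2-I2V-A14B-Diffusers" is the local clone created by the git command above
pipeline = DiffusionPipeline.from_pretrained(
    "./Wan2.2-I2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,
)
```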
......