Unverified commit ab986769, authored by Steven Liu, committed by GitHub

[docs] Maintenance (#3552)

* doc fixes

* fix latex

* parenthesis on inside
parent bdc75e75
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
# Models
Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
The primary function of these models is to denoise an input sample by modeling the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).
The models are built on the base class [`ModelMixin`], which is a `torch.nn.Module` with basic functionality for saving and loading models both locally and from the Hugging Face Hub.
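For a quick sense of the API, here is a small sketch of loading and re-saving a model through the `from_pretrained`/`save_pretrained` methods provided by `ModelMixin` (the checkpoint and local path below are just examples):

```python
from diffusers import UNet2DConditionModel

# load the Stable Diffusion UNet from the Hub (example checkpoint)
unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")

# save it locally and reload it from the local folder
unet.save_pretrained("./my-unet")
unet = UNet2DConditionModel.from_pretrained("./my-unet")
```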
## ModelMixin
......
@@ -113,105 +113,3 @@ each pipeline, one should look directly into the respective pipeline.
**Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should
not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline; see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community).
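As a minimal, self-contained sketch of the pattern (a toy class, not an actual Diffusers pipeline), decorating `__call__` with `torch.no_grad` means tensors produced inside the call carry no gradients:

```python
import torch


class ToyPipeline:
    @torch.no_grad()
    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # everything computed here is detached from the autograd graph
        return x * 2


out = ToyPipeline()(torch.ones(2, requires_grad=True))
print(out.requires_grad)  # False
```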
## Contribution
We are more than happy about any contribution to the officially supported pipelines 🤗. We want
all of our pipelines to be **self-contained**, **easy-to-tweak**, **beginner-friendly**, and **one-purpose-only**.
- **Self-contained**: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should either be defined directly in the pipeline file itself, be inherited from (and only from) the [`DiffusionPipeline` class](.../diffusion_pipeline), or be directly attached to the model and scheduler components of the pipeline.
- **Easy-to-use**: Pipelines should be extremely easy to use - one should be able to load the pipeline and
use it for its designated task, *e.g.* text-to-image generation, in just a couple of lines of code. Most
of the logic, including pre-processing, an unrolled diffusion loop, and post-processing, should happen inside the `__call__` method.
- **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. We try to make the pipeline code as readable as possible so that each part –from pre-processing to diffusing to post-processing– can easily be adapted. If you would like the community to benefit from your customized pipeline, we would love to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](./overview) would be even better.
- **One-purpose-only**: Pipelines should be used for one task and one task only. Even if two tasks are very similar from a modeling point of view, *e.g.* image2image translation and in-painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*.
## Examples
### Text-to-Image generation with Stable Diffusion
```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# optionally swap the default scheduler for the LMS scheduler
pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```
### Image-to-Image text-guided generation with Stable Diffusion
The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
```python
import torch
import requests
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline
device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
    device
)
# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))
prompt = "A fantasy landscape, trending on artstation"
images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("fantasy_landscape.png")
```
You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
### Tweak prompts reusing seeds and latents
You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)
### In-painting using Stable Diffusion
The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and text prompt.
```python
import PIL
import requests
import torch
from io import BytesIO
from diffusers import StableDiffusionInpaintPipeline
def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```
You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation –not just code– are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it!
Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://Discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>
Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. Please, read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project and ask you to adhere to the same principles of transparency and responsibility.
......
@@ -50,7 +50,6 @@ from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
@@ -85,7 +84,6 @@ from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
@@ -112,7 +110,6 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
@@ -166,7 +163,6 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
@@ -191,7 +187,6 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
@@ -409,7 +404,14 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
| A100-SXM4-40GB | 18.6it/s | 29it/s |
| A100-SXM-80GB | 18.7it/s | 29.5it/s |
To leverage it, just make sure you have the following (a short example of enabling it follows this list):
<Tip warning={true}>
If you have PyTorch 2.0 installed, you shouldn't use xFormers!
</Tip>
- PyTorch > 1.12
- CUDA available
- [Installed the xformers library](xformers).
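As a short example (assuming xFormers is installed and you are running on a CUDA GPU; the checkpoint is just an example), memory-efficient attention is enabled with a single call on the pipeline:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# and it can be turned off again with:
# pipe.disable_xformers_memory_efficient_attention()
```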
......
@@ -23,7 +23,7 @@ To benefit from the accelerated attention implementation and `torch.compile()`,
when PyTorch 2.0 is available.
```bash
pip install --upgrade torch diffusers
```
## Using accelerated transformers and `torch.compile`.
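As a rough sketch of what this section covers (assuming PyTorch 2.0+ and a CUDA GPU; the checkpoint is just an example), the accelerated attention is used automatically once PyTorch 2.0 is installed, and the UNet can additionally be compiled:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# compile the UNet; the first call is slow (compilation), subsequent calls are faster
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```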
......
@@ -266,6 +266,6 @@ image_grid(images)
In this tutorial, you learned how to optimize a [`DiffusionPipeline`] for computational and memory efficiency as well as improving the quality of generated outputs. If you're interested in making your pipeline even faster, take a look at the following resources:
- Learn how [PyTorch 2.0](./optimization/torch2.0) and [`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html) can yield 5 - 300% faster inference speed. On an A100 GPU, inference can be up to 50% faster!
- If you can't use PyTorch 2, we recommend you install [xFormers](./optimization/xformers). Its memory-efficient attention mechanism works great with PyTorch 1.13.1 for faster speed and reduced memory consumption.
- Other optimization techniques, such as model offloading, are covered in [this guide](./optimization/fp16); a small sketch of model offloading follows this list.
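For instance, model offloading is a one-line change on the pipeline (a minimal sketch, assuming 🤗 Accelerate is installed; note the pipeline is not moved to the GPU manually in this case):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

# sub-models are moved to the GPU only while they are needed, reducing peak memory
pipe.enable_model_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```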
@@ -97,7 +97,8 @@ accelerate launch train_controlnet.py \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=4 \
--push_to_hub
```
This default configuration requires ~38GB VRAM.
@@ -120,7 +121,8 @@ accelerate launch train_controlnet.py \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--push_to_hub
```
## Training with multiple GPUs
@@ -143,7 +145,8 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
--train_batch_size=4 \
--mixed_precision="fp16" \
--tracker_project_name="controlnet-demo" \
--report_to=wandb \
--push_to_hub
```
## Example results
@@ -191,7 +194,8 @@ accelerate launch train_controlnet.py \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--use_8bit_adam \
--push_to_hub
```
## Training on a 12 GB GPU
@@ -219,7 +223,8 @@ accelerate launch train_controlnet.py \
--gradient_checkpointing \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--push_to_hub
```
When using `enable_xformers_memory_efficient_attention`, please make sure to install `xformers` by running `pip install xformers`.
@@ -283,7 +288,8 @@ accelerate launch train_controlnet.py \
--gradient_checkpointing \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--mixed_precision fp16 \
--push_to_hub
```
## Inference
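Roughly, a trained ControlNet can be loaded into a Stable Diffusion pipeline like this (a hedged sketch; `./controlnet-model` is a placeholder for your `--output_dir`, and the conditioning image path is just an example):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("./controlnet-model", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

control_image = load_image("./conditioning_image_1.png")
image = pipe("red circle with blue background", image=control_image, num_inference_steps=20).images[0]
image.save("output.png")
```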
......
@@ -100,7 +100,8 @@ accelerate launch train_custom_diffusion.py \
--lr_warmup_steps=0 \
--max_train_steps=250 \
--scale_lr --hflip \
--modifier_token "<new1>" \
--push_to_hub
```
**Use `--enable_xformers_memory_efficient_attention` for faster training with lower VRAM requirement (16GB per GPU). Follow [this guide](https://github.com/facebookresearch/xformers) for installation instructions.**
@@ -132,7 +133,8 @@ accelerate launch train_custom_diffusion.py \
--scale_lr --hflip \
--modifier_token "<new1>" \
--validation_prompt="<new1> cat sitting in a bucket" \
--report_to="wandb" \
--push_to_hub
```
Here is an example [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/26ghrcau) where you can check out the intermediate results along with other training details.
@@ -168,7 +170,8 @@ accelerate launch train_custom_diffusion.py \
--max_train_steps=500 \
--num_class_images=200 \
--scale_lr --hflip \
--modifier_token "<new1>+<new2>" \
--push_to_hub
```
Here is an example [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/3990tzkg) where you can check out the intermediate results along with other training details.
@@ -207,7 +210,8 @@ accelerate launch train_custom_diffusion.py \
--scale_lr --hflip --noaug \
--freeze_model crossattn \
--modifier_token "<new1>" \
--enable_xformers_memory_efficient_attention \
--push_to_hub
```
## Inference
......
@@ -130,7 +130,8 @@ python train_dreambooth_flax.py \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--max_train_steps=400 \
--push_to_hub
```
</jax>
</frameworkcontent>
@@ -187,7 +188,8 @@ python train_dreambooth_flax.py \
--train_batch_size=1 \
--learning_rate=5e-6 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```
</jax>
</frameworkcontent>
@@ -223,7 +225,7 @@ accelerate launch train_dreambooth.py \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam \
--gradient_checkpointing \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
@@ -253,7 +255,8 @@ python train_dreambooth_flax.py \
--train_batch_size=1 \
--learning_rate=2e-6 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```
</jax>
</frameworkcontent>
......
@@ -100,7 +100,8 @@ accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--seed=42 \
--push_to_hub
```
Additionally, we support performing validation inference to monitor training progress
@@ -121,7 +122,8 @@ accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
--validation_prompt="make the mountains snowy" \
--seed=42 \
--report_to=wandb \
--push_to_hub
```
We recommend this type of validation as it can be useful for model debugging. Note that you need `wandb` installed to use this. You can install `wandb` by running `pip install wandb`.
@@ -148,7 +150,8 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py
--learning_rate=5e-05 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--seed=42 \
--push_to_hub
```
## Inference
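Roughly, the fine-tuned model can then be used for instruction-based editing like this (a hedged sketch; the repository id is a placeholder for wherever you pushed or saved your model):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "your-username/instruct-pix2pix-model", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png")
edited = pipe(
    "make the mountains snowy",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("snowy_mountains.png")
```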
......
@@ -76,13 +76,25 @@ Launch the [PyTorch training script](https://github.com/huggingface/diffusers/bl
Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument.
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \
--use_ema \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" \
--push_to_hub
```
To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
@@ -105,8 +117,10 @@ accelerate launch train_text_to_image.py \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir=${OUTPUT_DIR} \
--push_to_hub
```
#### Training with multiple GPUs
@@ -129,8 +143,10 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" \
--push_to_hub
```
</pt>
@@ -159,7 +175,8 @@ python train_text_to_image_flax.py \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--output_dir="sd-pokemon-model" \
--push_to_hub
```
To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
@@ -179,7 +196,8 @@ python train_text_to_image_flax.py \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--output_dir="sd-pokemon-model" \
--push_to_hub
```
</jax>
</frameworkcontent>
......
@@ -120,7 +120,8 @@ accelerate launch textual_inversion.py \
--learning_rate=5.0e-04 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat" \
--push_to_hub
```
<Tip>
@@ -161,7 +162,8 @@ python textual_inversion_flax.py \
--train_batch_size=1 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--output_dir="textual_inversion_cat" \
--push_to_hub
```
</jax>
</frameworkcontent>
......
@@ -141,5 +141,6 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
--learning_rate=1e-4 \
--lr_warmup_steps=500 \
--mixed_precision="fp16" \
--logger="wandb" \
--push_to_hub
```
\ No newline at end of file
@@ -20,12 +20,12 @@ The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion syst
Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) you would like to download.
In this guide, you'll use [`DiffusionPipeline`] for text-to-image generation with [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5):
```python
>>> from diffusers import DiffusionPipeline
>>> generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
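You can then generate an image by calling the pipeline with a prompt (the prompt and filename below are just examples):

```python
>>> image = generator("An image of a squirrel in Picasso style").images[0]
>>> image.save("image_of_squirrel_painting.png")
```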
......
@@ -28,18 +28,15 @@ The following paragraphs show how to do so with the 🧨 Diffusers library.
## Load pipeline
Remember that you have to be a registered user on the 🤗 Hugging Face Hub, and have accepted the model [license](https://huggingface.co/runwayml/stable-diffusion-v1-5), in order to use Stable Diffusion.
Let's start by loading the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) model in the [`DiffusionPipeline`]:
```python
from huggingface_hub import login
from diffusers import DiffusionPipeline
import torch

# first we need to login with our access token
login()

# Now we can download the pipeline
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
```
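To make generations reproducible, you can then pass a seeded `torch.Generator` to the pipeline; on the same setup, the same seed and prompt yield the same image (a small sketch, the prompt is just an example):

```python
pipeline = pipeline.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipeline("Labrador in the style of Vermeer", generator=generator).images[0]
```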
......
@@ -30,14 +30,7 @@ pipeline = StableDiffusionPipeline.from_ckpt(
## Convert to safetensors
Not all weights on the Hub are available in the `.safetensors` format, and you may encounter weights stored as `.bin`. In this case, use the [Convert Space](https://huggingface.co/spaces/diffusers/convert) to convert the weights to `.safetensors`. The Convert Space downloads the pickled weights, converts them, and opens a Pull Request to upload the newly converted `.safetensors` file to the Hub. This way, any malicious code contained in the pickled files ends up on the Hub - which has a [security scanner](https://huggingface.co/docs/hub/security-pickle#hubs-security-scanner) to detect unsafe files and suspicious pickle imports - instead of on your computer.
You can use the model with the new `.safetensors` weights by specifying the reference to the Pull Request in the `revision` parameter (you can also test it in this [Check PR](https://huggingface.co/spaces/diffusers/check_pr) Space on the Hub), for example `refs/pr/22`:
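For example, loading from such a Pull Request could look like this (a hedged sketch; the repository id and PR reference are placeholders for your own):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", revision="refs/pr/22", use_safetensors=True
)
```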
......
@@ -36,7 +36,7 @@ A pipeline is a quick and easy way to run a model for inference, requiring no mo
That was super easy, but how did the pipeline do that? Let's break down the pipeline and take a look at what's happening under the hood.
In the example above, the pipeline contains a [`UNet2DModel`] model and a [`DDPMScheduler`]. The pipeline denoises an image by taking random noise the size of the desired output and passing it through the model several times. At each timestep, the model predicts the *noise residual* and the scheduler uses it to predict a less noisy image. The pipeline repeats this process until it reaches the end of the specified number of inference steps.
To recreate the pipeline with the model and scheduler separately, let's write our own denoising process.
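A minimal sketch of that loop, assuming the same kind of checkpoint (here `google/ddpm-cat-256` is used as an example), could look like this:

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

repo_id = "google/ddpm-cat-256"  # example checkpoint
scheduler = DDPMScheduler.from_pretrained(repo_id, subfolder="scheduler")
model = UNet2DModel.from_pretrained(repo_id, subfolder="unet").to("cuda")

scheduler.set_timesteps(50)
sample_size = model.config.sample_size
sample = torch.randn((1, 3, sample_size, sample_size), device="cuda")

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample  # predict the noise residual
    sample = scheduler.step(noise_pred, t, sample).prev_sample  # step to a less noisy sample
```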
......