Unverified Commit 013edb64 authored by Patrick von Platen, committed by GitHub

Update main docs (#1706)



* Remove bogus file

* [Docs] Remove mention of gated access since it no longer exists

* add docs to index

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
parent 86ac3ea1
Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/) and [RunwayML](https://runwayml.com/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM.
See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.
### Text-to-Image generation with Stable Diffusion
First, let's install the required libraries:

```bash
pip install --upgrade diffusers transformers accelerate
```
We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it almost always gives the same results as full precision while being roughly twice as fast and requiring half the amount of GPU memory:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
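You can then save the generated image, for example (the filename is just an example):

```python
image.save("astronaut_rides_horse.png")
```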
#### Running the model locally
You can also simply download the model folder and pass the path to the local folder to the `StableDiffusionPipeline`.
```
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can run stable diffusion as follows:
```python
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
If you are limited by GPU memory, you might want to consider chunking the attention computation in addition to using `fp16`.
The following snippet should result in less than 4GB VRAM.
```python
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars" prompt = "a photo of an astronaut riding a horse on mars"
If you want to run Stable Diffusion on CPU or you want to have maximum precision, please run the model in the default *full-precision* setting:
```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
### Image-to-Image text-guided generation with Stable Diffusion

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline
device = "cuda"
model_id_or_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)

# or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
# and pass `model_id_or_path="./stable-diffusion-v1-5"`.
pipe = pipe.to(device)
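# download an initial image to condition on; the example image and prompt
# mirror the standard img2img docs and are illustrative
import requests
from io import BytesIO
from PIL import Image

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"
# `strength` controls how much the initial image is altered (0 = unchanged, 1 = ignored)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
```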
You can also run this example on Colab.
### In-painting using Stable Diffusion
The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and a text prompt.
```python
import PIL
import requests
import torch
from io import BytesIO

from diffusers import StableDiffusionInpaintPipeline

# helper to fetch the example images; the full URLs complete the truncated
# `mask_url` from the CompVis latent-diffusion inpainting examples
def download_image(url):
    return PIL.Image.open(BytesIO(requests.get(url).content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
The new depth-to-image guide is also listed in the docs index (`_toctree.yml`):

```yaml
- local: using-diffusers/img2img
  title: "Text-Guided Image-to-Image"
- local: using-diffusers/inpaint
  title: "Text-Guided Image-Inpainting"
- local: using-diffusers/depth2img
  title: "Text-Guided Depth-to-Image"
- local: using-diffusers/custom_pipeline_examples
  title: "Community Pipelines"
- local: using-diffusers/contribute_pipeline
```
# 🧨 Diffusers
🤗 Diffusers provides pretrained vision and audio diffusion models, and serves as a modular toolbox for inference and training.
More precisely, 🤗 Diffusers offers state-of-the-art diffusion pipelines, interchangeable noise schedulers, and pretrained models that can be used as building blocks.
# Quicktour

Whether you're a developer or an everyday user, this quick tour will help you get started.
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install --upgrade diffusers accelerate transformers
```
- [`accelerate`](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training
- [`transformers`](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion)
## DiffusionPipeline
The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks across different modalities. Take a look at the table below for some supported tasks:

| Task | Description | Pipeline |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
| Unconditional Image Generation | generate an image from gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2image](./using-diffusers/depth2img) |
For more detailed information on how diffusion pipelines function for the different tasks, please have a look at the [**Using Diffusers**](./using-diffusers/overview) section.
As an example, start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads).
In this guide though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion).
For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion), please carefully read its [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) before running the model.
The license is required due to the improved image generation capabilities of the model and the potentially harmful content it could produce.
Please head over to your stable diffusion model of choice, *e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5), and read the license.

You can load the model as follows:
```python
>>> from diffusers import DiffusionPipeline

>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
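You can then run the pipeline on a text prompt to generate an image (the prompt here is illustrative):

```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
```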
You can save the image by simply calling:

```python
>>> image.save("image_of_squirrel_painting.png")
```
**Note**: You can also use the pipeline locally by downloading the weights via:
```
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
and then loading the saved weights into the pipeline:
```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
If you want to use a different noise scheduler, such as the [`EulerDiscreteScheduler`], you could use it as follows:
```python
>>> from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

>>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> # change scheduler to Euler
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Text-Guided Depth-to-Image Generation
The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images, as well as a `depth_map` to preserve the image's structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated depth-estimation model.
```python
import torch
import requests
from PIL import Image

from diffusers import StableDiffusionDepth2ImgPipeline

# load the depth-conditioned Stable Diffusion 2 pipeline in half precision
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

# fetch an example image to condition on
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
# strength controls how much the initial image is altered
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
```
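A lower `strength` keeps the output closer to the initial image, while the depth conditioning preserves the overall scene layout even at higher values. You can save the result as usual:

```python
# filename is illustrative
image.save("two_tigers.png")
```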