Unverified Commit 5883d8d4 authored by Sayak Paul, committed by GitHub

[Docs] update docs (Stable unCLIP) to reflect the updated ckpts. (#2815)



* update docs to reflect the updated ckpts.

* update: point about prompt.

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove image resizing.

* Apply suggestions from code review

* Apply suggestions from code review

---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent dbcb15c2
@@ -16,6 +16,10 @@ Stable unCLIP checkpoints are finetuned from [stable diffusion 2.1](./stable_dif
Stable unCLIP also still conditions on text embeddings. Given the two separate conditionings, stable unCLIP can be used
for text guided image variation. When combined with an unCLIP prior, it can also be used for full text to image generation.
To know more about the unCLIP process, check out the following paper:
[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen.
## Tips

Stable unCLIP takes a `noise_level` as input during inference. `noise_level` determines how much noise is added
@@ -24,23 +28,15 @@ we do not add any additional noise to the image embeddings i.e. `noise_level = 0
### Available checkpoints:

* Image variation
* [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip)
* [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small)
* Text-to-image
* Coming soon!
### Text-to-Image Generation

Coming soon!
### Text guided Image-to-Image Variation
@@ -54,19 +50,25 @@ from io import BytesIO

from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
images = pipe(init_image).images
images[0].save("fantasy_landscape.png")
```
Optionally, you can also pass a prompt to `pipe` such as:
```python
prompt = "A fantasy landscape, trending on artstation"
images = pipe(init_image, prompt=prompt).images
images[0].save("fantasy_landscape.png")
```