Unverified Commit fa736e32 authored by Sayak Paul, committed by GitHub

[Docs] refactor text-to-video zero (#3049)

* fix: norm group test for UNet3D.

* refactor text-to-video zero docs.
parent a4b233e5
...@@ -61,6 +61,7 @@ Resources:
To generate a video from a prompt, run the following Python code:
```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline
model_id = "runwayml/stable-diffusion-v1-5"
...@@ -68,6 +69,7 @@ pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float
prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
result = [(r * 255).astype("uint8") for r in result]
imageio.mimsave("video.mp4", result, fps=4)
```
You can change these parameters in the pipeline call:
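The parameter list itself sits outside this hunk. As a rough sketch, a tweaked call could look like the snippet below; the parameter names and example values are assumptions drawn from the pipeline's call signature, not from this diff.

```python
# Illustrative call only — parameter names assumed from TextToVideoZeroPipeline.__call__
result = pipe(
    prompt=prompt,
    video_length=8,               # number of frames to generate
    motion_field_strength_x=12,   # strength of the simulated camera motion along x
    motion_field_strength_y=12,   # strength of the simulated camera motion along y
    t0=44,                        # first and last timesteps between which the
    t1=47,                        # motion field is applied to the latents
).images
```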
...@@ -95,6 +97,7 @@ To generate a video from prompt with additional pose control
2. Read video containing extracted pose images
```python
from PIL import Image
import imageio
reader = imageio.get_reader(video_path, "ffmpeg")
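# The remainder of this block is elided by the hunk; a sketch of the likely
# continuation (the frame count below is an assumption, not from the diff):
frame_count = 8
pose_images = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]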
...@@ -151,6 +154,7 @@ To perform text-guided video editing (with [InstructPix2Pix](./stable_diffusion/
2. Read video from path
```python
from PIL import Image
import imageio
reader = imageio.get_reader(video_path, "ffmpeg")
...@@ -174,14 +178,14 @@ To perform text-guided video editing (with [InstructPix2Pix](./stable_diffusion/
```
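The editing call itself is elided above. A minimal sketch of such a call is shown below; the `timbrooks/instruct-pix2pix` checkpoint, the `CrossFrameAttnProcessor` import path, and the `video` frame list (from the read-video step) are assumptions rather than content of this diff.

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import CrossFrameAttnProcessor

# Load InstructPix2Pix and swap in cross-frame attention so edits stay consistent across frames
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=3))

prompt = "make it Van Gogh Starry Night style"
# `video` is assumed to be the list of PIL frames read in the previous step
result = pipe(prompt=[prompt] * len(video), image=video).images
```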
### DreamBooth specialization
Methods **Text-To-Video**, **Text-To-Video with Pose Control**, and **Text-To-Video with Edge Control**
can run with custom [DreamBooth](../training/dreambooth) models, as shown below for the
[Canny edge ControlNet model](https://huggingface.co/lllyasviel/sd-controlnet-canny) and the
[Avatar style DreamBooth](https://huggingface.co/PAIR/text2video-zero-controlnet-canny-avatar) model.
1. Download a demo video
```python
from huggingface_hub import hf_hub_download
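# The actual download call is elided by this hunk; the repo_id and filename
# below are hypothetical placeholders, not taken from the diff:
video_path = hf_hub_download(
    repo_type="space", repo_id="PAIR/Text2Video-Zero", filename="path/to/demo_video.mp4"
)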
...@@ -193,6 +197,7 @@ can run with custom [DreamBooth](../training/dreambooth) models, as shown below
2. Read video from path
```python
from PIL import Image
import imageio
reader = imageio.get_reader(video_path, "ffmpeg")
...
...@@ -374,9 +374,8 @@ class TextToVideoZeroPipeline(StableDiffusionPipeline):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will be generated by sampling using the supplied random `generator`.
output_type (`str`, *optional*, defaults to `"numpy"`):
The output format of the generated image. Choose between `"latent"` and `"numpy"`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
plain tuple.
...