Unverified Commit 3028089e authored by M. Tolga Cangöz, committed by GitHub

Fix typos (#7411)

* Fix typos

* Fix typo in SVD.md
parent b536f398
@@ -21,7 +21,7 @@ This guide will show you how to use SVD to generate short videos from images.

Before you begin, make sure you have the following libraries installed:

```py
!pip install -q -U diffusers transformers accelerate
```
There are two variants of this model, [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) and [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt). The SVD checkpoint is trained to generate 14 frames and the SVD-XT checkpoint is further finetuned to generate 25 frames.
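For orientation, here's a minimal sketch of loading one of these checkpoints and generating a video, assuming a CUDA GPU with enough VRAM (the input image URL is illustrative):

```py
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# SVD is conditioned on a single image; 1024x576 matches its training resolution
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```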
@@ -86,7 +86,7 @@ Video generation is very memory intensive because you're essentially generating

+ frames = pipe(image, decode_chunk_size=2, generator=generator, num_frames=25).frames[0]
```

- Using all these tricks togethere should lower the memory requirement to less than 8GB VRAM.
+ Using all these tricks together should lower the memory requirement to less than 8GB VRAM.
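The tricks this hunk refers to compose. A hedged sketch of all three applied together, assuming `pipe` and `image` from the snippet above (skip the `pipe.to("cuda")` call, since offloading manages device placement itself):

```py
pipe.enable_model_cpu_offload()      # keep submodules on GPU only while they run
pipe.unet.enable_forward_chunking()  # chunk the UNet's feed-forward layers to trade speed for memory

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=2, generator=generator, num_frames=25).frames[0]
```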
## Micro-conditioning

...
@@ -48,7 +48,7 @@ class UnCLIPTextInterpolationPipeline(DiffusionPipeline):
        Tokenizer of class
        [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
    prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    text_proj ([`UnCLIPTextProjModel`]):
        Utility class to prepare and combine the embeddings before they are passed to the decoder.
    decoder ([`UNet2DConditionModel`]):

...
@@ -129,7 +129,7 @@ class KandinskyCombinedPipeline(DiffusionPipeline):
    movq ([`VQModel`]):
        MoVQ Decoder to generate the image from the latents.
    prior_prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_image_encoder ([`CLIPVisionModelWithProjection`]):
        Frozen image-encoder.
    prior_text_encoder ([`CLIPTextModelWithProjection`]):

@@ -346,7 +346,7 @@ class KandinskyImg2ImgCombinedPipeline(DiffusionPipeline):
    movq ([`VQModel`]):
        MoVQ Decoder to generate the image from the latents.
    prior_prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_image_encoder ([`CLIPVisionModelWithProjection`]):
        Frozen image-encoder.
    prior_text_encoder ([`CLIPTextModelWithProjection`]):

@@ -586,7 +586,7 @@ class KandinskyInpaintCombinedPipeline(DiffusionPipeline):
    movq ([`VQModel`]):
        MoVQ Decoder to generate the image from the latents.
    prior_prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_image_encoder ([`CLIPVisionModelWithProjection`]):
        Frozen image-encoder.
    prior_text_encoder ([`CLIPTextModelWithProjection`]):

...
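These combined pipelines bundle the prior and the decoder behind a single call. A usage sketch via the auto class, assuming a CUDA GPU; `AutoPipelineForText2Image` resolves to `KandinskyCombinedPipeline` for this checkpoint:

```py
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

image = pipe("a portrait of a cat, 4k photo", num_inference_steps=25).images[0]
```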
@@ -134,7 +134,7 @@ class KandinskyPriorPipeline(DiffusionPipeline):
    Args:
        prior ([`PriorTransformer`]):
-           The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+           The canonical unCLIP prior to approximate the image embedding from the text embedding.
        image_encoder ([`CLIPVisionModelWithProjection`]):
            Frozen image-encoder.
        text_encoder ([`CLIPTextModelWithProjection`]):

...
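When the prior runs as its own stage, it turns the text embedding into image embeddings that the decoder then renders. A two-stage sketch, assuming a CUDA GPU:

```py
import torch
from diffusers import KandinskyPriorPipeline, KandinskyPipeline

prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
).to("cuda")
decoder = KandinskyPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "a portrait of a cat, 4k photo"
prior_output = prior(prompt)  # the prior approximates image embeddings from the text embedding

image = decoder(
    prompt,
    image_embeds=prior_output.image_embeds,
    negative_image_embeds=prior_output.negative_image_embeds,
    height=768,
    width=768,
).images[0]
```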
@@ -119,7 +119,7 @@ class KandinskyV22CombinedPipeline(DiffusionPipeline):
    movq ([`VQModel`]):
        MoVQ Decoder to generate the image from the latents.
    prior_prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_image_encoder ([`CLIPVisionModelWithProjection`]):
        Frozen image-encoder.
    prior_text_encoder ([`CLIPTextModelWithProjection`]):

@@ -346,7 +346,7 @@ class KandinskyV22Img2ImgCombinedPipeline(DiffusionPipeline):
    movq ([`VQModel`]):
        MoVQ Decoder to generate the image from the latents.
    prior_prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_image_encoder ([`CLIPVisionModelWithProjection`]):
        Frozen image-encoder.
    prior_text_encoder ([`CLIPTextModelWithProjection`]):

@@ -594,7 +594,7 @@ class KandinskyV22InpaintCombinedPipeline(DiffusionPipeline):
    movq ([`VQModel`]):
        MoVQ Decoder to generate the image from the latents.
    prior_prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_image_encoder ([`CLIPVisionModelWithProjection`]):
        Frozen image-encoder.
    prior_text_encoder ([`CLIPTextModelWithProjection`]):

...
@@ -90,7 +90,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline):
    Args:
        prior ([`PriorTransformer`]):
-           The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+           The canonical unCLIP prior to approximate the image embedding from the text embedding.
        image_encoder ([`CLIPVisionModelWithProjection`]):
            Frozen image-encoder.
        text_encoder ([`CLIPTextModelWithProjection`]):

...
@@ -108,7 +108,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline):
    Args:
        prior ([`PriorTransformer`]):
-           The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+           The canonical unCLIP prior to approximate the image embedding from the text embedding.
        image_encoder ([`CLIPVisionModelWithProjection`]):
            Frozen image-encoder.
        text_encoder ([`CLIPTextModelWithProjection`]):

...
@@ -86,7 +86,7 @@ class ShapEImg2ImgPipeline(DiffusionPipeline):
    Args:
        prior ([`PriorTransformer`]):
-           The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+           The canonical unCLIP prior to approximate the image embedding from the text embedding.
        image_encoder ([`~transformers.CLIPVisionModel`]):
            Frozen image-encoder.
        image_processor ([`~transformers.CLIPImageProcessor`]):

...
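Since this pipeline is image-conditioned, a sketch of turning a picture into rendered 3D views; the input URL and parameter values are illustrative:

```py
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

pipe = ShapEImg2ImgPipeline.from_pretrained(
    "openai/shap-e-img2img", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://huggingface.co/datasets/diffusers/docs-images/resolve/main/shap-e/corgi.png")

# .images[0] is a list of rendered frames orbiting the generated 3D asset
frames = pipe(image, guidance_scale=3.0, num_inference_steps=64, frame_size=256).images[0]
export_to_gif(frames, "corgi_3d.gif")
```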
@@ -700,8 +700,8 @@ class StableDiffusionDepth2ImgPipeline(DiffusionPipeline, TextualInversionLoader
        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> init_image = Image.open(requests.get(url, stream=True).raw)
        >>> prompt = "two tigers"
-       >>> n_propmt = "bad, deformed, ugly, bad anotomy"
-       >>> image = pipe(prompt=prompt, image=init_image, negative_prompt=n_propmt, strength=0.7).images[0]
+       >>> n_prompt = "bad, deformed, ugly, bad anotomy"
+       >>> image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
        ```

        Returns:

...
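The doctest above omits its setup. A self-contained version under the usual assumptions (fp16 on a CUDA GPU, the Stability AI depth checkpoint):

```py
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
```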
@@ -194,7 +194,7 @@ class StableDiffusionInstructPix2PixPipeline(
        A higher guidance scale value encourages the model to generate images closely linked to the text
        `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
    image_guidance_scale (`float`, *optional*, defaults to 1.5):
-       Push the generated image towards the inital `image`. Image guidance scale is enabled by setting
+       Push the generated image towards the initial `image`. Image guidance scale is enabled by setting
        `image_guidance_scale > 1`. Higher image guidance scale encourages generated images that are closely
        linked to the source `image`, usually at the expense of lower image quality. This pipeline requires a
        value of at least `1`.

...
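To make the interplay of the two scales concrete, a sketch using the released InstructPix2Pix checkpoint; the input image URL is illustrative:

```py
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png")

edited = pipe(
    "make the mountains snowy",
    image=image,
    guidance_scale=7.5,        # adherence to the text instruction
    image_guidance_scale=1.5,  # adherence to the source image; must be at least 1
    num_inference_steps=20,
).images[0]
```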
@@ -76,7 +76,7 @@ class StableUnCLIPPipeline(DiffusionPipeline, StableDiffusionMixin, TextualInver
    prior_text_encoder ([`CLIPTextModelWithProjection`]):
        Frozen [`CLIPTextModelWithProjection`] text-encoder.
    prior ([`PriorTransformer`]):
-       The canonincal unCLIP prior to approximate the image embedding from the text embedding.
+       The canonical unCLIP prior to approximate the image embedding from the text embedding.
    prior_scheduler ([`KarrasDiffusionSchedulers`]):
        Scheduler used in the prior denoising process.
    image_normalizer ([`StableUnCLIPImageNormalizer`]):

...
@@ -659,7 +659,7 @@ class StableDiffusionXLInstructPix2PixPipeline(
        1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
        usually at the expense of lower image quality.
    image_guidance_scale (`float`, *optional*, defaults to 1.5):
-       Image guidance scale is to push the generated image towards the inital image `image`. Image guidance
+       Image guidance scale is to push the generated image towards the initial image `image`. Image guidance
        scale is enabled by setting `image_guidance_scale > 1`. Higher image guidance scale encourages to
        generate images that are closely linked to the source image `image`, usually at the expense of lower
        image quality. This pipeline requires a value of at least `1`.

...
@@ -438,7 +438,7 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...
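The comment fixed here documents a branch that picks which position in the noise schedule to sample from: after the first denoising step (inpainting) it stays at the current step index, and before it (img2img) it starts from the configured begin index. A standalone sketch of that selection, with a hypothetical `select_sigma` helper (not the library code verbatim):

```py
import torch

def select_sigma(sigmas, timesteps, step_index=None, begin_index=0):
    if step_index is not None:
        # add_noise is called after the first denoising step (inpainting):
        # keep sampling noise at the schedule's current position
        step_indices = [step_index] * timesteps.shape[0]
    else:
        # add_noise is called before the first denoising step (img2img):
        # build the initial noisy latent from the schedule's starting position
        step_indices = [begin_index] * timesteps.shape[0]
    return sigmas[step_indices].flatten()
```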
@@ -775,7 +775,7 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...

@@ -1018,7 +1018,7 @@ class DPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...

@@ -547,7 +547,7 @@ class DPMSolverSDEScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...

@@ -968,7 +968,7 @@ class DPMSolverSinglestepScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...

@@ -673,7 +673,7 @@ class EDMDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...

@@ -371,7 +371,7 @@ class EDMEulerScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...

@@ -471,7 +471,7 @@ class EulerAncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
    # add_noise is called after first denoising step (for inpainting)
    step_indices = [self.step_index] * timesteps.shape[0]
else:
-   # add noise is called bevore first denoising step to create inital latent(img2img)
+   # add noise is called before first denoising step to create initial latent(img2img)
    step_indices = [self.begin_index] * timesteps.shape[0]
sigma = sigmas[step_indices].flatten()

...