Unverified Commit c370b90f authored by Sayak Paul, committed by GitHub

[Flux] minor documentation fixes for flux. (#9048)

* minor documentation fixes for flux.

* clipskip

* add gist
parent ebf3ab14
@@ -18,7 +18,7 @@ Original model checkpoints for Flux can be found [here](https://huggingface.co/b
 <Tip>
 
-Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more.
+Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).
 
 </Tip>
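For context, a minimal sketch of the kind of memory-friendly setup the tip refers to is shown below. The checkpoint ID and generation settings are illustrative assumptions, not part of this commit:

```python
import torch
from diffusers import FluxPipeline

# Load the pipeline in bfloat16 to roughly halve the memory footprint
# relative to float32. The checkpoint ID here is an assumption; use any
# Flux checkpoint you have access to.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)

# Keep submodules on the CPU and move each one to the GPU only while it
# runs, trading some latency for a much lower peak VRAM usage.
pipe.enable_model_cpu_offload()

image = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("flux.png")
```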
@@ -17,12 +17,7 @@ from typing import Any, Callable, Dict, List, Optional, Union
 
 import numpy as np
 import torch
-from transformers import (
-    CLIPTextModel,
-    CLIPTokenizer,
-    T5EncoderModel,
-    T5TokenizerFast,
-)
+from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
 
 from ...image_processor import VaeImageProcessor
 from ...loaders import SD3LoraLoaderMixin
@@ -155,22 +150,18 @@ class FluxPipeline(DiffusionPipeline, SD3LoraLoaderMixin):
             A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
         vae ([`AutoencoderKL`]):
             Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
-        text_encoder ([`CLIPTextModelWithProjection`]):
-            [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModelWithProjection),
-            specifically the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant,
-            with an additional added projection layer that is initialized with a diagonal matrix with the `hidden_size`
-            as its dimension.
-        text_encoder_2 ([`CLIPTextModelWithProjection`]):
-            [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModelWithProjection),
-            specifically the
-            [laion/CLIP-ViT-bigG-14-laion2B-39B-b160k](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k)
-            variant.
+        text_encoder ([`CLIPTextModel`]):
+            [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel), specifically
+            the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant.
+        text_encoder_2 ([`T5EncoderModel`]):
+            [T5](https://huggingface.co/docs/transformers/en/model_doc/t5#transformers.T5EncoderModel), specifically
+            the [google/t5-v1_1-xxl](https://huggingface.co/google/t5-v1_1-xxl) variant.
         tokenizer (`CLIPTokenizer`):
             Tokenizer of class
-            [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
-        tokenizer_2 (`CLIPTokenizer`):
+            [CLIPTokenizer](https://huggingface.co/docs/transformers/en/model_doc/clip#transformers.CLIPTokenizer).
+        tokenizer_2 (`T5TokenizerFast`):
             Second Tokenizer of class
-            [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
+            [T5TokenizerFast](https://huggingface.co/docs/transformers/en/model_doc/t5#transformers.T5TokenizerFast).
         """
 
     model_cpu_offload_seq = "text_encoder->text_encoder_2->transformer->vae"
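As a quick sanity check of the component types the corrected docstring now documents, one can load a pipeline and inspect it. A minimal sketch; the checkpoint ID is an assumption:

```python
from diffusers import FluxPipeline

# Load a Flux checkpoint and confirm the component classes match the
# docstring: a CLIP text encoder/tokenizer pair plus a T5 pair.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")

print(type(pipe.text_encoder).__name__)    # CLIPTextModel
print(type(pipe.text_encoder_2).__name__)  # T5EncoderModel
print(type(pipe.tokenizer).__name__)       # CLIPTokenizer
print(type(pipe.tokenizer_2).__name__)     # T5TokenizerFast
```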
@@ -323,9 +314,6 @@ class FluxPipeline(DiffusionPipeline, SD3LoraLoaderMixin):
             pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
                 Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
                 If not provided, pooled text embeddings will be generated from `prompt` input argument.
-            clip_skip (`int`, *optional*):
-                Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that
-                the output of the pre-final layer will be used for computing the prompt embeddings.
             lora_scale (`float`, *optional*):
                 A lora scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
         """