renzhc
diffusers_dcu
Commit 865ba102 (Unverified)
Authored Aug 27, 2025 by YiYi Xu; committed by GitHub on Aug 27, 2025
[Qwen-Image] adding validation for guidance_scale, true_cfg_scale and negative_prompt (#12223)
* up
Parent: 552c127c
Changes: 5 changed files with 180 additions and 70 deletions (+180 -70)
src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py +36 -15
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_controlnet.py +36 -10
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_edit.py +36 -15
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_img2img.py +36 -15
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_inpaint.py +36 -15
src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py
@@ -435,7 +435,7 @@ class QwenImagePipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         width: Optional[int] = None,
         num_inference_steps: int = 50,
         sigmas: Optional[List[float]] = None,
-        guidance_scale: float = 1.0,
+        guidance_scale: Optional[float] = None,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
         latents: Optional[torch.Tensor] = None,
@@ -462,7 +462,12 @@ class QwenImagePipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `true_cfg_scale` is
                 not greater than `1`).
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                Guidance scale as defined in [Classifier-Free Diffusion
+                Guidance](https://huggingface.co/papers/2207.12598). `true_cfg_scale` is defined as `w` of equation 2.
+                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Classifier-free guidance is enabled by
+                setting `true_cfg_scale > 1` and a provided `negative_prompt`. Higher guidance scale encourages to
+                generate images that are closely linked to the text `prompt`, usually at the expense of lower image
+                quality.
             height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
                 The height in pixels of the generated image. This is set to 1024 by default for the best results.
             width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
@@ -474,17 +479,16 @@ class QwenImagePipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
-            guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
-
-                This parameter in the pipeline is there to support future guidance-distilled models when they come up.
-                Note that passing `guidance_scale` to the pipeline is ineffective. To enable classifier-free guidance,
-                please pass `true_cfg_scale` and `negative_prompt` (even an empty negative prompt like " ") should
-                enable classifier-free guidance computations.
+            guidance_scale (`float`, *optional*, defaults to None):
+                A guidance scale value for guidance distilled models. Unlike the traditional classifier-free guidance
+                where the guidance scale is applied during inference through noise prediction rescaling, guidance
+                distilled models take the guidance scale directly as an input parameter during forward pass. Guidance
+                scale is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages to generate images
+                that are closely linked to the text `prompt`, usually at the expense of lower image quality. This
+                parameter in the pipeline is there to support future guidance-distilled models when they come up. It is
+                ignored when not using guidance distilled models. To enable traditional classifier-free guidance,
+                please pass `true_cfg_scale > 1.0` and `negative_prompt` (even an empty negative prompt like " " should
+                enable classifier-free guidance computations).
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
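Review note: the docstrings above distinguish traditional classifier-free guidance (`true_cfg_scale`, applied by rescaling noise predictions at inference) from a distilled `guidance_scale` that is fed to the model as an input. As a rough illustration of the former, here is a minimal sketch of the textbook combination with weight `w`. The helper name `apply_true_cfg` and the use of plain Python lists are assumptions for illustration; this diff does not show the pipeline's actual combination step, which operates on tensors and may differ.

```python
from typing import List


def apply_true_cfg(cond: List[float], uncond: List[float], true_cfg_scale: float) -> List[float]:
    """Textbook classifier-free guidance: uncond + w * (cond - uncond).

    Hypothetical helper for illustration only; not taken from the pipeline.
    """
    return [u + true_cfg_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # prediction conditioned on the prompt
uncond = [0.0, 1.0]  # prediction conditioned on the negative prompt

# With w = 1 the result equals the conditional prediction (no extra guidance).
print(apply_true_cfg(cond, uncond, 1.0))
# With w > 1 the result is pushed further away from the negative prediction.
print(apply_true_cfg(cond, uncond, 4.0))
```

With `w = 1` the negative prediction cancels out, which is consistent with the docstring treating `true_cfg_scale <= 1` as guidance being disabled.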
@@ -564,6 +568,16 @@ class QwenImagePipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         has_neg_prompt = negative_prompt is not None or (
             negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
         )
+
+        if true_cfg_scale > 1 and not has_neg_prompt:
+            logger.warning(
+                f"true_cfg_scale is passed as {true_cfg_scale}, but classifier-free guidance is not enabled since no negative_prompt is provided."
+            )
+        elif true_cfg_scale <= 1 and has_neg_prompt:
+            logger.warning(
+                " negative_prompt is passed but classifier-free guidance is not enabled since true_cfg_scale <= 1"
+            )
+
         do_true_cfg = true_cfg_scale > 1 and has_neg_prompt
         prompt_embeds, prompt_embeds_mask = self.encode_prompt(
             prompt=prompt,
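Review note: the validation added in this hunk can be exercised in isolation. The sketch below lifts the same conditions into a standalone function so the four input combinations are easy to check. The function name `resolve_true_cfg` and the logger name are hypothetical; in the commit the logic lives inline in each pipeline's `__call__`.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("true_cfg_sketch")  # hypothetical logger name


def resolve_true_cfg(
    true_cfg_scale: float,
    negative_prompt: Optional[Any] = None,
    negative_prompt_embeds: Optional[Any] = None,
    negative_prompt_embeds_mask: Optional[Any] = None,
) -> bool:
    """Return whether true CFG would run, warning on inconsistent inputs."""
    # A negative prompt counts as provided either as text or as embeds plus mask.
    has_neg_prompt = negative_prompt is not None or (
        negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
    )
    if true_cfg_scale > 1 and not has_neg_prompt:
        logger.warning(
            "true_cfg_scale is passed as %s, but classifier-free guidance is not "
            "enabled since no negative_prompt is provided.",
            true_cfg_scale,
        )
    elif true_cfg_scale <= 1 and has_neg_prompt:
        logger.warning(
            "negative_prompt is passed but classifier-free guidance is not "
            "enabled since true_cfg_scale <= 1"
        )
    return true_cfg_scale > 1 and has_neg_prompt


print(resolve_true_cfg(4.0, negative_prompt=" "))       # both conditions met
print(resolve_true_cfg(4.0))                            # warns: no negative prompt
print(resolve_true_cfg(1.0, negative_prompt="blurry"))  # warns: scale too low
```

Note that embeds without a mask do not count as a negative prompt, so that case also takes the first warning branch when `true_cfg_scale > 1`.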
@@ -618,10 +632,17 @@ class QwenImagePipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
...
@@ -618,10 +632,17 @@ class QwenImagePipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
self
.
_num_timesteps
=
len
(
timesteps
)
self
.
_num_timesteps
=
len
(
timesteps
)
# handle guidance
# handle guidance
if
self
.
transformer
.
config
.
guidance_embeds
:
if
self
.
transformer
.
config
.
guidance_embeds
and
guidance_scale
is
None
:
raise
ValueError
(
"guidance_scale is required for guidance-distilled model."
)
elif
self
.
transformer
.
config
.
guidance_embeds
:
guidance
=
torch
.
full
([
1
],
guidance_scale
,
device
=
device
,
dtype
=
torch
.
float32
)
guidance
=
torch
.
full
([
1
],
guidance_scale
,
device
=
device
,
dtype
=
torch
.
float32
)
guidance
=
guidance
.
expand
(
latents
.
shape
[
0
])
guidance
=
guidance
.
expand
(
latents
.
shape
[
0
])
else
:
elif
not
self
.
transformer
.
config
.
guidance_embeds
and
guidance_scale
is
not
None
:
logger
.
warning
(
f
"guidance_scale is passed as
{
guidance_scale
}
, but ignored since the model is not guidance-distilled."
)
guidance
=
None
elif
not
self
.
transformer
.
config
.
guidance_embeds
and
guidance_scale
is
None
:
guidance
=
None
guidance
=
None
if
self
.
attention_kwargs
is
None
:
if
self
.
attention_kwargs
is
None
:
...
...
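Review note: the new `# handle guidance` block is a four-way decision on (`guidance_embeds`, `guidance_scale`). A minimal sketch of that decision table follows, using a plain Python list in place of the `torch.full(...).expand(...)` tensor so it runs without PyTorch. The function name `resolve_guidance`, the `batch_size` parameter, and the logger name are assumptions for illustration, not part of the commit.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("guidance_sketch")  # hypothetical logger name


def resolve_guidance(
    guidance_embeds: bool, guidance_scale: Optional[float], batch_size: int
) -> Optional[List[float]]:
    """Mirror the four branches of the guidance handling shown in the diff."""
    if guidance_embeds and guidance_scale is None:
        # Guidance-distilled model: the scale is a required model input.
        raise ValueError("guidance_scale is required for guidance-distilled model.")
    if guidance_embeds:
        # Stand-in for torch.full([1], guidance_scale).expand(batch_size).
        return [float(guidance_scale)] * batch_size
    if guidance_scale is not None:
        # Non-distilled model: the value is accepted but has no effect.
        logger.warning(
            "guidance_scale is passed as %s, but ignored since the model is not "
            "guidance-distilled.",
            guidance_scale,
        )
    return None


print(resolve_guidance(True, 3.5, 2))    # one scale value per batch item
print(resolve_guidance(False, 3.5, 2))   # warns, then returns None
print(resolve_guidance(False, None, 2))  # silent None
```

Changing the default from `1.0` to `None` is what makes the third branch meaningful: with a numeric default, the pipeline could not tell an explicit user value from the default.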
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_controlnet.py
@@ -535,7 +535,7 @@ class QwenImageControlNetPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         width: Optional[int] = None,
         num_inference_steps: int = 50,
         sigmas: Optional[List[float]] = None,
-        guidance_scale: float = 1.0,
+        guidance_scale: Optional[float] = None,
         control_guidance_start: Union[float, List[float]] = 0.0,
         control_guidance_end: Union[float, List[float]] = 1.0,
         control_image: PipelineImageInput = None,
@@ -566,7 +566,12 @@ class QwenImageControlNetPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `true_cfg_scale` is
                 not greater than `1`).
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                Guidance scale as defined in [Classifier-Free Diffusion
+                Guidance](https://huggingface.co/papers/2207.12598). `true_cfg_scale` is defined as `w` of equation 2.
+                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Classifier-free guidance is enabled by
+                setting `true_cfg_scale > 1` and a provided `negative_prompt`. Higher guidance scale encourages to
+                generate images that are closely linked to the text `prompt`, usually at the expense of lower image
+                quality.
             height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
                 The height in pixels of the generated image. This is set to 1024 by default for the best results.
             width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
@@ -578,12 +583,16 @@ class QwenImageControlNetPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
-            guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
+            guidance_scale (`float`, *optional*, defaults to None):
+                A guidance scale value for guidance distilled models. Unlike the traditional classifier-free guidance
+                where the guidance scale is applied during inference through noise prediction rescaling, guidance
+                distilled models take the guidance scale directly as an input parameter during forward pass. Guidance
+                scale is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages to generate images
+                that are closely linked to the text `prompt`, usually at the expense of lower image quality. This
+                parameter in the pipeline is there to support future guidance-distilled models when they come up. It is
+                ignored when not using guidance distilled models. To enable traditional classifier-free guidance,
+                please pass `true_cfg_scale > 1.0` and `negative_prompt` (even an empty negative prompt like " " should
+                enable classifier-free guidance computations).
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -674,6 +683,16 @@ class QwenImageControlNetPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         has_neg_prompt = negative_prompt is not None or (
             negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
         )
+
+        if true_cfg_scale > 1 and not has_neg_prompt:
+            logger.warning(
+                f"true_cfg_scale is passed as {true_cfg_scale}, but classifier-free guidance is not enabled since no negative_prompt is provided."
+            )
+        elif true_cfg_scale <= 1 and has_neg_prompt:
+            logger.warning(
+                " negative_prompt is passed but classifier-free guidance is not enabled since true_cfg_scale <= 1"
+            )
+
         do_true_cfg = true_cfg_scale > 1 and has_neg_prompt
         prompt_embeds, prompt_embeds_mask = self.encode_prompt(
             prompt=prompt,
@@ -822,10 +841,17 @@ class QwenImageControlNetPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
             controlnet_keep.append(keeps[0] if isinstance(self.controlnet, QwenImageControlNetModel) else keeps)
 
         # handle guidance
-        if self.transformer.config.guidance_embeds:
+        if self.transformer.config.guidance_embeds and guidance_scale is None:
+            raise ValueError("guidance_scale is required for guidance-distilled model.")
+        elif self.transformer.config.guidance_embeds:
             guidance = torch.full([1], guidance_scale, device=device, dtype=torch.float32)
             guidance = guidance.expand(latents.shape[0])
-        else:
+        elif not self.transformer.config.guidance_embeds and guidance_scale is not None:
+            logger.warning(
+                f"guidance_scale is passed as {guidance_scale}, but ignored since the model is not guidance-distilled."
+            )
+            guidance = None
+        elif not self.transformer.config.guidance_embeds and guidance_scale is None:
             guidance = None
 
         if self.attention_kwargs is None:
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_edit.py
@@ -532,7 +532,7 @@ class QwenImageEditPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         width: Optional[int] = None,
         num_inference_steps: int = 50,
         sigmas: Optional[List[float]] = None,
-        guidance_scale: float = 1.0,
+        guidance_scale: Optional[float] = None,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
         latents: Optional[torch.Tensor] = None,
@@ -559,7 +559,12 @@ class QwenImageEditPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `true_cfg_scale` is
                 not greater than `1`).
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                true_cfg_scale (`float`, *optional*, defaults to 1.0): Guidance scale as defined in [Classifier-Free
+                Diffusion Guidance](https://huggingface.co/papers/2207.12598). `true_cfg_scale` is defined as `w` of
+                equation 2. of [Imagen Paper](https://huggingface.co/papers/2205.11487). Classifier-free guidance is
+                enabled by setting `true_cfg_scale > 1` and a provided `negative_prompt`. Higher guidance scale
+                encourages to generate images that are closely linked to the text `prompt`, usually at the expense of
+                lower image quality.
             height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
                 The height in pixels of the generated image. This is set to 1024 by default for the best results.
             width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
@@ -571,17 +576,16 @@ class QwenImageEditPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
-            guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
-
-                This parameter in the pipeline is there to support future guidance-distilled models when they come up.
-                Note that passing `guidance_scale` to the pipeline is ineffective. To enable classifier-free guidance,
-                please pass `true_cfg_scale` and `negative_prompt` (even an empty negative prompt like " ") should
-                enable classifier-free guidance computations.
+            guidance_scale (`float`, *optional*, defaults to None):
+                A guidance scale value for guidance distilled models. Unlike the traditional classifier-free guidance
+                where the guidance scale is applied during inference through noise prediction rescaling, guidance
+                distilled models take the guidance scale directly as an input parameter during forward pass. Guidance
+                scale is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages to generate images
+                that are closely linked to the text `prompt`, usually at the expense of lower image quality. This
+                parameter in the pipeline is there to support future guidance-distilled models when they come up. It is
+                ignored when not using guidance distilled models. To enable traditional classifier-free guidance,
+                please pass `true_cfg_scale > 1.0` and `negative_prompt` (even an empty negative prompt like " " should
+                enable classifier-free guidance computations).
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -672,6 +676,16 @@ class QwenImageEditPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         has_neg_prompt = negative_prompt is not None or (
             negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
         )
+
+        if true_cfg_scale > 1 and not has_neg_prompt:
+            logger.warning(
+                f"true_cfg_scale is passed as {true_cfg_scale}, but classifier-free guidance is not enabled since no negative_prompt is provided."
+            )
+        elif true_cfg_scale <= 1 and has_neg_prompt:
+            logger.warning(
+                " negative_prompt is passed but classifier-free guidance is not enabled since true_cfg_scale <= 1"
+            )
+
         do_true_cfg = true_cfg_scale > 1 and has_neg_prompt
         prompt_embeds, prompt_embeds_mask = self.encode_prompt(
             image=prompt_image,
@@ -734,10 +748,17 @@ class QwenImageEditPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         self._num_timesteps = len(timesteps)
 
         # handle guidance
-        if self.transformer.config.guidance_embeds:
+        if self.transformer.config.guidance_embeds and guidance_scale is None:
+            raise ValueError("guidance_scale is required for guidance-distilled model.")
+        elif self.transformer.config.guidance_embeds:
             guidance = torch.full([1], guidance_scale, device=device, dtype=torch.float32)
             guidance = guidance.expand(latents.shape[0])
-        else:
+        elif not self.transformer.config.guidance_embeds and guidance_scale is not None:
+            logger.warning(
+                f"guidance_scale is passed as {guidance_scale}, but ignored since the model is not guidance-distilled."
+            )
+            guidance = None
+        elif not self.transformer.config.guidance_embeds and guidance_scale is None:
             guidance = None
 
         if self.attention_kwargs is None:
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_img2img.py
@@ -511,7 +511,7 @@ class QwenImageImg2ImgPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         strength: float = 0.6,
         num_inference_steps: int = 50,
         sigmas: Optional[List[float]] = None,
-        guidance_scale: float = 1.0,
+        guidance_scale: Optional[float] = None,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
         latents: Optional[torch.Tensor] = None,
@@ -544,7 +544,12 @@ class QwenImageImg2ImgPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
                 latents as `image`, but if passing latents directly it is not encoded again.
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                Guidance scale as defined in [Classifier-Free Diffusion
+                Guidance](https://huggingface.co/papers/2207.12598). `true_cfg_scale` is defined as `w` of equation 2.
+                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Classifier-free guidance is enabled by
+                setting `true_cfg_scale > 1` and a provided `negative_prompt`. Higher guidance scale encourages to
+                generate images that are closely linked to the text `prompt`, usually at the expense of lower image
+                quality.
             height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
                 The height in pixels of the generated image. This is set to 1024 by default for the best results.
             width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
@@ -562,17 +567,16 @@ class QwenImageImg2ImgPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
-            guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
-
-                This parameter in the pipeline is there to support future guidance-distilled models when they come up.
-                Note that passing `guidance_scale` to the pipeline is ineffective. To enable classifier-free guidance,
-                please pass `true_cfg_scale` and `negative_prompt` (even an empty negative prompt like " ") should
-                enable classifier-free guidance computations.
+            guidance_scale (`float`, *optional*, defaults to None):
+                A guidance scale value for guidance distilled models. Unlike the traditional classifier-free guidance
+                where the guidance scale is applied during inference through noise prediction rescaling, guidance
+                distilled models take the guidance scale directly as an input parameter during forward pass. Guidance
+                scale is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages to generate images
+                that are closely linked to the text `prompt`, usually at the expense of lower image quality. This
+                parameter in the pipeline is there to support future guidance-distilled models when they come up. It is
+                ignored when not using guidance distilled models. To enable traditional classifier-free guidance,
+                please pass `true_cfg_scale > 1.0` and `negative_prompt` (even an empty negative prompt like " " should
+                enable classifier-free guidance computations).
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -657,6 +661,16 @@ class QwenImageImg2ImgPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         has_neg_prompt = negative_prompt is not None or (
             negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
         )
+
+        if true_cfg_scale > 1 and not has_neg_prompt:
+            logger.warning(
+                f"true_cfg_scale is passed as {true_cfg_scale}, but classifier-free guidance is not enabled since no negative_prompt is provided."
+            )
+        elif true_cfg_scale <= 1 and has_neg_prompt:
+            logger.warning(
+                " negative_prompt is passed but classifier-free guidance is not enabled since true_cfg_scale <= 1"
+            )
+
         do_true_cfg = true_cfg_scale > 1 and has_neg_prompt
         prompt_embeds, prompt_embeds_mask = self.encode_prompt(
             prompt=prompt,
@@ -721,10 +735,17 @@ class QwenImageImg2ImgPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         self._num_timesteps = len(timesteps)
 
         # handle guidance
-        if self.transformer.config.guidance_embeds:
+        if self.transformer.config.guidance_embeds and guidance_scale is None:
+            raise ValueError("guidance_scale is required for guidance-distilled model.")
+        elif self.transformer.config.guidance_embeds:
             guidance = torch.full([1], guidance_scale, device=device, dtype=torch.float32)
             guidance = guidance.expand(latents.shape[0])
-        else:
+        elif not self.transformer.config.guidance_embeds and guidance_scale is not None:
+            logger.warning(
+                f"guidance_scale is passed as {guidance_scale}, but ignored since the model is not guidance-distilled."
+            )
+            guidance = None
+        elif not self.transformer.config.guidance_embeds and guidance_scale is None:
             guidance = None
 
         if self.attention_kwargs is None:
src/diffusers/pipelines/qwenimage/pipeline_qwenimage_inpaint.py
View file @
865ba102
...
@@ -624,7 +624,7 @@ class QwenImageInpaintPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         strength: float = 0.6,
         num_inference_steps: int = 50,
         sigmas: Optional[List[float]] = None,
-        guidance_scale: float = 1.0,
+        guidance_scale: Optional[float] = None,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
         latents: Optional[torch.Tensor] = None,
...
@@ -657,7 +657,12 @@ class QwenImageInpaintPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
                 latents as `image`, but if passing latents directly it is not encoded again.
             true_cfg_scale (`float`, *optional*, defaults to 1.0):
-                When > 1.0 and a provided `negative_prompt`, enables true classifier-free guidance.
+                Guidance scale as defined in [Classifier-Free Diffusion
+                Guidance](https://huggingface.co/papers/2207.12598). `true_cfg_scale` is defined as `w` of equation 2.
+                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Classifier-free guidance is enabled by
+                setting `true_cfg_scale > 1` and a provided `negative_prompt`. Higher guidance scale encourages to
+                generate images that are closely linked to the text `prompt`, usually at the expense of lower image
+                quality.
             mask_image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
                 `Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask
                 are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
...
@@ -692,17 +697,16 @@ class QwenImageInpaintPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
                 Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
                 their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
                 will be used.
-            guidance_scale (`float`, *optional*, defaults to 3.5):
-                Guidance scale as defined in [Classifier-Free Diffusion
-                Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
-                of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
-                `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
-                the text `prompt`, usually at the expense of lower image quality.
-
-                This parameter in the pipeline is there to support future guidance-distilled models when they come up.
-                Note that passing `guidance_scale` to the pipeline is ineffective. To enable classifier-free guidance,
-                please pass `true_cfg_scale` and `negative_prompt` (even an empty negative prompt like " ") should
-                enable classifier-free guidance computations.
+            guidance_scale (`float`, *optional*, defaults to None):
+                A guidance scale value for guidance distilled models. Unlike the traditional classifier-free guidance
+                where the guidance scale is applied during inference through noise prediction rescaling, guidance
+                distilled models take the guidance scale directly as an input parameter during forward pass. Guidance
+                scale is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages to generate images
+                that are closely linked to the text `prompt`, usually at the expense of lower image quality. This
+                parameter in the pipeline is there to support future guidance-distilled models when they come up. It is
+                ignored when not using guidance distilled models. To enable traditional classifier-free guidance,
+                please pass `true_cfg_scale > 1.0` and `negative_prompt` (even an empty negative prompt like " " should
+                enable classifier-free guidance computations).
             num_images_per_prompt (`int`, *optional*, defaults to 1):
                 The number of images to generate per prompt.
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
...
@@ -801,6 +805,16 @@ class QwenImageInpaintPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         has_neg_prompt = negative_prompt is not None or (
             negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
         )
+
+        if true_cfg_scale > 1 and not has_neg_prompt:
+            logger.warning(
+                f"true_cfg_scale is passed as {true_cfg_scale}, but classifier-free guidance is not enabled since no negative_prompt is provided."
+            )
+        elif true_cfg_scale <= 1 and has_neg_prompt:
+            logger.warning(
+                " negative_prompt is passed but classifier-free guidance is not enabled since true_cfg_scale <= 1"
+            )
+
         do_true_cfg = true_cfg_scale > 1 and has_neg_prompt
         prompt_embeds, prompt_embeds_mask = self.encode_prompt(
             prompt=prompt,
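For reviewers who want to try the true-CFG check outside the pipeline, the sketch below restates the logic of this hunk as a standalone function. The name `resolve_true_cfg` and the loose `Any` argument types are assumptions made for illustration; the commit keeps this logic inline in `__call__` and uses the library's own logger.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def resolve_true_cfg(
    true_cfg_scale: float,
    negative_prompt: Optional[Any] = None,
    negative_prompt_embeds: Optional[Any] = None,
    negative_prompt_embeds_mask: Optional[Any] = None,
) -> bool:
    """Hypothetical helper: returns whether true CFG would be enabled."""
    # A negative prompt counts as provided if given as text, or as
    # embeddings together with their mask (both are required).
    has_neg_prompt = negative_prompt is not None or (
        negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
    )

    # Warn about the two mismatched combinations instead of failing.
    if true_cfg_scale > 1 and not has_neg_prompt:
        logger.warning(
            "true_cfg_scale is passed as %s, but classifier-free guidance "
            "is not enabled since no negative_prompt is provided.",
            true_cfg_scale,
        )
    elif true_cfg_scale <= 1 and has_neg_prompt:
        logger.warning(
            "negative_prompt is passed but classifier-free guidance "
            "is not enabled since true_cfg_scale <= 1"
        )

    return true_cfg_scale > 1 and has_neg_prompt
```

Under these assumptions, `resolve_true_cfg(4.0, negative_prompt=" ")` enables guidance, while passing only one of the two inputs logs a warning and leaves it off.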
...
@@ -890,10 +904,17 @@ class QwenImageInpaintPipeline(DiffusionPipeline, QwenImageLoraLoaderMixin):
         self._num_timesteps = len(timesteps)

         # handle guidance
-        if self.transformer.config.guidance_embeds:
+        if self.transformer.config.guidance_embeds and guidance_scale is None:
+            raise ValueError("guidance_scale is required for guidance-distilled model.")
+        elif self.transformer.config.guidance_embeds:
             guidance = torch.full([1], guidance_scale, device=device, dtype=torch.float32)
             guidance = guidance.expand(latents.shape[0])
-        else:
+        elif not self.transformer.config.guidance_embeds and guidance_scale is not None:
+            logger.warning(
+                f"guidance_scale is passed as {guidance_scale}, but ignored since the model is not guidance-distilled."
+            )
+            guidance = None
+        elif not self.transformer.config.guidance_embeds and guidance_scale is None:
             guidance = None

         if self.attention_kwargs is None:
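The four-way branch on `guidance_embeds` and `guidance_scale` can be read as a small decision table. The sketch below is one way to express it without `torch`; `resolve_guidance` is a hypothetical name, and it returns a plain float where the pipeline builds a tensor with `torch.full` and expands it to the batch size.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_guidance(guidance_embeds: bool, guidance_scale: Optional[float]) -> Optional[float]:
    """Hypothetical helper mirroring the guidance handling in this hunk."""
    if guidance_embeds:
        # Guidance-distilled model: the scale is a required model input.
        if guidance_scale is None:
            raise ValueError("guidance_scale is required for guidance-distilled model.")
        return float(guidance_scale)

    # Non-distilled model: the scale is unused, so warn if one was passed.
    if guidance_scale is not None:
        logger.warning(
            "guidance_scale is passed as %s, but ignored since the model "
            "is not guidance-distilled.",
            guidance_scale,
        )
    return None
```

Changing the default of `guidance_scale` from `1.0` to `None` appears to be what lets the pipeline tell "not passed" apart from "passed explicitly", which the warning branch relies on.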
...
...