hunyuanimage21.md

<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# HunyuanImage2.1


HunyuanImage-2.1 is a 17B text-to-image model that is capable of generating 2K (2048 x 2048) resolution images

HunyuanImage-2.1 comes in the following variants:

| model type | model id |
|:----------:|:--------:|
| HunyuanImage-2.1 | [hunyuanvideo-community/HunyuanImage-2.1-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Diffusers) |
| HunyuanImage-2.1-Distilled | [hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers) |
| HunyuanImage-2.1-Refiner | [hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers) |

> [!TIP]
> [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.

## HunyuanImage-2.1

HunyuanImage-2.1 applies [Adaptive Projected Guidance (APG)](https://huggingface.co/papers/2410.02416) combined with Classifier-Free Guidance (CFG) in the denoising loop. `HunyuanImagePipeline` has a `guider` component (read more about [Guider](../modular_diffusers/guiders.md)) and does not take a `guidance_scale` parameter at runtime. To change guider-related parameters, e.g., `guidance_scale`, you can update the `guider` configuration instead.

```python
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", 
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")
``` 

You can inspect the `guider` object:

```py
>>> pipe.guider
AdaptiveProjectedMixGuidance {
  "_class_name": "AdaptiveProjectedMixGuidance",
  "_diffusers_version": "0.36.0.dev0",
  "adaptive_projected_guidance_momentum": -0.5,
  "adaptive_projected_guidance_rescale": 10.0,
  "adaptive_projected_guidance_scale": 10.0,
  "adaptive_projected_guidance_start_step": 5,
  "enabled": true,
  "eta": 0.0,
  "guidance_rescale": 0.0,
  "guidance_scale": 3.5,
  "start": 0.0,
  "stop": 1.0,
  "use_original_formulation": false
}

State:
  step: None
  num_inference_steps: None
  timestep: None
  count_prepared: 0
  enabled: True
  num_conditions: 2
  momentum_buffer: None
  is_apg_enabled: False
  is_cfg_enabled: True
```

To update the guider with a different configuration, use the `new()` method. For example, to generate an image with `guidance_scale=5.0` while keeping all other default guidance parameters:

```py
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers", 
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

# Update the guider configuration
pipe.guider = pipe.guider.new(guidance_scale=5.0)

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

image = pipe(
    prompt=prompt, 
    num_inference_steps=50, 
    height=2048, 
    width=2048,
).images[0]
image.save("image.png")
```


## HunyuanImage-2.1-Distilled

use `distilled_guidance_scale` with the guidance-distilled checkpoint, 

```py
import torch
from diffusers import HunyuanImagePipeline
pipe = HunyuanImagePipeline.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

out = pipe(
    prompt,
    num_inference_steps=8,
    distilled_guidance_scale=3.25,
    height=2048,
    width=2048,
    generator=generator,
).images[0]

```


## HunyuanImagePipeline

[[autodoc]] HunyuanImagePipeline
  - all
  - __call__

## HunyuanImageRefinerPipeline

[[autodoc]] HunyuanImageRefinerPipeline
  - all
  - __call__


## HunyuanImagePipelineOutput

[[autodoc]] pipelines.hunyuan_image.pipeline_output.HunyuanImagePipelineOutput