Unverified commit 0ef29355 authored by hlky, committed by GitHub

HiDream Image (#11231)



* HiDream Image


---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
parent bc261058
@@ -175,7 +175,7 @@
    title: gguf
  - local: quantization/torchao
    title: torchao
  - local: quantization/quanto
    title: quanto
  title: Quantization Methods
- sections:
@@ -300,6 +300,8 @@
    title: EasyAnimateTransformer3DModel
  - local: api/models/flux_transformer
    title: FluxTransformer2DModel
  - local: api/models/hidream_image_transformer
    title: HiDreamImageTransformer2DModel
  - local: api/models/hunyuan_transformer2d
    title: HunyuanDiT2DModel
  - local: api/models/hunyuan_video_transformer_3d
@@ -446,6 +448,8 @@
    title: Flux
  - local: api/pipelines/control_flux_inpaint
    title: FluxControlInpaint
  - local: api/pipelines/hidream
    title: HiDream-I1
  - local: api/pipelines/hunyuandit
    title: Hunyuan-DiT
  - local: api/pipelines/hunyuan_video
...
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HiDreamImageTransformer2DModel
A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
The model can be loaded with the following code snippet.
```python
import torch

from diffusers import HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)
```
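Loading the transformer separately is mostly useful when you want to swap an alternative copy (for example a fine-tuned or quantized one) into the pipeline. A minimal sketch using the standard `from_pretrained` component-override mechanism; note that the pipeline's Llama-based text encoder may need to be supplied separately, as in the usage example on the HiDream-I1 pipeline page:

```python
import torch

from diffusers import HiDreamImagePipeline, HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)
# Passing the preloaded module overrides the checkpoint's own transformer weights.
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", transformer=transformer, torch_dtype=torch.bfloat16
)
```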
## HiDreamImageTransformer2DModel
[[autodoc]] HiDreamImageTransformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# HiDreamImage
[HiDream-I1](https://huggingface.co/HiDream-ai) is an open-source image generation foundation model by HiDream.ai.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## Available models
The following models are available for the [`HiDreamImagePipeline`](text-to-image) pipeline:
| Model name | Description |
|:---|:---|
| [`HiDream-ai/HiDream-I1-Full`](https://huggingface.co/HiDream-ai/HiDream-I1-Full) | - |
| [`HiDream-ai/HiDream-I1-Dev`](https://huggingface.co/HiDream-ai/HiDream-I1-Dev) | - |
| [`HiDream-ai/HiDream-I1-Fast`](https://huggingface.co/HiDream-ai/HiDream-I1-Fast) | - |
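
A minimal end-to-end text-to-image sketch. The Llama repository id and the generation settings below are illustrative assumptions: HiDream-I1 pairs its CLIP and T5 encoders with a Llama-based text encoder whose weights are hosted separately, so the snippet loads and passes those components explicitly.

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

from diffusers import HiDreamImagePipeline

# The fourth text encoder is Llama-based; loading it explicitly also lets you
# control its dtype. The repo id here is an assumption, not part of this PR.
tokenizer_4 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "A cat holding a sign that says 'Hi-Dreams.ai'",
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("output.png")
```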
## HiDreamImagePipeline
[[autodoc]] HiDreamImagePipeline
- all
- __call__
## HiDreamImagePipelineOutput
[[autodoc]] pipelines.hidream_image.pipeline_output.HiDreamImagePipelineOutput
@@ -171,6 +171,7 @@ else:
        "FluxControlNetModel",
        "FluxMultiControlNetModel",
        "FluxTransformer2DModel",
        "HiDreamImageTransformer2DModel",
        "HunyuanDiT2DControlNetModel",
        "HunyuanDiT2DModel",
        "HunyuanDiT2DMultiControlNetModel",
@@ -368,6 +369,7 @@ else:
        "FluxInpaintPipeline",
        "FluxPipeline",
        "FluxPriorReduxPipeline",
        "HiDreamImagePipeline",
        "HunyuanDiTControlNetPipeline",
        "HunyuanDiTPAGPipeline",
        "HunyuanDiTPipeline",
@@ -761,6 +763,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        FluxControlNetModel,
        FluxMultiControlNetModel,
        FluxTransformer2DModel,
        HiDreamImageTransformer2DModel,
        HunyuanDiT2DControlNetModel,
        HunyuanDiT2DModel,
        HunyuanDiT2DMultiControlNetModel,
@@ -937,6 +940,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        FluxInpaintPipeline,
        FluxPipeline,
        FluxPriorReduxPipeline,
        HiDreamImagePipeline,
        HunyuanDiTControlNetPipeline,
        HunyuanDiTPAGPipeline,
        HunyuanDiTPipeline,
...
@@ -76,6 +76,7 @@ if is_torch_available():
    _import_structure["transformers.transformer_cogview4"] = ["CogView4Transformer2DModel"]
    _import_structure["transformers.transformer_easyanimate"] = ["EasyAnimateTransformer3DModel"]
    _import_structure["transformers.transformer_flux"] = ["FluxTransformer2DModel"]
    _import_structure["transformers.transformer_hidream_image"] = ["HiDreamImageTransformer2DModel"]
    _import_structure["transformers.transformer_hunyuan_video"] = ["HunyuanVideoTransformer3DModel"]
    _import_structure["transformers.transformer_ltx"] = ["LTXVideoTransformer3DModel"]
    _import_structure["transformers.transformer_lumina2"] = ["Lumina2Transformer2DModel"]
@@ -151,6 +152,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        DualTransformer2DModel,
        EasyAnimateTransformer3DModel,
        FluxTransformer2DModel,
        HiDreamImageTransformer2DModel,
        HunyuanDiT2DModel,
        HunyuanVideoTransformer3DModel,
        LatteTransformer3DModel,
...
@@ -21,6 +21,7 @@ if is_torch_available():
    from .transformer_cogview4 import CogView4Transformer2DModel
    from .transformer_easyanimate import EasyAnimateTransformer3DModel
    from .transformer_flux import FluxTransformer2DModel
    from .transformer_hidream_image import HiDreamImageTransformer2DModel
    from .transformer_hunyuan_video import HunyuanVideoTransformer3DModel
    from .transformer_ltx import LTXVideoTransformer3DModel
    from .transformer_lumina2 import Lumina2Transformer2DModel
...
This diff is collapsed.
@@ -221,6 +221,7 @@ else:
        "EasyAnimateInpaintPipeline",
        "EasyAnimateControlPipeline",
    ]
    _import_structure["hidream_image"] = ["HiDreamImagePipeline"]
    _import_structure["hunyuandit"] = ["HunyuanDiTPipeline"]
    _import_structure["hunyuan_video"] = [
        "HunyuanVideoPipeline",
@@ -585,6 +586,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        FluxPriorReduxPipeline,
        ReduxImageEncoder,
    )
    from .hidream_image import HiDreamImagePipeline
    from .hunyuan_video import (
        HunyuanSkyreelsImageToVideoPipeline,
        HunyuanVideoImageToVideoPipeline,
...
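The registrations above are what expose the new classes at the package root and under `diffusers.pipelines`; once they are in place, the lazy import machinery resolves the following without eagerly loading unrelated submodules:

```python
# Both names resolve through the lazy import structure registered above.
from diffusers import HiDreamImagePipeline, HiDreamImageTransformer2DModel
```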
from typing import TYPE_CHECKING

from ...utils import (
    DIFFUSERS_SLOW_IMPORT,
    OptionalDependencyNotAvailable,
    _LazyModule,
    get_objects_from_module,
    is_torch_available,
    is_transformers_available,
)


_dummy_objects = {}
_additional_imports = {}
_import_structure = {"pipeline_output": ["HiDreamImagePipelineOutput"]}

try:
    if not (is_transformers_available() and is_torch_available()):
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    from ...utils import dummy_torch_and_transformers_objects  # noqa F403

    _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
    _import_structure["pipeline_hidream_image"] = ["HiDreamImagePipeline"]

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    try:
        if not (is_transformers_available() and is_torch_available()):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        from ...utils.dummy_torch_and_transformers_objects import *  # noqa F403
    else:
        from .pipeline_hidream_image import HiDreamImagePipeline
else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
        module_spec=__spec__,
    )

    for name, value in _dummy_objects.items():
        setattr(sys.modules[__name__], name, value)
    for name, value in _additional_imports.items():
        setattr(sys.modules[__name__], name, value)
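For readers unfamiliar with `_LazyModule`: the module object in `sys.modules` is replaced by a proxy whose `__getattr__` performs the real import on first attribute access. Here is a standalone, hypothetical sketch of that idea (names and details are illustrative, not diffusers' actual implementation):

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical minimal stand-in for diffusers' _LazyModule."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each public attribute to the submodule that defines it.
        self._attr_to_module = {
            attr: submodule for submodule, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, name):
        if name not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        # Import the defining submodule only when the attribute is first touched.
        submodule = importlib.import_module(f"{self.__name__}.{self._attr_to_module[name]}")
        value = getattr(submodule, name)
        setattr(self, name, value)  # cache so later lookups bypass __getattr__
        return value
```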
from dataclasses import dataclass
from typing import List, Union

import numpy as np
import PIL.Image

from ...utils import BaseOutput


@dataclass
class HiDreamImagePipelineOutput(BaseOutput):
    """
    Output class for HiDreamImage pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`)
            List of denoised PIL images of length `batch_size`, or a NumPy array of shape `(batch_size, height,
            width, num_channels)`, representing the denoised images of the diffusion pipeline.
    """

    images: Union[List[PIL.Image.Image], np.ndarray]
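Because `BaseOutput` behaves like both a dataclass and a tuple, the pipeline's result can be consumed either way. A small hedged sketch, assuming `pipe` is a loaded `HiDreamImagePipeline`:

```python
# Attribute access on the output dataclass...
out = pipe("a photo of an astronaut")  # HiDreamImagePipelineOutput by default
image = out.images[0]

# ...or tuple-style indexing, since BaseOutput also supports it.
image = pipe("a photo of an astronaut")[0][0]
```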
@@ -505,6 +505,21 @@ class FluxTransformer2DModel(metaclass=DummyObject):
        requires_backends(cls, ["torch"])


class HiDreamImageTransformer2DModel(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])

    @classmethod
    def from_config(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])


class HunyuanDiT2DControlNetModel(metaclass=DummyObject):
    _backends = ["torch"]
...
@@ -617,6 +617,21 @@ class FluxPriorReduxPipeline(metaclass=DummyObject):
        requires_backends(cls, ["torch", "transformers"])


class HiDreamImagePipeline(metaclass=DummyObject):
    _backends = ["torch", "transformers"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch", "transformers"])

    @classmethod
    def from_config(cls, *args, **kwargs):
        requires_backends(cls, ["torch", "transformers"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch", "transformers"])


class HunyuanDiTControlNetPipeline(metaclass=DummyObject):
    _backends = ["torch", "transformers"]
...
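These dummy classes keep `import diffusers` working when a backend is missing; only actual use of the class raises. A hedged sketch of the observable behavior (the exact error text is diffusers' own, not reproduced here):

```python
# With torch/transformers absent, the import still succeeds because the
# dummy class is substituted for the real pipeline...
from diffusers import HiDreamImagePipeline

try:
    # ...but instantiating it raises via requires_backends.
    HiDreamImagePipeline()
except ImportError as err:
    print(err)  # points the user at the missing "torch"/"transformers" backends
```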
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import torch
from transformers import (
    AutoTokenizer,
    CLIPTextConfig,
    CLIPTextModelWithProjection,
    CLIPTokenizer,
    LlamaForCausalLM,
    T5EncoderModel,
)

from diffusers import (
    AutoencoderKL,
    FlowMatchEulerDiscreteScheduler,
    HiDreamImagePipeline,
    HiDreamImageTransformer2DModel,
)
from diffusers.utils.testing_utils import enable_full_determinism
from ..pipeline_params import TEXT_TO_IMAGE_BATCH_PARAMS, TEXT_TO_IMAGE_IMAGE_PARAMS, TEXT_TO_IMAGE_PARAMS
from ..test_pipelines_common import PipelineTesterMixin
enable_full_determinism()
class HiDreamImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = HiDreamImagePipeline
    params = TEXT_TO_IMAGE_PARAMS - {"cross_attention_kwargs"}
    batch_params = TEXT_TO_IMAGE_BATCH_PARAMS
    image_params = TEXT_TO_IMAGE_IMAGE_PARAMS
    image_latents_params = TEXT_TO_IMAGE_IMAGE_PARAMS
    required_optional_params = PipelineTesterMixin.required_optional_params
    test_layerwise_casting = True
    supports_dduf = False

    def get_dummy_components(self):
        # Tiny component configs keep the fast tests cheap enough to run on CPU.
        torch.manual_seed(0)
        transformer = HiDreamImageTransformer2DModel(
            patch_size=2,
            in_channels=4,
            out_channels=4,
            num_layers=1,
            num_single_layers=1,
            attention_head_dim=8,
            num_attention_heads=4,
            caption_channels=[32, 16],
            text_emb_dim=64,
            num_routed_experts=4,
            num_activated_experts=2,
            axes_dims_rope=(4, 2, 2),
            max_resolution=(32, 32),
            llama_layers=(0, 1),
        ).eval()

        torch.manual_seed(0)
        vae = AutoencoderKL(scaling_factor=0.3611, shift_factor=0.1159)

        clip_text_encoder_config = CLIPTextConfig(
            bos_token_id=0,
            eos_token_id=2,
            hidden_size=32,
            intermediate_size=37,
            layer_norm_eps=1e-05,
            num_attention_heads=4,
            num_hidden_layers=5,
            pad_token_id=1,
            vocab_size=1000,
            hidden_act="gelu",
            projection_dim=32,
            max_position_embeddings=128,
        )

        torch.manual_seed(0)
        text_encoder = CLIPTextModelWithProjection(clip_text_encoder_config)
        torch.manual_seed(0)
        text_encoder_2 = CLIPTextModelWithProjection(clip_text_encoder_config)
        torch.manual_seed(0)
        text_encoder_3 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        torch.manual_seed(0)
        text_encoder_4 = LlamaForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
        text_encoder_4.generation_config.pad_token_id = 1

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_3 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer_4 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")

        scheduler = FlowMatchEulerDiscreteScheduler()

        components = {
            "scheduler": scheduler,
            "vae": vae,
            "text_encoder": text_encoder,
            "tokenizer": tokenizer,
            "text_encoder_2": text_encoder_2,
            "tokenizer_2": tokenizer_2,
            "text_encoder_3": text_encoder_3,
            "tokenizer_3": tokenizer_3,
            "text_encoder_4": text_encoder_4,
            "tokenizer_4": tokenizer_4,
            "transformer": transformer,
        }
        return components
    def get_dummy_inputs(self, device, seed=0):
        # torch.Generator does not support the "mps" device, so fall back to the global RNG there.
        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)
        inputs = {
            "prompt": "A painting of a squirrel eating a burger",
            "generator": generator,
            "num_inference_steps": 2,
            "guidance_scale": 5.0,
            "output_type": "np",
        }
        return inputs
    def test_inference(self):
        device = "cpu"

        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        inputs = self.get_dummy_inputs(device)
        image = pipe(**inputs)[0]
        generated_image = image[0]

        self.assertEqual(generated_image.shape, (128, 128, 3))
        # The comparison below uses random values and a 1e10 tolerance, so it is effectively a
        # smoke test: it only asserts that the forward pass runs and produces the expected shape.
        expected_image = torch.randn(128, 128, 3).numpy()
        max_diff = np.abs(generated_image - expected_image).max()
        self.assertLessEqual(max_diff, 1e10)

    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-4)