Unverified commit 0ef29355 authored by hlky, committed by GitHub

HiDream Image (#11231)



* HiDream Image


---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
parent bc261058
@@ -175,7 +175,7 @@
    title: gguf
  - local: quantization/torchao
    title: torchao
  - local: quantization/quanto
    title: quanto
  title: Quantization Methods
- sections:
@@ -300,6 +300,8 @@
    title: EasyAnimateTransformer3DModel
  - local: api/models/flux_transformer
    title: FluxTransformer2DModel
  - local: api/models/hidream_image_transformer
    title: HiDreamImageTransformer2DModel
  - local: api/models/hunyuan_transformer2d
    title: HunyuanDiT2DModel
  - local: api/models/hunyuan_video_transformer_3d
@@ -446,6 +448,8 @@
    title: Flux
  - local: api/pipelines/control_flux_inpaint
    title: FluxControlInpaint
  - local: api/pipelines/hidream
    title: HiDream-I1
  - local: api/pipelines/hunyuandit
    title: Hunyuan-DiT
  - local: api/pipelines/hunyuan_video
...
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HiDreamImageTransformer2DModel
A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
The model can be loaded with the following code snippet.
```python
import torch

from diffusers import HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)
```
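Loading the transformer separately is mostly useful when you want to swap an alternative copy (for example a fine-tuned or quantized one) into the pipeline. A minimal sketch using the standard `from_pretrained` component-override mechanism; note that the pipeline's Llama-based text encoder may need to be supplied separately, as in the usage example on the HiDream-I1 pipeline page:

```python
import torch

from diffusers import HiDreamImagePipeline, HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)
# Passing the preloaded module overrides the checkpoint's own transformer weights.
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", transformer=transformer, torch_dtype=torch.bfloat16
)
```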
## HiDreamImageTransformer2DModel
[[autodoc]] HiDreamImageTransformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# HiDreamImage
[HiDream-I1](https://huggingface.co/HiDream-ai) is an open-source image generation foundation model by HiDream.ai.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## Available models
The following models are available for the [`HiDreamImagePipeline`](text-to-image) pipeline:
| Model name | Description |
|:---|:---|
| [`HiDream-ai/HiDream-I1-Full`](https://huggingface.co/HiDream-ai/HiDream-I1-Full) | - |
| [`HiDream-ai/HiDream-I1-Dev`](https://huggingface.co/HiDream-ai/HiDream-I1-Dev) | - |
| [`HiDream-ai/HiDream-I1-Fast`](https://huggingface.co/HiDream-ai/HiDream-I1-Fast) | - |
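
A minimal end-to-end text-to-image sketch. The Llama repository id and the generation settings below are illustrative assumptions: HiDream-I1 pairs its CLIP and T5 encoders with a Llama-based text encoder whose weights are hosted separately, so the snippet loads and passes those components explicitly.

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

from diffusers import HiDreamImagePipeline

# The fourth text encoder is Llama-based; loading it explicitly also lets you
# control its dtype. The repo id here is an assumption, not part of this PR.
tokenizer_4 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "A cat holding a sign that says 'Hi-Dreams.ai'",
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("output.png")
```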
## HiDreamImagePipeline
[[autodoc]] HiDreamImagePipeline
- all
- __call__
## HiDreamImagePipelineOutput
[[autodoc]] pipelines.hidream_image.pipeline_output.HiDreamImagePipelineOutput
@@ -171,6 +171,7 @@ else:
        "FluxControlNetModel",
        "FluxMultiControlNetModel",
        "FluxTransformer2DModel",
        "HiDreamImageTransformer2DModel",
        "HunyuanDiT2DControlNetModel",
        "HunyuanDiT2DModel",
        "HunyuanDiT2DMultiControlNetModel",
@@ -368,6 +369,7 @@ else:
        "FluxInpaintPipeline",
        "FluxPipeline",
        "FluxPriorReduxPipeline",
        "HiDreamImagePipeline",
        "HunyuanDiTControlNetPipeline",
        "HunyuanDiTPAGPipeline",
        "HunyuanDiTPipeline",
@@ -761,6 +763,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        FluxControlNetModel,
        FluxMultiControlNetModel,
        FluxTransformer2DModel,
        HiDreamImageTransformer2DModel,
        HunyuanDiT2DControlNetModel,
        HunyuanDiT2DModel,
        HunyuanDiT2DMultiControlNetModel,
@@ -937,6 +940,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        FluxInpaintPipeline,
        FluxPipeline,
        FluxPriorReduxPipeline,
        HiDreamImagePipeline,
        HunyuanDiTControlNetPipeline,
        HunyuanDiTPAGPipeline,
        HunyuanDiTPipeline,
...
@@ -76,6 +76,7 @@ if is_torch_available():
    _import_structure["transformers.transformer_cogview4"] = ["CogView4Transformer2DModel"]
    _import_structure["transformers.transformer_easyanimate"] = ["EasyAnimateTransformer3DModel"]
    _import_structure["transformers.transformer_flux"] = ["FluxTransformer2DModel"]
    _import_structure["transformers.transformer_hidream_image"] = ["HiDreamImageTransformer2DModel"]
    _import_structure["transformers.transformer_hunyuan_video"] = ["HunyuanVideoTransformer3DModel"]
    _import_structure["transformers.transformer_ltx"] = ["LTXVideoTransformer3DModel"]
    _import_structure["transformers.transformer_lumina2"] = ["Lumina2Transformer2DModel"]
@@ -151,6 +152,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        DualTransformer2DModel,
        EasyAnimateTransformer3DModel,
        FluxTransformer2DModel,
        HiDreamImageTransformer2DModel,
        HunyuanDiT2DModel,
        HunyuanVideoTransformer3DModel,
        LatteTransformer3DModel,
...
@@ -21,6 +21,7 @@ if is_torch_available():
    from .transformer_cogview4 import CogView4Transformer2DModel
    from .transformer_easyanimate import EasyAnimateTransformer3DModel
    from .transformer_flux import FluxTransformer2DModel
    from .transformer_hidream_image import HiDreamImageTransformer2DModel
    from .transformer_hunyuan_video import HunyuanVideoTransformer3DModel
    from .transformer_ltx import LTXVideoTransformer3DModel
    from .transformer_lumina2 import Lumina2Transformer2DModel
...
This diff is collapsed.
@@ -221,6 +221,7 @@ else:
        "EasyAnimateInpaintPipeline",
        "EasyAnimateControlPipeline",
    ]
    _import_structure["hidream_image"] = ["HiDreamImagePipeline"]
    _import_structure["hunyuandit"] = ["HunyuanDiTPipeline"]
    _import_structure["hunyuan_video"] = [
        "HunyuanVideoPipeline",
@@ -585,6 +586,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
        FluxPriorReduxPipeline,
        ReduxImageEncoder,
    )
    from .hidream_image import HiDreamImagePipeline
    from .hunyuan_video import (
        HunyuanSkyreelsImageToVideoPipeline,
        HunyuanVideoImageToVideoPipeline,
...
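The registrations above are what expose the new classes at the package root and under `diffusers.pipelines`; once they are in place, the lazy import machinery resolves the following without eagerly loading unrelated submodules:

```python
# Both names resolve through the lazy import structure registered above.
from diffusers import HiDreamImagePipeline, HiDreamImageTransformer2DModel
```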
from typing import TYPE_CHECKING

from ...utils import (
    DIFFUSERS_SLOW_IMPORT,
    OptionalDependencyNotAvailable,
    _LazyModule,
    get_objects_from_module,
    is_torch_available,
    is_transformers_available,
)


_dummy_objects = {}
_additional_imports = {}
_import_structure = {"pipeline_output": ["HiDreamImagePipelineOutput"]}

try:
    if not (is_transformers_available() and is_torch_available()):
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    from ...utils import dummy_torch_and_transformers_objects  # noqa F403

    _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
    _import_structure["pipeline_hidream_image"] = ["HiDreamImagePipeline"]

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    try:
        if not (is_transformers_available() and is_torch_available()):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        from ...utils.dummy_torch_and_transformers_objects import *  # noqa F403
    else:
        from .pipeline_hidream_image import HiDreamImagePipeline
else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
        module_spec=__spec__,
    )

    for name, value in _dummy_objects.items():
        setattr(sys.modules[__name__], name, value)
    for name, value in _additional_imports.items():
        setattr(sys.modules[__name__], name, value)
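For readers unfamiliar with `_LazyModule`: the module object in `sys.modules` is replaced by a proxy whose `__getattr__` performs the real import on first attribute access. Here is a standalone, hypothetical sketch of that idea (names and details are illustrative, not diffusers' actual implementation):

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical minimal stand-in for diffusers' _LazyModule."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each public attribute to the submodule that defines it.
        self._attr_to_module = {
            attr: submodule for submodule, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, name):
        if name not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        # Import the defining submodule only when the attribute is first touched.
        submodule = importlib.import_module(f"{self.__name__}.{self._attr_to_module[name]}")
        value = getattr(submodule, name)
        setattr(self, name, value)  # cache so later lookups bypass __getattr__
        return value
```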
from dataclasses import dataclass
from typing import List, Union

import numpy as np
import PIL.Image

from ...utils import BaseOutput


@dataclass
class HiDreamImagePipelineOutput(BaseOutput):
    """
    Output class for HiDreamImage pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`)
            List of denoised PIL images of length `batch_size`, or a NumPy array of shape `(batch_size, height,
            width, num_channels)`, representing the denoised images of the diffusion pipeline.
    """

    images: Union[List[PIL.Image.Image], np.ndarray]
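Because `BaseOutput` behaves like both a dataclass and a tuple, the pipeline's result can be consumed either way. A small hedged sketch, assuming `pipe` is a loaded `HiDreamImagePipeline`:

```python
# Attribute access on the output dataclass...
out = pipe("a photo of an astronaut")  # HiDreamImagePipelineOutput by default
image = out.images[0]

# ...or tuple-style indexing, since BaseOutput also supports it.
image = pipe("a photo of an astronaut")[0][0]
```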
@@ -505,6 +505,21 @@ class FluxTransformer2DModel(metaclass=DummyObject):
        requires_backends(cls, ["torch"])


class HiDreamImageTransformer2DModel(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])

    @classmethod
    def from_config(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])


class HunyuanDiT2DControlNetModel(metaclass=DummyObject):
    _backends = ["torch"]
...
@@ -617,6 +617,21 @@ class FluxPriorReduxPipeline(metaclass=DummyObject):
        requires_backends(cls, ["torch", "transformers"])


class HiDreamImagePipeline(metaclass=DummyObject):
    _backends = ["torch", "transformers"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch", "transformers"])

    @classmethod
    def from_config(cls, *args, **kwargs):
        requires_backends(cls, ["torch", "transformers"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch", "transformers"])


class HunyuanDiTControlNetPipeline(metaclass=DummyObject):
    _backends = ["torch", "transformers"]
...
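These dummy classes keep `import diffusers` working when a backend is missing; only actual use of the class raises. A hedged sketch of the observable behavior (the exact error text is diffusers' own, not reproduced here):

```python
# With torch/transformers absent, the import still succeeds because the
# dummy class is substituted for the real pipeline...
from diffusers import HiDreamImagePipeline

try:
    # ...but instantiating it raises via requires_backends.
    HiDreamImagePipeline()
except ImportError as err:
    print(err)  # points the user at the missing "torch"/"transformers" backends
```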
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import torch
from transformers import (
    AutoTokenizer,
    CLIPTextConfig,
    CLIPTextModelWithProjection,
    CLIPTokenizer,
    LlamaForCausalLM,
    T5EncoderModel,
)

from diffusers import (
    AutoencoderKL,
    FlowMatchEulerDiscreteScheduler,
    HiDreamImagePipeline,
    HiDreamImageTransformer2DModel,
)
from diffusers.utils.testing_utils import enable_full_determinism
from ..pipeline_params import TEXT_TO_IMAGE_BATCH_PARAMS, TEXT_TO_IMAGE_IMAGE_PARAMS, TEXT_TO_IMAGE_PARAMS
from ..test_pipelines_common import PipelineTesterMixin
enable_full_determinism()
class HiDreamImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = HiDreamImagePipeline
    params = TEXT_TO_IMAGE_PARAMS - {"cross_attention_kwargs"}
    batch_params = TEXT_TO_IMAGE_BATCH_PARAMS
    image_params = TEXT_TO_IMAGE_IMAGE_PARAMS
    image_latents_params = TEXT_TO_IMAGE_IMAGE_PARAMS
    required_optional_params = PipelineTesterMixin.required_optional_params
    test_layerwise_casting = True
    supports_dduf = False

    def get_dummy_components(self):
        # Tiny component configs keep the fast tests cheap enough to run on CPU.
        torch.manual_seed(0)
        transformer = HiDreamImageTransformer2DModel(
            patch_size=2,
            in_channels=4,
            out_channels=4,
            num_layers=1,
            num_single_layers=1,
            attention_head_dim=8,
            num_attention_heads=4,
            caption_channels=[32, 16],
            text_emb_dim=64,
            num_routed_experts=4,
            num_activated_experts=2,
            axes_dims_rope=(4, 2, 2),
            max_resolution=(32, 32),
            llama_layers=(0, 1),
        ).eval()

        torch.manual_seed(0)
        vae = AutoencoderKL(scaling_factor=0.3611, shift_factor=0.1159)

        clip_text_encoder_config = CLIPTextConfig(
            bos_token_id=0,
            eos_token_id=2,
            hidden_size=32,
            intermediate_size=37,
            layer_norm_eps=1e-05,
            num_attention_heads=4,
            num_hidden_layers=5,
            pad_token_id=1,
            vocab_size=1000,
            hidden_act="gelu",
            projection_dim=32,
            max_position_embeddings=128,
        )

        torch.manual_seed(0)
        text_encoder = CLIPTextModelWithProjection(clip_text_encoder_config)
        torch.manual_seed(0)
        text_encoder_2 = CLIPTextModelWithProjection(clip_text_encoder_config)
        torch.manual_seed(0)
        text_encoder_3 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        torch.manual_seed(0)
        text_encoder_4 = LlamaForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
        text_encoder_4.generation_config.pad_token_id = 1

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_3 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer_4 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")

        scheduler = FlowMatchEulerDiscreteScheduler()

        components = {
            "scheduler": scheduler,
            "vae": vae,
            "text_encoder": text_encoder,
            "tokenizer": tokenizer,
            "text_encoder_2": text_encoder_2,
            "tokenizer_2": tokenizer_2,
            "text_encoder_3": text_encoder_3,
            "tokenizer_3": tokenizer_3,
            "text_encoder_4": text_encoder_4,
            "tokenizer_4": tokenizer_4,
            "transformer": transformer,
        }
        return components
    def get_dummy_inputs(self, device, seed=0):
        # torch.Generator does not support the "mps" device, so fall back to the global RNG there.
        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)
        inputs = {
            "prompt": "A painting of a squirrel eating a burger",
            "generator": generator,
            "num_inference_steps": 2,
            "guidance_scale": 5.0,
            "output_type": "np",
        }
        return inputs
    def test_inference(self):
        device = "cpu"

        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        inputs = self.get_dummy_inputs(device)
        image = pipe(**inputs)[0]
        generated_image = image[0]

        self.assertEqual(generated_image.shape, (128, 128, 3))
        # The comparison below uses random values and a 1e10 tolerance, so it is effectively a
        # smoke test: it only asserts that the forward pass runs and produces the expected shape.
        expected_image = torch.randn(128, 128, 3).numpy()
        max_diff = np.abs(generated_image - expected_image).max()
        self.assertLessEqual(max_diff, 1e10)

    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-4)