Commit 6156cf8f authored by YiYi Xu, committed by GitHub (parent 152f7ca3)
@@ -359,6 +359,8 @@
  title: HunyuanDiT2DModel
- local: api/models/hunyuanimage_transformer_2d
  title: HunyuanImageTransformer2DModel
- local: api/models/hunyuan_video15_transformer_3d
  title: HunyuanVideo15Transformer3DModel
- local: api/models/hunyuan_video_transformer_3d
  title: HunyuanVideoTransformer3DModel
- local: api/models/latte_transformer3d
@@ -433,6 +435,8 @@
  title: AutoencoderKLHunyuanImageRefiner
- local: api/models/autoencoder_kl_hunyuan_video
  title: AutoencoderKLHunyuanVideo
- local: api/models/autoencoder_kl_hunyuan_video15
  title: AutoencoderKLHunyuanVideo15
- local: api/models/autoencoderkl_ltx_video
  title: AutoencoderKLLTXVideo
- local: api/models/autoencoderkl_magvit
@@ -652,6 +656,8 @@
  title: Framepack
- local: api/pipelines/hunyuan_video
  title: HunyuanVideo
- local: api/pipelines/hunyuan_video15
  title: HunyuanVideo1.5
- local: api/pipelines/i2vgenxl
  title: I2VGen-XL
- local: api/pipelines/kandinsky5_video
...
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLHunyuanVideo15
The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo1.5](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5) by Tencent.
The model can be loaded with the following code snippet.
```python
import torch

from diffusers import AutoencoderKLHunyuanVideo15

vae = AutoencoderKLHunyuanVideo15.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="vae", torch_dtype=torch.float32
)
# make sure to enable tiling to avoid OOM
vae.enable_tiling()
```
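Once loaded, the VAE can be used like the other video autoencoders in Diffusers. The snippet below is a minimal sketch, assuming the usual `encode`/`decode` API and a `(batch, channels, frames, height, width)` input layout; the tensor shape is illustrative only.
```python
import torch

# assumes `vae` was loaded as shown above
video = torch.randn(1, 3, 9, 256, 256, dtype=torch.float32)  # dummy (B, C, F, H, W) video

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()  # sample from the posterior latent distribution
    reconstruction = vae.decode(latents).sample       # decode back to pixel space
```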
## AutoencoderKLHunyuanVideo15
[[autodoc]] AutoencoderKLHunyuanVideo15
- decode
- encode
- all
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HunyuanVideo15Transformer3DModel
A Diffusion Transformer model for 3D video-like data used in [HunyuanVideo1.5](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5).
The model can be loaded with the following code snippet.
```python
import torch

from diffusers import HunyuanVideo15Transformer3DModel

transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="transformer", torch_dtype=torch.bfloat16
)
```
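If you want to load the transformer separately, for example to reuse it across pipelines, it can be passed directly to the pipeline. A minimal sketch, assuming the `hunyuanvideo-community` checkpoint layout from the snippet above:
```python
import torch

from diffusers import HunyuanVideo15Pipeline, HunyuanVideo15Transformer3DModel

transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="transformer", torch_dtype=torch.bfloat16
)

# pass the preloaded transformer to the pipeline instead of letting it load its own copy
pipeline = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", transformer=transformer, torch_dtype=torch.bfloat16
)
```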
## HunyuanVideo15Transformer3DModel
[[autodoc]] HunyuanVideo15Transformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# HunyuanVideo-1.5
HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture with selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source models.
You can find all the original HunyuanVideo-1.5 checkpoints under the [Tencent](https://huggingface.co/tencent) organization.
> [!TIP]
> Click on the HunyuanVideo-1.5 models in the right sidebar for more examples of video generation tasks.
>
> The examples below use a checkpoint from [hunyuanvideo-community](https://huggingface.co/hunyuanvideo-community) because the weights are stored in a layout compatible with Diffusers.
The example below demonstrates how to generate a video optimized for memory or inference speed.
<hfoptions id="usage">
<hfoption id="memory">
Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.
```py
import torch
from diffusers import HunyuanVideo15Pipeline
from diffusers.utils import export_to_video

pipeline = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    torch_dtype=torch.bfloat16,
)
# model-offloading and tiling
pipeline.enable_model_cpu_offload()
pipeline.vae.enable_tiling()
prompt = "A fluffy teddy bear sits on a bed of soft pillows surrounded by children's toys."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "output.mp4", fps=15)
```

</hfoption>
</hfoptions>
## Notes
- HunyuanVideo-1.5 uses attention masks with variable-length sequences. For best performance, we recommend an attention backend that handles padding efficiently:

  - **H100/H800:** `_flash_3_hub` or `_flash_varlen_3`
  - **A100/A800/RTX 4090:** `flash_hub` or `flash_varlen`
  - **Other GPUs:** `sage_hub`

  Refer to the [Attention backends](../../optimization/attention_backends) guide for more details about using a different backend.

  ```py
  pipe.transformer.set_attention_backend("flash_hub")  # or your preferred backend
  ```
- [`HunyuanVideo15Pipeline`] uses a guider and does not take a `guidance_scale` argument at runtime.

  You can inspect the default guider configuration with `pipe.guider`:

  ```py
  >>> pipe.guider
  ClassifierFreeGuidance {
    "_class_name": "ClassifierFreeGuidance",
    "_diffusers_version": "0.36.0.dev0",
    "enabled": true,
    "guidance_rescale": 0.0,
    "guidance_scale": 6.0,
    "start": 0.0,
    "stop": 1.0,
    "use_original_formulation": false
  }
  State:
    step: None
    num_inference_steps: None
    timestep: None
    count_prepared: 0
    enabled: True
    num_conditions: 2
  ```

  To update the guider configuration, run `pipe.guider = pipe.guider.new(...)`:

  ```py
  pipe.guider = pipe.guider.new(guidance_scale=5.0)
  ```

  Read more about guiders [here](../../modular_diffusers/guiders).
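- Image-to-video generation uses [`HunyuanVideo15ImageToVideoPipeline`]. The snippet below is a sketch rather than a verified example: the `i2v` repo id and the `image`/`prompt` call signature are assumptions based on the text-to-video example above and the existing HunyuanVideo image-to-video pipeline.

  ```py
  import torch
  from diffusers import HunyuanVideo15ImageToVideoPipeline
  from diffusers.utils import export_to_video, load_image

  # NOTE: hypothetical repo id; check the hunyuanvideo-community org for the actual i2v checkpoint
  pipeline = HunyuanVideo15ImageToVideoPipeline.from_pretrained(
      "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v",
      torch_dtype=torch.bfloat16,
  )
  pipeline.enable_model_cpu_offload()
  pipeline.vae.enable_tiling()

  image = load_image("path/to/first_frame.png")  # conditioning image
  prompt = "The teddy bear slowly waves at the camera."
  video = pipeline(image=image, prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
  export_to_video(video, "output_i2v.mp4", fps=15)
  ```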
## HunyuanVideo15Pipeline
[[autodoc]] HunyuanVideo15Pipeline
- all
- __call__
## HunyuanVideo15ImageToVideoPipeline
[[autodoc]] HunyuanVideo15ImageToVideoPipeline
- all
- __call__
## HunyuanVideo15PipelineOutput
[[autodoc]] pipelines.hunyuan_video1_5.pipeline_output.HunyuanVideo15PipelineOutput
@@ -190,6 +190,7 @@ else:
"AutoencoderKLHunyuanImage",
"AutoencoderKLHunyuanImageRefiner",
"AutoencoderKLHunyuanVideo",
"AutoencoderKLHunyuanVideo15",
"AutoencoderKLLTXVideo",
"AutoencoderKLMagvit",
"AutoencoderKLMochi",
@@ -225,6 +226,7 @@ else:
"HunyuanDiT2DModel",
"HunyuanDiT2DMultiControlNetModel",
"HunyuanImageTransformer2DModel",
"HunyuanVideo15Transformer3DModel",
"HunyuanVideoFramepackTransformer3DModel",
"HunyuanVideoTransformer3DModel",
"I2VGenXLUNet",
@@ -481,6 +483,8 @@ else:
"HunyuanImagePipeline",
"HunyuanImageRefinerPipeline",
"HunyuanSkyreelsImageToVideoPipeline",
"HunyuanVideo15ImageToVideoPipeline",
"HunyuanVideo15Pipeline",
"HunyuanVideoFramepackPipeline",
"HunyuanVideoImageToVideoPipeline",
"HunyuanVideoPipeline",
@@ -909,6 +913,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
AutoencoderKLHunyuanImage,
AutoencoderKLHunyuanImageRefiner,
AutoencoderKLHunyuanVideo,
AutoencoderKLHunyuanVideo15,
AutoencoderKLLTXVideo,
AutoencoderKLMagvit,
AutoencoderKLMochi,
@@ -944,6 +949,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
HunyuanDiT2DModel,
HunyuanDiT2DMultiControlNetModel,
HunyuanImageTransformer2DModel,
HunyuanVideo15Transformer3DModel,
HunyuanVideoFramepackTransformer3DModel,
HunyuanVideoTransformer3DModel,
I2VGenXLUNet,
@@ -1170,6 +1176,8 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
HunyuanImagePipeline,
HunyuanImageRefinerPipeline,
HunyuanSkyreelsImageToVideoPipeline,
HunyuanVideo15ImageToVideoPipeline,
HunyuanVideo15Pipeline,
HunyuanVideoFramepackPipeline,
HunyuanVideoImageToVideoPipeline,
HunyuanVideoPipeline,
...
@@ -39,6 +39,7 @@ if is_torch_available():
_import_structure["autoencoders.autoencoder_kl_hunyuan_video"] = ["AutoencoderKLHunyuanVideo"]
_import_structure["autoencoders.autoencoder_kl_hunyuanimage"] = ["AutoencoderKLHunyuanImage"]
_import_structure["autoencoders.autoencoder_kl_hunyuanimage_refiner"] = ["AutoencoderKLHunyuanImageRefiner"]
_import_structure["autoencoders.autoencoder_kl_hunyuanvideo15"] = ["AutoencoderKLHunyuanVideo15"]
_import_structure["autoencoders.autoencoder_kl_ltx"] = ["AutoencoderKLLTXVideo"]
_import_structure["autoencoders.autoencoder_kl_magvit"] = ["AutoencoderKLMagvit"]
_import_structure["autoencoders.autoencoder_kl_mochi"] = ["AutoencoderKLMochi"]
@@ -96,6 +97,7 @@ if is_torch_available():
_import_structure["transformers.transformer_flux2"] = ["Flux2Transformer2DModel"]
_import_structure["transformers.transformer_hidream_image"] = ["HiDreamImageTransformer2DModel"]
_import_structure["transformers.transformer_hunyuan_video"] = ["HunyuanVideoTransformer3DModel"]
_import_structure["transformers.transformer_hunyuan_video15"] = ["HunyuanVideo15Transformer3DModel"]
_import_structure["transformers.transformer_hunyuan_video_framepack"] = ["HunyuanVideoFramepackTransformer3DModel"]
_import_structure["transformers.transformer_hunyuanimage"] = ["HunyuanImageTransformer2DModel"]
_import_structure["transformers.transformer_kandinsky"] = ["Kandinsky5Transformer3DModel"]
@@ -147,6 +149,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
AutoencoderKLHunyuanImage,
AutoencoderKLHunyuanImageRefiner,
AutoencoderKLHunyuanVideo,
AutoencoderKLHunyuanVideo15,
AutoencoderKLLTXVideo,
AutoencoderKLMagvit,
AutoencoderKLMochi,
@@ -199,6 +202,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
HiDreamImageTransformer2DModel,
HunyuanDiT2DModel,
HunyuanImageTransformer2DModel,
HunyuanVideo15Transformer3DModel,
HunyuanVideoFramepackTransformer3DModel,
HunyuanVideoTransformer3DModel,
Kandinsky5Transformer3DModel,
...
@@ -282,6 +282,7 @@ def attention_backend(backend: Union[str, AttentionBackendName] = AttentionBacke
backend = AttentionBackendName(backend)
_check_attention_backend_requirements(backend)
_maybe_download_kernel_for_backend(backend)

old_backend = _AttentionBackendRegistry._active_backend
_AttentionBackendRegistry._active_backend = backend
...
@@ -8,6 +8,7 @@ from .autoencoder_kl_flux2 import AutoencoderKLFlux2
from .autoencoder_kl_hunyuan_video import AutoencoderKLHunyuanVideo
from .autoencoder_kl_hunyuanimage import AutoencoderKLHunyuanImage
from .autoencoder_kl_hunyuanimage_refiner import AutoencoderKLHunyuanImageRefiner
from .autoencoder_kl_hunyuanvideo15 import AutoencoderKLHunyuanVideo15
from .autoencoder_kl_ltx import AutoencoderKLLTXVideo
from .autoencoder_kl_magvit import AutoencoderKLMagvit
from .autoencoder_kl_mochi import AutoencoderKLMochi
...
@@ -29,6 +29,7 @@ if is_torch_available():
from .transformer_flux2 import Flux2Transformer2DModel
from .transformer_hidream_image import HiDreamImageTransformer2DModel
from .transformer_hunyuan_video import HunyuanVideoTransformer3DModel
from .transformer_hunyuan_video15 import HunyuanVideo15Transformer3DModel
from .transformer_hunyuan_video_framepack import HunyuanVideoFramepackTransformer3DModel
from .transformer_hunyuanimage import HunyuanImageTransformer2DModel
from .transformer_kandinsky import Kandinsky5Transformer3DModel
...
@@ -243,6 +243,7 @@ else:
"HunyuanVideoImageToVideoPipeline",
"HunyuanVideoFramepackPipeline",
]
_import_structure["hunyuan_video1_5"] = ["HunyuanVideo15Pipeline", "HunyuanVideo15ImageToVideoPipeline"]
_import_structure["hunyuan_image"] = ["HunyuanImagePipeline", "HunyuanImageRefinerPipeline"]
_import_structure["kandinsky"] = [
"KandinskyCombinedPipeline",
@@ -665,6 +666,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
HunyuanVideoImageToVideoPipeline,
HunyuanVideoPipeline,
)
from .hunyuan_video1_5 import HunyuanVideo15ImageToVideoPipeline, HunyuanVideo15Pipeline
from .hunyuandit import HunyuanDiTPipeline
from .i2vgen_xl import I2VGenXLPipeline
from .kandinsky import (
...
from typing import TYPE_CHECKING

from ...utils import (
    DIFFUSERS_SLOW_IMPORT,
    OptionalDependencyNotAvailable,
    _LazyModule,
    get_objects_from_module,
    is_torch_available,
    is_transformers_available,
)


_dummy_objects = {}
_import_structure = {}

try:
    if not (is_transformers_available() and is_torch_available()):
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    from ...utils import dummy_torch_and_transformers_objects  # noqa F403

    _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
    _import_structure["pipeline_hunyuan_video1_5"] = ["HunyuanVideo15Pipeline"]
    _import_structure["pipeline_hunyuan_video1_5_image2video"] = ["HunyuanVideo15ImageToVideoPipeline"]

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    try:
        if not (is_transformers_available() and is_torch_available()):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        from ...utils.dummy_torch_and_transformers_objects import *
    else:
        from .pipeline_hunyuan_video1_5 import HunyuanVideo15Pipeline
        from .pipeline_hunyuan_video1_5_image2video import HunyuanVideo15ImageToVideoPipeline

else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
        module_spec=__spec__,
    )

    for name, value in _dummy_objects.items():
        setattr(sys.modules[__name__], name, value)
# Copyright 2025 The HunyuanVideo Team and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np

from ...configuration_utils import register_to_config
from ...video_processor import VideoProcessor


# copied from https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/hyvideo/utils/data_utils.py#L20
def generate_crop_size_list(base_size=256, patch_size=16, max_ratio=4.0):
    num_patches = round((base_size / patch_size) ** 2)
    assert max_ratio >= 1.0
    crop_size_list = []
    # walk over patch-grid shapes (wp, hp) and keep the crop sizes whose aspect ratio stays within `max_ratio`
    wp, hp = num_patches, 1
    while wp > 0:
        if max(wp, hp) / min(wp, hp) <= max_ratio:
            crop_size_list.append((wp * patch_size, hp * patch_size))
        if (hp + 1) * wp <= num_patches:
            hp += 1
        else:
            wp -= 1
    return crop_size_list
# copied from https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/hyvideo/utils/data_utils.py#L38
def get_closest_ratio(height: float, width: float, ratios: list, buckets: list):
    """
    Get the closest ratio in the buckets.

    Args:
        height (float): video height
        width (float): video width
        ratios (list): video aspect ratios
        buckets (list): buckets generated by `generate_crop_size_list`

    Returns:
        the closest size in the buckets and the corresponding ratio
    """
    aspect_ratio = float(height) / float(width)
    diff_ratios = ratios - aspect_ratio

    if aspect_ratio >= 1:
        indices = [(index, x) for index, x in enumerate(diff_ratios) if x <= 0]
    else:
        indices = [(index, x) for index, x in enumerate(diff_ratios) if x >= 0]

    closest_ratio_id = min(indices, key=lambda pair: abs(pair[1]))[0]
    closest_size = buckets[closest_ratio_id]
    closest_ratio = ratios[closest_ratio_id]
    return closest_size, closest_ratio
class HunyuanVideo15ImageProcessor(VideoProcessor):
    r"""
    Image/video processor to preprocess/postprocess the reference image and generated video for the HunyuanVideo1.5
    model.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept
            `height` and `width` arguments from [`image_processor.VaeImageProcessor.preprocess`] method.
        vae_scale_factor (`int`, *optional*, defaults to `16`):
            VAE (spatial) scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of
            this factor.
        vae_latent_channels (`int`, *optional*, defaults to `32`):
            VAE latent channels.
        do_convert_rgb (`bool`, *optional*, defaults to `True`):
            Whether to convert the image to RGB.
    """

    @register_to_config
    def __init__(
        self,
        do_resize: bool = True,
        vae_scale_factor: int = 16,
        vae_latent_channels: int = 32,
        do_convert_rgb: bool = True,
    ):
        super().__init__(
            do_resize=do_resize,
            vae_scale_factor=vae_scale_factor,
            vae_latent_channels=vae_latent_channels,
            do_convert_rgb=do_convert_rgb,
        )

    def calculate_default_height_width(self, height: int, width: int, target_size: int):
        crop_size_list = generate_crop_size_list(base_size=target_size, patch_size=self.config.vae_scale_factor)
        aspect_ratios = np.array([round(float(h) / float(w), 5) for h, w in crop_size_list])
        height, width = get_closest_ratio(height, width, aspect_ratios, crop_size_list)[0]
        return height, width
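
# Usage sketch (illustrative comment, not part of the original module): snap a 720x1280 input to
# the closest aspect-ratio bucket for a given target size. The concrete numbers are assumptions.
#
#   processor = HunyuanVideo15ImageProcessor(vae_scale_factor=16)
#   height, width = processor.calculate_default_height_width(720, 1280, target_size=480)
#   # `height` and `width` are multiples of 16 drawn from the buckets of `generate_crop_size_list(base_size=480)`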
from dataclasses import dataclass

import torch

from diffusers.utils import BaseOutput


@dataclass
class HunyuanVideo15PipelineOutput(BaseOutput):
    r"""
    Output class for HunyuanVideo1.5 pipelines.

    Args:
        frames (`torch.Tensor`, `np.ndarray`, or List[List[PIL.Image.Image]]):
            List of video outputs - It can be a nested list of length `batch_size`, with each sub-list containing
            denoised PIL image sequences of length `num_frames`. It can also be a NumPy array or Torch tensor of shape
            `(batch_size, num_frames, channels, height, width)`.
    """

    frames: torch.Tensor
@@ -468,6 +468,21 @@ class AutoencoderKLHunyuanVideo(metaclass=DummyObject):
        requires_backends(cls, ["torch"])


class AutoencoderKLHunyuanVideo15(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])

    @classmethod
    def from_config(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])


class AutoencoderKLLTXVideo(metaclass=DummyObject):
    _backends = ["torch"]

@@ -993,6 +1008,21 @@ class HunyuanImageTransformer2DModel(metaclass=DummyObject):
        requires_backends(cls, ["torch"])


class HunyuanVideo15Transformer3DModel(metaclass=DummyObject):
    _backends = ["torch"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch"])

    @classmethod
    def from_config(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch"])


class HunyuanVideoFramepackTransformer3DModel(metaclass=DummyObject):
    _backends = ["torch"]

...