Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set (#10918)

* Bug fix in ltx
* Assume packed latents.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

Unverified commit e8fc8b1f, authored Apr 01, 2025 by kakukakujirori, committed by GitHub Mar 31, 2025
Showing with 7 additions and 5 deletions

src/diffusers/pipelines/ltx/pipeline_ltx_image2video.py +7 -5

--- a/src/diffusers/pipelines/ltx/pipeline_ltx_image2video.py
+++ b/src/diffusers/pipelines/ltx/pipeline_ltx_image2video.py
@@ -487,18 +487,20 @@ class LTXImageToVideoPipeline(DiffusionPipeline, FromSingleFileMixin, LTXVideoLo
    ) -> torch.Tensor:
        height = height // self.vae_spatial_compression_ratio
        width = width // self.vae_spatial_compression_ratio
-        num_frames = (
-            (num_frames - 1) // self.vae_temporal_compression_ratio + 1 if latents is None else latents.size(2)
-        )
+        num_frames = (num_frames - 1) // self.vae_temporal_compression_ratio + 1
        shape = (batch_size, num_channels_latents, num_frames, height, width)
        mask_shape = (batch_size, 1, num_frames, height, width)
        if latents is not None:
-            conditioning_mask = latents.new_zeros(shape)
+            conditioning_mask = latents.new_zeros(mask_shape)
            conditioning_mask[:, :, 0] = 1.0
            conditioning_mask = self._pack_latents(
                conditioning_mask, self.transformer_spatial_patch_size, self.transformer_temporal_patch_size
-            )
+            ).squeeze(-1)
+            if latents.ndim != 3 or latents.shape[:2] != conditioning_mask.shape:
+                raise ValueError(
+                    f"Provided `latents` tensor has shape {latents.shape}, but the expected shape is {conditioning_mask.shape + (num_channels_latents,)}."
+                )
            return latents.to(device=device, dtype=dtype), conditioning_mask
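
The shape check added in this hunk can be illustrated with a small pure-Python sketch. It is not the pipeline's code: the helper names `expected_packed_shape` and `check_latents_shape` are hypothetical, the compression ratios (32 spatial, 8 temporal) are assumed defaults, and the transformer patch sizes are assumed to be 1 so that packing flattens (frames, height, width) into one sequence axis.

```python
def expected_packed_shape(batch_size, num_channels_latents, num_frames, height, width,
                          spatial_ratio=32, temporal_ratio=8):
    """Return the (batch, sequence, channels) shape expected for packed latents.

    Hypothetical helper. The ratio defaults are assumptions, and patch sizes
    of 1 are assumed, so the sequence length is frames * height * width in
    latent space.
    """
    latent_frames = (num_frames - 1) // temporal_ratio + 1
    latent_height = height // spatial_ratio
    latent_width = width // spatial_ratio
    seq_len = latent_frames * latent_height * latent_width
    return (batch_size, seq_len, num_channels_latents)


def check_latents_shape(latents_shape, expected_shape):
    """Mirror the diff's validation: 3 dims, and batch/sequence dims must match."""
    if len(latents_shape) != 3 or tuple(latents_shape[:2]) != tuple(expected_shape[:2]):
        raise ValueError(
            f"Provided `latents` tensor has shape {tuple(latents_shape)}, "
            f"but the expected shape is {tuple(expected_shape)}."
        )


expected = expected_packed_shape(1, 128, num_frames=161, height=512, width=704)
print(expected)  # (1, 7392, 128)
check_latents_shape((1, 7392, 128), expected)  # packed latents pass silently
```

Under these assumptions, an unpacked 5-D shape such as `(1, 128, 21, 16, 22)` would raise `ValueError`, which matches the intent of the commit message's "Assume packed latents".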