Unverified Commit 4e898560 authored by Patrick von Platen, committed by GitHub

revert automatic chunking (#3934)

* revert automatic chunking

* Apply suggestions from code review

* revert automatic chunking
parent 332d2bbe
@@ -138,6 +138,7 @@ pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dt
 pipe.enable_model_cpu_offload()

 # memory optimization
+pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
 pipe.enable_vae_slicing()

 prompt = "Darth Vader surfing a wave"
@@ -150,10 +151,13 @@ Now the video can be upscaled:
 ```py
 pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
-pipe.vae.enable_slicing()
 pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
 pipe.enable_model_cpu_offload()
+
+# memory optimization
+pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
+pipe.enable_vae_slicing()

 video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
 video_frames = pipe(prompt, video=video, strength=0.6).frames
@@ -175,6 +179,28 @@ Here are some sample outputs:
 </tr>
 </table>
+### Memory optimizations
+
+Text-guided video generation with [`~TextToVideoSDPipeline`] and [`~VideoToVideoSDPipeline`] is very memory intensive, both
+when denoising with [`~UNet3DConditionModel`] and when decoding with [`~AutoencoderKL`]. It is possible, though, to reduce
+memory usage at the cost of increased runtime while still getting the exact same result. To do so, it is recommended to enable
+**forward chunking** and **VAE slicing**:
+
+Forward chunking via [`~UNet3DConditionModel.enable_forward_chunking`] is explained in [this blog post](https://huggingface.co/blog/reformer#2-chunked-feed-forward-layers) and
+allows you to significantly reduce the memory required by the UNet. You can chunk the feed-forward layer over the `num_frames`
+dimension by doing:
+
+```py
+pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
+```
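To make what this call does concrete, here is a minimal standalone sketch of chunked feed-forward computation (the `chunked_feed_forward` helper is a toy illustration, not the diffusers implementation): the feed-forward module is applied to one slice of the input at a time along the chosen dimension and the slices are concatenated back, so peak activation memory scales with the chunk size rather than with `num_frames`:

```py
import torch
import torch.nn as nn


def chunked_feed_forward(ff: nn.Module, hidden_states: torch.Tensor, chunk_size: int, dim: int) -> torch.Tensor:
    # Run `ff` on `chunk_size`-sized slices along `dim` instead of on the full
    # tensor at once; since `ff` acts on the last dimension only, the
    # concatenated result is identical to the unchunked output.
    chunks = hidden_states.split(chunk_size, dim=dim)
    return torch.cat([ff(chunk) for chunk in chunks], dim=dim)


# Toy check: chunking over the frame dimension gives the same result.
ff = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
hidden_states = torch.randn(2, 16, 64)  # (batch, num_frames, channels)
full = ff(hidden_states)
chunked = chunked_feed_forward(ff, hidden_states, chunk_size=1, dim=1)
assert torch.allclose(full, chunked, atol=1e-6)
```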
+VAE slicing via [`~TextToVideoSDPipeline.enable_vae_slicing`] and [`~VideoToVideoSDPipeline.enable_vae_slicing`] also
+yields significant memory savings, since the two pipelines otherwise decode all image frames at once.
+
+```py
+pipe.enable_vae_slicing()
+```
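The slicing idea itself is simple; a rough sketch of what sliced decoding amounts to (the `decode_latents_sliced` helper is hypothetical, not the actual pipeline code, though `.sample` follows the `AutoencoderKL` decode output convention): decode the latents one frame at a time instead of as one batch, so the decoder's peak memory is bounded by a single frame:

```py
import torch


def decode_latents_sliced(vae, latents: torch.Tensor) -> torch.Tensor:
    # `latents` has shape (num_frames, channels, height, width); decoding one
    # frame per call keeps only a single frame's decoder activations in memory.
    frames = [vae.decode(latents[i : i + 1]).sample for i in range(latents.shape[0])]
    return torch.cat(frames, dim=0)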
 ## Available checkpoints

 * [damo-vilab/text-to-video-ms-1.7b](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/)
...
@@ -634,9 +634,6 @@ class TextToVideoSDPipeline(DiffusionPipeline, TextualInversionLoaderMixin, Lora
         # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
         extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
-
-        # 6.1 Chunk feed-forward computation to save memory
-        self.unet.enable_forward_chunking(chunk_size=1, dim=1)

         # 7. Denoising loop
         num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
         with self.progress_bar(total=num_inference_steps) as progress_bar:
...
@@ -709,9 +709,6 @@ class VideoToVideoSDPipeline(DiffusionPipeline, TextualInversionLoaderMixin, Lor
         # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
         extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
-
-        # 6.1 Chunk feed-forward computation to save memory
-        self.unet.enable_forward_chunking(chunk_size=1, dim=1)

         # 7. Denoising loop
         num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
         with self.progress_bar(total=num_inference_steps) as progress_bar:
...
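With the automatic call removed from both pipelines' `__call__`, chunking is no longer forced on every run; callers who want the memory savings now opt in once on the pipeline object, as the updated docs above show. A minimal end-to-end sketch of the opt-in, assuming the `damo-vilab/text-to-video-ms-1.7b` checkpoint listed above:

```py
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Opt in to the optimizations that were previously applied automatically.
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

video_frames = pipe("Darth Vader surfing a wave", num_frames=24).frames
```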