Unverified Commit f8ba5cd7 authored by Steven Liu, committed by GitHub

[docs] Cache link (#12105)

cache
parent c9c82173
...@@ -25,6 +25,8 @@ Original model checkpoints for Flux can be found [here](https://huggingface.co/b
Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).
[Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
</Tip>
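As a rough sketch of what the linked caching guide covers, the snippet below enables a pyramid attention broadcast cache on Flux's transformer. The config class, the `enable_cache` helper, and the timestep ranges are assumptions based on that guide rather than part of this change, so treat the values as illustrative and check the guide for the exact API.

```py
# Illustrative sketch only: enable a cache on the Flux transformer so that
# intermediate attention outputs are reused across nearby timesteps.
# PyramidAttentionBroadcastConfig, enable_cache, and the skip ranges below are
# assumptions taken from the linked caching guide, not from this diff.
import torch
from diffusers import FluxPipeline, PyramidAttentionBroadcastConfig

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,             # reuse every other block's attention output
    spatial_attention_timestep_skip_range=(100, 950),  # only cache inside this timestep window
    current_timestep_callback=lambda: pipeline.current_timestep,
)
pipeline.transformer.enable_cache(config)

image = pipeline("A cat holding a sign that says hello world").images[0]
image.save("flux.png")
```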
Flux comes in the following variants:
...
...@@ -18,7 +18,7 @@
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines. [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
</Tip>
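For the component-reuse pointer in the tip above, here is a minimal sketch of sharing one set of loaded components across two pipelines; the checkpoint and pipeline pair are placeholder choices, not taken from this page.

```py
# Minimal sketch of reusing components across pipelines; the checkpoint and
# pipeline classes are placeholder choices, not from this page.
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

pipeline_t2i = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# from_pipe shares the already-loaded components instead of reloading them,
# so the second pipeline adds no extra memory.
pipeline_i2i = AutoPipelineForImage2Image.from_pipe(pipeline_t2i)
```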
...
...@@ -88,7 +88,7 @@ export_to_video(video, "output.mp4", fps=24)
</hfoption>
<hfoption id="inference speed">
[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster. [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
```py
import torch
...
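The collapsed code above pairs the pipeline with compilation. Since the file is truncated in this diff, the sketch below only shows the usual `torch.compile` pattern, with the pipeline class and checkpoint left as placeholders.

```py
# Sketch of the torch.compile pattern referenced above. The pipeline class and
# checkpoint are placeholders because the original file is collapsed in this diff.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "<checkpoint-id>", torch_dtype=torch.bfloat16
).to("cuda")

# Compile the denoiser once; the first call pays the compilation cost,
# subsequent calls run faster.
pipeline.transformer = torch.compile(
    pipeline.transformer, mode="max-autotune", fullgraph=True
)
```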
...@@ -20,7 +20,7 @@ Check out the model card [here](https://huggingface.co/Qwen/Qwen-Image) to learn
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines. [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
</Tip>
...
...@@ -119,7 +119,7 @@ export_to_video(output, "output.mp4", fps=16)
</hfoption>
<hfoption id="T2V inference speed">
[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster. [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
```py
# pip install ftfy
...