update torch compile doc (#4874)

aa08aeac · Ke Bao · GitHub · d8a136a1 · aa08aeac
Unverified Commit aa08aeac authored Mar 29, 2025 by Ke Bao Committed by GitHub Mar 28, 2025
Hide whitespace changes
Inline Side-by-side

Showing with 1 addition and 1 deletion

docs/backend/hyperparameter_tuning.md docs/backend/hyperparameter_tuning.md +1 -1

No files found.
--- a/docs/backend/hyperparameter_tuning.md
+++ b/docs/backend/hyperparameter_tuning.md
@@ -40,7 +40,7 @@ If you see out of memory (OOM) errors, you can try to tune the following paramet

 ## Enabling cache for `torch.compile`

-To enable `torch.compile` acceleration, add `--enable-torch-compile`. It accelerates small models on small batch sizes. This does not work for FP8 currently. By default, `torch.compile` will automatically cache the FX graph and Triton in `/tmp/torchinductor_root`, which might be cleared according to the [system policy](https://serverfault.com/questions/377348/when-does-tmp-get-cleared). You can export the environment variable `TORCHINDUCTOR_CACHE_DIR` to save compilation cache in your desired directory to avoid unwanted deletion. You can also share the cache with other machines to reduce the compilation time.
+To enable `torch.compile` acceleration, add `--enable-torch-compile`. It accelerates small models on small batch sizes. By default, `torch.compile` will automatically cache the FX graph and Triton in `/tmp/torchinductor_root`, which might be cleared according to the [system policy](https://serverfault.com/questions/377348/when-does-tmp-get-cleared). You can export the environment variable `TORCHINDUCTOR_CACHE_DIR` to save compilation cache in your desired directory to avoid unwanted deletion. You can also share the cache with other machines to reduce the compilation time.

 SGLang uses `max-autotune-no-cudagraphs` mode of `torch.compile`. The auto-tuning can be slow.
 If you want to deploy a model on many different machines, you can ship the `torch.compile` cache to these machines and skip the compilation steps. This is based on [PyTorch official documentation](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html).