[Doc] Remove unnecessary V1 flag (#16924)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

[Doc] Remove unnecessary V1 flag (#16924)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
29f395c9 · Cyrus Leung · GitHub · fa3bba2a · 29f395c9
Unverified Commit 29f395c9 authored Apr 22, 2025 by Cyrus Leung Committed by GitHub Apr 21, 2025
Hide whitespace changes
Inline Side-by-side

Showing with 2 additions and 2 deletions

docs/source/design/v1/torch_compile.md docs/source/design/v1/torch_compile.md +2 -2

No files found.
--- a/docs/source/design/v1/torch_compile.md
+++ b/docs/source/design/v1/torch_compile.md
@@ -99,7 +99,7 @@ This time, Inductor compilation is completely bypassed, and we will load from di

 The above example just uses Inductor to compile for a general shape (i.e. symbolic shape). We can also use Inductor to compile for some of the specific shapes, for example:

-`VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B --compilation_config "{'compile_sizes': [1, 2, 4, 8]}"`
+`vllm serve meta-llama/Llama-3.2-1B --compilation_config "{'compile_sizes': [1, 2, 4, 8]}"`

 Then it will also compile a specific kernel just for batch size `1, 2, 4, 8`. At this time, all of the shapes in the computation graph are static and known, and we will turn on auto-tuning to tune for max performance. This can be slow when you run it for the first time, but the next time you run it, we can directly bypass the tuning and run the tuned kernel.

@@ -134,6 +134,6 @@ The cudagraphs are captured and managed by the compiler backend, and replayed wh

 By default, vLLM will try to determine a set of sizes to capture cudagraph. You can also override it using the config `cudagraph_capture_sizes`:

-`VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.2-1B --compilation-config "{'cudagraph_capture_sizes': [1, 2, 4, 8]}"`
+`vllm serve meta-llama/Llama-3.2-1B --compilation-config "{'cudagraph_capture_sizes': [1, 2, 4, 8]}"`

 Then it will only capture cudagraph for the specified sizes. It can be useful to have fine-grained control over the cudagraph capture.