Commit 34a0e96d, authored by Avshalom Manevich, committed by GitHub

[Kernel] changing fused moe kernel chunk size default to 32k (#7995)

parent 80c7b089
...
@@ -352,7 +352,7 @@ environment_variables: Dict[str, Callable[[], Any]] = {
             os.path.join(get_default_cache_root(), "vllm", "xla_cache"),
         )),
     "VLLM_FUSED_MOE_CHUNK_SIZE":
-    lambda: int(os.getenv("VLLM_FUSED_MOE_CHUNK_SIZE", "65536")),
+    lambda: int(os.getenv("VLLM_FUSED_MOE_CHUNK_SIZE", "32768")),
     # If set, vllm will skip the deprecation warnings.
     "VLLM_NO_DEPRECATION_WARNING":
...
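
The change only affects the fallback passed to `os.getenv`, so an explicitly set `VLLM_FUSED_MOE_CHUNK_SIZE` should still take precedence. The following is a minimal standalone sketch of that lookup pattern, not vLLM's actual module: it copies the single dictionary entry from the diff, and `read_chunk_size` is a hypothetical helper introduced only for illustration.

```python
import os
from typing import Any, Callable, Dict

# Standalone stand-in for the table in the diff; only the one entry
# touched by this commit is reproduced here.
environment_variables: Dict[str, Callable[[], Any]] = {
    "VLLM_FUSED_MOE_CHUNK_SIZE":
    lambda: int(os.getenv("VLLM_FUSED_MOE_CHUNK_SIZE", "32768")),
}


def read_chunk_size() -> int:
    """Hypothetical helper: evaluate the lambda at call time."""
    return environment_variables["VLLM_FUSED_MOE_CHUNK_SIZE"]()


# With the variable unset, the new default (32 * 1024) applies.
os.environ.pop("VLLM_FUSED_MOE_CHUNK_SIZE", None)
default_value = read_chunk_size()

# Setting the variable explicitly restores the previous default of 64k.
os.environ["VLLM_FUSED_MOE_CHUNK_SIZE"] = "65536"
overridden_value = read_chunk_size()

# Clean up so the sketch leaves the process environment unchanged.
os.environ.pop("VLLM_FUSED_MOE_CHUNK_SIZE", None)

print(default_value, overridden_value)
```

Because the entry is a lambda, the environment is consulted each time the entry is evaluated rather than once at definition time, which is why the two calls in the sketch can return different values.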