Commit 9af6d22e (Unverified)
Authored Jun 09, 2025 by XiongfeiWei
Committed by GitHub, Jun 10, 2025
Use xla flag to improve the quantized model performance (#19303)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Parent: 4589b940
Showing 1 changed file with 4 additions and 1 deletion
vllm/v1/worker/tpu_worker.py
@@ -101,7 +101,10 @@ class TPUWorker:
         # fix this. It will be removed after the bug in XLA compiler is fixed.
         os.environ["LIBTPU_INIT_ARGS"] = (
             os.environ.get("LIBTPU_INIT_ARGS", "") +
-            " --xla_tpu_force_1d_allreduce_at_chunk_count=1")
+            " --xla_tpu_force_1d_allreduce_at_chunk_count=1"
+            " --xla_jf_conv_input_fusion=False")
+        # --xla_jf_conv_input_fusion=False is used to improve the perf of
+        # quantized matmul.
         torch.set_grad_enabled(False)
         torch.set_default_dtype(self.model_config.dtype)
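The change relies on Python's implicit concatenation of adjacent string literals: the two flag strings are joined into one literal before the `+` is applied, so both flags are appended to whatever `LIBTPU_INIT_ARGS` already holds. The sketch below illustrates that behavior on a plain dict instead of the real environment. The helper `append_libtpu_flags` and the flag `--some_user_flag=1` are hypothetical, introduced only for illustration; unlike the commit, which appends unconditionally, this helper also skips flags that are already present.

```python
# Adjacent string literals are merged at compile time, so the expression
# `base + " --a" " --b"` is equivalent to `base + " --a --b"`.
base = "--some_user_flag=1"  # placeholder for a user-supplied value
combined = (base +
            " --xla_tpu_force_1d_allreduce_at_chunk_count=1"
            " --xla_jf_conv_input_fusion=False")

FLAGS = [
    "--xla_tpu_force_1d_allreduce_at_chunk_count=1",
    "--xla_jf_conv_input_fusion=False",
]


def append_libtpu_flags(env, extra_flags):
    """Hypothetical helper (not part of vLLM): append flags to
    LIBTPU_INIT_ARGS in a dict-like `env`, keeping existing entries and
    skipping flags that are already set."""
    existing = env.get("LIBTPU_INIT_ARGS", "").split()
    missing = [flag for flag in extra_flags if flag not in existing]
    env["LIBTPU_INIT_ARGS"] = " ".join(existing + missing)
    return env["LIBTPU_INIT_ARGS"]


# Demonstrated on a plain dict; pass os.environ to affect the real process.
env = {"LIBTPU_INIT_ARGS": base}
result = append_libtpu_flags(env, FLAGS)
```

Here `combined` and `result` hold the same string. Calling the helper a second time leaves the value unchanged, which may be convenient if the setup code can run more than once in a process. This is a design variation under the stated assumptions, not the behavior of the commit.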