Unverified commit c32a18cb, authored by Eldar Kurtić, committed by GitHub

Attempt to fix GPU OOM in a spec-decoding test (#29419)


Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
parent b07555d2
@@ -133,7 +133,7 @@ def main(args):
 tensor_parallel_size=args.tp,
 enable_chunked_prefill=args.enable_chunked_prefill,
 enforce_eager=args.enforce_eager,
-gpu_memory_utilization=0.8,
+gpu_memory_utilization=0.9,
 speculative_config=speculative_config,
 disable_log_stats=False,
 max_model_len=args.max_model_len,
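
Note on the change: a rough sketch of why raising gpu_memory_utilization from 0.8 to 0.9 might relieve an out-of-memory failure. It assumes a simplified budgeting model in which the engine may use total memory times the utilization fraction, and whatever remains after weights and peak activations is available for the KV cache. The helper name and all numbers below are hypothetical and are not taken from this commit or from the engine's actual accounting.

```python
def kv_cache_budget_gib(total_gib, utilization, weights_gib, peak_activation_gib):
    """Memory left for the KV cache under a simplified budgeting model.

    Hypothetical helper for illustration only: the real engine profiles
    memory at startup and its accounting is more detailed than this.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Total memory the engine is allowed to claim on the device.
    budget_gib = total_gib * utilization
    # What remains once weights and peak activations are accounted for.
    return budget_gib - weights_gib - peak_activation_gib


if __name__ == "__main__":
    # Assumed figures: a 24 GiB device, 16 GiB of weights (target plus
    # draft model), 3.5 GiB of peak activations.
    for utilization in (0.8, 0.9):
        left = kv_cache_budget_gib(24.0, utilization, 16.0, 3.5)
        status = "fits" if left > 0 else "does not fit"
        print(f"utilization={utilization}: {left:.2f} GiB for KV cache ({status})")
```

Under these assumed figures the lower fraction leaves no room for the KV cache while the higher one does, which is the kind of situation the commit title describes. The trade-off is less headroom for anything else sharing the GPU.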