jerrrrry / vllm_test_tools
Commit e3b8cb11 authored May 15, 2025 by jerrrrry
Upload New File
parent c4a52ad6
Showing 1 changed file with 20 additions and 0 deletions

084-servering/server.sh (0 → 100644)
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export HSA_FORCE_FINE_GRAIN_PCIE=1
export NCCL_MAX_NCHANNELS=16
export NCCL_MIN_NCHANNELS=16
export NCCL_P2P_LEVEL=SYS
export NCCL_LAUNCH_MODE=GROUP
export ROCBLAS_COMPUTETYPE_FP16R=0
export LD_LIBRARY_PATH=/usr/local/lib/python3.10/site-packages/torch/lib/:$LD_LIBRARY_PATH
export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=3
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=1
export VLLM_RANK3_NUMA=0
export VLLM_RANK4_NUMA=7
export VLLM_RANK5_NUMA=5
export VLLM_RANK6_NUMA=5
export VLLM_RANK7_NUMA=4
vllm serve /workspace/llms/bf16_model/ --trust-remote-code --distributed-executor-backend ray --dtype bfloat16 --max-model-len 24000 --max-seq-len-to-capture 24000 -tp 32 --gpu-memory-utilization 0.9 --max-num-seqs 128 --speculative_config '{"num_speculative_tokens": 1}' --block-size 64 --disable-log-requests
\ No newline at end of file
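
The script exposes 8 devices per node yet requests `-tp 32` with the Ray executor backend, which suggests the launch spans several nodes. The commit does not say so, so the following is only a sketch of a preflight check under that assumption, and it further assumes every node exposes the same number of devices. The helper names `count_devices`, `nodes_needed`, and `missing_rank_numa` are hypothetical and not part of `server.sh`. The commit also does not document what the `VLLM_RANK<i>_NUMA` variables mean, so the last helper only checks that they are set.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the committed server.sh.

# Count the devices in a comma-separated list such as CUDA_VISIBLE_DEVICES.
count_devices() {
    local list="$1"
    local IFS=','
    local -a ids
    read -r -a ids <<< "$list"
    echo "${#ids[@]}"
}

# Nodes needed for a tensor-parallel size, assuming every node exposes
# the same number of devices (ceiling division).
nodes_needed() {
    local tp="$1" per_node="$2"
    echo $(( (tp + per_node - 1) / per_node ))
}

# Print each VLLM_RANK<i>_NUMA variable that is unset or empty for local
# ranks 0..n-1. Prints nothing when all of them are set.
missing_rank_numa() {
    local n="$1" i name
    for (( i = 0; i < n; i++ )); do
        name="VLLM_RANK${i}_NUMA"
        if [ -z "${!name:-}" ]; then
            echo "$name"
        fi
    done
}

# Example with the values from server.sh.
per_node=$(count_devices "0,1,2,3,4,5,6,7")
echo "devices per node: $per_node"
echo "nodes for -tp 32: $(nodes_needed 32 "$per_node")"
```

With the values above this reports 8 devices per node and 4 nodes for `-tp 32`. Running `missing_rank_numa "$per_node"` after the exports in `server.sh` should print nothing, since ranks 0 through 7 are all assigned.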