Issue #1 · Closed · Created Apr 16, 2026 by w18534116015

Inference of Qwen3-VL-235B-A22B-Instruct fails on two nodes with 8 K100 AI cards in total

Image: harbor.sourcefind.cn:5443/dcu/admin/base/vllm:0.11.0-ubuntu22.04-dtk25.04.2-1226-das1.7-py3.10-20251226

export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_HOST_IP=$(ifconfig | grep '11.' | awk '{print $2}')  # IP of this compute node; use the IP bound to the IB interface (SOCKET_IFNAME)
export NCCL_SOCKET_IFNAME=ib0
export GLOO_SOCKET_IFNAME=ib0
export NCCL_IB_HCA=mlx5_0:1  # name of the IB NIC in this environment
unset NCCL_ALGO
export NCCL_MIN_NCHANNELS=16
export NCCL_MAX_NCHANNELS=16
export NCCL_NET_GDR_READ=1
export HIP_VISIBLE_DEVICES=0,1,2,3
export VLLM_SPEC_DECODE_EAGER=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1

Additional environment variables recommended for K100_AI clusters:

export VLLM_ENFORCE_EAGER_BS_THRESHOLD=44
export VLLM_RPC_TIMEOUT=1800000

CPU core binding on Hygon CPUs:

export VLLM_NUMA_BIND=1
export VLLM_RANK0_NUMA=0
export VLLM_RANK1_NUMA=1
export VLLM_RANK2_NUMA=2
export VLLM_RANK3_NUMA=3
export VLLM_RANK4_NUMA=4
export VLLM_RANK5_NUMA=5
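Whether the binding took effect can be spot-checked from inside a worker process; the `worker_base.py` lines in the log below ("is running on CPU(s): {8, 9, ...}") report exactly this information. A minimal Linux-only sketch using just the standard library:

```python
import os

def describe_affinity(pid: int = 0) -> str:
    """Return a log-style description of the CPUs a process may run on.

    pid=0 means the calling process. Linux-only: sched_getaffinity is not
    available on macOS/Windows.
    """
    cpus = os.sched_getaffinity(pid)  # set of CPU ids the scheduler may use
    return f"process {os.getpid()} is running on CPU(s): {sorted(cpus)}"

print(describe_affinity())
```

If the NUMA binding worked, each rank should report a disjoint block of 8 CPU ids, matching the `VLLM_RANK*_NUMA` assignments above.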

# Run on the head node
ray start --head --node-ip-address=x.x.x.x --port=6379 --num-gpus=8 --num-cpus=32
# Run on each worker node
ray start --address='x.x.x.x:6379' --num-gpus=8 --num-cpus=32

nohup vllm serve /data/model/Qwen3-VL-235B-A22B-Instruct \
    --distributed-executor-backend ray \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --block-size 64 \
    --gpu-memory-utilization 0.90 \
    --allowed-local-media-path / \
    --served-model-name qwen-vl \
    --override-generation-config '{"temperature": 0.7, "top_p": 0.8, "top_k": 20, "repetition_penalty": 1.05}' \
    > runlog.$(hostname) 2>&1 &
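Once the engine comes up, the deployment can be smoke-tested through vLLM's OpenAI-compatible API. A sketch of building such a request with only the standard library (the host/port and image path are placeholders; `file://` image URLs are accepted here only because the server is started with `--allowed-local-media-path /`):

```python
import json
from urllib import request

def build_chat_request(base_url: str, image_path: str, prompt: str) -> request.Request:
    """Build an OpenAI-compatible /v1/chat/completions request for the server above."""
    payload = {
        "model": "qwen-vl",  # must match --served-model-name
        "messages": [{
            "role": "user",
            "content": [
                # local file exposed via --allowed-local-media-path /
                {"type": "image_url", "image_url": {"url": f"file://{image_path}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (host/port are placeholders):
# req = build_chat_request("http://x.x.x.x:8000", "/data/test.jpg", "Describe this image")
# print(request.urlopen(req).read().decode())
```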

Error log:

(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1768) I0416 15:01:58.591132 1768 ProcessGroupNCCL.cpp:2291] [PG ID 2 PG GUID 3 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 423.481 ms
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4440, ip=11.5.2.71) I0416 15:01:59.425798 4440 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 4] ProcessGroupNCCL created ncclComm_ 0x555ca8a86a80 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4440, ip=11.5.2.71) I0416 15:01:59.425839 4440 ProcessGroupNCCL.cpp:2335] [PG ID 2 PG GUID 3 Rank 4] NCCL_DEBUG: N/A
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4441, ip=11.5.2.71) I0416 15:01:59.425758 4441 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 5] ProcessGroupNCCL created ncclComm_ 0x5600cfe3a580 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4442, ip=11.5.2.71) I0416 15:01:59.425832 4442 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 6] ProcessGroupNCCL created ncclComm_ 0x55b7b9a62430 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) I0416 15:01:59.425824 4443 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 7] ProcessGroupNCCL created ncclComm_ 0x556601d00990 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4441, ip=11.5.2.71) INFO 04-16 14:59:20 [worker_base.py:287] ########## 4441 process(rank1) is running on CPU(s): {8, 9, 10, 11, 12, 13, 14, 15}
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4441, ip=11.5.2.71) INFO 04-16 14:59:20 [worker_base.py:288] ########## 4441 process(rank1) is running on memnode(s): {1}
(EngineCore_DP0 pid=1554) (pid=4443, ip=11.5.2.71) INFO 04-16 14:59:13 [__init__.py:217] Automatically detected platform rocm. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 14:59:49 [__init__.py:1398] Found nccl from library librccl.so.1
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 14:59:49 [pynccl.py:103] vLLM is using nccl==2.22.3
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:21 [worker_base.py:287] ########## 4443 process(rank3) is running on CPU(s): {24, 25, 26, 27, 28, 29, 30, 31} [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:21 [worker_base.py:288] ########## 4443 process(rank3) is running on memnode(s): {3} [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) WARNING 04-16 14:59:50 [custom_all_reduce.py:87] Custom allreduce is disabled because this process group spans across nodes.
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1766) INFO 04-16 14:59:50 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_354a766f'), local_subscribe_addr='ipc:///tmp/a11cc771-ef5e-43c6-b23f-10d9525b9cbe', remote_subscribe_addr='tcp://11.5.2.61:32802', remote_addr_ipv6=False)
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 14:59:51 [parallel_state.py:1212] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 14:59:56 [gpu_model_runner.py:2608] Starting to load model /data/model/Qwen3-VL-235B-A22B-Instruct...
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:50 [__init__.py:1398] Found nccl from library librccl.so.1 [repeated 15x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:50 [pynccl.py:103] vLLM is using nccl==2.22.3 [repeated 15x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) WARNING 04-16 14:59:50 [custom_all_reduce.py:87] Custom allreduce is disabled because this process group spans across nodes. [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:51 [parallel_state.py:1212] rank 7 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 7, EP rank 7 [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 14:59:57 [gpu_model_runner.py:2640] Loading model from scratch...
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 14:59:57 [rocm.py:262] Using Flash Attention backend on V1 engine. (only supports block size 64)
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 15:01:37 [default_loader.py:267] Loading weights took 99.58 seconds
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:56 [gpu_model_runner.py:2608] Starting to load model /data/model/Qwen3-VL-235B-A22B-Instruct... [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1766) INFO 04-16 14:59:57 [gpu_model_runner.py:2640] Loading model from scratch... [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 14:59:57 [rocm.py:262] Using Flash Attention backend on V1 engine. (only supports block size 64) [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 15:01:37 [gpu_model_runner.py:2659] Model loading took 55.5396 GiB and 100.378088 seconds
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4441, ip=11.5.2.71) INFO 04-16 15:01:42 [default_loader.py:267] Loading weights took 104.28 seconds [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4442, ip=11.5.2.71) INFO 04-16 15:01:42 [gpu_model_runner.py:3367] Encoder cache will be initialized with a budget of 153600 tokens, and profiled with 1 video items of the maximum feature size.
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4441, ip=11.5.2.71) INFO 04-16 15:01:42 [gpu_model_runner.py:2659] Model loading took 55.5396 GiB and 105.002879 seconds [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4442, ip=11.5.2.71) INFO 04-16 15:02:42 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/6b1e590b12/rank_6_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4442, ip=11.5.2.71) INFO 04-16 15:02:42 [backends.py:559] Dynamo bytecode transform time: 34.70 s
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 15:01:43 [gpu_model_runner.py:3367] Encoder cache will be initialized with a budget of 153600 tokens, and profiled with 1 video items of the maximum feature size. [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4442, ip=11.5.2.71) INFO 04-16 15:03:04 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 19.768 s
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1768) INFO 04-16 15:02:43 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/6b1e590b12/rank_2_0/backbone for vLLM's torch.compile [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1768) INFO 04-16 15:02:43 [backends.py:559] Dynamo bytecode transform time: 36.08 s [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 15:03:14 [fused_moe.py:1188] Using configuration from /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=gfx928_120cu_nn.json for MoE layer.
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1768) INFO 04-16 15:03:06 [backends.py:164] Directly load the compiled graph(s) for dynamic shape from the cache, took 20.249 s [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) INFO 04-16 15:03:14 [monitor.py:34] torch.compile takes 34.90 s in total
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4442, ip=11.5.2.71) INFO 04-16 15:03:27 [gpu_worker.py:298] Available KV cache memory: -4.71 GiB
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 15:03:14 [fused_moe.py:1188] Using configuration from /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=gfx928_120cu_nn.json for MoE layer. [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 15:03:14 [monitor.py:34] torch.compile takes 35.19 s in total [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 93, in __init__
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 200, in _initialize_kv_caches
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]     kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/core/kv_cache_utils.py", line 1243, in get_kv_cache_configs
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]     check_enough_kv_cache_memory(vllm_config, kv_cache_spec_one_worker,
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/core/kv_cache_utils.py", line 699, in check_enough_kv_cache_memory
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708]     raise ValueError("No available memory for the cache blocks. "
(EngineCore_DP0 pid=1554) ERROR 04-16 15:03:28 [core.py:708] ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
(EngineCore_DP0 pid=1554) Process EngineCore_DP0:
(EngineCore_DP0 pid=1554) Traceback (most recent call last):
(EngineCore_DP0 pid=1554)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=1554)     self.run()
(EngineCore_DP0 pid=1554)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=1554)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=1554)     raise e
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1554)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1554)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 93, in __init__
(EngineCore_DP0 pid=1554)     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 200, in _initialize_kv_caches
(EngineCore_DP0 pid=1554)     kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/core/kv_cache_utils.py", line 1243, in get_kv_cache_configs
(EngineCore_DP0 pid=1554)     check_enough_kv_cache_memory(vllm_config, kv_cache_spec_one_worker,
(EngineCore_DP0 pid=1554)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/core/kv_cache_utils.py", line 699, in check_enough_kv_cache_memory
(EngineCore_DP0 pid=1554)     raise ValueError("No available memory for the cache blocks. "
(EngineCore_DP0 pid=1554) ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
(EngineCore_DP0 pid=1554) INFO 04-16 15:03:28 [ray_distributed_executor.py:122] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1769) I0416 15:01:58.785089 1769 ProcessGroupNCCL.cpp:2291] [PG ID 2 PG GUID 3 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.168538 ms [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1766) I0416 15:01:59.426687 1766 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL created ncclComm_ 0x561dc35f2ec0 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1769) I0416 15:01:59.426139 1769 ProcessGroupNCCL.cpp:2335] [PG ID 2 PG GUID 3 Rank 1] NCCL_DEBUG: N/A [repeated 7x across cluster]
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1769) I0416 15:01:59.426092 1769 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 1] ProcessGroupNCCL created ncclComm_ 0x562f9be3d8b0 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1768) I0416 15:01:59.426720 1768 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 2] ProcessGroupNCCL created ncclComm_ 0x55a35e91e520 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=1763) I0416 15:01:59.426153 1763 ProcessGroupNCCL.cpp:2330] [PG ID 2 PG GUID 3 Rank 3] ProcessGroupNCCL created ncclComm_ 0x5580b7787310 on CUDA device:
(EngineCore_DP0 pid=1554) (RayWorkerWrapper pid=4443, ip=11.5.2.71) INFO 04-16 15:03:28 [gpu_worker.py:298] Available KV cache memory: -4.61 GiB [repeated 7x across cluster]
(APIServer pid=1409) Traceback (most recent call last):
(APIServer pid=1409)   File "/usr/local/bin/vllm", line 7, in <module>
(APIServer pid=1409)     sys.exit(main())
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=1409)     args.dispatch_function(args)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=1409)     uvloop.run(run_server(args))
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=1409)     return loop.run_until_complete(wrapper())
(APIServer pid=1409)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1409)     return await main
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=1409)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=1409)     async with build_async_engine_client(
(APIServer pid=1409)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1409)     return await anext(self.gen)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=1409)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1409)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=1409)     return await anext(self.gen)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=1409)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/utils/__init__.py", line 1586, in inner
(APIServer pid=1409)     return fn(*args, **kwargs)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=1409)     return cls(
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1409)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=1409)     return AsyncMPClient(*client_args)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=1409)     super().__init__(
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=1409)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1409)   File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=1409)     next(self.gen)
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=1409)     wait_for_engine_startup(
(APIServer pid=1409)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=1409)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1409) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}
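The root cause is visible in the numbers in the log: per-card weights alone take 55.5396 GiB, so after the runtime overhead (activations, encoder-cache profiling) there is nothing left inside the 0.90 memory budget, hence "Available KV cache memory: -4.71 GiB". A back-of-the-envelope check, assuming 64 GiB of HBM per K100_AI card (an assumption, adjust to the real capacity) and the roughly 6.77 GiB of overhead implied by the logged numbers:

```python
def kv_cache_budget(total_gib: float, utilization: float,
                    weights_gib: float, overhead_gib: float) -> float:
    """GiB left for KV cache blocks after weights and runtime overhead."""
    return total_gib * utilization - weights_gib - overhead_gib

# Assumed: 64 GiB HBM per card; ~6.77 GiB activation + encoder-cache overhead
# (backed out from the logged "-4.71 GiB"). Weights: 55.5396 GiB from the log.
available = kv_cache_budget(64.0, 0.90, 55.5396, 6.77)
print(f"{available:.2f} GiB")  # negative -> "No available memory for the cache blocks"
```

Any combination that moves this number positive should let the engine start: a higher `--gpu-memory-utilization`, a smaller `--max-model-len`/`--max-num-seqs` (which shrinks the profiling overhead), or more cards in the tensor-parallel group so each holds a smaller weight shard.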
