ModelZoo / MiniMax-M2_vllm · Issue #1 (Closed)
Created Dec 14, 2025 by zps509@zps509

K100_AI × 8: running into a problem with containerized deployment of MiniMax-M2, asking for help

[root@lenovo MiniMax-M2]$ docker run -it --shm-size 60g --network=host --name minimax_m2 \
    --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro -v /MODELS:/data/MODELS \
    image.sourcefind.cn:5000/dcu/admin/base/custom:vllm-ubuntu22.04-dtk25.04.2-py3.10-minimax-m2 bash


DTK version 25.04.2
vllm version 0.9.2
8 DCU were detected in the container.
NOTE: Make sure the user is in the 39 group. For more information, please go to https://developer.hpccube.com/

root@lenovo:/workspace# export ALLREDUCE_STREAM_WITH_COMPUTE=1
export VLLM_MLA_DISABLE=0
export VLLM_USE_FLASH_MLA=1
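(As a quick sanity check before launching the server — a minimal sketch, not part of the original session — the DTK/ROCm build of PyTorch in this image exposes the DCUs through the torch.cuda interface, so the 8 detected devices can be confirmed from Python:)

import torch

# On the DTK/ROCm build of PyTorch the DCUs show up under the torch.cuda API,
# so this should report 8 devices if the container sees all of them.
print("devices visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))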

vllm serve /data/MODELS/MiniMax/MiniMax-M2-bf16/ \
    --host 0.0.0.0 \
    --port 20001 \
    --trust-remote-code \
    --max-model-len 32768 \
    --served-model-name minimax \
    --dtype bfloat16 \
    --tensor-parallel-size 8

INFO 12-14 20:53:35 [__init__.py:217] Automatically detected platform rocm.
(APIServer pid=37) INFO 12-14 20:53:40 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=37) INFO 12-14 20:53:40 [utils.py:233] non-default args: {'model_tag': '/data/MODELS/MiniMax/MiniMax-M2-bf16/', 'host': '0.0.0.0', 'port': 20001, 'model': '/data/MODELS/MiniMax/MiniMax-M2-bf16/', 'trust_remote_code': True, 'dtype': 'bfloat16', 'max_model_len': 32768, 'served_model_name': ['minimax'], 'tensor_parallel_size': 8}
(APIServer pid=37) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=37) The module name (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
(APIServer pid=37) The module name (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
(APIServer pid=37) INFO 12-14 20:53:41 [model.py:555] Resolved architecture: MiniMaxM2ForCausalLM
(APIServer pid=37) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=37) ERROR 12-14 20:53:41 [config.py:278] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/data/MODELS/MiniMax/MiniMax-M2-bf16/'. Use repo_type argument if needed., retrying 1 of 2
(APIServer pid=37) ERROR 12-14 20:53:43 [config.py:276] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/data/MODELS/MiniMax/MiniMax-M2-bf16/'. Use repo_type argument if needed.
(APIServer pid=37) INFO 12-14 20:53:43 [model.py:1739] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=37) INFO 12-14 20:53:43 [model.py:1519] Using max model len 32768
(APIServer pid=37) INFO 12-14 20:53:43 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048.
(EngineCore_DP0 pid=334) INFO 12-14 20:53:52 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=334) INFO 12-14 20:53:52 [core.py:78] Initializing a V1 LLM engine (v0.11.0) with config: model='/data/MODELS/MiniMax/MiniMax-M2-bf16/', speculative_config=None, tokenizer='/data/MODELS/MiniMax/MiniMax-M2-bf16/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=minimax, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=334) WARNING 12-14 20:53:52 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=334) INFO 12-14 20:53:52 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3, 4, 5, 6, 7], buffer_handle=(8, 16777216, 10, 'psm_d127af06'), local_subscribe_addr='ipc:///tmp/38aaa436-df50-463c-89e2-534c384b2d28', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
INFO 12-14 20:53:56 [__init__.py:217] Automatically detected platform rocm.
WARNING 12-14 20:54:02 [worker_base.py:40] VLLM_RANK5_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK5_NUMA = -1
INFO 12-14 20:54:02 [worker_base.py:287] ########## 563 process(rank5) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:02 [worker_base.py:288] ########## 563 process(rank5) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
WARNING 12-14 20:54:02 [worker_base.py:40] VLLM_RANK1_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK1_NUMA = -1
INFO 12-14 20:54:02 [worker_base.py:287] ########## 559 process(rank1) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:02 [worker_base.py:288] ########## 559 process(rank1) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:02 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_67caf140'), local_subscribe_addr='ipc:///tmp/70a2f498-b21f-4667-9774-4506f0d9033a', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-14 20:54:02 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_682e4e7e'), local_subscribe_addr='ipc:///tmp/847f3e56-ea46-4db4-a96e-c2ba98d477da', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 12-14 20:54:02 [worker_base.py:40] VLLM_RANK0_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK0_NUMA = -1
INFO 12-14 20:54:02 [worker_base.py:287] ########## 558 process(rank0) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:02 [worker_base.py:288] ########## 558 process(rank0) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:02 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_aa332a9c'), local_subscribe_addr='ipc:///tmp/12a45e50-48e4-4025-8bf6-2679cb342ecc', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 12-14 20:54:02 [worker_base.py:40] VLLM_RANK3_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK3_NUMA = -1
INFO 12-14 20:54:02 [worker_base.py:287] ########## 561 process(rank3) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:02 [worker_base.py:288] ########## 561 process(rank3) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:02 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_d39f383f'), local_subscribe_addr='ipc:///tmp/250c3736-4d9a-47aa-b69c-b97290760d50', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 12-14 20:54:02 [worker_base.py:40] VLLM_RANK4_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK4_NUMA = -1
INFO 12-14 20:54:02 [worker_base.py:287] ########## 562 process(rank4) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:02 [worker_base.py:288] ########## 562 process(rank4) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:02 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_32514742'), local_subscribe_addr='ipc:///tmp/023a6c49-967e-442d-93ed-a6554c334656', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 12-14 20:54:02 [worker_base.py:40] VLLM_RANK6_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK6_NUMA = -1
INFO 12-14 20:54:02 [worker_base.py:287] ########## 564 process(rank6) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:02 [worker_base.py:288] ########## 564 process(rank6) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:02 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c9fe4f5c'), local_subscribe_addr='ipc:///tmp/48d287a3-7985-4fbb-ab7e-fa15039c9b01', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 12-14 20:54:03 [worker_base.py:40] VLLM_RANK2_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK2_NUMA = -1
INFO 12-14 20:54:03 [worker_base.py:287] ########## 560 process(rank2) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:03 [worker_base.py:288] ########## 560 process(rank2) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:03 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_2ccce1cb'), local_subscribe_addr='ipc:///tmp/94f74a5e-abca-47df-9cbc-fdc7adaf202c', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 12-14 20:54:03 [worker_base.py:40] VLLM_RANK7_NUMA is unset or set incorrectly, vllm will not bind to numa!
VLLM_RANK7_NUMA = -1
INFO 12-14 20:54:03 [worker_base.py:287] ########## 565 process(rank7) is running on CPU(s): {0, 1, 2, ..., 191}
INFO 12-14 20:54:03 [worker_base.py:288] ########## 565 process(rank7) is running on memnode(s): {0, 1, 2, 3, 4, 5, 6, 7}
INFO 12-14 20:54:03 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_4d4bbfb5'), local_subscribe_addr='ipc:///tmp/efdce4f5-40a0-4449-aaba-776cdf7fa310', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 12-14 20:54:03 [custom_all_reduce.py:153] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
INFO 12-14 20:54:03 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_e059d2de'), local_subscribe_addr='ipc:///tmp/47108111-4633-4d65-b93a-537844862202', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 2 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 1 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 7 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 7, EP rank 7
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [__init__.py:1398] Found nccl from library librccl.so.1
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 6 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 6, EP rank 6
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 4 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 4, EP rank 4
INFO 12-14 20:54:03 [parallel_state.py:1209] rank 5 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 5, EP rank 5
(Worker_TP5 pid=563) INFO 12-14 20:54:03 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP1 pid=559) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP2 pid=560) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP6 pid=564) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP0 pid=558) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP4 pid=562) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP3 pid=561) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP7 pid=565) INFO 12-14 20:54:04 [gpu_model_runner.py:2608] Starting to load model /data/MODELS/MiniMax/MiniMax-M2-bf16/...
(Worker_TP5 pid=563) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP5 pid=563) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP5 pid=563) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP5 pid=563) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP0 pid=558) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP1 pid=559) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP2 pid=560) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP6 pid=564) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP4 pid=562) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP0 pid=558) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP0 pid=558) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP0 pid=558) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP1 pid=559) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP1 pid=559) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP1 pid=559) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP2 pid=560) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP2 pid=560) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP2 pid=560) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP6 pid=564) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP6 pid=564) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP6 pid=564) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP3 pid=561) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP4 pid=562) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP4 pid=562) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP4 pid=562) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP3 pid=561) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP3 pid=561) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP3 pid=561) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP7 pid=565) INFO 12-14 20:54:04 [gpu_model_runner.py:2640] Loading model from scratch...
(Worker_TP7 pid=565) INFO 12-14 20:54:04 [rocm.py:266] Using Flash Attention backend on V1 engine. (only supports block size 64)
(Worker_TP7 pid=565) WARNING 12-14 20:54:04 [fp8.py:457] Failed to import DeepGemm kernels.
(Worker_TP7 pid=565) WARNING 12-14 20:54:04 [fp8.py:480] CutlassBlockScaledGroupedGemm not supported on the current platform.
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2641, in load_model
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.model = model_loader.load_model(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     model = initialize_model(vllm_config=vllm_config,
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/utils.py", line 67, in initialize_model
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/minimax_m2.py", line 499, in __init__
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.model = MiniMaxM2Model(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/minimax_m2.py", line 343, in __init__
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 629, in make_layers
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     [PPMissingLayer() for _ in range(start_layer)] + [
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 630, in
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/minimax_m2.py", line 345, in
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     lambda prefix: MiniMaxM2DecoderLayer(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/minimax_m2.py", line 283, in __init__
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.block_sparse_moe = MiniMaxM2MoE(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/minimax_m2.py", line 99, in __init__
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.experts = FusedMoE(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1201, in __init__
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     self.quant_method.create_weights(layer=self, **moe_quant_params)
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 509, in create_weights
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597]     raise ValueError(
(Worker_TP5 pid=563) ERROR 12-14 20:54:04 [multiproc_executor.py:597] ValueError: The output_size of gate's and up's weight = 192 is not divisible by weight quantization block_n = 128.
(Worker_TP5 pid=563) INFO 12-14 20:54:04 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=558) INFO 12-14 20:54:04 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP4 pid=562) INFO 12-14 20:54:04 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP2 pid=560) INFO 12-14 20:54:05 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP3 pid=561) INFO 12-14 20:54:05 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP6 pid=564) INFO 12-14 20:54:05 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1 pid=559) INFO 12-14 20:54:05 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP7 pid=565) INFO 12-14 20:54:05 [multiproc_executor.py:558] Parent process exited, terminating worker
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1214 20:54:06.103955 558 ProcessGroupNCCL.cpp:1279] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 84, in __init__
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708]     raise e from None
(EngineCore_DP0 pid=334) ERROR 12-14 20:54:08 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=334) Process EngineCore_DP0:
(EngineCore_DP0 pid=334) Traceback (most recent call last):
(EngineCore_DP0 pid=334)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=334)     self.run()
(EngineCore_DP0 pid=334)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=334)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=334)     raise e
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=334)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=334)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 84, in __init__
(EngineCore_DP0 pid=334)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=334)     self._init_executor()
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=334)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=334)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=334)     raise e from None
(EngineCore_DP0 pid=334) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=37) Traceback (most recent call last):
(APIServer pid=37)   File "/usr/local/bin/vllm", line 7, in
(APIServer pid=37)     sys.exit(main())
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=37)     args.dispatch_function(args)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=37)     uvloop.run(run_server(args))
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=37)     return loop.run_until_complete(wrapper())
(APIServer pid=37)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=37)     return await main
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=37)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=37)     async with build_async_engine_client(
(APIServer pid=37)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=37)     return await anext(self.gen)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=37)     async with build_async_engine_client_from_engine_args(
(APIServer pid=37)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=37)     return await anext(self.gen)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=37)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/utils/__init__.py", line 1586, in inner
(APIServer pid=37)     return fn(*args, **kwargs)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=37)     return cls(
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=37)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=37)     return AsyncMPClient(*client_args)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=37)     super().__init__(
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=37)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=37)   File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=37)     next(self.gen)
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=37)     wait_for_engine_startup(
(APIServer pid=37)   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=37)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=37) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
root@lenovo:/workspace# /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
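The actual failure is the ValueError raised in vLLM's fp8 MoE weight-creation path (fp8.py, create_weights): the resolved engine config above shows quantization=fp8 (presumably picked up from the checkpoint's config, since no --quantization flag was passed), and with --tensor-parallel-size 8 each rank's slice of the experts' gate/up projection has an output size of 192, which is not a multiple of the fp8 weight-quantization block size block_n = 128. A minimal sketch of the arithmetic behind the check follows; only 192 and 128 come from the log, and the full per-expert width of 1536 is inferred from 192 × 8 rather than read from the model's config.json:

# Hypothetical reconstruction of the divisibility check that raises the ValueError above.
tensor_parallel_size = 8
block_n = 128                                        # fp8 weight-quantization block size (from the log)
moe_intermediate_size = 192 * tensor_parallel_size   # assumed full gate/up output width (1536)

per_rank_output_size = moe_intermediate_size // tensor_parallel_size  # 192 per TP rank
if per_rank_output_size % block_n != 0:
    raise ValueError(
        f"The output_size of gate's and up's weight = {per_rank_output_size} "
        f"is not divisible by weight quantization block_n = {block_n}."
    )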

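For reference once startup succeeds: the launch command above exposes an OpenAI-compatible API on host 0.0.0.0, port 20001, with served model name minimax, so it could be exercised with a small smoke test like the sketch below (not part of the original report; assumes the requests library is available on the client):

import requests

# Minimal smoke test against the OpenAI-compatible endpoint started by `vllm serve`.
resp = requests.post(
    "http://0.0.0.0:20001/v1/chat/completions",
    json={
        "model": "minimax",  # matches --served-model-name
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])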