ModelZoo / LLaMA_vllm · Issue #6 (Closed)
Created Sep 11, 2024 by binbin2024 (@binbin2024)

Error when running multi-GPU inference with vLLM

python benchmarks/benchmark_throughput.py --num-prompts 1 --input-ce-eager --dtype float16
Namespace(backend='vllm', dataset=None, input_len=32, output_len=128, model='./Llama-2-7b-chat-hf', tokeniearch=False, num_prompts=1, seed=0, hf_max_batch_size=None, trust_remote_code=True, max_model_len=None, dt
INFO 09-11 16:08:13 config.py:420] Disabled the custom all-reduce kernel because it is not supported on AM
2024-09-11 16:08:16,051 INFO worker.py:1724 -- Started a local Ray instance.
INFO 09-11 16:08:18 llm_engine.py:87] Initializing an LLM engine with config: model='./Llama-2-7b-chat-hf'sion=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=ae_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0911 16:08:26.039074 655 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCWAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF,
(RayWorkerVllm pid=14356) WARNING: Logging before InitGoogleLogging() is written to STDERR
(RayWorkerVllm pid=14356) I0911 16:08:26.036923 14356 ProcessGroupNCCL.cpp:686] [Rank 1] ProcessGroupNCCL _TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DE
I0911 16:08:27.040035 655 ProcessGroupNCCL.cpp:1340] NCCL_DEBUG: N/A
I0911 16:08:27.041360 655 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCWAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF,
I0911 16:08:27.041788 655 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCWAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF,
(RayWorkerVllm pid=14356) I0911 16:08:27.041778 14356 ProcessGroupNCCL.cpp:686] [Rank 1] ProcessGroupNCCL _TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DE
(RayWorkerVllm pid=14356) I0911 16:08:27.042833 14356 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL _TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DE
Traceback (most recent call last):
  File "/llama/llama_vllm/vllm/benchmarks/benchmark_throughput.py", line 343, in <module>
    main(args)
  File "/llama/llama_vllm/vllm/benchmarks/benchmark_throughput.py", line 209, in main
    elapsed_time = run_vllm(requests, args.model, args.tokenizer,
  File "/llama/llama_vllm/vllm/benchmarks/benchmark_throughput.py", line 78, in run_vllm
    llm = LLM(
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 126, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 307, in _init_workers_ray
    self._run_workers(
  File "/usr/local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1057, in _run_workers
    ray_worker_outputs = ray.get(ray_worker_outputs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wra
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ModuleNotFoundError): ray::RayWorkerVllm.execute_method() (pid=14356, ip=10.1.rVllm object at 0x7ef816ad3520>)
  File "/llama/llama_vllm/vllm/vllm/engine/ray_utils.py", line 37, in execute_method
    return executor(*args, **kwargs)
  File "/llama/llama_vllm/vllm/vllm/worker/worker.py", line 100, in load_model
    self.model_runner.load_model()
  File "/llama/llama_vllm/vllm/vllm/worker/model_runner.py", line 88, in load_model
    self.model = get_model(self.model_config,
  File "/llama/llama_vllm/vllm/vllm/model_executor/utils.py", line 49, in get_model
    imported_model_loader = importlib.import_module(
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/llama/llama_vllm/vllm/vllm/model_executor/model_loader.py", line 10, in <module>
    from vllm.model_executor.weight_utils import (get_quant_config,
  File "/llama/llama_vllm/vllm/vllm/model_executor/weight_utils.py", line 18, in <module>
    from vllm.model_executor.layers.quantization import (get_quantization_config,
  File "/llama/llama_vllm/vllm/vllm/model_executor/layers/quantization/__init__.py", line 4, in <module>
    from vllm.model_executor.layers.quantization.awq import AWQConfig
  File "/llama/llama_vllm/vllm/vllm/model_executor/layers/quantization/awq.py", line 6, in <module>
    from vllm._C import ops
ModuleNotFoundError: No module named 'vllm._C'
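The failure happens inside the Ray worker when `awq.py` runs `from vllm._C import ops`, i.e. the compiled extension `vllm._C` cannot be imported in the worker process. Notably, the driver-side frames come from site-packages while the worker-side frames come from the checked-out tree under `/llama/llama_vllm/vllm/vllm`, so the two processes may be resolving different copies of the `vllm` package. The following is only a minimal diagnostic sketch (not part of the original report): it assumes you run it in the same environment and working directory used for `benchmark_throughput.py`, and it just checks which `vllm` package is picked up and whether its compiled extension loads.

```python
# Minimal diagnostic sketch (assumption: same env and working directory as the
# failing benchmark run). It does not reproduce the benchmark itself; it only
# checks where "vllm" resolves from and whether the compiled extension imports.
import importlib
import importlib.util

spec = importlib.util.find_spec("vllm")
print("vllm package resolved from:", spec.origin)  # site-packages vs. local source tree

try:
    importlib.import_module("vllm._C")  # the exact import that fails in the traceback
    print("vllm._C import OK")
except ModuleNotFoundError as exc:
    print("vllm._C missing:", exc)
```

If the printed path points at the local source tree rather than the installed package in site-packages, the unbuilt source tree is likely shadowing the installed wheel for the Ray workers (which inherit the working directory), which would explain why only the multi-GPU run fails. This is a guess from the traceback, not a confirmed root cause.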
