ModelZoo / LLaMA_vllm · Issue #1 (Closed)
Created Jun 21, 2024 by ncic_liuyao

Could vllm provide a webserver inference example?

The examples currently provided only cover API serving; there is no interactive web-page (webserver) example. I tried the llama_vllm/vllm/examples/gradio_openai_chatbot_webserver.py example, writing the launch script according to the vLLM documentation. Dependencies such as gradio were missing; after installing everything, the page served at 0.0.0.0:8000 opened fine in the browser, but as soon as I typed a question it returned an error. Could you test this? Since LMDeploy and vLLM are the two mainstream inference frameworks, and LMDeploy ships a Gradio test example that runs and is accessible, it would be good for vLLM to have a working one too.

Command tested:

python vllm/examples/gradio_openai_chatbot_webserver.py \
    -m $MODEL_NAME \
    --host 0.0.0.0 \
    --port 8811 \
    --model-url http://localhost:8081/v1 \
    --stop-token-ids 128009,128001
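For anyone reproducing this without Gradio, the Gradio frontend is just a thin client over the OpenAI-compatible endpoint that --model-url points at. A minimal sketch of such a client is below, assuming a vLLM OpenAI-compatible server is already running at http://localhost:8081/v1 (as in the command above); the model name is a placeholder, and stop_token_ids is a vLLM-specific extra sampling parameter that a plain OpenAI server may not accept.

```python
import json
import urllib.request


def build_chat_request(model, prompt, stop_token_ids):
    """Build the JSON body for a POST to <base_url>/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Extra sampling parameter understood by vLLM's OpenAI-compatible
        # server (values here mirror the --stop-token-ids flag above).
        "stop_token_ids": stop_token_ids,
    }


def chat_once(base_url, model, prompt, stop_token_ids):
    """Send one chat turn and return the assistant's reply text."""
    body = json.dumps(build_chat_request(model, prompt, stop_token_ids)).encode()
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

If this bare client also errors on the first question, the problem is on the server side (e.g. model name mismatch or chat-template issues) rather than in the Gradio example itself.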
