Commits · 78107fa0911567f131cbad810872ae25594a4506 · OpenDAS / vllm_cscc

05 Apr, 2024 1 commit

[Doc]Add asynchronous engine arguments to documentation. (#3810) · 78107fa0

Sean Gallen authored Apr 04, 2024


Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>


03 Apr, 2024 1 commit

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) · 2ff767b5

Adrian Abeyta authored Apr 03, 2024


Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>


02 Apr, 2024 2 commits
- [Doc] Fix vLLMEngine Doc Page (#3791) · 3bec41f4
  Roger Wang authored Apr 02, 2024
  
- [Hardware][Intel] Add CPU inference backend (#3634) · 0e3f06fe
  bigPYJ1151 authored Apr 02, 2024
```
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
```
30 Mar, 2024 1 commit

[Doc] Update installation doc (#3746) · 9c82a1be

youkaichao authored Mar 30, 2024



[Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>


29 Mar, 2024 1 commit
- Usage Stats Collection (#2852) · d8658c8c
  yhu422 authored Mar 28, 2024
  
28 Mar, 2024 1 commit
- [Model] Add support for Qwen2MoeModel (#3346) · d6ea427f
  wenyujin333 authored Mar 28, 2024
  
27 Mar, 2024 4 commits
- [Docs] Add Command-R to supported models (#3669) · 6d9aa00f
  Woosuk Kwon authored Mar 27, 2024
  
- [Model] Add support for DBRX (#3660) · e24336b5
  Megha Agarwal authored Mar 27, 2024
  
- [Misc] Minor fix in KVCache type (#3652) · e66b629c
  Woosuk Kwon authored Mar 26, 2024
  
- [Doc]add lora support (#3649) · 76879342
  Jee Li authored Mar 27, 2024
  
25 Mar, 2024 2 commits
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
- [CI/Build] respect the common environment variable MAX_JOBS (#3600) · 42bc3861
  youkaichao authored Mar 24, 2024
  
21 Mar, 2024 1 commit
- [🚀 Ready to be merged] Added support for Jais models (#3183) · 4c07dd28
  Lalit Pradhan authored Mar 21, 2024
  
19 Mar, 2024 3 commits
- [Doc] minor fix of spelling in amd-installation.rst (#3506) · 63e8b28a
  Jim Burtoft authored Mar 19, 2024
  
- [Doc] minor fix to neuron-installation.rst (#3505) · 2a60c9bd
  Jim Burtoft authored Mar 19, 2024
  
- [Doc] Add docs about OpenAI compatible server (#3288) · ef65dcfa
  Simon Mo authored Mar 18, 2024
  
15 Mar, 2024 1 commit
- fix document error for value and v_vec illustration (#3421) · 8fa7357f
  laneeee authored Mar 16, 2024
  
12 Mar, 2024 1 commit
- docs: Add BentoML deployment doc (#3336) · b0925b38
  Sherlock Xu authored Mar 13, 2024
```
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
```
11 Mar, 2024 2 commits
- Add distributed model executor abstraction (#3191) · 4c922709
  Zhuohan Li authored Mar 11, 2024
  
- [docs] Add LoRA support information for models (#3299) · 657061fd
  Philipp Moritz authored Mar 11, 2024
  
08 Mar, 2024 1 commit
- [Docs] Fix Unmocked Imports (#3275) · 99c3cfb8
  Roger Wang authored Mar 08, 2024
  
04 Mar, 2024 2 commits
- Add document for vllm paged attention kernel. (#2978) · 27a7b070
  Jialun Lyu authored Mar 04, 2024
  
- [DOC] add setup document to support neuron backend (#2777) · d0fae881
  Liangfu Chen authored Mar 03, 2024
  
02 Mar, 2024 1 commit

Add Automatic Prefix Caching (#2762) · ce4f5a29

Sage Moore authored Mar 02, 2024


Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>


01 Mar, 2024 1 commit
- docs: Add tutorial on deploying vLLM model with KServe (#2586) · 49d849b3
  Yuan Tang authored Mar 01, 2024
```
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
```
28 Feb, 2024 1 commit
- multi-lora documentation fix (#3064) · a8683102
  Ganesh Jagadeesan authored Feb 28, 2024
  
27 Feb, 2024 2 commits
- [Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046) · 8b430d7d
  Woosuk Kwon authored Feb 26, 2024
  
- Support Orion model (#2539) · 48a8f4a7
  张大成 authored Feb 27, 2024
```
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
25 Feb, 2024 1 commit
- Port metrics from `aioprometheus` to `prometheus_client` (#2730) · ef978fe4
  Harry Mellor authored Feb 25, 2024
  
21 Feb, 2024 1 commit
- [FIX] Add Gemma model to the doc (#2966) · a9c82128
  Zhuohan Li authored Feb 21, 2024
  
19 Feb, 2024 1 commit
- Support OLMo models. (#2832) · ab3a5a82
  Isotr0py authored Feb 19, 2024
  
17 Feb, 2024 1 commit

multi-LoRA as extra models in OpenAI server (#2775) · 8f36444c

jvmncs authored Feb 17, 2024

How to serve the LoRAs (mimicking the [multi-LoRA inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server lists three separate values when the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
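
The listing behaviour described above can be sketched in a few lines of Python. This is only an illustration, not vLLM's implementation: `parse_lora_modules` and `served_model_ids` are hypothetical helper names, and the `name=path` format is assumed from the `--lora-modules` arguments shown in the command.

```python
from typing import Dict, List


def parse_lora_modules(specs: List[str]) -> Dict[str, str]:
    """Parse `name=path` pairs (hypothetical helper, format assumed
    from the --lora-modules arguments in the command above)."""
    modules: Dict[str, str] = {}
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        modules[name] = path
    return modules


def served_model_ids(base_model: str, lora_modules: Dict[str, str]) -> List[str]:
    """Return the base model id followed by one id per LoRA module."""
    return [base_model, *lora_modules]


if __name__ == "__main__":
    # Two names may point at the same adapter path, as in the example above.
    loras = parse_lora_modules(
        ["sql-lora=/path/to/lora", "sql-lora2=/path/to/lora"]
    )
    for model_id in served_model_ids("meta-llama/Llama-2-7b-hf", loras):
        print(model_id)
```

With the two modules from the command, the sketch yields three ids: the base model plus `sql-lora` and `sql-lora2`.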


13 Feb, 2024 2 commits
- Remove Yi model definition, please use `LlamaForCausalLM` instead (#2854) · 317b29de
  Philipp Moritz authored Feb 13, 2024
```
Co-authored-by: Roy <jasonailu87@gmail.com>
```
- [CI] Ensure documentation build is checked in CI (#2842) · f9644932
  Simon Mo authored Feb 12, 2024
  
12 Feb, 2024 1 commit
- Add documentation section about LoRA (#2834) · 4ca2c358
  Philipp Moritz authored Feb 12, 2024
  
11 Feb, 2024 1 commit
- [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) · 0580aab0
  Hongxia Yang authored Feb 11, 2024
  
07 Feb, 2024 1 commit
- Add documentation on how to do incremental builds (#2796) · 931746bc
  Philipp Moritz authored Feb 07, 2024
  
04 Feb, 2024 1 commit
- docs: fix langchain (#2736) · 5ed704ec
  Massimiliano Pronesti authored Feb 04, 2024
  
01 Feb, 2024 1 commit
- Add Internlm2 (#2666) · cd9e60c7
  Fengzhe Zhou authored Feb 02, 2024
  