Commits · 7a64d24aad69e4d2548aa0bf528d9fe63428ab01 · OpenDAS / vllm_cscc

03 Jun, 2024 1 commit
- [Core] Support image processor (#4197) · 7a64d24a
  Cyrus Leung authored Jun 03, 2024
  
  7a64d24a
25 May, 2024 1 commit

[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) · 8e192ff9

Eric Xihui Lin authored May 25, 2024


Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

8e192ff9

22 May, 2024 1 commit
- [Frontend] Dynamic RoPE scaling (#4638) · 9b9a10d6
  sasha0552 authored May 22, 2024
  
  9b9a10d6
18 May, 2024 1 commit

[Lora] Support long context lora (#4787) · 2e9a2227

SangBin Cho authored May 18, 2024

Currently the rotary embedding kernel has to be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This commit adds a batched rotary embedding kernel and pipes it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow-up to https://github.com/vllm-project/vllm/pull/3095/files

2e9a2227
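
The idea in the description above can be illustrated with a small pure-Python sketch. It is not the kernel from this commit. It assumes linear position scaling (position divided by the scaling factor), and every function name here is hypothetical. One cos-sin cache is built per scaling factor, the caches are concatenated into a single table, and each token in a mixed batch looks up its row at `offset + position`, so a single pass covers requests with different scaling factors.

```python
import math


def build_cos_sin_cache(head_dim, max_pos, scaling_factor, base=10000.0):
    """Build (cos, sin) rows for one scaling factor (assumed: linear scaling)."""
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cache = []
    for pos in range(max_pos):
        angles = [(pos / scaling_factor) * f for f in inv_freq]
        cache.append(([math.cos(a) for a in angles],
                      [math.sin(a) for a in angles]))
    return cache


def build_batched_cache(head_dim, max_pos, scaling_factors):
    """Concatenate one cache per scaling factor and record each start offset."""
    table, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(table)
        # A larger factor covers a proportionally longer context.
        table.extend(build_cos_sin_cache(head_dim, int(max_pos * factor), factor))
    return table, offsets


def rotate(vec, cos, sin):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the cached angles."""
    out = []
    for i in range(len(vec) // 2):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * cos[i] - y * sin[i], x * sin[i] + y * cos[i]]
    return out


def batched_rotary(vectors, positions, cache_offsets, table):
    """Single pass over a mixed batch: row index is offset + position."""
    return [rotate(v, *table[off + pos])
            for v, pos, off in zip(vectors, positions, cache_offsets)]


# Two tokens that use different scaling factors, handled in one call.
table, offsets = build_batched_cache(head_dim=4, max_pos=8,
                                     scaling_factors=[1.0, 4.0])
vectors = [[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
out = batched_rotary(vectors,
                     positions=[4, 16],
                     cache_offsets=[offsets[1.0], offsets[4.0]],
                     table=table)
```

In this sketch, position 16 under factor 4.0 maps to the same angles as position 4 under factor 1.0, so the two output vectors match. The point of the shared table is that the per-token offset replaces a separate rotary call per scaling factor.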

09 May, 2024 1 commit

[Model] Snowflake arctic model implementation (#4652) · ebce310b

Hao Zhang authored May 09, 2024


Co-authored-by: Dash Desai <1723932+iamontheinet@users.noreply.github.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

ebce310b

02 May, 2024 1 commit
- [Misc] centralize all usage of environment variables (#4548) · 5b8a7c1c
  youkaichao authored May 02, 2024
  
  5b8a7c1c
30 Apr, 2024 1 commit
- fix_tokenizer_snapshot_download_bug (#4493) · 6ad58f42
  fuchen.ljl authored May 01, 2024
  
  6ad58f42
27 Apr, 2024 1 commit

[Core] Support offline use of local cache for models (#4374) · d6e520e1

Prashant Gupta authored Apr 27, 2024


Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Travis Johnson <tjohnson31415@gmail.com>

d6e520e1

26 Apr, 2024 2 commits
- [CI] Disable non-lazy string operation on logging (#4326) · a88081bf
  SangBin Cho authored Apr 26, 2024
```
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
```
  a88081bf
- [Bugfix] Fix parameter name in `get_tokenizer` (#4107) · a74dee9b
  Cyrus Leung authored Apr 26, 2024
  
  a74dee9b
25 Apr, 2024 1 commit
- [Core] Move ray_utils.py from `engine` to `executor` package (#4347) · 479d69fa
  Nick Hill authored Apr 24, 2024
  
  479d69fa
23 Apr, 2024 1 commit
- [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) · 0ae11f78
  SangBin Cho authored Apr 23, 2024
  
  0ae11f78
16 Apr, 2024 1 commit
- [Core] Refactor model loading code (#4097) · 69e1d2fb
  Antoni Baum authored Apr 16, 2024
  
  69e1d2fb
12 Apr, 2024 2 commits
- [mypy] Add mypy type annotation part 1 (#4006) · 09473ee4
  SangBin Cho authored Apr 13, 2024
  
  09473ee4
- [Doc] Add typing hints / mypy types cleanup (#3816) · c2b4a1bc
  Michael Feil authored Apr 11, 2024
```
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
```
  c2b4a1bc
11 Apr, 2024 1 commit
- [BugFix] Fix handling of stop strings and stop token ids (#3672) · e46a60aa
  Nick Hill authored Apr 11, 2024
  
  e46a60aa
04 Apr, 2024 1 commit
- [BugFix] Pass tokenizer_config to local_tokenizer_group (#3754) · 294f8f66
  Tao He authored Apr 04, 2024
```
Signed-off-by: Tao He <sighingnow@gmail.com>
```
  294f8f66
01 Apr, 2024 1 commit

[Misc] Some minor simplifications to detokenization logic (#3670) · 49782fcb

Nick Hill authored Apr 01, 2024

Some simplifications made for clarity.

Also moves detokenization-related functions from tokenizer.py to detokenizer.py.

49782fcb

30 Mar, 2024 1 commit
- [Core][Bugfix] cache len of tokenizer (#3741) · 203d4f82
  youkaichao authored Mar 29, 2024
  
  203d4f82
29 Mar, 2024 1 commit
- [BugFix] Fix tokenizer out of vocab size (#3685) · 6110c39d
  Roy authored Mar 29, 2024
  
  6110c39d
27 Mar, 2024 1 commit
- [Model] Add support for DBRX (#3660) · e24336b5
  Megha Agarwal authored Mar 27, 2024
  
  e24336b5
25 Mar, 2024 2 commits
- [Feature] Add vision language model support. (#3042) · 64172a97
  xwjiang2010 authored Mar 25, 2024
  
  64172a97
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
22 Mar, 2024 1 commit
- [Core] Improve detokenization performance for prefill (#3469) · bfdb1ba5
  Antoni Baum authored Mar 22, 2024
```
Co-authored-by: MeloYang <meloyang05@gmail.com>
```
  bfdb1ba5
21 Mar, 2024 2 commits
- [Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551) · c188ecb0
  Woosuk Kwon authored Mar 21, 2024
```
Co-authored-by: Roy <jasonailu87@gmail.com>
Co-authored-by: Roger Meier <r.meier@siemens.com>
```
  c188ecb0
- [🚀 Ready to be merged] Added support for Jais models (#3183) · 4c07dd28
  Lalit Pradhan authored Mar 21, 2024
  
  4c07dd28
20 Mar, 2024 1 commit
- [Core] Add generic typing to `LRUCache` (#3511) · 4ad521d8
  Nick Hill authored Mar 20, 2024
  
  4ad521d8
15 Mar, 2024 1 commit
- Asynchronous tokenization (#2879) · fb96c1e9
  Antoni Baum authored Mar 15, 2024
  
  fb96c1e9
11 Mar, 2024 2 commits
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
  2f8844ba
- [BugFix] Fix get tokenizer when using ray (#3301) · 9e8744a5
  Roy authored Mar 11, 2024
  
  9e8744a5
29 Feb, 2024 1 commit
- Support starcoder2 architecture (#3089) · bfdcfa6a
  Seonghyeon authored Feb 29, 2024
  
  bfdcfa6a
27 Feb, 2024 1 commit
- [Minor] Remove unused config files (#3039) · d9f726c4
  Roy authored Feb 27, 2024
  
  d9f726c4
19 Feb, 2024 1 commit
- Support OLMo models. (#2832) · ab3a5a82
  Isotr0py authored Feb 19, 2024
  
  ab3a5a82
18 Feb, 2024 1 commit
- Add code-revision config argument for Hugging Face Hub (#2892) · 786b7f18
  Mark Mozolewski authored Feb 17, 2024
  
  786b7f18
14 Feb, 2024 1 commit
- Migrate AquilaForCausalLM to LlamaForCausalLM (#2867) · 4efbac6d
  Roy authored Feb 15, 2024
  
  4efbac6d
13 Feb, 2024 3 commits
- Remove Yi model definition, please use `LlamaForCausalLM` instead (#2854) · 317b29de
  Philipp Moritz authored Feb 13, 2024
```
Co-authored-by: Roy <jasonailu87@gmail.com>
```
  317b29de
- Revert "Refactor llama family models (#2637)" (#2851) · ea356004
  Philipp Moritz authored Feb 13, 2024
```
This reverts commit 5c976a7e.
```
  ea356004
- Refactor llama family models (#2637) · 5c976a7e
  Roy authored Feb 13, 2024
  
  5c976a7e
23 Jan, 2024 1 commit

[Experimental] Add multi-LoRA support (#1804) · 9b945daa

Antoni Baum authored Jan 24, 2024


Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>

9b945daa

17 Dec, 2023 1 commit
- [Minor] Delete Llama tokenizer warnings (#2146) · 3d1cfbfc
  Woosuk Kwon authored Dec 16, 2023
  
  3d1cfbfc