Commits · c0c17d489628591363ef486fe840d9308ff13dc9 · OpenDAS / vllm_cscc · GitLab

18 Mar, 2024 5 commits
- [Misc] Fix PR Template (#3478) · c0c17d48
  Zhuohan Li authored Mar 18, 2024
- [CI/Build] Fix Bad Import In Test (#3473) · 097aa0ea
  Robert Shaw authored Mar 18, 2024
- [Testing] Add test_config.py to CI (#3437) · 482b0adf
  Cade Daniel authored Mar 18, 2024
- CI: Add ROCm Docker Build (#2886) · 8c654c04
  Simon Mo authored Mar 18, 2024
- [Bugfix] Make moe_align_block_size AMD-compatible (#3470) · 9101d832
  Woosuk Kwon authored Mar 18, 2024
17 Mar, 2024 2 commits
- [CI] Shard tests for LoRA and Kernels to speed up (#3445) · 93348d94
  Simon Mo authored Mar 17, 2024
- [Misc] Use dataclass for InputMetadata (#3452) · abfc4f33
  Woosuk Kwon authored Mar 17, 2024
```
Co-authored-by: youkaichao <youkaichao@126.com>
```
16 Mar, 2024 9 commits
- Fix setup.py neuron-ls issue (#2671) · 6b78837b
  Simon Mo authored Mar 16, 2024
- Support arbitrary json_object in OpenAI and Context Free Grammar (#3211) · 120157fd
  Simon Mo authored Mar 16, 2024
- [Misc] fix line length for entire codebase (#3444) · 8e67598a
  Simon Mo authored Mar 16, 2024
- fix lint · ad50bf4b
  simon-mo authored Mar 15, 2024
- Fix Baichuan chat template (#3340) · cf6ff182
  Dinghow Yang authored Mar 16, 2024
- Replace `lstrip()` with `removeprefix()` to fix Ruff linter warning (#2958) · 14e3f9a1
  Ronen Schaffer authored Mar 16, 2024
- Fixes the incorrect argument in the prefix-prefill test cases (#3246) · 3123f151
  Tao He authored Mar 16, 2024
- [Misc] PR templates (#3413) · 413366e9
  youkaichao authored Mar 15, 2024
```
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```
- Removed Extraneous Print Message From OAI Server (#3440) · 10585e03
  Robert Shaw authored Mar 15, 2024
15 Mar, 2024 12 commits
- Asynchronous tokenization (#2879) · fb96c1e9
  Antoni Baum authored Mar 15, 2024
- fix document error for value and v_vec illustration (#3421) · 8fa7357f
  laneeee authored Mar 16, 2024
- Fix issue templates (#3436) · a7af4538
  Harry Mellor authored Mar 15, 2024
- [Misc] add error message in non linux platform (#3438) · 604f2359
  youkaichao authored Mar 15, 2024
- Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220) · 14b8ae02
  Tao He authored Mar 16, 2024
```
Signed-off-by: Tao He <sighingnow@gmail.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
```
- [Fix] Add args for mTLS support (#3430) · 03d37f24
  Dan Clark authored Mar 15, 2024
```
Co-authored-by: declark1 <daniel.clark@ibm.com>
```
- Fix tie_word_embeddings for Qwen2. (#3344) · a7c87168
  Yang Fan authored Mar 16, 2024
- Fix `dist.broadcast` stall without group argument (#3408) · 429284dc
  Junda Chen authored Mar 14, 2024
- Add chat templates for ChatGLM (#3418) · 253a9807
  Dinghow Yang authored Mar 15, 2024
- Add chat templates for Falcon (#3420) · 21539e68
  Dinghow Yang authored Mar 15, 2024
- [Misc] add HOST_IP env var (#3419) · b522c447
  youkaichao authored Mar 14, 2024
```
Co-authored-by: Simon Mo <simon.mo@hey.com>
```
- Dynamically configure shared memory size for moe_align_block_size_kernel (#3376) · 78b6c484
  akhoroshev authored Mar 15, 2024
14 Mar, 2024 8 commits
- fix marlin config repr (#3414) · b983ba35
  Enrique Shockwave authored Mar 14, 2024
- Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373) · 54be8a0b
  陈序 authored Mar 15, 2024
```
Co-authored-by: Cade Daniel <edacih@gmail.com>
```
- [issue templates] add some issue templates (#3412) · dfc77408
  youkaichao authored Mar 14, 2024
- Add args for mTLS support (#3410) · c17ca8ef
  Dan Clark authored Mar 14, 2024
```
Co-authored-by: Daniel Clark <daniel.clark@ibm.com>
```
- Install `flash_attn` in Docker image (#3396) · 06ec4867
  Thomas Parnell authored Mar 14, 2024
- [Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389) · 8fe83865
  youkaichao authored Mar 14, 2024
- allow user to chose which vllm's merics to display in grafana (#3393) · a37415c3
  Allen.Dou authored Mar 14, 2024
- [Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion (#3383) · 81653d96
  Simon Mo authored Mar 13, 2024
13 Mar, 2024 4 commits
- [FIX] Simpler fix for async engine running on ray (#3371) · eeab52a4
  Zhuohan Li authored Mar 13, 2024
- Fix lint (#3388) · c33afd89
  Antoni Baum authored Mar 13, 2024
- Add batched RoPE kernel (#3095) · 7e9bd08f
  Terry authored Mar 13, 2024
- Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350) · ae0ccb40
  Or Sharir authored Mar 13, 2024