Commits · f31c1f90e381967d25591a8928782d8a6a13693e · OpenDAS / vllm_cscc · GitLab

16 Jun, 2024 1 commit
- Add basic correctness 2 GPU tests to 4 GPU pipeline (#5518) · f31c1f90
  Antoni Baum authored Jun 16, 2024
  
15 Jun, 2024 8 commits
- [Fix] Correct OpenAI batch response format (#5554) · 3ce2c050
  zifeitong authored Jun 15, 2024
  
- [BugFix] Don't start a Ray cluster when not using Ray (#5570) · 1c0afa13
  Nick Hill authored Jun 15, 2024
  
- add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 (#5145) · d919ecc7
  Alexander Matveev authored Jun 15, 2024
  
- [misc] Do not allow to use lora with chunked prefill. (#5538) · e691918e
  SangBin Cho authored Jun 15, 2024
```
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
- [CI/Build] Test both text and token IDs in batched OpenAI Completions API (#5568) · 81fbb365
  Cyrus Leung authored Jun 15, 2024
  
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
- [Core][Bugfix]: fix prefix caching for blockv2 (#5364) · 1b8a0d71
  leiwen83 authored Jun 15, 2024
```
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
```
- Add ccache to amd (#5555) · bd7efe95
  Simon Mo authored Jun 14, 2024
  
14 Jun, 2024 16 commits
- [Core][Distributed] improve p2p cache generation (#5528) · f5bb85b4
  youkaichao authored Jun 14, 2024
  
- [Bugfix] Fix typo in Pallas backend (#5558) · 28c145eb
  Woosuk Kwon authored Jun 14, 2024
  
- [Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models (#5460) · e2afb03c
  Thomas Parnell authored Jun 14, 2024
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
```
- [Doc] Update documentation on Tensorizer (#5471) · 6e2527a7
  Sanger Steel authored Jun 14, 2024
  
- [Docs] Add ZhenFund as a Sponsor (#5548) · cdab68dc
  Simon Mo authored Jun 14, 2024
  
- [misc][distributed] fix benign error in `is_in_the_same_node` (#5512) · d1c3d7d1
  youkaichao authored Jun 14, 2024
  
- [Core] Remove duplicate processing in async engine (#5525) · 77490c6f
  Cyrus Leung authored Jun 15, 2024
  
- [mis] fix flaky test of test_cuda_device_count_stateless (#5546) · 48f589e1
  youkaichao authored Jun 14, 2024
  
- [Kernel] Suppress mma.sp warning on CUDA 12.5 and later (#5401) · 348616ac
  Tyler Michael Smith authored Jun 14, 2024
  
- [ Misc ] Rs/compressed tensors cleanup (#5432) · 15985680
  Robert Shaw authored Jun 14, 2024
```
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
```
- [Misc] Fix arg names (#5524) · d74674bb
  Allen.Dou authored Jun 15, 2024
  
- [Kernel] Fix CUTLASS 3.x custom broadcast load epilogue (#5516) · 703475f6
  Tyler Michael Smith authored Jun 14, 2024
  
- [CI/Build] Disable LLaVA-NeXT CPU test (#5529) · d47af2bc
  Cyrus Leung authored Jun 15, 2024
  
- [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (#5073) · 319ad7f1
  Kuntai Du authored Jun 13, 2024
```
Co-authored-by: simon-mo <simon.mo@hey.com>
```
- bump version to v0.5.0.post1 (#5522) · 0f0d8bc0
  Simon Mo authored Jun 13, 2024
  
- [Misc] Fix arg names in quantizer script (#5507) · 55d6361b
  Allen.Dou authored Jun 14, 2024
  
13 Jun, 2024 15 commits
- [Hardware][Intel] Support CPU inference with AVX2 ISA (#5452) · cd9c0d65
  Jie Fu (傅杰) authored Jun 14, 2024
  
- Add `cuda_device_count_stateless` (#5473) · 50eed24d
  Antoni Baum authored Jun 13, 2024
  
- [Kernel] Disable CUTLASS kernels for fp8 (#5505) · e38042d4
  Tyler Michael Smith authored Jun 13, 2024
  
- [CI/Build] Disable test_fp8.py (#5508) · 33e3b372
  Tyler Michael Smith authored Jun 13, 2024
  
- [misc] fix format.sh (#5511) · 1696efe6
  youkaichao authored Jun 13, 2024
  
- Revert "[Core] Remove unnecessary copies in flash attn backend" (#5478) · 6b0511a5
  Antoni Baum authored Jun 13, 2024
  
- Seperate dev requirements into lint and test (#5474) · a8fda4f6
  Antoni Baum authored Jun 13, 2024
  
- [MISC] Remove FP8 warning (#5472) · 30299a41
  Cody Yu authored Jun 13, 2024
```
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
```
- [Kernel] Factor out epilogues from cutlass kernels (#5391) · 85657b56
  Tyler Michael Smith authored Jun 13, 2024
```
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: zifeitong <zifei.tong@parasail.io>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
```
- [Doc] Update LLaVA docs (#5437) · 0ce7b952
  Cyrus Leung authored Jun 14, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
- [CI/Build] Simplify OpenAI server setup in tests (#5100) · 39873476
  Cyrus Leung authored Jun 14, 2024
  
- [Misc] Add vLLM version getter to utils (#5098) · 03dccc88
  Cyrus Leung authored Jun 14, 2024
  
- [Docs] Add 4th meetup slides (#5509) · a65634d3
  Woosuk Kwon authored Jun 13, 2024
  
- [Hardware][Intel] Optimize CPU backend and add more performance tips (#4971) · 80aa7e91
  Li, Jiang authored Jun 14, 2024
```
Co-authored-by: Jianan Gu <jianan.gu@intel.com>
```
- [Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497) · bd439735
  wenyujin333 authored Jun 14, 2024
```
Tune Qwen2-57B-A14B configs based on #4921

Throughput Performance
command: python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2

A100 GPU

benchmark | no config                           | w/ PR
tp=2      | 10.53 requests/s, 11058.17 tokens/s | 12.47 requests/s, 13088.57 tokens/s
tp=4      | 17.77 requests/s, 18662.95 tokens/s | 20.20 requests/s, 21212.32 tokens/s
```
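The throughput figures quoted in the Qwen2MoE tuning commit (#5497) can be turned into relative gains with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the commit message, while the `results` layout and the `relative_gain` helper are hypothetical names introduced here, not part of vLLM.

```python
# Throughput figures copied from the commit message of #5497
# (A100 GPU, Qwen/Qwen2-57B-A14B-Instruct). Each tuple is
# (requests/s, tokens/s). The dict layout is an assumption for illustration.
results = {
    "tp=2": {"no_config": (10.53, 11058.17), "with_pr": (12.47, 13088.57)},
    "tp=4": {"no_config": (17.77, 18662.95), "with_pr": (20.20, 21212.32)},
}


def relative_gain(before: float, after: float) -> float:
    """Return the fractional improvement of `after` over `before`."""
    return after / before - 1.0


# Map each tensor-parallel setting to (requests/s gain, tokens/s gain).
gains = {}
for setting, runs in results.items():
    req_before, tok_before = runs["no_config"]
    req_after, tok_after = runs["with_pr"]
    gains[setting] = (
        relative_gain(req_before, req_after),
        relative_gain(tok_before, tok_after),
    )
    print(
        f"{setting}: requests/s {gains[setting][0]:+.1%}, "
        f"tokens/s {gains[setting][1]:+.1%}"
    )
```

On the quoted numbers this works out to roughly an 18% gain at tp=2 and roughly 14% at tp=4, for both requests/s and tokens/s.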