Commits · b1366a953498fde9c5e7ab91915367ebc69008b2 · OpenDAS / vllm_cscc

27 Jul, 2024 22 commits
- Add Nemotron to PP_SUPPORTED_MODELS (#6863) · b1366a95
  Michael Goin authored Jul 27, 2024
  
- [Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795) · 75acdaa4
  Alexander Matveev authored Jul 27, 2024
  
- [TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) · fad5576c
  Woosuk Kwon authored Jul 27, 2024
  
- [Docs] Add RunLLM chat widget (#6857) · f954d071
  Chenggang Wu authored Jul 27, 2024
  
- [Model] Initial support for BLIP-2 (#5920) · 1ad86acf
  Cyrus Leung authored Jul 27, 2024
```
Co-authored-by: ywang96 <ywang@roblox.com>
```
- [CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) · ecb33a28
  Roger Wang authored Jul 27, 2024
  
- [bugfix] make args.stream work (#6831) · a57d7582
  Wang Ran (汪然) authored Jul 27, 2024
  
- [Bugfix] Fix VLM example typo (#6859) · 925de97e
  Roger Wang authored Jul 26, 2024
  
- [Misc][VLM][Doc] Consolidate offline examples for vision language models (#6858) · aa46953a
  Roger Wang authored Jul 26, 2024
```
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
- [Bugfix] torch.set_num_threads() in multiproc_gpu_executor (#6802) · 593e79e7
  Travis Johnson authored Jul 26, 2024
```
[Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
```
- [Doc] Add missing mock import to docs `conf.py` (#6834) · c53041ae
  Harry Mellor authored Jul 27, 2024
  
- [Hardware][TPU] Implement tensor parallelism with Ray (#5871) · 52f07e3d
  Woosuk Kwon authored Jul 26, 2024
  
- [Model] H2O Danube3-4b (#6451) · 14dbd5a7
  Joe authored Jul 26, 2024
  
- [Bugfix][Model] Jamba assertions and no chunked prefill by default for Jamba (#6784) · ed94e4f4
  tomeras91 authored Jul 27, 2024
  
- [Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844) · 3c301239
  omrishiv authored Jul 26, 2024
```
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
```
- [ROCm] Upgrade PyTorch nightly version (#6845) · ced36cd8
  Woosuk Kwon authored Jul 26, 2024
  
- [Bugfix]: Fix Tensorizer test failures (#6835) · 969d0322
  Sanger Steel authored Jul 26, 2024
  
- [Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852) · 55712941
  Lucas Wilkinson authored Jul 26, 2024
  
- [Frontend] Factor out code for running uvicorn (#6828) · 981b0d56
  Cyrus Leung authored Jul 27, 2024
  
- [TPU] Support collective communications in XLA devices (#6813) · d09b94ca
  Woosuk Kwon authored Jul 26, 2024
  
- enforce eager mode with bnb quantization temporarily (#6846) · bb549467
  chenqianfzh authored Jul 26, 2024
  
- Update README.md (#6847) · b5f49ee5
  Gurpreet Singh Dhami authored Jul 26, 2024
  
26 Jul, 2024 13 commits
- [Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283) · 150a1ffb
  Zhanghao Wu authored Jul 26, 2024
  
- [Doc] Add Nemotron to supported model docs (#6843) · 281977bd
  Michael Goin authored Jul 26, 2024
  
- [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) · 3bbb4936
  Li, Jiang authored Jul 27, 2024
- [Misc][TPU] Support TPU in initialize_ray_cluster (#6812) · aa486779
  Woosuk Kwon authored Jul 26, 2024
  
- [Build/CI][ROCm] Minor simplification to Dockerfile.rocm (#6811) · 71734f1b
  Woosuk Kwon authored Jul 26, 2024
  
- [Bugfix][Kernel] Promote another index to int64_t (#6838) · 50704f52
  Tyler Michael Smith authored Jul 26, 2024
  
- [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) · 07278c37
  Michael Goin authored Jul 26, 2024
  
- [doc][debugging] add known issues for hangs (#6816) · 85ad7e2d
  youkaichao authored Jul 25, 2024
  
- [Core] Use array to speedup padding (#6779) · 89a84b0b
  Peng Guanwen authored Jul 26, 2024
  
- [Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770) · 084a01fd
  Anthony Platanios authored Jul 26, 2024
  
- Fix ReplicatedLinear weight loading (#6793) · 062a1d0f
  QQSong authored Jul 25, 2024
  
- [ci] Mark tensorizer as soft fail and separate from grouped test (#6810) · 2eb9f4ff
  Kevin H. Luu authored Jul 25, 2024
```
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810)
Signed-off-by: kevin <kevin@anyscale.com>
```
- [ci][distributed] fix flaky tests (#6806) · 443c7cf4
  youkaichao authored Jul 25, 2024
  
25 Jul, 2024 5 commits
- [Core] Fix ray forward_dag error mssg (#6792) · 1adddb14
  SangBin Cho authored Jul 25, 2024
  
- [Docs] Publish 5th meetup slides (#6799) · b7215de2
  Woosuk Kwon authored Jul 25, 2024
  
- [doc][distributed] improve multinode serving doc (#6804) · f3ff63c3
  youkaichao authored Jul 25, 2024
  
- [Bugfix] Fix empty (nullptr) channelwise scales when loading wNa16 using compressed tensors (#6798) · cd7edc4e
  Lucas Wilkinson authored Jul 25, 2024
- [Doc] Add documentations for nightly benchmarks (#6412) · 6a1e25b1
  Kuntai Du authored Jul 25, 2024
  