Commits · 95e7d4a97cd64f8c6dc226ec0bbceebef6458701 · OpenDAS / vllm_cscc

10 Apr, 2024 2 commits
- [Core][Refactor] move parallel_utils into vllm/distributed (#3950) · 63e7176f
  youkaichao authored Apr 10, 2024
```
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
```
  63e7176f
- [Misc] Avoid loading incorrect LoRA config (#3777) · 11dd6ebb
  Jee Li authored Apr 10, 2024
  
  11dd6ebb
09 Apr, 2024 1 commit
- [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable... · e7c7067b
  Cade Daniel authored Apr 09, 2024
```
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
```
  e7c7067b
27 Mar, 2024 1 commit
- [Kernel] support non-zero cuda devices in punica kernels (#3636) · 566b57c5
  Jee Li authored Mar 27, 2024
  
  566b57c5
26 Mar, 2024 1 commit
- Enable more models to inference based on LoRA (#3382) · 8af890a8
  Jee Li authored Mar 26, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  8af890a8
25 Mar, 2024 1 commit
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
22 Mar, 2024 1 commit
- [Hardware][Neuron] Refactor neuron support (#3471) · e90fc21f
  Zhuohan Li authored Mar 21, 2024
  
  e90fc21f
20 Mar, 2024 2 commits
- Migrate `logits` computation and gather to `model_runner` (#3233) · f1c0fc39
  Roy authored Mar 21, 2024
  
  f1c0fc39
- [1/n][Chunked Prefill] Refactor input query shapes (#3236) · 6e435de7
  SangBin Cho authored Mar 21, 2024
  
  6e435de7
15 Mar, 2024 1 commit
- Asynchronous tokenization (#2879) · fb96c1e9
  Antoni Baum authored Mar 15, 2024
  
  fb96c1e9
13 Mar, 2024 1 commit
- Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when... · ae0ccb40
  Or Sharir authored Mar 13, 2024
```
Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350)
```
  ae0ccb40
11 Mar, 2024 2 commits
- Add distributed model executor abstraction (#3191) · 4c922709
  Zhuohan Li authored Mar 11, 2024
  
  4c922709
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
  2f8844ba
10 Mar, 2024 1 commit
- Enhance lora tests with more layer and rank variations (#3243) · 0bba88df
  Terry authored Mar 09, 2024
  
  0bba88df
28 Feb, 2024 2 commits
- Add LoRA support for Gemma (#3050) · 929b4f29
  Woosuk Kwon authored Feb 28, 2024
  
  929b4f29
- [Neuron] Support inference with transformers-neuronx (#2569) · 3b7178cf
  Liangfu Chen authored Feb 28, 2024
  
  3b7178cf
22 Feb, 2024 1 commit
- chore(vllm): codespell for spell checking (#2820) · 93dc5a28
  Massimiliano Pronesti authored Feb 22, 2024
  
  93dc5a28
13 Feb, 2024 1 commit
- Add LoRA support for Mixtral (#2831) · 2a543d6e
  Terry authored Feb 13, 2024
  
  * add mixtral lora support
  * formatting
  * fix incorrectly ported logic
  * polish tests
  * minor fixes and refactoring
  * minor fixes
  * formatting
  * rename and remove redundant logic
  * refactoring
  * refactoring
  * minor fix
  * minor refactoring
  * fix code smell
  
  2a543d6e
01 Feb, 2024 1 commit
- Remove hardcoded `device="cuda"` to support more devices (#2503) · 96b6f475
  Kunshang Ji authored Feb 02, 2024
```
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
```
  96b6f475
23 Jan, 2024 1 commit
- [Experimental] Add multi-LoRA support (#1804) · 9b945daa
  Antoni Baum authored Jan 24, 2024
  
  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
  Co-authored-by: Avnish Narayan <avnish@anyscale.com>
  
  9b945daa