Commits · 0fca3cdcf265cd375bca684d951702b6b7adf65a · OpenDAS / vllm_cscc

13 May, 2024 1 commit
- [Misc] Enhance attention selector (#4751) · 0fca3cdc
  Woosuk Kwon authored May 13, 2024
  
  0fca3cdc
26 Apr, 2024 1 commit
- [Misc][Refactor] Generalize linear_method to be quant_method (#4373) · a62aaf1d
  Cody Yu authored Apr 26, 2024
  
  a62aaf1d
16 Apr, 2024 1 commit
- [Core] Refactor model loading code (#4097) · 69e1d2fb
  Antoni Baum authored Apr 16, 2024
  
  69e1d2fb
10 Apr, 2024 1 commit

- [Core][Refactor] move parallel_utils into vllm/distributed (#3950) · 63e7176f
  youkaichao authored Apr 10, 2024
  
  [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
  
  63e7176f

28 Mar, 2024 1 commit

- [Model] Add support for xverse (#3610) · 098e1776
  hxer7963 authored Mar 28, 2024
  
  Co-authored-by: willhe <hexin@xverse.cn>
  Co-authored-by: root <root@localhost.localdomain>
  
  098e1776

25 Mar, 2024 3 commits
- [Feature] Add vision language model support. (#3042) · 64172a97
  xwjiang2010 authored Mar 25, 2024
  
  64172a97
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
- [Core] Refactor Attention Take 2 (#3462) · 925f3332
  Woosuk Kwon authored Mar 24, 2024
  
  925f3332
20 Mar, 2024 1 commit
- Migrate `logits` computation and gather to `model_runner` (#3233) · f1c0fc39
  Roy authored Mar 21, 2024
  
  f1c0fc39
07 Mar, 2024 1 commit
- Separate attention backends (#3005) · 2daf23ab
  Woosuk Kwon authored Mar 07, 2024
  
  2daf23ab
28 Feb, 2024 1 commit
- Add LoRA support for Gemma (#3050) · 929b4f29
  Woosuk Kwon authored Feb 28, 2024
  
  929b4f29
22 Feb, 2024 1 commit
- Migrate MistralForCausalLM to LlamaForCausalLM (#2868) · 344020c9
  Roy authored Feb 22, 2024
  
  344020c9
14 Feb, 2024 2 commits
- Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2861) · 0c48b37c
  Philipp Moritz authored Feb 13, 2024
  
  0c48b37c
- Migrate InternLMForCausalLM to LlamaForCausalLM (#2860) · 7eacffd9
  Philipp Moritz authored Feb 13, 2024
  
  Co-authored-by: Roy <jasonailu87@gmail.com>
  
  7eacffd9
13 Feb, 2024 3 commits

- Add LoRA support for Mixtral (#2831) · 2a543d6e
  Terry authored Feb 13, 2024
  
  * add mixtral lora support
  * formatting
  * fix incorrectly ported logic
  * polish tests
  * minor fixes and refactoring
  * minor fixes
  * formatting
  * rename and remove redundant logic
  * refactoring
  * refactoring
  * minor fix
  * minor refactoring
  * fix code smell
  
  2a543d6e

- Revert "Refactor llama family models (#2637)" (#2851) · ea356004
  Philipp Moritz authored Feb 13, 2024
  
  This reverts commit 5c976a7e.
  
  ea356004
- Refactor llama family models (#2637) · 5c976a7e
  Roy authored Feb 13, 2024
  
  5c976a7e

23 Jan, 2024 1 commit

- [Experimental] Add multi-LoRA support (#1804) · 9b945daa
  Antoni Baum authored Jan 24, 2024
  
  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
  Co-authored-by: Avnish Narayan <avnish@anyscale.com>
  
  9b945daa

03 Jan, 2024 1 commit
- Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) · fd4ea8ef
  Zhuohan Li authored Jan 04, 2024
  
  fd4ea8ef
17 Dec, 2023 1 commit

- Optimize model execution with CUDA graph (#1926) · 37ca5581
  Woosuk Kwon authored Dec 16, 2023
  
  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
  
  37ca5581

15 Dec, 2023 1 commit
- Add GPTQ support (#916) · 0fbfc4b8
  CHU Tianxiang authored Dec 15, 2023
  
  0fbfc4b8
10 Dec, 2023 1 commit
- [Minor] Add comment on skipping rope caches (#2004) · 24cde76a
  Woosuk Kwon authored Dec 10, 2023
  
  24cde76a
09 Dec, 2023 1 commit
- Fix for KeyError on Loading LLaMA (#1978) · 3a8c2381
  Jun Gao authored Dec 10, 2023
  
  3a8c2381
30 Nov, 2023 1 commit
- Refactor Worker & InputMetadata (#1843) · 27feead2
  Woosuk Kwon authored Nov 29, 2023
  
  27feead2
29 Nov, 2023 1 commit
- Refactor Attention (#1840) · a9e45742
  Woosuk Kwon authored Nov 29, 2023
  
  a9e45742
24 Nov, 2023 1 commit
- Fix model docstrings (#1764) · 7c600440
  Woosuk Kwon authored Nov 23, 2023
  
  7c600440
20 Nov, 2023 1 commit
- Migrate linter from `pylint` to `ruff` (#1665) · 5ffc0d13
  Simon Mo authored Nov 20, 2023
  
  5ffc0d13
19 Nov, 2023 1 commit
- [Optimization] Implement fused add rmsnorm (#1667) · e1054247
  ljss authored Nov 19, 2023
  
  e1054247
16 Nov, 2023 1 commit

- TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) · 7076fa1c
  Zhuohan Li authored Nov 15, 2023
  
  Refactor the tensor parallelism, quantization, and weight-loading code.
  
  Summary of the new features enabled by this PR:
  - **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
  - Model loading code became much simpler.
  - Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.
  
  7076fa1c
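
As a rough illustration of the last point in the #1622 summary above (this is not code taken from the PR), the sketch below shows one way to decide how many key/value heads each tensor-parallel rank holds: split the heads evenly when there are at least as many heads as ranks, otherwise replicate each head across several ranks. The function name `kv_heads_per_rank` and the divisibility checks are assumptions made for this example.

```python
def kv_heads_per_rank(total_num_kv_heads: int, tp_size: int) -> int:
    """Return how many KV heads a single tensor-parallel rank holds.

    Hypothetical helper for illustration only; the name and the error
    handling are assumptions, not the PR's actual implementation.
    """
    if total_num_kv_heads <= 0 or tp_size <= 0:
        raise ValueError("head count and tensor parallel size must be positive")

    if total_num_kv_heads >= tp_size:
        # Enough heads to go around: partition them evenly across ranks.
        if total_num_kv_heads % tp_size != 0:
            raise ValueError("KV heads must divide evenly across ranks")
        return total_num_kv_heads // tp_size

    # Fewer heads than ranks (typical for MQA/GQA): each head is
    # replicated on tp_size // total_num_kv_heads ranks, so every rank
    # still holds exactly one head.
    if tp_size % total_num_kv_heads != 0:
        raise ValueError("tensor parallel size must be a multiple of KV heads")
    return 1


if __name__ == "__main__":
    # GQA model with 8 KV heads on 4 GPUs, then on 16 GPUs; MQA on 8 GPUs.
    for heads, tp in [(8, 4), (8, 16), (1, 8)]:
        print(heads, tp, kv_heads_per_rank(heads, tp))
```

Under these assumptions, replication trades some duplicated KV-cache memory for the ability to use more GPUs than the model has key/value heads.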

22 Oct, 2023 1 commit

- Support SqueezeLLM (#1326) · 1f24755b
  chooper1 authored Oct 22, 2023
  
  Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  
  1f24755b

02 Oct, 2023 2 commits
- TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) · ba0bfd40
  Zhuohan Li authored Oct 02, 2023
  
  ba0bfd40
- support sharding llama2-70b on more than 8 GPUs (#1209) · a60b3530
  Zhuohan Li authored Oct 02, 2023
  
  Co-authored-by: JiCheng <247153481@qq.com>
  
  a60b3530
27 Sep, 2023 1 commit

- Support Longchat and RoPE scaling (#555) · 21877b0d
  Lily Liu authored Sep 27, 2023
  
  Co-authored-by: Wing Lian <wing.lian@gmail.com>
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  
  21877b0d

20 Sep, 2023 1 commit
- rope_theta and max_position_embeddings from config (#1096) · 3302f0ae
  Antoni Baum authored Sep 20, 2023
  
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  Co-authored-by: wnma3mz <wnma3mz@gmail.com>
  
  3302f0ae
18 Sep, 2023 1 commit
- Convert before transpose (#1073) · cc796b13
  Woosuk Kwon authored Sep 18, 2023
  
  cc796b13
16 Sep, 2023 1 commit

- Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e
  Woosuk Kwon authored Sep 16, 2023
  
  Co-authored-by: Robert Irvine <robert@seamlessml.com>
  Co-authored-by: root <rirv938@gmail.com>
  Co-authored-by: Casper <casperbh.96@gmail.com>
  Co-authored-by: julian-q <julianhquevedo@gmail.com>
  
  e3e79e9e

13 Sep, 2023 1 commit

- Add Model Revision Support (#1014) · ab019eea
  Jasmond L authored Sep 14, 2023
  
  Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
  
  ab019eea

07 Sep, 2023 1 commit
- Enable safetensors loading for all models (#974) · c957c741
  Zhuohan Li authored Sep 07, 2023
  
  c957c741
05 Sep, 2023 1 commit
- Align vLLM's beam search implementation with HF generate (#857) · 002800f0
  Zhuohan Li authored Sep 04, 2023
  
  002800f0
30 Aug, 2023 1 commit
- Accelerate LLaMA model loading (#234) · 0d93f156
  JFDuan authored Aug 30, 2023
  
  0d93f156