Commits · 711a000255eac3e034f0b73aa5cc62b45201a571 · OpenDAS / vllm_cscc

14 Apr, 2024 1 commit
- [Frontend] [Core] feat: Add model loading using `tensorizer` (#3476) · 711a0002
  Sanger Steel authored Apr 13, 2024
04 Apr, 2024 1 commit
- [Core] Enable hf_transfer by default if available (#3817) · 537ee25f
  Michael Feil authored Apr 03, 2024
03 Apr, 2024 1 commit
- Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) · 2ff767b5
  Adrian Abeyta authored Apr 03, 2024
```
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
25 Mar, 2024 2 commits
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
- Revert "[Bugfix] store lock file in tmp directory (#3578)" (#3599) · 56a8652f
  Woosuk Kwon authored Mar 24, 2024
```
Co-authored-by: youkaichao <youkaichao@126.com>
```
23 Mar, 2024 1 commit
- [Bugfix] use SoftLockFile instead of LockFile (#3578) · 743a0b74
  kota-iizuka authored Mar 24, 2024
08 Mar, 2024 1 commit
- Move model filelocks from `/tmp/` to `~/.cache/vllm/locks/` dir (#3241) · c2c5e090
  Michael Goin authored Mar 08, 2024
01 Feb, 2024 1 commit
- Use revision when downloading the quantization config file (#2697) · c410f5d0
  Pernekhan Utemuratov authored Feb 01, 2024
```
Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>
```
20 Jan, 2024 1 commit
- [Bugfix] fix load local safetensors model (#2512) · 91a61da9
  Roy authored Jan 20, 2024
19 Jan, 2024 1 commit
- refactor completion api for readability (#2499) · dd7e8f5f
  Simon Mo authored Jan 18, 2024
18 Jan, 2024 1 commit
- Don't download both safetensor and bin files. (#2480) · 7e108113
  Nikola Borisov authored Jan 18, 2024
17 Dec, 2023 1 commit
- [Minor] Fix a typo in .pt weight support (#2160) · 2c9b6380
  Woosuk Kwon authored Dec 17, 2023
16 Dec, 2023 1 commit
- Simplify weight loading logic (#2133) · eed74a55
  Roy authored Dec 17, 2023
15 Dec, 2023 1 commit
- Add GPTQ support (#916) · 0fbfc4b8
  CHU Tianxiang authored Dec 15, 2023
20 Nov, 2023 2 commits
- [BugFix] Fix a bug in loading safetensors (#1732) · f5a37c6c
  Woosuk Kwon authored Nov 20, 2023
- Migrate linter from `pylint` to `ruff` (#1665) · 5ffc0d13
  Simon Mo authored Nov 20, 2023
19 Nov, 2023 1 commit
- use get_tensor in safe_open (#1696) · e946260c
  twaka authored Nov 19, 2023
18 Nov, 2023 1 commit
- Use `quantization_config` in hf config (#1695) · bb00f66e
  Woosuk Kwon authored Nov 17, 2023
16 Nov, 2023 2 commits
- Fix loading error when safetensors contains empty tensor (#1687) · 2a2c135b
  twaka authored Nov 17, 2023
- TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) · 7076fa1c
  Zhuohan Li authored Nov 15, 2023

  Refactor the tensor parallelism, quantization, and weight-loading code.

  Summary of the new features enabled by this PR:
  - **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
  - The model loading code is much simpler.
  - Model parallelism is supported for all MQA/GQA models, including when the number of key/value heads is smaller than the tensor parallel size.

12 Oct, 2023 1 commit
- Add blacklist in model checkpoint (#1325) · 875afe38
  Woosuk Kwon authored Oct 12, 2023
16 Sep, 2023 1 commit
- Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e
  Woosuk Kwon authored Sep 16, 2023
```
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
```
13 Sep, 2023 1 commit
- Add Model Revision Support (#1014) · ab019eea
  Jasmond L authored Sep 14, 2023
```
Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```
07 Sep, 2023 1 commit
- Enable safetensors loading for all models (#974) · c957c741
  Zhuohan Li authored Sep 07, 2023
30 Aug, 2023 1 commit
- Accelerate LLaMA model loading (#234) · 0d93f156
  JFDuan authored Aug 30, 2023
17 Aug, 2023 1 commit
- explicitly del state (#784) · 73b3de79
  Xinyu Yang authored Aug 18, 2023
08 Jul, 2023 1 commit
- Don't try to load training_args.bin (#373) · 75beba29
  Fazlul Shahriar authored Jul 08, 2023
03 Jul, 2023 1 commit
- [Quality] Add code formatter and linter (#326) · d6fa1be3
  Zhuohan Li authored Jul 03, 2023
30 Jun, 2023 1 commit
- [Fix] Weight loading for GPTBigCode (#313) · 598dc4b7
  Zhuohan Li authored Jun 29, 2023
17 Jun, 2023 1 commit
- Change the name to vLLM (#150) · 0b98ba15
  Woosuk Kwon authored Jun 17, 2023
15 May, 2023 1 commit
- Add docstrings to some modules and classes (#100) · b322fd16
  Woosuk Kwon authored May 14, 2023
09 May, 2023 1 commit
- Refactor system architecture (#82) · 7c041ab5
  Woosuk Kwon authored May 09, 2023
03 May, 2023 2 commits
- Support bfloat16 data type (#54) · e070829a
  Woosuk Kwon authored May 03, 2023
- New weight loader without np copy (#52) · 27f1410d
  Zhuohan Li authored May 03, 2023
29 Mar, 2023 1 commit
- FastAPI-based working frontend (#10) · 721fa3df
  Zhuohan Li authored Mar 29, 2023
21 Mar, 2023 1 commit
- Support tensor parallel (#2) · 2f49f155
  Zhuohan Li authored Mar 22, 2023
12 Mar, 2023 1 commit
- Add memory analyzer & automatically configure KV cache size (#6) · e9d3f2ff
  Woosuk Kwon authored Mar 11, 2023