Commits · 7076fa1c9f5769469bc2671afaca5af604a9bed3 · kecinstone / 2024pra-vllm

16 Nov, 2023 1 commit

TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c

Zhuohan Li authored Nov 15, 2023

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

Refactor the tensor parallelism, quantization, and weight-loading code.

Summary of the new features enabled by this PR:
- **All models** can now be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- The model loading code is now much simpler.
- Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.
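
The last point can be illustrated with a small sketch. It is not taken from the PR itself: the function names `kv_head_layout` and `kv_heads_for_rank` are hypothetical, and the sketch assumes the usual approach of replicating each key/value head across several tensor-parallel ranks when there are fewer KV heads than ranks.

```python
from typing import List, Tuple


def kv_head_layout(total_num_kv_heads: int, tp_size: int) -> Tuple[int, int]:
    """Return (KV heads held per rank, number of ranks sharing each KV head).

    Hypothetical helper for illustration only.
    """
    if total_num_kv_heads >= tp_size:
        # Enough KV heads to split them evenly: no replication needed.
        if total_num_kv_heads % tp_size != 0:
            raise ValueError("KV heads must be divisible by the TP size")
        return total_num_kv_heads // tp_size, 1
    # Fewer KV heads than ranks (MQA/GQA): each rank keeps one head,
    # and every head is replicated on tp_size // total_num_kv_heads ranks.
    if tp_size % total_num_kv_heads != 0:
        raise ValueError("TP size must be divisible by the number of KV heads")
    return 1, tp_size // total_num_kv_heads


def kv_heads_for_rank(tp_rank: int, total_num_kv_heads: int,
                      tp_size: int) -> List[int]:
    """Return the global KV head indices that a given rank would load."""
    per_rank, replicas = kv_head_layout(total_num_kv_heads, tp_size)
    # Ranks that share a head map to the same shard index.
    first = (tp_rank // replicas) * per_rank
    return list(range(first, first + per_rank))


if __name__ == "__main__":
    # GQA model with 2 KV heads on 8 GPUs: ranks 0-3 share head 0, ranks 4-7 share head 1.
    for rank in range(8):
        print(rank, kv_heads_for_rank(rank, total_num_kv_heads=2, tp_size=8))
```

Under these assumptions, an MQA model with a single KV head places a copy of that head on every rank, while the query heads are still split across ranks as usual.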


09 Nov, 2023 1 commit
- Add Yi model to quantization support (#1600) · ab9e8488
  forpanyang authored Nov 10, 2023
  
07 Nov, 2023 1 commit
- ChatGLM Support (#1261) · 1a2bbc93
  GoHomeToMacDonal authored Nov 07, 2023
  
06 Nov, 2023 1 commit
- Support Yi model (#1567) · e7f579eb
  Roy authored Nov 07, 2023
  
01 Nov, 2023 1 commit
- Remove `MPTConfig` (#1529) · 1fe09900
  Woosuk Kwon authored Nov 01, 2023
  
31 Oct, 2023 1 commit
- Add `MptForCausalLM` key in model_loader (#1526) · cf8849f2
  Wenfei Yan authored Oct 31, 2023
  
13 Oct, 2023 1 commit
- Fix the issue for AquilaChat2-* models (#1339) · de894728
  Lu Wang authored Oct 13, 2023
  
11 Oct, 2023 1 commit
- Add Mistral to quantization model list (#1278) · ee8217e5
  amaleshvemula authored Oct 11, 2023
  
28 Sep, 2023 1 commit
- [Mistral] Mistral-7B-v0.1 support (#1196) · bb1ba58f
  Chris Bamford authored Sep 28, 2023
```
Co-authored-by: timlacroix <t@mistral.ai>
```
18 Sep, 2023 1 commit
- Add minimum capability requirement for AWQ (#1064) · 2b1c116b
  Woosuk Kwon authored Sep 18, 2023
  
16 Sep, 2023 1 commit

Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e

Woosuk Kwon authored Sep 16, 2023


Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>


13 Sep, 2023 1 commit

Add Model Revision Support (#1014) · ab019eea

Jasmond L authored Sep 14, 2023


Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>


07 Sep, 2023 2 commits
- Enable safetensors loading for all models (#974) · c957c741
  Zhuohan Li authored Sep 07, 2023
  
- Set torch default dtype in a context manager (#971) · 005ba458
  Antoni Baum authored Sep 06, 2023
```
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
```
22 Aug, 2023 1 commit

Add support for aquila (#663) · ad5f2fe3

shunxing1234 authored Aug 22, 2023



* add aquila
Signed-off-by: ftgreat <ftgreat@163.com>

* fix some bug
Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* delete pdb
Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* fix bugs
Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* fix bugs
Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* delete whitespace
Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* format

* fix order

---------
Signed-off-by: ftgreat <ftgreat@163.com>
Signed-off-by: shunxing1234 <xw747777271@gmail.com>
Co-authored-by: ftgreat <ftgreat@163.com>


08 Aug, 2023 2 commits
- add internlm model (#528) · 735ecfff
  Jia Guoqing authored Aug 09, 2023
  
- add QWen-7b (#685) · a57d13cc
  Qing authored Aug 09, 2023
```
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
```
02 Aug, 2023 2 commits
- Add Falcon support (new) (#592) · 1b0bd0fe
  Zhuohan Li authored Aug 02, 2023
  
- fix baichuan for different position embedding for 7b and 13b models (#643) · 64f23c29
  Song authored Aug 02, 2023
  
17 Jul, 2023 1 commit
- Add support for baichuan (#365) · 20b0d88d
  codethazine authored Jul 17, 2023
  
09 Jul, 2023 1 commit
- [Model] Add support for GPT-J (#226) · c8948361
  Andre Slavescu authored Jul 08, 2023
```
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
03 Jul, 2023 3 commits
- [Model] Add support for MPT (#334) · 404422f4
  Woosuk Kwon authored Jul 03, 2023
  
- Add support for BLOOM (#331) · e41f0670
  Woosuk Kwon authored Jul 03, 2023
  
- [Quality] Add code formatter and linter (#326) · d6fa1be3
  Zhuohan Li authored Jul 03, 2023
  
26 Jun, 2023 1 commit
- Compatible with Decapoda Research llama hf version (#251) · 471a7a45
  BasicCoder authored Jun 27, 2023
  
22 Jun, 2023 1 commit
- GPTBigCode (StarCoder, SantaCoder Support) (#209) · 298695b7
  Michael Feil authored Jun 22, 2023
  
17 Jun, 2023 1 commit
- Change the name to vLLM (#150) · 0b98ba15
  Woosuk Kwon authored Jun 17, 2023
  
24 May, 2023 1 commit
- Add contributing guideline and mypy config (#122) · a283ec2e
  Woosuk Kwon authored May 23, 2023
  
20 May, 2023 1 commit
- Refactor system architecture (#109) · c3442c1f
  Woosuk Kwon authored May 20, 2023
  
19 May, 2023 1 commit
- Use runtime profiling to replace manual memory analyzers (#81) · f756799b
  Zhuohan Li authored May 19, 2023
  
15 May, 2023 1 commit
- Add docstrings to some modules and classes (#100) · b322fd16
  Woosuk Kwon authored May 14, 2023
  
09 May, 2023 2 commits
- Enhance model loader (#83) · add055e1
  Woosuk Kwon authored May 09, 2023
  
- Refactor system architecture (#82) · 7c041ab5
  Woosuk Kwon authored May 09, 2023
  
06 May, 2023 1 commit
- [Minor] Fix a dtype bug (#79) · c84e9242
  Woosuk Kwon authored May 06, 2023
  
04 May, 2023 2 commits
- Use dtype from model config & Add Dolly V2 (#63) · 189ae231
  Woosuk Kwon authored May 04, 2023
  
- Add support for GPT-2 (#60) · e548c148
  Woosuk Kwon authored May 04, 2023
  
03 May, 2023 1 commit
- New weight loader without np copy (#52) · 27f1410d
  Zhuohan Li authored May 03, 2023
  
28 Apr, 2023 1 commit
- Add support for GPT-NeoX (Pythia) (#50) · a96d63c2
  Woosuk Kwon authored Apr 28, 2023
  
09 Apr, 2023 1 commit
- Add an option to use dummy model weights (#33) · ee88a7e5
  Woosuk Kwon authored Apr 08, 2023
  
30 Mar, 2023 1 commit
- Implement LLaMA (#9) · 80a2f812
  Woosuk Kwon authored Mar 29, 2023
```
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```