Commits · 7076fa1c9f5769469bc2671afaca5af604a9bed3 · kecinstone / 2024pra-vllm

16 Nov, 2023 1 commit
- TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) · 7076fa1c
  Zhuohan Li authored Nov 15, 2023
  
  Refactor the tensor parallelism, quantization, and weight-loading code.
  
  Summary of the new features enabled by this PR:
  - **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
  - The model loading code is much simpler.
  - Model parallelism is supported for all MQA/GQA models, even when the number of key/value heads is smaller than the tensor parallel size.
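The last summary item concerns how key/value (KV) heads are distributed across tensor-parallel ranks. Sometimes a model has fewer KV heads than ranks. An MQA model with a single KV head is one case. A GQA model with 8 KV heads sharded over 16 GPUs is another. In that situation the heads cannot be partitioned, so each head has to be replicated on several ranks.

Below is a minimal, hypothetical sketch of that mapping; it is not the code from this commit. The helper name `kv_heads_for_rank` is invented for illustration. It assumes that one of `total_num_kv_heads` and `tp_size` divides the other.

```python
from typing import List


def kv_heads_for_rank(total_num_kv_heads: int, tp_size: int, tp_rank: int) -> List[int]:
    """Return the global KV-head indices held by one tensor-parallel rank.

    Hypothetical helper (illustration only). Assumes either tp_size divides
    total_num_kv_heads (heads are partitioned across ranks) or
    total_num_kv_heads divides tp_size (each head is replicated).
    """
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank out of range")

    if total_num_kv_heads >= tp_size:
        # Enough heads to split: each rank owns a contiguous, disjoint slice.
        if total_num_kv_heads % tp_size != 0:
            raise ValueError("total_num_kv_heads must be divisible by tp_size")
        per_rank = total_num_kv_heads // tp_size
        start = tp_rank * per_rank
        return list(range(start, start + per_rank))

    # Fewer heads than ranks: each head is replicated on `replicas` adjacent ranks.
    if tp_size % total_num_kv_heads != 0:
        raise ValueError("tp_size must be divisible by total_num_kv_heads")
    replicas = tp_size // total_num_kv_heads
    return [tp_rank // replicas]
```

Consider 8 KV heads with `tp_size=16`. Ranks 0 and 1 both hold head 0, and rank 15 holds head 7. With `tp_size=4`, rank 1 holds heads 2 and 3. Replicating a whole head, rather than splitting it, keeps each rank's attention computation self-contained. The cost is duplicate KV projection weights and duplicate KV cache entries on the ranks that share a head.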

12 Nov, 2023 1 commit
- Fix #1474 - AssertionError:assert param_slice.shape == loaded_weight.shape (#1631) · eb825c1e
  lirui authored Nov 13, 2023
07 Nov, 2023 1 commit
- ChatGLM Support (#1261) · 1a2bbc93
  GoHomeToMacDonal authored Nov 07, 2023
06 Nov, 2023 1 commit
- Support Yi model (#1567) · e7f579eb
  Roy authored Nov 07, 2023
01 Nov, 2023 1 commit
- Remove `MPTConfig` (#1529) · 1fe09900
  Woosuk Kwon authored Nov 01, 2023
29 Oct, 2023 2 commits
- Fix bias in InternLM (#1501) · aa9af07c
  Woosuk Kwon authored Oct 30, 2023
- Add rope_scaling to Aquila model (#1457) · 28b47d1e
  Qing authored Oct 29, 2023
22 Oct, 2023 1 commit
- Support SqueezeLLM (#1326) · 1f24755b
  chooper1 authored Oct 22, 2023
  Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
13 Oct, 2023 2 commits
- Fix the issue for AquilaChat2-* models (#1339) · de894728
  Lu Wang authored Oct 13, 2023
- Bump up transformers version & Remove MistralConfig (#1254) · e7c8555d
  Woosuk Kwon authored Oct 13, 2023
10 Oct, 2023 1 commit
- [Minor] Fix comment in mistral.py (#1303) · b95ee898
  Zhuohan Li authored Oct 09, 2023
02 Oct, 2023 2 commits
- TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) · ba0bfd40
  Zhuohan Li authored Oct 02, 2023
- support sharding llama2-70b on more than 8 GPUs (#1209) · a60b3530
  Zhuohan Li authored Oct 02, 2023
```
Co-authored-by: JiCheng <247153481@qq.com>
```
28 Sep, 2023 3 commits
- Fix Mistral model (#1220) · a8e98aee
  Woosuk Kwon authored Sep 28, 2023
- [Mistral] Mistral-7B-v0.1 support (#1196) · bb1ba58f
  Chris Bamford authored Sep 28, 2023
```
Co-authored-by: timlacroix <t@mistral.ai>
```
- Add rope_scaling to Qwen (#1210) · 7bedab57
  Qing authored Sep 28, 2023
27 Sep, 2023 2 commits
- fix qwen-14b model (#1173) · 28e616c4
  Qing authored Sep 28, 2023
- Support Longchat and RoPE scaling (#555) · 21877b0d
  Lily Liu authored Sep 27, 2023
```
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
20 Sep, 2023 1 commit
- rope_theta and max_position_embeddings from config (#1096) · 3302f0ae
  Antoni Baum authored Sep 20, 2023
```
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: wnma3mz <wnma3mz@gmail.com>
```
18 Sep, 2023 1 commit
- Convert before transpose (#1073) · cc796b13
  Woosuk Kwon authored Sep 18, 2023
16 Sep, 2023 1 commit
- Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e
  Woosuk Kwon authored Sep 16, 2023
  Co-authored-by: Robert Irvine <robert@seamlessml.com>
  Co-authored-by: root <rirv938@gmail.com>
  Co-authored-by: Casper <casperbh.96@gmail.com>
  Co-authored-by: julian-q <julianhquevedo@gmail.com>
13 Sep, 2023 1 commit
- Add Model Revision Support (#1014) · ab019eea
  Jasmond L authored Sep 14, 2023
  Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
07 Sep, 2023 1 commit
- Enable safetensors loading for all models (#974) · c957c741
  Zhuohan Li authored Sep 07, 2023
06 Sep, 2023 1 commit
- [BugFix] Implement RoPE for GPT-J (#941) · 320a622e
  Woosuk Kwon authored Sep 06, 2023
05 Sep, 2023 1 commit
- Align vLLM's beam search implementation with HF generate (#857) · 002800f0
  Zhuohan Li authored Sep 04, 2023
30 Aug, 2023 1 commit
- Accelerate LLaMA model loading (#234) · 0d93f156
  JFDuan authored Aug 30, 2023
25 Aug, 2023 1 commit
- Add support for CodeLlama (#854) · 4b6f069b
  Antoni Baum authored Aug 25, 2023
22 Aug, 2023 3 commits
- fix: revert code to avoid no attribute problem (#827) · eedac9db
  Wen Sun authored Aug 23, 2023
- Add support for aquila (#663) · ad5f2fe3
  shunxing1234 authored Aug 22, 2023
  * add aquila
    Signed-off-by: ftgreat <ftgreat@163.com>
  * fix some bug
    Signed-off-by: shunxing1234 <xw747777271@gmail.com>
  * delete pdb
    Signed-off-by: shunxing1234 <xw747777271@gmail.com>
  * fix bugs
    Signed-off-by: shunxing1234 <xw747777271@gmail.com>
  * fix bugs
    Signed-off-by: shunxing1234 <xw747777271@gmail.com>
  * delete whitespace
    Signed-off-by: shunxing1234 <xw747777271@gmail.com>
  * format
  * fix order
  
  Signed-off-by: ftgreat <ftgreat@163.com>
  Signed-off-by: shunxing1234 <xw747777271@gmail.com>
  Co-authored-by: ftgreat <ftgreat@163.com>
- Fix mqa is false case in gpt_bigcode (#806) · 4f858475
  zhaoyang-star authored Aug 22, 2023
11 Aug, 2023 1 commit
- [Fix] unwanted bias in InternLM Model (#740) · 462ae522
  WRH authored Aug 12, 2023
08 Aug, 2023 2 commits
- add internlm model (#528) · 735ecfff
  Jia Guoqing authored Aug 09, 2023
- add QWen-7b (#685) · a57d13cc
  Qing authored Aug 09, 2023
```
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
```
02 Aug, 2023 3 commits
- [Doc] Add Baichuan 13B to supported models (#656) · f7389f47
  Zhuohan Li authored Aug 02, 2023
- Add Falcon support (new) (#592) · 1b0bd0fe
  Zhuohan Li authored Aug 02, 2023
- fix baichuan for different position embedding for 7b and 13b models (#643) · 64f23c29
  Song authored Aug 02, 2023
01 Aug, 2023 1 commit
- fix baichuan-7b tp (#598) · d4c7755c
  Qing authored Aug 02, 2023
```
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
```
25 Jul, 2023 1 commit
- [Fix] Fix GPTBigcoder for distributed execution (#503) · 7d5a155e
  Zhuohan Li authored Jul 24, 2023
24 Jul, 2023 1 commit
- GPTJConfig has no attribute rotary. (#532) · 1dde34e0
  leegohi04517 authored Jul 25, 2023
20 Jul, 2023 1 commit
- Add support for LLaMA-2 (#505) · 6fc2a38b
  Zhuohan Li authored Jul 20, 2023