Commits · c06170cc8e324f4fe6a0c26b57d09e8c958e11bc · kecinstone / 2024pra-vllm

20 Nov, 2023 2 commits
- [BugFix] Fix a bug in loading safetensors (#1732) · f5a37c6c
  Woosuk Kwon authored Nov 20, 2023
  
  f5a37c6c
- Migrate linter from `pylint` to `ruff` (#1665) · 5ffc0d13
  Simon Mo authored Nov 20, 2023
  
  5ffc0d13
19 Nov, 2023 1 commit
- use get_tensor in safe_open (#1696) · e946260c
  twaka authored Nov 19, 2023
  
  e946260c
18 Nov, 2023 1 commit
- Use `quantization_config` in hf config (#1695) · bb00f66e
  Woosuk Kwon authored Nov 17, 2023
  
  bb00f66e
16 Nov, 2023 2 commits

Fix loading error when safetensors contains empty tensor (#1687) · 2a2c135b
twaka authored Nov 17, 2023

2a2c135b

TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c

Zhuohan Li authored Nov 15, 2023

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

Refactor the tensor parallelism, quantization, and weight-loading codes.

Summary of the new features enabled by this PR:
- **All models** are able to be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- Model loading code became much simpler.
- Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.

7076fa1c

12 Oct, 2023 1 commit
- Add blacklist in model checkpoint (#1325) · 875afe38
  Woosuk Kwon authored Oct 12, 2023
  
  875afe38
16 Sep, 2023 1 commit

Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e

Woosuk Kwon authored Sep 16, 2023


Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>

e3e79e9e

13 Sep, 2023 1 commit

Add Model Revision Support (#1014) · ab019eea

Jasmond L authored Sep 14, 2023


Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

ab019eea

07 Sep, 2023 1 commit
- Enable safetensors loading for all models (#974) · c957c741
  Zhuohan Li authored Sep 07, 2023
  
  c957c741
30 Aug, 2023 1 commit
- Accelerate LLaMA model loading (#234) · 0d93f156
  JFDuan authored Aug 30, 2023
  
  0d93f156
17 Aug, 2023 1 commit
- explicitly del state (#784) · 73b3de79
  Xinyu Yang authored Aug 18, 2023
  
  73b3de79
08 Jul, 2023 1 commit
- Don't try to load training_args.bin (#373) · 75beba29
  Fazlul Shahriar authored Jul 08, 2023
  
  75beba29
03 Jul, 2023 1 commit
- [Quality] Add code formatter and linter (#326) · d6fa1be3
  Zhuohan Li authored Jul 03, 2023
  
  d6fa1be3
30 Jun, 2023 1 commit
- [Fix] Weight loading for GPTBigCode (#313) · 598dc4b7
  Zhuohan Li authored Jun 29, 2023
  
  598dc4b7
17 Jun, 2023 1 commit
- Change the name to vLLM (#150) · 0b98ba15
  Woosuk Kwon authored Jun 17, 2023
  
  0b98ba15
15 May, 2023 1 commit
- Add docstrings to some modules and classes (#100) · b322fd16
  Woosuk Kwon authored May 14, 2023
  
  b322fd16
09 May, 2023 1 commit
- Refactor system architecture (#82) · 7c041ab5
  Woosuk Kwon authored May 09, 2023
  
  7c041ab5
03 May, 2023 2 commits
- Support bfloat16 data type (#54) · e070829a
  Woosuk Kwon authored May 03, 2023
  
  e070829a
- New weight loader without np copy (#52) · 27f1410d
  Zhuohan Li authored May 03, 2023
  
  27f1410d
29 Mar, 2023 1 commit
- FastAPI-based working frontend (#10) · 721fa3df
  Zhuohan Li authored Mar 29, 2023
  
  721fa3df
21 Mar, 2023 1 commit
- Support tensor parallel (#2) · 2f49f155
  Zhuohan Li authored Mar 22, 2023
  
  2f49f155
12 Mar, 2023 1 commit
- Add memory analyzer & utomatically configure KV cache size (#6) · e9d3f2ff
  Woosuk Kwon authored Mar 11, 2023
  
  e9d3f2ff