Commits · 7076fa1c9f5769469bc2671afaca5af604a9bed3 · kecinstone / 2024pra-vllm

16 Nov, 2023 1 commit

TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c

Zhuohan Li authored Nov 15, 2023

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

Refactor the tensor parallelism, quantization, and weight-loading code.

Summary of the new features enabled by this PR:
- **All models** can now be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- The model-loading code is much simpler.
- Model parallelism is supported for all MQA/GQA models, including when the number of key/value heads is smaller than the tensor parallel size.
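
The last point can be illustrated with a minimal sketch. It assumes that when there are fewer key/value heads than tensor-parallel ranks, each head is replicated across a contiguous group of ranks, and that the two counts must divide evenly. The function names `kv_heads_per_rank` and `kv_heads_for_rank` are hypothetical and are not taken from the vLLM code base.

```python
from typing import List, Tuple


def kv_heads_per_rank(total_num_kv_heads: int, tp_size: int) -> Tuple[int, int]:
    """Return (KV heads held by each rank, number of ranks sharing each head).

    Hypothetical helper: assumes even divisibility in both regimes.
    """
    if total_num_kv_heads >= tp_size:
        # Enough heads to go around: partition them evenly across ranks.
        if total_num_kv_heads % tp_size != 0:
            raise ValueError("number of KV heads must be divisible by tp_size")
        return total_num_kv_heads // tp_size, 1
    # Fewer heads than ranks (typical for MQA/GQA): replicate each head.
    if tp_size % total_num_kv_heads != 0:
        raise ValueError("tp_size must be divisible by number of KV heads")
    return 1, tp_size // total_num_kv_heads


def kv_heads_for_rank(rank: int, total_num_kv_heads: int, tp_size: int) -> List[int]:
    """Return the global KV head indices assigned to one tensor-parallel rank."""
    heads_per_rank, replicas = kv_heads_per_rank(total_num_kv_heads, tp_size)
    if replicas > 1:
        # Assumption: consecutive ranks share the same replicated head.
        first = rank // replicas
    else:
        first = rank * heads_per_rank
    return list(range(first, first + heads_per_rank))


if __name__ == "__main__":
    # A GQA model with 2 KV heads on 8 GPUs: every head is copied to 4 ranks.
    for rank in range(8):
        print(rank, kv_heads_for_rank(rank, total_num_kv_heads=2, tp_size=8))
```

Under this scheme each rank always holds at least one key/value head, so attention on a rank can run without fetching heads from other ranks, at the cost of storing duplicate key/value weights.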


16 Oct, 2023 1 commit
- Implement prompt logprobs & Batched topk for computing logprobs (#1328) · 9d9072a0
  Zhuohan Li authored Oct 16, 2023

  Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>

02 Oct, 2023 1 commit
- TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) · ba0bfd40
  Zhuohan Li authored Oct 02, 2023
  
18 Sep, 2023 1 commit
- [FIX] Don't initialize parameter by default (#1067) · 90979c38
  Zhuohan Li authored Sep 17, 2023
  
16 Sep, 2023 1 commit

Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e

Woosuk Kwon authored Sep 16, 2023


Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>


02 Aug, 2023 1 commit
- Add Falcon support (new) (#592) · 1b0bd0fe
  Zhuohan Li authored Aug 02, 2023
  
25 Jul, 2023 1 commit
- fixed tensor parallel is not defined (#564) · 2d867b55
  MoeedDar authored Jul 25, 2023
  
17 Jun, 2023 1 commit
- Change the name to vLLM (#150) · 0b98ba15
  Woosuk Kwon authored Jun 17, 2023
  