Commits · a130cf331ef8b91197150e5a47a09e2b9487e61b · kecinstone / 2024pra-vllm

28 Feb, 2024 1 commit
- Add LoRA support for Gemma (#3050) · 929b4f29
  Woosuk Kwon authored Feb 28, 2024
  
  929b4f29
22 Feb, 2024 2 commits
- Optimize GeGLU layer in Gemma (#2975) · fd5dcc5c
  Woosuk Kwon authored Feb 21, 2024
  
  fd5dcc5c
- Use Llama RMSNorm custom op for Gemma (#2974) · 95529e32
  Woosuk Kwon authored Feb 21, 2024
  
  95529e32
21 Feb, 2024 1 commit
- Add Gemma model (#2964) · 5253edaa
  Xiang Xu authored Feb 21, 2024
  
  5253edaa
25 Jan, 2024 1 commit
- fix names and license for Qwen2 (#2589) · 2832e7b9
  Junyang Lin authored Jan 25, 2024
  
  2832e7b9
22 Jan, 2024 1 commit
- Add qwen2 (#2495) · 94b5edeb
  Junyang Lin authored Jan 23, 2024
  
  94b5edeb
03 Jan, 2024 1 commit
- Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) · fd4ea8ef
  Zhuohan Li authored Jan 04, 2024
  
  fd4ea8ef
17 Dec, 2023 1 commit

Optimize model execution with CUDA graph (#1926) · 37ca5581

Woosuk Kwon authored Dec 16, 2023


Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

37ca5581

15 Dec, 2023 1 commit
- Add GPTQ support (#916) · 0fbfc4b8
  CHU Tianxiang authored Dec 15, 2023
  
  0fbfc4b8
30 Nov, 2023 1 commit
- Refactor Worker & InputMetadata (#1843) · 27feead2
  Woosuk Kwon authored Nov 29, 2023
  
  27feead2
29 Nov, 2023 1 commit
- Refactor Attention (#1840) · a9e45742
  Woosuk Kwon authored Nov 29, 2023
  
  a9e45742
24 Nov, 2023 1 commit
- Fix model docstrings (#1764) · 7c600440
  Woosuk Kwon authored Nov 23, 2023
  
  7c600440
20 Nov, 2023 1 commit
- Migrate linter from `pylint` to `ruff` (#1665) · 5ffc0d13
  Simon Mo authored Nov 20, 2023
  
  5ffc0d13
19 Nov, 2023 1 commit
- [Optimization] Implement fused add rmsnorm (#1667) · e1054247
  ljss authored Nov 19, 2023
  
  e1054247
16 Nov, 2023 1 commit

TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c

Zhuohan Li authored Nov 15, 2023

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

Refactor the tensor parallelism, quantization, and weight-loading codes.

Summary of the new features enabled by this PR:
- **All models** are able to be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- Model loading code became much simpler.
- Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.

7076fa1c

22 Oct, 2023 1 commit

Support SqueezeLLM (#1326) · 1f24755b

chooper1 authored Oct 22, 2023


Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

1f24755b

13 Oct, 2023 1 commit
- Bump up transformers version & Remove MistralConfig (#1254) · e7c8555d
  Woosuk Kwon authored Oct 13, 2023
  
  e7c8555d
10 Oct, 2023 1 commit
- [Minor] Fix comment in mistral.py (#1303) · b95ee898
  Zhuohan Li authored Oct 09, 2023
  
  b95ee898
02 Oct, 2023 1 commit
- TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) · ba0bfd40
  Zhuohan Li authored Oct 02, 2023
  
  ba0bfd40
28 Sep, 2023 2 commits
- Fix Mistral model (#1220) · a8e98aee
  Woosuk Kwon authored Sep 28, 2023
  
  a8e98aee
- [Mistral] Mistral-7B-v0.1 support (#1196) · bb1ba58f
  Chris Bamford authored Sep 28, 2023
```
Co-authored-by: timlacroix <t@mistral.ai>
```
  bb1ba58f
27 Sep, 2023 1 commit

Support Longchat and RoPE scaling (#555) · 21877b0d

Lily Liu authored Sep 27, 2023


Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

21877b0d

20 Sep, 2023 1 commit
- rope_theta and max_position_embeddings from config (#1096) · 3302f0ae
  Antoni Baum authored Sep 20, 2023
```
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: wnma3mz <wnma3mz@gmail.com>
```
  3302f0ae
18 Sep, 2023 1 commit
- Convert before transpose (#1073) · cc796b13
  Woosuk Kwon authored Sep 18, 2023
  
  cc796b13
16 Sep, 2023 1 commit

Implement AWQ quantization support for LLaMA (#1032) · e3e79e9e

Woosuk Kwon authored Sep 16, 2023


Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>

e3e79e9e

13 Sep, 2023 1 commit

Add Model Revision Support (#1014) · ab019eea

Jasmond L authored Sep 14, 2023


Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

ab019eea

07 Sep, 2023 1 commit
- Enable safetensors loading for all models (#974) · c957c741
  Zhuohan Li authored Sep 07, 2023
  
  c957c741
05 Sep, 2023 1 commit
- Align vLLM's beam search implementation with HF generate (#857) · 002800f0
  Zhuohan Li authored Sep 04, 2023
  
  002800f0
30 Aug, 2023 1 commit
- Accelerate LLaMA model loading (#234) · 0d93f156
  JFDuan authored Aug 30, 2023
  
  0d93f156
25 Aug, 2023 1 commit
- Add support for CodeLlama (#854) · 4b6f069b
  Antoni Baum authored Aug 25, 2023
  
  4b6f069b
20 Jul, 2023 1 commit
- Add support for LLaMA-2 (#505) · 6fc2a38b
  Zhuohan Li authored Jul 20, 2023
  
  6fc2a38b
14 Jul, 2023 1 commit
- add vocab padding for LLama(Support WizardLM) (#411) · 7b6ae940
  panda authored Jul 14, 2023
  
  7b6ae940
03 Jul, 2023 1 commit
- [Quality] Add code formatter and linter (#326) · d6fa1be3
  Zhuohan Li authored Jul 03, 2023
  
  d6fa1be3
17 Jun, 2023 1 commit
- Change the name to vLLM (#150) · 0b98ba15
  Woosuk Kwon authored Jun 17, 2023
  
  0b98ba15
24 May, 2023 1 commit
- Add contributing guideline and mypy config (#122) · a283ec2e
  Woosuk Kwon authored May 23, 2023
  
  a283ec2e
19 May, 2023 1 commit
- Use runtime profiling to replace manual memory analyzers (#81) · f756799b
  Zhuohan Li authored May 19, 2023
  
  f756799b
15 May, 2023 3 commits
- Add docstrings to some modules and classes (#100) · b322fd16
  Woosuk Kwon authored May 14, 2023
  
  b322fd16
- Add copyright headers to source files adapted from FT (#104) · 667ba399
  Woosuk Kwon authored May 14, 2023
  
  667ba399
- Add copyright headers for HF models (#103) · 707ec647
  Woosuk Kwon authored May 14, 2023
  
  707ec647
09 May, 2023 1 commit
- Refactor system architecture (#82) · 7c041ab5
  Woosuk Kwon authored May 09, 2023
  
  7c041ab5