Commits · e0c6f556e85053059c74ab6b5cee396baf3b4316 · norm / vllm

24 Nov, 2023 1 commit
- [Build] Avoid building too many extensions (#1624) · e0c6f556
  Yanming W authored Nov 23, 2023
  
  e0c6f556
22 Nov, 2023 4 commits
- Fix repetition penalty aligned with huggingface (#1577) · de23687d
  ljss authored Nov 23, 2023
  
  de23687d
- Set top_p=0 and top_k=-1 in greedy sampling (#1748) · 4cea74c7
  ljss authored Nov 23, 2023
  
  4cea74c7
- [DOCS] Add engine args documentation (#1741) · a921d8be
  Casper authored Nov 22, 2023
  
  a921d8be
- Add stop_token_ids in SamplingParams.__repr__ (#1745) · 094f716b
  陈序 authored Nov 22, 2023
  
  094f716b
21 Nov, 2023 6 commits
- [FIX] Fix the case when `input_is_parallel=False` for `ScaledActivation` (#1737) · 7d761fe3
  Zhuohan Li authored Nov 20, 2023
  
  7d761fe3
- [BugFix] Fix TP support for AWQ (#1731) · cf35d8f3
  Woosuk Kwon authored Nov 20, 2023
  
  cf35d8f3
- fix RAM OOM when load large models in tensor parallel mode. (#1395) · 4bb6b671
  boydfd authored Nov 21, 2023
```
Co-authored-by: ran_lin <rlin@thoughtworks.com>
```
  4bb6b671
- Rewrite torch.repeat_interleave to remove cpu synchronization (#1599) · 819b18e7
  ljss authored Nov 21, 2023
  
  819b18e7
- [Fix] Fix bugs in scheduler (#1727) · 19849db5
  Zhuofan authored Nov 21, 2023
  
  19849db5
- Fix hanging in the scheduler caused by long prompts (#1534) · 3d4ceb29
  陈序 authored Nov 21, 2023
  
  3d4ceb29
20 Nov, 2023 5 commits
- [BugFix] Fix a bug in loading safetensors (#1732) · f5a37c6c
  Woosuk Kwon authored Nov 20, 2023
  
  f5a37c6c
- [FIX] Update the doc link in README.md (#1730) · 32c927b5
  Zhuohan Li authored Nov 20, 2023
  
  32c927b5
- Migrate linter from `pylint` to `ruff` (#1665) · 5ffc0d13
  Simon Mo authored Nov 20, 2023
  
  5ffc0d13
- [Docs] Fix the code block's format in deploying_with_docker page (#1722) · 112627e8
  Wen Sun authored Nov 20, 2023
  
  112627e8
- Documentation about official docker image (#1709) · 37c1e3c2
  Simon Mo authored Nov 19, 2023
  
  37c1e3c2
19 Nov, 2023 6 commits
- Add instructions to install vLLM+cu118 (#1717) · 06e9ebeb
  Woosuk Kwon authored Nov 18, 2023
  
  06e9ebeb
- Bump up to v0.2.2 (#1689) · c5f7740d
  Woosuk Kwon authored Nov 18, 2023
  
  c5f7740d
- Fix warning msg on quantization (#1715) · be66d9b1
  Woosuk Kwon authored Nov 18, 2023
  
  be66d9b1
- [Optimization] Implement fused add rmsnorm (#1667) · e1054247
  ljss authored Nov 19, 2023
  
  e1054247
- Add AWQ support for all models (#1714) · 8d17774f
  Woosuk Kwon authored Nov 18, 2023
  
  8d17774f
- use get_tensor in safe_open (#1696) · e946260c
  twaka authored Nov 19, 2023
  
  e946260c
18 Nov, 2023 3 commits
- Support download models from www.modelscope.cn (#1588) · edb30558
  liuyhwangyh authored Nov 18, 2023
  
  edb30558
- Use `quantization_config` in hf config (#1695) · bb00f66e
  Woosuk Kwon authored Nov 17, 2023
  
  bb00f66e
- Support Min P Sampler (#1642) · e87557b0
  Roy authored Nov 18, 2023
  
  e87557b0
17 Nov, 2023 3 commits
- [Minor] Fix comment (#1704) · dcc543a2
  Zhuofan authored Nov 18, 2023
  
  dcc543a2
- Update the adding-model doc according to the new refactor (#1692) · 0fc280b0
  Zhuohan Li authored Nov 16, 2023
  
  0fc280b0
- [Fix] Fix comm test (#1691) · 20d0699d
  Zhuohan Li authored Nov 16, 2023
  
  20d0699d
16 Nov, 2023 8 commits
- Return usage for openai streaming requests (#1663) · 686f5e32
  Iskren Ivov Chernev authored Nov 17, 2023
  
  686f5e32
- [Fix] Update Supported Models List (#1690) · 415d1095
  Zhuohan Li authored Nov 16, 2023
  
  415d1095
- Support Microsoft Phi 1.5 (#1664) · 521b35f7
  maximzubkov authored Nov 16, 2023
  
  521b35f7
- [Minor] Fix duplication of ignored seq group in engine step (#1666) · cb08cd0d
  Simon Mo authored Nov 16, 2023
  
  cb08cd0d
- Fix loading error when safetensors contains empty tensor (#1687) · 2a2c135b
  twaka authored Nov 17, 2023
  
  2a2c135b
- feat(config): support parsing torch.dtype (#1641) · 65ea2ddf
  Aaron Pham authored Nov 16, 2023
```
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
```
  65ea2ddf
- Revert `MptConfig` to `MPTConfig` (#1668) · b514d3c4
  Megha Agarwal authored Nov 16, 2023
  
  b514d3c4
- TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c
  Zhuohan Li authored Nov 15, 2023
```
TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

Refactor the tensor parallelism, quantization, and weight-loading codes.

Summary of the new features enabled by this PR:
- **All models** are able to be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- Model loading code became much simpler.
- Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.
```
  7076fa1c
14 Nov, 2023 1 commit
- Add DeepSpeed MII backend to benchmark script (#1649) · 660a7fcf
  Woosuk Kwon authored Nov 14, 2023
  
  660a7fcf
13 Nov, 2023 1 commit
- [Minor] Move RoPE selection logic to `get_rope` (#1633) · 054072be
  Woosuk Kwon authored Nov 12, 2023
  
  054072be
12 Nov, 2023 1 commit
- Fix #1474 - AssertionError:assert param_slice.shape == loaded_weight.shape (#1631) · eb825c1e
  lirui authored Nov 13, 2023
  
  eb825c1e
11 Nov, 2023 1 commit
- Run default _AsyncLLMEngine._run_workers_async in threadpool (#1628) · 1b290ace
  Dominik Schwabe authored Nov 11, 2023
  
  1b290ace