Commits · 500b93c8dc84182776f17c3c31053aaba9865e8b · OpenDAS / vllm_cscc

20 Jul, 2024 1 commit
- [Misc] `fbgemm` checkpoints (#6559) · 683e3cb9
  Robert Shaw authored Jul 20, 2024
  
30 Jun, 2024 1 commit
- [Misc] Fix `get_min_capability` (#5971) · 7836fdcc
  Dipika Sikka authored Jun 30, 2024
  
29 Apr, 2024 1 commit
- [mypy][5/N] Support all typing on model executor (#4427) · df29793d
  SangBin Cho authored Apr 29, 2024
  
26 Apr, 2024 1 commit
- [Misc][Refactor] Generalize linear_method to be quant_method (#4373) · a62aaf1d
  Cody Yu authored Apr 26, 2024
  
23 Apr, 2024 1 commit
- AQLM CUDA support (#3287) · 2b7949c1
  James Fleming authored Apr 23, 2024
```
Co-authored-by: mgoin <michael@neuralmagic.com>
```
11 Apr, 2024 2 commits
- [Core] Set `linear_weights` directly on the layer (#3977) · a10d3056
  Antoni Baum authored Apr 11, 2024
  
- [Misc] Add indirection layer for custom ops (#3913) · e9da5a40
  Kunshang Ji authored Apr 11, 2024
  
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
01 Feb, 2024 1 commit
- Remove hardcoded `device="cuda"` to support more devices (#2503) · 96b6f475
  Kunshang Ji authored Feb 02, 2024
```
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
```
15 Dec, 2023 1 commit
- Add GPTQ support (#916) · 0fbfc4b8
  CHU Tianxiang authored Dec 15, 2023
  
08 Dec, 2023 1 commit

- Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) · 6ccc0bff
  TJian authored Dec 08, 2023

  Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
  Co-authored-by: Amir Balwel <amoooori04@gmail.com>
  Co-authored-by: root <kuanfu.liu@akirakan.com>
  Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
  Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
  Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>

24 Nov, 2023 1 commit
- [Build] Avoid building too many extensions (#1624) · e0c6f556
  Yanming W authored Nov 23, 2023
  
19 Nov, 2023 1 commit
- Add AWQ support for all models (#1714) · 8d17774f
  Woosuk Kwon authored Nov 18, 2023
  
16 Nov, 2023 1 commit

- TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) · 7076fa1c
  Zhuohan Li authored Nov 15, 2023

  Refactor the tensor parallelism, quantization, and weight-loading code.

  Summary of the new features enabled by this PR:
  - **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
  - Model loading code is much simpler.
  - Model parallelism is supported for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.
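
The last point in the summary for #1622 (MQA/GQA models whose key/value head count is smaller than the tensor parallel size) can be illustrated with a minimal sketch. This is not code from the PR: the function names `kv_heads_per_rank` and `kv_heads_for_rank` are hypothetical, and the sketch assumes the common approach of replicating each key/value head across several ranks when there are fewer heads than ranks.

```python
from __future__ import annotations


def kv_heads_per_rank(total_kv_heads: int, tp_size: int) -> tuple[int, int]:
    """Return (kv heads held by each rank, number of ranks sharing each head).

    Hypothetical helper for illustration only; not taken from vLLM.
    """
    if total_kv_heads <= 0 or tp_size <= 0:
        raise ValueError("head count and tensor parallel size must be positive")
    if total_kv_heads >= tp_size:
        # Enough heads to go around: partition them evenly, no replication.
        if total_kv_heads % tp_size != 0:
            raise ValueError("kv heads must be divisible by tensor parallel size")
        return total_kv_heads // tp_size, 1
    # Fewer heads than ranks: each rank holds one head, and each head is
    # replicated on tp_size // total_kv_heads ranks.
    if tp_size % total_kv_heads != 0:
        raise ValueError("tensor parallel size must be divisible by kv heads")
    return 1, tp_size // total_kv_heads


def kv_heads_for_rank(rank: int, total_kv_heads: int, tp_size: int) -> list[int]:
    """Return the indices of the kv heads that a given rank would load."""
    per_rank, replicas = kv_heads_per_rank(total_kv_heads, tp_size)
    if replicas == 1:
        start = rank * per_rank
        return list(range(start, start + per_rank))
    # Consecutive groups of `replicas` ranks share the same head.
    return [rank // replicas]


if __name__ == "__main__":
    # MQA (one kv head) on four ranks: every rank holds a copy of head 0.
    for rank in range(4):
        print(rank, kv_heads_for_rank(rank, total_kv_heads=1, tp_size=4))
```

Under this assumption, replication trades a little extra memory per rank for the ability to use a tensor parallel size larger than the number of key/value heads.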