Commits · 2f4117c38e101ee63b65521c93b22efe3526f77e · OpenDAS / vllm_cscc

09 Oct, 2024 1 commit
- support bitsandbytes quantization with more models (#9148) · 2f4117c3
  chenqianfzh authored Oct 08, 2024
  
07 Oct, 2024 1 commit
- [Core] Refactor GGUF parameters packing and forwarding (#8859) · f19da648
  Isotr0py authored Oct 07, 2024
  
17 Sep, 2024 1 commit
- [Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434) · 9855b995
  chenqianfzh authored Sep 17, 2024
  
11 Sep, 2024 1 commit
- [Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112) · efcf946a
  Pavani Majety authored Sep 10, 2024
  
04 Sep, 2024 1 commit
- [Misc] Update fbgemmfp8 to use `vLLMParameters` (#7972) · e16fa99a
  Dipika Sikka authored Sep 03, 2024
```
Co-authored-by: Michael Goin <michael@neuralmagic.com>
```
03 Sep, 2024 1 commit
- [Misc] Update `GPTQ` to use `vLLMParameters` (#7976) · 2188a60c
  Dipika Sikka authored Sep 03, 2024
  
29 Aug, 2024 2 commits
- support bitsandbytes 8-bit and FP4 quantized models (#7445) · 4664ceaa
  chenqianfzh authored Aug 29, 2024
  
- [misc] update tpu int8 to use new vLLM Parameters (#7973) · 86a677de
  Dipika Sikka authored Aug 29, 2024
  
27 Aug, 2024 1 commit
- [Misc] Update compressed tensors lifecycle to remove `prefix` from `create_weights` (#7825) · 015e6cc2
  Dipika Sikka authored Aug 26, 2024
  
26 Aug, 2024 2 commits
- [Misc] Update `gptq_marlin_24` to use vLLMParameters (#7762) · dd9857f5
  Dipika Sikka authored Aug 26, 2024
```
Co-authored-by: Michael Goin <michael@neuralmagic.com>
```
- [Misc] Update `qqq` to use vLLMParameters (#7805) · 66530409
  Dipika Sikka authored Aug 26, 2024
  
23 Aug, 2024 1 commit
- [Misc] Update `marlin` to use vLLMParameters (#7803) · f1df5dbf
  Dipika Sikka authored Aug 23, 2024
  
22 Aug, 2024 1 commit
- [Misc] update fp8 to use `vLLMParameter` (#7437) · 955b5191
  Dipika Sikka authored Aug 22, 2024
  
21 Aug, 2024 1 commit
- [Model] Add AWQ quantization support for InternVL2 model (#7187) · 12e1c65b
  Isotr0py authored Aug 21, 2024
  
19 Aug, 2024 1 commit
- [Core] Support tensor parallelism for GGUF quantization (#7520) · 7601cb04
  Isotr0py authored Aug 20, 2024
  
13 Aug, 2024 2 commits
- [Misc] Update `awq` and `awq_marlin` to use `vLLMParameters` (#7422) · b1e5afc3
  Dipika Sikka authored Aug 13, 2024
  
- [Misc] Update `gptq_marlin` to use new vLLMParameters (#7281) · fb377d7e
  Dipika Sikka authored Aug 13, 2024
  
09 Aug, 2024 1 commit
- [Bugfix] Fix `PerTensorScaleParameter` weight loading for fused models (#7376) · 5c6c54d6
  Dipika Sikka authored Aug 09, 2024
  
07 Aug, 2024 1 commit
- [Misc] Refactor linear layer weight loading; introduce `BasevLLMParameter` and `weight_loader_v2` (#5874) · 0f7052bc
  Dipika Sikka authored Aug 07, 2024
05 Aug, 2024 1 commit
- [Core] Support loading GGUF model (#5191) · 360bd67c
  Isotr0py authored Aug 06, 2024
```
Co-authored-by: Michael Goin <michael@neuralmagic.com>
```
26 Jul, 2024 1 commit
- Fix ReplicatedLinear weight loading (#6793) · 062a1d0f
  QQSong authored Jul 25, 2024
  
20 Jul, 2024 1 commit
- [ Misc ] `fbgemm` checkpoints (#6559) · 683e3cb9
  Robert Shaw authored Jul 20, 2024
  
19 Jul, 2024 2 commits
- [Model] RowParallelLinear: pass bias to quant_method.apply (#6327) · a5314e86
  Thomas Parnell authored Jul 19, 2024
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
```
- [ Misc ] non-uniform quantization via `compressed-tensors` for `Llama` (#6515) · dbe55885
  Robert Shaw authored Jul 18, 2024
  
16 Jul, 2024 1 commit
- [Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and `v_scale` (#6081) · 978aed53
  Michael Goin authored Jul 16, 2024
  
12 Jul, 2024 1 commit
- [ Misc ] Remove separate bias add (#6353) · 6047187c
  Robert Shaw authored Jul 12, 2024
  
11 Jul, 2024 1 commit
- [Doc] Remove comments incorrectly copied from another project (#6286) · 99ded1e1
  daquexian authored Jul 11, 2024
  
09 Jul, 2024 1 commit
- [Bugfix]fix and needs_scalar_to_array logic check (#6238) · d3a24513
  Baoyuan Qi authored Jul 10, 2024
```
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
```
30 Jun, 2024 1 commit
- [ Misc ] Refactor w8a8 to use `process_weights_after_load` (Simplify Weight Loading) (#5940) · af9ad46f
  Robert Shaw authored Jun 30, 2024
```
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
```
28 Jun, 2024 2 commits
- [ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921) · 2cd402e1
  Robert Shaw authored Jun 28, 2024
```
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
```
- [ Misc ] Remove `fp8_shard_indexer` from Col/Row Parallel Linear (Simplify Weight Loading) (#5928) · b1852307
  Robert Shaw authored Jun 28, 2024
```
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
```
18 Jun, 2024 1 commit
- [Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (#5542) · 95db455e
  Dipika Sikka authored Jun 18, 2024
15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
01 Jun, 2024 1 commit
- [Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776) · b9c0605a
  chenqianfzh authored Jun 01, 2024
  
23 May, 2024 1 commit

- [Kernel] Initial Activation Quantization Support (#4525) · a1242324
  Dipika Sikka authored May 23, 2024
  Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

01 May, 2024 1 commit
- [Misc]Add customized information for models (#4132) · d6f4bd7c
  Jee Li authored May 01, 2024
  
30 Apr, 2024 1 commit

- [Kernel] Support Fp8 Checkpoints (Dynamic + Static) (#4332) · 111815d4
  Robert Shaw authored Apr 30, 2024
  Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  Co-authored-by: mgoin <michael@neuralmagic.com>
  Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
  Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

29 Apr, 2024 1 commit
- [mypy][5/N] Support all typing on model executor (#4427) · df29793d
  SangBin Cho authored Apr 29, 2024
  
26 Apr, 2024 1 commit
- [Misc][Refactor] Generalize linear_method to be quant_method (#4373) · a62aaf1d
  Cody Yu authored Apr 26, 2024
  
24 Apr, 2024 1 commit
- [BUG] fixed fp8 conflict with aqlm (#4307) · 79a268c4
  Robert Shaw authored Apr 23, 2024
```
Fixes fp8 interface which broke in AQLM merge.
```