Commits · 235366fe2eb3144321978e181af94487f0215595 · OpenDAS / vllm_cscc

05 Nov, 2024 1 commit
- [CI] Prune back the number of tests in tests/kernels/* (#9932) · 235366fe
  Michael Goin authored Nov 05, 2024
```
Signed-off-by: mgoin <michael@neuralmagic.com>
```
  235366fe
29 Oct, 2024 1 commit
- [Hardware] using current_platform.seed_everything (#9785) · 622b7ab9
  wangshuai09 authored Oct 29, 2024
```
Signed-off-by: wangshuai09 <391746016@qq.com>
```
  622b7ab9
28 Oct, 2024 2 commits
- [torch.compile] support moe models (#9632) · 32176fee
  youkaichao authored Oct 27, 2024
```
Signed-off-by: youkaichao <youkaichao@gmail.com>
```
  32176fee
- [Hardware][ROCM] using current_platform.is_rocm (#9642) · 4e2d95e3
  wangshuai09 authored Oct 28, 2024
```
Signed-off-by: wangshuai09 <391746016@qq.com>
```
  4e2d95e3
24 Oct, 2024 1 commit
- [Performance][Kernel] Fused_moe Performance Improvement (#9384) · 59449095
  Charlie Fu authored Oct 24, 2024
```
Signed-off-by: charlifu <charlifu@amd.com>
```
  59449095
17 Oct, 2024 1 commit
- [Bugfix] Fix support for dimension like integers and ScalarType (#9299) · eca2c5f7
  bnellnm authored Oct 17, 2024
  
  eca2c5f7
04 Oct, 2024 1 commit
- [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973) · 05d68643
  ElizaWszola authored Oct 04, 2024
```
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
```
  05d68643
29 Sep, 2024 1 commit
- [Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741) · d081da00
  ElizaWszola authored Sep 29, 2024
```
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
```
  d081da00
25 Sep, 2024 1 commit
- [Kernel] Fullgraph and opcheck tests (#8479) · 300da091
  bnellnm authored Sep 25, 2024
  
  300da091
18 Sep, 2024 1 commit
- [CI/Build] Avoid CUDA initialization (#8534) · 6ffa3f31
  Cyrus Leung authored Sep 18, 2024
  
  6ffa3f31
16 Sep, 2024 1 commit
- [Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032) · a091e2da
  ElizaWszola authored Sep 16, 2024
```
Co-authored-by: Dipika <dipikasikka1@gmail.com>
```
  a091e2da
10 Sep, 2024 1 commit
- [Misc] Fused MoE Marlin support for GPTQ (#8217) · 6cd5e5b0
  Dipika Sikka authored Sep 09, 2024
  
  6cd5e5b0
16 Aug, 2024 1 commit
- [Misc/Testing] Use `torch.testing.assert_close` (#7324) · 50b8d08d
  jon-chuang authored Aug 15, 2024
  
  50b8d08d
02 Jul, 2024 1 commit

- [ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970) · 7c008c51
  Robert Shaw authored Jul 02, 2024
  Co-authored-by: Robert Shaw <rshaw@neuralmagic>
  Co-authored-by: Michael Goin <michael@neuralmagic.com>
  7c008c51

01 Jul, 2024 1 commit
- [Bugfix] adding chunking mechanism to fused_moe to handle large inputs (#6029) · 12a59959
  Avshalom Manevich authored Jul 02, 2024
  
  12a59959
04 May, 2024 1 commit

- [Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527) · 2a052011
  Michael Goin authored May 04, 2024

  Follow-on to #4332 to enable FP8 checkpoint loading for Mixtral; supersedes #4436.

  This PR enables the following checkpoint loading features for Mixtral:

  - Supports loading FP8 checkpoints for Mixtral, such as the "nm-testing/Mixtral-8x7B-Instruct-v0.1-FP8" test model
  - Supports static or dynamic activation quantization with static weight quantization (all per tensor)
  - Supports different scales for each expert weight
  - Supports FP8 in the QKV layer

  Notes:

  - The Expert Gate/Router always runs at half / full precision for now.
  - If the separate Q, K, and V weights have different weight scales, they are re-quantized using layer.weight_scale.max() so that a single GEMM can be used for performance.

  2a052011
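
The re-quantization note in #4527 can be illustrated with a small sketch. This is not the vLLM implementation: the function name `requantize_to_shared_scale` and the use of plain Python lists in place of FP8 tensors are assumptions made for illustration, and the final cast back to FP8 (which rounds) is deliberately left out. It only shows why picking the maximum scale is safe: every shard is dequantized with its own scale and then divided by a scale that is at least as large, so the re-quantized magnitudes can only shrink and stay within the FP8 range.

```python
# Largest finite magnitude representable in the float8 e4m3fn format.
FP8_E4M3_MAX = 448.0


def requantize_to_shared_scale(shards, scales):
    """Re-express per-shard quantized weights against one shared scale.

    shards: list of lists of quantized values (stand-ins for FP8 tensors).
    scales: one per-tensor scale per shard, so real_value = q * scale.
    Returns the fused list of re-quantized values and the shared scale.
    """
    max_scale = max(scales)  # hypothetical analogue of weight_scale.max()
    fused = []
    for q_values, scale in zip(shards, scales):
        for q in q_values:
            real = q * scale                # dequantize with the shard's own scale
            fused.append(real / max_scale)  # requantize with the shared scale
            # A real kernel path would cast this value back to FP8 here,
            # which rounds; that step is omitted in this sketch.
    return fused, max_scale


# Three shards (think Q, K, V) with different per-tensor scales.
q_shard, k_shard, v_shard = [448.0, -100.0], [200.0], [50.0]
fused, shared_scale = requantize_to_shared_scale(
    [q_shard, k_shard, v_shard], [0.01, 0.02, 0.005]
)
print(shared_scale, fused)
```

With one shared scale, the three shards can be concatenated into a single weight and multiplied in one GEMM, with the output scaled once by the shared scale. The trade-off is that shards whose original scale was much smaller than the maximum lose some resolution when the values are cast back to FP8.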

11 Apr, 2024 1 commit
- [Core] Set `linear_weights` directly on the layer (#3977) · a10d3056
  Antoni Baum authored Apr 11, 2024
  
  a10d3056
25 Mar, 2024 1 commit
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
24 Mar, 2024 1 commit
- [BugFix] 1D query fix for MoE models (#3597) · 41deac4a
  Nick Hill authored Mar 24, 2024
  
  41deac4a
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
  2f8844ba
06 Feb, 2024 1 commit
- Add fused top-K softmax kernel for MoE (#2769) · f0d4e145
  Woosuk Kwon authored Feb 05, 2024
  
  f0d4e145
31 Jan, 2024 1 commit
- Add unit test for Mixtral MoE layer (#2677) · d0d93b92
  Philipp Moritz authored Jan 31, 2024
  
  d0d93b92