Commits · 9705b90bcf66ba6316e70bef442074df7ee6cebf · OpenDAS / vllm_cscc

21 Jan, 2025 1 commit

[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222) · 750f4cab

Jinzhen Lin authored Jan 21, 2025

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

27 Dec, 2024 1 commit

Deepseek v3 (#11502) · f49777ba

Simon Mo authored Dec 26, 2024


Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>


24 Oct, 2024 1 commit
- [Performance][Kernel] Fused_moe Performance Improvement (#9384) · 59449095
  Charlie Fu authored Oct 24, 2024
```
Signed-off-by: charlifu <charlifu@amd.com>
```
09 Jun, 2024 1 commit
- [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) · 5467ac31
  bnellnm authored Jun 09, 2024
  
22 May, 2024 1 commit
- [CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722) · 5f6d10c1
  Michael Goin authored May 22, 2024
  
18 Mar, 2024 1 commit
- [Bugfix] Make moe_align_block_size AMD-compatible (#3470) · 9101d832
  Woosuk Kwon authored Mar 18, 2024
  
15 Mar, 2024 1 commit
- Dynamically configure shared memory size for moe_align_block_size_kernel (#3376) · 78b6c484
  akhoroshev authored Mar 15, 2024
  
30 Jan, 2024 2 commits
- Fused MOE for Mixtral (#2542) · ab406446
  Philipp Moritz authored Jan 29, 2024
```
Co-authored-by: chen shen <scv119@gmail.com>
```
- DeepseekMoE support with Fused MoE kernel (#2453) · 5d60def0
  wangding zeng authored Jan 30, 2024
```
Co-authored-by: roy <jasonailu87@gmail.com>
```