Commits · e645d847941bea224b5ec8f521ed5df2284ebbfd · ox696c / ktransformers

27 Feb, 2025 1 commit
- use generation config from json file in official repo · e645d847
  Atream authored Feb 27, 2025
  
  e645d847
26 Feb, 2025 1 commit
- ⚡ fix experts torch · ffb86c66
  liam authored Feb 26, 2025
  
  ffb86c66
25 Feb, 2025 2 commits
- fix-update-flashinfer_wrapper_local_chat · 477ac28a
  Atream authored Feb 25, 2025
  
  477ac28a
- support absorb for prefill long context · f4c198bd
  Atream authored Feb 25, 2025
  
  f4c198bd
24 Feb, 2025 1 commit
- Add data loader to read special weights for fp8; Add special weight process script · 581a524f
  Azure authored Feb 24, 2025
  
  581a524f
23 Feb, 2025 3 commits
- support Moonlight · e8e02e5c
  Atream authored Feb 23, 2025
  
  e8e02e5c
- tmp · 95d937c5
  DDong Jianwei authored Feb 23, 2025
  
  95d937c5
- remove causal mask · 006e8c6a
  Atream authored Feb 23, 2025
  
  006e8c6a
22 Feb, 2025 2 commits
- Add fp8 linear kernel;\n Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎... · 7b7c6a65
  Azure authored Feb 22, 2025
```
Add fp8 linear kernel;\n Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225'
```
  7b7c6a65
- optimize gguf dequant, save mem, support Q2_K · 5ec33d04
  Atream authored Feb 22, 2025
  
  use marlin for lm_head, lm_head only calc last token for prefill
  extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM
  
  5ec33d04
19 Feb, 2025 1 commit
- clean PR code and disable flashinfer · a5295183
  Atream authored Feb 19, 2025
  
  a5295183
18 Feb, 2025 1 commit
- fix: adapt prefix cache in `forward_linux_flashinfer` · 2ffb43f9
  ceerrep authored Feb 18, 2025
  
  2ffb43f9
17 Feb, 2025 1 commit
- fix precision bug imported by position_ids in 0.2.0 · 038bc308
  Atream authored Feb 17, 2025
  
  038bc308
16 Feb, 2025 2 commits
- fix: use flash_attn for faster prefill · 5ac26608
  ceerrep authored Feb 17, 2025
  
  5ac26608
- Mock triton mla due to precision issue · ff6b265e
  Azure authored Feb 16, 2025
  
  ff6b265e
15 Feb, 2025 3 commits
- toy support for experts on GPU, no CUDA Graph · c189d55b
  Atream authored Feb 15, 2025
  
  c189d55b
- Update attention.py · 92399283
  Atream authored Feb 15, 2025
  
  92399283
- Update triton_attention.py · d90749d3
  Atream authored Feb 15, 2025
  
  d90749d3
14 Feb, 2025 1 commit
- linux support triton MLA kernel · 1084d4e4
  Atream authored Feb 14, 2025
  
  1084d4e4
13 Feb, 2025 3 commits
- init support for MLA using Attention kernel · bb35dc5b
  Atream authored Feb 13, 2025
  
  bb35dc5b
- 📝 ⚡ fix some debug output and update doc · 8d5ebe49
  liam authored Feb 13, 2025
  
  8d5ebe49
- 📝 add doc support and fix bug in qwen2 · c74453d8
  liam authored Feb 13, 2025
  
  c74453d8
10 Feb, 2025 1 commit
- ⚡ ready to publish · 83401dbb
  liam authored Feb 10, 2025
  
  83401dbb
07 Feb, 2025 1 commit
- support KExpertsMarlin backend · c4d9bc66
  Azure authored Feb 07, 2025
  
  c4d9bc66
06 Feb, 2025 1 commit
- modify moeinfer param · 027b1126
  Azure authored Feb 06, 2025
  
  027b1126
04 Feb, 2025 1 commit
- done support deepseekv3 · 907251c7
  Azure authored Feb 04, 2025
  
  907251c7
01 Feb, 2025 2 commits
- fix rope; update moegate · f748cd29
  Azure authored Feb 01, 2025
  
  f748cd29
- update rope calculation; update modeling.py; update gate for moe · f873558a
  Azure authored Feb 01, 2025
  
  f873558a
31 Jan, 2025 1 commit
- support deepseekv3; runable but have precition problem · 476b1d8d
  Azure authored Jan 31, 2025
  
  476b1d8d
12 Sep, 2024 1 commit
- typo fix: KMisrtal -> KMistral · 234faf79
  xhedit authored Sep 12, 2024
  
  234faf79
02 Sep, 2024 1 commit
- fix qlen > 1000 mask is none error · c55de02f
  Azure authored Sep 02, 2024
  
  c55de02f
29 Aug, 2024 1 commit
- Fix cannot offload whole layer in cpu · 6735beb5
  TangJingqi authored Aug 29, 2024
  
  6735beb5
28 Aug, 2024 1 commit
- [feature] release 0.1.3 · 4d1d561d
  chenxl authored Aug 28, 2024
  
  4d1d561d
15 Aug, 2024 1 commit
- [fix] format classes and files name · 67043b4b
  TangJingqi authored Aug 15, 2024
  
  67043b4b
14 Aug, 2024 1 commit
- [feature] experts can be injected using CPUInfer · 412055d4
  Atream authored Aug 14, 2024
  
  [fix] fix ktransformers interface when use new CUDAGraphRunner
  [fix] fix YAML and optimize logic, the top rule has the highest priority
  
  412055d4
12 Aug, 2024 1 commit
- [ADD] support multi-gpu qlen>1 q5_k · f5f79f5c
  chenxl authored Aug 12, 2024
  
  f5f79f5c
08 Aug, 2024 3 commits
- 1) Linear and MLP operators support qlen>1; 2) All operators now share a... · c1cc7d2c
  chenht2022 authored Aug 08, 2024
```
1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic.
```
  c1cc7d2c
- [fix] linux and windows can all find CPUInfer in current Directory · 1f92f7cc
  Atream authored Aug 08, 2024
  
  1f92f7cc
- support windows support q4_0 and q5_0 dequant on cpu Add CopyRight from... · 0a2fd52c
  Atream authored Aug 07, 2024
```
support windows support q4_0 and q5_0 dequant on cpu Add CopyRight from pygguf(It was added before, but disappear after merge). Add some TODO in the code.
```
  0a2fd52c
27 Jul, 2024 1 commit
- Initial commit · 18c42e67
  chenxl authored Jul 27, 2024
  
  18c42e67