Commits · ca7366d2dbb81751e65f1fa8b7b5fba9165f9f42 · wqshmzh / ktransformers

24 Feb, 2025 1 commit
- Add data loader to read special weights for fp8; Add special weight process script · 581a524f
  Azure authored Feb 24, 2025
  
22 Feb, 2025 3 commits
- Add fp8 linear kernel; Add empty cache to fit in 16 GB VRAM; by 'wkGCaSS - Zhihu... · 7b7c6a65
  Azure authored Feb 22, 2025
```
Add fp8 linear kernel;
Add empty cache to fit in 16 GB VRAM; by 'wkGCaSS - Zhihu https://zhuanlan.zhihu.com/p/25491611225'
```
- fix merge bug; this branch also pads Marlin · f7f10598
  Atream authored Feb 22, 2025
  
- optimize gguf dequant, save mem, support Q2_K · 5ec33d04
  Atream authored Feb 22, 2025
```
use Marlin for lm_head; lm_head only computes the last token during prefill
extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM
```
21 Feb, 2025 1 commit
- optimize GPU · 7e1fe256
  Atream authored Feb 21, 2025
  
19 Feb, 2025 1 commit
- feat: Support Moore Threads GPU · 2207f6cd
  Xiaodong Ye authored Feb 19, 2025
```
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
```
15 Feb, 2025 1 commit
- toy support for experts on GPU, no CUDA Graph · c189d55b
  Atream authored Feb 15, 2025
  
09 Feb, 2025 1 commit
- ⚡ v0.2 ongoing · 098602b0
  liam authored Feb 09, 2025
  
06 Feb, 2025 1 commit
- ⚡ fix moe.cpp int overflow problem · 3dca28d2
  liam authored Feb 06, 2025
  
09 Oct, 2024 1 commit
- Adapt to Windows · 14869b55
  chenht2022 authored Oct 09, 2024
  
13 Sep, 2024 1 commit
- fix bug where some dequant functions don't support multi-GPU · 3758afb5
  Azure authored Sep 13, 2024
  
11 Sep, 2024 1 commit
- Use cond var to avoid busy loop · 6666d622
  Yap Sok Ann authored Sep 10, 2024
  
02 Sep, 2024 1 commit
- Support IQ4_XS dequantize · be356c1b
  Yap Sok Ann authored Sep 02, 2024
  
28 Aug, 2024 1 commit
- [feature] release 0.1.3 · 4d1d561d
  chenxl authored Aug 28, 2024
  
12 Aug, 2024 3 commits
- [feature] support q2_k & q3_k dequantize on gpu · 7c4cb520
  BITcyman authored Aug 12, 2024
  
- Update task_queue.h · 3c675af6
  Atream authored Aug 12, 2024
  
- [ADD] support multi-gpu qlen>1 q5_k · f5f79f5c
  chenxl authored Aug 12, 2024
  
09 Aug, 2024 1 commit
- [feature] add bat for windows, update readme · 782a17e4
  chenxl authored Aug 09, 2024
  
08 Aug, 2024 3 commits
- 1) Linear and MLP operators support qlen>1; 2) All operators now share a... · c1cc7d2c
  chenht2022 authored Aug 08, 2024
```
1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic.
```
- fix some compile bugs on Linux · 1d9d3975
  chenxl authored Aug 08, 2024
  
- Support Windows; support q4_0 and q5_0 dequant on CPU; add copyright from... · 0a2fd52c
  Atream authored Aug 07, 2024
```
Support Windows; support q4_0 and q5_0 dequant on CPU; add copyright from pygguf (it was added before, but disappeared after a merge); add some TODOs in the code.
```
31 Jul, 2024 1 commit
- [feature] support Python 3.10 and multi instruction · 112cb3c9
  chenxl authored Jul 31, 2024
  
27 Jul, 2024 1 commit
- Initial commit · 18c42e67
  chenxl authored Jul 27, 2024
  