Commits · ca1dc1e7d16958893aa4ef3e005ad419e55a4b71 · ox696c / ktransformers

01 Mar, 2025 2 commits
- Update local_chat.py · 71286ec1
  宁鹏涛 authored Mar 01, 2025
```
Fix config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluating to true
```
- support chunk prefill, support 139K context for 24G VRAM · f35e8d41
  Atream authored Mar 01, 2025
  
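The condition fixed in 71286ec1 misbehaves because `==` binds more tightly than `or`. Python therefore reads it as `(arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM"`, and a non-empty string is always truthy. The illustration below is minimal. `arch` stands in for `config.architectures[0]`, and the function names are hypothetical. The membership test is one idiomatic fix, not necessarily the exact change made in the commit.

```python
def is_deepseek_buggy(arch):
    # Parses as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM";
    # when the comparison is False, the non-empty string is returned, which is truthy.
    return arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM"


def is_deepseek_fixed(arch):
    # Compare against each name explicitly via a tuple membership test.
    return arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


if __name__ == "__main__":
    print(bool(is_deepseek_buggy("LlamaForCausalLM")))  # True
    print(is_deepseek_fixed("LlamaForCausalLM"))        # False
```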
28 Feb, 2025 1 commit
- ⚡ fix server cache lens · 8ddc9906
  liam authored Mar 01, 2025
  
27 Feb, 2025 4 commits
- Delete unused code · a34a25d5
  Shuaiyi authored Feb 27, 2025
  
- fix temperature · 22df52e9
  qiyuxinlin authored Feb 27, 2025
  
- use generation config from json file in official repo · e645d847
  Atream authored Feb 27, 2025
  
- Fix according to upstream changes · b121ca4d
  lazymio authored Feb 27, 2025
  
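e645d847 switches to the generation config shipped in the official model repo. The sketch below is a minimal version of that idea. It assumes a Hugging Face-style `generation_config.json` in the model directory. The fallback values and helper names are hypothetical, not ktransformers code.

```python
import json
from pathlib import Path

# Hypothetical fallbacks, used only for keys the JSON file does not define
FALLBACK_SAMPLING = {"temperature": 1.0, "top_p": 1.0}


def merge_sampling(cfg):
    """Overlay the sampling keys found in a parsed generation config onto the fallbacks."""
    merged = dict(FALLBACK_SAMPLING)
    for key in FALLBACK_SAMPLING:
        if cfg.get(key) is not None:
            merged[key] = cfg[key]
    return merged


def load_sampling_defaults(model_dir):
    """Read <model_dir>/generation_config.json if present, else return the fallbacks."""
    path = Path(model_dir) / "generation_config.json"
    if not path.is_file():
        return dict(FALLBACK_SAMPLING)
    with path.open(encoding="utf-8") as f:
        return merge_sampling(json.load(f))
```

With `transformers` installed, `GenerationConfig.from_pretrained(model_dir)` reads the same file and fills missing fields with library defaults.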
26 Feb, 2025 5 commits
- Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml · 90eb87b3
  Atream authored Feb 26, 2025
  
- modify · ec7e912f
  swu-hyk authored Feb 26, 2025
  
- implementation of chat routing for Ollama · 68e7df3a
  swu-hyk authored Feb 26, 2025
  
- ⚡ fix experts torch · ffb86c66
  liam authored Feb 26, 2025
  
- fix numa cpu distribution · b2bff177
  wkgcass authored Feb 26, 2025
```
The NUMA node location is calculated from the total number of worker threads,
so we should always use the actual number of threads instead of applying a min() op.
```
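b2bff177 notes that each worker's NUMA node is derived from the total number of worker threads. The sketch below shows why that total must be the number of threads actually started. It assumes a simple contiguous-block mapping, and the helper names are hypothetical. The real scheme in the ktransformers thread pool may differ.

```python
def numa_node_for_thread(thread_id, total_threads, num_nodes):
    """Map a worker thread to a NUMA node using contiguous, roughly equal blocks.

    Hypothetical helper: threads [0, T/N) go to node 0, [T/N, 2T/N) to node 1, and so on.
    The result is only a valid node index (< num_nodes) when thread_id < total_threads.
    """
    return thread_id * num_nodes // total_threads


def assign_threads(num_threads, num_nodes):
    """Node for every worker, computed from the number of threads actually started."""
    return [numa_node_for_thread(t, num_threads, num_nodes) for t in range(num_threads)]


if __name__ == "__main__":
    print(assign_threads(8, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
    # Bug illustration: 8 threads run, but their location is computed from a smaller count (4)
    print([numa_node_for_thread(t, 4, 2) for t in range(8)])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Suppose 8 threads are started but the location is computed from a smaller count such as 4. Threads 4 to 7 then map to nodes 2 and 3, which do not exist on a two-node machine.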
25 Feb, 2025 9 commits
- Fix RuntimeError on Windows caused by integer overflow in np.prod · 8817777e
  akemimadoka authored Feb 26, 2025
  
- ⚡ release v0.2.2rc1 · ddf33393
  liam authored Feb 25, 2025
  
- add fp8 multi gpu yaml example · 2c0cce90
  Azure authored Feb 25, 2025
  
- fix-update-flashinfer_wrapper_local_chat · 477ac28a
  Atream authored Feb 25, 2025
  
- fix fp8 multi gpu; update FAQ · 7e5962af
  Azure authored Feb 25, 2025
  
- ⚡ update git ignore add docker dev container · 0ca0b99f
  liam authored Feb 25, 2025
  
- support absorb for prefill long context · f4c198bd
  Atream authored Feb 25, 2025
  
- Update doc · 36fbeee3
  Azure authored Feb 25, 2025
  
- feat: basic api key support · f639fbc1
  ceerrep authored Feb 25, 2025
  
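8817777e fixes an integer overflow in `np.prod` on Windows. With NumPy releases before 2.0, the default integer on Windows is 32-bit. As a result, the product of a large tensor shape can wrap around silently. The sketch below shows two ways to get an exact element count on any platform. The helper names are hypothetical, and neither is necessarily the change made in the commit.

```python
import math

import numpy as np


def num_elements(shape):
    """Exact element count: math.prod multiplies Python ints, which cannot overflow."""
    return math.prod(int(d) for d in shape)


def num_elements_np(shape):
    """Element count with an explicit 64-bit accumulator instead of the platform default int."""
    return int(np.prod(shape, dtype=np.int64))


if __name__ == "__main__":
    shape = (65536, 65536)  # 2**32 elements: too many for a signed 32-bit integer
    print(num_elements(shape))     # 4294967296
    print(num_elements_np(shape))  # 4294967296
```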
24 Feb, 2025 11 commits
- update fp8 kernel tutorial · 4dc5518e
  Azure authored Feb 24, 2025
  
- Ensure backward compatibility with Torch 2.2 · f88c05a6
  Xiaodong Ye authored Feb 24, 2025
```
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
```
- Left out · 07eb712a
  lazymio authored Feb 24, 2025
  
- Default values · 91062a83
  lazymio authored Feb 24, 2025
  
- Revert repetition_penalty as it is not in API spec · 76487c4d
  lazymio authored Feb 24, 2025
  
- Also /chat/completions · 05ad2884
  lazymio authored Feb 24, 2025
  
- Also allow repetition_penalty · bf36547f
  lazymio authored Feb 24, 2025
  
- Allow temperature and top_p from requests · 8704c091
  lazymio authored Feb 24, 2025
  
- Add data loader to read special weights for fp8; Add special weight process script · 581a524f
  Azure authored Feb 24, 2025
  
- fix KExpertsMarlin on GPU without CUDA Graph · f3276950
  Atream authored Feb 24, 2025
  
- Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM · cea07d19
  Yuhao Tsui authored Feb 24, 2025
```
This change explicitly clears CUDA cache during weight loading to mitigate memory fragmentation issues, particularly beneficial for low-VRAM GPUs.
```
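The lazymio commits from this date appear to do three things. They let clients set `temperature` and `top_p` per request, fall back to default values otherwise, and drop `repetition_penalty` because the OpenAI API spec does not define it. The sketch below is a minimal version of that resolution logic. `SamplingParams`, `resolve_sampling`, and the default values are hypothetical, not the ktransformers server code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingParams:
    temperature: float
    top_p: float


# Hypothetical server defaults; a real server might take these from the model's generation config
SERVER_DEFAULTS = SamplingParams(temperature=1.0, top_p=1.0)

# Sampling fields accepted from the request body; repetition_penalty is deliberately absent
# because the OpenAI chat completions spec does not define it.
ACCEPTED_FIELDS = ("temperature", "top_p")


def resolve_sampling(body, defaults=SERVER_DEFAULTS):
    """Use each accepted field from the request when present and not null, else the default."""
    values = {}
    for field in ACCEPTED_FIELDS:
        value = body.get(field)
        values[field] = getattr(defaults, field) if value is None else float(value)
    return SamplingParams(**values)


if __name__ == "__main__":
    print(resolve_sampling({"temperature": 0.2, "repetition_penalty": 1.3}))
    # SamplingParams(temperature=0.2, top_p=1.0)
```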
23 Feb, 2025 7 commits
- Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC · 706e69f4
  akemimadoka authored Feb 24, 2025
  
- update yaml · f5f6c6b9
  Atream authored Feb 23, 2025
  
- support Moonlight · e8e02e5c
  Atream authored Feb 23, 2025
  
- tmp · 95d937c5
  DDong Jianwei authored Feb 23, 2025
  
- remove causal mask · 006e8c6a
  Atream authored Feb 23, 2025
  
- fix bf16 load, TODO: refactor cpu dequant · 036ae25a
  Atream authored Feb 23, 2025
  
- musa: support bf16 · 18b1d183
  Xiaodong Ye authored Feb 23, 2025
```
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
```
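Several commits on this date deal with bf16 weights (18b1d183, 036ae25a). bfloat16 is exactly the upper 16 bits of an IEEE 754 float32. A CPU-side path can therefore widen it without loss by shifting left 16 bits. The pure-Python sketch below is minimal and uses hypothetical helpers. It is not the ktransformers dequant code, which the commit marks for refactoring.

```python
import struct
from typing import List


def bf16_bits_to_float(bits):
    """Widen one bfloat16 bit pattern to a Python float.

    bfloat16 is the top half of an IEEE 754 float32 (1 sign, 8 exponent, 7 mantissa bits),
    so shifting left by 16 and reinterpreting as float32 is exact.
    """
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def bf16_bytes_to_floats(buf: bytes) -> List[float]:
    """Decode a little-endian buffer of packed bfloat16 values (e.g. raw tensor bytes)."""
    if len(buf) % 2:
        raise ValueError("buffer length must be a multiple of 2")
    return [bf16_bits_to_float(bits) for (bits,) in struct.iter_unpack("<H", buf)]


if __name__ == "__main__":
    raw = struct.pack("<3H", 0x3F80, 0xC000, 0x4049)
    print(bf16_bytes_to_floats(raw))  # [1.0, -2.0, 3.140625]
```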
22 Feb, 2025 1 commit
- Add fp8 linear kernel; Add empty cache to fit in 16G VRAM; By 'wkGCaSS - Zhihu... · 7b7c6a65
  Azure authored Feb 22, 2025
```
Add fp8 linear kernel;
Add empty cache to fit in 16G VRAM; By 'wkGCaSS - Zhihu https://zhuanlan.zhihu.com/p/25491611225'
```
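7b7c6a65 adds an fp8 linear kernel. The commit message does not state the fp8 format. This sketch assumes the weights use the FP8 E4M3FN encoding with one scale per block. Under that assumption, the scalar arithmetic such a kernel vectorizes looks like this. The helper names are hypothetical.

```python
import math


def decode_e4m3fn(byte):
    """Decode one FP8 E4M3FN byte (assumed format): 1 sign, 4 exponent (bias 7), 3 mantissa bits.

    E4M3FN has no infinities; the all-ones exponent and mantissa pattern (0x7F / 0xFF) is NaN.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit pattern")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = -6
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def dequantize(codes, scale):
    """Hypothetical per-block dequantization: decode each code and multiply by the block scale."""
    return [decode_e4m3fn(c) * scale for c in codes]


if __name__ == "__main__":
    print(decode_e4m3fn(0x7E))             # 448.0 (largest finite E4M3FN value)
    print(dequantize([0x38, 0x40], 0.5))   # [0.5, 1.0]
```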