- 28 Feb, 2025 6 commits
- 27 Feb, 2025 16 commits
-
wang jiahao authored
fix temperature
-
qiyuxinlin authored
-
Atream authored
use generation config from json file in official repo
-
Atream authored
-
wang jiahao authored
Allow temperature and top_p from /v1/chat/completions
-
lazymio authored
-
wang jiahao authored
-
Azure authored
Update issue templates
-
Azure authored
-
Atream authored
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
-
Atream authored
Fix RuntimeError on Windows caused by integer overflow in np.prod
-
Atream authored
fix: fix SSE formatting
-
Atream authored
feat: basic api key support
-
Atream authored
Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
-
Atream authored
-
wang jiahao authored
feat: implementation of chat routing for Ollama
-
- 26 Feb, 2025 13 commits
-
Azure authored
[UPDATE] Update documents.
-
Azure authored
-
Atream authored
Update DeepseekR1_V3_tutorial.md
-
Atream authored
-
Atream authored
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
-
Atream authored
-
swu-hyk authored
-
swu-hyk authored
-
Chen Hongtao authored
fix numa cpu distribution
-
ZiWei Yuan authored
fix dockerfile in devcontainer and fix expert torch
-
liam authored
-
liam authored
-
wkgcass authored
The NUMA node location is calculated from the total number of worker threads, so we should always use the actual number of threads instead of applying a min() op.
-
- 25 Feb, 2025 5 commits
-
akemimadoka authored
-
Azure authored
📝 update benchmark.md
-
liam authored
-
Azure authored
[update] Update doc.
-
liam authored
-