- 27 Feb, 2025 6 commits
-
-
Atream authored
Fix RuntimeError on Windows caused by integer overflow in np.prod
-
Atream authored
fix: fix SSE formatting
-
Atream authored
feat: basic api key support
-
Atream authored
Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
-
Atream authored
-
wang jiahao authored
feat:implementation of chat routing for Ollama
-
- 26 Feb, 2025 13 commits
-
-
Azure authored
[UPDATE] Update documents.
-
Azure authored
-
Atream authored
Update DeepseekR1_V3_tutorial.md
-
Atream authored
-
Atream authored
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
-
Atream authored
-
swu-hyk authored
-
swu-hyk authored
-
Chen Hongtao authored
fix numa cpu distribution
-
ZiWei Yuan authored
fix dockerfile in devcontainer and fix expert torch
-
liam authored
-
liam authored
-
wkgcass authored
The numa node location would be calculated based on the total number of worker threads. So we should always use the actual number of threads instead of using a min() op.
-
- 25 Feb, 2025 21 commits
-
-
akemimadoka authored
-
Azure authored
📝 update benchmark.md
-
liam authored
-
Azure authored
[update] Update doc.
-
liam authored
-
Azure authored
-
ZiWei Yuan authored
⚡ release v0.2.2rc1
-
liam authored
-
Azure authored
[release] Release 0.2.2rc.
-
Azure authored
[update] Update readme.
-
Azure authored
Update README.md
-
Atream authored
-
Atream authored
-
Azure authored
-
Atream authored
-
Atream authored
-
Azure authored
-
ZiWei Yuan authored
📝 add benchmark.md
-
liam authored
-
ZiWei Yuan authored
⚡ update git ignore add docker dev container
-
liam authored
-