Unverified Commit 3c6035aa authored by ZiWei Yuan, committed by GitHub

Merge pull request #329 from kvcache-ai/fix_doc

📝 fix typo
parents 65d73ea3 69b00753
@@ -9,7 +9,7 @@
 - [Why KTransformers So Fast](en/deepseek-v2-injection.md)
 - [Injection Tutorial](en/injection_tutorial.md)
 - [Multi-GPU Tutorial](en/multi-gpu-tutorial.md)
-# Server(Temperary Deprected)
+# Server (Temporary Deprecated)
 - [Server](en/api/server/server.md)
 - [Website](en/api/server/website.md)
 - [Tabby](en/api/server/tabby.md)
@@ -83,7 +83,7 @@ Memory: standard DDR5-4800 server DRAM (1 TB), each socket with 8×DDR5-4800
 #### Change Log
 - Longer Context (from 4K to 8K for 24GB VRAM) and Slightly Faster Speed (+15%):<br>
 Integrated the highly efficient Triton MLA Kernel from the fantastic sglang project, enable much longer context length and slightly faster prefill/decode speed
-- We suspect the impressive improvement comes from the change of hardwre platform (4090D->4090)
+- We suspect that some of the improvements come from the change of hardwre platform (4090D->4090)
 #### Benchmark Results