ox696c / ktransformers
Commit d3ebdafd, authored Apr 28, 2025 by qiyuxinlin
update readme
Parent: 59b0631e
Changes: 2 changed files with 9 additions and 3 deletions
README.md: +5 -2
doc/en/AMX.md: +4 -1
README.md
...
@@ -23,8 +23,11 @@ Our vision for KTransformers is to serve as a flexible platform for experimentin
<h2 id="Updates">
🔥 Updates
</h2>
* **Apr 29, 2025**: Support AMX-Int8 and AMX-BF16([Tutorial](./doc/en/AMX.md)). Support Qwen3MoE
https://github.com/user-attachments/assets/fafe8aec-4e22-49a8-8553-59fb5c6b00a2
<p align="center">
📹
<a href="[202504290023-4.mov](https://github.com/user-attachments/assets/fafe8aec-4e22-49a8-8553-59fb5c6b00a2)">
Qwen3MoE+AMX
</a>
</p>
* **Apr 9, 2025**: Experimental support for LLaMA 4 models ([Tutorial](./doc/en/llama4.md)).
* **Apr 2, 2025**: Support Multi-concurrency. ([Tutorial](./doc/en/balance-serve.md)).
...
doc/en/AMX.md
...
@@ -9,7 +9,10 @@ Consumer-grade CPU (Core i9-14900KF + dual-channel DDR4-4000 MT/s) + RTX 4090
The results are as follows:
https://github.com/user-attachments/assets/fafe8aec-4e22-49a8-8553-59fb5c6b00a2
<p align="center">
📹
<a href="[202504290023-4.mov](https://github.com/user-attachments/assets/fafe8aec-4e22-49a8-8553-59fb5c6b00a2)">
Qwen3MoE+AMX
</a>
</p>
| Machine | Model | GPU Memory | RAM Usage | Prefill (tokens/s) | Decode (tokens/s) |
| Workstation (Xeon 4 + RTX 4090) | Qwen3-30B-A3B (8-bit) | 8.6 GB | 44 GB | 313 | 33 (single) → 50 (4-way) |
...