OpenDAS / ktransformers
Commit e70db18b
Authored Apr 28, 2025 by qiyuxinlin
update AMX readme
Parent: 2e905c8b
Showing 1 changed file with 8 additions and 6 deletions
doc/en/AMX.md (+8 -6)
@@ -19,8 +19,10 @@ You can see that, thanks to the AMX instruction optimizations, we achieve up to
 Here is the Qwen3MoE startup command:
 ```python
-python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path <model_dir> --gguf_path <gguf_dir> --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml # llamafile backend
-python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path <model_dir> --gguf_path <gguf_dir> --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml # AMX backend
+# llamafile backend
+python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path <model_dir> --gguf_path <gguf_dir> --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml
+# AMX backend
+python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path <model_dir> --gguf_path <gguf_dir> --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml
 ```
 **Note: At present, Qwen3MoE running with AMX can only read BF16 GGUF; support for loading from safetensor will be added later.**
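The two startup commands differ only in the rule file passed to `--optimize_config_path`. As a minimal sketch of that relationship, the snippet below assembles the argument list for either backend. The flags and paths are taken from the commands shown in this diff; the `RULES` labels and the `build_command` helper are hypothetical names chosen for illustration and are not part of ktransformers.

```python
import shlex

# Rule files as they appear in the README commands.
# The keys "llamafile" and "amx" are labels chosen for this sketch.
RULES = {
    "llamafile": "ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml",
    "amx": "ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml",
}


def build_command(backend, model_dir, gguf_dir):
    """Return the argv list for the chosen backend (hypothetical helper)."""
    return [
        "python", "ktransformers/server/main.py",
        "--architectures", "Qwen3MoeForCausalLM",
        "--model_path", model_dir,
        "--gguf_path", gguf_dir,
        "--optimize_config_path", RULES[backend],
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for the AMX backend.
    print(shlex.join(build_command("amx", "/path/to/model", "/path/to/gguf")))
```

The list form can be passed directly to `subprocess.run` if launching from a script is preferred over copying the command by hand.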
@@ -62,7 +64,7 @@ Taking INT8 as an example, AMX can perform the multiplication of two 16×64 sub-
 <p align="center">
 <picture>
-<img alt="amx_intro" src="https://github.com/kvcache-ai/ktransformers/tree/main/doc/assets/amx_intro.png" width=60%>
+<img alt="amx_intro" src="../assets/amx_intro.png" width=60%>
 </picture>
 </p>
@@ -87,7 +89,7 @@ During inference, we designed around the CPU’s multi-level cache hierarchy to
 <p align="center">
 <picture>
-<img alt="amx" src="https://github.com/kvcache-ai/ktransformers/tree/main/doc/assets/amx.png" width=60%>
+<img alt="amx" src="../assets/amx.png" width=60%>
 </picture>
 </p>
@@ -104,7 +106,7 @@ Although AMX is highly efficient for large-scale matrix multiplication, it perfo
 <p align="center">
 <picture>
-<img alt="amx_avx" src="https://github.com/kvcache-ai/ktransformers/tree/main/doc/assets/amx_avx.png" width=60%>
+<img alt="amx_avx" src="../assets/amx_avx.png" width=60%>
 </picture>
 </p>
@@ -124,7 +126,7 @@ Thanks to these optimizations, our kernel can achieve 21 TFLOPS of BF16 throughp
 <p align="center">
 <picture>
-<img alt="onednn_1" src="https://github.com/kvcache-ai/ktransformers/tree/main/doc/assets/onednn_1.png" width=60%>
+<img alt="onednn_1" src="../assets/onednn_1.png" width=60%>
 </picture>
 </p>
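The four image hunks replace an absolute GitHub `tree` URL with a path relative to the Markdown file. As a quick check of where the new `src` values point, the sketch below resolves them against the location of `doc/en/AMX.md`. It assumes the renderer resolves relative paths against the directory containing the Markdown file; `resolve_src` is a hypothetical helper written for this illustration.

```python
import posixpath


def resolve_src(markdown_file, src):
    """Resolve a relative image src against the Markdown file's directory."""
    base = posixpath.dirname(markdown_file)
    return posixpath.normpath(posixpath.join(base, src))


if __name__ == "__main__":
    for name in ("amx_intro", "amx", "amx_avx", "onednn_1"):
        print(resolve_src("doc/en/AMX.md", f"../assets/{name}.png"))
```

Under that assumption, each new `src` resolves to a file under `doc/assets/`, the same directory the old absolute URLs referred to.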