Commit 073ce601 (unverified), authored by Atream, committed by GitHub

Update AMX.md

parent 2bcdf10f
@@ -5,7 +5,7 @@ What excites me most about Qwen3MoE is that, unlike the 671 B “giant” model,
Server CPU (Xeon 4) + RTX 4090
-Consumer-grade CPU (Core i9-14900KF + dual-channel DDR4-4000 MT/s) + RTX 4090
+Consumer-grade CPU (Core i9-14900KF + dual-channel DDR5-4000 MT/s) + RTX 4090
The results are as follows:
@@ -170,4 +170,4 @@ KTransformers allows users to easily switch between different backends through s
**Note:** Currently, using AMXInt8 requires reading weights from a BF16 GGUF file and performing online quantization during model loading. This may cause slightly slower load times. Future versions will provide pre-quantized weights to eliminate this overhead.
-![Image](https://github.com/user-attachments/assets/7c33c410-3af9-456f-aa67-5b24e19ba680)
\ No newline at end of file
+![Image](https://github.com/user-attachments/assets/7c33c410-3af9-456f-aa67-5b24e19ba680)