**The prefill of KTrans V0.3 is up to <u>3.45x</u> faster than KTrans V0.2 and up to <u>63.53x</u> faster than Llama.**
**The decoding speed is the same as that of KTrans V0.2 (6-experts version), so it is omitted.**
The main acceleration comes from:
- The Intel AMX instruction set and our specially designed cache-friendly memory layout