**Prefill in KTrans V0.3 is up to <u>3.45x</u> faster than in KTrans V0.2. Decoding speed is the same as in KTrans V0.2 (6 experts version), so it is omitted.**
The main acceleration comes from:
- The Intel AMX instruction set and our specially designed cache-friendly memory layout
- An expert selection strategy that selects fewer experts, based on offline profiling results on out-of-domain data
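
To make the second point concrete, here is a minimal sketch of how activating fewer experts per token reduces MoE work. It is not the KTrans implementation: the function name `select_experts`, the example logits, and the softmax renormalization over the chosen experts are all assumptions made for illustration, and the offline profiling that decides how many experts are safe to drop is not shown.

```python
import math

def select_experts(router_logits, k):
    """Return (expert_ids, weights) for the k highest-scoring experts.

    Hypothetical helper for illustration only; not a KTrans API.
    """
    # Rank expert indices by router logit, highest first.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over only the chosen experts so their weights sum to 1
    # (an assumed normalization; real models may normalize differently).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

# Made-up router logits for one token over 10 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 3.0, -0.5, 0.9, 1.1]

ids_8, w_8 = select_experts(logits, k=8)  # baseline: 8 experts per token
ids_6, w_6 = select_experts(logits, k=6)  # reduced: 6 experts per token

print("8 experts:", ids_8)
print("6 experts:", ids_6)
```

In this sketch the 6-expert selection is a prefix of the 8-expert one, so the two lowest-weighted experts are simply skipped and the per-token expert computation shrinks by roughly a quarter. Whether that is acceptable for output quality is what the offline profiling has to establish.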