the output quality doesn't change, but the speed of decoding and prefill is improved, which is encouraging. So our showcase makes use of this finding*
## How to Run
### V0.2.2 longer context
If you want to use a long context (longer than 20K tokens) for prefill, enable matrix-absorption MLA during the prefill phase, which significantly reduces the size of the KV cache. Modify the YAML file like this: