    Add basic FP8 KV cache support (#2603) · 2358c2bb
    Daniël de Kok authored
    * Add basic FP8 KV cache support
    
    This change adds rudimentary FP8 KV cache support. Passing
    `--kv-cache-dtype fp8_e5m2` to the launcher enables it, and the KV cache
    is then stored in that type. However, support is still limited:
    
    * Only the `fp8_e5m2` type is supported.
    * The KV cache layout is the same as `float16`/`bfloat16` (HND).
    * The FP8 KV cache is only supported for FlashInfer.
    * Loading of scales is not yet supported.
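
To give a feel for what storing values as `fp8_e5m2` without scales implies, here is a minimal pure-Python sketch. It is not code from this change. It relies only on the fact that E5M2 (1 sign bit, 5 exponent bits, 2 mantissa bits) has the same exponent width as `float16`, so an E5M2 byte can be modelled as the top byte of a `float16` after rounding. The helper names `to_fp8_e5m2` and `from_fp8_e5m2` are hypothetical, and round-to-nearest-even is an assumption about the cast.

```python
import struct


def to_fp8_e5m2(x: float) -> int:
    """Encode x as an E5M2 byte by rounding its float16 bit pattern.

    Illustrative only: assumes round-to-nearest-even. Values outside the
    float16 range raise OverflowError in struct.pack.
    """
    if x != x:  # NaN: return a canonical E5M2 NaN pattern
        return 0x7E
    (h,) = struct.unpack("<H", struct.pack("<e", x))
    upper, lower = h >> 8, h & 0xFF
    # Round to nearest, ties to even, on the 8 mantissa bits being dropped.
    # A carry out of the mantissa correctly bumps the exponent (or yields inf).
    if lower > 0x80 or (lower == 0x80 and (upper & 1)):
        upper += 1
    return upper & 0xFF


def from_fp8_e5m2(b: int) -> float:
    """Decode an E5M2 byte by padding it back out to a float16 bit pattern."""
    (x,) = struct.unpack("<e", struct.pack("<H", (b & 0xFF) << 8))
    return x


if __name__ == "__main__":
    for value in (1.0, 1.1, 1.2, 0.3, -2.0):
        print(value, "->", from_fp8_e5m2(to_fp8_e5m2(value)))
    # 1.1 -> 1.0 and 0.3 -> 0.3125: only two mantissa bits survive.
```

With two mantissa bits, neighbouring representable values near 1.0 are 0.25 apart, which is why a scaled FP8 cache would need the scale loading listed above as not yet supported.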
    
    * Fix Cargo.toml
launcher.md 18 KB