Unverified Commit 6df5fe2a authored by carlushuang's avatar carlushuang Committed by GitHub
Browse files

[CK_TILE]naive attn support FP8 KVCache quant (#1747)



* quant

* fix bug

* simple smoothquant after softmax

* update kv-quant

* update stride

* fix fp8-pertoken-kvcache

* update int8/fp8 quant support

---------

Co-authored-by: so <a.com>
Co-authored-by: default avatarPo Yen Chen <PoYen.Chen@amd.com>
parent 4f62f6e9
...@@ -1131,15 +1131,16 @@ bool run(const ck_tile::ArgParser& arg_parser) ...@@ -1131,15 +1131,16 @@ bool run(const ck_tile::ArgParser& arg_parser)
{ {
// NOTE: use gpu to do validation // NOTE: use gpu to do validation
ck_tile::naive_attention_fwd_traits naive_t; ck_tile::naive_attention_fwd_traits naive_t;
naive_t.q_type = data_type; naive_t.q_type = data_type;
naive_t.k_type = data_type; naive_t.k_type = data_type;
naive_t.v_type = data_type; naive_t.v_type = data_type;
naive_t.o_type = data_type; naive_t.o_type = data_type;
naive_t.q_layout = i_perm == 1 ? "bhsd" : "bshd"; naive_t.q_layout = i_perm == 1 ? "bhsd" : "bshd";
naive_t.k_layout = i_perm == 1 ? "bhsd" : "bshd"; naive_t.k_layout = i_perm == 1 ? "bhsd" : "bshd";
naive_t.v_layout = i_perm == 1 ? "bhsd" : "bshd"; naive_t.v_layout = i_perm == 1 ? "bhsd" : "bshd";
naive_t.o_layout = o_perm == 1 ? "bhsd" : "bshd"; naive_t.o_layout = o_perm == 1 ? "bhsd" : "bshd";
naive_t.variation = 0; // TODO? naive_t.variation = 0; // TODO?
naive_t.quant_algo = 0;
ck_tile::DeviceMem o_naive_buf(o_host.get_element_space_size_in_bytes()); ck_tile::DeviceMem o_naive_buf(o_host.get_element_space_size_in_bytes());
......
This diff is collapsed.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment