"git@developer.sourcefind.cn:hehl2/torchaudio.git" did not exist on "5ec6ada6383b098d8c9363306c215787c67f37e9"
Commit 6df5fe2a authored by carlushuang, committed by GitHub

[CK_TILE]naive attn support FP8 KVCache quant (#1747)



* quant

* fix bug

* simple smoothquant after softmax

* update kv-quant

* update stride

* fix fp8-pertoken-kvcache

* update int8/fp8 quant support

---------

Co-authored-by: so <a.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
parent 4f62f6e9
@@ -1140,6 +1140,7 @@ bool run(const ck_tile::ArgParser& arg_parser)
     naive_t.v_layout = i_perm == 1 ? "bhsd" : "bshd";
     naive_t.o_layout = o_perm == 1 ? "bhsd" : "bshd";
     naive_t.variation = 0; // TODO?
+    naive_t.quant_algo = 0;
     ck_tile::DeviceMem o_naive_buf(o_host.get_element_space_size_in_bytes());
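
For readers unfamiliar with the per-token KV-cache quantization named in the commit message, here is a minimal host-side sketch of the general idea. It shows symmetric int8 quantization with one scale per token row. It is not the CK_TILE kernel and does not describe what `quant_algo` values select. `QuantizedToken`, `quantize_token`, and `dequantize_token` are hypothetical names introduced only for illustration, and the FP8 path is assumed to follow the same scale-per-token pattern with a different target type. Requires C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one quantized token row (not a CK_TILE type).
struct QuantizedToken {
    std::vector<std::int8_t> values; // quantized elements of one K or V row
    float scale;                     // per-token dequantization scale
};

// Symmetric per-token int8 quantization: scale = max|x| / 127.
inline QuantizedToken quantize_token(const std::vector<float>& row) {
    assert(!row.empty());
    float abs_max = 0.0f;
    for (float x : row) abs_max = std::max(abs_max, std::fabs(x));
    // Fall back to a scale of 1 for an all-zero row to avoid dividing by zero.
    const float scale = abs_max > 0.0f ? abs_max / 127.0f : 1.0f;
    QuantizedToken out{std::vector<std::int8_t>(row.size()), scale};
    for (std::size_t i = 0; i < row.size(); ++i) {
        const float q = std::round(row[i] / scale);
        out.values[i] = static_cast<std::int8_t>(std::clamp(q, -127.0f, 127.0f));
    }
    return out;
}

// Recover approximate float values: x ~= q * scale.
inline std::vector<float> dequantize_token(const QuantizedToken& t) {
    std::vector<float> row(t.values.size());
    for (std::size_t i = 0; i < row.size(); ++i)
        row[i] = static_cast<float>(t.values[i]) * t.scale;
    return row;
}
```

Storing one scale per token keeps the quantization error of each row proportional to that row's own largest magnitude, so a single outlier token does not degrade the others.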