[Workaround] Use bf16 LDS to store fp32 input
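quantize_transpose_vector_blockwise uses more than 64 KB of LDS when the
input type is fp32, but the maximum LDS size on DCU is 64 KB. As a
workaround, the fp32 input is staged in LDS as bf16 (2 bytes per element
instead of 4), which halves the LDS footprint of the staged input.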
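A rough host-side sketch of the trade-off is below. It is illustrative only. The 128x128 tile, the `extra_bytes` parameter, and the helper names `lds_bytes`, `fp32_to_bf16_bits`, and `bf16_bits_to_fp32` are hypothetical and not taken from the kernel. The commit also does not say which fp32-to-bf16 rounding the kernel uses, so round-to-nearest-even is assumed here.

```python
import struct

# The 64 KB limit comes from the commit message; everything else is hypothetical.
LDS_LIMIT_BYTES = 64 * 1024


def lds_bytes(rows, cols, elem_bytes, extra_bytes=0):
    """Bytes needed to stage a rows x cols tile in LDS, plus other LDS buffers."""
    return rows * cols * elem_bytes + extra_bytes


def fp32_to_bf16_bits(x):
    """Round a value to bf16 (assumed round-to-nearest-even); return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: keep it a quiet NaN
    # Add 0x7FFF, plus 1 when the kept LSB is odd, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b):
    """Widen a bf16 bit pattern back to fp32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    extra = 1024  # hypothetical bytes used by other LDS buffers
    print("fp32 tile:", lds_bytes(128, 128, 4, extra), "bytes")  # over the limit
    print("bf16 tile:", lds_bytes(128, 128, 2, extra), "bytes")  # fits
    print("3.14159265 ->", bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265)))
```

The cost of the workaround is precision. bf16 keeps the fp32 exponent range but only 8 significant bits instead of 24, so the quantization step sees rounded inputs (3.14159265 becomes 3.140625 in the sketch above).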
Signed-off-by: wenjh <wenjh@sugon.com>