[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: mgoin <michael@neuralmagic.com>