
kvpress-FINCH-Qwen3-8B_pytorch

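FINCH splits the context into multiple chunks, concatenates each chunk with the prompt, and feeds them to the model one after another in a cascade, so that the full context is processed in smaller pieces and GPU memory usage is reduced.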
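
The cascade described above can be illustrated with a toy Python sketch. It is not the code in this repository and does not use the kvpress API: the helper names (`split_into_chunks`, `cascade_compress`, `overlap_score`), the `budget` parameter, and the word-overlap relevance score are all assumptions made for illustration. The list `kept` merely stands in for whatever compressed state is carried from one chunk to the next.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Cut the token sequence into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def overlap_score(token: str, prompt: Sequence[str]) -> float:
    """Hypothetical relevance score: 1.0 if the token occurs in the prompt."""
    return 1.0 if token in prompt else 0.0


def cascade_compress(
    context: Sequence[str],
    prompt: Sequence[str],
    chunk_size: int,
    budget: int,
    score: Callable[[str, Sequence[str]], float] = overlap_score,
) -> Tuple[List[str], int]:
    """Process the context chunk by chunk, carrying a bounded state forward.

    Returns the retained tokens and the largest number of tokens that had to
    be held at once (carried state + current chunk + prompt).
    """
    kept: List[str] = []  # toy stand-in for the compressed state
    peak = 0
    for chunk in split_into_chunks(context, chunk_size):
        # Each step sees only the carried state, one chunk, and the prompt.
        candidates = kept + chunk
        peak = max(peak, len(candidates) + len(prompt))
        # Rank candidates by relevance to the prompt and keep the top `budget`.
        ranked = sorted(
            range(len(candidates)),
            key=lambda i: score(candidates[i], prompt),
            reverse=True,
        )[:budget]
        # Restore the original order of the survivors.
        kept = [candidates[i] for i in sorted(ranked)]
    return kept, peak


if __name__ == "__main__":
    context = "the cat sat on the mat while the dog slept near the door".split()
    prompt = ["cat", "dog"]
    kept, peak = cascade_compress(context, prompt, chunk_size=4, budget=3)
    print("kept:", kept)
    print("peak tokens held at once:", peak, "vs full input:", len(context) + len(prompt))
```

In this sketch the number of tokens held at any step is bounded by `budget + chunk_size + len(prompt)`, regardless of how long the full context is, which is the memory-saving idea the description refers to.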

Apache License 2.0