    Add support for exl2 quantization · 36dd1601
    Daniël de Kok authored
    Mostly straightforward. Changes to existing code:
    
    * Wrap quantizer parameters in a small wrapper to avoid passing
      around untyped tuples and needing to repack them as a dict.
    * Move scratch space computation to warmup, because we need the
      maximum input sequence length to avoid allocating huge
      scratch buffers that OOM.
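
The two changes above can be sketched as follows. This is a minimal illustration, not the code from the commit: `QuantParams`, `ScratchPlan`, their fields, and `bytes_per_token` are hypothetical names chosen for the example, and the real scratch-size formula is likely more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuantParams:
    """Hypothetical typed wrapper that replaces an untyped parameter tuple."""

    bits: int
    groupsize: int
    desc_act: bool = False


@dataclass
class ScratchPlan:
    """Hypothetical holder for a scratch buffer that is sized at warmup."""

    bytes_per_token: int  # assumed per-token cost, for illustration only
    size_bytes: Optional[int] = None  # unknown until warmup has run

    def warmup(self, max_input_length: int) -> int:
        # Size the buffer from the longest input sequence that will be
        # served, instead of a worst-case constant that may not fit in memory.
        self.size_bytes = self.bytes_per_token * max_input_length
        return self.size_bytes


params = QuantParams(bits=4, groupsize=128)
plan = ScratchPlan(bytes_per_token=1024)
scratch_bytes = plan.warmup(max_input_length=2048)
```

Named fields make call sites self-describing (`params.bits` rather than `params[0]`), and deferring the allocation to warmup means the buffer size follows the configured maximum input length.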
tensor_parallel.py 8.55 KB