raiseValueError("Can not use Marlin int4*fp16 kernel with AMD ROCm version of PyTorch as the kernel is not compatible. Please do not use `use_marlin=True` when using ROCm devices.")
ifnottorch.cuda.get_device_capability()[0]>=8:
raiseValueError(f'Can not use Marlin int4*fp16 kernel with a device of compute capability {torch.cuda.get_device_capability()}, the minimum compute capability is 8.0 for Marlin kernel. Please do not use `use_marlin=True`, or please upgrade your GPU ("The more you buy, the more you save." - Taiwanese proverb).')
ifinfeatures%128!=0oroutfeatures%256!=0:
raiseValueError("`infeatures` must be divisible by 128 and `outfeatures` by 256.")
ifbitsnotin[4]:
raiseNotImplementedError("Only 4 bits are supported.")
The code in this directory is mainly adapted from @qwopqwop200's [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda), which itself is based on [gptq](https://github.com/IST-DASLab/gptq).