Brian Dellabetta authored
Based on a request by @mgoin, @kylesayrs and I have added an example doc for int4 w4a16 quantization, following the pre-existing int8 w8a8 quantization example and the example available in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16/llama3_example.py).

FIX #n/a (no issue created)

@kylesayrs and I have discussed a couple of additional improvements for the quantization docs. We will revisit them at a later date, possibly including:

- A section on "choosing the correct quantization scheme/compression technique"
- Additional vision or audio calibration datasets

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
44bbca78