Blame · docs/source/features/quantization/index.md · 313301013d01110924b011c120ec2f8ac957f666 · OpenDAS / vllm_cscc · GitLab


index.md 314 Bytes


| Line | Content | Commit | Author | Date |
| --- | --- | --- | --- | --- |
| 1 | `(quantization-index)=` | [Doc][2/N] Reorganize Models and Usage sections (#11755) | Cyrus Leung | Jan 06, 2025 |
| 2 | | | | |
| 3 | `# Quantization` | | | |
| 4 | | | | |
| 5 | `Quantization trades off model precision for smaller memory footprint, allowing large models to be run on a wider range of devices.` | | | |
| 6 | | | | |
| 7 | `:::{toctree}` | [Doc] Convert docs to use colon fences (#12471) | Harry Mellor | Jan 29, 2025 |
| 8 | `:caption: Contents` | [Doc][2/N] Reorganize Models and Usage sections (#11755) | Cyrus Leung | Jan 06, 2025 |
| 9 | `:maxdepth: 1` | | | |
| 10 | | | | |
| 11 | `supported_hardware` | | | |
| 12 | `auto_awq` | | | |
| 13 | `bnb` | | | |
| 14 | `gguf` | | | |
| 15 | `gptqmodel` | [Docs] Add GPTQModel (#14056) | Qubitium-ModelCloud | Mar 03, 2025 |
| 16 | `int4` | [Doc] int4 w4a16 example (#12585) | Brian Dellabetta | Jan 31, 2025 |
| 17 | `int8` | [Doc][2/N] Reorganize Models and Usage sections (#11755) | Cyrus Leung | Jan 06, 2025 |
| 18 | `fp8` | | | |
| 19 | `quark` | [Doc] Quark quantization documentation (#15861) | chaow-amd | Apr 01, 2025 |
| 20 | `quantized_kvcache` | [Docs] Update FP8 KV Cache documentation (#12238) | Michael Goin | Jan 23, 2025 |
| 21 | `torchao` | Torchao (#14231) | Driss Guessous | Apr 07, 2025 |
| 22 | `:::` | [Doc] Convert docs to use colon fences (#12471) | Harry Mellor | Jan 29, 2025 |
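
The precision-for-memory trade-off described in the file's introductory sentence can be made concrete with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the helper `weight_memory_gib` and the 7-billion-parameter model size are assumptions introduced here, not part of the documented file or of any vLLM API. It also counts weights only, ignoring activations, the KV cache, and per-group scale or zero-point overhead that real quantization formats add.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Hypothetical helper for illustration; ignores quantization metadata
    (scales, zero points), activations, and the KV cache.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed model size for the example: 7 billion parameters.
NUM_PARAMS = 7_000_000_000

# Bit widths loosely matching the formats named in the toctree entries.
for label, bits in [("fp16", 16), ("int8 / fp8", 8), ("int4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(NUM_PARAMS, bits):6.2f} GiB")
```

Under these assumptions, halving the bits per weight halves the weight footprint, which is what lets a model that does not fit on a given device at 16-bit precision potentially fit at 8-bit or 4-bit.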