(quantization-index)=

# Quantization

Quantization trades off model precision for a smaller memory footprint, allowing large models to be run on a wider range of devices.
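As a rough illustration of the memory side of this trade-off, weight storage scales linearly with the number of bits used per weight. The following is a back-of-the-envelope sketch under stated assumptions: it counts only the model weights and ignores activations, the KV cache, and the per-group scale/zero-point metadata that real quantization formats add. The 7B parameter count and the `weight_memory_gib` helper are hypothetical and are not part of vLLM.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# Assumption: only the weights are counted; activations, the KV cache, and
# quantization metadata (scales, zero points) are ignored.


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Return the approximate storage needed for the weights, in GiB."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30  # bytes -> GiB


if __name__ == "__main__":
    num_params = 7e9  # hypothetical 7B-parameter model
    for label, bits in [("FP16/BF16", 16), ("FP8/INT8", 8), ("INT4", 4)]:
        print(f"{label:>9}: {weight_memory_gib(num_params, bits):5.1f} GiB")
```

Halving the bits per weight halves the weight footprint, which is what lets a model that would not fit on a given GPU at 16-bit precision run there once quantized, at some cost in accuracy.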
:::{toctree}
:caption: Contents
:maxdepth: 1

supported_hardware
auto_awq
bnb
bitblas
gguf
gptqmodel
int4
int8
fp8
quark
quantized_kvcache
torchao
:::