Commit df5c8795, authored by Reid, committed by GitHub

[doc] update wrong hf model links (#17184)


Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
parent 423e9f1c
@@ -6,7 +6,7 @@ To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github
Quantization reduces the model's precision from BF16/FP16 to INT4 which effectively reduces the total model memory footprint.
The main benefits are lower latency and memory usage.
-You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?sort=trending&search=awq).
+You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?search=awq).
```console
pip install autoawq
@@ -20,8 +20,8 @@ vLLM reads the model's config file and supports pre-quantized checkpoints.
You can find pre-quantized models on:
-- [Hugging Face (BitBLAS)](https://huggingface.co/models?other=bitblas)
-- [Hugging Face (GPTQ)](https://huggingface.co/models?other=gptq)
+- [Hugging Face (BitBLAS)](https://huggingface.co/models?search=bitblas)
+- [Hugging Face (GPTQ)](https://huggingface.co/models?search=gptq)
Usually, these repositories have a `quantize_config.json` file that includes a `quantization_config` section.
@@ -14,7 +14,7 @@ pip install bitsandbytes>=0.45.3
vLLM reads the model's config file and supports both in-flight quantization and pre-quantized checkpoint.
-You can find bitsandbytes quantized models on <https://huggingface.co/models?other=bitsandbytes>.
+You can find bitsandbytes quantized models on <https://huggingface.co/models?search=bitsandbytes>.
And usually, these repositories have a config.json file that includes a quantization_config section.
## Read quantized checkpoint
@@ -18,7 +18,7 @@ for more details on this and other advanced features.
## Installation
-You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?sort=trending&search=gptq).
+You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?search=gptq).
```console
pip install -U gptqmodel --no-build-isolation -v
@@ -30,5 +30,4 @@ tokenizer.push_to_hub(hub_repo)
quantized_model.push_to_hub(hub_repo, safe_serialization=False)
```
-Alternatively, you can use the TorchAO Quantization space for quantizing models with a simple UI.
-See: https://huggingface.co/spaces/medmekk/TorchAO_Quantization
+Alternatively, you can use the [TorchAO Quantization space](https://huggingface.co/spaces/medmekk/TorchAO_Quantization) for quantizing models with a simple UI.
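
For reference, the link pattern this commit settles on (a plain `search=` query in place of `other=` or `sort=trending&search=`) can be sketched in Python. The helper names `hf_search_url` and `uses_plain_search` are hypothetical and introduced only for illustration; they are not part of vLLM or any Hugging Face tooling.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL taken from the links in the diff above.
HF_MODELS_URL = "https://huggingface.co/models"


def hf_search_url(keyword: str) -> str:
    """Build a model-listing URL that uses only the `search` parameter."""
    return f"{HF_MODELS_URL}?{urlencode({'search': keyword})}"


def uses_plain_search(url: str) -> bool:
    """Return True if the URL's only query parameter is `search`."""
    query = parse_qs(urlparse(url).query)
    return set(query) == {"search"}


if __name__ == "__main__":
    # Keywords that appear in the updated documentation links.
    for keyword in ("awq", "bitblas", "gptq", "bitsandbytes"):
        print(hf_search_url(keyword))
```

A check like `uses_plain_search` could be used to spot documentation links that still carry the older `other=` or `sort=trending` parameters.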