"src/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "81fa2d688d6a04e834ecf9b12606f95d4308c3ea"
Unverified Commit b389f339 authored by Dhruv Nair, committed by GitHub

Fix Doc links in GGUF and Quantization overview docs (#10279)



* update

* Update docs/source/en/quantization/gguf.md
Co-authored-by: Aryan <aryan@huggingface.co>

---------
Co-authored-by: Aryan <aryan@huggingface.co>
parent e222246b
@@ -25,9 +25,9 @@ pip install -U gguf
 Since GGUF is a single file format, use [`~FromSingleFileMixin.from_single_file`] to load the model and pass in the [`GGUFQuantizationConfig`].
-When using GGUF checkpoints, the quantized weights remain in a low memory `dtype`(typically `torch.unint8`) and are dynamically dequantized and cast to the configured `compute_dtype` during each module's forward pass through the model. The `GGUFQuantizationConfig` allows you to set the `compute_dtype`.
-The functions used for dynamic dequantizatation are based on the great work done by [city96](https://github.com/city96/ComfyUI-GGUF), who created the Pytorch ports of the original (`numpy`)[https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py] implementation by [compilade](https://github.com/compilade).
+When using GGUF checkpoints, the quantized weights remain in a low-memory `dtype` (typically `torch.uint8`) and are dynamically dequantized and cast to the configured `compute_dtype` during each module's forward pass through the model. The `GGUFQuantizationConfig` allows you to set the `compute_dtype`.
+The functions used for dynamic dequantization are based on the great work done by [city96](https://github.com/city96/ComfyUI-GGUF), who created the PyTorch ports of the original [`numpy`](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py) implementation by [compilade](https://github.com/compilade).
 ```python
 import torch
...
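The Python example under this hunk is collapsed in the diff. For context, loading a GGUF checkpoint as the doc describes looks roughly like the following sketch; the FLUX.1-dev GGUF checkpoint from city96 is an illustrative choice, not part of this commit:

```python
import torch

from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative single-file GGUF checkpoint (an assumption, not from this diff).
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

# The weights stay in the quantized dtype (e.g. torch.uint8) and are
# dequantized to compute_dtype on the fly in each module's forward pass.
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```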
@@ -33,8 +33,8 @@ If you are new to the quantization field, we recommend you to check out these be
 ## When to use what?
 Diffusers currently supports the following quantization methods.
-- [BitsandBytes]()
-- [TorchAO]()
-- [GGUF]()
+- [BitsandBytes](./bitsandbytes.md)
+- [TorchAO](./torchao.md)
+- [GGUF](./gguf.md)
 [This resource](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what) provides a good overview of the pros and cons of different quantization techniques.
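For a quick sense of how the backends in that list are used, a bitsandbytes 4-bit load follows the same `quantization_config` pattern as the GGUF example above. A minimal sketch; the SD3 checkpoint name is an assumed example, and the linked bitsandbytes doc has the canonical version:

```python
import torch

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

# 4-bit NF4 quantization via bitsandbytes, computing in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumed checkpoint for illustration; any supported transformer works.
model = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```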