"docs/source/vscode:/vscode.git/clone" did not exist on "7b95825d7dddb5896397c806119d1819325c25ff"
Unverified Commit ca974aff authored by Younes Belkada, committed by GitHub

[`Docs`] Clarify 4bit docs (#24878)



* clarify 4bit docs

* Apply suggestions from code review
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
parent 2ab75add
@@ -38,11 +38,21 @@ Make sure that you have installed the requirements below before running any of the
 - Latest `bitsandbytes` library
 `pip install bitsandbytes>=0.39.0`
-- Install latest `accelerate` from source
-`pip install git+https://github.com/huggingface/accelerate.git`
+- Install latest `accelerate`
+`pip install --upgrade accelerate`
-- Install latest `transformers` from source
-`pip install git+https://github.com/huggingface/transformers.git`
+- Install latest `transformers`
+`pip install --upgrade transformers`
+#### Tips and best practices
+- **Advanced usage:** Refer to [this Google Colab notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf) for advanced usage of 4-bit quantization with all the possible options.
+- **Faster inference with `batch_size=1`:** Since the `0.40.0` release of `bitsandbytes`, you can benefit from fast inference with `batch_size=1`. Check out [these release notes](https://github.com/TimDettmers/bitsandbytes/releases/tag/0.40.0), and make sure your installed version is at least `0.40.0` to benefit from this feature out of the box.
+- **Training:** According to the [QLoRA paper](https://arxiv.org/abs/2305.14314), for training 4-bit base models (e.g. with LoRA adapters) one should use `bnb_4bit_quant_type='nf4'`.
+- **Inference:** For inference, `bnb_4bit_quant_type` does not have a large impact on performance. However, for consistency with the model's weights, make sure you use the same `bnb_4bit_compute_dtype` and `torch_dtype` arguments.
 #### Load a large model in 4bit
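To make the tips above concrete, here is a minimal sketch of loading a model in 4-bit with the recommended settings: `nf4` quant type and matching `bnb_4bit_compute_dtype` and `torch_dtype`. The checkpoint `facebook/opt-350m` and the choice of `bfloat16` are illustrative assumptions, not part of the commit.

```python
# Minimal sketch (illustrative, not part of the commit): load a model in 4-bit
# following the tips above. The checkpoint and dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # assumed example checkpoint

# nf4 is the quant type the QLoRA paper recommends for training 4-bit base models;
# bnb_4bit_compute_dtype is kept consistent with torch_dtype below.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # same dtype as bnb_4bit_compute_dtype
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A single-prompt (batch_size=1) generation benefits from the fast inference
# path available since bitsandbytes 0.40.0.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `device_map="auto"` relies on `accelerate`, which is why the updated instructions above ask for the latest version of that library.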