Unverified Commit 3a8769f6 authored by Younes Belkada's avatar Younes Belkada Committed by GitHub

[`Docs`] Add 4-bit serialization docs (#28182)

* add 4-bit serialization docs

* up

* up
parent 3657748b
@@ -345,7 +345,7 @@
```
model_4bit = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", load_in_4bit=True)
model_4bit.model.decoder.layers[-1].final_layer_norm.weight.dtype
```
-Once a model is quantized to 4-bit, you can't push the quantized weights to the Hub.
+If you have `bitsandbytes>=0.41.3`, you can serialize 4-bit models and push them to the Hugging Face Hub. Simply call `model.push_to_hub()` after loading the model in 4-bit precision. You can also save the serialized 4-bit models locally with the `model.save_pretrained()` method.
</hfoption>
</hfoptions>
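The workflow described by the new line above can be sketched end to end. This is a minimal sketch, assuming `bitsandbytes>=0.41.3`, `transformers`, and a CUDA GPU are available; `your-username/opt-350m-4bit` is a placeholder repo id, not one from the source:

```python
from transformers import AutoModelForCausalLM

# Load the model in 4-bit precision (requires bitsandbytes>=0.41.3 and a CUDA GPU)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", load_in_4bit=True
)

# Save the serialized 4-bit weights locally...
model_4bit.save_pretrained("opt-350m-4bit")

# ...or push them to the Hub ("your-username/opt-350m-4bit" is a hypothetical repo id)
# model_4bit.push_to_hub("your-username/opt-350m-4bit")
```

Reloading the saved directory with `from_pretrained` restores the quantized model without re-quantizing.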
@@ -468,6 +468,7 @@
Try 4-bit quantization in this [notebook](https://colab.research.google.com/driv...).
This section explores some of the specific features of 4-bit models, such as changing the compute data type, using the Normal Float 4 (NF4) data type, and using nested quantization.
#### Compute data type
To speed up computation, you can change the data type from float32 (the default value) to bf16 using the `bnb_4bit_compute_dtype` parameter in [`BitsAndBytesConfig`]: