Unverified Commit 4f1b31c2 authored by Joao Gante, committed by GitHub

Docs: 4 bit doc corrections (#24572)

4 bit doc corrections
parent 1fd52e6e
@@ -67,23 +67,23 @@
 You can quickly run a FP4 model on a single GPU by running the following code:
 from transformers import AutoModelForCausalLM
 model_name = "bigscience/bloom-2b5"
-model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)
+model_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)
 ```
 Note that `device_map` is optional, but setting `device_map = 'auto'` is preferred for inference, as it will efficiently dispatch the model across the available resources.
 ### Running FP4 models - multi GPU setup
-The way to load your mixed 8-bit model in multiple GPUs is as follows (same command as single GPU setup):
+The way to load your mixed 4-bit model in multiple GPUs is as follows (same command as single GPU setup):
 ```py
 model_name = "bigscience/bloom-2b5"
-model_8bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)
+model_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)
 ```
 But you can control the GPU RAM you want to allocate on each GPU using `accelerate`. Use the `max_memory` argument as follows:
 ```py
 max_memory_mapping = {0: "600MB", 1: "1GB"}
 model_name = "bigscience/bloom-3b"
-model_8bit = AutoModelForCausalLM.from_pretrained(
+model_4bit = AutoModelForCausalLM.from_pretrained(
     model_name, device_map="auto", load_in_4bit=True, max_memory=max_memory_mapping
 )
 ```