Unverified Commit 94c7f2c5 authored by Miles Cranmer, committed by GitHub

Fix `max_memory` example on README (#944)

* Fix `max_memory` example on README

- The new `max_memory` syntax expects a dictionary
- This change also accounts for multiple devices

* Fix model name in `from_pretrained` on README
parent f1c75741
````diff
@@ -41,7 +41,11 @@ model = AutoModelForCausalLM.from_pretrained(
     'decapoda-research/llama-7b-hf',
     device_map='auto',
     load_in_8bit=True,
-    max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')
+    max_memory={
+        i: f'{int(torch.cuda.mem_get_info(i)[0]/1024**3)-2}GB'
+        for i in range(torch.cuda.device_count())
+    }
+)
 ```
 A more detailed example can be found in [examples/int8_inference_huggingface.py](examples/int8_inference_huggingface.py).
````
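For reference, here is a self-contained sketch of the corrected call. It is not part of the commit itself; it assumes `torch`, `transformers`, `accelerate`, and `bitsandbytes` are installed and that at least one CUDA device is available, and it reuses the model name from the README.

```python
# Minimal sketch of the corrected README snippet (assumes CUDA is available).
import torch
from transformers import AutoModelForCausalLM

# max_memory now expects a dictionary mapping each device index to a budget.
# torch.cuda.mem_get_info(i) returns (free_bytes, total_bytes) for device i;
# subtracting 2 leaves ~2 GiB of headroom per GPU for activations and overhead.
max_memory = {
    i: f'{int(torch.cuda.mem_get_info(i)[0] / 1024**3) - 2}GB'
    for i in range(torch.cuda.device_count())
}

model = AutoModelForCausalLM.from_pretrained(
    'decapoda-research/llama-7b-hf',
    device_map='auto',
    load_in_8bit=True,
    max_memory=max_memory,
)
```

Because the dictionary is built per device index, the same snippet works unchanged on single- and multi-GPU machines, which is what the second bullet in the commit message refers to.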