Fix `max_memory` example on README (#944)

* Fix `max_memory` example on README
  - The new `max_memory` syntax expects a dictionary
  - This change also accounts for multiple devices
* Fix model name in `from_pretrained` on README

Commit 94c7f2c5 (unverified), authored Jan 26, 2024 by Miles Cranmer, committed by GitHub Jan 25, 2024

Showing with 5 additions and 1 deletion

README.md +5 -1

--- a/README.md
+++ b/README.md
@@ -41,7 +41,11 @@ model = AutoModelForCausalLM.from_pretrained(
  'decapoda-research/llama-7b-hf',
  device_map='auto',
  load_in_8bit=True,
-  max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')
+  max_memory={
+    i: f'{int(torch.cuda.mem_get_info(i)[0]/1024**3)-2}GB'
+    for i in range(torch.cuda.device_count())
+  }
+)
 ```

 A more detailed example, can be found in [examples/int8_inference_huggingface.py](examples/int8_inference_huggingface.py).
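
The dictionary built in the diff above maps each CUDA device index to a string such as `'22GB'`: the device's free memory in whole GiB, minus 2 GiB of headroom. A minimal sketch of that mapping follows, written as a standalone function so it can be checked without a GPU. The name `build_max_memory` and the `reserve_gib` parameter are hypothetical and not part of the README, and the clamp to zero is an added assumption that the original expression does not include.

```python
def build_max_memory(free_bytes_per_device, reserve_gib=2):
    """Map each device index to an 'NGB' string, leaving `reserve_gib` GiB of headroom.

    `free_bytes_per_device` is a list of free-memory values in bytes, one per device.
    """
    return {
        # Truncate to whole GiB, subtract the headroom, and clamp at zero
        # (the clamp is an assumption; the README expression has no clamp).
        i: f"{max(int(free / 1024**3) - reserve_gib, 0)}GB"
        for i, free in enumerate(free_bytes_per_device)
    }


# With PyTorch and CUDA available, the input could be gathered like this:
#   import torch
#   free = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
#   max_memory = build_max_memory(free)

# Two hypothetical devices with 24 GiB and 16 GiB free.
example = build_max_memory([24 * 1024**3, 16 * 1024**3])
print(example)  # {0: '22GB', 1: '14GB'}
```

The result has the same shape as the `max_memory` argument passed to `from_pretrained` in the diff: one entry per visible device, rather than a single string.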