OpenDAS / bitsandbytes — commit 32f8c892

Authored Apr 12, 2023 by Tim Dettmers
Parent: 7c651012

Added missing example folder.
Changes: 1 changed file, 27 additions and 0 deletions.

examples/int8_inference_huggingface.py (new file, +27 −0)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_NEW_TOKENS = 128
model_name = 'decapoda-research/llama-7b-hf'

text = 'Hamburg is in which country?\n'
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(text, return_tensors="pt").input_ids

free_in_GB = int(torch.cuda.mem_get_info()[0]/1024**3)
max_memory = f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB'

n_gpus = torch.cuda.device_count()
max_memory = {i: max_memory for i in range(n_gpus)}

model = AutoModelForCausalLM.from_pretrained(
  model_name,
  device_map='auto',
  load_in_8bit=True,
  max_memory=max_memory
)
generated_ids = model.generate(input_ids, max_length=MAX_NEW_TOKENS)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
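The `load_in_8bit=True` path above has bitsandbytes quantize the model's linear-layer weights to int8 with per-vector absmax scaling. As a rough illustration of that idea only — this is a minimal NumPy sketch of plain per-row absmax quantization, not bitsandbytes' actual kernel, which additionally splits out outlier columns for mixed-precision matmuls:

```python
import numpy as np

def absmax_quantize(w):
    # Per-row absmax scaling: map each row of w into the int8 range [-127, 127].
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = absmax_quantize(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max())  # small round-trip error, bounded by scale / 2
```

The per-element error is at most half the row's scale, which is why absmax quantization works well until a few outlier features blow up the scale — the problem the LLM.int8() decomposition addresses.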