chenpangpang / ComfyUI · Commits

Commit c837a173
Fix some memory issues in sub quad attention.
Authored Oct 30, 2023 by comfyanonymous
Parent: 125b03ee
Showing 1 changed file with 11 additions and 24 deletions:

comfy/ldm/modules/attention.py  +11 −24
comfy/ldm/modules/attention.py (view file @ c837a173)

@@ -160,32 +160,19 @@ def attention_sub_quad(query, key, value, heads, mask=None):
     mem_free_total, mem_free_torch = model_management.get_free_memory(query.device, True)
 
-    chunk_threshold_bytes = mem_free_torch * 0.5 #Using only this seems to work better on AMD
-
     kv_chunk_size_min = None
+    kv_chunk_size = None
+    query_chunk_size = None
 
-    #not sure at all about the math here
-    #TODO: tweak this
-    if mem_free_total > 8192 * 1024 * 1024 * 1.3:
-        query_chunk_size_x = 1024 * 4
-    elif mem_free_total > 4096 * 1024 * 1024 * 1.3:
-        query_chunk_size_x = 1024 * 2
-    else:
-        query_chunk_size_x = 1024
-    kv_chunk_size_min_x = None
-    kv_chunk_size_x = (int((chunk_threshold_bytes // (batch_x_heads * bytes_per_token * query_chunk_size_x)) * 2.0) // 1024) * 1024
-    if kv_chunk_size_x < 1024:
-        kv_chunk_size_x = None
+    for x in [4096, 2048, 1024, 512, 256]:
+        count = mem_free_total / (batch_x_heads * bytes_per_token * x * 4.0)
+        if count >= k_tokens:
+            kv_chunk_size = k_tokens
+            query_chunk_size = x
+            break
 
-    if chunk_threshold_bytes is not None and qk_matmul_size_bytes <= chunk_threshold_bytes:
-        # the big matmul fits into our memory limit; do everything in 1 chunk,
-        # i.e. send it down the unchunked fast-path
-        query_chunk_size = q_tokens
-        kv_chunk_size = k_tokens
-    else:
-        query_chunk_size = query_chunk_size_x
-        kv_chunk_size = kv_chunk_size_x
-        kv_chunk_size_min = kv_chunk_size_min_x
 
     hidden_states = efficient_dot_product_attention(
         query,
...
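For readers skimming the change: the removed code sized chunks from chunk_threshold_bytes (half of mem_free_torch) and a tiered if/elif on free memory, while the added code searches a fixed list of candidate query chunk sizes against total free memory. Below is a minimal sketch of the new selection logic pulled out into a standalone function for illustration; the helper name pick_chunk_sizes and the sample numbers are hypothetical and not part of ComfyUI, where this logic runs inline in attention_sub_quad with its inputs derived from the query/key tensors and model_management.get_free_memory(query.device, True).

def pick_chunk_sizes(mem_free_total, batch_x_heads, bytes_per_token, k_tokens):
    # Mirrors the heuristic added by this commit: pick the largest query
    # chunk size x whose estimated attention-score memory (with a 4x
    # margin) still fits in the reported free device memory.
    kv_chunk_size_min = None
    kv_chunk_size = None
    query_chunk_size = None

    for x in [4096, 2048, 1024, 512, 256]:
        count = mem_free_total / (batch_x_heads * bytes_per_token * x * 4.0)
        if count >= k_tokens:
            # The full key/value length fits for this query chunk size,
            # so the key/value dimension is left unchunked.
            kv_chunk_size = k_tokens
            query_chunk_size = x
            break

    if query_chunk_size is None:
        # Nothing fit; fall back to small query chunks and leave the
        # kv chunk size for efficient_dot_product_attention to decide.
        query_chunk_size = 512

    return query_chunk_size, kv_chunk_size, kv_chunk_size_min

# Hypothetical example: ~8 GiB free, 16 batch*heads, fp16 elements
# (2 bytes), 4096 key tokens -> (4096, 4096, None), i.e. no chunking
# of the key/value dimension is needed.
print(pick_chunk_sizes(8 * 1024**3, 16, 2, 4096))

In the real function, bytes_per_token comes from the attention dtype (2 for fp16, 4 for fp32 upcast), and the 4.0 multiplier appears to be a safety margin on top of the raw score-matrix size; when nothing fits, the 512-token fallback keeps query chunking conservative on low-memory devices.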