Project: OpenDAS / vllm_cscc
Commit 3d28ad34 (Unverified)

Fix figures in design doc (#18612)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Authored May 23, 2025 by Harry Mellor. Committed by GitHub, May 23, 2025.
Parent: 6a7988c5

Changes: 1 changed file with 12 additions and 26 deletions

docs/design/kernel/paged_attention.md (+12 -26), viewed at 3d28ad34
    @@ -140,22 +140,18 @@ title: vLLM Paged Attention
     const scalar_t* q_ptr = q + seq_idx * q_stride + head_idx * HEAD_SIZE;
     ```
     <figure markdown="span">
     ![](../../assets/kernel/query.png){ align="center" alt="query" width="70%" }
    -<figcaption>
    -</figcaption>
     </figure>
     - Each thread defines its own `q_ptr` which points to the assigned
     query token data on global memory. For example, if `VEC_SIZE` is 4
     and `HEAD_SIZE` is 128, the `q_ptr` points to data that contains
     total of 128 elements divided into 128 / 4 = 32 vecs.
     <figure markdown="span">
     ![](../../assets/kernel/q_vecs.png){ align="center" alt="q_vecs" width="70%" }
    -<figcaption>
    -</figcaption>
     </figure>
     ```cpp
     __shared__ Q_vec q_vecs[THREAD_GROUP_SIZE][NUM_VECS_PER_THREAD];
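The context lines in this hunk describe how `q_ptr` is computed and how one head's query data splits into vecs. A minimal host-side sketch of that arithmetic follows, in plain C++ rather than CUDA. The helpers `query_offset`, `query_ptr`, and `check_query_example`, as well as the three-head dense layout used for `q_stride`, are illustrative assumptions and not part of the kernel.

```cpp
#include <cassert>
#include <cstddef>

// Values taken from the example in the text.
constexpr int HEAD_SIZE = 128;
constexpr int VEC_SIZE = 4;
constexpr int NUM_VECS = HEAD_SIZE / VEC_SIZE;
static_assert(NUM_VECS == 32, "128 / 4 = 32 vecs per query head");

// Hypothetical helper: element offset of q_ptr relative to q.
constexpr std::ptrdiff_t query_offset(int seq_idx, int head_idx, int q_stride) {
  return static_cast<std::ptrdiff_t>(seq_idx) * q_stride +
         static_cast<std::ptrdiff_t>(head_idx) * HEAD_SIZE;
}

// Same arithmetic as the kernel line, applied to a host array.
template <typename scalar_t>
const scalar_t* query_ptr(const scalar_t* q, int seq_idx, int head_idx,
                          int q_stride) {
  return q + query_offset(seq_idx, head_idx, q_stride);
}

// Small self-check that can be called from main().
inline void check_query_example() {
  float q[2 * 3 * HEAD_SIZE] = {};     // assumed: 2 sequences, 3 heads each
  const int q_stride = 3 * HEAD_SIZE;  // assumed dense layout
  // Sequence 1, head 2 starts 1 * 384 + 2 * 128 = 640 elements into q.
  assert(query_ptr(q, 1, 2, q_stride) == q + 640);
}
```

Calling `check_query_example()` from a `main` function exercises the offset for sequence 1, head 2. In the kernel itself, `q_stride` is supplied by the caller and need not equal `num_heads * HEAD_SIZE`.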
    @@ -192,11 +188,9 @@ title: vLLM Paged Attention
     points to key token data based on `k_cache` at assigned block,
     assigned head and assigned token.
     <figure markdown="span">
     ![](../../assets/kernel/key.png){ align="center" alt="key" width="70%" }
    -<figcaption>
    -</figcaption>
     </figure>
     - The diagram above illustrates the memory layout for key data. It
     assumes that the `BLOCK_SIZE` is 16, `HEAD_SIZE` is 128, `x` is
    @@ -209,11 +203,9 @@ title: vLLM Paged Attention
     elements for one token) that will be processed by 2 threads (one
     thread group) separately.
     <figure markdown="span">
     ![](../../assets/kernel/k_vecs.png){ align="center" alt="k_vecs" width="70%" }
    -<figcaption>
    -</figcaption>
     </figure>
     ```cpp
     K_vec k_vecs[NUM_VECS_PER_THREAD]
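The context lines in this hunk say that one token's key elements are processed by the 2 threads of a thread group, each holding `NUM_VECS_PER_THREAD` vecs. The sketch below shows one way that split could work out numerically. It rests on several assumptions not stated in the visible lines: `VEC_SIZE` is 4 (reused from the query example), `NUM_VECS_PER_THREAD` is the per-token vec count divided evenly by `THREAD_GROUP_SIZE`, and threads pick vecs in an interleaved pattern. `NUM_VECS_PER_TOKEN`, `vec_index`, and `check_key_split` are hypothetical names.

```cpp
#include <cassert>

constexpr int HEAD_SIZE = 128;        // from the text
constexpr int THREAD_GROUP_SIZE = 2;  // "2 threads (one thread group)"
constexpr int VEC_SIZE = 4;           // assumption: same as the query example

// Assumed relation: one token's vecs are split evenly across the group.
constexpr int NUM_VECS_PER_TOKEN = HEAD_SIZE / VEC_SIZE;
constexpr int NUM_VECS_PER_THREAD = NUM_VECS_PER_TOKEN / THREAD_GROUP_SIZE;
static_assert(NUM_VECS_PER_TOKEN == 32, "128 / 4 = 32 vecs per token");
static_assert(NUM_VECS_PER_THREAD == 16, "32 vecs / 2 threads = 16 each");

// Hypothetical interleaved assignment: thread t takes vecs t, t + 2, t + 4, ...
constexpr int vec_index(int thread_group_offset, int j) {
  return thread_group_offset + j * THREAD_GROUP_SIZE;
}

// Checks that the two threads together cover every vec exactly once.
inline void check_key_split() {
  bool seen[NUM_VECS_PER_TOKEN] = {};
  for (int t = 0; t < THREAD_GROUP_SIZE; ++t) {
    for (int j = 0; j < NUM_VECS_PER_THREAD; ++j) {
      const int v = vec_index(t, j);
      assert(v >= 0 && v < NUM_VECS_PER_TOKEN);
      assert(!seen[v]);
      seen[v] = true;
    }
  }
}
```

Under these assumptions, thread 0 holds the even-numbered vecs and thread 1 the odd-numbered ones, so neighboring threads read neighboring memory.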
    @@ -372,20 +364,14 @@ title: vLLM Paged Attention
     <figure markdown="span">
     ![](../../assets/kernel/value.png){ align="center" alt="value" width="70%" }
    -<figcaption>
    -</figcaption>
     </figure>
     <figure markdown="span">
     ![](../../assets/kernel/logits_vec.png){ align="center" alt="logits_vec" width="50%" }
    -<figcaption>
    -</figcaption>
     </figure>
     <figure markdown="span">
     ![](../../assets/kernel/v_vec.png){ align="center" alt="v_vec" width="70%" }
    -<figcaption>
    -</figcaption>
     </figure>
     - Now we need to retrieve the value data and perform dot multiplication