OpenDAS / vllm_cscc

Commit 899541bd
Authored Jan 12, 2026 by XlKsyt
Committed by GitHub, Jan 12, 2026

[doc] fix broken links (#32158)

Signed-off-by: minimAluminiumalism <caixuesen@outlook.com>

Parent: d7b2e570

Showing 1 changed file with 7 additions and 21 deletions

docs/design/paged_attention.md (+7 -21)

@@ -139,18 +139,14 @@ token data.
```cpp
const scalar_t* q_ptr = q + seq_idx * q_stride + head_idx * HEAD_SIZE;
```

<p align="center">
  <img src="../assets/design/paged_attention/query.png" alt="query" width="70%" />
</p>

Each thread defines its own `q_ptr`, which points to the assigned
query token data in global memory. For example, if `VEC_SIZE` is 4
and `HEAD_SIZE` is 128, the `q_ptr` points to data that contains
a total of 128 elements divided into 128 / 4 = 32 vecs.
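
As a rough illustration of the arithmetic above, the following host-side C++ sketch reproduces the `q_ptr` offset computation and the vec count. The helper names `query_offset` and `num_query_vecs` are hypothetical and do not appear in the kernel; they only mirror the expressions shown in this document.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the kernel): element offset of the query
// data for one (sequence, head) pair, mirroring
//   q_ptr = q + seq_idx * q_stride + head_idx * HEAD_SIZE
constexpr std::size_t query_offset(std::size_t seq_idx, std::size_t q_stride,
                                   std::size_t head_idx, std::size_t head_size) {
  return seq_idx * q_stride + head_idx * head_size;
}

// Hypothetical helper: number of vecs that one head's query data splits into.
constexpr std::size_t num_query_vecs(std::size_t head_size, std::size_t vec_size) {
  return head_size / vec_size;
}

// Values from the text: HEAD_SIZE = 128 and VEC_SIZE = 4 give 32 vecs.
static_assert(num_query_vecs(128, 4) == 32, "128 elements form 32 vecs of 4");
```

For instance, assuming 8 heads so that `q_stride` is 8 * 128 = 1024 elements (an assumed value, not taken from the source), `query_offset(2, 1024, 3, 128)` evaluates to 2 * 1024 + 3 * 128 = 2432.
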
<p align="center">
  <img src="../assets/design/paged_attention/q_vecs.png" alt="q_vecs" width="70%" />
</p>

```cpp
__shared__ Q_vec q_vecs[THREAD_GROUP_SIZE][NUM_VECS_PER_THREAD];
...
```

@@ -187,9 +183,7 @@ key token at different iterations. As shown above, that `k_ptr`
points to key token data based on `k_cache` at assigned block,
assigned head and assigned token.

<p align="center">
  <img src="../assets/design/paged_attention/key.png" alt="key" width="70%" />
</p>

The diagram above illustrates the memory layout for key data. It
assumes that the `BLOCK_SIZE` is 16, `HEAD_SIZE` is 128, `x` is
...

@@ -202,9 +196,7 @@ iterations. Inside each rectangle, there are a total 32 vecs (128
elements for one token) that will be processed by 2 threads (one
thread group) separately.
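
The split described above can be sketched in host-side C++ as follows. The constant names `THREAD_GROUP_SIZE` and `NUM_VECS_PER_THREAD` appear in this document's snippets, but the way they are derived here, the `VEC_SIZE` of 4 (inferred from 128 elements forming 32 vecs), and the interleaved assignment in the hypothetical `vec_index` helper are all assumptions made for illustration. The text itself only states that one thread group of 2 threads processes the 32 vecs.

```cpp
#include <cassert>

// Values from the text: 128 elements per token, 32 vecs, 2 threads per group.
constexpr int HEAD_SIZE = 128;
constexpr int VEC_SIZE = 4;  // assumed, so that 128 / 4 = 32 vecs
constexpr int THREAD_GROUP_SIZE = 2;

constexpr int NUM_VECS = HEAD_SIZE / VEC_SIZE;
constexpr int NUM_VECS_PER_THREAD = NUM_VECS / THREAD_GROUP_SIZE;

// Hypothetical helper: index of the i-th vec handled by the thread at
// position `thread_group_offset` within its group, assuming the threads
// take vecs in an interleaved (round-robin) order.
constexpr int vec_index(int thread_group_offset, int i) {
  return thread_group_offset + i * THREAD_GROUP_SIZE;
}

static_assert(NUM_VECS == 32, "one token's key data forms 32 vecs");
static_assert(NUM_VECS_PER_THREAD == 16, "each of the 2 threads handles 16 vecs");
```

Under these assumptions, thread 0 would handle vecs 0, 2, ..., 30 and thread 1 would handle vecs 1, 3, ..., 31, so the two threads together cover all 32 vecs exactly once.
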

<p align="center">
  <img src="../assets/design/paged_attention/k_vecs.png" alt="k_vecs" width="70%" />
</p>

```cpp
K_vec k_vecs[NUM_VECS_PER_THREAD]
...
```

@@ -361,17 +353,11 @@ later steps. Now, it should store the normalized softmax result of
## Value

<p align="center">
  <img src="../assets/design/paged_attention/value.png" alt="value" width="70%" />
</p>

<p align="center">
  <img src="../assets/design/paged_attention/logits_vec.png" alt="logits_vec" width="50%" />
</p>

<p align="center">
  <img src="../assets/design/paged_attention/v_vec.png" alt="v_vec" width="70%" />
</p>

Now we need to retrieve the value data and perform dot multiplication
with `logits`. Unlike query and key, there is no thread group
...