Commit d1695758 (unverified) authored by Shanshan Shen, committed by GitHub

[Doc][V1] Fix V1 APC doc (#14920)

parent 53a0cf8b
@@ -191,7 +191,7 @@ When the head block (least recently used block) of the free queue is cached, we
In this example, we assume the block size is 4 (each block can cache 4 tokens), and we have 10 blocks in the KV-cache manager in total.
-**Time 1: The cache is empty and a new request comes in.** We allocate 4 blocks. 3 of them are already full and cached. The fourth block is partially full with 2 of 4 tokens.
+**Time 1: The cache is empty and a new request comes in.** We allocate 4 blocks. 3 of them are already full and cached. The fourth block is partially full with 3 of 4 tokens.
:::{image} /assets/design/v1/prefix_caching/example-time-1.png
:alt: Example Time 1
@@ -203,7 +203,7 @@ In this example, we assume the block size is 4 (each block can cache 4 tokens),
:alt: Example Time 3
:::
-**Time 4: Request 1 comes in with the 14 prompt tokens, where the first 11 tokens are the same as request 0.** We can see that only 2 blocks (11 tokens) hit the cache, because the 3rd block only matches 3 of 4 tokens.
+**Time 4: Request 1 comes in with the 14 prompt tokens, where the first 10 tokens are the same as request 0.** We can see that only the first 2 blocks (8 tokens) hit the cache, because the 3rd block only matches 2 of 4 tokens.
:::{image} /assets/design/v1/prefix_caching/example-time-4.png
:alt: Example Time 4
......
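The corrected numbers in both hunks follow from simple block arithmetic, which the sketch below reproduces. It is only an illustration under stated assumptions: the helper names (`split_blocks`, `common_prefix_len`, `cache_hit`) are hypothetical and are not part of the KV-cache manager's API, and the token values are made up. The request 0 length of 15 tokens is inferred from the Time 1 text (3 full blocks plus 3 of 4 tokens).

```python
BLOCK_SIZE = 4  # tokens per block, as assumed in the doc's example


def split_blocks(num_tokens, block_size=BLOCK_SIZE):
    """Return (number of full blocks, tokens in the trailing partial block)."""
    return divmod(num_tokens, block_size)


def common_prefix_len(a, b):
    """Length of the longest shared prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit(prev_tokens, new_tokens, block_size=BLOCK_SIZE):
    """Return (blocks hit, tokens hit); only fully matching blocks count."""
    shared = common_prefix_len(prev_tokens, new_tokens)
    hit_blocks = shared // block_size  # a partially matching block is not reused
    return hit_blocks, hit_blocks * block_size


# Hypothetical token IDs: request 0 has 15 tokens, request 1 has 14 tokens
# and shares its first 10 tokens with request 0.
request_0 = list(range(15))
request_1 = list(range(10)) + [100, 101, 102, 103]

print(split_blocks(len(request_0)))     # (3, 3): 3 full blocks, 3 of 4 tokens in the 4th
print(cache_hit(request_0, request_1))  # (2, 8): 2 blocks, 8 tokens hit the cache
```

With the old wording (first 11 tokens shared), the same arithmetic still gives 2 blocks and 8 tokens, which is why "2 blocks (11 tokens)" could not be right.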