Unverified commit bc7c4d20 authored by Aleksandr Malyshev, committed by GitHub

[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305)


Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: maleksan85 <maleksan@amd.com>
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
parent f67e9e9f
@@ -195,15 +195,15 @@ def test_lookahead_greedy_equality_with_preemption(baseline_llm_generator,
])
@pytest.mark.parametrize("per_test_common_llm_kwargs",
[{
-"block_size": 8,
+"block_size": 16,
"max_num_batched_tokens": 2,
"max_num_seqs": 2,
}, {
-"block_size": 8,
+"block_size": 16,
"max_num_batched_tokens": 3,
"max_num_seqs": 2,
}, {
-"block_size": 8,
+"block_size": 16,
"max_num_batched_tokens": 256,
"max_num_seqs": 10,
}])