OpenDAS / vllm_cscc
Commit d1b4f10b
Authored Mar 27, 2026 by Michael Goin
Committed by khluu, Mar 27, 2026
cherry-pick [CI Bugfix] Pre-download missing FlashInfer headers in Docker build
Signed-off-by: khluu <khluu000@gmail.com>
#38391
parent 9fdc0f3a
Showing 1 changed file with 19 additions and 0 deletions
docker/Dockerfile (+19 -0)
...
@@ -593,6 +593,25 @@ RUN --mount=type=cache,target=/root/.cache/uv \
         --extra-index-url https://flashinfer.ai/whl/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') \
     && flashinfer show-config
+# Pre-download FlashInfer TRTLLM BMM headers for air-gapped environments.
+# At runtime, MoE JIT compilation downloads these from edge.urm.nvidia.com
+# which fails without internet. This step caches them at build time.
+RUN python3 <<'PYEOF'
+from flashinfer.jit import env as jit_env
+from flashinfer.jit.cubin_loader import download_trtllm_headers, get_cubin
+from flashinfer.artifacts import ArtifactPath, CheckSumHash
+download_trtllm_headers(
+    'bmm',
+    jit_env.FLASHINFER_CUBIN_DIR / 'flashinfer' / 'trtllm' / 'batched_gemm' / 'trtllmGen_bmm_export',
+    f'{ArtifactPath.TRTLLM_GEN_BMM}/include/trtllmGen_bmm_export',
+    ArtifactPath.TRTLLM_GEN_BMM,
+    get_cubin(f'{ArtifactPath.TRTLLM_GEN_BMM}/checksums.txt', CheckSumHash.TRTLLM_GEN_BMM),
+)
+print('FlashInfer TRTLLM BMM headers downloaded successfully')
+PYEOF
 # ============================================================
 # OPENAI API SERVER DEPENDENCIES
 # Pre-install these to avoid reinstalling on every vLLM wheel rebuild
...
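The change above caches the headers at build time so that nothing has to be fetched at runtime. One way to confirm that such a cache is complete before shipping an image to an air-gapped host is to compare the cached files against a checksum manifest. The following is a minimal sketch using only the Python standard library. It does not call any FlashInfer API, the helper names `parse_checksums` and `find_missing_or_corrupt` are hypothetical, and the manifest format (one `<sha256-hex> <relative/path>` pair per line, as produced by `sha256sum`) is an assumption rather than the confirmed layout of FlashInfer's `checksums.txt`.

```python
import hashlib
from pathlib import Path


def parse_checksums(text: str) -> dict[str, str]:
    """Parse '<sha256-hex> <relative/path>' lines into {path: digest}.

    ASSUMPTION: sha256sum-style manifest; lines that do not have exactly
    two whitespace-separated fields are ignored.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, rel_path = parts
            entries[rel_path] = digest.lower()
    return entries


def find_missing_or_corrupt(cache_dir: Path, checksums: dict[str, str]) -> list[str]:
    """Return relative paths absent from cache_dir or whose SHA-256 differs."""
    bad = []
    for rel_path, expected in sorted(checksums.items()):
        candidate = cache_dir / rel_path
        if not candidate.is_file():
            # File was never downloaded into the cache.
            bad.append(rel_path)
            continue
        actual = hashlib.sha256(candidate.read_bytes()).hexdigest()
        if actual != expected:
            # File exists but its content does not match the manifest.
            bad.append(rel_path)
    return bad
```

A build step could call `find_missing_or_corrupt` on the cache directory and exit non-zero when the returned list is non-empty, so that an incomplete cache fails the image build instead of surfacing later as a runtime download error.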