cherry-pick [CI Bugfix] Pre-download missing FlashInfer headers in Docker build

Signed-off-by: khluu <khluu000@gmail.com> #38391

Commit d1b4f10b authored Mar 27, 2026 by Michael Goin; committed by khluu Mar 27, 2026

Showing with 19 additions and 0 deletions

docker/Dockerfile +19 -0

--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -593,6 +593,25 @@ RUN --mount=type=cache,target=/root/.cache/uv \
        --extra-index-url https://flashinfer.ai/whl/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') \
    && flashinfer show-config
+# Pre-download FlashInfer TRTLLM BMM headers for air-gapped environments.
+# At runtime, MoE JIT compilation downloads these from edge.urm.nvidia.com
+# which fails without internet. This step caches them at build time.
+RUN python3 <<'PYEOF'
+from flashinfer.jit import env as jit_env
+from flashinfer.jit.cubin_loader import download_trtllm_headers, get_cubin
+from flashinfer.artifacts import ArtifactPath, CheckSumHash
+download_trtllm_headers(
+    'bmm',
+    jit_env.FLASHINFER_CUBIN_DIR / 'flashinfer' / 'trtllm' / 'batched_gemm' / 'trtllmGen_bmm_export',
+    f'{ArtifactPath.TRTLLM_GEN_BMM}/include/trtllmGen_bmm_export',
+    ArtifactPath.TRTLLM_GEN_BMM,
+    get_cubin(f'{ArtifactPath.TRTLLM_GEN_BMM}/checksums.txt', CheckSumHash.TRTLLM_GEN_BMM),
+)
+print('FlashInfer TRTLLM BMM headers downloaded successfully')
+PYEOF
 # ============================================================
 # OPENAI API SERVER DEPENDENCIES
 # Pre-install these to avoid reinstalling on every vLLM wheel rebuild
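
The added build step prints a success message but does not confirm that anything landed on disk. A minimal sketch of such a check follows, assuming only the relative directory layout shown in the diff. The function name `bmm_headers_cached` and the constant `BMM_HEADER_SUBDIR` are hypothetical and not part of FlashInfer or vLLM, and the check only confirms that at least one file exists; it does not validate checksums or completeness.

```python
from pathlib import Path

# Relative location copied from the Dockerfile step in the diff above.
# The constant name is hypothetical, chosen for this illustration only.
BMM_HEADER_SUBDIR = (
    Path("flashinfer") / "trtllm" / "batched_gemm" / "trtllmGen_bmm_export"
)


def bmm_headers_cached(cubin_dir: Path) -> bool:
    """Return True if at least one file exists under the BMM header directory.

    This is a presence check only: it does not verify checksums or that
    the full set of headers was downloaded.
    """
    header_dir = Path(cubin_dir) / BMM_HEADER_SUBDIR
    if not header_dir.is_dir():
        return False
    # rglob walks subdirectories too, in case headers are nested.
    return any(p.is_file() for p in header_dir.rglob("*"))
```

Inside the image, the argument would presumably be `jit_env.FLASHINFER_CUBIN_DIR`, imported as in the diff (`from flashinfer.jit import env as jit_env`), so that an air-gapped deployment can fail fast at startup instead of at the first MoE JIT compilation.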