OpenDAS / vllm_cscc
Commit 0f0e0389 (Unverified)
Authored Mar 24, 2026 by Michael Goin
Committed by GitHub, Mar 24, 2026

[UX] Add flashinfer-cubin as CUDA default dep (#37233)

Signed-off-by: mgoin <mgoin64@gmail.com>

Parent: 4b53740d
Showing 2 changed files with 3 additions and 4 deletions

docker/Dockerfile       +2 -4
requirements/cuda.txt   +1 -0

docker/Dockerfile @ 0f0e0389
@@ -587,14 +587,12 @@ RUN --mount=type=cache,target=/root/.cache/uv \
         --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') && \
     rm /tmp/requirements-cuda.txt /tmp/common.txt
 
-# Install FlashInfer pre-compiled kernel cache and binaries
-# This is ~1.1GB and only changes when FlashInfer version bumps
+# Install FlashInfer JIT cache (requires CUDA-version-specific index URL)
 # https://docs.flashinfer.ai/installation.html
 # From versions.json: .flashinfer.version
 ARG FLASHINFER_VERSION=0.6.6
 RUN --mount=type=cache,target=/root/.cache/uv \
-    uv pip install --system flashinfer-cubin==${FLASHINFER_VERSION} \
-    && uv pip install --system flashinfer-jit-cache==${FLASHINFER_VERSION} \
+    uv pip install --system flashinfer-jit-cache==${FLASHINFER_VERSION} \
         --extra-index-url https://flashinfer.ai/whl/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') \
     && flashinfer show-config
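
Note on the index URL: both `--extra-index-url` arguments in this hunk build their `cu...` suffix from `$CUDA_VERSION` with the same `cut` and `tr` pipeline. The following is a minimal sketch of what that pipeline yields, assuming a version string of the form major.minor.patch. The helper name `cuda_index_suffix` and the sample versions are illustrative and not part of the commit, and the sketch does not check that an index exists at the printed URL.

```shell
#!/bin/sh
# Hypothetical helper mirroring the pipeline used in the Dockerfile:
#   echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.'
cuda_index_suffix() {
    # Keep the major.minor fields, then drop the dot: 12.9.1 -> 12.9 -> 129
    printf '%s\n' "$1" | cut -d. -f1,2 | tr -d '.'
}

# Sample values are assumptions chosen for illustration only.
for v in 12.9.1 13.0.0; do
    echo "CUDA $v -> https://flashinfer.ai/whl/cu$(cuda_index_suffix "$v")"
done
```

Because `cut -f1,2` keeps only the first two fields, a patch component in `CUDA_VERSION` does not affect which index is selected.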

requirements/cuda.txt @ 0f0e0389
@@ -10,6 +10,7 @@ torchaudio==2.10.0
 torchvision==0.25.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
 # FlashInfer should be updated together with the Dockerfile
 flashinfer-python==0.6.6
+flashinfer-cubin==0.6.6
 # Cap nvidia-cudnn-frontend (transitive dep of flashinfer) due to
 # breaking changes in 1.19.0
 nvidia-cudnn-frontend>=1.13.0,<1.19.0
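
Note on keeping versions in sync: the comment in this file asks that FlashInfer be updated together with the Dockerfile, and after this commit the version appears three times (two pins here, plus `ARG FLASHINFER_VERSION`). One way to guard against drift is a small check like the sketch below. The function names and the throwaway demo files are hypothetical. It assumes the pins use the exact `name==version` form and that the `ARG` line carries a default value, as shown in this diff.

```shell
#!/bin/sh
# Hypothetical consistency check; all function names are illustrative.

# Print the version pinned for package $1 in requirements file $2.
pinned_version() {
    sed -n "s/^$1==\([^ #]*\).*/\1/p" "$2"
}

# Print the default value of ARG FLASHINFER_VERSION in Dockerfile $1.
docker_arg_version() {
    sed -n 's/^ARG FLASHINFER_VERSION=\(.*\)$/\1/p' "$1"
}

# Return 0 when both FlashInfer pins in $2 match the ARG default in $1.
check_flashinfer_pins() {
    want=$(docker_arg_version "$1")
    for pkg in flashinfer-python flashinfer-cubin; do
        have=$(pinned_version "$pkg" "$2")
        if [ "$have" != "$want" ]; then
            echo "mismatch: $pkg==$have, Dockerfile has $want" >&2
            return 1
        fi
    done
    echo "FlashInfer pins agree on $want"
}

# Demo on throwaway copies of the lines shown in this diff.
tmp=$(mktemp -d)
printf 'ARG FLASHINFER_VERSION=0.6.6\n' > "$tmp/Dockerfile"
printf 'flashinfer-python==0.6.6\nflashinfer-cubin==0.6.6\n' > "$tmp/cuda.txt"
check_flashinfer_pins "$tmp/Dockerfile" "$tmp/cuda.txt"
```

In a real checkout the two arguments would be `docker/Dockerfile` and `requirements/cuda.txt`. The check only compares strings and does not confirm that the pinned packages are published.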