OpenDAS / vllm_cscc

Commit 50df09fe (Unverified)
Authored Aug 20, 2025 by Michael Goin; committed by GitHub on Aug 20, 2025.
Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129)

Signed-off-by: mgoin <mgoin64@gmail.com>
Parent: 68fcd3fa
Showing 3 changed files with 35 additions and 21 deletions:

.buildkite/release-pipeline.yaml   +1   -1
docker/Dockerfile                  +33  -19
setup.py                           +1   -1
.buildkite/release-pipeline.yaml

```diff
@@ -68,7 +68,7 @@ steps:
       queue: cpu_queue_postmerge
     commands:
       - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
-      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.8.1 --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT --target vllm-openai --progress plain -f docker/Dockerfile ."
+      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.8.1 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT --target vllm-openai --progress plain -f docker/Dockerfile ."
       - "docker push public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT"
   - label: "Annotate release workflow"
```
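The only pipeline change is that the release build now passes `--build-arg FLASHINFER_AOT_COMPILE=true`. Any build that omits the argument falls back to the Dockerfile default, `false`. The following Python sketch models that resolution. The helpers `build_args_for`, `resolve_build_arg` and `to_argv` are hypothetical. The sketch also assumes that a non-release build passes the same other arguments as the release step and simply leaves out the AOT flag.

```python
# Hypothetical model of how FLASHINFER_AOT_COMPILE resolves for release vs.
# non-release image builds. Not part of vLLM or Buildkite.

# Default declared in docker/Dockerfile: ARG FLASHINFER_AOT_COMPILE=false
DOCKERFILE_DEFAULTS = {"FLASHINFER_AOT_COMPILE": "false"}

# Build args used by the release step in .buildkite/release-pipeline.yaml
COMMON_ARGS = {
    "max_jobs": "16",
    "USE_SCCACHE": "1",
    "GIT_REPO_CHECK": "1",
    "CUDA_VERSION": "12.8.1",
    "INSTALL_KV_CONNECTORS": "true",
}


def build_args_for(release: bool) -> dict:
    """Return the --build-arg values passed on the command line (assumed)."""
    args = dict(COMMON_ARGS)
    if release:
        # After this commit only the release pipeline opts in to AOT compilation.
        args["FLASHINFER_AOT_COMPILE"] = "true"
    return args


def resolve_build_arg(name: str, cli_args: dict) -> str:
    """A --build-arg value overrides the ARG default declared in the Dockerfile."""
    return cli_args.get(name, DOCKERFILE_DEFAULTS.get(name, ""))


def to_argv(cli_args: dict) -> list:
    """Render the args as `docker build` argument fragments."""
    argv = []
    for key, value in cli_args.items():
        argv += ["--build-arg", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    for release in (True, False):
        value = resolve_build_arg("FLASHINFER_AOT_COMPILE", build_args_for(release))
        print(f"release={release}: FLASHINFER_AOT_COMPILE={value}")
```

The default is placed in the Dockerfile rather than in every CI job. As a result, development and CI images get the faster JIT build without extra configuration, and only the release job opts in.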
docker/Dockerfile

```diff
@@ -372,31 +372,45 @@ RUN --mount=type=bind,from=build,src=/workspace/dist,target=/vllm-workspace/dist
 # Install FlashInfer from source
 ARG FLASHINFER_GIT_REPO="https://github.com/flashinfer-ai/flashinfer.git"
-# Keep this in sync with https://github.com/vllm-project/vllm/blob/main/requirements/cuda.txt
-# We use `--force-reinstall --no-deps` to avoid issues with the existing FlashInfer wheel.
-ARG FLASHINFER_GIT_REF="v0.2.11"
+# Keep this in sync with "flashinfer" extra in setup.py
+ARG FLASHINFER_GIT_REF="v0.2.12"
+# Flag to control whether to compile FlashInfer AOT kernels
+# Set to "true" to enable AOT compilation:
+#   docker build --build-arg FLASHINFER_AOT_COMPILE=true ...
+ARG FLASHINFER_AOT_COMPILE=false
 RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
     . /etc/environment
     git clone --depth 1 --recursive --shallow-submodules \
         --branch ${FLASHINFER_GIT_REF} \
         ${FLASHINFER_GIT_REPO} flashinfer
-    # Exclude CUDA arches for older versions (11.x and 12.0-12.7)
-    # TODO: Update this to allow setting TORCH_CUDA_ARCH_LIST as a build arg.
-    if [[ "${CUDA_VERSION}" == 11.* ]]; then
-        FI_TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9"
-    elif [[ "${CUDA_VERSION}" == 12.[0-7]* ]]; then
-        FI_TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a"
-    else
-        # CUDA 12.8+ supports 10.0a and 12.0
-        FI_TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a 12.0"
-    fi
-    echo "🏗️ Building FlashInfer for arches: ${FI_TORCH_CUDA_ARCH_LIST}"
-    # Needed to build AOT kernels
     pushd flashinfer
-        TORCH_CUDA_ARCH_LIST="${FI_TORCH_CUDA_ARCH_LIST}" \
-            python3 -m flashinfer.aot
-        TORCH_CUDA_ARCH_LIST="${FI_TORCH_CUDA_ARCH_LIST}" \
-            uv pip install --system --no-build-isolation --force-reinstall --no-deps .
+        if [ "${FLASHINFER_AOT_COMPILE}" = "true" ]; then
+            # Exclude CUDA arches for older versions (11.x and 12.0-12.7)
+            # TODO: Update this to allow setting TORCH_CUDA_ARCH_LIST as a build arg.
+            if [[ "${CUDA_VERSION}" == 11.* ]]; then
+                FI_TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9"
+            elif [[ "${CUDA_VERSION}" == 12.[0-7]* ]]; then
+                FI_TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a"
+            else
+                # CUDA 12.8+ supports 10.0a and 12.0
+                FI_TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a 12.0"
+            fi
+            echo "🏗️ Installing FlashInfer with AOT compilation for arches: ${FI_TORCH_CUDA_ARCH_LIST}"
+            # Build AOT kernels
+            TORCH_CUDA_ARCH_LIST="${FI_TORCH_CUDA_ARCH_LIST}" \
+                python3 -m flashinfer.aot
+            # Install with no-build-isolation since we already built AOT kernels
+            TORCH_CUDA_ARCH_LIST="${FI_TORCH_CUDA_ARCH_LIST}" \
+                uv pip install --system --no-build-isolation . \
+                --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')
+            # Download pre-compiled cubins
+            TORCH_CUDA_ARCH_LIST="${FI_TORCH_CUDA_ARCH_LIST}" \
+                python3 -m flashinfer --download-cubin || echo "WARNING: Failed to download flashinfer cubins."
+        else
+            echo "🏗️ Installing FlashInfer without AOT compilation in JIT mode"
+            uv pip install --system . \
+                --extra-index-url ${PYTORCH_CUDA_INDEX_BASE_URL}/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.')
+        fi
     popd
     rm -rf flashinfer
 BASH
```
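The new RUN step computes the architecture list only in the AOT branch. It also derives the PyTorch wheel index suffix from `CUDA_VERSION` in both branches. The following hypothetical Python mirror of that shell logic makes the branching easier to check. It uses `fnmatch.fnmatchcase` as a stand-in for bash `[[ == glob ]]` matching, which behaves the same for these simple patterns. The function names are illustrative, and the example base URL passed as `PYTORCH_CUDA_INDEX_BASE_URL` is an assumption.

```python
# Hypothetical Python mirror of the bash logic in the FlashInfer RUN step.
# Function names and the example base URL are illustrative assumptions.
from fnmatch import fnmatchcase


def fi_torch_cuda_arch_list(cuda_version: str) -> str:
    """Pick FI_TORCH_CUDA_ARCH_LIST the way the bash [[ == glob ]] tests do."""
    if fnmatchcase(cuda_version, "11.*"):
        return "7.5 8.0 8.9"
    if fnmatchcase(cuda_version, "12.[0-7]*"):
        return "7.5 8.0 8.9 9.0a"
    # CUDA 12.8+ (and any other version) also gets 10.0a and 12.0
    return "7.5 8.0 8.9 9.0a 10.0a 12.0"


def cuda_index_suffix(cuda_version: str) -> str:
    """Mirror cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.'), e.g. 12.8.1 -> cu128."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    return "cu" + major_minor.replace(".", "")


def install_steps(aot_compile: str, cuda_version: str, index_base_url: str) -> list:
    """List the commands the RUN step would execute for a given FLASHINFER_AOT_COMPILE."""
    index_url = f"{index_base_url}/{cuda_index_suffix(cuda_version)}"
    # `[ "${FLASHINFER_AOT_COMPILE}" = "true" ]` is an exact string comparison,
    # so values such as "True" or "1" fall through to the JIT branch.
    if aot_compile == "true":
        arches = fi_torch_cuda_arch_list(cuda_version)
        env = f'TORCH_CUDA_ARCH_LIST="{arches}"'
        return [
            f"{env} python3 -m flashinfer.aot",
            f"{env} uv pip install --system --no-build-isolation . "
            f"--extra-index-url {index_url}",
            # In the Dockerfile a failed cubin download only prints a warning.
            f"{env} python3 -m flashinfer --download-cubin",
        ]
    # JIT mode: no arch list is computed and no AOT kernels are built here.
    return [f"uv pip install --system . --extra-index-url {index_url}"]


if __name__ == "__main__":
    base = "https://download.pytorch.org/whl"  # assumed PYTORCH_CUDA_INDEX_BASE_URL
    for flag in ("true", "false"):
        print(flag, install_steps(flag, "12.8.1", base))
```

One consequence of the exact `= "true"` comparison: passing `--build-arg FLASHINFER_AOT_COMPILE=1` silently produces a JIT-mode image.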
setup.py

```diff
@@ -685,7 +685,7 @@ setup(
                          "mistral_common[audio]"],  # Required for audio processing
         "video": [],  # Kept for backwards compatibility
         # FlashInfer should be updated together with the Dockerfile
-        "flashinfer": ["flashinfer-python==0.2.11"],
+        "flashinfer": ["flashinfer-python==0.2.12"],
     },
     cmdclass=cmdclass,
     package_data=package_data,
```
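Both files now say that the FlashInfer version must be bumped together: the Dockerfile `FLASHINFER_GIT_REF` tag and the `flashinfer-python` pin in the `"flashinfer"` extra of setup.py. The following Python sketch shows one way a hypothetical pre-commit or CI check could enforce that. Nothing like it is part of this commit. The regexes assume the exact line shapes shown in the diffs above, and a leading `v` on the git tag is stripped before comparison.

```python
# Hypothetical consistency check (not part of vLLM): compare the Dockerfile's
# FLASHINFER_GIT_REF tag with the flashinfer-python pin in setup.py.
import re
from typing import Optional


def dockerfile_flashinfer_version(dockerfile_text: str) -> Optional[str]:
    """Return the version from ARG FLASHINFER_GIT_REF="vX.Y.Z", without the leading 'v'."""
    match = re.search(r'^ARG FLASHINFER_GIT_REF="v?([^"]+)"', dockerfile_text, re.MULTILINE)
    return match.group(1) if match else None


def setup_flashinfer_version(setup_text: str) -> Optional[str]:
    """Return the version from the "flashinfer-python==X.Y.Z" pin in setup.py."""
    match = re.search(r'"flashinfer-python==([^"]+)"', setup_text)
    return match.group(1) if match else None


def pins_match(dockerfile_text: str, setup_text: str) -> bool:
    """True only if both pins are present and name the same version."""
    docker_version = dockerfile_flashinfer_version(dockerfile_text)
    setup_version = setup_flashinfer_version(setup_text)
    return docker_version is not None and docker_version == setup_version


if __name__ == "__main__":
    # Lines taken from the two diffs in this commit.
    dockerfile = 'ARG FLASHINFER_GIT_REF="v0.2.12"\n'
    setup_new = '"flashinfer": ["flashinfer-python==0.2.12"],\n'
    setup_old = '"flashinfer": ["flashinfer-python==0.2.11"],\n'
    print(pins_match(dockerfile, setup_new))  # True
    print(pins_match(dockerfile, setup_old))  # False
```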