Unverified commit f7b3ccfc authored by Adit Ranadive, committed by GitHub

feat: Update container with better EFA/RDMA support (#1333)



rdma-core and libibverbs need to be reinstalled to use RDMA devices.
The Docker container can also be built against a recent version of UCX
for EFA support.
Signed-off-by: Adit Ranadive <aranadive@nvidia.com>
parent 1e628d5a
@@ -71,6 +71,9 @@ Notes about builds for specific frameworks:
- For specific details on the `--framework vllm` build, see [here](examples/llm/README.md).
- For specific details on the `--framework tensorrtllm` build, see [here](examples/tensorrt_llm/README.md).
Note about AWS environments:
- If deploying Dynamo in AWS, make sure to build the container with EFA support using the `--make-efa` flag.
After building, you can use this image by setting the `DYNAMO_IMAGE` environment variable to point to your built image:
```bash
export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm
@@ -72,15 +72,25 @@ RUN apt-get update -y && \
autoconf \
libtool
# These headers are missing with the hpcx installer, required
# by UCX to find RDMA devices
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
--reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev \
libnuma-dev librdmacm-dev ibverbs-providers
ARG NIXL_UCX_REF=v1.19.x
WORKDIR /workspace
### UCX EFA Setup ###
RUN rm -rf /opt/hpcx/ucx
RUN rm -rf /usr/local/ucx
RUN echo "Building UCX with reference $NIXL_UCX_REF"
RUN cd /usr/local/src && \
git clone https://github.com/openucx/ucx.git && \
cd ucx && \
git checkout v1.19.x && \
git checkout $NIXL_UCX_REF && \
./autogen.sh && ./configure \
--prefix=/usr/local/ucx \
--enable-shared \
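The comment in the Dockerfile says the reinstall step exists because the HPC-X installer leaves out headers that UCX needs in order to find RDMA devices. The following is a minimal sketch of a sanity check for that step and is not part of the commit. `check_rdma_headers` is a hypothetical helper. The header paths in the usage note are assumed to be the standard Ubuntu install locations for `libibverbs-dev`, `libibumad-dev`, `librdmacm-dev` and `libnuma-dev`, so confirm them against the actual base image.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print each file argument that
# does not exist, and return non-zero if any of them is missing.
# Uses only `local` beyond POSIX sh, so it also runs under dash.
check_rdma_headers() {
  local missing=0
  local header
  for header in "$@"; do
    if [ ! -f "$header" ]; then
      echo "missing: $header"
      missing=1
    fi
  done
  return "$missing"
}
```

As an extra step before `./autogen.sh`, a call such as `check_rdma_headers /usr/include/infiniband/verbs.h /usr/include/infiniband/umad.h /usr/include/rdma/rdma_cma.h /usr/include/numa.h || exit 1` would stop the build early rather than letting UCX configure without verbs support. Inside a running container, `ibv_devices` (from `ibverbs-utils`) and `ucx_info -d` are the usual ways to see which RDMA devices and UCX transports are actually visible.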
@@ -114,14 +124,14 @@ COPY --from=nixl_base /opt/nixl/commit.txt /opt/nixl/commit.txt
RUN if [ "$ARCH" = "arm64" ]; then \
cd /opt/nixl && \
mkdir build && \
-meson setup build/ --prefix=/usr/local/nixl -Dgds_path=/usr/local/cuda/targets/sbsa-linux && \
+meson setup build/ --buildtype=release --prefix=/usr/local/nixl -Dgds_path=/usr/local/cuda/targets/sbsa-linux && \
cd build/ && \
ninja && \
ninja install; \
else \
cd /opt/nixl && \
mkdir build && \
-meson setup build/ --prefix=/usr/local/nixl && \
+meson setup build/ --buildtype=release --prefix=/usr/local/nixl && \
cd build/ && \
ninja && \
ninja install; \
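In this hunk the two branches differ only in the `-Dgds_path` option used for arm64 (SBSA) images, and the commit adds `--buildtype=release` to both. Unless a project sets its own default, Meson builds with `buildtype=debug`, so this change asks for an optimized NIXL build. The sketch below shows one way the duplicated options could be assembled in a single place. The function `nixl_meson_args` is hypothetical and is not part of the commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble the `meson setup` options used above for NIXL.
nixl_meson_args() {
  local arch="$1"
  local args="--buildtype=release --prefix=/usr/local/nixl"
  # arm64 images point GPUDirect Storage at the SBSA CUDA target directory.
  if [ "$arch" = "arm64" ]; then
    args="$args -Dgds_path=/usr/local/cuda/targets/sbsa-linux"
  fi
  printf '%s\n' "$args"
}
```

The call site would be `meson setup build/ $(nixl_meson_args "$ARCH")`, left unquoted on purpose so that each option becomes a separate argument. That is safe here only because none of the paths contain spaces.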
@@ -112,6 +112,8 @@ SGLANG_BASE_IMAGE_TAG="25.01-cuda12.8-devel-ubuntu24.04"
NIXL_COMMIT=f531404be4866d85ed618b3baf4008c636798d63
NIXL_REPO=ai-dynamo/nixl.git
NIXL_UCX_EFA_REF=7ec95b95e524a87e81cac92f5ca8523e3966b16b
NO_CACHE=""
get_options() {
@@ -247,6 +249,9 @@ get_options() {
--release-build)
RELEASE_BUILD=true
;;
--make-efa)
NIXL_UCX_REF=$NIXL_UCX_EFA_REF
;;
--)
shift
break
@@ -345,6 +350,8 @@ show_help() {
echo " [--no-cache disable docker build cache]"
echo " [--dry-run print docker commands without running]"
echo " [--build-context name=path to add build context]"
echo " [--release-build perform a release build]"
echo " [--make-efa Enables EFA support for NIXL]"
exit 0
}
@@ -500,6 +507,10 @@ if [ ! -z ${RELEASE_BUILD} ]; then
BUILD_ARGS+=" --build-arg RELEASE_BUILD=${RELEASE_BUILD} "
fi
if [ -n "${NIXL_UCX_REF}" ]; then
BUILD_ARGS+=" --build-arg NIXL_UCX_REF=${NIXL_UCX_REF} "
fi
LATEST_TAG="--tag dynamo:latest-${FRAMEWORK,,}"
if [ -n "${TARGET}" ]; then
LATEST_TAG="${LATEST_TAG}-${TARGET}"
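Taken together, the build-script hunks route one flag into one Docker build argument. `--make-efa` sets `NIXL_UCX_REF` to the pinned `NIXL_UCX_EFA_REF` commit, and `NIXL_UCX_REF` is forwarded as `--build-arg NIXL_UCX_REF=...` only when it is non-empty. The following is a stripped-down, hypothetical model of that flow and is not the real `get_options`. The name `collect_build_args` is invented, only `--make-efa` and the `--` terminator are modeled, and all other options are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical model of the --make-efa plumbing shown in the hunks above.
NIXL_UCX_EFA_REF=7ec95b95e524a87e81cac92f5ca8523e3966b16b

collect_build_args() {
  # Start empty so the result depends only on the arguments, not the environment.
  local NIXL_UCX_REF=""
  local BUILD_ARGS=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --make-efa)
        # Pin UCX to the EFA-capable commit.
        NIXL_UCX_REF=$NIXL_UCX_EFA_REF
        ;;
      --)
        # Stop option parsing; anything after "--" is not interpreted.
        shift
        break
        ;;
    esac
    shift
  done
  # Forward the ref to `docker build` only when one was selected.
  if [ -n "${NIXL_UCX_REF}" ]; then
    BUILD_ARGS="${BUILD_ARGS} --build-arg NIXL_UCX_REF=${NIXL_UCX_REF} "
  fi
  printf '%s\n' "$BUILD_ARGS"
}
```

When no build argument is passed, Docker falls back to the Dockerfile's `ARG NIXL_UCX_REF=v1.19.x` default. In the real script, that fallback also depends on `NIXL_UCX_REF` not already being set in the caller's environment, because `build.sh` does not appear to reset it in the lines shown.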