Commit 0f9cb486 authored by chenpangpang

feat: merge dev

parents 66d1a57a 235feb09
......@@ -5,7 +5,7 @@
1. Prepare a bare-metal machine and install [nvidia-docker2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and git (a quick GPU sanity check is sketched right after this list)
2. Download the code and models needed for image verification (or copy them from 陈宜航) and place them under the project root directory
1. Download the code: `git clone http://developer.hpccube.com/codes/chenpangpang/gpu-base-image-test.git`
2. Download the models: `cd gpu-base-image-test && python hf_down.py`
2. Download the models (pytorch): `cd gpu-base-image-test/pytorch && python hf_down.py`
3. Confirm which image to build
- Image build progress: https://bvjoh3z2qoz.feishu.cn/base/BKl6birVbarmzJsnznkcEDFTnV9?table=tbl3bCdS7qfjPn6j&view=vewww0URg8
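
Before building, it is worth confirming that nvidia-docker2 can actually expose the GPU to containers. A minimal sanity check, assuming the devel base image used below can be pulled:
```bash
# Sanity check (illustrative, not part of the build scripts): nvidia-smi run inside a
# CUDA container should list the host GPUs if nvidia-docker2 is installed correctly.
docker run --rm --gpus all nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 nvidia-smi
```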
## Image Build
......@@ -20,23 +20,39 @@
- Parameter 2: output image name
- Parameter 3: base image
- Build images based on the [official nvidia images](https://hub.docker.com/r/nvidia/cuda)
```bash
cd build_space && \
./build_ubuntu.sh jupyterlab \
jupyterlab-pytorch:2.3.1-py3.8-cuda12.1-ubuntu22.04-devel \
nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 \
TORCH_VERSION="2.3.1" \
TORCHVISION_VERSION="0.18.1" \
TORCHAUDIO_VERSION="2.3.1" \
CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py38_22.11.1-1-Linux-x86_64.sh"
```
- Parameter 1: ide, no change needed
- Parameter 2: output image name
- Parameter 3: base image
- TORCH_VERSION: torch version
- TORCHVISION_VERSION: torchvision version
- TORCHAUDIO_VERSION: torchaudio version
- CONDA_URL: URL of the conda installer
- pytorch
```bash
cd build_space && \
./build_ubuntu.sh jupyterlab \
jupyterlab-pytorch:2.3.1-py3.8-cuda12.1-ubuntu22.04-devel \
nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 \
TORCH_VERSION="2.3.1" \
TORCHVISION_VERSION="0.18.1" \
TORCHAUDIO_VERSION="2.3.1" \
CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py38_22.11.1-1-Linux-x86_64.sh"
```
- Parameter 1: ide, no change needed
- Parameter 2: output image name
- Parameter 3: base image
- TORCH_VERSION: torch version
- TORCHVISION_VERSION: torchvision version
- TORCHAUDIO_VERSION: torchaudio version
- CONDA_URL: URL of the conda installer
- tensorflow
```bash
cd build_space && \
./build_ubuntu.sh jupyterlab \
jupyterlab-tensorflow:2.17.0-py3.11-cuda12.3-ubuntu22.04-devel \
nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 \
TENSORFLOW_VERSION="2.17.0" \
CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py311_24.7.1-0-Linux-x86_64.sh"
```
- Parameter 1: ide, no change needed
- Parameter 2: output image name
- Parameter 3: base image
- TENSORFLOW_VERSION: tensorflow version
- CONDA_URL: URL of the conda installer
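
The same script and argument pattern can be reused for other framework/CUDA combinations; only the version arguments, output tag, and base image change. The sketch below is illustrative only: the torch 2.1.2 / CUDA 11.8 pairing, the output image tag, and the Miniconda build are assumptions, not entries from the build-progress sheet, so check them before use.
```bash
# Hypothetical example (not from the progress sheet): torch 2.1.2 on a CUDA 11.8 devel base.
cd build_space && \
./build_ubuntu.sh jupyterlab \
jupyterlab-pytorch:2.1.2-py3.10-cuda11.8-ubuntu22.04-devel \
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 \
TORCH_VERSION="2.1.2" \
TORCHVISION_VERSION="0.16.2" \
TORCHAUDIO_VERSION="2.1.2" \
CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py310_24.7.1-0-Linux-x86_64.sh"
```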
### Related Links
- pytorch images (**choose the devel image**): https://hub.docker.com/r/pytorch/pytorch/tags
......@@ -73,7 +89,7 @@ torchvision version: 0.18.1
torchaudio version: 2.3.1
```
Confirm that the `reported version information` matches the `image name`, and confirm that `torch cuda` is available.<br>
2. Text generation verification: run `sh script/2_text_generate_test.sh $IMAGE_NAME`; output:
2. Text generation verification: run `sh script/2_text_test.sh $IMAGE_NAME`; output:
```
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Hello, I'm a language model, to be honest." (Hooker)
......@@ -81,7 +97,7 @@ torchaudio version: 2.3.1
"Let's start an internal test now, and then
```
Confirm that the `output` matches expectations.<br>
3. Image generation verification: run `sh script/3_image_generate_test.sh $IMAGE_NAME`; output:
3. Image generation verification: run `sh script/3_image_test.sh $IMAGE_NAME`; output:
```
==========
......
......@@ -4,11 +4,17 @@ FROM $BASE_IMAGE
ARG BASE_IMAGE
ARG DEBIAN_FRONTEND=noninteractive
LABEL module="jupyter"
# ----- torch args -----
# Whether the build is based on a torch image
ARG BASE_IMAGE_IS_TORCH=0
ARG TORCH_VERSION="2.0.1"
ARG TORCHVISION_VERSION="0.15.2"
ARG TORCHAUDIO_VERSION="2.0.2"
ARG TORCH_VERSION
ARG TORCHVISION_VERSION
ARG TORCHAUDIO_VERSION
# ----- tensorflow args -----
ARG TENSORFLOW_VERSION
ARG CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py310_24.7.1-0-Linux-x86_64.sh"
ARG SOURCES="-i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn"
ENV TZ=Asia/Shanghai
......@@ -54,15 +60,22 @@ RUN pip3 install --upgrade pip ${SOURCES} || pip install --upgrade pip ${SOURCES
&& mv /etc/apt/sources.list.bak /etc/apt/sources.list \
&& mv /etc/apt/sources.list.d.bak /etc/apt/sources.list.d
# A proxy is required when installing pytorch
#ENV http_proxy=http://ac19pn3az3:M36tPjtQ@10.21.131.1:3128/
#ENV https_proxy=http://ac19pn3az3:M36tPjtQ@10.21.131.1:3128/
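# If the base image does not already ship torch and TORCH_VERSION is set, install
# torch/torchvision/torchaudio from the PyTorch wheel index matching the base image's CUDA
# version; the awk expression derives the index suffix from $BASE_IMAGE,
# e.g. nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 -> "12.1.0" -> cu121.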
RUN if [ $BASE_IMAGE_IS_TORCH -eq 0 ];then \
RUN if [ "$BASE_IMAGE_IS_TORCH" -eq 0 ] && [ -n "$TORCH_VERSION" ]; then \
pip3 install torch==$TORCH_VERSION torchvision==$TORCHVISION_VERSION torchaudio==$TORCHAUDIO_VERSION \
--index-url https://download.pytorch.org/whl/cu$(echo "$BASE_IMAGE" | awk -F'[:-]' '{n=split($2,a,"."); print a[1] a[2]}') \
&& rm -r /root/.cache/pip; fi
RUN if [ -n "$TORCH_VERSION" ];then \
pip install --no-cache-dir transformers accelerate diffusers; fi
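# For tensorflow images: pin tensorflow[and-cuda] to the requested version, pin
# tensorflow-text / tf-models-official to the matching minor series (e.g. "2.17.0" -> "2.17.*"),
# then install the extra NVIDIA runtime libraries (TensorRT, nvJitLink, nvJPEG) via apt.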
RUN if [ -n "$TENSORFLOW_VERSION" ]; then \
tf_version_minor=$(echo $TENSORFLOW_VERSION | cut -d'.' -f1-2 ) && \
pip install --no-cache-dir tensorflow[and-cuda]==$TENSORFLOW_VERSION \
tensorflow-text==$tf_version_minor.* tf-models-official==$tf_version_minor.* && \
apt-get update -y && \
apt-get install --no-install-recommends -y libnvinfer8 libnvjitlink-12-3 libnvjpeg-12-3 libnvinfer-plugin8; fi
COPY ./python-requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/python-requirements.txt
......
......@@ -2,9 +2,4 @@ setuptools
ipywidgets
wheel
matplotlib
transformers
git-lfs
accelerate
diffusers
datasets
hf_transfer
git-lfs
\ No newline at end of file
#!/bin/bash
docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
# Check that an input argument was provided
if [ -z "$1" ]; then
echo "please set input image"
exit 1
fi
# Check whether the first argument contains the string "pytorch"
if [[ "$1" == *"pytorch"* ]]; then
docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
"import os; \
os.system(\"cat /etc/issue\"); \
import sys; \
......@@ -14,4 +22,23 @@ docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
print(\"torchvision version: \", torchvision.__version__); \
import torchaudio; \
print(\"torchaudio version: \", torchaudio.__version__);
"
\ No newline at end of file
"
elif [[ "$1" == *"tensorflow"* ]]; then
docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
"import os; \
os.system(\"cat /etc/issue\"); \
import sys; \
print(\"python version: \", sys.version); \
import tensorflow as tf; \
print(\"tensorflow version: \", tf.__version__); \
print(\"tensorflow cuda available: \", tf.test.is_gpu_available()); \
os.system('nvcc -V | tail -n 2')
"
else
echo "ERROR: no supported test shell"
exit 1
fi
#!/bin/bash
TARGET_DIR=gpu-base-image-test
docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/gpt2 $1 python infer.py
\ No newline at end of file
#!/bin/bash
TARGET_DIR=gpu-base-image-test
# Check that an input argument was provided
if [ -z "$1" ]; then
echo "please set input image"
exit 1
fi
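# Mount the test repo into the container and run the text-generation demo matching the image:
# gpt2 for pytorch images, bert inference for tensorflow images.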
if [[ "$1" == *"pytorch"* ]]; then \
docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/pytorch/gpt2 $1 python infer.py; fi
if [[ "$1" == *"tensorflow"* ]]; then \
docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/tensorflow/bert $1 python infer.py; fi
\ No newline at end of file
#!/bin/bash
TARGET_DIR=gpu-base-image-test
docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/stable-diffusion-v1-4 $1 python infer.py
\ No newline at end of file
#!/bin/bash
TARGET_DIR=gpu-base-image-test
# Check that an input argument was provided
if [ -z "$1" ]; then
echo "please set input image"
exit 1
fi
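# Mount the test repo and run the demo matching the image: stable-diffusion-v1-4 image
# generation for pytorch images, mnist training for tensorflow images.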
if [[ "$1" == *"pytorch"* ]]; then \
docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/pytorch/stable-diffusion-v1-4 $1 python infer.py; fi
if [[ "$1" == *"tensorflow"* ]]; then \
docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/tensorflow/mnist $1 python train.py; fi