chenpangpang / gpu-base-image-build
Commit 0f9cb486 authored Oct 12, 2024 by chenpangpang
feat: merge dev
parents 66d1a57a 235feb09
Showing 8 changed files with 112 additions and 41 deletions
README.md +36 -20
build_space/Dockerfile.jupyterlab_ubuntu +20 -7
build_space/python-requirements.txt +1 -6
script/1_base_test.sh +29 -2
script/2_text_generate_test.sh +0 -3
script/2_text_test.sh +13 -0
script/3_image_generate_test.sh +0 -3
script/3_image_test.sh +13 -0
README.md
@@ -5,7 +5,7 @@
 1. Prepare a bare-metal machine and install [nvidia-docker2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and git
 2. Download the code and models needed for image verification (or copy them from 陈宜航) and place them in the project root
    1. Download the code: `git clone http://developer.hpccube.com/codes/chenpangpang/gpu-base-image-test.git`
-   2. Download the models: `cd gpu-base-image-test && python hf_down.py`
+   2. Download the models (pytorch): `cd gpu-base-image-test/pytorch && python hf_down.py`
 3. Confirm which image to build
    - Image build progress: https://bvjoh3z2qoz.feishu.cn/base/BKl6birVbarmzJsnznkcEDFTnV9?table=tbl3bCdS7qfjPn6j&view=vewww0URg8
 ## Image build
...
...
@@ -20,23 +20,39 @@
 - Argument 2: output image name
 - Argument 3: base image
 - Build the image from the [official nvidia image](https://hub.docker.com/r/nvidia/cuda)
-```bash
-cd build_space && \
-./build_ubuntu.sh jupyterlab \
-juypterlab-pytorch:2.3.1-py3.8-cuda12.1-ubuntu22.04-devel \
-nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 \
-TORCH_VERSION="2.3.1" \
-TORCHVISION_VERSION="0.18.1" \
-TORCHAUDIO_VERSION="2.3.1" \
-CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py38_22.11.1-1-Linux-x86_64.sh"
-```
-- Argument 1: ide, no change needed
-- Argument 2: output image name
-- Argument 3: base image
-- TORCH_VERSION: torch version
-- TORCHVISION_VERSION: torchvision version
-- TORCHAUDIO_VERSION: torchaudio version
-- CONDA_URL: URL of the conda installer
+  - pytorch
+    ```bash
+    cd build_space && \
+    ./build_ubuntu.sh jupyterlab \
+    juypterlab-pytorch:2.3.1-py3.8-cuda12.1-ubuntu22.04-devel \
+    nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 \
+    TORCH_VERSION="2.3.1" \
+    TORCHVISION_VERSION="0.18.1" \
+    TORCHAUDIO_VERSION="2.3.1" \
+    CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py38_22.11.1-1-Linux-x86_64.sh"
+    ```
+    - Argument 1: ide, no change needed
+    - Argument 2: output image name
+    - Argument 3: base image
+    - TORCH_VERSION: torch version
+    - TORCHVISION_VERSION: torchvision version
+    - TORCHAUDIO_VERSION: torchaudio version
+    - CONDA_URL: URL of the conda installer
+  - tensorflow
+    ```bash
+    cd build_space && \
+    ./build_ubuntu.sh jupyterlab \
+    jupyterlab-tensorflow:2.17.0-py3.11-cuda12.3-ubuntu22.04-devel \
+    nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 \
+    TENSORFLOW_VERSION="2.17.0" \
+    CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py311_24.7.1-0-Linux-x86_64.sh"
+    ```
+    - Argument 1: ide, no change needed
+    - Argument 2: output image name
+    - Argument 3: base image
+    - TENSORFLOW_VERSION: tensorflow version
+    - CONDA_URL: URL of the conda installer
 ### Related links
 - pytorch images (**choose a devel image**): https://hub.docker.com/r/pytorch/pytorch/tags
...
...
@@ -73,7 +89,7 @@ torchvision version: 0.18.1
 torchaudio version: 2.3.1
 ```
 Confirm that the `printed version information` matches the `image name`, and that `torch cuda` is available.<br>
-2. Text generation check: run `sh script/2_text_generate_test.sh $IMAGE_NAME`; output:
+2. Text generation check: run `sh script/2_text_test.sh $IMAGE_NAME`; output:
 ```
 Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 Hello, I'm a language model, to be honest." (Hooker)
...
...
@@ -81,7 +97,7 @@ torchaudio version: 2.3.1
 "Let's start an internal test now, and then
 ```
 Confirm that the `output` is as expected.<br>
-3. Image generation check: run `sh script/3_image_generate_test.sh $IMAGE_NAME`; output:
+3. Image generation check: run `sh script/3_image_test.sh $IMAGE_NAME`; output:
 ```
 ==========
...
...
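In both README build examples the Python version in the output image name (`py3.8`, `py3.11`) lines up with the `py38` / `py311` token in the Miniconda installer named by `CONDA_URL`. The sketch below extracts that version from the URL so the two can be compared before building. The helper name `conda_py_version` is hypothetical and not part of the repository, and the correspondence itself is inferred from the two examples rather than stated in the README.

```shell
# Hypothetical helper (not in the repository): print the Python version
# encoded in a Miniconda installer file name, e.g. py38 -> 3.8.
conda_py_version() {
    local tag
    # Keep only the digits of the "pyNNN" token after "Miniconda3-".
    tag=$(basename "$1" | sed -n 's/^Miniconda3-py\([0-9][0-9]*\)_.*/\1/p')
    [ -n "$tag" ] || return 1
    # The first digit is the major version, the remaining digits the minor version.
    echo "$(echo "$tag" | cut -c1).$(echo "$tag" | cut -c2-)"
}

conda_py_version "https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py38_22.11.1-1-Linux-x86_64.sh"
conda_py_version "https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py311_24.7.1-0-Linux-x86_64.sh"
```

The two calls print `3.8` and `3.11`, which match the `py3.8` and `py3.11` parts of the example image names.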
build_space/Dockerfile.jupyterlab_ubuntu
@@ -4,11 +4,17 @@ FROM $BASE_IMAGE
 ARG BASE_IMAGE
 ARG DEBIAN_FRONTEND=noninteractive
 LABEL module="jupyter"
+# ----- torch args -----
 # Whether the build is based on a torch image
 ARG BASE_IMAGE_IS_TORCH=0
-ARG TORCH_VERSION="2.0.1"
-ARG TORCHVISION_VERSION="0.15.2"
-ARG TORCHAUDIO_VERSION="2.0.2"
+ARG TORCH_VERSION
+ARG TORCHVISION_VERSION
+ARG TORCHAUDIO_VERSION
+# ----- tensorflow args -----
+ARG TENSORFLOW_VERSION
 ARG CONDA_URL="https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py310_24.7.1-0-Linux-x86_64.sh"
 ARG SOURCES="-i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn"
 ENV TZ=Asia/Shanghai
...
...
@@ -54,15 +60,22 @@ RUN pip3 install --upgrade pip ${SOURCES} || pip install --upgrade pip ${SOURCES
     && mv /etc/apt/sources.list.bak /etc/apt/sources.list \
     && mv /etc/apt/sources.list.d.bak /etc/apt/sources.list.d
 # Installing pytorch requires a proxy to be set
 #ENV http_proxy=http://ac19pn3az3:M36tPjtQ@10.21.131.1:3128/
 #ENV https_proxy=http://ac19pn3az3:M36tPjtQ@10.21.131.1:3128/
-RUN if [ $BASE_IMAGE_IS_TORCH -eq 0 ];then \
+RUN if [ $BASE_IMAGE_IS_TORCH -eq 0 ] && [ -n "$TORCH_VERSION" ];then \
     pip3 install torch==$TORCH_VERSION torchvision==$TORCHVISION_VERSION torchaudio==$TORCHAUDIO_VERSION \
     --index-url https://download.pytorch.org/whl/cu$(echo "$BASE_IMAGE" | awk -F'[:-]' '{n=split($2,a,"."); print a[1] a[2]}') \
     && rm -r /root/.cache/pip; fi
+RUN if [ -n "$TORCH_VERSION" ];then \
+    pip install --no-cache-dir transformers accelerate diffusers; fi
+RUN if [ -n "$TENSORFLOW_VERSION" ]; then \
+    tf_version_minor=$(echo $TENSORFLOW_VERSION | cut -d'.' -f1-2 ) && \
+    pip install --no-cache-dir tensorflow[and-cuda]==$TENSORFLOW_VERSION \
+    tensorflow-text==$tf_version_minor.* tf-models-official==$tf_version_minor.* && \
+    apt-get update -y && \
+    apt-get install --no-install-recommends -y libnvinfer8 libnvjitlink-12-3 libnvjpeg-12-3 libnvinfer-plugin8; fi
 COPY ./python-requirements.txt /tmp/
 RUN pip install --no-cache-dir -r /tmp/python-requirements.txt
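The `--index-url` in the torch install step is derived from `BASE_IMAGE` with an `awk` one-liner. The sketch below wraps the same expression in a function so its behaviour can be checked outside a build. The function name `cuda_wheel_tag` is hypothetical; the `awk` program is the one from the Dockerfile with the `cu` prefix moved inside it.

```shell
# Hypothetical wrapper around the awk expression used in the Dockerfile.
cuda_wheel_tag() {
    # Split "repo:MAJOR.MINOR.PATCH-rest" on ':' and '-', then join
    # MAJOR and MINOR of the second field behind a "cu" prefix.
    echo "$1" | awk -F'[:-]' '{n=split($2,a,"."); print "cu" a[1] a[2]}'
}

cuda_wheel_tag "nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04"
cuda_wheel_tag "nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04"
```

These print `cu121` and `cu123`. Because the field separator also matches `-` and `:` in front of the tag, the expression presumably only works for base image names shaped like `nvidia/cuda:<version>-...`; a repository name containing a hyphen, or a registry host with a port, would shift the fields.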
...
...
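The torch install step is guarded by two conditions: the base image is not already a torch image, and `TORCH_VERSION` is set. With the single-bracket `[` test, `&&` is not an operator inside the brackets, so two conditions are usually written as two tests joined by `&&`. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: succeed only when torch should be installed.
should_install_torch() {
    local base_is_torch="$1" torch_version="$2"
    # Two separate [ ] tests; the shell's && joins their exit statuses.
    [ "$base_is_torch" -eq 0 ] && [ -n "$torch_version" ]
}

if should_install_torch 0 "2.3.1"; then echo "install torch"; fi
if ! should_install_torch 0 ""; then echo "skip torch (no TORCH_VERSION)"; fi
if ! should_install_torch 1 "2.3.1"; then echo "skip torch (base image already has it)"; fi
```

Quoting the variables keeps the test well-formed when a build argument is empty, which is the normal case for `TORCH_VERSION` in a tensorflow build.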
build_space/python-requirements.txt
@@ -2,9 +2,4 @@ setuptools
 ipywidgets
 wheel
 matplotlib
-transformers
-git-lfs
-accelerate
-diffusers
-datasets
-hf_transfer
+git-lfs
\ No newline at end of file
script/1_base_test.sh
 #!/bin/bash
-docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
+# Check whether an input argument was provided
+if [ -z "$1" ]; then
+    echo "please set input image"
+    exit 1
+fi
+# Check whether the first argument contains the string "pytorch"
+if [[ "$1" == *"pytorch"* ]]; then
+    docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
 "import os; \
 os.system(\"cat /etc/issue\"); \
 import sys; \
...
...
@@ -14,4 +22,23 @@ docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
 print(\"torchvision version:\", torchvision.__version__); \
 import torchaudio; \
 print(\"torchaudio version:\", torchaudio.__version__);
-"
\ No newline at end of file
+"
+elif [[ "$1" == *"tensorflow"* ]]; then
+    docker run --rm --platform=linux/amd64 --gpus all $1 python -c \
+"import os; \
+os.system(\"cat /etc/issue\"); \
+import sys; \
+print(\"python version:\", sys.version); \
+import tensorflow as tf; \
+print(\"tensorflow version:\", tf.__version__); \
+print(\"tensorflow cuda available:\", tf.test.is_gpu_available()); \
+os.system('nvcc -V | tail -n 2')
+"
+else
+    echo "ERROR: no supported test shell"
+    exit 1
+fi
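The script above picks its test by looking for `pytorch` or `tensorflow` in the image name. The same dispatch can be sketched with a `case` statement, which keeps the three outcomes (missing argument, known framework, unsupported image) in one place. The function name `detect_framework` is hypothetical; the messages are the ones the script prints.

```shell
# Hypothetical helper: map an image name to the framework it is tested as.
detect_framework() {
    case "$1" in
        "")           echo "please set input image" >&2; return 1 ;;
        *pytorch*)    echo "pytorch" ;;
        *tensorflow*) echo "tensorflow" ;;
        *)            echo "ERROR: no supported test shell" >&2; return 1 ;;
    esac
}

detect_framework "juypterlab-pytorch:2.3.1-py3.8-cuda12.1-ubuntu22.04-devel"
detect_framework "jupyterlab-tensorflow:2.17.0-py3.11-cuda12.3-ubuntu22.04-devel"
```

The two calls print `pytorch` and `tensorflow`. An image name containing neither word makes the function return a non-zero status, mirroring the `else` branch of the script.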
script/2_text_generate_test.sh (deleted, 100644 → 0)
-#!/bin/bash
-TARGET_DIR=gpu-base-image-test
-docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/gpt2 $1 python infer.py
\ No newline at end of file
script/2_text_test.sh (new file, 0 → 100644)
+#!/bin/bash
+TARGET_DIR=gpu-base-image-test
+# Check whether an input argument was provided
+if [ -z "$1" ]; then
+    echo "please set input image"
+    exit 1
+fi
+if [[ "$1" == *"pytorch"* ]]; then \
+    docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/pytorch/gpt2 $1 python infer.py;
+fi
+if [[ "$1" == *"tensorflow"* ]]; then \
+    docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/tensorflow/bert $1 python infer.py;
+fi
\ No newline at end of file
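In `script/2_text_test.sh` the two `if` blocks differ only in the `--workdir` value, and an image name that matches neither pattern falls through without an error. The sketch below separates the lookup from the `docker run` call and prints the command instead of running it, so it can be tried without a GPU. The function names are hypothetical, and mounting `$PWD/gpu-base-image-test` as an absolute path is an assumption made here, not what the script does (it mounts `./$TARGET_DIR`).

```shell
# Hypothetical helper: working directory used for the text test of an image.
text_test_workdir() {
    case "$1" in
        *pytorch*)    echo "/workspace/pytorch/gpt2" ;;
        *tensorflow*) echo "/workspace/tensorflow/bert" ;;
        *)            return 1 ;;
    esac
}

# Hypothetical helper: print (do not run) the docker command for the text test.
print_text_test_cmd() {
    local image="$1" workdir
    workdir=$(text_test_workdir "$image") || {
        echo "unsupported image: $image" >&2
        return 1
    }
    echo "docker run --rm --platform=linux/amd64 --gpus all" \
         "-v $PWD/gpu-base-image-test:/workspace --workdir $workdir $image python infer.py"
}

print_text_test_cmd "jupyterlab-tensorflow:2.17.0-py3.11-cuda12.3-ubuntu22.04-devel"
```

Returning a non-zero status for an unsupported image makes a mistyped image name visible to the caller.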
script/3_image_generate_test.sh (deleted, 100644 → 0)
-#!/bin/bash
-TARGET_DIR=gpu-base-image-test
-docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/stable-diffusion-v1-4 $1 python infer.py
\ No newline at end of file
script/3_image_test.sh (new file, 0 → 100644)
+#!/bin/bash
+TARGET_DIR=gpu-base-image-test
+# Check whether an input argument was provided
+if [ -z "$1" ]; then
+    echo "please set input image"
+    exit 1
+fi
+if [[ "$1" == *"pytorch"* ]]; then \
+    docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/pytorch/stable-diffusion-v1-4 $1 python infer.py;
+fi
+if [[ "$1" == *"tensorflow"* ]]; then \
+    docker run --rm --platform=linux/amd64 --gpus all -v ./$TARGET_DIR:/workspace --workdir /workspace/tensorflow/mnist $1 python train.py;
+fi
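After this commit the three checks share one calling convention: each script takes the image name as its only argument. A small runner can therefore call them in order and stop at the first failure. This is a sketch only: `run_all_checks` and the `DRY_RUN` switch are hypothetical and not part of the repository. It invokes the scripts with `bash` because they use `[[ ... ]]`, which some `sh` implementations (dash, for example) do not support.

```shell
# Hypothetical runner: execute the three verification scripts in order.
# With DRY_RUN=1 it only prints the commands it would run.
run_all_checks() {
    local image="$1" script
    [ -n "$image" ] || { echo "please set input image" >&2; return 1; }
    for script in script/1_base_test.sh script/2_text_test.sh script/3_image_test.sh; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "bash $script $image"
        else
            # Stop at the first script that exits with a non-zero status.
            bash "$script" "$image" || { echo "FAILED: $script" >&2; return 1; }
        fi
    done
}

DRY_RUN=1 run_all_checks "jupyterlab-tensorflow:2.17.0-py3.11-cuda12.3-ubuntu22.04-devel"
```

Note that `script/2_text_test.sh` and `script/3_image_test.sh` exit with status 0 when the image name matches neither framework, so the runner only catches failures those scripts report themselves.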