Commit cf66c525 authored by qianyj

update some TF file

parent 6b6f8b0c
Code for training image-classification networks with the TensorFlow framework. This is TensorFlow's official benchmark program, and the dataset used is ImageNet.
# Running the tests
- The test code has two parts: a basic performance test and a large-scale performance test.
## Basic benchmark
- After setting up the TensorFlow runtime environment, take the resnet50 network as an example and measure its performance at each precision with batch_size=128 and num_gpus=1.
### fp32 train
```bash
python3 ./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
  --data_format=NCHW --batch_size=128 --model=resnet50 --save_model_steps=10020 \
  --optimizer=momentum --variable_update=parameter_server --print_training_accuracy=true \
  --eval_during_training_every_n_epochs=1 --nodistortions --num_gpus=1 --num_epochs=90 \
  --weight_decay=1e-4 --data_dir=$data_dir_path --use_fp16=False \
  --data_name=imagenet --train_dir=$save_checkpoint_path
```
### fp16 train
```bash
python3 ./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
  --data_format=NCHW --batch_size=128 --model=resnet50 --save_model_steps=10020 \
  --optimizer=momentum --variable_update=parameter_server --print_training_accuracy=true \
  --eval_during_training_every_n_epochs=1 --nodistortions --num_gpus=1 --num_epochs=90 \
  --weight_decay=1e-4 --data_dir=$data_dir_path --use_fp16=True \
  --data_name=imagenet --train_dir=$save_checkpoint_path
```
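As a sanity check on the flags above: with `--batch_size=128` and the commonly cited ImageNet-1k train-split size of 1,281,167 images (an assumption, not a number taken from this repo), one epoch is about 10,010 steps, so `--save_model_steps=10020` checkpoints roughly once per epoch and 90 epochs is roughly 900,900 steps:

```python
import math

IMAGENET_TRAIN_IMAGES = 1_281_167  # assumed ImageNet-1k train-split size
BATCH_SIZE = 128                   # matches --batch_size above
NUM_EPOCHS = 90                    # matches --num_epochs above

steps_per_epoch = math.ceil(IMAGENET_TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

print(steps_per_epoch)  # 10010, close to --save_model_steps=10020
print(total_steps)      # 900900
```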
## Large-scale test
### Single GPU
```bash
HIP_VISIBLE_DEVICES=0 python3 ./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
  --data_format=NCHW --batch_size=128 --model=resnet50 --save_model_steps=10020 \
  --optimizer=momentum --variable_update=parameter_server --print_training_accuracy=true \
  --eval_during_training_every_n_epochs=1 --nodistortions --num_gpus=1 --num_epochs=90 \
  --weight_decay=1e-4 --data_dir=$data_dir_path --use_fp16=False \
  --data_name=imagenet --train_dir=$save_checkpoint_path
```
### Multiple GPUs
```bash
mpirun -np ${num_gpu} --hostfile hostfile --bind-to none scripts-run/single_process.sh
```
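Each MPI rank typically drives one GPU with the same per-GPU batch size, so the effective global batch size grows linearly with `num_gpu`, while real throughput scales somewhat sublinearly. A minimal sketch of that arithmetic (the throughput and efficiency numbers below are hypothetical placeholders, not measured results):

```python
def scaling_estimate(per_gpu_batch, num_gpus, single_gpu_imgs_per_sec, efficiency=1.0):
    """Estimate the global batch size and throughput of a multi-GPU run,
    given a single-GPU throughput and an assumed scaling efficiency."""
    global_batch = per_gpu_batch * num_gpus
    throughput = single_gpu_imgs_per_sec * num_gpus * efficiency
    return global_batch, throughput

# Hypothetical numbers for illustration only.
gb, tp = scaling_estimate(per_gpu_batch=128, num_gpus=4,
                          single_gpu_imgs_per_sec=300.0, efficiency=0.9)
print(gb)  # 512
print(tp)  # 1080.0
```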
# References
- https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks
- https://github.com/horovod/horovod
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
*.sw[op]
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# IPython Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# dotenv
.env
# virtualenv
venv/
ENV/
# Spyder project settings
.spyderproject
# Rope project settings
.ropeproject
# PyCharm
.idea/
# For mac
.DS_Store
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright {yyyy} {name of copyright owner}
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Table of Contents
=================
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Executing tests](#executing-tests)
* [PerfZero on private GCE instance](#perfzero-on-private-gce-instance)
* [Step one: Create GCE Instance](#step-one-create-gce-instance)
* [Step two: Build docker on GCE instance](#step-two-build-docker-on-gce-instance)
* [Step three: Start and "enter" the docker instance](#step-three-start-and-enter-the-docker-instance)
* [Step four: Run tests](#step-four-run-tests)
* [Step five: Delete the instance when done](#step-five-delete-the-instance-when-done)
* [PerfZero on local workstation or any server](#perfzero-on-local-workstation-or-any-server)
* [PerfZero without docker](#perfzero-without-docker)
* [Creating tests](#creating-tests)
* [Deep dive into individual tools](#deep-dive-into-individual-tools)
* [Build docker image](#build-docker-image)
* [Run benchmark](#run-benchmark)
* [Instructions for managing Google Cloud Platform computing instance](#instructions-for-managing-google-cloud-platform-computing-instance)
* [Understand the benchmark execution output](#understand-the-benchmark-execution-output)
* [Json formatted benchmark summary](#json-formatted-benchmark-summary)
* [Profiling](#profiling)
* [Visualize in TensorBoard](#visualize-in-tensorboard)
* [Visualize system metric values over time](#visualize-system-metric-values-over-time)
* [PerfZero development](#perfzero-development)
# Introduction
PerfZero is a benchmark framework for TensorFlow. It is intended to address the
following use cases:
1) For users who want to execute a TensorFlow test to debug a performance
regression.
PerfZero makes it easy to execute a pre-defined test by consolidating the
docker image build, GPU driver installation, TensorFlow installation, benchmark
library checkout, data download, system statistics collection, benchmark metrics
collection, profiler data collection and so on into two or three commands. This allows
developers to focus on investigating the issue rather than on setting up the test
environment.
2) For users who want to track TensorFlow performance changes across a
variety of setups (e.g. GPU model, cuDNN version, TensorFlow version).
Developers can set up periodic jobs to execute these benchmark methods using
PerfZero. PerfZero will collect the information needed to identify the
benchmark (e.g. GPU model, TensorFlow version, dependent library git hash), get
the benchmark execution result (e.g. wall time, accuracy, whether it succeeded),
summarize the result in an easy-to-read JSON string, and upload the result to a
BigQuery table. Using the data in the BigQuery table, users can then visualize
performance changes in a dashboard, compare performance between different
setups in a table, and trigger an alert when there is a performance regression.
# Executing tests
There are multiple ways to use PerfZero to execute a test, listed from highest
to lowest abstraction:
* [PerfZero on private GCE instance](#perfzero-on-private-gce-instance)
* [PerfZero on local workstation or any server](#perfzero-on-local-workstation-or-any-server)
* [PerfZero without docker](#perfzero-without-docker)
## PerfZero on private GCE instance
There are many variations on this approach; to get you started quickly, the steps
below detail setting up an 8xV100 instance with local NVMe drives where the training
data is stored. The only drawback of this setup is that, because of the local NVMe
drives, the instance cannot be stopped and can only be deleted.
### Step one: Create GCE Instance
The command below creates an 8xV100 instance with 4 NVMe drives. Its output
provides the command to run to SSH into the machine. To set the project, zone, and
other options, read the
[cloud_manager tool details](https://github.com/tensorflow/benchmarks/tree/master/perfzero#instructions-for-managing-google-cloud-platform-computing-instance).
```bash
python perfzero/lib/cloud_manager.py create --accelerator_count 8 --nvme_count 4
```
### Step two: Build docker on GCE instance
After logging into the instance, run the following command to build a docker
image with the latest nightly TF 2.0 build. For more options, read the
[build docker image section](https://github.com/tensorflow/benchmarks/tree/master/perfzero#build-docker-image).
```bash
python3 perfzero/lib/setup.py --dockerfile_path=docker/Dockerfile_ubuntu_1804_tf_v2
```
For all options for building the docker image, including controlling the version
of TensorFlow installed, check out the public
[README for PerfZero](https://github.com/tensorflow/benchmarks/tree/master/perfzero#build-docker-image).
### Step three: Start and "enter" the docker instance
```bash
nvidia-docker run -it --rm -v $(pwd):/workspace -v /data:/data perfzero/tensorflow bash
```
### Step four: Run tests
The command below pulls the GitHub official/models repository, downloads the cifar-10 dataset
from our internal Google Cloud Storage bucket, and executes a ResNet56 benchmark
with the TensorFlow 2.0 nightly build. For info on the args, read the
[run benchmark section](https://github.com/tensorflow/benchmarks/tree/master/perfzero#run-benchmark).
```bash
python3 /workspace/perfzero/lib/benchmark.py \
--git_repos="https://github.com/tensorflow/models.git;benchmark" \
--python_path=models \
--data_downloads="gs://tf-perf-imagenet-uswest1/tensorflow/cifar10_data/cifar-10-batches-bin" \
--benchmark_methods=official.benchmark.keras_cifar_benchmark.Resnet56KerasBenchmarkReal.benchmark_1_gpu_no_dist_strat
```
For all options that can be used when executing a test, check out the public
[README for PerfZero](https://github.com/tensorflow/benchmarks/tree/master/perfzero#run-benchmark).
### Step five: Delete the instance when done
```bash
python perfzero/lib/cloud_manager.py delete
```
## PerfZero on local workstation or any server
This approach is the same as PerfZero on a private GCE instance; just jump to
Step two: Build docker on GCE instance.
If the workstation does not have access to the PerfZero GCS bucket and does not
need access, e.g. data is already copied locally via another method, passing
`--gcloud_key_file_url=""` will skip attempting to download the key.
A quick test that does not require accessing GCS for data is:
```bash
python3 /workspace/perfzero/lib/benchmark.py \
--git_repos="https://github.com/tensorflow/models.git;benchmark" \
--python_path=models \
--gcloud_key_file_url="" \
--benchmark_methods=official.benchmark.keras_cifar_benchmark.Resnet56KerasBenchmarkSynth.benchmark_1_gpu_no_dist_strat
```
## PerfZero without docker
PerfZero is not dependent on Docker; Docker is used to handle dependencies and
create a consistent environment. Most tests do not require much beyond
TensorFlow, and PerfZero depends on Google Cloud only for downloading data and
uploading results, both of which are optional. While this approach works, we do not
maintain a definitive list of the required libraries; the Docker build files are a
good starting point for determining them.
Once the requirements are met, the command below can be executed. It pulls the
GitHub official/models repository, downloads the cifar-10 dataset from our internal
Google Cloud Storage bucket, and executes a ResNet50 estimator benchmark with the
TensorFlow 2.0 nightly build.
```bash
python3 /workspace/perfzero/lib/benchmark.py \
--git_repos="https://github.com/tensorflow/models.git;benchmark" \
--python_path=models \
--data_downloads="gs://tf-perf-imagenet-uswest1/tensorflow/cifar10_data/cifar-10-batches-bin" \
--benchmark_methods=official.r1.resnet.estimator_benchmark.Resnet50EstimatorBenchmarkReal.benchmark_graph_1_gpu
```
# Creating tests
Here are the instructions that developers of benchmark methods need to follow in
order to run their benchmark methods in PerfZero. See
[estimator_benchmark.py](https://github.com/tensorflow/models/blob/master/official/r1/resnet/estimator_benchmark.py)
for example test code that supports PerfZero.
1) The benchmark class should extend the TensorFlow python class
[tensorflow.test.Benchmark](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/platform/benchmark.py). It should have a
constructor with the signature `__init__(self, output_dir, root_data_dir, **kwargs)`.
Below is the usage of each argument:
- The benchmark method should put all generated files (e.g. logs) in `output_dir` so that PerfZero can
upload these files to Google Cloud Storage when `--output_gcs_url` is specified.
- The benchmark method should read data from `root_data_dir`, e.g. from `${root_data_dir}/cifar-10-binary`.
- `**kwargs` keeps the benchmark constructor forward compatible if PerfZero starts providing additional
named arguments before the benchmark class is updated.
2) At the end of the benchmark method execution, the method should call [report_benchmark()](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/platform/benchmark.py)
with the following parameters:
```
tf.test.Benchmark.report_benchmark(
iters=num_iteration, # The number of iterations of the benchmark.
wall_time=wall_time_sec, # Total wall time in sec for all iterations.
metrics = [ # List of metric entries
{
'name': 'accuracy_top_5', # Metric name
'value': 80, # Metric value
'min_value': 90, # Optional. Minimum acceptable metric value for the benchmark to succeed.
'max_value': 99 # Optional. Maximum acceptable metric value for the benchmark to succeed.
},
{
'name': 'accuracy_top_1',
'value': 99.5
}
]
)
```
This format allows PerfZero to state in its summary whether the benchmark has
succeeded (e.g. for a convergence test), based on logic determined by the
benchmark developer.
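Concretely, the optional `min_value`/`max_value` bounds are what drive that success decision. Here is a small sketch of the check, written from the description above (illustrative logic only, not PerfZero's actual implementation):

```python
def metric_succeeded(metric):
    """Return True if the metric value lies within its optional bounds."""
    value = metric['value']
    if 'min_value' in metric and value < metric['min_value']:
        return False
    if 'max_value' in metric and value > metric['max_value']:
        return False
    return True

# The metrics from the report_benchmark() example above.
metrics = [
    {'name': 'accuracy_top_5', 'value': 80, 'min_value': 90, 'max_value': 99},
    {'name': 'accuracy_top_1', 'value': 99.5},
]
print([metric_succeeded(m) for m in metrics])  # [False, True]
```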
3) Include dependent libraries in `--git_repos` and `--python_path`. These
libraries will be checked out into the directory
`path_to_perfzero/workspace/site-packages` by default. Developers can edit these
libraries directly and execute benchmarks with the local changes.
# Deep dive into individual tools
The sections below go into detail about the individual components of PerfZero.
## Build docker image
The command below builds the docker image named `perfzero/tensorflow`, which contains the
libraries (e.g. TensorFlow) needed by the benchmarks.
```
python3 benchmarks/perfzero/lib/setup.py
```
Here are a few selected optional flags. Run `python3 setup.py -h` to see
detailed documentation for all supported flags.
1) Use `--dockerfile_path=docker/Dockerfile_ubuntu_1804_tf_v2` to build a docker image for TensorFlow v2.
2) Use `--tensorflow_pip_spec` to specify the TensorFlow pip package name (and optionally version) to be
installed in the docker image, e.g. `--tensorflow_pip_spec=tensorflow==1.12.0`.
## Run benchmark
The command below executes the benchmark method specified by `--benchmark_methods`.
```
export ROOT_DATA_DIR=/data
nvidia-docker run -it --rm -v $(pwd):/workspace -v $ROOT_DATA_DIR:$ROOT_DATA_DIR perfzero/tensorflow \
python3 /workspace/benchmarks/perfzero/lib/benchmark.py \
--gcloud_key_file_url="" \
--git_repos="https://github.com/tensorflow/models.git;benchmark" \
--python_path=models \
--benchmark_methods=official.r1.resnet.estimator_benchmark.Resnet50EstimatorBenchmarkSynth.benchmark_graph_1_gpu \
--root_data_dir=$ROOT_DATA_DIR
```
`${ROOT_DATA_DIR}` should be the directory which contains the dataset files
required by the benchmark method. If the flag `--data_downloads` is specified,
PerfZero will download files from the specified url to the directory specified
by the flag `--root_data_dir`. Otherwise, the user needs to manually download and
move the dataset files into the directory specified by `--root_data_dir`. The
default `root_data_dir` is `/data`. Some benchmark methods, like the one run in
the sample command above, do not require any dataset files.
Here are a few selected optional flags. Run `python3 benchmark.py -h` to see
detailed documentation for all supported flags.
1) Use `--workspace=unique_workspace_name` if you need to run multiple benchmarks
using different workspace setups. One example use case is that you may want to
test a branch from a pull request without changing your existing workspace.
2) Use `--debug` if you need to see debug-level logging.
3) Use `--git_repos="git_url;git_branch;git_hash"` to check out a git repo with
the specified git_branch at the specified git_hash into a local folder with the
specified folder name. **Note that** the value of the flag `--git_repos` is
wrapped in quotation marks `"` so that `;` will not be interpreted by the
shell as the end of the command. Specify the flag once for each repository you
want to check out.
4) Use `--profiler_enabled_time=start_time:end_time` to collect profiler data
during the period `[start_time, end_time)` after the benchmark method execution
starts. Skip `end_time` in the flag value to collect data until the end of the
benchmark method execution. See [Profiling](#profiling)
for instructions on how to use the generated profiler data.
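The `start_time:end_time` value is a simple colon-separated pair in which the end may be omitted. A sketch of how such a flag value can be parsed (an illustrative parser, not PerfZero's actual code):

```python
def parse_profiler_window(spec):
    """Parse 'start:end', 'start:' or 'start' into (start_sec, end_sec or None)."""
    start, _, end = spec.partition(':')
    return float(start), (float(end) if end else None)

print(parse_profiler_window('30:90'))  # (30.0, 90.0)
print(parse_profiler_window('30'))     # (30.0, None)
```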
## Instructions for managing Google Cloud Platform computing instance
PerfZero aims to make it easy to run and debug TensorFlow, which is usually run
with GPUs. However, most users do not have a dedicated machine with such expensive
hardware. One cost-effective solution is to create a machine with the
desired hardware on demand in a public cloud whenever TensorFlow needs debugging.
We provide a script in PerfZero to make it easy to manage computing instances in
Google Cloud Platform. This assumes that you have access to an existing project
in GCP.
Run `python perfzero/lib/cloud_manager.py -h` for the list of commands supported
by the script. Run `cloud_manager.py <command> -h` to see detailed documentation
for all supported flags for the specified `command`.
In most cases, the user only needs to run the following commands:
```
# Create a new instance that is unique to your username
python perfzero/lib/cloud_manager.py create --project=project_name
# Query the status of the existing instance created by you, and its IP address
python perfzero/lib/cloud_manager.py status --project=project_name
# Stop the instance
python perfzero/lib/cloud_manager.py stop --project=project_name
# Start the instance
python perfzero/lib/cloud_manager.py start --project=project_name
# Delete the instance
python perfzero/lib/cloud_manager.py delete --project=project_name
```
## Understand the benchmark execution output
### Json formatted benchmark summary
PerfZero outputs a JSON-formatted summary that provides the information needed
to understand the benchmark result. The summary is printed to stdout and
to the file `path_to_perfzero/${workspace}/output/${execution_id}/perfzero.log`.
Additionally, PerfZero outputs a pure JSON file containing the summary at
`path_to_perfzero/${workspace}/output/${execution_id}/perfzero_summary.json`.
Here is an example output from PerfZero. An explanation is provided inline for each
key whose name is not sufficiently self-explanatory.
```
{
"ml_framework_info": { # Summary of the machine learning framework
"version": "1.13.0-dev20190206", # Short version. It is tf.__version__ for TensorFlow
"name": "tensorflow", # Machine learning framework name such as PyTorch
"build_label": "ml_framework_build_label", # Specified by the flag --ml_framework_build_label
"build_version": "v1.12.0-7504-g9b32b5742b" # Long version. It is tf.__git_version__ for TensorFlow
},
"execution_timestamp": 1550040322.8991697, # Timestamp when the benchmark is executed
"execution_id": "2019-02-13-06-45-22-899128", # A string that uniquely identify this benchmark execution
"benchmark_info": { # Summary of the benchmark framework setup
"output_url": "gs://tf-performance/test-results/2019-02-13-06-45-22-899128/", # Google storage url that contains the log file from this benchmark execution
"has_exception": false,
"site_package_info": {
"models": {
"branch": "benchmark",
"url": "https://github.com/tensorflow/models.git",
"hash": "f788046ca876a8820e05b0b48c1fc2e16b0955bc"
},
"benchmarks": {
"branch": "master",
"url": "https://github.com/tensorflow/benchmarks.git",
"hash": "af9e0ef36fc6867d9b63ebccc11f229375cd6a31"
}
},
"harness_name": "perfzero",
"harness_info": {
"url": "https://github.com/tensorflow/benchmarks.git",
"branch": "master",
"hash": "75d2991b88630dde10ef65aad8082a6d5cd8b5fc"
},
"execution_label": "execution_label" # Specified by the flag --execution_label
},
"system_info": { # Summary of the resources in the system that is used to execute the benchmark
"system_name": "system_name", # Specified by the flag --system_name
"accelerator_count": 2, # Number of GPUs in the system
"physical_cpu_count": 8, # Number of physical cpu cores in the system. Hyper thread CPUs are excluded.
"logical_cpu_count": 16, # Number of logical cpu cores in the system. Hyper thread CPUs are included.
"cpu_socket_count": 1, # Number of cpu socket in the system.
"platform_name": "platform_name", # Specified by the flag --platform_name
"accelerator_model": "Tesla V100-SXM2-16GB",
"accelerator_driver_version": "410.48",
"cpu_model": "Intel(R) Xeon(R) CPU @ 2.20GHz"
},
"process_info": { # Summary of the resources used by the process to execute the benchmark
"max_rss": 4269047808, # maximum physical memory in bytes used by the process
"max_vms": 39894450176, # maximum virtual memory in bytes used by the process
"max_cpu_percent": 771.1 # CPU utilization as a percentage. See psutil.Process.cpu_percent() for more information
},
"benchmark_result": { # Summary of the benchmark execution results. This is pretty much the same data structure defined in test_log.proto.
# Most values are read from test_log.proto which is written by tf.test.Benchmark.report_benchmark() defined in TensorFlow library.
"metrics": [ # This is derived from `extras` [test_log.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/test_log.proto)
# which is written by report_benchmark().
# If the EntryValue is a double, then name is the extra's key and value is the extra's double value.
# If the EntryValue is a string, then name is the extra's key. The string value will be a JSON-formatted string whose keys
# include `value`, `succeeded` and `description`. The benchmark method can provide arbitrary metric key/value pairs here.
{
"name": "accuracy_top_5",
"value": 0.7558000087738037
},
{
"name": "accuracy_top_1",
"value": 0.2639999985694885
}
],
"name": "official.resnet.estimator_cifar_benchmark.EstimatorCifar10BenchmarkTests.unit_test", # Full path to the benchmark method, i.e. module_path.class_name.method_name
"succeeded": true, # True iff benchmark method execution finishes without exception and no metric in metrics show succeeded = false
"wall_time": 14.552583694458008 # The value is determined by tf.test.Benchmark.report_benchmark() called by the benchmark method. It is -1 if report_benchmark() is not called.
}
}
```
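Because `perfzero_summary.json` is plain JSON, it is straightforward to post-process. A minimal sketch that extracts the most commonly used fields (the keys follow the example above; for brevity the snippet parses an inline string rather than reading the real file):

```python
import json

# A trimmed-down summary, with keys matching the example output above.
summary_json = '''
{
  "benchmark_result": {
    "name": "official.resnet.estimator_cifar_benchmark.EstimatorCifar10BenchmarkTests.unit_test",
    "succeeded": true,
    "wall_time": 14.552583694458008,
    "metrics": [{"name": "accuracy_top_1", "value": 0.2639999985694885}]
  }
}
'''

result = json.loads(summary_json)['benchmark_result']
print(result['succeeded'])                                 # True
print({m['name']: m['value'] for m in result['metrics']})
```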
### Profiling
When the flag `--profiler_enabled_time=start_time:end_time` is specified, the
profiler data will be collected and stored in
`path_to_perfzero/${workspace}/output/${execution_id}/profiler_data`.
#### Visualize in TensorBoard
First, install the profile plugin for TensorBoard.
```
pip install -U tensorboard-plugin-profile
```
Run `tensorboard --logdir=path_to_perfzero/workspace/output/${execution_id}/profiler_data` or
`python3 -m tensorboard.main --logdir=path_to_perfzero/workspace/output/${execution_id}/profiler_data` to open
TensorBoard server.
If PerfZero is executed on a remote machine, run `ssh -L
6006:127.0.0.1:6006 remote_ip` before opening `http://localhost:6006` in your
browser to access the TensorBoard UI.
You can also run TensorBoard inside the docker container, e.g.
`tensorboard --logdir=/workspace/perfzero/workspace/output/${execution_id}/profiler_data --bind_all`
In this case, you have to start docker with port mapping, i.e. with the `-p 6006:6006` flag, e.g.
```
nvidia-docker run -it --rm -v $(pwd):/workspace -p 6006:6006 perfzero/tensorflow
```
Normally, the pages you see will look like:
![Screenshot](screenshots/profiling_overview.png "Profiling Overview")
![Screenshot](screenshots/profiling_trace_view.png "Profiling Trace View")
### Visualize system metric values over time
PerfZero also records a few useful system metrics (e.g. rss, vms) over time in
the file `path_to_perfzero/${workspace}/output/${execution_id}/process_info.log`.
Run `python perfzero/scripts/plot_process_info.py process_info.log` to generate a
pdf showing the value of these metrics over time.
# PerfZero development
Avoid importing the `tensorflow` package anywhere that uses the `logging`
package, because importing tensorflow appears to prevent logging from working
properly. Import `tensorflow` only inside the methods that require it.
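The guideline above can be sketched as follows; the benchmark function and its body are hypothetical:

```python
import logging

# Configure logging at module import time; no tensorflow import here,
# so the logging setup is not disturbed.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_benchmark(batch_size):
    """Hypothetical benchmark method; tensorflow is imported lazily."""
    # Deferred on purpose: the import happens only when the method runs.
    import tensorflow as tf
    logger.info("running with batch_size=%d", batch_size)
    return tf.constant(batch_size)
```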
Here are the commands to run unit tests and check code style.
```
# Run all unit tests
# This must be executed in directory perfzero/lib
python3 -B -m unittest discover -p "*_test.py"
# Format python code in place
find perfzero/lib -name "*.py" -exec pyformat --in_place {} \;
# Check python code format and report warning and errors
find perfzero/lib -name "*.py" -exec gpylint3 {} \;
```
Here is the command to generate the table of contents for this README. Run this
command and copy/paste its output into README.md.
```
./perfzero/scripts/generate-readme-header.sh perfzero/README.md
```
# Ubuntu 18.04 Python3 with CUDA 10 and the following:
# - Installs tf-nightly-gpu-2.0-preview
# - Installs requirements.txt for tensorflow/models
# - Install bazel for building TF from source
FROM nvidia/cuda:10.0-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu-2.0-preview"
ARG extra_pip_specs=""
ARG local_tensorflow_pip_spec=""
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-0 \
cuda-cublas-dev-10-0 \
cuda-cufft-dev-10-0 \
cuda-curand-dev-10-0 \
cuda-cusolver-dev-10-0 \
cuda-cusparse-dev-10-0 \
libcudnn7=7.6.2.24-1+cuda10.0 \
libcudnn7-dev=7.6.2.24-1+cuda10.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl \
&& \
find /usr/local/cuda-10.0/lib64/ -type f -name 'lib*_static.a' -not -name 'libcudart_static.a' -delete && \
rm /usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
libnvinfer-dev=5.1.5-1+cuda10.0 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
build-essential \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python
# (building TF needs py2 even if building for Python3 as of 06-AUG-2019)
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv \
python
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
RUN pip3 freeze
# Install bazel
ARG BAZEL_VERSION=0.24.1
RUN mkdir /bazel && \
wget -O /bazel/installer.sh "https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh" && \
wget -O /bazel/LICENSE.txt "https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE" && \
chmod +x /bazel/installer.sh && \
/bazel/installer.sh && \
rm -f /bazel/installer.sh
RUN git clone https://github.com/tensorflow/tensorflow.git /tensorflow_src
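A Dockerfile like the one above is typically built with its `ARG` values overridden on the command line. The image tag and build context below are illustrative; the flag names come from the `ARG` declarations:

```shell
# Build from the directory containing the Dockerfile; override the
# TensorFlow pip package spec to pin a specific build if desired.
docker build \
  --build-arg tensorflow_pip_spec="tf-nightly-gpu-2.0-preview" \
  --build-arg extra_pip_specs="" \
  -t perfzero/tensorflow .
```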
# Ubuntu 18.04 Python3 with CUDA 10 and the following:
# - Installs tf-nightly-gpu (this is TF 2.0)
# - Installs requirements.txt for tensorflow/models
# Additionally also installs:
# - Latest S4TF development snapshot for cuda 10.0
FROM nvidia/cuda:10.0-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
ARG swift_tf_url=https://storage.googleapis.com/swift-tensorflow-artifacts/nightlies/latest/swift-tensorflow-DEVELOPMENT-cuda10.0-cudnn7-ubuntu18.04.tar.gz
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-0 \
cuda-cublas-10-0 \
cuda-cublas-dev-10-0 \
cuda-cufft-10-0 \
cuda-curand-10-0 \
cuda-cusolver-10-0 \
cuda-cusparse-10-0 \
libcudnn7=7.6.2.24-1+cuda10.0 \
libcudnn7-dev=7.6.2.24-1+cuda10.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
libnvinfer-dev=5.1.5-1+cuda10.0 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze
### Install Swift deps.
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
curl \
git \
python \
python-dev \
python-pip \
python-setuptools \
python-tk \
python3 \
python3-pip \
python3-setuptools \
clang \
libcurl4-openssl-dev \
libicu-dev \
libpython-dev \
libpython3-dev \
libncurses5-dev \
libxml2 \
libblocksruntime-dev
# Download and extract S4TF
WORKDIR /swift-tensorflow-toolchain
RUN if ! curl -fSsL --retry 5 $swift_tf_url -o swift.tar.gz; \
then sleep 30 && curl -fSsL --retry 5 $swift_tf_url -o swift.tar.gz; \
fi;
RUN mkdir usr \
&& tar -xzf swift.tar.gz --directory=usr --strip-components=1 \
&& rm swift.tar.gz
ENV PATH="/swift-tensorflow-toolchain/usr/bin:${PATH}"
ENV LD_LIBRARY_PATH="/swift-tensorflow-toolchain/usr/lib/swift/linux/:${LD_LIBRARY_PATH}"
# Ubuntu 18.04 Python3 with CUDA 10.1 and the following:
# - Installs tf-nightly-gpu (this is TF 2.1)
# - Installs requirements.txt for tensorflow/models
# - TF 2.0 tested with cuda 10.0, but we need to test tf 2.1 with cuda 10.1.
# Additionally also installs
# - Latest S4TF development snapshot for cuda 10.1
FROM nvidia/cuda:10.1-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
ARG swift_tf_url=https://storage.googleapis.com/swift-tensorflow-artifacts/nightlies/latest/swift-tensorflow-DEVELOPMENT-cuda10.1-cudnn7-stock-ubuntu18.04.tar.gz
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-1 \
cuda-cufft-10-1 \
cuda-curand-10-1 \
cuda-cusolver-10-1 \
cuda-cusparse-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.1 \
libnvinfer-dev=5.1.5-1+cuda10.1 \
libnvinfer6=6.0.1-1+cuda10.1 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze
### Install Swift deps.
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
curl \
git \
python \
python-dev \
python-pip \
python-setuptools \
python-tk \
python3 \
python3-pip \
python3-setuptools \
clang \
libcurl4-openssl-dev \
libicu-dev \
libpython-dev \
libpython3-dev \
libncurses5-dev \
libxml2 \
libblocksruntime-dev
# Download and extract S4TF
WORKDIR /swift-tensorflow-toolchain
RUN if ! curl -fSsL --retry 5 $swift_tf_url -o swift.tar.gz; \
then sleep 30 && curl -fSsL --retry 5 $swift_tf_url -o swift.tar.gz; \
fi;
RUN mkdir usr \
&& tar -xzf swift.tar.gz --directory=usr --strip-components=1 \
&& rm swift.tar.gz
ENV PATH="/swift-tensorflow-toolchain/usr/bin:${PATH}"
ENV LD_LIBRARY_PATH="/swift-tensorflow-toolchain/usr/lib/swift/linux/:${LD_LIBRARY_PATH}"
# Ubuntu 18.04 Python3 with CUDA 11.0 and the following:
# - Installs tf-nightly-gpu (this is TF 2.4)
# - Installs requirements.txt for tensorflow/models
# Additionally also installs
# - Latest S4TF development snapshot for cuda 11.0
FROM nvidia/cuda:11.0-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
ARG swift_tf_url=https://storage.googleapis.com/swift-tensorflow-artifacts/nightlies/latest/swift-tensorflow-DEVELOPMENT-cuda11.0-cudnn8-stock-ubuntu18.04.tar.gz
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn8-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-tools-11-0 \
cuda-toolkit-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer7=7.2.0-1+cuda11.0 \
libnvinfer-dev=7.2.0-1+cuda11.0 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze
### Install Swift deps.
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
curl \
git \
python \
python-dev \
python-pip \
python-setuptools \
python-tk \
python3 \
python3-pip \
python3-setuptools \
clang \
libcurl4-openssl-dev \
libicu-dev \
libpython-dev \
libpython3-dev \
libncurses5-dev \
libxml2 \
libblocksruntime-dev
# Download and extract S4TF
WORKDIR /swift-tensorflow-toolchain
RUN if ! curl -fSsL --retry 5 $swift_tf_url -o swift.tar.gz; \
then sleep 30 && curl -fSsL --retry 5 $swift_tf_url -o swift.tar.gz; \
fi;
RUN mkdir usr \
&& tar -xzf swift.tar.gz --directory=usr --strip-components=1 \
&& rm swift.tar.gz
ENV PATH="/swift-tensorflow-toolchain/usr/bin:${PATH}"
ENV LD_LIBRARY_PATH="/swift-tensorflow-toolchain/usr/lib/swift/linux/:${LD_LIBRARY_PATH}"
# Ubuntu 18.04 Python3 with CUDA 10.1 and the following:
# - Installs tf-nightly-gpu (this is TF 2.1)
# - Installs requirements.txt for tensorflow/models
# - TF 2.0 tested with cuda 10.0, but we need to test tf 2.1 with cuda 10.1.
FROM nvidia/cuda:10.1-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-1 \
cuda-cufft-10-1 \
cuda-curand-10-1 \
cuda-cusolver-10-1 \
cuda-cusparse-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.1 \
libnvinfer-dev=5.1.5-1+cuda10.1 \
libnvinfer6=6.0.1-1+cuda10.1 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client==1.8.0 pyyaml google-cloud google-cloud-bigquery mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze
# Ubuntu 18.04 Python3 with CUDA 11 and the following:
# - Installs tf-nightly-gpu (this is TF 2.3)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
ENV PIP_CMD="python3.9 -m pip"
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
# Needed to disable prompts during installation.
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
# Python 3.9 related deps in this ppa.
RUN add-apt-repository ppa:deadsnakes/ppa
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3.9 \
python3-pip \
python3.9-dev \
python3-setuptools \
python3.9-venv \
python3.9-distutils \
python3.9-lib2to3
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN ${PIP_CMD} install --upgrade pip
RUN ${PIP_CMD} install --upgrade distlib
# setuptools upgraded to fix install requirements from model garden.
RUN ${PIP_CMD} install --upgrade setuptools
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
RUN ${PIP_CMD} install --upgrade pyyaml
RUN ${PIP_CMD} install --upgrade google-api-python-client==1.8.0
RUN ${PIP_CMD} install --upgrade google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN ${PIP_CMD} install wheel
RUN ${PIP_CMD} install absl-py
RUN ${PIP_CMD} install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN ${PIP_CMD} install tfds-nightly
RUN ${PIP_CMD} install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN ${PIP_CMD} install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN ${PIP_CMD} install -r /tmp/requirements.txt
RUN ${PIP_CMD} install tf-estimator-nightly
RUN ${PIP_CMD} install tensorflow-text-nightly
# RUN nvidia-smi
RUN nvcc --version
RUN pip freeze
# Ubuntu 18.04 Python3 with CUDA 11 and the following:
# - Installs tf-nightly-gpu (this is TF 2.3)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-tools-11-0 \
cuda-toolkit-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client==1.8.0 pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN pip install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip install tf-estimator-nightly
RUN pip install tensorflow-text-nightly
# RUN nvidia-smi
RUN nvcc --version
RUN pip freeze
# Ubuntu 18.04 Python3 with CUDA 11 and the following:
# - Installs tf-nightly-gpu (this is TF 2.3)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip. Use pip3 here and plain pip afterwards, or a
# 'no main found' error is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client==1.8.0 pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN pip install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip install tf-estimator-nightly
RUN pip install tensorflow-text-nightly
RUN pip install psutil
# RUN nvidia-smi
RUN nvcc --version
RUN pip freeze
# Ubuntu 18.04 Python3 with CUDA 11 and the following:
# - Installs tf-nightly-gpu (this is TF 2.3)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
ENV PIP_CMD="python3.9 -m pip"
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev only needed because of libnvinfer-dev which may not
# really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
# Needed to disable prompts during installation.
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
# Python 3.9 related deps in this ppa.
RUN add-apt-repository ppa:deadsnakes/ppa
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3.9 \
python3-pip \
python3.9-dev \
python3-setuptools \
python3.9-venv \
python3.9-distutils \
python3.9-lib2to3
# Upgrade pip; pip3 must be used for this step and plain pip afterwards,
# or an error about no main being found is thrown.
RUN ${PIP_CMD} install --upgrade pip
RUN ${PIP_CMD} install --upgrade distlib
# setuptools upgraded to fix install requirements from model garden.
RUN ${PIP_CMD} install --upgrade setuptools
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
RUN ${PIP_CMD} install --upgrade pyyaml
RUN ${PIP_CMD} install --upgrade google-api-python-client==1.8.0
RUN ${PIP_CMD} install --upgrade google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN ${PIP_CMD} install wheel
RUN ${PIP_CMD} install absl-py
RUN ${PIP_CMD} install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN ${PIP_CMD} install tfds-nightly
RUN ${PIP_CMD} install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN ${PIP_CMD} install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN ${PIP_CMD} install -r /tmp/requirements.txt
RUN ${PIP_CMD} install tf-estimator-nightly
RUN ${PIP_CMD} install tensorflow-text-nightly
RUN ${PIP_CMD} install keras-nightly==2.7.0.dev2021082607
# RUN nvidia-smi
RUN nvcc --version
RUN pip freeze
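The `ENV LD_LIBRARY_PATH` line in the Dockerfile above prepends the CUPTI and CUDA library directories unconditionally. A minimal sketch of an idempotent alternative, assuming the same paths as the Dockerfile (the helper function name is hypothetical):

```shell
# Prepend a directory to LD_LIBRARY_PATH only if it is not already present.
prepend_ld_path() {
  case ":$LD_LIBRARY_PATH:" in
    *":$1:"*) ;;  # already present: no-op
    *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
}
LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64"
prepend_ld_path /usr/local/cuda/extras/CUPTI/lib64
prepend_ld_path /usr/local/cuda/extras/CUPTI/lib64   # second call is a no-op
echo "$LD_LIBRARY_PATH"
# /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.2/lib64
```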
# Ubuntu 18.04 Python3 with CUDA 11 and the following:
# - Installs tf-nightly-gpu (this is TF 2.3)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
ENV PIP_CMD="python3.9 -m pip"
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev are only needed because of libnvinfer-dev,
# which may not really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
# Needed to disable prompts during installation.
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
# Python 3.9 related deps in this ppa.
RUN add-apt-repository ppa:deadsnakes/ppa
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3.9 \
python3-pip \
python3.9-dev \
python3-setuptools \
python3.9-venv \
python3.9-distutils \
python3.9-lib2to3
# Upgrade pip; pip3 must be used for this step and plain pip afterwards,
# or an error about no main being found is thrown.
RUN ${PIP_CMD} install --upgrade pip
RUN ${PIP_CMD} install --upgrade distlib
# setuptools upgraded to fix install requirements from model garden.
RUN ${PIP_CMD} install --upgrade setuptools
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
RUN ${PIP_CMD} install --upgrade pyyaml
RUN ${PIP_CMD} install --upgrade google-api-python-client==1.8.0
RUN ${PIP_CMD} install --upgrade google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN ${PIP_CMD} install wheel
RUN ${PIP_CMD} install absl-py
RUN ${PIP_CMD} install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN ${PIP_CMD} install tfds-nightly
RUN ${PIP_CMD} install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN ${PIP_CMD} install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN ${PIP_CMD} install -r /tmp/requirements.txt
RUN ${PIP_CMD} install tf-estimator-nightly
RUN ${PIP_CMD} install tensorflow-text-nightly
RUN ${PIP_CMD} install keras-nightly==2.7.0.dev2021070900
# RUN nvidia-smi
RUN nvcc --version
RUN pip freeze
# Ubuntu 18.04 Python3 with CUDA 11 and the following:
# - Installs tf-nightly-gpu (this is TF 2.3)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev are only needed because of libnvinfer-dev,
# which may not really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-tools-11-0 \
cuda-toolkit-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip; pip3 must be used for this step and plain pip afterwards,
# or an error about no main being found is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client==1.8.0 pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN pip install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip install tf-estimator-nightly
RUN pip install tensorflow-text-nightly
# RUN nvidia-smi
RUN nvcc --version
RUN pip freeze
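The `apt-get install` lines above pin exact versions with the `package=version` syntax. A minimal shell sketch of splitting such a pin into its parts using POSIX parameter expansion (the sample pin mirrors the `libcudnn8` line above):

```shell
# Split an apt "package=version" pin into package, version, and CUDA flavor.
pin='libcudnn8=8.0.4.30-1+cuda11.0'
pkg=${pin%%=*}    # text before the first '='
ver=${pin#*=}     # text after the first '='
cuda=${ver##*+}   # suffix after the last '+', the CUDA flavor
echo "$pkg $ver $cuda"   # libcudnn8 8.0.4.30-1+cuda11.0 cuda11.0
```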
# Ubuntu 18.04 Python3.6 with CUDA 10 and the following:
# - Installs custom TensorFlow pip package
# - Installs requirements.txt for tensorflow/models
# NOTE: Branched from Dockerfile_ubuntu_1804_tf_v1 with changes relevant to
# tensorflow_pip_spec. When updating please keep the difference minimal.
FROM nvidia/cuda:10.0-base-ubuntu18.04 as base
# Location of custom TF pip package, must be relative to docker context.
# Note that the version tag in the name of the wheel file is meaningless.
ARG tensorflow_pip_spec="resources/tensorflow-0.0.1-cp36-cp36m-linux_x86_64.whl"
ARG extra_pip_specs=""
ARG local_tensorflow_pip_spec=""
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
COPY ${tensorflow_pip_spec} /tensorflow-0.0.1-cp36-cp36m-linux_x86_64.whl
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-0 \
cuda-cublas-10-0 \
cuda-cufft-10-0 \
cuda-curand-10-0 \
cuda-cusolver-10-0 \
cuda-cusparse-10-0 \
libcudnn7=7.4.1.5-1+cuda10.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
  apt-get install -y --no-install-recommends nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
&& apt-get update \
&& apt-get install -y --no-install-recommends libnvinfer5=5.0.2-1+cuda10.0 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
build-essential \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Setup Python3 environment
RUN pip3 install --upgrade pip==9.0.1
# setuptools upgraded to fix install requirements from model garden.
RUN pip3 install wheel
RUN pip3 install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery
RUN pip3 install absl-py
RUN pip3 install --upgrade --force-reinstall /tensorflow-0.0.1-cp36-cp36m-linux_x86_64.whl ${extra_pip_specs}
RUN pip3 install tfds-nightly
RUN pip3 install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
RUN pip3 freeze
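The Dockerfile above installs a local wheel whose filename encodes PEP 427 compatibility tags. A minimal sketch of pulling those tags apart in shell (the sample name matches the placeholder wheel copied above, whose version tag is meaningless by design):

```shell
# Split a wheel filename into name, version, python tag, ABI tag, platform tag.
whl='tensorflow-0.0.1-cp36-cp36m-linux_x86_64.whl'
IFS=- read -r name version pytag abitag plattag <<EOF
${whl%.whl}
EOF
echo "$name $version $pytag $abitag $plattag"
# tensorflow 0.0.1 cp36 cp36m linux_x86_64
```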
# Ubuntu 18.04 Python3 with CUDA 10 and the following:
# - Installs tf-nightly-gpu
# - Installs requirements.txt for tensorflow/models
#
# This docker is not needed and is the same as the tf_v2 docker. The
# user can pass in the desired `ARG tensorflow_pip_spec`. Remove once
# TF 1.0 testing is done or the KOKORO jobs are updated to use the
# tensorflow_pip_spec rather than the docker path to control the TF version.
FROM nvidia/cuda:10.0-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG extra_pip_specs=""
ARG local_tensorflow_pip_spec=""
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev are only needed because of libnvinfer-dev,
# which may not really be needed.
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-0 \
cuda-cublas-10-0 \
cuda-cublas-dev-10-0 \
cuda-cufft-10-0 \
cuda-curand-10-0 \
cuda-cusolver-10-0 \
cuda-cusparse-10-0 \
libcudnn7=7.6.0.64-1+cuda10.0 \
libcudnn7-dev=7.6.0.64-1+cuda10.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
libnvinfer-dev=5.1.5-1+cuda10.0 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip; pip3 must be used for this step and plain pip afterwards,
# or an error about no main being found is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze
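Each Dockerfile runs `nvcc --version` as a sanity check. A minimal sketch of extracting the release number from that style of output (the sample text below stands in for real nvcc output):

```shell
# Pull the release number out of `nvcc --version`-style output.
out='Cuda compilation tools, release 10.0, V10.0.130'
rel=${out#*release }   # drop everything through "release "
rel=${rel%%,*}         # keep up to the first comma
echo "$rel"   # 10.0
```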
# Ubuntu 18.04 Python3 with CUDA 10 and the following:
# - Installs tf-nightly-gpu (this is TF 2.0)
# - Installs requirements.txt for tensorflow/models
FROM nvidia/cuda:10.0-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev are only needed because of libnvinfer-dev,
# which may not really be needed.
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-0 \
cuda-cublas-10-0 \
cuda-cublas-dev-10-0 \
cuda-cufft-10-0 \
cuda-curand-10-0 \
cuda-cusolver-10-0 \
cuda-cusparse-10-0 \
libcudnn7=7.6.2.24-1+cuda10.0 \
libcudnn7-dev=7.6.2.24-1+cuda10.0 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
libnvinfer-dev=5.1.5-1+cuda10.0 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip; pip3 must be used for this step and plain pip afterwards,
# or an error about no main being found is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client pyyaml google-cloud google-cloud-bigquery mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze
# Ubuntu 18.04 Python3 with CUDA 10.1 and the following:
# - Installs tf-nightly-gpu (this is TF 2.1)
# - Installs requirements.txt for tensorflow/models
# - TF 2.0 tested with cuda 10.0, but we need to test tf 2.1 with cuda 10.1.
FROM nvidia/cuda:10.1-base-ubuntu18.04 as base
ARG tensorflow_pip_spec="tf-nightly-gpu"
ARG local_tensorflow_pip_spec=""
ARG extra_pip_specs=""
# setup.py passes the base path of the local .whl file if one is chosen for
# the docker image; otherwise it passes an empty existing file from the context.
COPY ${local_tensorflow_pip_spec} /${local_tensorflow_pip_spec}
# Pick up some TF dependencies
# cublas-dev and libcudnn7-dev are only needed because of libnvinfer-dev,
# which may not really be needed.
# In the future, add the following lines in a shell script running on the
# benchmark vm to get the available dependent versions when updating cuda
# version (e.g. to 10.2 or something later):
# sudo apt-cache search cuda-command-line-tool
# sudo apt-cache search cuda-cublas
# sudo apt-cache search cuda-cufft
# sudo apt-cache search cuda-curand
# sudo apt-cache search cuda-cusolver
# sudo apt-cache search cuda-cusparse
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-10-1 \
cuda-cufft-10-1 \
cuda-curand-10-1 \
cuda-cusolver-10-1 \
cuda-cusparse-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1 \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
libpng-dev \
pkg-config \
software-properties-common \
unzip \
lsb-core \
curl
RUN apt-get update && \
apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.1 \
libnvinfer-dev=5.1.5-1+cuda10.1 \
libnvinfer6=6.0.1-1+cuda10.1 \
&& apt-get clean
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
# Add google-cloud-sdk to the source list
RUN echo "deb http://packages.cloud.google.com/apt cloud-sdk-$(lsb_release -c -s) main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# Install extras needed by most models
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
ca-certificates \
wget \
htop \
zip \
google-cloud-sdk
# Install / update Python and Python3
RUN apt-get install -y --no-install-recommends \
python3 \
python3-dev \
python3-pip \
python3-setuptools \
python3-venv
# Upgrade pip; pip3 must be used for this step and plain pip afterwards,
# or an error about no main being found is thrown.
RUN pip3 install --upgrade pip
# setuptools upgraded to fix install requirements from model garden.
RUN pip install wheel
RUN pip install --upgrade setuptools google-api-python-client==1.8.0 pyyaml google-cloud google-cloud-bigquery google-cloud-datastore mock
RUN pip install absl-py
RUN pip install --upgrade --force-reinstall ${tensorflow_pip_spec} ${extra_pip_specs}
RUN pip install tfds-nightly
RUN pip install -U scikit-learn
# Install dependencies needed for tf.distribute test utils
RUN pip install dill tblib portpicker
RUN curl https://raw.githubusercontent.com/tensorflow/models/master/official/requirements.txt > /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
RUN pip freeze