Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
tsoc
superbenchmark
Commits
bd8f105d
Unverified
Commit
bd8f105d
authored
Dec 04, 2021
by
Yifan Xiong
Committed by
GitHub
Dec 04, 2021
Browse files
Benchmarks - Add config file for NDm A100 v4 (#255)
Add config file for Azure NDm A100 v4 SKU.
parent
8042fa34
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
170 additions
and
0 deletions
+170
-0
superbench/config/azure_ndmv4.yaml
superbench/config/azure_ndmv4.yaml
+170
-0
No files found.
superbench/config/azure_ndmv4.yaml
0 → 100644
View file @
bd8f105d
# SuperBench Config
#
# Azure NDm A100 v4
# reference: https://docs.microsoft.com/en-us/azure/virtual-machines/ndm-a100-v4-series
version
:
v0.3
superbench
:
enable
:
null
var
:
default_local_mode
:
&default_local_mode
enable
:
true
modes
:
-
name
:
local
proc_num
:
8
prefix
:
CUDA_VISIBLE_DEVICES={proc_rank}
parallel
:
yes
default_pytorch_mode
:
&default_pytorch_mode
enable
:
true
modes
:
-
name
:
torch.distributed
proc_num
:
8
node_num
:
1
frameworks
:
-
pytorch
common_model_config
:
&common_model_config
duration
:
0
num_warmup
:
64
num_steps
:
2048
sample_count
:
8192
batch_size
:
32
precision
:
-
float32
-
float16
model_action
:
-
train
pin_memory
:
yes
benchmarks
:
kernel-launch
:
<<
:
*default_local_mode
gemm-flops
:
<<
:
*default_local_mode
nccl-bw
:
enable
:
true
modes
:
-
name
:
local
proc_num
:
1
parallel
:
no
parameters
:
ngpus
:
8
ib-loopback
:
enable
:
true
modes
:
-
name
:
local
proc_num
:
4
prefix
:
PROC_RANK={proc_rank} IB_DEVICES=0,2,4,6 NUMA_NODES=1,0,3,2
parallel
:
yes
-
name
:
local
proc_num
:
4
prefix
:
PROC_RANK={proc_rank} IB_DEVICES=1,3,5,7 NUMA_NODES=1,0,3,2
parallel
:
yes
mem-bw
:
enable
:
true
modes
:
-
name
:
local
proc_num
:
8
prefix
:
CUDA_VISIBLE_DEVICES={proc_rank} numactl -N $(({proc_rank}/2))
parallel
:
yes
disk-benchmark
:
enable
:
true
modes
:
-
name
:
local
proc_num
:
1
parallel
:
no
parameters
:
block_devices
:
-
/dev/nvme0n1
-
/dev/nvme1n1
-
/dev/nvme2n1
-
/dev/nvme3n1
-
/dev/nvme4n1
-
/dev/nvme5n1
-
/dev/nvme6n1
-
/dev/nvme7n1
seq_read_runtime
:
60
seq_write_runtime
:
60
seq_readwrite_runtime
:
60
rand_read_runtime
:
60
rand_write_runtime
:
60
rand_readwrite_runtime
:
60
gpu-copy-bw
:
enable
:
true
modes
:
-
name
:
local
parallel
:
no
parameters
:
mem_type
:
-
htod
-
dtoh
-
dtod
copy_type
:
-
sm
-
dma
cudnn-function
:
<<
:
*default_local_mode
cublas-function
:
<<
:
*default_local_mode
matmul
:
<<
:
*default_local_mode
frameworks
:
-
pytorch
sharding-matmul
:
<<
:
*default_pytorch_mode
computation-communication-overlap
:
<<
:
*default_pytorch_mode
gpt_models
:
<<
:
*default_pytorch_mode
models
:
-
gpt2-small
-
gpt2-large
parameters
:
<<
:
*common_model_config
batch_size
:
8
seq_len
:
224
bert_models
:
<<
:
*default_pytorch_mode
models
:
-
bert-base
-
bert-large
parameters
:
<<
:
*common_model_config
seq_len
:
224
lstm_models
:
<<
:
*default_pytorch_mode
models
:
-
lstm
parameters
:
<<
:
*common_model_config
batch_size
:
224
input_size
:
224
hidden_size
:
1000
seq_len
:
32
pin_memory
:
no
resnet_models
:
<<
:
*default_pytorch_mode
models
:
-
resnet50
-
resnet101
-
resnet152
parameters
:
<<
:
*common_model_config
batch_size
:
192
num_steps
:
512
densenet_models
:
<<
:
*default_pytorch_mode
models
:
-
densenet169
-
densenet201
parameters
:
<<
:
*common_model_config
pin_memory
:
no
vgg_models
:
<<
:
*default_pytorch_mode
models
:
-
vgg11
-
vgg13
-
vgg16
-
vgg19
parameters
:
<<
:
*common_model_config
pin_memory
:
no
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment