OpenDAS / apex · commit 8e1f01c5
Authored May 16, 2018 by Michael Carilli

Adding minimal examples with Apex and Pytorch distributed data parallel

parent 83acda92
Showing 5 changed files with 124 additions and 0 deletions:

- examples/FP16_Optimizer_simple/README.md (+24, -0)
- examples/FP16_Optimizer_simple/distributed_apex/distributed_data_parallel.py (+50, -0)
- examples/FP16_Optimizer_simple/distributed_apex/run.sh (+5, -0)
- examples/FP16_Optimizer_simple/distributed_pytorch/distributed_data_parallel.py (+43, -0)
- examples/FP16_Optimizer_simple/distributed_pytorch/run.sh (+2, -0)
examples/FP16_Optimizer_simple/README.md (new file, mode 100644)
# Simple examples of FP16_Optimizer functionality

`minimal.py` shows the basic usage of `FP16_Optimizer`.

`closure.py` shows how to use `FP16_Optimizer` with a closure.

`save_load.py` shows that `FP16_Optimizer` uses the same checkpointing syntax as ordinary Pytorch optimizers.
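Conceptually, `FP16_Optimizer` keeps an FP32 "master" copy of the parameters and applies loss scaling so that small FP16 gradients do not underflow. A toy, pure-Python sketch of that bookkeeping (illustrative only; the real class wraps a Pytorch optimizer and operates on CUDA tensors):

```python
# Toy sketch of master-weight updates with static loss scaling,
# the scheme FP16_Optimizer automates. All values here are plain
# Python floats standing in for FP32/FP16 tensors.
loss_scale = 128.0
lr = 1e-3

master_w = 0.5          # FP32 "master" copy of one parameter
grad_wrt_loss = 2e-3    # gradient of the *unscaled* loss

# The loss is multiplied by loss_scale before backward(), so the
# gradient produced by backprop comes out scaled up:
scaled_grad = grad_wrt_loss * loss_scale

# step(): unscale the gradient and update the master weights in FP32...
master_w -= lr * (scaled_grad / loss_scale)

# ...then the FP16 model copy would be refreshed from the master weights.
print(master_w)  # ≈ 0.5 - 1e-3 * 2e-3 = 0.499998
```

Because `loss_scale` is a power of two, scaling and unscaling round-trip exactly, so the update is identical to an unscaled FP32 step.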
`distributed_pytorch` shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel. The usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process usage. Run via

```bash
cd distributed_pytorch
bash run.sh
```

`distributed_apex` shows an example using `FP16_Optimizer` with Apex DistributedDataParallel. Again, the usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process usage. Run via

```bash
cd distributed_apex
bash run.sh
```
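Distributed usage is unchanged because `FP16_Optimizer` is a drop-in wrapper: it exposes the same `zero_grad()`/`step()` surface as the optimizer it wraps, and the only visible change is that `loss.backward()` becomes `optimizer.backward(loss)`. A toy sketch of that delegation pattern (hypothetical stand-in classes, not apex code):

```python
class ToyOptimizer:
    """Stand-in for torch.optim.SGD: just counts step() calls."""
    def __init__(self):
        self.steps = 0
    def zero_grad(self):
        pass
    def step(self):
        self.steps += 1

class ToyFP16Optimizer:
    """Stand-in for FP16_Optimizer: wraps an optimizer, preserves its
    interface, and adds backward(loss) in place of loss.backward()."""
    def __init__(self, inner):
        self.inner = inner
    def zero_grad(self):
        self.inner.zero_grad()
    def backward(self, loss):
        # The real class scales the loss, runs backward, and copies
        # FP16 gradients into the FP32 master gradients here.
        pass
    def step(self):
        # The real class unscales gradients, steps the master weights,
        # and copies them back down to the FP16 model.
        self.inner.step()

opt = ToyFP16Optimizer(ToyOptimizer())
for _ in range(3):
    opt.zero_grad()
    opt.backward(loss=None)   # replaces loss.backward()
    opt.step()
print(opt.inner.steps)  # 3
```

Because the training loop only ever calls `zero_grad()`, `backward()`, and `step()`, it neither knows nor cares whether the model underneath is wrapped in Pytorch or Apex DistributedDataParallel.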
examples/FP16_Optimizer_simple/distributed_apex/distributed_data_parallel.py (new file, mode 100644)
```python
import torch
from torch.autograd import Variable
import argparse
from apex.parallel import DistributedDataParallel as DDP
from apex.fp16_utils import FP16_Optimizer

parser = argparse.ArgumentParser()
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--world-size', default=2, type=int,
                    help='Number of distributed processes.')
parser.add_argument("--rank", type=int,
                    help='Rank of this process')
args = parser.parse_args()

torch.cuda.set_device(args.rank)
torch.distributed.init_process_group(backend='nccl',
                                     init_method=args.dist_url,
                                     world_size=args.world_size,
                                     rank=args.rank)

torch.backends.cudnn.benchmark = True

N, D_in, D_out = 64, 1024, 16

x = Variable(torch.cuda.FloatTensor(N, D_in).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()

model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### CONSTRUCT FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    ### CHANGE loss.backward() TO: ###
    optimizer.backward(loss)
    ###
    optimizer.step()

print("final loss = ", loss)
```
examples/FP16_Optimizer_simple/distributed_apex/run.sh (new file, mode 100644)
```bash
#!/bin/bash
# By default, apex.parallel.multiproc will attempt to use all available GPUs on the system.
# The number of GPUs to use can be limited by setting CUDA_VISIBLE_DEVICES:
export CUDA_VISIBLE_DEVICES=0,1
python -m apex.parallel.multiproc distributed_data_parallel.py
```
examples/FP16_Optimizer_simple/distributed_pytorch/distributed_data_parallel.py (new file, mode 100644)
```python
import torch
from torch.autograd import Variable
import argparse
from apex.fp16_utils import FP16_Optimizer

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl',
                                     init_method='env://')

torch.backends.cudnn.benchmark = True

N, D_in, D_out = 64, 1024, 16

x = Variable(torch.cuda.FloatTensor(N, D_in).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()

model = torch.nn.Linear(D_in, D_out).cuda().half()
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  device_ids=[args.local_rank],
                                                  output_device=args.local_rank)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### CONSTRUCT FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    ### CHANGE loss.backward() TO: ###
    optimizer.backward(loss)
    ###
    optimizer.step()

print("final loss = ", loss)
```
examples/FP16_Optimizer_simple/distributed_pytorch/run.sh (new file, mode 100644)
```bash
#!/bin/bash
python -m torch.distributed.launch --nproc_per_node=2 distributed_data_parallel.py
```