OpenDAS / apex
Commit 8e1f01c5 authored May 16, 2018 by Michael Carilli
Adding minimal examples with Apex and Pytorch distributed data parallel
Parent: 83acda92
Showing 5 changed files with 124 additions and 0 deletions.
examples/FP16_Optimizer_simple/README.md (+24, -0)
examples/FP16_Optimizer_simple/distributed_apex/distributed_data_parallel.py (+50, -0)
examples/FP16_Optimizer_simple/distributed_apex/run.sh (+5, -0)
examples/FP16_Optimizer_simple/distributed_pytorch/distributed_data_parallel.py (+43, -0)
examples/FP16_Optimizer_simple/distributed_pytorch/run.sh (+2, -0)
examples/FP16_Optimizer_simple/README.md
0 → 100644
# Simple examples of FP16_Optimizer functionality
`minimal.py` shows the basic usage of `FP16_Optimizer`.

`closure.py` shows how to use `FP16_Optimizer` with a closure.
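The committed `closure.py` has the full example; as a rough sketch of the pattern, assuming a `model`, `loss_fn`, `x`, and `y` set up as in the distributed scripts below, the closure computes the loss and calls `optimizer.backward(loss)` in place of `loss.backward()`:

```python
optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=1e-3))

def closure():
    # Forward and backward pass, returning the loss so the optimizer's
    # step can re-evaluate it if needed.
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    # With FP16_Optimizer, call optimizer.backward(loss) instead of loss.backward().
    optimizer.backward(loss)
    return loss

for t in range(500):
    loss = optimizer.step(closure)
```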
`save_load.py` shows that `FP16_Optimizer` uses the same checkpointing syntax as ordinary Pytorch optimizers.
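As a minimal sketch of that checkpointing syntax (the filename and the surrounding `model`/`optimizer` objects are placeholders; see `save_load.py` for the actual example):

```python
# Save: FP16_Optimizer exposes state_dict() like any torch.optim optimizer.
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()},
           'checkpoint.pt')

# Load: restore the model and the FP16_Optimizer state.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```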
`distributed_pytorch` shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel. The usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process usage. Run via

```bash
cd distributed_pytorch
bash run.sh
```
`distributed_apex` shows an example using `FP16_Optimizer` with Apex DistributedDataParallel. Again, the usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process usage. Run via

```bash
cd distributed_apex
bash run.sh
```
examples/FP16_Optimizer_simple/distributed_apex/distributed_data_parallel.py
0 → 100644
import torch
from torch.autograd import Variable
import argparse
from apex.parallel import DistributedDataParallel as DDP
from apex.fp16_utils import FP16_Optimizer

parser = argparse.ArgumentParser()
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--world-size', default=2, type=int,
                    help='Number of distributed processes.')
parser.add_argument("--rank", type=int,
                    help='Rank of this process')
args = parser.parse_args()

torch.cuda.set_device(args.rank)
torch.distributed.init_process_group(backend='nccl',
                                     init_method=args.dist_url,
                                     world_size=args.world_size,
                                     rank=args.rank)

torch.backends.cudnn.benchmark = True

N, D_in, D_out = 64, 1024, 16

x = Variable(torch.cuda.FloatTensor(N, D_in).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()

model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### CONSTRUCT FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    ### CHANGE loss.backward() TO: ###
    optimizer.backward(loss)
    ###
    optimizer.step()

print("final loss = ", loss)
examples/FP16_Optimizer_simple/distributed_apex/run.sh
0 → 100644
#!/bin/bash
# By default, apex.parallel.multiproc will attempt to use all available GPUs on the system.
# The number of GPUs to use can be limited by setting CUDA_VISIBLE_DEVICES:
export CUDA_VISIBLE_DEVICES=0,1
python -m apex.parallel.multiproc distributed_data_parallel.py
examples/FP16_Optimizer_simple/distributed_pytorch/distributed_data_parallel.py
0 → 100644
import torch
from torch.autograd import Variable
import argparse
from apex.fp16_utils import FP16_Optimizer

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl',
                                     init_method='env://')

torch.backends.cudnn.benchmark = True

N, D_in, D_out = 64, 1024, 16

x = Variable(torch.cuda.FloatTensor(N, D_in).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()

model = torch.nn.Linear(D_in, D_out).cuda().half()
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  device_ids=[args.local_rank],
                                                  output_device=args.local_rank)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### CONSTRUCT FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    ### CHANGE loss.backward() TO: ###
    optimizer.backward(loss)
    ###
    optimizer.step()

print("final loss = ", loss)
examples/FP16_Optimizer_simple/distributed_pytorch/run.sh
0 → 100644
#!/bin/bash
python -m torch.distributed.launch --nproc_per_node=2 distributed_data_parallel.py