OpenDAS / fairscale · Commit fb7b6a93 (unverified)
Authored Oct 19, 2021 by Rohan Varma; committed by GitHub on Oct 19, 2021

[FairScale] Remove refs to "cpu_offload" in code comments (#814)

* fix
* remove dup file

Parent: 8acbec71
Showing 1 changed file with 8 additions and 7 deletions.

fairscale/nn/data_parallel/fully_sharded_data_parallel.py (+8, -7)
@@ -191,7 +191,7 @@ class FullyShardedDataParallel(nn.Module):
         move_grads_to_cpu (bool, Optional):
             move gradient shard to CPU after reduction. This is useful when
             combined with CPU-based optimizers. It defaults to the value of
-            *``cpu_offload``*.
+            *``move_params_to_cpu``*.
         bucket_cap_mb (int, Optional):
             FSDP will bucket parameters so that gradient reduction can
             be more efficient for small parameters.
@@ -251,7 +251,8 @@ class FullyShardedDataParallel(nn.Module):
         cpu_offload (bool, Optional):
             if ``True``, offload FP32 params to CPU. This is only relevant when
             *``mixed_precision``* is ``True``. Note: This arg will be deprecated in favor of
-            *``move_params_to_cpu``* in an upcoming release.
+            *``move_params_to_cpu``* in an upcoming release. Please prefer
+            specifying ``move_params_to_cpu`` instead.
     """

     def __init__(
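For readers of this diff, here is a minimal usage sketch of the renamed flag (not part of the commit). The wrapped module, its sizes, and the assumption that a torch.distributed process group is already initialized are illustrative only; the argument names come from the docstring above.

    import torch
    from fairscale.nn import FullyShardedDataParallel as FSDP

    # Assumes torch.distributed.init_process_group(...) has already been called.
    model = torch.nn.Linear(1024, 1024)
    fsdp_model = FSDP(
        model,
        mixed_precision=True,      # required when parameters live on CPU (see the check below)
        move_params_to_cpu=True,   # preferred over the deprecated cpu_offload alias
        # move_grads_to_cpu is omitted; it defaults to the value of move_params_to_cpu
    )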
@@ -306,7 +307,7 @@ class FullyShardedDataParallel(nn.Module):
         if self.fp32_reduce_scatter and not self.mixed_precision:
             raise ValueError("fp32_reduce_scatter requires mixed_precision=True")
         if self.move_params_to_cpu and not self.mixed_precision:
-            raise ValueError("cpu_offload requires mixed_precision=True")
+            raise ValueError("move_params_to_cpu requires mixed_precision=True")

         # skip validation if the process group was created above
         if process_group:
@@ -634,7 +635,7 @@ class FullyShardedDataParallel(nn.Module):
             f"buffer_dtype={self.buffer_dtype}, "
             f"fp32_reduce_scatter={self.fp32_reduce_scatter}, "
             f"compute_device={self.compute_device}"
-            f"cpu_offload={self.move_params_to_cpu}, "
+            f"move_params_to_cpu={self.move_params_to_cpu}, "
             f"move_grads_to_cpu={self.move_grads_to_cpu}, "
             f"bucket_cap_mb={self.bucket_cap_mb}, "
             f"clear_autocast_cache={self.clear_autocast_cache}"
@@ -987,7 +988,7 @@ class FullyShardedDataParallel(nn.Module):
         ``_fp32_shard``: a single shard of the parameters in full precision
             (typically FP32, but this is dependent on the dtype of the model
             as it's passed in by the user). This can be on CPU or GPU
-            depending on the value of *``cpu_offload``*.
+            depending on the value of *``move_params_to_cpu``*.
         ``_fp16_shard``: if *``mixed_precision``* is ``True``, this will be
             a single shard of the parameters in FP16, used for all-gather.
         ``_full_param_padded``: the full weight (padded to be evenly
@@ -1834,8 +1835,8 @@ class FullyShardedDataParallel(nn.Module):
...
@@ -1834,8 +1835,8 @@ class FullyShardedDataParallel(nn.Module):
assert
p
.
_fp16_shard
is
not
None
assert
p
.
_fp16_shard
is
not
None
alloc_storage_
(
p
.
_fp16_shard
,
size
=
p
.
_fp32_shard
.
size
())
alloc_storage_
(
p
.
_fp16_shard
,
size
=
p
.
_fp32_shard
.
size
())
p
.
_fp16_shard
.
copy_
(
p
.
_fp16_shard
.
copy_
(
# If
cpu_offload
is True, this will be non-blocking
because
# If
move_params_to_cpu
is True, this will be non-blocking
# _fp32_shard is pinned, otherwise it's a no-op.
#
because
_fp32_shard is pinned, otherwise it's a no-op.
p
.
_fp32_shard
.
to
(
p
.
_fp16_shard
.
device
,
non_blocking
=
True
)
p
.
_fp32_shard
.
to
(
p
.
_fp16_shard
.
device
,
non_blocking
=
True
)
)
)
p
.
data
=
p
.
_fp16_shard
p
.
data
=
p
.
_fp16_shard
...
...
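The comment rewritten in the last hunk refers to a general CUDA behavior: host-to-device copies from pinned (page-locked) memory can be issued asynchronously. A standalone sketch of that mechanism, with made-up sizes and guarded by a CUDA availability check, might look like:

    import torch

    if torch.cuda.is_available():
        cpu_shard = torch.randn(1 << 20, pin_memory=True)  # pinned, like a CPU-resident _fp32_shard
        fp16_shard = torch.empty(1 << 20, dtype=torch.float16, device="cuda")
        # Mirrors the pattern in the diff: the transfer can overlap with other work
        # because the source tensor is pinned.
        fp16_shard.copy_(cpu_shard.to(fp16_shard.device, non_blocking=True))
        torch.cuda.synchronize()  # make sure the async copy has completed before use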