OpenDAS / ColossalAI · Commits · 8f7ce94b

Unverified commit 8f7ce94b, authored Apr 14, 2022 by ver217; committed by GitHub, Apr 14, 2022.
[hotfix] fix auto tensor placement policy (#753)
Parent: 84c6700b
Showing 2 changed files with 3 additions and 3 deletions.
colossalai/zero/sharded_model/sharded_model_v2.py   +1 -2
colossalai/zero/utils/tensor_placement_policy.py    +2 -1
colossalai/zero/sharded_model/sharded_model_v2.py (view file @ 8f7ce94b)

@@ -53,10 +53,9 @@ class ShardedModelV2(nn.Module):
             If it's 'cpu', parameters, gradients and optimizer states will be offloaded to CPU, which means min CUDA memory will be used.
             If it's 'cuda', they won't be offloaded, which means max CUDA memory will be used.
             If it's 'auto', they are moving dynamically based on CPU and CUDA memory usage. It will utilize heterogeneous memory space evenly and well.
+            Note that 'auto' policy can only work well when no other processes use CUDA during your training.
             Defaults to 'cuda'.
-        offload_config (Optional[dict], optional): We currently only support CPU offload. Set to `{"device": "cpu"}` to enable CPU offload. Defaults to None.
         gradient_predivide_factor (Optional[float], optional): Gradient is divived by this value before reduce-scatter. Defaults to 1.0.
-        use_memory_tracer (bool, optional): Whether to use memoty tracer. Defaults to False.
         reuse_fp16_shard (bool, optional): Whether to reuse fp16 shard for param and grad.
             Enabling this can reduce GPU memory usage, but you have to make sure you disable it when using gradient accumulation.
             In this mode, grad will be fp16. Make sure your optimizer supports mixed precision (fp32 param and fp16 grad).
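The hunk above only changes documentation, but it is also the clearest statement of what `tensor_placement_policy` does. As a reading aid, here is a minimal hypothetical sketch (my own illustration, not ColossalAI code; `target_device` and `cuda_is_tight` do not exist in the repository) of what the three policy values mean for where held tensors live:

def target_device(policy: str, cuda_is_tight: bool) -> str:
    """Map a tensor placement policy string to a device, following the docstring above."""
    if policy == 'cpu':
        # minimum CUDA memory: params, grads and optimizer states all offloaded to CPU
        return 'cpu'
    if policy == 'cuda':
        # maximum CUDA memory: nothing is offloaded
        return 'cuda'
    if policy == 'auto':
        # tensors move dynamically with CPU/CUDA memory usage; per the note
        # added in this commit, this only works well when no other process
        # uses CUDA during training
        return 'cpu' if cuda_is_tight else 'cuda'
    raise ValueError(f"unknown tensor placement policy: {policy!r}")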
colossalai/zero/utils/tensor_placement_policy.py (view file @ 8f7ce94b)

@@ -45,7 +45,8 @@ class AutoTensorPlacementPolicy(TensorPlacementPolicy):
     def __init__(self, mem_stats_collector: Optional[MemStatsCollector] = None) -> None:
         super().__init__(None, mem_stats_collector=mem_stats_collector)
-        self._warmup_non_model_data_ratio: float = 0.2
+        # model data will use 1-self._warmup_non_model_data_ratio CUDA memory in warmup phase
+        self._warmup_non_model_data_ratio: float = 0.8
 
     def evict_tensors(self,
                       hold_cuda_tensor_list: List[StatefulTensor],
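The comment added here states the invariant behind the hotfix: during the warmup phase, while `mem_stats_collector` has no real measurements yet, model data may occupy at most `1 - self._warmup_non_model_data_ratio` of CUDA memory. A small arithmetic sketch (my own illustration; `warmup_model_data_budget` is not a function in this file) of what raising the ratio from 0.2 to 0.8 does to that budget:

def warmup_model_data_budget(cuda_capacity_bytes: int,
                             warmup_non_model_data_ratio: float) -> int:
    """CUDA bytes the policy would allow model data to use during warmup."""
    return int(cuda_capacity_bytes * (1.0 - warmup_non_model_data_ratio))

capacity = 16 * 1024**3                          # assume a 16 GiB device
print(warmup_model_data_budget(capacity, 0.2))   # old ratio: ~12.8 GiB for model data
print(warmup_model_data_budget(capacity, 0.8))   # new ratio: ~3.2 GiB for model data

In other words, the fix shrinks the warmup model-data budget from 80% to 20% of device memory, leaving the rest free for non-model data (activations, buffers) until real usage statistics are available.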