OpenDAS / dcnv3
Commit 0404891e
Authored Jan 17, 2025 by zhe chen
Parent: d36b7c67

Fix bug in newer slurm system
Showing 2 changed files with 6 additions and 2 deletions:
  classification/main.py            +3 -1
  classification/main_deepspeed.py  +3 -1
classification/main.py
@@ -581,7 +581,9 @@ if __name__ == '__main__':
         assert has_native_amp, 'Please update pytorch(1.6+) to support amp!'
 
     # init distributed env
-    if 'SLURM_PROCID' in os.environ and int(os.environ['SLURM_TASKS_PER_NODE']) != 1:
+    # In the newer versions of Slurm, the format of `SLURM_TASKS_PER_NODE` has changed from a single
+    # numeric string to a format like `8(xn)`, which represents n nodes is used in the training.
+    if 'SLURM_PROCID' in os.environ and int(os.environ['SLURM_TASKS_PER_NODE'][0]) != 1:
         print('\nDist init: SLURM')
         rank = int(os.environ['SLURM_PROCID'])
         gpu = rank % torch.cuda.device_count()
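The failure this hunk addresses can be reproduced without a cluster. The following is a minimal sketch, assuming the variable takes values such as `8` or `8(x2)` as the comment in the diff describes. The helper names `old_check` and `new_check` are hypothetical and only mirror the removed and added conditions.

```python
def old_check(tasks_per_node: str) -> bool:
    # Mirrors the removed condition: int() is applied to the whole string.
    return int(tasks_per_node) != 1


def new_check(tasks_per_node: str) -> bool:
    # Mirrors the added condition: int() is applied to the first character only.
    return int(tasks_per_node[0]) != 1


for value in ("8", "8(x2)"):
    try:
        print(value, "old:", old_check(value))
    except ValueError as exc:
        # int("8(x2)") cannot be parsed, so the old condition raises here.
        print(value, "old: ValueError:", exc)
    print(value, "new:", new_check(value))
```

With the old condition, a value like `8(x2)` raises `ValueError` before the distributed setup starts. The new condition reads only the leading digit and so evaluates without error.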
classification/main_deepspeed.py
@@ -497,7 +497,9 @@ if __name__ == '__main__':
     args, config = parse_option()
 
     # init distributed env
-    if 'SLURM_PROCID' in os.environ and int(os.environ['SLURM_TASKS_PER_NODE']) != 1:
+    # In the newer versions of Slurm, the format of `SLURM_TASKS_PER_NODE` has changed from a single
+    # numeric string to a format like `8(xn)`, which represents n nodes is used in the training.
+    if 'SLURM_PROCID' in os.environ and int(os.environ['SLURM_TASKS_PER_NODE'][0]) != 1:
         print('\nDist init: SLURM')
         rank = int(os.environ['SLURM_PROCID'])
         gpu = rank % torch.cuda.device_count()
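One caveat with indexing `[0]`: it reads a single character, so a two-digit count such as `16(x2)` is seen as `1` and the SLURM branch would be skipped. A possible alternative, not part of this commit, is to parse all leading digits. The sketch below assumes the value always starts with the task count of the first node group, as in `8`, `8(x2)`, or `2(x3),1`. The function name `tasks_on_first_node` is hypothetical.

```python
import re


def tasks_on_first_node(tasks_per_node: str) -> int:
    """Return the leading task count from a SLURM_TASKS_PER_NODE-style string."""
    # Match one or more digits at the start of the string.
    match = re.match(r"\d+", tasks_per_node.strip())
    if match is None:
        raise ValueError(
            f"unrecognized SLURM_TASKS_PER_NODE value: {tasks_per_node!r}"
        )
    return int(match.group())


for value in ("8", "8(x2)", "16(x2)", "2(x3),1"):
    print(value, "->", tasks_on_first_node(value))
```

Under that assumption, the condition in both files could be written as `tasks_on_first_node(os.environ['SLURM_TASKS_PER_NODE']) != 1`, which behaves the same as the committed check for single-digit counts.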