chenpangpang / transformers · Commits · cf0af9a3

Unverified commit cf0af9a3, authored Mar 20, 2023 by heya5, committed by GitHub on Mar 20, 2023.

[Trainer] Add optional communication backends for torch.distributed when using GPU (#22247)

Update training_args.py

Parent: c4bf6f38
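In effect, the change lets a GPU run of the Trainer ask torch.distributed for an MPI or Gloo process group instead of the previously hard-coded NCCL one. A minimal usage sketch, assuming the xpu_backend field read in the diff below is also accepted as a TrainingArguments constructor argument (the output directory and the commented-out Trainer wiring are placeholders, not part of this commit):

```python
from transformers import Trainer, TrainingArguments

# Hypothetical example values; xpu_backend is the only knob this commit consults.
args = TrainingArguments(
    output_dir="out",
    xpu_backend="gloo",  # request "gloo" (or "mpi") instead of the default "nccl"
)

# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()  # run under torchrun so local_rank is set and the distributed branch is taken
```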
Showing 1 changed file with 4 additions and 1 deletion: src/transformers/training_args.py (+4, −1).
src/transformers/training_args.py

@@ -1641,7 +1641,10 @@ class TrainingArguments:
             # Here, we'll use torch.distributed.
             # Initializes the distributed backend which will take care of synchronizing nodes/GPUs
             if not torch.distributed.is_initialized():
-                torch.distributed.init_process_group(backend="nccl", timeout=self.ddp_timeout_delta)
+                if self.xpu_backend and self.xpu_backend in ("mpi", "gloo"):
+                    torch.distributed.init_process_group(backend=self.xpu_backend, timeout=self.ddp_timeout_delta)
+                else:
+                    torch.distributed.init_process_group(backend="nccl", timeout=self.ddp_timeout_delta)
             device = torch.device("cuda", self.local_rank)
             self._n_gpu = 1
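For reference, the branch being modified ultimately just forwards the chosen backend string to torch.distributed.init_process_group. A standalone sketch of the same selection logic, assuming the usual torchrun/environment-variable rendezvous (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) and an illustrative 30-minute timeout:

```python
from datetime import timedelta
from typing import Optional

import torch.distributed as dist


def init_distributed(preferred_backend: Optional[str], timeout_minutes: int = 30) -> None:
    """Initialize the default process group, honoring an optional backend override."""
    if dist.is_initialized():
        return
    if preferred_backend in ("mpi", "gloo"):
        backend = preferred_backend  # user-requested alternative backend
    else:
        backend = "nccl"  # default for CUDA devices
    dist.init_process_group(backend=backend, timeout=timedelta(minutes=timeout_minutes))


# Example: init_distributed("gloo") on GPU nodes where NCCL is not available,
# which appears to be the situation the commit title has in mind.
```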