Project: ModelZoo / llama3_pytorch
Commit cc7fcbbb, authored Aug 28, 2024 by chenych
Parent: ab643c4f

Modify deepspeed multi nodes
Showing 7 changed files with 7 additions and 14 deletions (+7 -14)

llama-factory/.deepspeed_env                                        +0 -0
llama-factory/examples/full_multi_gpu/70B/hostfile                  +0 -2
llama-factory/examples/full_multi_gpu/70B/multi_node_deepspeed.sh   +4 -3
llama-factory/examples/lora_multi_gpu/70B/.deepspeed_env            +0 -6
llama-factory/examples/lora_multi_gpu/70B/hostfile                  +0 -2
llama-factory/examples/lora_multi_gpu/70B/multi_node_deepspeed.sh   +1 -1
llama-factory/hostfile                                              +2 -0

llama-factory/examples/full_multi_gpu/70B/.deepspeed_env → llama-factory/.deepspeed_env (@ cc7fcbbb)
File moved

llama-factory/examples/full_multi_gpu/70B/hostfile (deleted, 100644 → 0, @ ab643c4f)
-10.5.32.245 slots=8
-10.5.32.246 slots=8
\ No newline at end of file

llama-factory/examples/full_multi_gpu/70B/multi_node_deepspeed.sh (@ cc7fcbbb)
@@ -2,8 +2,9 @@
 export HSA_FORCE_FINE_GRAIN_PCIE=1
 MASTER_ADDR=''
 # Multi-node, multi-GPU + deepspeed
-deepspeed --hostfile=./hostfile \
+deepspeed --hostfile=/path/of/hostfile \
     --num_nodes 2 \
     --master_addr $MASTER_ADDR \
     --master_port 12345 \
...
@@ -20,10 +21,10 @@ deepspeed --hostfile=./hostfile \
     --overwrite_cache \
     --overwrite_output_dir \
     --cutoff_len 8192 \
-    --preprocessing_num_workers 1 \
+    --preprocessing_num_workers 16 \
     --per_device_train_batch_size 1 \
     --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 1 \
+    --gradient_accumulation_steps 8 \
     --lr_scheduler_type cosine \
     --logging_steps 10 \
     --warmup_steps 20 \
...
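
The second hunk raises --gradient_accumulation_steps from 1 to 8 while --per_device_train_batch_size stays at 1. A rough sketch of the resulting global batch size follows. It assumes the usual data-parallel formula (per-device batch × accumulation steps × number of GPUs) and that all 2 × 8 slots act as data-parallel ranks. The commit itself does not state this, and the variable names are illustrative only.

```shell
#!/bin/sh
# Illustrative arithmetic only; the values are copied from the diff and the hostfile.
# Assumption: every slot is a data-parallel rank.
num_nodes=2            # --num_nodes
slots_per_node=8       # slots=8 per hostfile entry
per_device_batch=1     # --per_device_train_batch_size
grad_accum=8           # --gradient_accumulation_steps (new value)

world_size=$(( num_nodes * slots_per_node ))
global_batch=$(( per_device_batch * grad_accum * world_size ))
echo "world_size=${world_size} global_batch=${global_batch}"
```

Under these assumptions the script prints world_size=16 global_batch=128. With the previous accumulation value of 1 the same formula gives 16.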

llama-factory/examples/lora_multi_gpu/70B/.deepspeed_env (deleted, 100644 → 0, @ ab643c4f)
-NCCL_SOCKET_IFNAME=ens38f0
-NCCL_IB_DISABLE=1
-HSA_FORCE_FINE_GRAIN_PCIE=1
-MIOPEN_COMPILE_PARALLEL_LEVEL=1
-NCCL_PATH=/opt/dtk/rccl
-NCCL_DEBUG=DEBUG
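
The .deepspeed_env file is a newline-separated list of KEY=VALUE entries. As far as the DeepSpeed documentation describes it, the launcher looks for this file in the directory it is run from and in the home directory, then propagates the entries to every node. That would be consistent with keeping a single copy under llama-factory/. A minimal sketch of a format check follows, assuming that one-assignment-per-line layout. check_env_file is a hypothetical helper and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if every non-blank line of the file is KEY=VALUE,
# where KEY is a valid shell-style variable name; returns 1 otherwise.
check_env_file() {
    status=0
    lineno=0
    # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$(( lineno + 1 ))
        case $line in
            '') ;;                                   # blank line: ignore
            [A-Za-z_]*=*)
                key=${line%%=*}                      # text before the first '='
                case $key in
                    *[!A-Za-z0-9_]*)
                        echo "line ${lineno}: invalid key: ${key}" >&2
                        status=1 ;;
                esac ;;
            *)
                echo "line ${lineno}: not KEY=VALUE: ${line}" >&2
                status=1 ;;
        esac
    done < "$1"
    return "$status"
}
```

A line such as "export FOO=1" is rejected by this check because the text before the "=" contains a space.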

llama-factory/examples/lora_multi_gpu/70B/hostfile (deleted, 100644 → 0, @ ab643c4f)
-10.5.32.245 slots=8
-10.5.32.246 slots=8
\ No newline at end of file

llama-factory/examples/lora_multi_gpu/70B/multi_node_deepspeed.sh (@ cc7fcbbb)
@@ -6,7 +6,7 @@ export HSA_FORCE_FINE_GRAIN_PCIE=1
 MASTER_ADDR=''
 # LoRA + multi-node, multi-GPU + deepspeed
-deepspeed --hostfile=./hostfile \
+deepspeed --hostfile=/path/of/hostfile \
     --num_nodes 2 \
     --master_addr $MASTER_ADDR \
     --master_port 12345 \
...
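
Both scripts leave MASTER_ADDR empty and now point --hostfile at a placeholder path, so both values have to be filled in before launching. One possible way to fill in MASTER_ADDR is sketched below. It assumes the first hostfile entry should act as the master node, which is a convention chosen here and not something the commit states. first_host and the HOSTFILE variable are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: prints the host name of the first non-blank,
# non-comment line of a hostfile.
first_host() {
    awk 'NF && $1 !~ /^#/ { print $1; exit }' "$1"
}

HOSTFILE=${HOSTFILE:-/path/of/hostfile}   # placeholder path taken from the diff
MASTER_ADDR=${MASTER_ADDR:-}

# Assumption: fall back to the first hostfile entry when MASTER_ADDR is unset.
if [ -z "$MASTER_ADDR" ] && [ -r "$HOSTFILE" ]; then
    MASTER_ADDR=$(first_host "$HOSTFILE")
fi
```

If the hostfile is not readable, MASTER_ADDR stays empty, so a launch script would still need its own check before calling deepspeed.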

llama-factory/hostfile (new file, 0 → 100644, @ cc7fcbbb)
+node1 slots=8
+node2 slots=8
\ No newline at end of file
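
Each hostfile line pairs a host name with a slots=N count of GPUs on that node. A minimal sketch, assuming that two-column layout, counts nodes and total slots so the result can be compared against --num_nodes before launching. count_slots is a hypothetical helper, and treating lines that start with # as comments is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: prints "<nodes> <total_slots>" for a DeepSpeed-style hostfile.
# Assumes one "<host> slots=<N>" entry per line.
count_slots() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank lines and comments
        {
            nodes++
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^slots=[0-9]+$/) {
                    sub(/^slots=/, "", $i)   # keep only the number
                    slots += $i
                }
            }
        }
        END { printf "%d %d\n", nodes, slots }
    ' "$1"
}
```

For the two-node hostfile in this commit, count_slots llama-factory/hostfile prints "2 16".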