Unverified commit 9e9f8cbe authored by Stas Bekman, committed by GitHub

[doc] launcher (#868)

As discussed in https://github.com/microsoft/DeepSpeed/issues/662, this PR updates the docs to:
* explain what to use instead of `CUDA_VISIBLE_DEVICES`
* move the `--hostfile` command-line argument to the correct place in the example launch command

Fixes: https://github.com/microsoft/DeepSpeed/issues/662

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
parent 10c0bea6
@@ -186,8 +186,8 @@ slots available.
 The following command launches a PyTorch training job across all available nodes and GPUs
 specified in `myhostfile`:
 ```bash
-deepspeed <client_entry.py> <client args> \
-  --deepspeed --deepspeed_config ds_config.json --hostfile=myhostfile
+deepspeed --hostfile=myhostfile <client_entry.py> <client args> \
+  --deepspeed --deepspeed_config ds_config.json
 ```
 Alternatively, DeepSpeed allows you to restrict distributed training of your model to a
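The hunk above moves `--hostfile` in front of `<client_entry.py>`. As we understand the `deepspeed` launcher, it treats the first positional argument as the user script and forwards everything after it to that script unparsed, so in the old ordering `--hostfile=myhostfile` reached the client script instead of the launcher. The POSIX shell sketch below mimics that split. `split_launcher_args` is a hypothetical helper, not part of DeepSpeed, and it only recognizes launcher options written as `--opt=value`; a space-separated `--hostfile myhostfile` would make it take `myhostfile` for the script.

```bash
# Hypothetical sketch (not DeepSpeed code): split a command line the way a
# launcher with "<options> <user_script> <everything else>" parsing would.
# Simplification: launcher options must use the --opt=value form.
split_launcher_args() {
  launcher="" script="" script_args=""
  for arg in "$@"; do
    if [ -z "$script" ]; then
      case "$arg" in
        --*) launcher="${launcher:+$launcher }$arg" ;;  # option before the script: launcher's
        *)   script="$arg" ;;                            # first non-option token: the user script
      esac
    else
      script_args="${script_args:+$script_args }$arg"  # after the script: forwarded verbatim
    fi
  done
  printf 'launcher: %s\n' "$launcher"
  printf 'script: %s\n' "$script"
  printf 'script args: %s\n' "$script_args"
}

split_launcher_args --hostfile=myhostfile train.py --deepspeed --deepspeed_config ds_config.json
# prints:
# launcher: --hostfile=myhostfile
# script: train.py
# script args: --deepspeed --deepspeed_config ds_config.json
```

With the old ordering (`split_launcher_args train.py --deepspeed --deepspeed_config ds_config.json --hostfile=myhostfile`) the launcher list comes out empty and `--hostfile=myhostfile` lands among the script arguments.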
@@ -264,3 +264,10 @@ not detected or passed in then DeepSpeed will query the number of GPUs on the
 local machine to discover the number of slots available. The `--include` and
 `--exclude` arguments work as normal, but the user should specify 'localhost'
 as the hostname.
+Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
+which devices should be used. For example, to use only gpu1 of the current
+node, do:
+```bash
+deepspeed --include localhost:1 ...
+```
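The added paragraph replaces `CUDA_VISIBLE_DEVICES` with the launcher's `--include` filter. A GPU list that would previously have gone into `CUDA_VISIBLE_DEVICES` can be carried over as `--include localhost:<list>`, assuming the comma-separated `host:i,j` slot syntax that the DeepSpeed docs use for `--include`. The helper `cvd_to_include` below is hypothetical: a small sketch that validates the list and prints the filter value.

```bash
# Hypothetical helper (not part of DeepSpeed): turn a CUDA_VISIBLE_DEVICES-style
# GPU list such as "0,2" into a value for `deepspeed --include`.
# Assumes the single-node "localhost:i,j" slot syntax.
cvd_to_include() {
  gpus="$1"
  # Reject empty input, non-digit characters, and leading/trailing/doubled commas.
  case "$gpus" in
    ''|*[!0-9,]*|,*|*,|*,,*)
      echo "invalid GPU list: '$gpus'" >&2
      return 1 ;;
  esac
  printf '%s\n' "localhost:$gpus"
}

# Usage: instead of `CUDA_VISIBLE_DEVICES=1 deepspeed ...`, run
#   spec=$(cvd_to_include 1) && deepspeed --include "$spec" <client_entry.py> <client args>
```

Chaining with `&&` means a malformed list stops the launch instead of passing an empty filter to the launcher.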