"src/vscode:/vscode.git/clone" did not exist on "7aa6af1138b206bec10ab3af23a365c0f573b67d"
Unverified commit 9e9f8cbe authored by Stas Bekman, committed by GitHub

[doc] launcher (#868)

As discussed in https://github.com/microsoft/DeepSpeed/issues/662, this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` cl arg in the correct place in the invocation script

Fixes: https://github.com/microsoft/DeepSpeed/issues/662

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
parent 10c0bea6
@@ -186,8 +186,8 @@ slots available.
The following command launches a PyTorch training job across all available nodes and GPUs
specified in `myhostfile`:
```bash
-deepspeed <client_entry.py> <client args> \
-  --deepspeed --deepspeed_config ds_config.json --hostfile=myhostfile
+deepspeed --hostfile=myhostfile <client_entry.py> <client args> \
+  --deepspeed --deepspeed_config ds_config.json
```
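For context, a hostfile lists one reachable machine per line together with its GPU slot count. The hostnames and slot counts below are hypothetical, a minimal sketch of what `myhostfile` might contain:

```bash
# Hypothetical myhostfile: two machines with four GPU slots each.
# Each hostname must be resolvable and reachable via passwordless ssh.
worker-1 slots=4
worker-2 slots=4
```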
Alternatively, DeepSpeed allows you to restrict distributed training of your model to a
@@ -264,3 +264,10 @@ not detected or passed in then DeepSpeed will query the number of GPUs on the
local machine to discover the number of slots available. The `--include` and
`--exclude` arguments work as normal, but the user should specify 'localhost'
as the hostname.
Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
which devices should be used. For example, to use only gpu1 of the current
node, do:
```bash
deepspeed --include localhost:1 ...
```
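Building on the `--include` example above, the same device filter can name several devices or be inverted with `--exclude`. The entry script and config names here are placeholders, and the comma-separated index list is a sketch of the filter syntax rather than an exhaustive reference:

```bash
# Run only on GPUs 0 and 2 of the local machine (comma-separated device indices):
deepspeed --include localhost:0,2 <client_entry.py> <client args> \
  --deepspeed --deepspeed_config ds_config.json

# Alternatively, exclude a device instead of listing the ones to use:
deepspeed --exclude localhost:1 <client_entry.py> <client args> \
  --deepspeed --deepspeed_config ds_config.json
```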