"src/vscode:/vscode.git/clone" did not exist on "7aa6af1138b206bec10ab3af23a365c0f573b67d"
Unverified commit 9e9f8cbe authored by Stas Bekman, committed by GitHub

[doc] launcher (#868)

As discussed in https://github.com/microsoft/DeepSpeed/issues/662, this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` cl arg in the correct place in the invocation script

Fixes: https://github.com/microsoft/DeepSpeed/issues/662

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
parent 10c0bea6
@@ -186,8 +186,8 @@ slots available.
The following command launches a PyTorch training job across all available nodes and GPUs
specified in `myhostfile`:
```bash
-deepspeed <client_entry.py> <client args> \
-  --deepspeed --deepspeed_config ds_config.json --hostfile=myhostfile
+deepspeed --hostfile=myhostfile <client_entry.py> <client args> \
+  --deepspeed --deepspeed_config ds_config.json
```
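For context, a hostfile lists one reachable machine per line together with its GPU slot count. The hostnames and slot counts below are hypothetical, a minimal sketch of what `myhostfile` might contain:

```bash
# Hypothetical myhostfile: two machines with four GPU slots each.
# Each hostname must be resolvable and reachable via passwordless ssh.
worker-1 slots=4
worker-2 slots=4
```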
Alternatively, DeepSpeed allows you to restrict distributed training of your model to a
@@ -264,3 +264,10 @@ not detected or passed in then DeepSpeed will query the number of GPUs on the
local machine to discover the number of slots available. The `--include` and
`--exclude` arguments work as normal, but the user should specify 'localhost'
as the hostname.
Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
which devices should be used. For example, to use only gpu1 of the current
node, do:
```bash
deepspeed --include localhost:1 ...
```
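Building on the `--include` example above, the same device filter can name several devices or be inverted with `--exclude`. The entry script and config names here are placeholders, and the comma-separated index list is a sketch of the filter syntax rather than an exhaustive reference:

```bash
# Run only on GPUs 0 and 2 of the local machine (comma-separated device indices):
deepspeed --include localhost:0,2 <client_entry.py> <client args> \
  --deepspeed --deepspeed_config ds_config.json

# Alternatively, exclude a device instead of listing the ones to use:
deepspeed --exclude localhost:1 <client_entry.py> <client args> \
  --deepspeed --deepspeed_config ds_config.json
```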