OpenDAS / deepspeed

Commit 11a426ac (unverified)
Authored Feb 07, 2020 by Shaden Smith; committed by GitHub on Feb 07, 2020

Pointing docs to hosted HTML files for core API. (#41)

Parent: 246a2844
Showing 2 changed files with 21 additions and 19 deletions:

* README.md (+8, -7)
* docs/features.md (+13, -12)
README.md (view file @ 11a426ac)
@@ -215,7 +215,7 @@ pre-defined learning rate schedule:
* **Gradient Averaging**: in distributed data parallel training, `backward`
  ensures that gradients are averaged across data parallel processes after
-  training on an `effective_batch_size`.
+  training on a `train_batch_size`.
* **Loss Scaling**: in FP16/mixed precision training, the DeepSpeed
  engine automatically handles scaling the loss to avoid precision loss in the
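For orientation on the renamed key: `train_batch_size` is the global batch over which `backward` averages gradients. Below is a minimal sketch of how it typically decomposes, assuming the usual DeepSpeed batch-size fields (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`); these names and the relationship are assumptions for illustration, not part of this diff.

```python
# Hedged illustration only: the key names and the relationship below follow the
# conventional DeepSpeed batch-size semantics, assumed here rather than taken
# from this commit.
train_micro_batch_size_per_gpu = 16   # samples per GPU per forward/backward pass
gradient_accumulation_steps = 4       # micro-batches accumulated before each step
data_parallel_world_size = 8          # number of data-parallel processes (GPUs)

# `backward` averages gradients over this many samples in total.
train_batch_size = (train_micro_batch_size_per_gpu
                    * gradient_accumulation_steps
                    * data_parallel_world_size)
print(train_batch_size)  # 512
```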
@@ -274,7 +274,7 @@ the `step` value is stored as part of the `client_sd`.
DeepSpeed features can be enabled, disabled, or configured using a config JSON
file that should be specified as `args.deepspeed_config`. A sample config file
is shown below. For a full set of features see [core API
-doc](../../API/core_api/core_api.md).
+doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html).
```json
{
  ...
}
```
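The sample config referenced in this hunk is truncated above. Purely as a hedged sketch of what such a file might contain (the keys below are common DeepSpeed options assumed for illustration, not taken from this commit), a script could generate the JSON that `args.deepspeed_config` points to:

```python
import json

# Hypothetical minimal contents for the `deepspeed_config` JSON file; the exact
# set of supported keys is defined by DeepSpeed, and these are assumptions.
ds_config = {
    "train_batch_size": 256,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4}
    },
    "fp16": {"enabled": True}
}

with open("deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
# Pass the resulting path to the launcher, e.g. via --deepspeed_config.
```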
@@ -363,11 +363,12 @@ deepspeed --include="worker-2:0,1" \
## Further Reading

-| Article | Description |
-| ---------------------------------------------------------------- | -------------------------------------------- |
-| [DeepSpeed Features](./docs/features.md) | DeepSpeed features |
-| [CIFAR-10 Tutorial](./docs/tutorials/CIFAR-10.md) | Getting started with CIFAR-10 and DeepSpeed |
-| [Megatron-LM Tutorial](./docs/tutorials/MegatronGPT2Tutorial.md) | Train GPT2 with DeepSpeed and Megatron-LM |
+| Article | Description |
+| ---------------------------------------------------------------------------------------------- | -------------------------------------------- |
+| [DeepSpeed Features](./docs/features.md) | DeepSpeed features |
+| [CIFAR-10 Tutorial](./docs/tutorials/CIFAR-10.md) | Getting started with CIFAR-10 and DeepSpeed |
+| [Megatron-LM Tutorial](./docs/tutorials/MegatronGPT2Tutorial.md) | Train GPT2 with DeepSpeed and Megatron-LM |
+| [API Documentation](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) | Generated DeepSpeed API documentation |
docs/features.md (view file @ 11a426ac)
@@ -124,19 +124,19 @@ The DeepSpeed core API consists of just a handful of methods:
* checkpointing: `load_checkpoint` and `store_checkpoint`

DeepSpeed supports all the features described in this document, via the use of these API,
-along with a `deepspeed_config` JSON file for enabling and disabling the features.
-Please see [core API doc](../../API/core_api/core_api.md) for more details.
+along with a `deepspeed_config` JSON file for enabling and disabling the features.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
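For context on the "handful of methods" this hunk refers to, here is a hedged sketch of a training loop built on them; the `initialize`/`backward`/`step` usage and the placeholder model and data are assumptions for illustration, not content of this commit, and device placement plus fp16 casting are omitted for brevity.

```python
import argparse

import deepspeed
import torch
import torch.nn.functional as F

parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)   # adds --deepspeed_config, etc.
args = parser.parse_args()

model = torch.nn.Linear(784, 10)                  # placeholder model
engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters())

# Tiny synthetic batches stand in for a real data loader.
batches = [(torch.randn(8, 784), torch.randint(0, 10, (8,))) for _ in range(10)]
for inputs, labels in batches:
    outputs = engine(inputs)        # forward pass through the DeepSpeed engine
    loss = F.cross_entropy(outputs, labels)
    engine.backward(loss)           # gradient averaging / loss scaling handled here
    engine.step()                   # optimizer step (and LR schedule, if configured)
```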
### Gradient Clipping

DeepSpeed handles gradient clipping under the hood based on the max gradient norm
-specified by the user.
-See [core API doc](../../API/core_api/core_api.md) for more details.
+specified by the user.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
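As a hedged illustration of how the max gradient norm might be expressed in the `deepspeed_config` JSON; the key name `gradient_clipping` is an assumption here, not taken from this commit.

```python
import json

# Hypothetical config fragment: clip gradients to a max norm of 1.0.
clip_section = {"gradient_clipping": 1.0}
print(json.dumps(clip_section, indent=2))  # merge into the full deepspeed_config JSON
```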
### Automatic loss scaling with mixed precision

DeepSpeed internally handles loss scaling for mixed precision training. The parameters
-for loss scaling can be specified in the `deepspeed_config` JSON file.
-See [core API doc](../../API/core_api/core_api.md) for more details.
+for loss scaling can be specified in the `deepspeed_config` JSON file.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
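A hedged sketch of what the loss-scaling parameters could look like in the `deepspeed_config` JSON; the `fp16` key names below are assumptions for illustration (a `loss_scale` of 0 conventionally requests dynamic scaling), not content of this commit.

```python
import json

# Hypothetical fp16 / loss-scaling section of a deepspeed_config file.
fp16_section = {
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 => dynamic loss scaling
        "loss_scale_window": 1000,  # steps between attempts to raise the scale
        "min_loss_scale": 1
    }
}
print(json.dumps(fp16_section, indent=2))
```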
## Training Optimizers
@@ -169,12 +169,12 @@ more details see [ZeRO paper](https://arxiv.org/abs/1910.02054) .
## Training Agnostic Checkpointing

**TODO: API documentation**

DeepSpeed can simplify checkpointing for you regardless of whether you are using data
parallel training, model parallel training, mixed-precision training, a mix of these
-three, or using the zero optimizer to enable larger model sizes. See the
-[getting started](../../Onboard/onboard/onboard.md) or
-[core API doc](../../API/core_api/core_api.md) for details.
+three, or using the zero optimizer to enable larger model sizes.
+Please see the [Getting Started](../README.md#getting-started) guide and the
+[core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.
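A hedged sketch of the checkpointing flow, using the method names listed earlier in features.md (`store_checkpoint` / `load_checkpoint`) and the `client_sd` idea from the README context above; the signatures and return values shown are assumptions, so consult the linked API doc for the real interface.

```python
def save_and_resume(engine, step, ckpt_dir="checkpoints/"):
    """Illustrative only: persist engine state plus a small client dict."""
    # Save: DeepSpeed stores model/optimizer state; the client dict carries
    # anything the training script wants restored later (e.g. the current step).
    engine.store_checkpoint(ckpt_dir, {"step": step})

    # Load: the engine restores its own state and hands the client dict back.
    client_sd = engine.load_checkpoint(ckpt_dir)
    return client_sd["step"]
```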
## Advanced parameter search
DeepSpeed supports multiple Learning Rate Schedules to enable faster convergence for
...
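The hunk above is cut off mid-sentence, so purely as a hedged illustration of how a learning rate schedule is typically selected in the `deepspeed_config` JSON; the schedule name and parameter keys below are assumptions, not taken from this commit.

```python
import json

# Hypothetical scheduler section: warm the LR up over the first 1000 steps.
scheduler_section = {
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0.0,
            "warmup_max_lr": 1e-3,
            "warmup_num_steps": 1000
        }
    }
}
print(json.dumps(scheduler_section, indent=2))
```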
@@ -195,9 +195,10 @@ can automatically handle batch creation appropriately.
## Performance Analysis and Debugging

For performance debugging, DeepSpeed can give you a detailed breakdown of the time spent
in different parts of the training by simply enabling it in the `deepspeed_config`
-file. See [core API doc](../../API/core_api/core_api.md).
+file.
+Please see the [core API doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html) for more details.

```json
{
-  "wallclock_breakdwon": true
+  "wallclock_breakdown": true
}
```