Unverified Commit 2e6d93e0 authored by Shaden Smith, committed by GitHub

new transformer pre-ln image (#268)

parent fedd0386
@@ -301,7 +301,7 @@ Table 4. The setting of memory-optimization flags for a range of micro-batch sizes
 ### Fine-tuning a model pre-trained with DeepSpeed Transformer Kernels
-Fine-tuning the model pre-trained using DeepSpeed Transformer and the recipe in [DeepSpeed Fast-Bert Training](/fast_bert/) should yield an F1 score of 90.5, which is expected to increase if you run the pre-training for longer than suggested in the tutorial.
+Fine-tuning the model pre-trained using DeepSpeed Transformer and the recipe in [DeepSpeed Fast-Bert Training](https://www.deepspeed.ai/news/2020/05/27/fastest-bert-training.html) should yield an F1 score of 90.5, which is expected to increase if you run the pre-training for longer than suggested in the tutorial.
 To get these results, some tuning of the dropout settings is required, as described below:
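For context, the dropout knobs mentioned above live in DeepSpeed's transformer-kernel configuration. The sketch below is illustrative only: it assumes the `DeepSpeedTransformerConfig` / `DeepSpeedTransformerLayer` API with BERT-large-style dimensions, and the dropout ratios shown are placeholders rather than the tutorial's tuned recipe.

```python
import copy
import torch.nn as nn
from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer

# Assumed BERT-large-style settings; attn_dropout_ratio and
# hidden_dropout_ratio are the knobs the tutorial says to tune for
# fine-tuning (the values here are illustrative, not the recipe).
config = DeepSpeedTransformerConfig(batch_size=8,
                                    hidden_size=1024,
                                    intermediate_size=4096,
                                    heads=16,
                                    attn_dropout_ratio=0.1,
                                    hidden_dropout_ratio=0.1,
                                    num_hidden_layers=24,
                                    initializer_range=0.02,
                                    local_rank=-1,
                                    seed=1234,
                                    fp16=True,
                                    pre_layer_norm=True)

# A stack of fused transformer layers that replaces the stock encoder
# layers. (Constructor details can vary across DeepSpeed versions.)
layers = nn.ModuleList([copy.deepcopy(DeepSpeedTransformerLayer(config))
                        for _ in range(config.num_hidden_layers)])
```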