Unverified Commit 2e6d93e0 authored by Shaden Smith, committed by GitHub

new transformer pre-ln image (#268)

parent fedd0386
@@ -301,7 +301,7 @@ Table 4. The setting of memory-optimization flags for a range of micro-batch sizes
 ### Fine-tuning a model pre-trained with DeepSpeed Transformer Kernels
-Fine-tuning the model pre-trained using DeepSpeed Transformer and the recipe in [DeepSpeed Fast-Bert Training](/fast_bert/) should yield an F1 score of 90.5, which is expected to increase if you run the pre-training for longer than suggested in the tutorial.
+Fine-tuning the model pre-trained using DeepSpeed Transformer and the recipe in [DeepSpeed Fast-Bert Training](https://www.deepspeed.ai/news/2020/05/27/fastest-bert-training.html) should yield an F1 score of 90.5, which is expected to increase if you run the pre-training for longer than suggested in the tutorial.
 To get these results, some tuning of the dropout settings is required, as described below:
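For context, the dropout knobs mentioned above live in DeepSpeed's transformer-kernel configuration. The sketch below is illustrative only: it assumes the `DeepSpeedTransformerConfig` / `DeepSpeedTransformerLayer` API with BERT-large-style dimensions, and the dropout ratios shown are placeholders rather than the tutorial's tuned recipe.

```python
import copy
import torch.nn as nn
from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer

# Assumed BERT-large-style settings; attn_dropout_ratio and
# hidden_dropout_ratio are the knobs the tutorial says to tune for
# fine-tuning (the values here are illustrative, not the recipe).
config = DeepSpeedTransformerConfig(batch_size=8,
                                    hidden_size=1024,
                                    intermediate_size=4096,
                                    heads=16,
                                    attn_dropout_ratio=0.1,
                                    hidden_dropout_ratio=0.1,
                                    num_hidden_layers=24,
                                    initializer_range=0.02,
                                    local_rank=-1,
                                    seed=1234,
                                    fp16=True,
                                    pre_layer_norm=True)

# A stack of fused transformer layers that replaces the stock encoder
# layers. (Constructor details can vary across DeepSpeed versions.)
layers = nn.ModuleList([copy.deepcopy(DeepSpeedTransformerLayer(config))
                        for _ in range(config.num_hidden_layers)])
```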