Unverified commit df311a5c authored by Stas Bekman, committed by GitHub

[seq2seq] document the caveat of leaky native amp (#8930)



* document the caveat of leaky native amp

* Update examples/seq2seq/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 73c51f7f
@@ -79,6 +79,11 @@ test.target
```
The `.source` files are the input and the `.target` files are the desired output.
### Potential issues
- Native AMP (`--fp16` without apex) may trigger a huge memory leak and require roughly 10x the GPU memory. This has been fixed in pytorch-nightly, and the first official release to include the fix will be pytorch-1.8. Until then, if you need mixed precision, use native AMP only with pytorch-nightly, or use NVIDIA's apex instead (a version check is sketched below). Reference: https://github.com/huggingface/transformers/issues/8403
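
  The issue above describes the leak in detail. As a stopgap, a small run-time check like the following sketch can tell you whether the installed PyTorch build already includes the fix. This is an illustrative helper, not part of the examples: the `native_amp_is_safe` name is made up here, and it assumes `torch` and `packaging` are installed.

  ```python
  # Minimal sketch: decide whether native AMP (`--fp16` without apex) is likely
  # safe on the installed PyTorch build. The 1.8 threshold mirrors the note above;
  # nightlies (1.8.0.devYYYYMMDD) are treated as already carrying the fix.
  from packaging import version
  import torch


  def native_amp_is_safe() -> bool:
      # `.release` drops the .dev / +cuXXX suffixes, so a 1.8 nightly passes the check.
      return version.parse(torch.__version__).release >= (1, 8)


  if __name__ == "__main__":
      if native_amp_is_safe():
          print("Native AMP should be safe on this PyTorch build.")
      else:
          print("Use NVIDIA apex or upgrade to pytorch-nightly before enabling --fp16.")
  ```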
### Tips and Tricks
General Tips:
@@ -592,4 +597,3 @@ The feature is still experimental, because:
+ we can make it much more robust if we have memory mapped/preprocessed datasets.
+ The speedup over sortish sampler is not that large at the moment.