Unverified commit 068372a7, authored by Baizhou Zhang, committed by GitHub

[doc] add potential solution for OOM in llama2 example (#4699)

parent 9c2feb2f
@@ -149,6 +149,9 @@ Finally, run the following command to start training:
```bash
bash gemini.sh
```
If you encounter an out-of-memory (OOM) error while training with the script `gemini.sh`, switching to the script `gemini_auto.sh` may resolve it: gemini_auto sets an upper limit on GPU memory usage by offloading part of the model parameters and optimizer states to CPU memory. The trade-off is that `gemini_auto.sh` runs somewhat slower, since more data is transferred between CPU and GPU.
#### c. Results
If you run the above command successfully, you will get the following results:
`max memory usage: 55491.10 MB, throughput: 24.26 samples/s, TFLOPS/GPU: 167.43`.