"ml/backend/git@developer.sourcefind.cn:OpenDAS/ollama.git" did not exist on "4fb47ed36862cead8d9455df8e34ae398d83e29f"
Unverified Commit 31be2f45 authored by Stas Bekman, committed by GitHub

[deepspeed docs] Megatron-Deepspeed info (#15488)

parent bbe9c698
@@ -308,9 +308,14 @@ ZeRO stage 3 is not a good choice either for the same reason - more inter-node c
And since we have ZeRO, the other benefit is ZeRO-Offload. Since this is stage 1, optimizer states can be offloaded to CPU.
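A minimal DeepSpeed config sketch for this setup (the exact values are illustrative; consult the DeepSpeed configuration docs for the full schema):

```json
{
  "zero_optimization": {
    "stage": 1,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

With `stage: 1` only the optimizer states are partitioned, and `offload_optimizer` moves them to CPU memory, freeing GPU memory for the model and activations.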
Implementations:
- [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed) and [Megatron-Deepspeed from BigScience](https://github.com/bigscience-workshop/Megatron-DeepSpeed), which is a fork of the former repo.
- [OSLO](https://github.com/tunib-ai/oslo)
Important papers:
- [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990)
🤗 Transformers status: not yet implemented, since we have no PP and TP.
## FlexFlow