Unverified Commit 27a8c9e4 authored by Stas Bekman, committed by GitHub

[parallelism doc] document Deepspeed-Inference and parallelformers (#12836)



* document Deepspeed-Inference and parallelformers

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 807b6bd1
@@ -224,7 +224,11 @@ Implementations:
 - DeepSpeed calls it [tensor slicing](https://www.deepspeed.ai/features/#model-parallelism)
 - [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) has an internal implementation.
-🤗 Transformers status: not yet implemented
+🤗 Transformers status:
+- core: not yet implemented in the core
+- for inference, [parallelformers](https://github.com/tunib-ai/parallelformers) provides this support for most of our models; until tensor parallelism is implemented in the core you can use theirs, and hopefully training mode will be supported too (a usage sketch follows this diff)
+- Deepspeed-Inference also supports our BERT, GPT-2, and GPT-Neo models in its super-fast CUDA-kernel-based inference mode; see more [here](https://www.deepspeed.ai/tutorials/inference-tutorial/) and the second sketch below
 ## DP+PP
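For reference, here is a minimal sketch of what tensor-parallel inference with parallelformers looks like, assuming the `parallelize()` API shown in that project's README; the checkpoint, `num_gpus` value, and generation arguments are illustrative, not prescriptive:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# Load a stock 🤗 Transformers model on CPU; parallelize() then shards
# its weights across the requested GPUs for tensor-parallel inference.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

# Split the model across 2 GPUs, casting the weights to fp16.
parallelize(model, num_gpus=2, fp16=True, verbose="detail")

# Inputs stay on CPU; parallelformers handles device placement internally.
inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=40)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Loading the model on CPU first is deliberate: a checkpoint too large for a single GPU's memory can still be sharded across several.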
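And a similarly hedged sketch of Deepspeed-Inference's kernel-injection mode, following the pattern in the linked tutorial; the checkpoint and GPU count are illustrative, and `init_inference`'s exact arguments may differ across DeepSpeed versions, so treat this as a sketch rather than a verified recipe:

```python
import os

import deepspeed
import torch
from transformers import pipeline

# These are set by the `deepspeed` launcher, e.g.:
#   deepspeed --num_gpus 2 ds_inference.py
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

generator = pipeline("text-generation", model="gpt2", device=local_rank)

# DeepSpeed swaps the supported layers for its fused CUDA inference
# kernels and shards them across world_size GPUs (tensor slicing).
generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    dtype=torch.half,
    replace_method="auto",
)

output = generator("DeepSpeed is", do_sample=True, min_length=50)
if local_rank == 0:
    print(output)
```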