Commit e706e5fa authored by Alp Dener, committed by GitHub

[C/PyTorch] Removed MPI dependence in Userbuffers (#901)



* added DL framework callbacks for bootstrapping userbuffers without MPI
Signed-off-by: Alp Dener <adener@nvidia.com>

* removed userbuffers availability check in TE modules since userbuffers is now always compiled
Signed-off-by: Alp Dener <adener@nvidia.com>

* added comm+GEMM overlap example with LayerNormMLP
Signed-off-by: Alp Dener <adener@nvidia.com>

* linting and review fixes
Signed-off-by: Alp Dener <adener@nvidia.com>

* linting and review fixes
Signed-off-by: Alp Dener <adener@nvidia.com>

* added header guards
Signed-off-by: Alp Dener <adener@nvidia.com>

* removed defunct userbuffers checks in build_utils and setup.py
Signed-off-by: Alp Dener <adener@nvidia.com>

* added exposed API in modules/base.py to __all__
Signed-off-by: Alp Dener <adener@nvidia.com>

* removed transformer_engine/CMakeLists.txt and shifted all TE/common compile into transformer_engine/common/CMakeLists.txt
Signed-off-by: Alp Dener <adener@nvidia.com>

---------
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
parent 7d576ed2
@@ -729,9 +729,6 @@ class Linear(TransformerEngineBaseModule):
         self.ub_overlap_ag = ub_overlap_ag
         if ub_overlap_rs or ub_overlap_ag:
             assert ub_name is not None, "Userbuffer name [string] is not set."
-            assert (
-                tex.userbuf_comm_available()
-            ), "Userbuffer communication backend not available."
         self.ub_name = ub_name
         self.get_rng_state_tracker = get_rng_state_tracker
         self.rng_tracker_name = rng_tracker_name
...
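
For readers skimming the hunk, the following is a minimal sketch of the argument check that remains once the availability assertion is gone. `OverlapConfig` is a hypothetical stand-in written for illustration only; it is not part of Transformer Engine, and it assumes nothing beyond the three attributes and the one assertion visible in the diff.

```python
from typing import Optional


class OverlapConfig:
    """Hypothetical stand-in for the overlap-argument handling shown in the hunk."""

    def __init__(
        self,
        ub_overlap_rs: bool = False,
        ub_overlap_ag: bool = False,
        ub_name: Optional[str] = None,
    ) -> None:
        self.ub_overlap_rs = ub_overlap_rs
        self.ub_overlap_ag = ub_overlap_ag
        if ub_overlap_rs or ub_overlap_ag:
            # The only precondition left in the hunk: overlap needs a buffer name.
            # The backend-availability assertion was removed, since the commit
            # message states userbuffers is now always compiled.
            assert ub_name is not None, "Userbuffer name [string] is not set."
        self.ub_name = ub_name


# A named buffer with overlap enabled is accepted.
cfg = OverlapConfig(ub_overlap_ag=True, ub_name="proj")

# Enabling overlap without a name still fails fast.
try:
    OverlapConfig(ub_overlap_rs=True)
except AssertionError as err:
    message = str(err)
```

Under these assumptions, enabling either overlap flag without `ub_name` still raises an `AssertionError`, while the module no longer queries the compiled extension for backend availability.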