"...git@developer.sourcefind.cn:kecinstone/2024-pra-vllm.git" did not exist on "ac5cf86aa6aebbf9e42df51f7e377fbee85bc703"
[PyTorch] Fix for deferred init bug causing NeMo MLPerf LLM crash (#619)
* added missing parameter materialization on real device for LayerNorm and RMSNorm

Signed-off-by: Alp Dener <adener@nvidia.com>

* added new unittest for deferred initialization and modified parameter materialization to support standalone execution outside of FSDP

Signed-off-by: Alp Dener <adener@nvidia.com>

* restored tensor parallel attributes that were being wiped out by the parameter reset

Signed-off-by: Alp Dener <adener@nvidia.com>

* fixed incorrect order of fp8 metadata initialization

Signed-off-by: Alp Dener <adener@nvidia.com>

* added deferred init unittest to the QA script

Signed-off-by: Alp Dener <adener@nvidia.com>

---------

Signed-off-by: Alp Dener <adener@nvidia.com>