"src/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "3b37fefee99425286984a9d5fa4f1850064d01eb"
Merge branch 'megatron-lm_dtk24.04' into 'main'
Megatron lm dtk24.04

See merge request !1

| Status | Job ID | Name | Coverage |
|---|---|---|---|
| Test | | | |
| failed | #4982 (ssh_selene_runner) | resume.checkpoint.bert.345m_tp1_pp2_1node | |
| failed | #4977 (ssh_selene_runner) | resume.checkpoint.gpt3.345m_tp1_pp2_1node | |
| failed | #4980 (ssh_selene_runner) | train.bert.345m_tp1_pp2_1node_50steps | |
| failed | #4981 (ssh_selene_runner) | train.bert.345m_tp1_pp4_1node_50steps | |
| failed | #4979 (ssh_selene_runner) | train.bert.345m_tp2_pp2_1node_50steps | |
| failed | #4978 (ssh_selene_runner) | train.bert.345m_tp4_pp1_1node_50steps | |
| failed | #4975 (ssh_selene_runner) | train.gpt3.345m_tp1_pp2_1node_50steps | |
| failed | #4976 (ssh_selene_runner) | train.gpt3.345m_tp1_pp4_1node_50steps | |
| failed | #4974 (ssh_selene_runner) | train.gpt3.345m_tp2_pp2_1node_50steps | |
| failed | #4973 (ssh_selene_runner) | train.gpt3.345m_tp4_pp1_1node_50steps | |
| Cleanup | | | |
| failed | #4983 (ssh_selene_runner, allowed to fail) | cleanup.selene | |

| Status | Name | Stage | Failure |
|---|---|---|---|
| failed | train.gpt3.345m_tp4_pp1_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.gpt3.345m_tp1_pp2_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.gpt3.345m_tp2_pp2_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.gpt3.345m_tp1_pp4_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.bert.345m_tp4_pp1_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.bert.345m_tp1_pp2_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.bert.345m_tp1_pp4_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | train.bert.345m_tp2_pp2_1node_50steps | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | resume.checkpoint.bert.345m_tp1_pp2_1node | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | resume.checkpoint.gpt3.345m_tp1_pp2_1node | Test | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |
| failed | cleanup.selene | Cleanup | There has been a timeout failure or the job got stuck. Check your timeout limits or try again |