"src/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "3b37fefee99425286984a9d5fa4f1850064d01eb"

Merge branch 'megatron-lm_dtk24.04' into 'main'

Megatron lm dtk24.04

See merge request !1
11 jobs for main in 0 seconds (queued for 73 minutes and 3 seconds)
Status  Job ID                                       Name                                        Coverage

Test
failed  #4982 ssh_selene_runner                      resume.checkpoint.bert.345m_tp1_pp2_1node
failed  #4977 ssh_selene_runner                      resume.checkpoint.gpt3.345m_tp1_pp2_1node
failed  #4980 ssh_selene_runner                      train.bert.345m_tp1_pp2_1node_50steps
failed  #4981 ssh_selene_runner                      train.bert.345m_tp1_pp4_1node_50steps
failed  #4979 ssh_selene_runner                      train.bert.345m_tp2_pp2_1node_50steps
failed  #4978 ssh_selene_runner                      train.bert.345m_tp4_pp1_1node_50steps
failed  #4975 ssh_selene_runner                      train.gpt3.345m_tp1_pp2_1node_50steps
failed  #4976 ssh_selene_runner                      train.gpt3.345m_tp1_pp4_1node_50steps
failed  #4974 ssh_selene_runner                      train.gpt3.345m_tp2_pp2_1node_50steps
failed  #4973 ssh_selene_runner                      train.gpt3.345m_tp4_pp1_1node_50steps

Cleanup
failed  #4983 ssh_selene_runner (allowed to fail)    cleanup.selene

Status  Name                                        Stage    Failure
failed  train.gpt3.345m_tp4_pp1_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.gpt3.345m_tp1_pp2_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.gpt3.345m_tp2_pp2_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.gpt3.345m_tp1_pp4_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.bert.345m_tp4_pp1_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.bert.345m_tp1_pp2_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.bert.345m_tp1_pp4_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  train.bert.345m_tp2_pp2_1node_50steps       Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  resume.checkpoint.bert.345m_tp1_pp2_1node   Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  resume.checkpoint.gpt3.345m_tp1_pp2_1node   Test     There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.
failed  cleanup.selene                              Cleanup  There has been a timeout failure or the job got stuck. Check your timeout limits or try again. No job log.