Update Llama_pretraining.sh

Pipeline: 12 jobs for !5 with main in 0 seconds (queued for 81 minutes and 12 seconds)
Labels: latest, detached
Stage: Test
Status  Job ID  Runner tag           Name
failed  #11866  ssh_selene_runner    resume.checkpoint.bert.345m_tp1_pp2_1node
failed  #11861  ssh_selene_runner    resume.checkpoint.gpt3.345m_tp1_pp2_1node
failed  #11864  ssh_selene_runner    train.bert.345m_tp1_pp2_1node_50steps
failed  #11865  ssh_selene_runner    train.bert.345m_tp1_pp4_1node_50steps
failed  #11863  ssh_selene_runner    train.bert.345m_tp2_pp2_1node_50steps
failed  #11862  ssh_selene_runner    train.bert.345m_tp4_pp1_1node_50steps
failed  #11859  ssh_selene_runner    train.gpt3.345m_tp1_pp2_1node_50steps
failed  #11860  ssh_selene_runner    train.gpt3.345m_tp1_pp4_1node_50steps
failed  #11858  ssh_selene_runner    train.gpt3.345m_tp2_pp2_1node_50steps
failed  #11857  ssh_selene_runner    train.gpt3.345m_tp4_pp1_1node_50steps
failed  #11856  docker_local_runner  unit_tests

Stage: Cleanup
Status  Job ID  Runner tag           Name
failed  #11867  ssh_selene_runner    cleanup.selene (allowed to fail)

Failed jobs: all 12 report the same failure, "There has been a timeout failure or the job got stuck. Check your timeout limits or try again", and none of them has a job log.

Name                                       Stage
train.bert.345m_tp1_pp2_1node_50steps      Test
resume.checkpoint.gpt3.345m_tp1_pp2_1node  Test
train.gpt3.345m_tp4_pp1_1node_50steps      Test
train.bert.345m_tp4_pp1_1node_50steps      Test
cleanup.selene                             Cleanup
unit_tests                                 Test
train.gpt3.345m_tp2_pp2_1node_50steps      Test
train.gpt3.345m_tp1_pp2_1node_50steps      Test
train.gpt3.345m_tp1_pp4_1node_50steps      Test
resume.checkpoint.bert.345m_tp1_pp2_1node  Test
train.bert.345m_tp2_pp2_1node_50steps      Test
train.bert.345m_tp1_pp4_1node_50steps      Test