Commit 4e867b3c authored by jerrrrry

Initial commit
__pycache__
*.bak
*.log
[submodule "Megatron-LM"]
path = Megatron-LM
url = https://github.com/NVIDIA/Megatron-LM.git
branch = d580efc68a9f0d
[submodule]
Megatron-LM = main
[html]
directory = coverage
[run]
data_file = .coverage_$LOCAL_RANK
relative_files = true
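The `data_file = .coverage_$LOCAL_RANK` setting above gives each local rank its own data file, which the CI coverage job later merges with `coverage combine`. A minimal sketch of what a multi-rank run leaves behind (file names follow the pattern; the rank count is illustrative):

```shell
# Simulate the per-rank files produced by data_file = .coverage_$LOCAL_RANK
workdir=$(mktemp -d)
cd "$workdir"
for LOCAL_RANK in 0 1 2 3; do
  touch ".coverage_$LOCAL_RANK"   # one data file per local rank
done
# CI later merges them with: coverage combine --keep .coverage_*
ls .coverage_* | wc -l
```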
[flake8]
max-line-length = 100
extend-ignore = E203,E501,F401,E402,E714
per-file-ignores = __init__.py:F401
---
name: BUG
about: Report a bug that needs attention
title: "[BUG]"
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior. The easier it is to reproduce, the faster it will get maintainer attention.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Stack trace/logs**
If applicable, add the stack trace or logs from the time of the error.
**Environment (please complete the following information):**
- Megatron-LM commit ID
- PyTorch version
- CUDA version
- NCCL version
**Proposed fix**
If you have a proposal for how to fix the issue state it here or link to a PR.
**Additional context**
Add any other context about the problem here.
---
name: ENHANCEMENT
about: Suggest an idea to improve this project
title: "[ENHANCEMENT]"
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Proposed implementation**
If you have a proposed implementation for the feature state it here or link to a PR.
**Additional context**
Add any other context or screenshots about the feature request here.
---
name: QUESTION
about: Ask a question about Megatron-LM that is not a bug, regression, or enhancement request
title: "[QUESTION]"
labels: ''
assignees: ''
---
**Your question**
Ask a clear and concise question about Megatron-LM.
---
name: REGRESSION
about: Report a regression in speed or accuracy due to a Megatron-LM update
title: "[REGRESSION]"
labels: ''
assignees: ''
---
**Describe the regression**
A clear and concise description of what the regression is.
**To Reproduce**
Steps to reproduce the behavior. The easier it is to reproduce, the faster it will get maintainer attention.
**Previous performance**
What speed or accuracy did you previously see?
**New performance**
What speed or accuracy do you see after the update?
**Stack trace/logs**
If applicable, add the stack trace or logs related to the regression.
**Environment (please complete the following information):**
- Previous Megatron-LM commit ID
- New Megatron-LM commit ID
- Previous PyTorch version
- New PyTorch version
- Previous CUDA version
- New CUDA version
- Previous NCCL version
- New NCCL version
**Proposed fix**
If you have a proposal for how to fix the issue state it here or link to a PR.
**Additional context**
Add any other context about the problem here.
# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
#
# You can adjust the behavior by modifying this file.
# For more information, see:
# https://github.com/actions/stale
name: Mark stale issues and pull requests
on:
schedule:
- cron: '15 18 * * *'
jobs:
stale:
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v5
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
days-before-stale: 60
stale-issue-message: 'Marking as stale. No activity in 60 days.'
stale-pr-message: 'Marking as stale. No activity in 60 days.'
stale-issue-label: 'stale'
stale-pr-label: 'stale'
remove-stale-when-updated: true
operations-per-run: 1000
days-before-close: -1
__pycache__
*.so
build
.coverage_*
*.egg-info
*~
slurm*
logs
.vscode
local/
.gitmodules
wandb/
onelogger.log
onelogger.err
.merge_train_rule: &merge_train_rule
UNIT_TEST: "yes"
UNIT_TEST_REPEAT: 1
UNIT_TEST_TIMEOUT: 15
INTEGRATION_TEST: "yes"
INTEGRATION_TEST_SCOPE: mr
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: mr-slim
FUNCTIONAL_TEST_REPEAT: 5
FUNCTIONAL_TEST_TIME_LIMIT: 2700
CLUSTER_A100: ""
CLUSTER_H100: ""
PUBLISH: "no"
workflow:
rules:
# Do not trigger for forks
- if: $CI_PROJECT_NAMESPACE != "ADLR"
when: never
# ci-branches only for schedule
- if: $CI_COMMIT_BRANCH =~ /ci-/ && $CI_PIPELINE_SOURCE != "schedule"
when: never
# For scheduled pipelines
- if: $CI_PIPELINE_SOURCE == "schedule"
auto_cancel:
on_new_commit: none
# For manual pipelines
- if: $CI_PIPELINE_SOURCE == "web"
# For push to main
- if: $CI_PIPELINE_SOURCE == 'push' && $CI_COMMIT_REF_PROTECTED == "true"
variables:
UNIT_TEST: "no"
INTEGRATION_TEST: "no"
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: mr
FUNCTIONAL_TEST_REPEAT: 5
FUNCTIONAL_TEST_RECORD_CHECKPOINTS: 'no'
FUNCTIONAL_TEST_TIME_LIMIT: 2700
CLUSTER_A100: ""
CLUSTER_H100: ""
PUBLISH: "no"
auto_cancel:
on_new_commit: none
# For merge-trains that need to be fast-tracked
- if: $CI_MERGE_REQUEST_EVENT_TYPE == 'merge_train' && $CI_MERGE_REQUEST_LABELS =~ /fast-track/
variables:
UNIT_TEST: "yes"
UNIT_TEST_REPEAT: 1
UNIT_TEST_TIMEOUT: 15
INTEGRATION_TEST: "no"
FUNCTIONAL_TEST: "no"
CLUSTER_A100: ""
CLUSTER_H100: ""
PUBLISH: "no"
# For normal merge-trains
- if: $CI_MERGE_REQUEST_EVENT_TYPE == 'merge_train'
variables: *merge_train_rule
# For MRs with integration suite
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_LABELS =~ /Run tests/
variables: *merge_train_rule
# For MRs with nightly
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_LABELS =~ /Run nightly/
variables:
UNIT_TEST: "no"
INTEGRATION_TEST: "no"
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: nightly
FUNCTIONAL_TEST_REPEAT: 5
FUNCTIONAL_TEST_RECORD_CHECKPOINTS: 'no'
FUNCTIONAL_TEST_TIME_LIMIT: 2700
CLUSTER_A100: ""
CLUSTER_H100: ""
PUBLISH: "no"
# For MRs with weekly
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_LABELS =~ /Run weekly/
variables:
UNIT_TEST: "no"
INTEGRATION_TEST: "no"
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: weekly
FUNCTIONAL_TEST_REPEAT: 1
FUNCTIONAL_TEST_RECORD_CHECKPOINTS: 'no'
FUNCTIONAL_TEST_TIME_LIMIT: 9000
CLUSTER_A100: ""
CLUSTER_H100: ""
PUBLISH: "no"
# For MRs with heavy suite
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_LABELS =~ /Run functional tests/
variables:
UNIT_TEST: "yes"
UNIT_TEST_REPEAT: 1
UNIT_TEST_TIMEOUT: 15
INTEGRATION_TEST: "no"
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: mr
FUNCTIONAL_TEST_REPEAT: 5
FUNCTIONAL_TEST_TIME_LIMIT: 2700
CLUSTER_A100: ""
CLUSTER_H100: ""
PUBLISH: "no"
# Default MRs
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
variables:
UNIT_TEST: "yes"
UNIT_TEST_REPEAT: 1
UNIT_TEST_TIMEOUT: 15
INTEGRATION_TEST: "no"
FUNCTIONAL_TEST: "no"
PUBLISH: "no"
- when: never
auto_cancel:
on_new_commit: interruptible
stages:
- build
- test
- integration_tests
- functional_tests
- publish
default:
interruptible: true
retry:
max: 2
when: runner_system_failure
variables:
UNIT_TEST:
value: "yes"
options:
- "yes"
- "no"
description: To run the unit test suite
UNIT_TEST_REPEAT:
value: "1"
description: "Number of repetitions"
UNIT_TEST_TIMEOUT:
value: "30"
description: Timeout (minutes) for Unit tests (all repeats)
INTEGRATION_TEST:
value: "yes"
options:
- "yes"
- "no"
description: To run the integration test suite
INTEGRATION_TEST_SCOPE:
value: "mr"
options:
- "mr"
- "nightly"
- "weekly"
- "pre-release"
- "release"
description: "Testsuite to run (only for INTEGRATION_TEST=yes)"
INTEGRATION_TEST_TIME_LIMIT:
value: "900"
description: "Timeout in seconds per test"
INTEGRATION_TEST_CASES:
value: "all"
description: "Comma-separated list of test_cases to run. Use 'all' to run the full suite."
FUNCTIONAL_TEST:
value: "yes"
options:
- "yes"
- "no"
description: To run the functional test suite
FUNCTIONAL_TEST_SCOPE:
value: "mr"
options:
- "mr"
- "nightly"
- "weekly"
- "pre-release"
- "release"
description: "Testsuite to run (only for FUNCTIONAL_TEST=yes)"
FUNCTIONAL_TEST_REPEAT:
value: "5"
description: "Number of repetitions per test"
FUNCTIONAL_TEST_TIME_LIMIT:
value: "2700"
description: "Timeout in seconds per test"
FUNCTIONAL_TEST_CASES:
value: "all"
description: "Comma-separated list of test_cases to run. Use 'all' to run the full suite."
FUNCTIONAL_TEST_RECORD_CHECKPOINTS:
value: 'no'
description: "Record golden checkpoints"
options:
- 'yes'
- 'no'
CLUSTER_A100:
value: "dgxa100_dracooci"
options:
- "dgxa100_dracooci"
- "dgxa100_dracooci-ord"
description: "Cluster for A100 workloads"
CLUSTER_H100:
value: "dgxh100_eos"
options:
- "dgxh100_coreweave"
- "dgxh100_eos"
description: "Cluster for H100 workloads"
FUNCTIONAL_TEST_NAME:
description: "Name of functional test run (only for pre-release and release)"
value: "$$CI_COMMIT_SHA"
PUBLISH:
value: "no"
options:
- "yes"
- "no"
description: Build and publish a wheel to PyPI
PUBLISH_COMMIT:
value: "$$CI_COMMIT_SHA"
description: Which commit to publish
PUBLISH_VERSION_BUMP_BRANCH:
value: "$$CI_COMMIT_BRANCH"
description: Which branch to target for version bump
PUBLISH_SCOPE:
value: "code-freeze"
options:
- "code-freeze"
- "release"
description: Type of publish (freeze or final release)
# CI wide variables
CI_MCORE_LTS_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_ci_lts
CI_MCORE_DEV_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_ci_dev
CI_NEMO_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/nemo_ci
UTILITY_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_utility
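Several defaults above use `$$CI_COMMIT_SHA`: GitLab un-escapes `$$` to a literal `$`, and the consuming job expands the value later with `eval` (see `eval PUBLISH_COMMIT=$PUBLISH_COMMIT` in `.build_image`). A minimal sketch of that two-step expansion, with an illustrative stand-in value:

```shell
CI_COMMIT_SHA=abc123                 # stand-in for the real SHA
PUBLISH_COMMIT='$CI_COMMIT_SHA'      # what the job receives after $$ un-escaping
eval PUBLISH_COMMIT=$PUBLISH_COMMIT  # deferred expansion, as done in .build_image
echo "$PUBLISH_COMMIT"
```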
include:
- .gitlab/stages/00.pre.yml
- .gitlab/stages/01.build.yml
- .gitlab/stages/02.test.yml
- .gitlab/stages/03.integration-tests.yml
- .gitlab/stages/04.functional-tests.yml
- .gitlab/stages/05.publish.yml
CI:
- .gitlab-ci.yml
- Dockerfile.ci.lts
- Dockerfile.ci.dev
- .github/**
- .gitlab/**
Datasets:
- megatron/core/datasets/**
BERT:
- megatron/core/models/bert/**
GPT:
- megatron/core/models/gpt/**
RETRO:
- megatron/core/models/retro/**
Dist-Ckpt:
- megatron/core/dist_checkpointing
Dist-Opt:
- megatron/core/optimizer/distrib_optimizer
Inference:
- megatron/core/inference
MoE:
- megatron/core/transformer/moe
Tests:
- tests/**
ParallelState:
- megatron/core/parallel_state.py
include:
- template: Security/Secret-Detection.gitlab-ci.yml
.pre_rules:
rules:
- if: $CI_PIPELINE_SOURCE == 'main'
allow_failure: true
when: always
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: always
- if: $CI_PIPELINE_SOURCE == 'merge_request_event'
when: always
- when: never
stage: .pre
.dind_rules:
image: docker:26.1.4-dind
variables:
DOCKER_HOST: unix:///var/run/docker.sock
before_script:
- docker system prune -a --filter "until=36h" -f || true
- echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
- echo "$CI_REGISTRY_PASSWORD" | docker login $CI_REGISTRY -u $CI_REGISTRY_USER --password-stdin
pre:mirror_to_github:
rules:
- if: '$CI_COMMIT_REF_PROTECTED == "true" && $CI_PIPELINE_SOURCE == "push"'
allow_failure: true
- when: never
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
stage: .pre
image: python:3.10
variables:
GIT_STRATEGY: "clone"
script:
- git checkout $CI_COMMIT_BRANCH
- git remote add github https://ko3n1g:$GH_TOKEN@github.com/NVIDIA/Megatron-LM.git || true
- git push -u github $CI_COMMIT_BRANCH
retry:
max: 2
pre:create_ci_branches:
rules:
- if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_PIPELINE_SOURCE == "push"'
allow_failure: true
- when: never
parallel:
matrix:
- branch: ci-unit-test-extended
- branch: ci-rebuild-mcore-nemo-image
- branch: ci-mr
- branch: ci-nightly
- branch: ci-weekly
- branch: ci-pre-release
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
stage: .pre
image: python:3.10
variables:
GIT_STRATEGY: "clone"
script:
- git remote set-url origin "https://gitlab-ci-token:${PROJECT_ACCESS_TOKEN_MCORE}@${GITLAB_ENDPOINT}/adlr/megatron-lm.git"
- git switch --force-create $branch
- git push --force -u origin $branch
retry:
max: 2
pre:label_merge_request:
extends: [.pre_rules]
image: golang:1.22
tags:
- mcore-docker-node-small
before_script:
- git clone -b nv https://${GITLAB_ENDPOINT}/okoenig/gitlab-mr-labeler.git
- cd gitlab-mr-labeler
- go install .
- cd ..
- go install github.com/itchyny/gojq/cmd/gojq@latest
script:
- set -x
- |
LABELS=$(curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}")
- LABELS=$(echo "$LABELS" | gojq '.labels -= ["ParallelState"]')
- |
if git --no-pager diff --merge-base origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME} -- 'megatron/core/' | grep -q 'parallel_state'; then
LABELS=$(echo "$LABELS" | gojq '.labels += ["ParallelState"]')
echo "$LABELS"
fi
- echo LABELS=$(echo "$LABELS" | gojq '.labels | join(",")') > labels
- gitlab-mr-labeler -f .gitlab/labeler-config.yml -t ${PROJECT_ACCESS_TOKEN_MCORE} --debug true
- cat labels
after_script:
- |
source labels
curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}" --data-urlencode "add_labels=$LABELS" -X PUT
pre:maybe_cherry_pick_commit:
rules:
- if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_PIPELINE_SOURCE == "push"'
- when: never
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
stage: .pre
image: nentangso/alpine-git-curl-jq
variables:
GIT_STRATEGY: "clone"
script:
- set -x
- set +e
- SHA=$(git rev-list --no-merges -n 1 HEAD)
- MESSAGE=$(git log -n 1 --pretty=format:%s $SHA)
- MR_ID=$(echo $MESSAGE | awk -F'!' '{print $2}' | awk '{print $1}' )
- git remote set-url origin "https://gitlab-ci-token:${PROJECT_ACCESS_TOKEN_MCORE}@${GITLAB_ENDPOINT}/$CI_PROJECT_NAMESPACE/megatron-lm.git"
- git config --global user.email "mcore-bot@nvidia.com"
- git config --global user.name "Mcore Bot"
- |
MR=$(curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${MR_ID}")
LABELS=$(echo -E $MR | jq '.labels | join(",")' | tr -d '"')
AUTHOR_ID=$(echo -E $MR | jq '.author.id' | tr -d '"')
AUTHOR_NAME=$(echo -E $MR | jq '.author.username' | tr -d '"')
TITLE=$(echo -E $MR | jq '.title' | tr -d '"')
MILESTONE_ID=$(echo -E $MR | jq '.milestone.id' | tr -d '"')
TARGET_BRANCHES=$(echo "$LABELS" | grep -o 'core_[^,]*')
if [[ $TARGET_BRANCHES == "" ]]; then
echo Nothing to cherry pick
exit 0
fi
echo "$TARGET_BRANCHES" | while read -r RELEASE_BRANCH ; do
TARGET_BRANCH_EXISTS_OK=$([[ "$(git ls-remote --heads origin refs/heads/$RELEASE_BRANCH)" != "" ]] && echo true || echo false)
if [[ "$TARGET_BRANCH_EXISTS_OK" == "false" ]]; then
echo Release branch does not yet exist, will not cherry-pick
continue
fi
(
git fetch origin $RELEASE_BRANCH:$RELEASE_BRANCH
git switch --force-create cherry-pick-$MR_ID-$RELEASE_BRANCH $RELEASE_BRANCH
git cherry-pick $SHA
git push -u origin --force cherry-pick-$MR_ID-$RELEASE_BRANCH
git checkout ${CI_DEFAULT_BRANCH:-main}
)
CHERRYPICK_SUCCESSFUL=$?
if [[ $CHERRYPICK_SUCCESSFUL -eq 0 ]]; then
curl \
--header "PRIVATE-TOKEN: $PAT" \
--url https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests \
-d "source_branch=cherry-pick-$MR_ID-$RELEASE_BRANCH" \
-d "target_branch=$RELEASE_BRANCH" \
-d "title=Cherry pick \`$TITLE ($MR_ID)\` into \`$RELEASE_BRANCH\`" \
-d "labels=cherry-pick" \
-d "reviewer_ids=$AUTHOR_ID" \
-d "milestone_id=$MILESTONE_ID" \
-d "description=[🤖]: Hi @$AUTHOR_NAME 👋,<br><br>we've cherry picked \`$TITLE ($MR_ID)\` into \`$RELEASE_BRANCH\` for you! 🚀<br><br>Please review and approve this cherry pick at your convenience\!"
else
URL=https://${GITLAB_ENDPOINT}/ADLR/megatron-lm/-/merge_requests/$MR_ID
MESSAGE='{
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "beep boop 🤖: Cherry-pick of <'$URL'|!'$MR_ID'> failed\ncc '$SLACK_ADMIN'"
}
}
]
}'
curl -X POST -H "Content-type: application/json" --data "$MESSAGE" ${MCORE_NOTIFICATION_HOOK}
fi
done
interruptible: false
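The `awk` pipeline in `pre:maybe_cherry_pick_commit` pulls the MR number out of the merge-commit subject. A standalone sketch of that extraction (the subject line below is a made-up example of GitLab's merge-commit format):

```shell
MESSAGE='Resolve checkpointing bug See merge request ADLR/megatron-lm!1234'
# Split on "!" and keep the first word after it, as in the job above
MR_ID=$(echo $MESSAGE | awk -F'!' '{print $2}' | awk '{print $1}')
echo "$MR_ID"
```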
pre:check_milestone:
extends: [.pre_rules]
image: badouralix/curl-jq
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
script:
- env
- |
MILESTONE=$(curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}" | jq '.milestone')
- |
if [[ "$MILESTONE" == "null" ]]; then
echo Please assign a Milestone to this MR!
exit 1
fi
pre:check_status_of_main:
extends: [.pre_rules]
image: python:3.10
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
timeout: 7 days
script:
- env
- pip install --no-cache-dir python-gitlab click
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- python tests/test_utils/python_scripts/check_status_of_main.py --target-branch "$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
rules:
- if: $CI_MERGE_REQUEST_EVENT_TYPE == 'merge_train' && $CI_MERGE_REQUEST_LABELS =~ /fast-track/
when: never
- if: $CI_MERGE_REQUEST_EVENT_TYPE == 'merge_train'
when: always
- when: never
.build_image:
extends: [.test_rules, .dind_rules]
stage: build
tags:
- arch/amd64
- origin/jet-fleet
- env/prod
- ${TAG}
services:
- name: docker:24.0.5-dind
variables:
HEALTHCHECK_TCP_PORT: "2376"
timeout: 180m
variables:
DOCKER_HOST: tcp://docker:2376
DOCKER_TLS_CERTDIR: "/certs"
DOCKER_TLS_VERIFY: 1
DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"
TAG: purpose/builder-large
STAGE: jet
MCORE_BACKWARDS_REF: ko3n1g/fix/pyt2501 # core_r0.11.0
KUBERNETES_SERVICE_MEMORY_REQUEST: 90Gi
KUBERNETES_SERVICE_MEMORY_LIMIT: 90Gi
# KUBERNETES_SERVICE_CPU_REQUEST: 60
# KUBERNETES_SERVICE_CPU_LIMIT: 60
script:
- env
- eval PUBLISH_COMMIT=$PUBLISH_COMMIT
- env
- apk add bash
- |
bash -c '
set -x
env
eval "IMAGE=\$$IMAGE"
docker context create tls-environment
docker buildx create --name container --driver=docker-container --use tls-environment
ADDITIONAL_PARAMS=()
if [[ "$CI_COMMIT_BRANCH" == "ci-rebuild-mcore-nemo-image" || "$CI_COMMIT_BRANCH" == "main" ]]; then
ADDITIONAL_PARAMS+=("--pull")
ADDITIONAL_PARAMS+=("--cache-to type=registry,ref=${IMAGE}-buildcache:main,mode=max")
else
ADDITIONAL_PARAMS+=("--cache-to type=registry,ref=${IMAGE}-buildcache:${CI_MERGE_REQUEST_IID:-$CI_COMMIT_REF_SLUG},mode=max")
fi
if [[ "$CI_COMMIT_BRANCH" == "ci-nightly" ]]; then
ADDITIONAL_PARAMS+=("-t ${IMAGE}:nightly")
fi
echo $(git rev-parse HEAD)
DOCKER_BUILDKIT=1 docker build \
--secret id=JET_INDEX_URLS \
--secret id=LOGGER_INDEX_URL \
--secret id=EXPERIMENTAL_FLASH_ATTN \
--target $STAGE \
-f $FILE \
-t ${IMAGE}:${CI_PIPELINE_ID} \
--builder=container \
--build-arg CACHEBUST=$(cat /proc/sys/kernel/random/uuid) \
--build-arg MCORE_REPO=${CI_REPOSITORY_URL} \
--build-arg MCORE_REF=$CI_COMMIT_SHA \
--build-arg MCORE_BACKWARDS_REF=$MCORE_BACKWARDS_REF \
--cache-from type=registry,ref=${IMAGE}-buildcache:${CI_MERGE_REQUEST_IID} \
--cache-from type=registry,ref=${IMAGE}-buildcache:main \
--build-arg FROM_IMAGE_NAME=$BASE_IMAGE \
--push \
--progress plain \
${ADDITIONAL_PARAMS[@]} .
'
retry:
max: 2
test:build_image:
extends: [.build_image]
parallel:
matrix:
- IMAGE: CI_MCORE_LTS_IMAGE
FILE: Dockerfile.ci.lts
BASE_IMAGE: nvcr.io/nvidia/pytorch:24.01-py3
- IMAGE: CI_MCORE_DEV_IMAGE
FILE: Dockerfile.ci.dev
BASE_IMAGE: nvcr.io/nvidia/pytorch:25.03-py3
- IMAGE: UTILITY_IMAGE
FILE: Dockerfile.linting
BASE_IMAGE: python:3.10
test:build_nemo_image:
extends: [.build_image]
variables:
IMAGE: CI_NEMO_IMAGE
FILE: Dockerfile.ci.dev
BASE_IMAGE: nvcr.io/nvidian/nemo:nightly
.test_rules:
rules:
- when: on_success
stage: test
include:
- template: Security/Secret-Detection.gitlab-ci.yml
test:unit_tests_configure:
extends: [.test_rules]
needs: [test:build_image]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
before_script:
- git rm -r tests/test_utils/local_recipes || true
- git submodule add --force https://gitlab-ci-token:${CI_JOB_TOKEN}@${GITLAB_ENDPOINT}/ADLR/megatron-lm-convergence-tests.git tests/test_utils/local_recipes
- ls tests/test_utils/local_recipes
script:
- set -x
- |
A100_CLUSTER=$([[ "$CLUSTER_A100" != "" ]] && echo $CLUSTER_A100 || echo $DEFAULT_A100_CLUSTER)
H100_CLUSTER=$([[ "$CLUSTER_H100" != "" ]] && echo $CLUSTER_H100 || echo $DEFAULT_H100_CLUSTER)
- |
ARGS=(
"--scope unit-tests"
"--n-repeat ${UNIT_TEST_REPEAT}"
"--time-limit $(( UNIT_TEST_TIMEOUT * 60 ))"
"--test-cases all"
"--a100-cluster dgxa100_dracooci-ord"
"--h100-cluster dgxh100_coreweave"
"--h100-partition batch_short,batch"
"--container-image ${UTILITY_IMAGE}"
"--container-tag ${CI_PIPELINE_ID}"
"--dependent-job test:unit_tests_configure"
"--slurm-account ${CI_SLURM_ACCOUNT}"
"--no-enable-warmup"
)
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment "lts" \
--tag "legacy" \
--output-path "unit-test-job-lts-legacy.yaml"
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment "lts" \
--tag "latest" \
--output-path "unit-test-job-lts-latest.yaml"
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment "dev" \
--tag "legacy" \
--output-path "unit-test-job-dev-legacy.yaml"
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment "dev" \
--tag "latest" \
--output-path "unit-test-job-dev-latest.yaml"
rules:
- if: $UNIT_TEST == 'yes' && $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: on_success
- if: $UNIT_TEST == 'yes' && $UNIT_TEST_REPEAT != '0'
when: on_success
artifacts:
paths:
- unit-test-job-dev-legacy.yaml
- unit-test-job-dev-latest.yaml
- unit-test-job-lts-legacy.yaml
- unit-test-job-lts-latest.yaml
- tests/test_utils/local_recipes
.unit_tests_run:
needs:
- test:linting_formatting
- test:linting_copyright
- job: test:linting_secret_detection
optional: true
- test:unit_tests_configure
extends: [.test_rules]
trigger:
include:
- artifact: unit-test-job-$ENVIRONMENT-$TAG.yaml
job: test:unit_tests_configure
strategy: depend
variables:
RO_API_TOKEN: $PAT
CONTAINER_TAG: $CI_PIPELINE_ID
CI_MCORE_LTS_IMAGE: $CI_MCORE_LTS_IMAGE
GITLAB_ENDPOINT: $GITLAB_ENDPOINT
PARENT_PIPELINE_ID: $CI_PIPELINE_ID
inherit:
variables: true
rules:
- if: $UNIT_TEST == 'yes' && $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: on_success
- if: $UNIT_TEST == 'yes' && $UNIT_TEST_REPEAT != '0'
when: on_success
test:unit_tests_pyt(DEV)_mcore(legacy):
extends: [.unit_tests_run]
variables:
ENVIRONMENT: dev
TAG: legacy
rules:
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME != 'main'
when: never
- if: $UNIT_TEST == 'yes' && $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: on_success
- if: $UNIT_TEST == 'yes' && $UNIT_TEST_REPEAT != '0'
when: on_success
test:unit_tests_pyt(LTS)_mcore(legacy):
extends: [.unit_tests_run]
variables:
ENVIRONMENT: lts
TAG: legacy
rules:
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME != 'main'
when: never
- if: $UNIT_TEST == 'yes' && $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: on_success
- if: $UNIT_TEST == 'yes' && $UNIT_TEST_REPEAT != '0'
when: on_success
test:unit_tests_pyt(DEV)_mcore(latest):
extends: [.unit_tests_run]
variables:
ENVIRONMENT: dev
TAG: latest
test:unit_tests_pyt(LTS)_mcore(latest):
extends: [.unit_tests_run]
variables:
ENVIRONMENT: lts
TAG: latest
test:unit_tests_notify:
extends: [.test_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
needs:
- test:unit_tests_pyt(DEV)_mcore(latest)
- test:unit_tests_pyt(LTS)_mcore(latest)
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
script:
- env
- export WEBHOOK_URL=${MCORE_NOTIFICATION_HOOK}
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- export TAG_TEAM=$([[ "$CI_COMMIT_BRANCH" == "main" ]] && echo "1" || echo "0")
- export TEAM_SLUG=$SLACK_ADMIN
- |
python tests/test_utils/python_scripts/notify.py \
--pipeline-id "${CI_PIPELINE_ID}" \
--check-for unit-tests \
--pipeline-context "unit-tests-extended" \
--pipeline-created-at "${CI_PIPELINE_CREATED_AT}"
artifacts:
when: always
paths:
- scripts
rules:
- if: $CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "ci-unit-test-extended"
when: always
- when: never
test:linting_docs_build:
extends: [.test_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
needs: [test:build_image]
script:
- cd ..
- rm -rf documentation && git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@${GITLAB_ENDPOINT}/nemo-megatron-core-tme/documentation.git
- mv megatron-lm/ documentation/
- cd documentation/
- ./repo docs
test:linting_formatting:
extends: [.test_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
needs: [test:build_image]
variables:
GIT_STRATEGY: "clone"
script:
- |
if [[ "$CI_PIPELINE_SOURCE" != "merge_request_event" ]]; then
exit 0
fi
- set +e
- git fetch origin main:main
- |
if [[ "$CI_MERGE_REQUEST_PROJECT_PATH" == "$CI_MERGE_REQUEST_SOURCE_PROJECT_PATH" ]]; then
bash tools/autoformat.sh
set -e
git fetch origin $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME
git checkout $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME
git config --global user.email "mcore-bot@nvidia.com"
git config --global user.name "Mcore Bot"
git remote set-url origin "https://gitlab-ci-token:${PAT}@${GITLAB_ENDPOINT}/$CI_PROJECT_NAMESPACE/megatron-lm.git"
git add -A .
git commit -m "chore: Format files" || true
git push -u origin $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME
fi
- env
- BASE_REF="$CI_MERGE_REQUEST_TARGET_BRANCH_NAME" CHECK_ONLY=true SKIP_DOCS=$([[ "$CI_MERGE_REQUEST_LABELS" == *"Skip docs"* ]] && echo "true" || echo "false") bash tools/autoformat.sh
test:linting_copyright:
extends: [.test_rules]
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
needs: [test:build_image]
script:
- git fetch origin main
- bash tools/copyright.sh
# Override from template
secret_detection:
rules:
- when: never
# Inherit and modify template
test:linting_secret_detection:
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
extends: [".secret-analyzer"]
needs: [test:build_image]
variables:
GIT_DEPTH: 0
SECRET_DETECTION_LOG_OPTIONS: ${CI_MERGE_REQUEST_DIFF_BASE_SHA}..${CI_COMMIT_SHA}
allow_failure: false
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- when: never
script:
- apk add jq
- /analyzer run
- |
if [[ $(cat gl-secret-detection-report.json | jq '.vulnerabilities | length > 0') == true ]]; then
echo "At least one vulnerability has been found"
cat gl-secret-detection-report.json | jq '.'
exit 1
fi
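The `jq` check in `test:linting_secret_detection` fails the job whenever the report lists any vulnerabilities. The same logic in Python (the report snippet is an assumption, modeled only on the `.vulnerabilities` field the check reads from `gl-secret-detection-report.json`):

```python
import json

# Stand-in for gl-secret-detection-report.json; only the field the check reads
report_text = '{"vulnerabilities": [{"id": "token-in-history"}]}'
report = json.loads(report_text)

# Equivalent of: jq '.vulnerabilities | length > 0'
has_findings = len(report.get("vulnerabilities", [])) > 0
print(has_findings)  # True -> the CI job would exit 1
```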
test:unit_tests_x_coverage_report:
extends: [.test_rules]
needs:
- job: test:unit_tests_pyt(DEV)_mcore(latest)
- job: test:unit_tests_pyt(LTS)_mcore(latest)
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
script:
- env
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- python tests/test_utils/python_scripts/download_coverage_results.py --pipeline-id ${CI_PIPELINE_ID}
- coverage combine --keep $(ls coverage_results/*/coverage_report)
- coverage report
- coverage xml
coverage: "/TOTAL.+ ([0-9]{1,3}%)/"
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage.xml
rules:
- if: $UNIT_TEST == 'yes' && $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: on_success
- if: $UNIT_TEST == 'yes' && $UNIT_TEST_REPEAT != '0'
when: on_success
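The `coverage:` regex on the report job above tells GitLab how to scrape the total percentage from the `coverage report` output. A quick check of that pattern against a typical TOTAL line (the sample line is illustrative, not real CI output):

```python
import re

# Same pattern as the job's coverage: setting
pattern = r"TOTAL.+ ([0-9]{1,3}%)"
sample = "TOTAL    4821   1205    75%"   # illustrative `coverage report` summary line
match = re.search(pattern, sample)
print(match.group(1))
```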
.integration_tests_rules:
stage: integration_tests
rules:
- if: $INTEGRATION_TEST == "yes"
when: on_success
- when: never
default:
id_tokens:
VAULT_JWT_TOKEN:
aud: https://stg.vault.nvidia.com
include:
- project: dl/jet/gitlab-templates
ref: main
file: downstreams.yml
integration:configure:
needs:
- test:build_image
- job: test:unit_tests_pyt(DEV)_mcore(latest)
optional: true
- job: test:unit_tests_pyt(LTS)_mcore(latest)
optional: true
- job: test:build_nemo_image
extends: [.integration_tests_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
before_script:
- git rm -r tests/test_utils/local_recipes || true
- git submodule add --force https://gitlab-ci-token:${CI_JOB_TOKEN}@${GITLAB_ENDPOINT}/ADLR/megatron-lm-convergence-tests.git tests/test_utils/local_recipes
- ls tests/test_utils/local_recipes
script:
- set -x
- |
A100_CLUSTER=$([[ "$CLUSTER_A100" != "" ]] && echo $CLUSTER_A100 || echo $DEFAULT_A100_CLUSTER)
H100_CLUSTER=$([[ "$CLUSTER_H100" != "" ]] && echo $CLUSTER_H100 || echo $DEFAULT_H100_CLUSTER)
- |
ARGS=(
"--scope $INTEGRATION_TEST_SCOPE"
"--n-repeat 1"
"--time-limit $INTEGRATION_TEST_TIME_LIMIT"
"--test-cases $INTEGRATION_TEST_CASES"
"--a100-cluster $A100_CLUSTER"
"--h100-cluster $H100_CLUSTER"
"--container-image ${UTILITY_IMAGE}"
"--container-tag ${CI_PIPELINE_ID}"
"--slurm-account ${CI_SLURM_ACCOUNT}"
"--no-enable-warmup"
"--dependent-job integration:configure"
"--enable-lightweight-mode"
)
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment dev \
--output-path "functional-test-job-dev.yaml"
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment lts \
--output-path "functional-test-job-lts.yaml"
artifacts:
paths:
- functional-test-job-lts.yaml
- functional-test-job-dev.yaml
- tests/test_utils/local_recipes
.integration_run:
needs: [integration:configure]
extends: [.integration_tests_rules]
trigger:
include:
- artifact: functional-test-job-$ENVIRONMENT.yaml
job: integration:configure
strategy: depend
variables:
RO_API_TOKEN: $PAT
CONTAINER_TAG: $CI_PIPELINE_ID
CI_MCORE_LTS_IMAGE: $CI_MCORE_LTS_IMAGE
GITLAB_ENDPOINT: $GITLAB_ENDPOINT
PARENT_PIPELINE_ID: $CI_PIPELINE_ID
inherit:
variables: true
integration:run_lts:
extends: [.integration_run]
variables:
ENVIRONMENT: lts
integration:run_dev:
extends: [.integration_run]
variables:
ENVIRONMENT: dev
.functional_tests_rules:
stage: functional_tests
rules:
- if: $FUNCTIONAL_TEST == "yes"
when: on_success
- when: never
default:
id_tokens:
VAULT_JWT_TOKEN:
aud: https://stg.vault.nvidia.com
include:
- project: dl/jet/gitlab-templates
ref: main
file: downstreams.yml
functional:configure:
needs:
- test:build_image
- test:build_nemo_image
- job: test:unit_tests_pyt(DEV)_mcore(latest)
optional: true
- job: test:unit_tests_pyt(LTS)_mcore(latest)
optional: true
- job: integration:run_lts
optional: true
- job: integration:run_dev
optional: true
extends: [.functional_tests_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
before_script:
- git rm -r tests/test_utils/local_recipes || true
- git submodule add --force https://gitlab-ci-token:${CI_JOB_TOKEN}@${GITLAB_ENDPOINT}/ADLR/megatron-lm-convergence-tests.git tests/test_utils/local_recipes
- ls tests/test_utils/local_recipes
script:
- set -x
- |
A100_CLUSTER=$([[ "$CLUSTER_A100" != "" ]] && echo $CLUSTER_A100 || echo $DEFAULT_A100_CLUSTER)
H100_CLUSTER=$([[ "$CLUSTER_H100" != "" ]] && echo $CLUSTER_H100 || echo $DEFAULT_H100_CLUSTER)
- |
RECORD_CHECKPOINTS=$([[ "$CI_MERGE_REQUEST_LABELS" == *"Record checkpoints"* || "$FUNCTIONAL_TEST_RECORD_CHECKPOINTS" == "yes" ]] && echo "true" || echo "false")
- |
if [[ "$FUNCTIONAL_TEST_SCOPE" == "release" || "$FUNCTIONAL_TEST_SCOPE" == "pre-release" ]]; then
FUNCTIONAL_TEST_NAME=$(eval echo $FUNCTIONAL_TEST_NAME)
RELEASE_ARGS=(
"--run-name"
$FUNCTIONAL_TEST_NAME
"--wandb-experiment"
$(echo $FUNCTIONAL_TEST_NAME | tr '/' '-')
)
else
RELEASE_ARGS=()
fi
- |
ARGS=(
"--scope $FUNCTIONAL_TEST_SCOPE"
"--n-repeat $FUNCTIONAL_TEST_REPEAT"
"--time-limit $FUNCTIONAL_TEST_TIME_LIMIT"
"--test-cases $FUNCTIONAL_TEST_CASES"
"--a100-cluster $A100_CLUSTER"
"--h100-cluster $H100_CLUSTER"
"--container-image ${UTILITY_IMAGE}"
"--container-tag ${CI_PIPELINE_ID}"
"--dependent-job functional:configure"
"--record-checkpoints ${RECORD_CHECKPOINTS}"
"--slurm-account ${CI_SLURM_ACCOUNT}"
"--no-enable-warmup"
)
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment dev \
--output-path "functional-test-job-dev.yaml" \
${RELEASE_ARGS[@]}
- |
export PYTHONPATH=$(pwd)
python tests/test_utils/python_scripts/generate_jet_trigger_job.py \
${ARGS[@]} \
--environment lts \
--output-path "functional-test-job-lts.yaml" \
${RELEASE_ARGS[@]}
artifacts:
paths:
- functional-test-job-lts.yaml
- functional-test-job-dev.yaml
- tests/test_utils/local_recipes
.functional_run:
needs: [functional:configure]
extends: [.functional_tests_rules]
trigger:
include:
- artifact: functional-test-job-$ENVIRONMENT.yaml
job: functional:configure
strategy: depend
variables:
RO_API_TOKEN: $PAT
CONTAINER_TAG: $CI_PIPELINE_ID
CI_MCORE_LTS_IMAGE: $CI_MCORE_LTS_IMAGE
GITLAB_ENDPOINT: $GITLAB_ENDPOINT
PARENT_PIPELINE_ID: $CI_PIPELINE_ID
inherit:
variables: true
functional:run_lts:
extends: [.functional_run]
variables:
ENVIRONMENT: lts
functional:run_dev:
extends: [.functional_run]
variables:
ENVIRONMENT: dev
functional:run_nemo:
extends: [.functional_tests_rules]
trigger:
project: "dl/joc/nemo-ci"
branch: main-mirror
strategy: depend
inherit:
variables: true
variables:
MCORE_COMMIT: $CI_COMMIT_SHA
TEST_LLM_MODULE: "True"
TEST_ALIGNER_MODULE: "False"
TEST_DATA_CURATOR_MODULE: "False"
TESTS_TO_RUN_ON_THIS_COMMIT: nightly
rules:
- if: $FUNCTIONAL_TEST == "yes"
when: manual
allow_failure: true
- when: never
functional:x_notify:
extends: [.functional_tests_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
needs:
- functional:run_lts
- functional:run_dev
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
variables:
WEBHOOK_URL: ${MCORE_NOTIFICATION_HOOK}
RO_API_TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}
CONTEXT: $FUNCTIONAL_TEST_SCOPE
script:
- env
- export WEBHOOK_URL=${MCORE_NOTIFICATION_HOOK}
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- export CONTEXT=$FUNCTIONAL_TEST_SCOPE
- export TAG_TEAM=$([[ "$CI_COMMIT_BRANCH" == "main" ]] && echo "1" || echo "0")
- export TEAM_SLUG=$SLACK_ADMIN
- |
python tests/test_utils/python_scripts/notify.py \
--pipeline-id "${CI_PIPELINE_ID}" \
--check-for functional-tests \
--pipeline-context $CONTEXT \
--pipeline-created-at "${CI_PIPELINE_CREATED_AT}"
artifacts:
when: always
paths:
- scripts
rules:
- if: ($CI_PIPELINE_SOURCE == "schedule" || $CI_COMMIT_BRANCH == "main") && $FUNCTIONAL_TEST == "yes"
when: always
- when: never
functional:x_download_golden_values:
extends: [.functional_tests_rules]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
script:
- env
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- python tests/test_utils/python_scripts/download_golden_values.py --pipeline-id ${CI_PIPELINE_ID}
artifacts:
paths:
- tests/
rules:
- if: $FUNCTIONAL_TEST == "yes"
when: manual
allow_failure: true
- when: never
.publish_common_freeze:
stage: publish
rules:
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $PUBLISH == "yes" && $PUBLISH_SCOPE == "code-freeze"
when: manual
- when: never
.publish_common_release:
stage: publish
rules:
- if: $CI_COMMIT_BRANCH =~ /^core_r/ && $PUBLISH == "yes" && $PUBLISH_SCOPE == "release"
when: manual
- if: $PUBLISH == "yes" && $PUBLISH_SCOPE == "release"
when: manual
- when: never
publish:test_release_pypi_build_wheel:
extends: [.test_rules]
stage: publish
image:
name: ${IMAGE}
entrypoint: [""]
services:
- name: docker:24.0.5-dind
variables:
HEALTHCHECK_TCP_PORT: "2376"
needs: [test:build_image]
parallel:
matrix:
- PLATFORM: arm64
IMAGE: quay.io/pypa/manylinux_2_28_aarch64
- PLATFORM: amd64
IMAGE: quay.io/pypa/manylinux_2_28_x86_64
tags:
- arch/${PLATFORM}
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/builder-small
- team/megatron
variables:
PY_ENV: pytorch_25.03
KUBERNETES_SERVICE_MEMORY_REQUEST: 16Gi
KUBERNETES_SERVICE_MEMORY_LIMIT: 16Gi
PUBLISH_DRYRUN: "yes"
KUBERNETES_SERVICE_CPU_REQUEST: 4
KUBERNETES_SERVICE_CPU_LIMIT: 8
before_script:
- env
- eval PUBLISH_COMMIT=$PUBLISH_COMMIT
- env
- git fetch origin $PUBLISH_COMMIT
- git checkout $PUBLISH_COMMIT
script:
- echo $PUBLISH_DRYRUN
- |
if [ "$PUBLISH_DRYRUN" = "yes" ]; then
PRE_RELEASE=$(sed -n "s/.*PRE_RELEASE = '\(.*\)'/\1/p" megatron/core/package_info.py)
sed -i "/^PRE_RELEASE/c\PRE_RELEASE = '${PRE_RELEASE}.dev$((RANDOM % 900000 + 100000))'" megatron/core/package_info.py
fi
- /opt/python/cp310-cp310/bin/python -m build
- /opt/python/cp311-cp311/bin/python -m build
- auditwheel repair dist/*.whl
- rm -rf dist/*.whl
- pushd megatron/core
- EXPECTED_RELEASE_NUMBER=$(/opt/python/cp311-cp311/bin/python -c "import package_info; print(package_info.__version__)")
- popd
- echo "EXPECTED_RELEASE_NUMBER_$PLATFORM=$EXPECTED_RELEASE_NUMBER" | tee -a build.env
artifacts:
paths:
- megatron/core/package_info.py
- wheelhouse/
- dist/
reports:
dotenv: build.env
retry:
max: 2
publish:test_release_pypi_test_wheel:
extends: [.test_rules]
stage: publish
image:
name: python:3.11
entrypoint: [""]
needs: [publish:test_release_pypi_build_wheel]
parallel:
matrix:
- PLATFORM: arm64
- PLATFORM: amd64
services:
- name: docker:24.0.5-dind
variables:
HEALTHCHECK_TCP_PORT: "2376"
tags:
- arch/${PLATFORM}
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/builder-small
- team/megatron
variables:
KUBERNETES_SERVICE_MEMORY_REQUEST: 16Gi
KUBERNETES_SERVICE_MEMORY_LIMIT: 16Gi
KUBERNETES_SERVICE_CPU_REQUEST: 4
KUBERNETES_SERVICE_CPU_LIMIT: 8
GIT_STRATEGY: none
PUBLISH_DRYRUN: "yes"
script:
- rm -rf megatron
- pip install -U --no-cache-dir pip
- |
if [[ "$PLATFORM" == "arm64" ]]; then
pip install --no-cache-dir wheelhouse/*cp311*aarch64.whl
else
pip install --no-cache-dir wheelhouse/*cp311*x86_64.whl
fi
- RELEASE_NUMBER=$(python -c "from megatron import core; print(core.__version__)")
- |
if [[ "$PLATFORM" == "arm64" ]]; then
test "$EXPECTED_RELEASE_NUMBER_arm64" == "$RELEASE_NUMBER"
else
test "$EXPECTED_RELEASE_NUMBER_amd64" == "$RELEASE_NUMBER"
fi
- echo "RELEASE_NUMBER=$RELEASE_NUMBER" | tee -a build.env
artifacts:
reports:
dotenv: build.env
paths:
- wheelhouse/
- dist/
retry:
max: 2
publish:test_release_pypi_push_wheel:
extends: [.test_rules]
image: python:3.11
stage: publish
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
needs: [publish:test_release_pypi_test_wheel]
variables:
GIT_STRATEGY: none
PUBLISH_DRYRUN: "yes"
timeout: 3m
script:
- echo $PUBLISH_DRYRUN
- |
if [ "$PUBLISH_DRYRUN" = "yes" ]; then
REPOSITORY=testpypi
export TWINE_USERNAME=$TWINE_TEST_USERNAME
export TWINE_PASSWORD=$TWINE_TEST_PASSWORD
else
REPOSITORY=pypi
export TWINE_USERNAME=$TWINE_PROD_USERNAME
export TWINE_PASSWORD=$TWINE_PROD_PASSWORD
fi
- ls -al dist/
- ls -al wheelhouse/
- pip install twine
- |
if [[ "$PUBLISH_DRYRUN" != "yes" ]]; then
twine upload --verbose -u $TWINE_USERNAME -p $TWINE_PASSWORD --repository $REPOSITORY wheelhouse/* dist/*
fi
rules:
- if: $UNIT_TEST == 'yes' && $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: on_success
- when: on_success
allow_failure: true
publish:test_release_github:
extends: [.test_rules]
needs: [publish:test_release_pypi_test_wheel]
stage: publish
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
image: nentangso/alpine-git-curl-jq
before_script:
- eval PUBLISH_COMMIT=$PUBLISH_COMMIT
- git fetch origin $PUBLISH_COMMIT
- git checkout $PUBLISH_COMMIT
variables:
PUBLISH_DRYRUN: "yes"
script:
- echo $PUBLISH_DRYRUN
- NAME="NVIDIA Megatron Core $RELEASE_NUMBER"
- IS_PRERELEASE=$([[ "$RELEASE_NUMBER" == *rc* ]] && echo "true" || echo "false")
- |
if [[ "$IS_PRERELEASE" == "true" ]]; then
DATE=$(date +"%Y-%m-%d")
CHANGELOG="Prerelease: $NAME ($DATE)"
else
CHANGELOG=$(awk '/^## '"$NAME"'/{flag=1; next} /^## /{flag=0} flag' CHANGELOG.md)
CHANGELOG=$(echo "$CHANGELOG" | sed '/./!d')
fi
- |
PAYLOAD=$(jq -nc \
--arg TAG_NAME "v${RELEASE_NUMBER}" \
--arg CI_COMMIT_SHA "$PUBLISH_COMMIT" \
--arg NAME "$NAME" \
--arg BODY "$CHANGELOG" \
--argjson PRERELEASE "$IS_PRERELEASE" \
'{
"tag_name": $TAG_NAME,
"target_commitish": $CI_COMMIT_SHA,
"name": $NAME,
"body": $BODY,
"draft": false,
"prerelease": $PRERELEASE,
"generate_release_notes": false
}'
)
echo -E "$PAYLOAD" | tee -a payload.txt
- cat payload.txt
- |
CMD=$(echo -E 'curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer '"$GH_TOKEN"'" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/NVIDIA/Megatron-LM/releases \
-d @payload.txt
')
- |
if [[ "$PUBLISH_DRYRUN" == "yes" ]]; then
echo -E "$CMD"
else
eval "$CMD"
fi
publish:test_release_notify:
needs: [publish:test_release_pypi_test_wheel, publish:test_release_pypi_push_wheel, publish:test_release_github]
extends: [.test_rules]
image: badouralix/curl-jq
stage: publish
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
variables:
PUBLISH_DRYRUN: "yes"
script:
- echo $PUBLISH_DRYRUN
- URL="https://github.com/NVIDIA/Megatron-LM/releases/tag/v$RELEASE_NUMBER"
- |
MESSAGE='{
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "Releasebot 🤖: Megatron-Core released <'$URL'|core_r'"$RELEASE_NUMBER"'> 🚀"
}
}
]
}'
- echo "$MESSAGE"
- |
CMD=$(echo curl \
-X POST \
-H "Content-type: application/json" \
--data "$MESSAGE" ${MCORE_NOTIFICATION_HOOK_MAIN}
)
if [[ "$PUBLISH_DRYRUN" == "yes" ]]; then
echo "$CMD"
else
eval "$CMD"
fi
publish:test_release_version_bump:
needs: [publish:test_release_pypi_test_wheel, publish:test_release_pypi_push_wheel, publish:test_release_github]
extends: [.test_rules]
image: nentangso/alpine-git-curl-jq
stage: publish
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
before_script:
- eval PUBLISH_COMMIT=$PUBLISH_COMMIT
- eval PUBLISH_VERSION_BUMP_BRANCH=$PUBLISH_VERSION_BUMP_BRANCH
- git fetch origin $PUBLISH_COMMIT
- git checkout $PUBLISH_COMMIT
variables:
PUBLISH_DRYRUN: "yes"
script:
- env
- echo $PUBLISH_DRYRUN
- MAJOR=$(cat megatron/core/package_info.py | awk '/^MAJOR = /' | awk -F"= " '{print $2}')
- MINOR=$(cat megatron/core/package_info.py | awk '/^MINOR = /' | awk -F"= " '{print $2}')
- PATCH=$(cat megatron/core/package_info.py | awk '/^PATCH = /' | awk -F"= " '{print $2}')
- PRERELEASE=$(cat megatron/core/package_info.py | awk '/^PRE_RELEASE = /' | awk -F"= " '{print $2}' | tr -d '"' | tr -d "'")
- |
if [[ "$PRERELEASE" != "" ]]; then
NEXT_PATCH=$PATCH
NEXT_PRERELEASE=rc$((${PRERELEASE#rc} + 1))
else
NEXT_PATCH=$((${PATCH} + 1))
NEXT_PRERELEASE=""
fi
- sed -i "/^PATCH/c\PATCH = $NEXT_PATCH" megatron/core/package_info.py
- sed -i "/^PRE_RELEASE/c\PRE_RELEASE = '$NEXT_PRERELEASE'" megatron/core/package_info.py
- git config --global user.email "mcore-bot@nvidia.com"
- git config --global user.name "Mcore Bot"
- git remote set-url origin "https://gitlab-ci-token:${PAT}@${GITLAB_ENDPOINT}/$CI_PROJECT_NAMESPACE/megatron-lm.git"
- |
CMD=$(
cat <<'EOF'
git switch --force-create bot/chore/bump-version && \
git add megatron/core/package_info.py && \
git commit -m "chore: adjust version" && \
git push -f -u origin bot/chore/bump-version && \
curl \
--header "PRIVATE-TOKEN: $PAT" \
--url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests" \
-d "source_branch=bot/chore/bump-version" \
-d "target_branch=$PUBLISH_VERSION_BUMP_BRANCH" \
-d "title=chore: Fix version of \`$PUBLISH_VERSION_BUMP_BRANCH\`" \
-d "description=[🤖]: Hi @okoenig 👋,<br><br>we've adjusted the version number of \`$PUBLISH_VERSION_BUMP_BRANCH\` for you! 🚀<br><br>Please review and approve this cherry pick by your convenience\!"
EOF
)
- |
if [[ "$PUBLISH_DRYRUN" == "yes" ]]; then
echo "$CMD"
else
eval "$CMD"
fi
publish:code_freeze:
extends: [.publish_common_freeze]
image: ${CI_MCORE_LTS_IMAGE}:${CI_PIPELINE_ID}
needs: [test:build_image]
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
variables:
GIT_STRATEGY: "none"
script:
- git fetch origin $CI_DEFAULT_BRANCH
- git config --global user.email "mcore-bot@nvidia.com"
- git config --global user.name "Mcore Bot"
- git remote set-url origin "https://gitlab-ci-token:${PAT}@${GITLAB_ENDPOINT}/$CI_PROJECT_NAMESPACE/megatron-lm.git"
- sed -i "/^PRE_RELEASE/c\PRE_RELEASE = ''" megatron/core/package_info.py
- VERSION=$(python -c "from megatron import core; print(core.__version__)")
- RELEASE_BRANCH=core_r$VERSION
- git switch --force-create $RELEASE_BRANCH origin/$CI_DEFAULT_BRANCH
- |
MESSAGE='{
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "Releasebot 🤖: Megatron Core has been frozen 🎉 to branch `'"$RELEASE_BRANCH"'`"
}
}
]
}'
- |
curl -X POST -H "Content-type: application/json" --data "$MESSAGE" ${MCORE_NOTIFICATION_HOOK_MAIN}
- git switch --force-create bot/chore/bump-version
- git add megatron/core/package_info.py
- |
git commit -m "chore: adjust version"
- git push -u origin bot/chore/bump-version
- |
curl \
--header "PRIVATE-TOKEN: $PAT" \
--url https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests \
-d "source_branch=bot/chore/bump-version" \
-d "target_branch=$RELEASE_BRANCH" \
-d "title=chore: Fix version of \`$RELEASE_BRANCH\`" \
-d "description=[🤖]: Hi @okoenig 👋,<br><br>we've adjusted the version number of \`$RELEASE_BRANCH\` for you! 🚀<br><br>Please review and approve this cherry pick by your convenience\!"
publish:release_pypi_build_wheel:
extends: [publish:test_release_pypi_build_wheel, .publish_common_release]
dependencies: []
variables:
PUBLISH_DRYRUN: "no"
publish:release_pypi_test_wheel:
extends: [publish:test_release_pypi_test_wheel, .publish_common_release]
needs: [publish:release_pypi_build_wheel]
variables:
PUBLISH_DRYRUN: "no"
publish:release_pypi_push_wheel:
extends: [publish:test_release_pypi_push_wheel, .publish_common_release]
needs: [publish:release_pypi_test_wheel]
dependencies: [publish:release_pypi_test_wheel]
variables:
PUBLISH_DRYRUN: "no"
publish:release_github:
extends: [publish:test_release_github, .publish_common_release]
dependencies: [publish:release_pypi_test_wheel]
needs: [publish:release_pypi_test_wheel]
variables:
PUBLISH_DRYRUN: "no"
publish:release_version_bump:
needs: [publish:release_pypi_test_wheel]
extends: [publish:test_release_version_bump, .publish_common_release]
variables:
PUBLISH_DRYRUN: "no"
publish:release_notify:
needs: [publish:release_pypi_test_wheel, publish:release_pypi_push_wheel, publish:release_github]
extends: [publish:test_release_notify, .publish_common_release]
dependencies: [publish:release_pypi_test_wheel]
variables:
PUBLISH_DRYRUN: "no"
publish:docs:
extends: [.publish_common_release]
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
script:
- cd ..
- rm -rf documentation && git clone --recursive https://gitlab-ci-token:${PROJECT_ACCESS_TOKEN_MCORE}@${GITLAB_ENDPOINT}/nemo-megatron-core-tme/documentation.git
- cd documentation/megatron-lm
- git fetch origin '+refs/merge-requests/*:refs/remotes/merge-requests/*'
- git fetch origin $PUBLISH_COMMIT
- git checkout $PUBLISH_COMMIT
- cd ..
- git add megatron-lm
- |
git commit -m 'feat: Bump mcore'
- git push
publish:upload_statistics:
stage: publish
image: ${UTILITY_IMAGE}:${CI_PIPELINE_ID}
needs:
- job: test:unit_tests_pyt(DEV)_mcore(legacy)
optional: true
- job: test:unit_tests_pyt(LTS)_mcore(legacy)
optional: true
- job: test:unit_tests_pyt(DEV)_mcore(latest)
- job: test:unit_tests_pyt(LTS)_mcore(latest)
- job: functional:run_lts
optional: true
- job: functional:run_dev
optional: true
tags:
- arch/amd64
- env/prod
- origin/jet-fleet
- owner/jet-core
- purpose/utility
- team/megatron
script:
- env
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- export DASHBOARD_ENDPOINT
- python tests/test_utils/python_scripts/dashboard.py --pipeline-id ${CI_PIPELINE_ID}
rules:
- if: ($CI_PIPELINE_SOURCE == 'merge_request_event' || $CI_MERGE_REQUEST_EVENT_TYPE == 'merge_train') && ($UNIT_TEST == "yes" || $INTEGRATION_TEST == "yes" || $FUNCTIONAL_TEST == "yes")
when: always
allow_failure: true
- when: never
[MAIN]
ignore-paths=tests
max-line-length=100
[MESSAGES CONTROL]
disable=all
enable=C0115,C0116,W0611,C0301,E0606
# C0115: missing-class-docstring
# C0116: missing-function-docstring
# W0611: unused-import
# C0301: line-too-long
# E0606: possibly-used-before-assignment
# Changelog
## NVIDIA Megatron Core 0.12.0
- Add FP8 recipe selection to arguments (--fp8-recipe, --first-last-layers-bf16, --num-layers-at-start-in-bf16, --num-layers-at-end-in-bf16)
- Context parallel: fix loss scaling when calculate_per_token_loss=True
- Make the number of data parallel communication buckets configurable (--ddp-num-buckets, --ddp-pad-buckets-for-high-nccl-busbw)
- Inference
- Support in-flight batching and chunked KV cache
- Reduce memory usage:
- by not materializing full attention mask
- by only materializing logits for the last token during decode
- by removing an obsolete tensor reference
- Hybrid Model
- Inference
- Add CUDA graph support
- Change tools/run_mamba_text_generation_server.py to use megatron.core.inference
- Fix a shape issue when materializing logits for Mamba model
- Improve initialization of Mamba layers
- Add configuration switches (--mamba-state-dim, --mamba-head-dim, --mamba-num-groups, --is-hybrid-model)
- Make num_floating_point_operations work with hybrid model
- Make hybrid_conversion.py work with mixer that uses TE linear
- Add FP8 support
- Fix Mamba dt_bias tensor parallelism
- Support multimodal tokenizer
- Improve data parallelism scaling
- MoE
- Features:
- DeepEP support, compatible with all the parallelisms and token drop / dropless
- Important precision improvement: Enable FP32/FP64 routing and unpermutation using --moe-router-dtype. FP32 is recommended for all fine-grained MoE training
- CUDA Graph support for MoE
- Multi-Token Prediction (MTP) Support
- Fused indices_to_multihot kernel for DeepEP dispatcher
- Bug fixes:
- Fix Hang Issue with MoE+Dense Hybrid models
- Update theoretical memory and tflops estimation for MoE and MLA
- Fix MoE Aux loss scaling for per token loss
- Fixes for group limited routing and expert bias, verified through dsv3 end-to-end tests
- Known issues:
- The ckpt trained with Custom FSDP for MoE may not be compatible with 3D parallel training.
## NVIDIA Megatron Core 0.11.0
- Add multi-datacenter training support through N/S connection
- MoE
- Features
- Support DeepSeek-V3 fine-tuning
- Aux-loss-free load balancing strategy
- Node-limited routing and Device-limited routing support.
- Tensor Parallelism support for MLA and Sequence Auxiliary Loss
- MTP (with TP and PP support) is coming soon.
- Permutation / Unpermutation fusion kernel from TransformerEngine.
- Uneven virtual pipeline parallel split support in first and last PP stage.
- Bug fixes:
- Fix the grad scale when TP != expert-TP and average_in_collective is enabled in DDP.
- Fix TEGroupedMLP distckpt compatibility issue with FP8 padding/unpadding.
- Known Issues:
- When training the Dense+MoE hybrid model, the process will hang if any PP rank does not have expert params.
- Add MX-FP16 support for optimizer and master weights
- CUDA Graph memory optimizations
- Enable UCC backend for PP communication
- Optimizer CPU offload support for memory savings
- Models
- Initial RADIO/CRADIO implementation
- llama3.2 support
- Hybrid Model
- Support quantization via TensorRT Model Optimizer
## NVIDIA Megatron Core 0.10.0
- Add MLA to MCore
- Enable FP8 for GroupedMLP
- MoE Parallel Folding
- Enhance MoE Architecture: Support MoE Layer Frequency Patterns and Configurable MoE FFN Hidden Size
- Multimodal: NVLM training and evaluation support in MCore
- Mamba Hybrid
- Increase performance and reduce memory footprint of Triton language/compiler distributed caching
- Add more unit testing and fix bugs
## NVIDIA Megatron Core 0.9.0
- Uneven pipeline parallelism
- Enable pipeline parallelism where first and last ranks have fewer transformer layers than the intermediate ranks
- Per layer CUDAGraph support for GPT training with Transformer Engine modules
- Enable different TP sizes for the vision encoder
- Enable pipeline parallelism for T5 & Llava models
- Support multi-tile multi-image input in Llava models
- MoE
- FP8 support
- Runtime upcycling support
- Dispatcher implementation optimizations
- Shared expert support with overlapping optimizations
- Qwen Model support
- Known Issues
- When using sequence parallel, during the transformer block forward pass, dropout is not using the appropriate rng context.
- NVRx / Fault tolerance
- fault and hang detection in addition to existing straggler detection
- graceful exit and auto restart
## NVIDIA Megatron Core 0.8.0
- Multimodal
- Added initial support for training vision language models using the LLaVA architecture
- Added initial support for inference with multimodal inputs
- End-to-end multimodal example from data collection to training to evaluation is provided in examples/multimodal
- MoE
- Context Parallel support.
- Distributed checkpoint support for grouped GEMM.
- Mamba
## NVIDIA Megatron Core 0.7.0
- MoE
- Token drop support
- Several efficiency optimizations
- Improved model parallelism
- Memory optimizations
- Distributed checkpointing
- Enabled for Retro
- Asynchronous checkpoint saving
- Several minor bug fixes, speed improvements, and memory optimizations
## NVIDIA Megatron Core 0.6.0
- MoE (Mixture of Experts)
- Performance optimization
- Communication optimization for multi-GPU and single-GPU
- 23% improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with Hopper BF16
- GroupedMLP enhancement for Hopper
- DP Overlapping. Support overlapping computation with gradient reduction and parameter gathering.
- All-to-All based Token Dispatcher
- Layer-wise logging for load balancing loss.
- Improved expert parallel support including distributed optimizer.
- Distributed optimizer
- RETRO
- Data processing
- BERT
- Distributed checkpointing
- Dist checkpointing
- PyTorch native distributed backend
- Improved saving/loading speed
- TensorRT-LLM Export
- Integration with TensorRT Model Optimizer Post-training quantization (PTQ)
- Text generation driver to perform PTQ in Megatron-LM
- Llama2 and Nemotron3-8b examples to use TensorRT-LLM unified build API to build engine after training.
- Several minor enhancements, bug fixes, and documentation updates
## NVIDIA Megatron Core 0.5.0
### Key Features and Enhancements
Megatron core documentation is now [live!](https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start)
### Model Features
- MoE (Mixture of Experts)
- Support for Z-loss, Load balancing and Sinkhorn
- Layer and communications refactor
- Richer parallelism mappings and EP can be combined with other model parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
- Token dropless architecture with Top-K routing
- Performance optimization with GroupedGEMM when number of local experts is > 1
- Distributed checkpointing
- Interleaved rotary embedding
### Datasets
- Masked WordPiece datasets for BERT and T5
- Raw and mock datasets
### Parallelism
### Performance
- Activation offloading to CPU
- Rope and Swiglu fusion
- Sliding window attention (via Transformer Engine)
### General Improvements
- Timers
## NVIDIA Megatron Core 0.4.0
### Key Features and Enhancements
#### Models
- BERT
- RETRO
- T5
#### Parallelism
- Mixture of Experts support for GPT
- Model parallel efficient Distributed Data Parallel (DDP)
- Context Parallel (2D Tensor Parallel) support
#### Datasets
- GPT Dataset
- Blended Dataset