Commit 0d040089, authored by Lianmin Zheng and committed by GitHub

[CI] Code sync tools (#9830)

parent 05e47872
name: Open A PR to Copy Code From OSS

on:
  workflow_dispatch:
  # schedule:
  #   - cron: '0 10 * * *'

permissions:
  contents: write

jobs:
  copy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: 'main'

      - name: Install GitHub CLI (if not present)
        run: |
          bash scripts/code_sync/install_github_cli.sh

      - name: Copy from OSS code
        env:
          GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
        run: |
          python3 scripts/code_sync/copy_from_oss.py
name: Open A PR to Copy Diff To OSS

on:
  workflow_dispatch:
    inputs:
      commit_sha:
        description: 'The commit SHA to copy. Defaults to LAST to copy the latest commit.'
        required: false
        default: 'LAST'

permissions:
  contents: write

jobs:
  copy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install GitHub CLI (if not present)
        run: |
          bash scripts/code_sync/install_github_cli.sh

      - name: Copy to OSS code
        env:
          GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
        run: |
          python3 scripts/code_sync/copy_to_oss.py --commit ${{ github.event.inputs.commit_sha }}
"""
Sync code from OSS repo to the local repo and open a PR if changes exist.

NOTE:
1. You need to execute this script in the git root folder.
2. A GH_TOKEN environment variable is required to create the pull request.
  - see also https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens

This script will:
1. Clone the sgl-project/sglang repository (or use a local copy).
2. Sync specified files and directories using rsync.
3. Check if the sync operation resulted in any changes.
4. If there are changes:
   a. Create a new branch.
   b. Commit and push the changes.
   c. Open a pull request using the GitHub CLI (gh).

Usage:
# Run the full sync and PR creation process
python3 scripts/code_sync/copy_from_oss.py

# Perform a dry run without making any actual changes
python3 scripts/code_sync/copy_from_oss.py --dry-run

# Use a local directory as the source instead of cloning
python3 scripts/code_sync/copy_from_oss.py --local-dir ~/projects/sglang
"""

import argparse
import datetime
import os
import shutil
import subprocess
import tempfile

# --- Configuration Begin ---
# List of folders and files to copy from the OSS repo.
# Changes outside these paths will be ignored.
folder_names = [
    "3rdparty",
    "assets",
    "benchmark",
    "docker",
    "docs",
    "examples",
    "sgl-kernel",
    "README.md",
    "python/sglang/lang",
    "python/sglang/srt",
    "python/sglang/test",
    "test/lang",
    "test/srt",
]

private_repo = "your-org/sglang-private-repo"
# --- Configuration End ---
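

def write_github_step_summary(content):
    """Append content to the GitHub Actions step summary, if one is available."""
    if not os.environ.get("GITHUB_STEP_SUMMARY"):
        return

    with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as f:
        f.write(content)


def check_dependencies():
    """Check for required command-line tools."""
    if not shutil.which("git"):
        raise EnvironmentError("git is not installed or not in PATH.")
    if not shutil.which("gh"):
        raise EnvironmentError("GitHub CLI (gh) is not installed or not in PATH.")
    # rsync is invoked by sync_directories, so check for it up front as well.
    if not shutil.which("rsync"):
        raise EnvironmentError("rsync is not installed or not in PATH.")
    print("✅ All dependencies (git, gh, rsync) are available.")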


def checkout_main(dry_run):
    """Checkout to the main branch."""
    commands = [
        "git checkout main",
        "git reset --hard",
    ]
    for cmd in commands:
        print(f"Run: {cmd}")
        if not dry_run:
            try:
                subprocess.run(cmd, shell=True, check=True, capture_output=True)
            except subprocess.CalledProcessError as e:
                print(f"Git command failed: {e.stderr.decode()}")
                raise
    print("✅ Checkout the main branch.")


def get_source_folder(args):
    """
    Prepare the source repository, either by cloning from GitHub or using a local directory.
    Returns the path to the source repo root, a temporary directory path (if created),
    and the short commit hash.
    """
    temp_dir = None
    if args.local_dir:
        oss_root = os.path.expanduser(args.local_dir)
        if not os.path.exists(oss_root):
            raise FileNotFoundError(
                f"Specified local directory {oss_root} does not exist."
            )
        print(f"Using local directory as the source: {oss_root}")
    else:
        temp_dir = tempfile.mkdtemp()
        oss_root = temp_dir
        print(f"Created temporary directory: {oss_root}")

        repo_url = "https://github.com/sgl-project/sglang.git"
        try:
            subprocess.run(
                [
                    "git",
                    "clone",
                    "--single-branch",
                    "--branch",
                    "main",
                    repo_url,
                    temp_dir,
                ],
                check=True,
                capture_output=True,
            )
            print(f"Successfully cloned repository to {temp_dir}")
        except subprocess.CalledProcessError as e:
            print(f"Error cloning repository: {e.stderr.decode()}")
            raise

    commit_hash = subprocess.run(
        ["git", "-C", oss_root, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()[:8]
    print(f"✅ Get source OSS code at commit: {commit_hash}")

    return oss_root, temp_dir, commit_hash


def sync_directories(oss_root, folder_names, dry_run):
    """Sync specified directories from oss_root to current working directory."""
    rsync_commands = []
    for folder_name in folder_names:
        # Without a trailing slash on the source, rsync copies the entry itself
        # into the destination's parent folder,
        # e.g. "<oss_root>/python/sglang/srt" -> "./python/sglang".
        src_path = f"{oss_root}/{folder_name}"
        dst_parent = "./" + "/".join(folder_name.split("/")[:-1])
        cmd = f"rsync -r --delete {src_path} {dst_parent}"
        rsync_commands.append(cmd)

    for cmd in rsync_commands:
        try:
            print(f"Run: {cmd}")
            if not dry_run:
                subprocess.run(cmd, shell=True, check=True)
        except subprocess.CalledProcessError as e:
            print(f"Error executing command '{cmd}': {e}")
            raise
    print("✅ Sync all folders.")
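

def check_for_changes():
    """Check if there are any uncommitted git changes, including untracked files."""
    # `git diff --quiet` ignores untracked files, so a sync that only adds new
    # files would go unnoticed. `git status --porcelain` prints one line per
    # changed or untracked path and nothing when the tree is clean.
    result = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    return bool(result.stdout.strip())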


def create_and_push_branch(branch_name, commit_message, dry_run):
    """Create a new branch, commit all changes, and push to origin."""
    commands = [
        f"git checkout -b {branch_name}",
        "git config user.name 'github-actions[bot]'",
        "git config user.email 'github-actions[bot]@users.noreply.github.com'",
        "git add .",
        f"git commit -m '{commit_message}'",
        f"git push origin {branch_name} --force",
    ]
    print("\nCreating and pushing git branch...")
    for cmd in commands:
        print(f"Run: {cmd}")
        if not dry_run:
            try:
                subprocess.run(cmd, shell=True, check=True, capture_output=True)
            except subprocess.CalledProcessError as e:
                print(f"Git command failed: {e.stderr.decode()}")
                raise


def create_pull_request(branch_name, title, body, dry_run):
    """Create a pull request using the GitHub CLI."""
    gh_token = os.getenv("GH_TOKEN")
    if not gh_token:
        print(
            "\n⚠️ Warning: GH_TOKEN environment variable not set. Skipping PR creation."
        )
        if not dry_run:
            return

    print("\nCreating pull request...")
    command = [
        "gh",
        "pr",
        "create",
        "--base",
        "main",
        "--head",
        branch_name,
        "--repo",
        private_repo,
        "--title",
        title,
        "--body",
        body,
    ]
    print(f"Run: {' '.join(command)}")
    if not dry_run:
        env = os.environ.copy()
        env["GH_TOKEN"] = gh_token
        try:
            result = subprocess.run(
                command, check=True, capture_output=True, text=True, env=env
            )
            pr_url = result.stdout.strip()
            msg = f"✅ Successfully created pull request: {pr_url}"
            print(msg)
            write_github_step_summary(msg)
        except subprocess.CalledProcessError as e:
            print(f"Error creating pull request: {e.stderr}")
            raise


def main():
    parser = argparse.ArgumentParser(
        description="Copy code from OSS and open a PR if changes are detected."
    )
    parser.add_argument(
        "--local-dir",
        type=str,
        help="Path to local SGLang directory to use instead of cloning from GitHub.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Dry run the script without executing git, rsync, or gh commands.",
    )
    args = parser.parse_args()

    check_dependencies()
    checkout_main(args.dry_run)

    oss_root, temp_dir, oss_commit = get_source_folder(args)

    try:
        # Sync directories
        sync_directories(oss_root, folder_names, args.dry_run)

        # Check for changes and create PR if necessary
        if not check_for_changes():
            msg = "😴 No changes detected. The code is already in sync."
            print(msg)
            write_github_step_summary(msg)
            return

        print("✅ Changes detected. Proceeding to create a PR.")

        current_date = datetime.datetime.now().strftime("%Y%m%d")
        branch_name = f"copy-from-oss-{oss_commit}-{current_date}"
        commit_message = f"Copy OSS code from {oss_commit} on {current_date}"
        pr_title = (
            f"[Automated PR] Copy OSS code from commit {oss_commit} on {current_date}"
        )
        pr_body = (
            f"Copy OSS code from https://github.com/sgl-project/sglang/commit/{oss_commit} on {current_date}."
            "\n\n---\n\n"
            "*This is an automated PR created by scripts/code_sync/copy_from_oss.py.*"
        )

        create_and_push_branch(branch_name, commit_message, args.dry_run)
        create_pull_request(branch_name, pr_title, pr_body, args.dry_run)

    finally:
        # Remove temporary directory if it was created
        if temp_dir:
            try:
                shutil.rmtree(temp_dir)
                print(f"\nRemoved temporary directory: {temp_dir}")
            except OSError as e:
                print(f"Error removing temporary directory {temp_dir}: {e}")


if __name__ == "__main__":
    main()
"""
Sync a specific commit from the local private repo to the OSS upstream and open a PR.

NOTE:
1. You need to execute this script in the git root folder.
2. A GH_TOKEN environment variable is required to create the pull request.
  - see also https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens

This script will:
1. Take a commit hash as an argument (or use the latest commit by default).
2. Create a patch for that commit.
3. Filter the patch to only include changes in specified directories.
4. Clone the sgl-project/sglang repository.
5. Create a new branch in the OSS repo.
6. Apply the filtered patch, commit, and force push.
7. Open a pull request to the OSS repo using the GitHub CLI (gh).

Usage:
# Sync the latest commit from the current branch
python3 scripts/code_sync/copy_to_oss.py

# Run the full sync and PR creation process for a given commit
python3 scripts/code_sync/copy_to_oss.py --commit <commit_hash>

# Perform a dry run without making any actual changes
python3 scripts/code_sync/copy_to_oss.py --commit <commit_hash> --dry-run
"""

import argparse
import datetime
import os
import shutil
import subprocess
import tempfile

# --- Configuration Begin ---
# List of folders and files to copy to the OSS repo.
# Changes outside these paths will be ignored.
folder_names = [
    "3rdparty",
    "assets",
    "benchmark",
    "docker",
    "docs",
    "examples",
    "sgl-kernel",
    "README.md",
    "python/sglang/lang",
    "python/sglang/srt",
    "python/sglang/test",
    "test/lang",
    "test/srt",
]
# --- Configuration End ---


def write_github_step_summary(content):
    if not os.environ.get("GITHUB_STEP_SUMMARY"):
        return

    with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as f:
        f.write(content)


def get_commit_info(commit_ref):
    """
    Retrieves the hash and message of a specific commit.

    Args:
        commit_ref (str): The commit hash, tag, or branch to inspect (e.g., 'HEAD').

    Returns:
        A tuple containing the (commit_hash, commit_message),
        or (None, None) if an error occurs.
    """
    try:
        # Use a custom format to get the hash (%H) and the full message (%B)
        # separated by a null character for safe parsing.
        command = ["git", "log", "-1", "--pretty=%H%x00%B", commit_ref]
        result = subprocess.run(
            command, capture_output=True, text=True, check=True, encoding="utf-8"
        )

        # Split the output by the null character separator
        commit_hash, commit_message = result.stdout.strip().split("\x00", 1)
        return commit_hash, commit_message

    except FileNotFoundError:
        print("❌ Error: 'git' command not found. Is Git installed and in your PATH?")
    except subprocess.CalledProcessError as e:
        print(f"❌ Error getting commit info for '{commit_ref}': {e.stderr.strip()}")
        print(
            "Hint: Make sure you are running this from within a Git repository and the commit exists."
        )

    return None, None


def check_dependencies():
    """Check for required command-line tools."""
    if not shutil.which("git"):
        raise EnvironmentError("git is not installed or not in PATH.")
    if not shutil.which("gh"):
        raise EnvironmentError("GitHub CLI (gh) is not installed or not in PATH.")
    print("✅ All dependencies (git, gh) are available.")


def create_filtered_patch(commit_hash, dry_run):
    """
    Create a patch file for the given commit, containing only changes
    to files and directories specified in `folder_names`.
    """
    print(f"Creating a filtered patch for commit {commit_hash}")

    try:
        # Get the list of all files changed in the commit
        changed_files_raw = subprocess.run(
            ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit_hash],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
        changed_files = changed_files_raw.strip().split("\n")

        # Filter the list of files. Match on path boundaries so that, e.g.,
        # "docs" does not also match a sibling folder such as "docs_internal".
        relevant_files = [
            f
            for f in changed_files
            if any(f == path or f.startswith(path + "/") for path in folder_names)
        ]

        if not relevant_files:
            msg = "\n😴 No relevant file changes found in this commit. Exiting."
            print(msg)
            write_github_step_summary(msg)
            return None, None

        print("Found relevant changes in the following files:")
        for f in relevant_files:
            print(f"  - {f}")

        # Create a patch containing only the changes for the relevant files
        patch_command = [
            "git",
            "format-patch",
            "--stdout",
            f"{commit_hash}^..{commit_hash}",
            "--",
        ] + relevant_files

        print(f"Run: {' '.join(patch_command)}")

        patch_content = subprocess.run(
            patch_command, capture_output=True, text=True, check=True
        ).stdout

        # Save the patch to a temporary file
        patch_file = tempfile.NamedTemporaryFile(
            mode="w", delete=False, suffix=".patch", encoding="utf-8"
        )
        patch_file.write(patch_content)
        patch_file.close()

        print(f"✅ Filtered patch created successfully at: {patch_file.name}")
        return patch_file.name, relevant_files

    except subprocess.CalledProcessError as e:
        print(f"Error creating patch: {e.stderr}")
        raise


def get_oss_repo(dry_run):
    """
    Clones the OSS repository into a temporary directory.
    Returns the path to the repo root and the temp directory itself.
    """
    gh_token = os.getenv("GH_TOKEN")
    if not gh_token:
        print("⚠️ Warning: GH_TOKEN environment variable not set.")
        if not dry_run:
            # The caller unpacks two return values, so a bare `return` would fail
            # with a confusing TypeError. The token is needed to clone and push,
            # so fail with an explicit error instead.
            raise EnvironmentError(
                "GH_TOKEN is required to clone and push to the OSS repo."
            )

    temp_dir = tempfile.mkdtemp()
    oss_root = os.path.join(temp_dir, "sglang")
    print(f"\nCreated temporary directory for OSS repo: {temp_dir}")

    repo_url = f"https://{gh_token}@github.com/sgl-project/sglang.git"
    command = ["git", "clone", "--branch", "main", repo_url, oss_root]

    # Redact the token so it is not echoed to the logs.
    redacted_url = "https://***@github.com/sgl-project/sglang.git"
    print(f"Run: {' '.join(command).replace(repo_url, redacted_url)}")
    if not dry_run:
        try:
            subprocess.run(command, check=True, capture_output=True)
            print(f"✅ Successfully cloned repository to {oss_root}")
        except subprocess.CalledProcessError as e:
            print(f"Error cloning repository: {e.stderr.decode()}")
            shutil.rmtree(temp_dir)
            raise

    return oss_root, temp_dir


def apply_patch_and_push(oss_root, patch_file, branch_name, commit_message, dry_run):
    """
    In the OSS repo, create a branch, apply the patch, commit, and push.
    """
    print("\nApplying patch and pushing to OSS repo...")

    original_cwd = os.getcwd()
    if not dry_run:
        os.chdir(oss_root)

    try:
        # Define commands as lists to avoid shell injection issues
        commands_to_run = [
            ["git", "checkout", "-b", branch_name],
            ["git", "apply", patch_file],
            ["git", "config", "user.name", "github-actions[bot]"],
            [
                "git",
                "config",
                "user.email",
                "github-actions[bot]@users.noreply.github.com",
            ],
            ["git", "add", "."],
        ]

        for cmd_list in commands_to_run:
            print(f"Run: {' '.join(cmd_list)}")
            if not dry_run:
                subprocess.run(cmd_list, check=True, capture_output=True, text=True)

        # Handle commit separately to pass multi-line message safely via stdin
        commit_cmd = ["git", "commit", "-F", "-"]
        print(f"Run: {' '.join(commit_cmd)}")
        if not dry_run:
            print(f"Commit Message:\n---\n{commit_message}\n---")
            subprocess.run(
                commit_cmd,
                input=commit_message,
                text=True,
                check=True,
                capture_output=True,
            )

        # Push the changes
        push_cmd = ["git", "push", "origin", branch_name, "--force"]
        print(f"Run: {' '.join(push_cmd)}")
        if not dry_run:
            subprocess.run(push_cmd, check=True, capture_output=True, text=True)

    except subprocess.CalledProcessError as e:
        print(f"Git command failed: {e.stderr}")
        raise
    finally:
        if not dry_run:
            os.chdir(original_cwd)

    print("✅ Branch created, patch applied, and pushed successfully.")


def create_pull_request(oss_root, branch_name, title, body, dry_run):
    """Create a pull request in the OSS repo using the GitHub CLI."""
    gh_token = os.getenv("GH_TOKEN")
    if not gh_token:
        print("⚠️ Warning: GH_TOKEN environment variable not set. Skipping PR creation.")
        if not dry_run:
            return

    print("\nCreating pull request...")
    command = [
        "gh",
        "pr",
        "create",
        "--base",
        "main",
        "--head",
        branch_name,
        "--repo",
        "sgl-project/sglang",
        "--title",
        title,
        "--body",
        body,
    ]

    print(f"Run: {' '.join(command)}")
    if not dry_run:
        env = os.environ.copy()
        env["GH_TOKEN"] = gh_token
        try:
            result = subprocess.run(
                command,
                check=True,
                capture_output=True,
                text=True,
                env=env,
                cwd=oss_root,
            )
            msg = f"✅ Successfully created pull request: {result.stdout.strip()}"
            print(msg)
            write_github_step_summary(msg)
        except subprocess.CalledProcessError as e:
            print(f"Error creating pull request: {e.stderr}")
            # Check if a PR already exists
            if "A pull request for" in e.stderr and "already exists" in e.stderr:
                print("ℹ️ A PR for this branch likely already exists.")
            else:
                raise


def get_commit_author(commit_hash):
    """Get the author name and email of a commit."""
    try:
        author_name = subprocess.run(
            ["git", "show", "-s", "--format=%an", commit_hash],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
        author_email = subprocess.run(
            ["git", "show", "-s", "--format=%ae", commit_hash],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
        return author_name, author_email
    except subprocess.CalledProcessError as e:
        print(f"Error getting commit author for {commit_hash}: {e.stderr}")
        raise


def main():
    parser = argparse.ArgumentParser(
        description="Copy a commit from the private repo to OSS and open a PR."
    )
    parser.add_argument(
        "--commit",
        type=str,
        default="LAST",
        help="The commit hash to sync. Defaults to 'LAST' to use the latest commit.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Dry run the script without executing git, rsync, or gh commands.",
    )
    args = parser.parse_args()

    check_dependencies()

    commit_ref = "HEAD" if args.commit == "LAST" else args.commit
    commit_hash, original_commit_message = get_commit_info(commit_ref)

    if not commit_hash:
        return  # Exit if we couldn't get commit info

    # Display the details of the commit being processed
    if args.commit == "LAST":
        summary = (
            f"\nℹ️ No commit specified. Using the last commit:\n"
            f"  - **Hash:** `{commit_hash}`\n"
            f"  - **Message:** {original_commit_message}\n\n"
        )
    else:
        summary = (
            f"\nℹ️ Using specified commit:\n"
            f"  - **Hash:** `{commit_hash}`\n"
            f"  - **Message:** {original_commit_message}\n\n"
        )
    print(summary)
    write_github_step_summary(summary)

    short_hash = commit_hash[:8]

    patch_file = None
    temp_dir = None
    try:
        # 1. Create a filtered patch from the local repo
        patch_file, relevant_files = create_filtered_patch(commit_hash, args.dry_run)
        if not patch_file:
            return

        # 2. Get the OSS repo
        oss_root, temp_dir = get_oss_repo(args.dry_run)

        # 3. Get original commit author for the co-author line
        author_name, author_email = get_commit_author(commit_hash)

        # 4. Prepare content for the commit and PR based on changed files
        file_list_str = "\n".join([f"- {f}" for f in relevant_files])
        filename_list_str = ", ".join([f.split("/")[-1] for f in relevant_files])
        if len(filename_list_str) > 40:
            filename_list_str = filename_list_str[:40] + "..."
        current_date = datetime.datetime.now().strftime("%Y%m%d")
        pr_title = f"[Auto Sync] Update {filename_list_str} ({current_date})"
        pr_body = (
            f"Sync changes from commit `{short_hash}`.\n\n"
            f"**Relevant Files Changed:**\n{file_list_str}"
            "\n\n---\n\n"
            "*This is an automated PR created by a script.*"
        )

        # 5. Create branch, apply patch, and push
        branch_name = f"sync-{short_hash}-{current_date}"
        co_author_line = f"Co-authored-by: {author_name} <{author_email}>"
        commit_message = f"{pr_title}\n\n{co_author_line}"
        apply_patch_and_push(
            oss_root, patch_file, branch_name, commit_message, args.dry_run
        )

        # 6. Create Pull Request
        create_pull_request(oss_root, branch_name, pr_title, pr_body, args.dry_run)

    finally:
        # Cleanup temporary files
        if patch_file and os.path.exists(patch_file):
            os.remove(patch_file)
            print(f"\nRemoved temporary patch file: {patch_file}")
        if temp_dir and os.path.exists(temp_dir):
            shutil.rmtree(temp_dir)
            print(f"Removed temporary directory: {temp_dir}")


if __name__ == "__main__":
    main()
### Sync Code Between OSS and Private Fork
The following principles and tools keep the code in sync between a private fork and the OSS repo [sgl-project/sglang](https://github.com/sgl-project/sglang/tree/main).
## Principles
- The folder `python/sglang/srt` is 100% mirrored between the private fork and OSS repo.
- The OSS repo is the single source of truth. If a commit changes `python/sglang/srt` in the private repo, the change should be synced to the OSS repo as soon as possible with Action B below.
- The common code (e.g., base classes, well-known techniques in the industry without private secrets) goes to `python/sglang/srt`. The private-specific code (e.g., private-specific features, confidential info) goes to `python/sglang/private`.
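
As a rough illustration of the path convention above, the sketch below classifies a repository path as shared, private, or neither. The helper names (`is_under`, `classify_path`) and the three labels are hypothetical and are not part of the sync scripts; only the two folder names come from the principles above.

```python
from pathlib import PurePosixPath

# Folder names taken from the principles above.
SHARED_ROOT = PurePosixPath("python/sglang/srt")
PRIVATE_ROOT = PurePosixPath("python/sglang/private")


def is_under(path, root):
    """Return True if `path` is `root` itself or lives somewhere below it."""
    p = PurePosixPath(path)
    return p == root or root in p.parents


def classify_path(path):
    """Hypothetical helper: label a repo-relative path by where it belongs."""
    if is_under(path, PRIVATE_ROOT):
        return "private"  # stays in the private fork only
    if is_under(path, SHARED_ROOT):
        return "shared"  # mirrored between the private fork and OSS
    return "other"


if __name__ == "__main__":
    for example in [
        "python/sglang/srt/layers/linear.py",
        "python/sglang/private/srt/feature.py",
        "python/sglang/srt_extra/x.py",
    ]:
        print(example, "->", classify_path(example))
```

Comparing whole path components rather than string prefixes keeps a sibling folder such as `python/sglang/srt_extra` from being treated as part of `python/sglang/srt`.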
## How to sync the code bidirectionally
### Action A: Copy code from OSS to private
- We can run this action: [Open A PR to Copy Code From OSS](https://github.com/sgl-project/sglang/tree/main/.github/workflows/open-pr-copy-from-oss.yml)
- It opens a PR to copy all files under certain folders (e.g., `python/sglang/srt`, `test/srt`, `sgl-kernel`) from the OSS main branch to the private fork.
- Since the OSS repo is the single source of truth, this action copies files and overwrites any changes in the private fork. To keep private changes from being overwritten, make sure they are all merged into the OSS repo before running this action.
- This action is intended to run automatically every day (the `schedule` trigger in the workflow is currently commented out) and can also be triggered manually.
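
To make the overwrite behavior of Action A concrete, here is a minimal sketch of a mirror copy. It uses Python's `shutil` in place of the `rsync -r --delete` call that the actual script runs, and the function name `mirror_path` is hypothetical. The point it shows: anything that exists only in the destination folder is removed, which is why private-only edits under mirrored folders are lost.

```python
import shutil
from pathlib import Path


def mirror_path(src_root, dst_root, rel_path):
    """Make dst_root/rel_path an exact copy of src_root/rel_path.

    Hypothetical stand-in for `rsync -r --delete`: files that exist only
    in the destination directory are deleted.
    """
    src = Path(src_root) / rel_path
    dst = Path(dst_root) / rel_path
    if src.is_dir():
        # Drop the old destination tree so stale files do not survive.
        if dst.exists():
            shutil.rmtree(dst)
        shutil.copytree(src, dst)
    else:
        # Single files (e.g. README.md) are simply overwritten.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Calling `mirror_path(oss_root, ".", "test/srt")` would replace the local `test/srt` folder with the OSS copy, including deleting any test file that was added only in the private fork.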
### Action B: Copy diff from private to OSS
- We can run this action: [Open A PR to Copy Diff To OSS](https://github.com/sgl-project/sglang/tree/main/.github/workflows/open-pr-copy-to-oss.yml)
- It opens a PR to apply the diff of one specific commit of the private fork to the OSS main branch. It only picks the changes under certain folders (e.g., `python/sglang/srt`, `test/srt`, `sgl-kernel`) and ignores changes under private folders (e.g., `python/sglang/private`).
- For example, you can have a PR that changes both `python/sglang/srt` and `python/sglang/private/srt`. Once you merge the PR into the private repo, `python/sglang/srt` becomes desynced between the two repos. Run this action on your merge commit immediately to open a PR that sends your diff to the OSS repo, and get that OSS PR merged as soon as possible. Once the OSS PR is merged, Action A can be run again.
- Action A copies files directly, but Action B applies a diff. Because the OSS repo is the source of truth, Action A can safely overwrite the private copy with whole files. Action B cannot copy whole files, since the private fork may be behind OSS or contain private-only changes, so it applies only the diff of the chosen commit.
- This action currently requires a manual trigger to prevent accidental code leaks. It could be made automatic later.
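
The folder filtering that Action B relies on can be sketched as below. This is an illustrative sketch, not the script itself: the names `is_relevant` and `filter_relevant_files` are hypothetical, and the allow-list in the usage example is a shortened, assumed subset of the configured folders. Matching is done on path boundaries, so `docs` does not match `docs_internal/` and `python/sglang/srt` does not match `python/sglang/private/srt/`.

```python
def is_relevant(path, allowed_paths):
    """True if `path` equals an allowed entry or lies inside an allowed folder."""
    return any(
        path == allowed or path.startswith(allowed + "/")
        for allowed in allowed_paths
    )


def filter_relevant_files(changed_files, allowed_paths):
    """Keep only the changed files that may be sent to the OSS repo."""
    return [f for f in changed_files if is_relevant(f, allowed_paths)]


if __name__ == "__main__":
    # Assumed example inputs for illustration only.
    allowed = ["python/sglang/srt", "docs", "README.md"]
    changed = [
        "python/sglang/srt/a.py",
        "python/sglang/private/srt/b.py",
        "docs/x.md",
        "docs_internal/y.md",
        "README.md",
    ]
    for kept in filter_relevant_files(changed, allowed):
        print(kept)
```

Only the files that pass this filter would go into the patch, so a commit that touches both shared and private folders contributes just its shared part to the OSS PR.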
#!/bin/bash

# Check if gh is installed before attempting to install it
if ! command -v gh &> /dev/null
then
    echo "GitHub CLI not found. Installing now..."
    (type -p wget >/dev/null || ( apt update && apt install wget -y)) \
    && mkdir -p -m 755 /etc/apt/keyrings \
    && out=$(mktemp) && wget -nv -O"$out" https://cli.github.com/packages/githubcli-archive-keyring.gpg \
    && cat "$out" | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
    && chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
    && mkdir -p -m 755 /etc/apt/sources.list.d \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
    && apt update \
    && apt install gh -y
else
    echo "GitHub CLI is already installed. Skipping installation."
fi