# SGLang CI Monitor

> **Note**: This README.md is primarily generated by Claude 4 with some manual adjustments.

A comprehensive toolkit to analyze CI failures and performance trends for the SGLang project. This toolkit includes two main tools:

1. **CI Analyzer** (`ci_analyzer.py`): Analyzes CI failures and provides detailed failure pattern analysis
2. **Performance Analyzer** (`ci_analyzer_perf.py`): Tracks performance metrics over time and generates trend charts

## Features

### CI Analyzer (`ci_analyzer.py`)

- **Simple Analysis**: Analyze recent CI runs and identify failure patterns
- **Category Classification**: Automatically categorize failures by type (unit-test, performance, etc.)
- **Pattern Recognition**: Identify common failure patterns (timeouts, build failures, etc.)
- **CI Links**: Direct links to recent failed CI runs for detailed investigation
- **Last Success Tracking**: Track the last successful run for each failed job, with PR information
- **JSON Export**: Export detailed analysis data to JSON format

### Performance Analyzer (`ci_analyzer_perf.py`)

- **Performance Tracking**: Monitor performance metrics across CI runs over time
- **Automated Chart Generation**: Generate time-series charts for each performance metric
- **Multi-Test Support**: Track performance for all test types (throughput, latency, accuracy)
- **CSV Export**: Export performance data in structured CSV format
- **Trend Analysis**: Visualize performance trends with PNG time-series charts
- **Comprehensive Metrics**: Track output throughput, E2E latency, TTFT, accept length, and more
- **Time-Based Sampling**: Uniform sampling that covers extended time periods (up to 30 days) with a limited number of API calls

### Common Features

- **Automated Monitoring**: GitHub Actions workflow for continuous CI and performance monitoring

## Installation

### For CI Analyzer

No additional dependencies are required beyond the Python standard library and `requests`:

```bash
pip install requests
```

### For Performance Analyzer

Additional dependencies are required for chart generation:

```bash
pip install requests matplotlib pandas
```

## Usage

### CI Analyzer

#### Basic Usage

```bash
# Replace YOUR_GITHUB_TOKEN with your actual token from https://github.com/settings/tokens
python ci_analyzer.py --token YOUR_GITHUB_TOKEN
```

#### Advanced Usage

```bash
# Analyze the last 1000 runs
python ci_analyzer.py --token YOUR_GITHUB_TOKEN --limit 1000

# Custom output file
python ci_analyzer.py --token YOUR_GITHUB_TOKEN --limit 500 --output my_analysis.json
```

### Performance Analyzer

#### Basic Usage

```bash
# Analyze performance trends from recent CI runs
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN
```

#### Advanced Usage

```bash
# Analyze the last 1000 PR Test runs (auto-enables uniform sampling for ~30 days of coverage)
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN --limit 1000

# Custom output directory
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN --limit 500 --output-dir my_performance_data

# Stay below the 500 threshold to use sequential mode (no sampling)
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN --limit 499

# Get ALL performance data within a specific date range (recommended for historical analysis)
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN --start-date 2024-12-01 --end-date 2024-12-31

# Get complete data for the last week (uses GNU date syntax)
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN --start-date $(date -d '7 days ago' +%Y-%m-%d) --end-date $(date +%Y-%m-%d)

# Upload results to the sglang-bot/sglang-ci-data GitHub repository for sharing
python ci_analyzer_perf.py --token YOUR_GITHUB_TOKEN --limit 1000 --upload-to-github
```

**Important**: Make sure your GitHub token has `repo` and `workflow` permissions; otherwise you'll get 404 errors.

## Data Collection Strategies

The Performance Analyzer offers multiple strategies for collecting performance data to suit different analysis needs.

### 1. Uniform Sampling Strategy

**When to use**: Daily monitoring and trend analysis over extended periods.

- **Automatically enabled** when `--limit >= 500`
- **Disabled** for smaller limits (< 500) to maintain backward compatibility

#### How it works:

- Collects data uniformly across a 30-day period
- Ensures even time distribution of samples
- Provides consistent coverage for trend analysis

#### Example with 1000 Runs:

- **Time Range**: Last 30 days
- **Distribution**: 1000 samples evenly distributed across the period
- **Coverage**: ~33 samples per day on average

### 2. Date Range Collection

**When to use**: Historical analysis, investigation of a specific period, or complete data collection.

Use the `--start-date` and `--end-date` parameters to get **ALL** CI runs within a specific time range.

#### Features:

- **Complete Data**: Gets every CI run in the specified range (no sampling)
- **No Limit**: Ignores the `--limit` parameter
- **Flexible Range**: Specify any date range you need
- **Historical Analysis**: Well suited to investigating specific time periods

#### Date Format:

- Use `YYYY-MM-DD` format (e.g., `2024-12-01`)
- Either parameter can be omitted:
  - Only `--start-date`: Gets all runs from that date to now
  - Only `--end-date`: Gets all runs from 30 days ago to that date
  - Both: Gets all runs in the specified range

### 3. Sequential Collection (Traditional)

**When to use**: Quick checks or when you only need recent data.
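
Which strategy a run uses depends only on the flags passed. The following Python is a hypothetical sketch of those rules, not code taken from `ci_analyzer_perf.py`: `choose_strategy`, `parse_date`, and `uniform_sample_times` are made-up names, and placing each sample at the midpoint of an equal time slot is only one plausible reading of "evenly distributed". It encodes the documented behavior: date range collection whenever `--start-date` or `--end-date` is set (with `--limit` ignored), uniform sampling over 30 days when `--limit >= 500`, and sequential collection otherwise.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Values taken from this README; the real script may define them differently.
UNIFORM_SAMPLING_THRESHOLD = 500  # uniform sampling when --limit >= 500
SAMPLING_WINDOW_DAYS = 30  # uniform sampling spans roughly the last 30 days


def parse_date(value: str) -> datetime:
    """Validate a YYYY-MM-DD string (raises ValueError otherwise)."""
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def choose_strategy(limit: int, start_date: Optional[str] = None,
                    end_date: Optional[str] = None) -> str:
    """Hypothetical helper: map CLI-style options to a collection strategy."""
    if start_date or end_date:
        # Date range mode collects every run in the range; --limit is ignored.
        for value in (start_date, end_date):
            if value:
                parse_date(value)
        return "date_range"
    if limit >= UNIFORM_SAMPLING_THRESHOLD:
        return "uniform_sampling"
    return "sequential"


def uniform_sample_times(limit: int, now: datetime,
                         window_days: int = SAMPLING_WINDOW_DAYS) -> List[datetime]:
    """Spread `limit` target timestamps evenly over the last `window_days` days."""
    start = now - timedelta(days=window_days)
    step = (now - start) / limit  # equal-width time slots
    # Place each target at the midpoint of its slot, oldest first.
    return [start + step * (i + 0.5) for i in range(limit)]


if __name__ == "__main__":
    now = datetime(2025, 9, 25, tzinfo=timezone.utc)
    print(choose_strategy(100))                            # sequential
    print(choose_strategy(1000))                           # uniform_sampling
    print(choose_strategy(1000, start_date="2024-12-01"))  # date_range
    targets = uniform_sample_times(1000, now)
    print(round(len(targets) / SAMPLING_WINDOW_DAYS, 1))   # 33.3
```

The last value matches the ~33 samples per day quoted above for 1000 runs. Sequential collection itself behaves as follows: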

- **Default behavior** for `--limit < 500`
- Gets the most recent CI runs in chronological order
- Fast and simple for immediate analysis

### Comparison

| Strategy | Use Case | Time Coverage | Data Completeness | API Efficiency |
|----------|----------|---------------|-------------------|----------------|
| **Uniform Sampling** | Daily monitoring, trends | ~30 days | Sampled | High |
| **Date Range** | Historical analysis | Any range | Complete | Variable |
| **Sequential** | Quick checks | 3-4 days | Complete (recent) | High |

### Benefits

- **Flexible Analysis**: Choose the right strategy for your needs
- **Extended Coverage**: Up to 30 days with sampling, unlimited with date ranges
- **Complete Data**: Get every run in a specific period when needed
- **API Efficiency**: Optimized for different use patterns

## Parameters

### CI Analyzer Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--token` | Required | GitHub Personal Access Token |
| `--limit` | 100 | Number of CI runs to analyze |
| `--output` | ci_analysis.json | Output JSON file for detailed data |

### Performance Analyzer Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--token` | Required | GitHub Personal Access Token |
| `--limit` | 100 | Number of PR Test runs to analyze (ignored when using a date range) |
| `--output-dir` | performance_tables | Output directory for CSV tables and PNG charts |
| `--start-date` | None | Start date for date range query (YYYY-MM-DD format) |
| `--end-date` | None | End date for date range query (YYYY-MM-DD format) |
| `--upload-to-github` | False | Upload results to the sglang-bot/sglang-ci-data repository |

## Getting a GitHub Token

1. Go to [GitHub Settings > Personal Access Tokens](https://github.com/settings/tokens)
2. Click "Generate new token" > "Generate new token (classic)"
3. **Important**: Select the following permissions:
   - `repo` (Full control of private repositories) - **Required for accessing repository data**
   - `workflow` (Update GitHub Action workflows) - **Required for reading CI/CD data**
4. Copy the generated token and use it as `YOUR_GITHUB_TOKEN`

**Note**: Without the `repo` and `workflow` permissions, the tool cannot access CI run data and will return 404 errors.

## Output

### CI Analyzer Output

#### Console Output

- Overall statistics (total runs, success rate, etc.)
- Category failure breakdown
- Most frequently failed jobs (Top 50) with direct CI links
- Failure pattern analysis

#### JSON Export

Detailed analysis data including:

- Complete failure statistics
- Job failure counts
- Workflow failure counts
- Failure patterns
- Recent failure details

### Performance Analyzer Output

#### Console Output

- Performance data collection progress
- Summary statistics of collected tests and records
- Generated file locations (CSV tables and PNG charts)

#### File Outputs

- **CSV Tables**: Structured performance data with columns:
  - `created_at`: Timestamp of the CI run
  - `run_number`: GitHub Actions run number
  - `pr_number`: Pull request number (if applicable)
  - `author`: Developer who triggered the run
  - `head_sha`: Git commit SHA
  - Performance metrics (vary by test type):
    - `output_throughput_token_s`: Output throughput in tokens/second
    - `median_e2e_latency_ms`: Median end-to-end latency in milliseconds
    - `median_ttft_ms`: Median time-to-first-token in milliseconds
    - `accept_length`: Accept length for speculative decoding tests
  - `url`: Direct link to the GitHub Actions run
- **PNG Charts**: Time-series visualization charts for each metric:
  - X-axis: Time (MM-DD HH:MM format)
  - Y-axis: Performance metric values
  - File naming: `{test_name}_{metric_name}.png`

#### Directory Structure

```
performance_tables/
├── performance-test-1-gpu-part-1_summary/
│   ├── test_bs1_default.csv
│   ├── test_bs1_default_output_throughput_token_s.png
│   ├── test_online_latency_default.csv
│   ├── test_online_latency_default_median_e2e_latency_ms.png
│   └── ...
├── performance-test-1-gpu-part-2_summary/
│   └── ...
└── performance-test-2-gpu_summary/
    └── ...
```

## Example Output

### CI Analyzer Example

```
============================================================
SGLang CI Analysis Report
============================================================

Overall Statistics:
  Total runs: 1000
  Successful: 392
  Failed: 187
  Cancelled: 181
  Skipped: 150
  Success rate: 39.2%

Category Failure Statistics:
  unit-test: 351 failures
  accuracy: 84 failures
  performance: 55 failures
  deepep: 1 failures

Most Frequently Failed Jobs (Top 50):
  1. unit-test-backend-1-gpu-amd-mi35x (linux-mi35x-gpu-1): 32 times
     Last Success: Run #28893 (2025-09-24 13:35) by Xiaoze Fan: https://github.com/sgl-project/sglang/actions/runs/17978451434
     Recent Failures:
       - Run #28958 (2025-09-25 01:51) (PR #1 by Yuhao Yao): https://github.com/sgl-project/sglang/actions/runs/17994520789
       - Run #28957 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860400
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732

  2. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 3): 31 times
     Last Success: Run #28903 (2025-09-24 15:38) by gholmes829: https://github.com/sgl-project/sglang/actions/runs/17981905113
     Recent Failures:
       - Run #28958 (2025-09-25 01:51) (PR #1 by Yuhao Yao): https://github.com/sgl-project/sglang/actions/runs/17994520789
       - Run #28957 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860400
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732

  3. accuracy-test-2-gpu-amd (linux-mi35x-gpu-2): 29 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28958 (2025-09-25 01:51) (PR #1 by Yuhao Yao): https://github.com/sgl-project/sglang/actions/runs/17994520789
       - Run #28957 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860400
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732

  4. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 5): 23 times
     Last Success: Run #28906 (2025-09-24 15:43) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17982029749
     Recent Failures:
       - Run #28958 (2025-09-25 01:51) (PR #1 by Yuhao Yao): https://github.com/sgl-project/sglang/actions/runs/17994520789
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068

  5. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 0): 23 times
     Last Success: Run #28893 (2025-09-24 13:35) by Xiaoze Fan: https://github.com/sgl-project/sglang/actions/runs/17978451434
     Recent Failures:
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855

  6. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 7): 18 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855

  7. unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 3): 17 times
     Last Success: Run #28893 (2025-09-24 13:35) by Xiaoze Fan: https://github.com/sgl-project/sglang/actions/runs/17978451434
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  8. build-test (all): 16 times
     Last Success: Run #15748 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435618
     Recent Failures:
       - Run #15824 (2025-09-25 02:16) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17994892894
       - Run #15814 (2025-09-25 00:53) by diwei sun: https://github.com/sgl-project/sglang/actions/runs/17993616261
       - Run #15812 (2025-09-25 00:35) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993338746

  9. bench-test-2-gpu-amd (linux-mi300-gpu-2): 15 times
     Last Success: Run #28893 (2025-09-24 13:35) by Xiaoze Fan: https://github.com/sgl-project/sglang/actions/runs/17978451434
     Recent Failures:
       - Run #28957 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860400
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  10. performance-test-1-gpu-part-2-amd (linux-mi300-gpu-1): 15 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  11. accuracy-test-1-gpu-amd (linux-mi325-gpu-1): 15 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  12. unit-test-backend-8-gpu-amd (linux-mi300-gpu-8): 15 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  13. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 1): 14 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  14. unit-test-backend-2-gpu-amd (linux-mi300-gpu-2): 14 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  15. performance-test-1-gpu-part-1-amd (linux-mi325-gpu-1): 13 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  16. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 2): 13 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  17. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 4): 13 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  18. accuracy-test-2-gpu-amd (linux-mi325-gpu-2): 13 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  19. mla-test-1-gpu-amd (linux-mi325-gpu-1): 13 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  20. accuracy-test-2-gpu-amd (linux-mi300-gpu-2): 13 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  21. accuracy-test-1-gpu-amd (linux-mi300-gpu-1): 12 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  22. performance-test-1-gpu-part-2-amd (linux-mi325-gpu-1): 12 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  23. bench-test-2-gpu-amd (linux-mi325-gpu-2): 11 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28957 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860400
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  24. unit-test-sgl-kernel-amd (linux-mi325-gpu-1): 11 times
     Last Success: Run #28891 (2025-09-24 12:44) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17977053408
     Recent Failures:
       - Run #28956 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826732
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  25. performance-test-1-gpu-part-1-amd (linux-mi300-gpu-1): 11 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  26. unit-test-backend-1-gpu-amd (linux-mi300-gpu-1, 6): 11 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  27. unit-test-backend-2-gpu-amd (linux-mi325-gpu-2): 11 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  28. unit-test-backend-1-gpu (9): 10 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34623 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826751
       - Run #34617 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619818
       - Run #34581 (2025-09-24 19:49) by Yineng Zhang: https://github.com/sgl-project/sglang/actions/runs/17987860976

  29. unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 0): 10 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764

  30. unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 1): 10 times
     Last Success: Run #28891 (2025-09-24 12:44) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17977053408
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  31. mla-test-1-gpu-amd (linux-mi300-gpu-1): 10 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  32. unit-test-backend-1-gpu (5): 9 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34624 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860412
       - Run #34617 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619818
       - Run #34560 (2025-09-24 17:01) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17983919007

  33. unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 2): 9 times
     Last Success: Run #28906 (2025-09-24 15:43) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17982029749
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  34. unit-test-sgl-kernel-amd (linux-mi300-gpu-1): 9 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28952 (2025-09-24 23:57) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992751764
       - Run #28951 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619816

  35. unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 4): 7 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28955 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426068
       - Run #28953 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178855
       - Run #28949 (2025-09-24 23:44) (PR #10372 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992591372

  36. unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 6): 7 times
     Last Success: Run #28890 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435645
     Recent Failures:
       - Run #28950 (2025-09-24 23:45) (PR #1 by Xiaoyu Zhang): https://github.com/sgl-project/sglang/actions/runs/17992598523
       - Run #28946 (2025-09-24 23:39) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992521547
       - Run #28936 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244192

  37. vllm-dependency-test: 6 times
     Last Success: Run #22949 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435651
     Recent Failures:
       - Run #23028 (2025-09-25 02:39) by xuyongfei.xyf: https://github.com/sgl-project/sglang/actions/runs/17995251178
       - Run #23021 (2025-09-25 02:16) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17994892873
       - Run #22993 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244213

  38. per-commit-4-ascend-npu: 6 times
     Last Success: Run #10065 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435703
     Recent Failures:
       - Run #10138 (2025-09-25 02:17) by wangyi: https://github.com/sgl-project/sglang/actions/runs/17994908950
       - Run #10137 (2025-09-25 02:16) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17994892896
       - Run #10124 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619819

  39. unit-test-backend-2-gpu (0): 6 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34624 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860412
       - Run #34593 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244227
       - Run #34576 (2025-09-24 18:46) by eigen: https://github.com/sgl-project/sglang/actions/runs/17986403452

  40. unit-test-backend-1-gpu (4): 6 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34623 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826751
       - Run #34609 (2025-09-24 23:25) (PR #10853 by Yineng Zhang): https://github.com/sgl-project/sglang/actions/runs/17992311361
       - Run #34560 (2025-09-24 17:01) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17983919007

  41. run-all-notebooks: 6 times
     Last Success: Run #26939 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435610
     Recent Failures:
       - Run #26988 (2025-09-24 23:25) (PR #10853 by Yineng Zhang): https://github.com/sgl-project/sglang/actions/runs/17992311396
       - Run #26982 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244193
       - Run #26973 (2025-09-24 18:46) by eigen: https://github.com/sgl-project/sglang/actions/runs/17986403458

  42. per-commit-2-ascend-npu: 5 times
     Last Success: Run #10065 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435703
     Recent Failures:
       - Run #10135 (2025-09-25 02:16) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17994888152
       - Run #10109 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244207
       - Run #10085 (2025-09-24 16:42) by likesen: https://github.com/sgl-project/sglang/actions/runs/17983486537

  43. unit-test-backend-8-gpu (0): 5 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34623 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826751
       - Run #34621 (2025-09-25 00:40) by Hubert Lu: https://github.com/sgl-project/sglang/actions/runs/17993426098
       - Run #34619 (2025-09-25 00:24) (PR #10372 by BBuf): https://github.com/sgl-project/sglang/actions/runs/17993178853

  44. pytest-rust: 5 times
     Last Success: Run #1761 (2025-09-24 16:39) by Chang Su: https://github.com/sgl-project/sglang/actions/runs/17983415401
     Recent Failures:
       - Run #1770 (2025-09-24 21:02) by Simo Lin: https://github.com/sgl-project/sglang/actions/runs/17989538977
       - Run #1769 (2025-09-24 20:54) by Simo Lin: https://github.com/sgl-project/sglang/actions/runs/17989380799
       - Run #1767 (2025-09-24 20:36) by Ata Fatahi: https://github.com/sgl-project/sglang/actions/runs/17988964074

  45. per-commit-16-ascend-a3: 4 times
     Last Success: Run #10065 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435703
     Recent Failures:
       - Run #10138 (2025-09-25 02:17) by wangyi: https://github.com/sgl-project/sglang/actions/runs/17994908950
       - Run #10135 (2025-09-25 02:16) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17994888152
       - Run #10109 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244207

  46. unit-test-backend-1-gpu (7): 4 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34624 (2025-09-25 01:10) (PR #10883 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993860412
       - Run #34573 (2025-09-24 18:45) by Tejesh Anand: https://github.com/sgl-project/sglang/actions/runs/17986382981
       - Run #34565 (2025-09-24 17:35) by YAMY: https://github.com/sgl-project/sglang/actions/runs/17984740528

  47. unit-test-backend-2-gpu (1): 4 times
     Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636
     Recent Failures:
       - Run #34593 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244227
       - Run #34576 (2025-09-24 18:46) by eigen: https://github.com/sgl-project/sglang/actions/runs/17986403452
       - Run #34565 (2025-09-24 17:35) by YAMY: https://github.com/sgl-project/sglang/actions/runs/17984740528

  48.
per-commit-1-ascend-npu: 3 times Last Success: Run #10065 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435703 Recent Failures: - Run #10138 (2025-09-25 02:17) by wangyi: https://github.com/sgl-project/sglang/actions/runs/17994908950 - Run #10109 (2025-09-24 21:32) by xiafang: https://github.com/sgl-project/sglang/actions/runs/17990244207 - Run #10085 (2025-09-24 16:42) by likesen: https://github.com/sgl-project/sglang/actions/runs/17983486537 49. unit-test-backend-1-gpu (1): 3 times Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636 Recent Failures: - Run #34623 (2025-09-25 01:07) (PR #10495 by Lianmin Zheng): https://github.com/sgl-project/sglang/actions/runs/17993826751 - Run #34554 (2025-09-24 16:29) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17983177051 - Run #34548 (2025-09-24 15:38) by gholmes829: https://github.com/sgl-project/sglang/actions/runs/17981905143 50. 
unit-test-backend-1-gpu (8): 3 times Last Success: Run #34533 (2025-09-24 12:20) by Yuhong Guo: https://github.com/sgl-project/sglang/actions/runs/17976435636 Recent Failures: - Run #34617 (2025-09-24 23:47) (PR #10881 by Chang Su): https://github.com/sgl-project/sglang/actions/runs/17992619818 - Run #34581 (2025-09-24 19:49) by Yineng Zhang: https://github.com/sgl-project/sglang/actions/runs/17987860976 - Run #34554 (2025-09-24 16:29) by Yuan Luo: https://github.com/sgl-project/sglang/actions/runs/17983177051 Failure Pattern Analysis: GPU Related Failure: 223 times Unit Test Failure: 190 times Accuracy Test Failure: 84 times Performance Test Failure: 54 times Other: 34 times Dependency Installation Failure: 19 times Build Failure: 15 times ``` ### Performance Analyzer Example ``` ============================================================ SGLang Performance Analysis Report ============================================================ Getting recent 100 PR Test runs... Got 100 PR test runs... Collecting performance data from CI runs... Processing run 34882 (2025-09-26 03:16)... Found performance-test-1-gpu-part-1 job (success) Found performance-test-1-gpu-part-2 job (success) Found performance-test-2-gpu job (success) Processing run 34881 (2025-09-26 02:45)... Found performance-test-1-gpu-part-1 job (success) Found performance-test-1-gpu-part-2 job (success) ... Performance data collection completed! Generating performance tables to directory: performance_tables Generated table: performance_tables/performance-test-1-gpu-part-1_summary/test_bs1_default.csv Generated chart: performance_tables/performance-test-1-gpu-part-1_summary/test_bs1_default_output_throughput_token_s.png Generated table: performance_tables/performance-test-1-gpu-part-1_summary/test_online_latency_default.csv Generated chart: performance_tables/performance-test-1-gpu-part-1_summary/test_online_latency_default_median_e2e_latency_ms.png ... Performance tables and charts generation completed! 

============================================================
Performance Analysis Summary
============================================================

Total PR Test runs processed: 100
Total performance tests found: 15
Total performance records collected: 1,247

Performance test breakdown:
  performance-test-1-gpu-part-1: 7 tests, 423 records
  performance-test-1-gpu-part-2: 5 tests, 387 records
  performance-test-2-gpu: 6 tests, 437 records

Generated files:
  CSV tables: 18 files
  PNG charts: 18 files
  Output directory: performance_tables/

Analysis completed successfully!
```

## CI Job Categories

The tool automatically sorts CI jobs into the following categories:

- **sgl-kernel**: Kernel-related tests (build, unit tests, MLA tests)
- **unit-test**: Unit tests (frontend, and backend with different GPU counts)
- **performance**: Performance tests (latency and throughput benchmarks)
- **accuracy**: Accuracy tests (model evaluation)
- **deepep**: DeepEP-related tests
- **b200**: B200 hardware-specific tests

## Failure Patterns

The tool recognizes the following failure patterns:

- **Timeout**: A step exceeded its execution time limit
- **Unit Test Failure**: Unit test execution failures
- **Performance Test Failure**: Performance benchmark failures
- **Accuracy Test Failure**: Model accuracy evaluation failures
- **Build Failure**: Compilation or build process failures
- **Dependency Installation Failure**: Package installation issues
- **GPU Related Failure**: GPU-specific test failures
- **Other**: Failures that match none of the patterns above

## Troubleshooting

### Common Issues

1. **404 Error**:
   - Ensure the repository name is correct (`sgl-project/sglang`)
   - **Most common cause**: Your GitHub token is missing the `repo` or `workflow` permission
   - Go to [GitHub Settings > Personal Access Tokens](https://github.com/settings/tokens) and regenerate the token with the correct permissions
2. **403 Error**: Check that your GitHub token has the correct permissions (`repo` and `workflow`)
3. **Rate Limiting**: The tool includes built-in delays between requests to avoid hitting GitHub API rate limits
4. **Network Issues**: Ensure you have a stable internet connection

### Debug Mode

To see detailed information about each API call, enable debug logging near the top of the script, before any requests are made:

```python
import logging

# DEBUG level also shows the per-request log lines that urllib3 (used by requests) emits
logging.basicConfig(level=logging.DEBUG)
```

## Automated Monitoring

Both the CI Analyzer and the Performance Analyzer run in a GitHub Actions workflow that is scheduled every 6 hours. The workflow performs the following analyses.

### CI Analysis

- Analyzes the last 1000 CI runs (configurable)
- Generates detailed failure reports
- Uploads analysis results as JSON artifacts

### Performance Analysis

- Analyzes the last 1000 PR Test runs (configurable)
- Generates performance trend data and charts
- Uploads CSV tables and PNG charts as artifacts

### Workflow Configuration

The workflow is defined in `.github/workflows/ci-monitor.yml` and uses the `GH_PAT_FOR_NIGHTLY_CI` secret for GitHub API access.

### Manual Trigger

You can trigger the workflow manually from the GitHub Actions tab with the following custom parameter:

- `limit`: Number of CI runs to analyze (default: 1000)

### Artifacts Generated

The workflow generates and uploads the following artifacts:

- **CI Analysis**: JSON files with failure analysis data
- **Performance Analysis**:
  - CSV files with performance metrics, organized by test type
  - PNG charts showing performance trends over time
  - Directory structure: `performance_tables_{timestamp}/`

## License

This tool follows the same license as the SGLang project.
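
As an illustration of how the categories listed under **Failure Patterns** could be assigned, here is a minimal sketch. It matches keywords against the job name and the name of the failed step. The keyword table, `classify_failure`, and `summarize_failures` are hypothetical names chosen for this example. They are not taken from `ci_analyzer.py`, whose actual rules may differ.

```python
from collections import Counter

# Hypothetical keyword table (an assumption, not the real ci_analyzer.py rules).
# Patterns are checked in order, so earlier entries take priority on overlaps.
FAILURE_PATTERNS = [
    ("Timeout", ("timeout", "timed out")),
    ("Unit Test Failure", ("unit-test", "unit test")),
    ("Performance Test Failure", ("performance", "benchmark")),
    ("Accuracy Test Failure", ("accuracy", "eval")),
    ("Build Failure", ("build", "compile")),
    ("Dependency Installation Failure", ("install", "dependencies")),
    ("GPU Related Failure", ("gpu", "cuda")),
]


def classify_failure(job_name: str, failed_step: str) -> str:
    """Return the first pattern whose keywords appear in the job or step name."""
    text = f"{job_name} {failed_step}".lower()
    for label, keywords in FAILURE_PATTERNS:
        if any(keyword in text for keyword in keywords):
            return label
    return "Other"


def summarize_failures(failures: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Count (job_name, failed_step) pairs per pattern, most frequent first."""
    counts = Counter(classify_failure(job, step) for job, step in failures)
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        ("unit-test-backend-2-gpu (0)", "Run test"),
        ("performance-test-2-gpu", "Benchmark step timed out"),
        ("vllm-dependency-test", "Install dependencies"),
        ("pytest-rust", "Run tests"),
    ]
    for label, count in summarize_failures(sample):
        print(f"{label}: {count} times")
```

The first matching pattern wins, so the order of `FAILURE_PATTERNS` decides how overlapping cases are counted. For example, a timed-out benchmark counts as "Timeout" rather than "Performance Test Failure". `summarize_failures` returns the counts in descending order, in the same spirit as the "Failure Pattern Analysis" block of the CI report.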