Commit 6720dfb6 authored by dagil-nvidia, committed by GitHub

docs: fix SGLang docs links (docs.sglang.ai → docs.sglang.io) (#5894)


Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
parent cad453f2
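The change below is a mechanical link rewrite, so it can be scripted. The following is a minimal sketch, not the tooling actually used for this commit: the function name `rewrite_sglang_links` and the `PATH_RENAMES` table are hypothetical, and the only path rename it knows about is the one visible in this diff (`/start/install.html` to `/get_started/install.html`).

```python
import re

OLD_HOST = "docs.sglang.ai"
NEW_HOST = "docs.sglang.io"

# Path renames observed in this commit's diff (assumption: there are no others).
PATH_RENAMES = {"/start/install.html": "/get_started/install.html"}

# Match the old host plus an optional path. The path stops at whitespace or at
# the characters that close a Markdown or reStructuredText link: ")", ">", "]".
_OLD_LINK = re.compile(
    r"https://" + re.escape(OLD_HOST) + r"(?![\w-])(/[^\s)>\]]*)?"
)


def rewrite_sglang_links(text: str) -> str:
    """Return text with old SGLang docs links pointed at the new host."""

    def _swap(match: re.Match) -> str:
        path = match.group(1) or ""
        # Apply a known path rename if there is one, otherwise keep the path.
        path = PATH_RENAMES.get(path, path)
        return f"https://{NEW_HOST}{path}"

    return _OLD_LINK.sub(_swap, text)


if __name__ == "__main__":
    sample = "See [SGLang install docs](https://docs.sglang.ai/start/install.html)."
    print(rewrite_sglang_links(sample))
```

Links that already point at `docs.sglang.io` do not match the pattern, so running the function twice leaves the text unchanged.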
@@ -133,7 +133,7 @@ sudo apt install python3-dev
 uv pip install "ai-dynamo[sglang]"
 ```
-> **Note:** For CUDA 13 (B300/GB300), the container is recommended. See [SGLang install docs](https://docs.sglang.ai/start/install.html) for details.
+> **Note:** For CUDA 13 (B300/GB300), the container is recommended. See [SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
 **TensorRT-LLM**
......
@@ -63,7 +63,7 @@ Install system dependencies and the Dynamo wheel for your chosen backend:
 .. note::
 For CUDA 13 (B300/GB300), the container is recommended. See
-`SGLang install docs <https://docs.sglang.ai/start/install.html>`_ for details.
+`SGLang install docs <https://docs.sglang.io/get_started/install.html>`_ for details.
 **TensorRT-LLM**
......
@@ -9,7 +9,7 @@ SPDX-License-Identifier: Apache-2.0
 When running SGLang through Dynamo, SGLang engine metrics are automatically passed through and exposed on Dynamo's `/metrics` endpoint (default port 8081). This allows you to access both SGLang engine metrics (prefixed with `sglang:`) and Dynamo runtime metrics (prefixed with `dynamo_*`) from a single worker backend endpoint.
-**For the complete and authoritative list of all SGLang metrics**, always refer to the [official SGLang Production Metrics documentation](https://docs.sglang.ai/references/production_metrics.html).
+**For the complete and authoritative list of all SGLang metrics**, always refer to the [official SGLang Production Metrics documentation](https://docs.sglang.io/references/production_metrics.html).
 **For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](../../observability/metrics.md).
@@ -77,7 +77,7 @@ sglang:generation_tokens_total{model_name="meta-llama/Llama-3.1-8B-Instruct"} 75
 sglang:cache_hit_rate{model_name="meta-llama/Llama-3.1-8B-Instruct"} 0.0075
 ```
-**Note:** The specific metrics shown above are examples and may vary depending on your SGLang version. Always inspect your actual `/metrics` endpoint or refer to the [official documentation](https://docs.sglang.ai/references/production_metrics.html) for the current list.
+**Note:** The specific metrics shown above are examples and may vary depending on your SGLang version. Always inspect your actual `/metrics` endpoint or refer to the [official documentation](https://docs.sglang.io/references/production_metrics.html) for the current list.
 ### Metric Categories
@@ -88,7 +88,7 @@ SGLang provides metrics in the following categories (all prefixed with `sglang:`
 - **Latency metrics** - Request and token latency measurements
 - **Disaggregation metrics** - Metrics specific to disaggregated deployments (when enabled)
-**Note:** Specific metrics are subject to change between SGLang versions. Always refer to the [official documentation](https://docs.sglang.ai/references/production_metrics.html) or inspect the `/metrics` endpoint for your SGLang version.
+**Note:** Specific metrics are subject to change between SGLang versions. Always refer to the [official documentation](https://docs.sglang.io/references/production_metrics.html) or inspect the `/metrics` endpoint for your SGLang version.
 ## Available Metrics
@@ -99,7 +99,7 @@ The official SGLang documentation includes complete metric definitions with:
 - Setup guide for Prometheus + Grafana monitoring
 - Troubleshooting tips and configuration examples
-For the complete and authoritative list of all SGLang metrics, see the [official SGLang Production Metrics documentation](https://docs.sglang.ai/references/production_metrics.html).
+For the complete and authoritative list of all SGLang metrics, see the [official SGLang Production Metrics documentation](https://docs.sglang.io/references/production_metrics.html).
 ## Implementation Details
@@ -111,7 +111,7 @@ For the complete and authoritative list of all SGLang metrics, see the [official
 ## Related Documentation
 ### SGLang Metrics
-- [Official SGLang Production Metrics](https://docs.sglang.ai/references/production_metrics.html)
+- [Official SGLang Production Metrics](https://docs.sglang.io/references/production_metrics.html)
 - [SGLang GitHub - Metrics Collector](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/metrics/collector.py)
 ### Dynamo Metrics
......
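The documentation touched above says that a single `/metrics` endpoint exposes both `sglang:`-prefixed engine metrics and `dynamo_`-prefixed runtime metrics. As an illustration only, the sketch below groups metric names from Prometheus text exposition output by those two prefixes. The helper name `group_metrics_by_prefix` is hypothetical, and the `dynamo_example_requests_total` sample line is a made-up placeholder rather than a real Dynamo metric.

```python
import re
from collections import defaultdict


def group_metrics_by_prefix(exposition_text: str) -> dict[str, list[str]]:
    """Group metric names from Prometheus text output by documented prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if name.startswith("sglang:"):
            groups["sglang"].append(name)
        elif name.startswith("dynamo_"):
            groups["dynamo"].append(name)
        else:
            groups["other"].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = "\n".join(
        [
            "# HELP lines like this one are ignored",
            'sglang:cache_hit_rate{model_name="meta-llama/Llama-3.1-8B-Instruct"} 0.0075',
            "dynamo_example_requests_total 3",  # hypothetical metric name
        ]
    )
    for prefix, names in group_metrics_by_prefix(sample).items():
        print(prefix, names)
```

In practice the input text would come from the worker's `/metrics` endpoint (default port 8081 according to the docs above), for example fetched with `urllib.request.urlopen`; the host name depends on the deployment.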
@@ -82,7 +82,7 @@ Note the IP address of this node - you'll need it for worker configuration.
 ### 2. Software Requirements
-Install Dynamo with [SGLang](https://docs.sglang.ai/) support:
+Install Dynamo with [SGLang](https://docs.sglang.io/) support:
 ```bash
 pip install ai-dynamo[sglang]
......
@@ -9,7 +9,7 @@
 When running SGLang through Dynamo, SGLang engine metrics are automatically passed through and exposed on Dynamo's `/metrics` endpoint (default port 8081). This allows you to access both SGLang engine metrics (prefixed with `sglang:`) and Dynamo runtime metrics (prefixed with `dynamo_*`) from a single worker backend endpoint.
-**For the complete and authoritative list of all SGLang metrics**, always refer to the [official SGLang Production Metrics documentation](https://docs.sglang.ai/references/production_metrics.html).
+**For the complete and authoritative list of all SGLang metrics**, always refer to the [official SGLang Production Metrics documentation](https://docs.sglang.io/references/production_metrics.html).
 **For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](../../observability/metrics.md).
@@ -77,7 +77,7 @@ sglang:generation_tokens_total{model_name="meta-llama/Llama-3.1-8B-Instruct"} 75
 sglang:cache_hit_rate{model_name="meta-llama/Llama-3.1-8B-Instruct"} 0.0075
 ```
-**Note:** The specific metrics shown above are examples and may vary depending on your SGLang version. Always inspect your actual `/metrics` endpoint or refer to the [official documentation](https://docs.sglang.ai/references/production_metrics.html) for the current list.
+**Note:** The specific metrics shown above are examples and may vary depending on your SGLang version. Always inspect your actual `/metrics` endpoint or refer to the [official documentation](https://docs.sglang.io/references/production_metrics.html) for the current list.
 ### Metric Categories
@@ -88,7 +88,7 @@ SGLang provides metrics in the following categories (all prefixed with `sglang:`
 - **Latency metrics** - Request and token latency measurements
 - **Disaggregation metrics** - Metrics specific to disaggregated deployments (when enabled)
-**Note:** Specific metrics are subject to change between SGLang versions. Always refer to the [official documentation](https://docs.sglang.ai/references/production_metrics.html) or inspect the `/metrics` endpoint for your SGLang version.
+**Note:** Specific metrics are subject to change between SGLang versions. Always refer to the [official documentation](https://docs.sglang.io/references/production_metrics.html) or inspect the `/metrics` endpoint for your SGLang version.
 ## Available Metrics
@@ -99,7 +99,7 @@ The official SGLang documentation includes complete metric definitions with:
 - Setup guide for Prometheus + Grafana monitoring
 - Troubleshooting tips and configuration examples
-For the complete and authoritative list of all SGLang metrics, see the [official SGLang Production Metrics documentation](https://docs.sglang.ai/references/production_metrics.html).
+For the complete and authoritative list of all SGLang metrics, see the [official SGLang Production Metrics documentation](https://docs.sglang.io/references/production_metrics.html).
 ## Implementation Details
@@ -111,7 +111,7 @@ For the complete and authoritative list of all SGLang metrics, see the [official
 ## Related Documentation
 ### SGLang Metrics
-- [Official SGLang Production Metrics](https://docs.sglang.ai/references/production_metrics.html)
+- [Official SGLang Production Metrics](https://docs.sglang.io/references/production_metrics.html)
 - [SGLang GitHub - Metrics Collector](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/metrics/collector.py)
 ### Dynamo Metrics
......